[jira] [Updated] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2023-09-21 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HDFS-16540:
-
Fix Version/s: (was: 3.3.5)

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> We have an HBase RegionServer and an HDFS DataNode running in one pod. When 
> the pod restarts, we found that data locality is lost after we do a major 
> compaction of HBase regions. After some debugging, we found that upon pod 
> restart, the pod's IP changes. In DatanodeManager, maps like networktopology 
> are updated with the new info, but host2DatanodeMap is not updated 
> accordingly. When an HDFS client with the new IP tries to find a local 
> DataNode, it fails.
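The stale-index failure mode described above can be sketched in miniature. This is a hypothetical, simplified model: `topologyByIp` and `datanodeByIp` are illustrative stand-ins for the real DatanodeManager structures, not the actual fields.

```java
import java.util.HashMap;
import java.util.Map;

// Two indexes over the same nodes: re-registration refreshes one but not the
// other, so lookups by the new IP miss. Simplified sketch, not DatanodeManager.
public class StaleIndexDemo {
    static final Map<String, String> topologyByIp = new HashMap<>(); // refreshed on re-register
    static final Map<String, String> datanodeByIp = new HashMap<>(); // plays the role of host2DatanodeMap

    static void register(String ip, String node) {
        topologyByIp.put(ip, node);
        datanodeByIp.put(ip, node);
    }

    static void reRegister(String oldIp, String newIp, String node) {
        topologyByIp.remove(oldIp);
        topologyByIp.put(newIp, node); // topology learns the new IP...
        // ...but datanodeByIp is never touched: the bug.
    }

    public static void main(String[] args) {
        register("10.0.0.1", "dn-1");
        reRegister("10.0.0.1", "10.0.0.2", "dn-1");
        System.out.println(topologyByIp.get("10.0.0.2")); // dn-1
        System.out.println(datanodeByIp.get("10.0.0.2")); // null -> "find a local DataNode" fails
    }
}
```

The fix, then, is to refresh every index keyed by the old address during re-registration, not just the topology.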



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2023-09-21 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767668#comment-17767668
 ] 

Michael Stack commented on HDFS-16540:
--

Let me do the latter [~dmanning] ...  I'll let folks ask for the backport 
before doing it for branch-3.3. Thanks for finding this one.

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> We have an HBase RegionServer and an HDFS DataNode running in one pod. When 
> the pod restarts, we found that data locality is lost after we do a major 
> compaction of HBase regions. After some debugging, we found that upon pod 
> restart, the pod's IP changes. In DatanodeManager, maps like networktopology 
> are updated with the new info, but host2DatanodeMap is not updated 
> accordingly. When an HDFS client with the new IP tries to find a local 
> DataNode, it fails.






[jira] [Commented] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2023-09-21 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17767640#comment-17767640
 ] 

Michael Stack commented on HDFS-16540:
--

[~dmanning] You are right. Looks like I messed up the cherry-pick back to 3.3. 
I could open a new issue and retry the backport there?

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>  Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> We have an HBase RegionServer and an HDFS DataNode running in one pod. When 
> the pod restarts, we found that data locality is lost after we do a major 
> compaction of HBase regions. After some debugging, we found that upon pod 
> restart, the pod's IP changes. In DatanodeManager, maps like networktopology 
> are updated with the new info, but host2DatanodeMap is not updated 
> accordingly. When an HDFS client with the new IP tries to find a local 
> DataNode, it fails.






[jira] [Commented] (HDFS-16755) Unit test can fail due to unexpected host resolution

2022-08-31 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598612#comment-17598612
 ] 

Michael Stack commented on HDFS-16755:
--

{quote}Could someone transform HADOOP-18431 to HDFS-*?
{quote}
Done (I think you need to reset 'fix version')

> Unit test can fail due to unexpected host resolution
> 
>
> Key: HDFS-16755
> URL: https://issues.apache.org/jira/browse/HDFS-16755
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running using both Maven Surefire and an IDE results in 
> a test failure.  Switching the name to "bogus.invalid" results in the 
> expected behavior, which depends on an UnknownHostException.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Minor
>  Labels: pull-request-available
>
> Tests that want to use an unresolvable address may find that it actually 
> resolves in some environments.  Replacing host names like "bogus" with an 
> IETF RFC 2606 domain name avoids the issue.
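The point can be demonstrated directly: a bare name like "bogus" may be completed by a local search domain or a wildcard DNS entry, while RFC 2606 (and RFC 6761) reserves ".invalid" so resolvers must fail it. A minimal sketch, with an illustrative helper name:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Checks whether a hostname resolves. "bogus.invalid" must fail in any
// environment because the ".invalid" TLD is reserved; a bare "bogus" may not.
public class UnresolvableHostDemo {
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolves("bogus.invalid")); // false
    }
}
```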






[jira] [Moved] (HDFS-16755) Unit test can fail due to unexpected host resolution

2022-08-31 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack moved HADOOP-18431 to HDFS-16755:
---

  Component/s: test
   (was: test)
  Key: HDFS-16755  (was: HADOOP-18431)
 Target Version/s:   (was: 3.4.0, 3.3.9)
Affects Version/s: 3.4.0
   3.3.9
   (was: 3.4.0)
   (was: 3.3.9)
  Project: Hadoop HDFS  (was: Hadoop Common)

> Unit test can fail due to unexpected host resolution
> 
>
> Key: HDFS-16755
> URL: https://issues.apache.org/jira/browse/HDFS-16755
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running using both Maven Surefire and an IDE results in 
> a test failure.  Switching the name to "bogus.invalid" results in the 
> expected behavior, which depends on an UnknownHostException.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Minor
>  Labels: pull-request-available
>
> Tests that want to use an unresolvable address may find that it actually 
> resolves in some environments.  Replacing host names like "bogus" with an 
> IETF RFC 2606 domain name avoids the issue.






[jira] [Resolved] (HDFS-16684) Exclude self from JournalNodeSyncer when using a bind host

2022-08-28 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16684.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to trunk and branch-3.3. Resolving. Thanks for the nice contribution 
[~svaughan] 

> Exclude self from JournalNodeSyncer when using a bind host
> --
>
> Key: HDFS-16684
> URL: https://issues.apache.org/jira/browse/HDFS-16684
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
>Affects Versions: 3.4.0, 3.3.9
> Environment: Running with Java 11 and bind addresses set to 0.0.0.0.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> The JournalNodeSyncer will include the local instance in syncing when using a 
> bind host (e.g. 0.0.0.0).  There is a mechanism that is supposed to exclude 
> the local instance, but it doesn't recognize the meta-address as a local 
> address.
> Running with bind addresses set to 0.0.0.0, the JournalNodeSyncer will log 
> attempts to sync with itself as part of the normal syncing rotation.  For an 
> HA configuration running 3 JournalNodes, the "other" list used by the 
> JournalNodeSyncer will include 3 proxies.
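The kind of "is this address me?" check involved can be sketched as follows. A plain interface scan does not treat the wildcard meta-address 0.0.0.0 as local, so a syncer comparing its bind address against its peers can fail to exclude itself; `isAnyLocalAddress()` catches the wildcard explicitly. This is a hypothetical helper, not the actual JournalNodeSyncer code:

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;

// A local-address check that handles the wildcard meta-address (0.0.0.0 / ::),
// which no network interface "owns" but which always refers to this machine.
public class LocalAddressDemo {
    static boolean isLocal(InetAddress addr) throws SocketException {
        // The wildcard and loopback always mean the local machine.
        if (addr.isAnyLocalAddress() || addr.isLoopbackAddress()) {
            return true;
        }
        // Otherwise, check whether some local interface carries the address.
        return NetworkInterface.getByInetAddress(addr) != null;
    }

    public static void main(String[] args) throws Exception {
        InetAddress wildcard = InetAddress.getByName("0.0.0.0");
        System.out.println(wildcard.isAnyLocalAddress()); // true
        System.out.println(isLocal(wildcard));            // true
    }
}
```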






[jira] [Resolved] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-25 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16586.
--
Fix Version/s: 3.4.0
   3.2.4
   3.3.4
 Hadoop Flags: Reviewed
   Resolution: Fixed

Merged to branch-3, branch-3.3, and to branch-3.2. Thank you for the review 
[~hexiaoqiao] 

> Purge FsDatasetAsyncDiskService threadgroup; it causes 
> BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
> exception and exit' 
> -
>
> Key: HDFS-16586
> URL: https://issues.apache.org/jira/browse/HDFS-16586
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.0, 3.2.3
>Reporter: Michael Stack
>Assignee: Michael Stack
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The below failed block finalize is causing a downstreamer's test to fail when 
> it uses hadoop 3.2.3 or 3.3.0+:
> {code:java}
> 2022-05-19T18:21:08,243 INFO  [Command processor] 
> impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
> FinalizedReplica, blk_1073741840_1016, FINALIZED
>   getNumBytes()     = 52
>   getBytesOnDisk()  = 52
>   getVisibleLength()= 52
>   getVolume()       = 
> /Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
>   getBlockURI()     = 
> file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
>  for deletion
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
> (auth:SIMPLE)
> 2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
> top.TopAuditLogger(78): --- logged event for top service: 
> allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
> src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
>   dst=null  perm=null
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1645): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE, replyAck=seqno: 901 reply: SUCCESS 
> downstreamAckTimeNanos: 0 flag: 0
> 2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
> BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE] datanode.BlockReceiver$PacketResponder(1327): 
> PacketResponder: BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, 
> type=LAST_IN_PIPELINE: seqno=-2 waiting for local datanode to finish write.
> 2022-05-19T18:21:08,243 ERROR [Command processor] 
> datanode.BPServiceActor$CommandProcessingThread(1276): Command processor 
> encountered fatal exception and exit.
> java.lang.IllegalThreadStateException: null
>   at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
>   at java.lang.Thread.(Thread.java:430) ~[?:?]
>   at java.lang.Thread.(Thread.java:704) ~[?:?]
>   at java.lang.Thread.(Thread.java:525) ~[?:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:623)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912)
>  ~[?:?]
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) 
> ~[?:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103)
>  ~[hadoop-hdfs-3.2.3.jar:?]
>   at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736)
>  ~[hadoop-hdf
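The stack above boils down to a ThreadFactory constructing threads in a ThreadGroup that has since been destroyed. On JDKs where ThreadGroup.destroy() still functions (pre-19), creating a Thread in a destroyed group throws IllegalThreadStateException, which is what kills the Command processor thread. A minimal reproduction, a sketch only and not the FsDatasetAsyncDiskService code (note: JDK 19+ degraded ThreadGroup and makes destroy() itself throw UnsupportedOperationException):

```java
// Reproduces the failure mode: construct a Thread inside a destroyed
// ThreadGroup. Pre-JDK-19 this throws IllegalThreadStateException from
// ThreadGroup.addUnstarted(), as in the stack trace above.
public class DestroyedThreadGroupDemo {
    static String reproduce() {
        ThreadGroup group = new ThreadGroup("async-disk-service");
        try {
            group.destroy(); // JDK 19+: throws UnsupportedOperationException here
            new Thread(group, () -> { }, "worker-1"); // pre-19: throws here
            return "no exception";
        } catch (IllegalThreadStateException e) {
            return "IllegalThreadStateException";
        } catch (UnsupportedOperationException e) {
            return "UnsupportedOperationException";
        }
    }

    public static void main(String[] args) {
        System.out.println(reproduce());
    }
}
```

Dropping the custom ThreadGroup entirely, as the issue title proposes, sidesteps the problem.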

[jira] [Commented] (HDFS-14997) BPServiceActor processes commands from NameNode asynchronously

2022-05-20 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17540321#comment-17540321
 ] 

Michael Stack commented on HDFS-14997:
--

Linking an issue where the change in how we do command processing uncovers an 
old, existing problem. Perhaps it is of interest.

> BPServiceActor processes commands from NameNode asynchronously
> --
>
> Key: HDFS-14997
> URL: https://issues.apache.org/jira/browse/HDFS-14997
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Xiaoqiao He
>Assignee: Xiaoqiao He
>Priority: Major
> Fix For: 3.3.0, 3.2.3, 3.2.4
>
> Attachments: HDFS-14997-branch-3.2.001.patch, HDFS-14997.001.patch, 
> HDFS-14997.002.patch, HDFS-14997.003.patch, HDFS-14997.004.patch, 
> HDFS-14997.005.patch, HDFS-14997.addendum.patch, 
> image-2019-12-26-16-15-44-814.png
>
>
> There are two core functions in the #BPServiceActor main processing flow: 
> reporting (#sendHeartbeat, #blockReport, #cacheReport) and #processCommand. 
> If processCommand takes a long time, it blocks the report flow, and 
> processCommand can take a long time (over 1000s in the worst case I have 
> met) when the IO load of the DataNode is very high. Since some IO 
> operations are performed under #datasetLock, processing some commands (such 
> as #DNA_INVALIDATE) may wait a long time to acquire #datasetLock. In such a 
> case, heartbeats are not sent to the NameNode in time, which triggers other 
> disasters.
> I propose to process #processCommand asynchronously so that it does not 
> block #BPServiceActor from sending heartbeats back to the NameNode under 
> high IO load.
> Notes:
> 1. Lifeline could be one effective solution; however, some old branches do 
> not support this feature.
> 2. IO operations under #datasetLock are another issue; I think we should 
> solve it in another JIRA.






[jira] [Created] (HDFS-16586) Purge FsDatasetAsyncDiskService threadgroup; it causes BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal exception and exit'

2022-05-20 Thread Michael Stack (Jira)
Michael Stack created HDFS-16586:


 Summary: Purge FsDatasetAsyncDiskService threadgroup; it causes 
BPServiceActor$CommandProcessingThread IllegalThreadStateException 'fatal 
exception and exit' 
 Key: HDFS-16586
 URL: https://issues.apache.org/jira/browse/HDFS-16586
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.2.3, 3.3.0
Reporter: Michael Stack
Assignee: Michael Stack


The below failed block finalize is causing a downstreamer's test to fail when 
it uses hadoop 3.2.3 or 3.3.0+:
{code:java}
2022-05-19T18:21:08,243 INFO  [Command processor] 
impl.FsDatasetAsyncDiskService(234): Scheduling blk_1073741840_1016 replica 
FinalizedReplica, blk_1073741840_1016, FINALIZED
  getNumBytes()     = 52
  getBytesOnDisk()  = 52
  getVisibleLength()= 52
  getVolume()       = 
/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2
  getBlockURI()     = 
file:/Users/stack/checkouts/hbase.apache.git/hbase-server/target/test-data/d544dd1e-b42d-8fae-aa9a-99e3eb52f61c/cluster_e8660d1b-733a-b023-2e91-dc3f951cf189/dfs/data/data2/current/BP-62743752-127.0.0.1-1653009535881/current/finalized/subdir0/subdir0/blk_1073741840
 for deletion
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
metrics.TopMetrics(134): a metric is reported: cmd: delete user: stack.hfs.0 
(auth:SIMPLE)
2022-05-19T18:21:08,243 DEBUG [IPC Server handler 0 on default port 54774] 
top.TopAuditLogger(78): --- logged event for top service: 
allowed=true ugi=stack.hfs.0 (auth:SIMPLE) ip=/127.0.0.1 cmd=delete  
src=/user/stack/test-data/b8167d53-bcd7-c682-a767-55faaf7f3e96/data/default/t1/4499521075f51d5138fe4f1916daf92d/.tmp
  dst=null  perm=null
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1645): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE, 
replyAck=seqno: 901 reply: SUCCESS downstreamAckTimeNanos: 0 flag: 0
2022-05-19T18:21:08,243 DEBUG [PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE] 
datanode.BlockReceiver$PacketResponder(1327): PacketResponder: 
BP-62743752-127.0.0.1-1653009535881:blk_1073741830_1006, type=LAST_IN_PIPELINE: 
seqno=-2 waiting for local datanode to finish write.
2022-05-19T18:21:08,243 ERROR [Command processor] 
datanode.BPServiceActor$CommandProcessingThread(1276): Command processor 
encountered fatal exception and exit.
java.lang.IllegalThreadStateException: null
  at java.lang.ThreadGroup.addUnstarted(ThreadGroup.java:865) ~[?:?]
  at java.lang.Thread.(Thread.java:430) ~[?:?]
  at java.lang.Thread.(Thread.java:704) ~[?:?]
  at java.lang.Thread.(Thread.java:525) ~[?:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService$1.newThread(FsDatasetAsyncDiskService.java:113)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.(ThreadPoolExecutor.java:623)
 ~[?:?]
  at 
java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:912) 
~[?:?]
  at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1343) 
~[?:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.execute(FsDatasetAsyncDiskService.java:189)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService.deleteAsync(FsDatasetAsyncDiskService.java:238)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2184)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2103)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:736)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:682)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processCommand(BPServiceActor.java:1318)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.lambda$enqueue$2(BPServiceActor.java:1364)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1291)
 ~[hadoop-hdfs-3.2.3.jar:?]
  at 
org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1274)
 ~[hadoop-hdfs-3.2.3.jar:?]
2022-05-19T18:21:08,243 DEBUG [DataXceiver for client 

[jira] [Resolved] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2022-05-15 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack resolved HDFS-16540.
--
Hadoop Flags: Reviewed
  Resolution: Fixed

Merged to branch-3.3 and to trunk.

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> We have an HBase RegionServer and an HDFS DataNode running in one pod. When 
> the pod restarts, we found that data locality is lost after we do a major 
> compaction of HBase regions. After some debugging, we found that upon pod 
> restart, the pod's IP changes. In DatanodeManager, maps like networktopology 
> are updated with the new info, but host2DatanodeMap is not updated 
> accordingly. When an HDFS client with the new IP tries to find a local 
> DataNode, it fails.






[jira] [Updated] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2022-05-15 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HDFS-16540:
-
Fix Version/s: 3.4.0
   3.3.4

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> We have an HBase RegionServer and an HDFS DataNode running in one pod. When 
> the pod restarts, we found that data locality is lost after we do a major 
> compaction of HBase regions. After some debugging, we found that upon pod 
> restart, the pod's IP changes. In DatanodeManager, maps like networktopology 
> are updated with the new info, but host2DatanodeMap is not updated 
> accordingly. When an HDFS client with the new IP tries to find a local 
> DataNode, it fails.






[jira] [Assigned] (HDFS-16540) Data locality is lost when DataNode pod restarts in kubernetes

2022-04-13 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack reassigned HDFS-16540:


Assignee: Huaxiang Sun

> Data locality is lost when DataNode pod restarts in kubernetes 
> ---
>
> Key: HDFS-16540
> URL: https://issues.apache.org/jira/browse/HDFS-16540
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Affects Versions: 3.3.2
>Reporter: Huaxiang Sun
>Assignee: Huaxiang Sun
>Priority: Major
>
> We have an HBase RegionServer and an HDFS DataNode running in one pod. When 
> the pod restarts, we found that data locality is lost after we do a major 
> compaction of HBase regions. After some debugging, we found that upon pod 
> restart, the pod's IP changes. In DatanodeManager, maps like networktopology 
> are updated with the new info, but host2DatanodeMap is not updated 
> accordingly. When an HDFS client with the new IP tries to find a local 
> DataNode, it fails.






[jira] [Updated] (HDFS-16090) Fine grained locking for datanodeNetworkCounts

2021-06-30 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HDFS-16090:
-
Fix Version/s: 3.3.2
   3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Pushed to branch-3.3+ (it didn't apply cleanly against branch-3.2). Resolving. 
Thanks for the improvement [~vjasani]. Thanks for the reviews [~aajisaka] and 
[~weichiu].

> Fine grained locking for datanodeNetworkCounts
> --
>
> Key: HDFS-16090
> URL: https://issues.apache.org/jira/browse/HDFS-16090
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> While incrementing the DataNode network error count, we lock the entire 
> LoadingCache in order to increment the network count of a specific host. We 
> should provide fine-grained concurrency for this update, because locking 
> the entire cache is redundant and could impact performance when 
> incrementing network counts for multiple hosts.






[jira] [Commented] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"

2019-11-07 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969596#comment-16969596
 ] 

Michael Stack commented on HDFS-13613:
--

Thanks [~ndimiduk] and [~inigoiri] for taking a look.

Thanks for sharing your experience disabling hedged reads. Will try that too.

I could add a check for DEBUG, but even doing that check in this old logging 
system of ours -- the last release was more than 5 years ago -- requires 
passing through a synchronized block. When this log was spewing in a running 
process, as it tends to do when HDFS is struggling and all threads are 
waiting on HDFS syncs to return, I changed the log level but then saw access 
to HDFS blocking on the log-level check (going by thread dumps, so a rough 
take only). Limiting the number of emissions would require a system to count 
them, it would have to be configurable, and so on. Seems a bit OTT.

I was thinking that if you are interested in the thread count for hedged 
reads, you'd study the metric incremented on the following line; it would 
give you a better notion than this bare log does.

Thanks again for taking a look.








> RegionServer log is flooded with "Execution rejected, Executing in current 
> thread"
> --
>
> Key: HDFS-13613
> URL: https://issues.apache.org/jira/browse/HDFS-13613
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
> Environment: CDH 5.13, HBase RegionServer, Kerberized, hedged read
>Reporter: Wei-Chiu Chuang
>Priority: Major
> Attachments: 
> 0001-HDFS-13613-RegionServer-log-is-flooded-with-Executio.patch
>
>
> In the log of a HBase RegionServer with hedged read, we saw the following 
> message flooding the log file.
> {noformat}
> 2018-05-19 17:22:55,691 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,692 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,695 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,696 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,696 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 
> {noformat}
> Sometimes the RS spits tens of thousands of lines of this message in a 
> minute. We should do something to stop this message flooding the log file. 
> Also, we should make this message more actionable. Discussed with 
> [~huaxiang], this message can appear if there are stale DataNodes.
> I believe this issue existed since HDFS-5776.
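The patch's idea, count rejections in a metric and fall back to the caller's thread instead of logging a line per rejection, can be sketched like this. Names are illustrative, not the DFSClient internals:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// A saturated pool triggers the RejectedExecutionHandler, which bumps a
// metric and runs the task in the submitting thread -- no log line emitted.
public class HedgedReadPoolSketch {
    static long countRejections() throws InterruptedException {
        AtomicLong rejected = new AtomicLong();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS, new SynchronousQueue<>(),
                (r, exec) -> {
                    rejected.incrementAndGet(); // cheap metric instead of a log line
                    r.run();                    // execute in the caller's thread
                });
        CountDownLatch release = new CountDownLatch(1);
        // Occupy the single worker so the next submission is rejected.
        pool.execute(() -> {
            try { release.await(); } catch (InterruptedException ignored) { }
        });
        pool.execute(() -> { }); // pool busy, queue refuses -> handler runs it here
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return rejected.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countRejections()); // 1
    }
}
```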






[jira] [Commented] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"

2019-11-04 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967028#comment-16967028
 ] 

Michael Stack commented on HDFS-13613:
--

Attached a proposed patch. The log is useless; besides, we keep a 
rejected-execution metric.

In my case there were no stale DataNodes, just load.

> RegionServer log is flooded with "Execution rejected, Executing in current 
> thread"
> --
>
> Key: HDFS-13613
> URL: https://issues.apache.org/jira/browse/HDFS-13613
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.4.0
> Environment: CDH 5.13, HBase RegionServer, Kerberized, hedged read
>Reporter: Wei-Chiu Chuang
>Priority: Major
> Attachments: 
> 0001-HDFS-13613-RegionServer-log-is-flooded-with-Executio.patch
>
>
> In the log of a HBase RegionServer with hedged read, we saw the following 
> message flooding the log file.
> {noformat}
> 2018-05-19 17:22:55,691 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,692 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,695 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,696 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 2018-05-19 17:22:55,696 INFO org.apache.hadoop.hdfs.DFSClient: Execution 
> rejected, Executing in current thread
> 
> {noformat}
> Sometimes the RS spits tens of thousands of lines of this message in a 
> minute. We should do something to stop this message flooding the log file. 
> Also, we should make this message more actionable. Discussed with 
> [~huaxiang], this message can appear if there are stale DataNodes.
> I believe this issue existed since HDFS-5776.






[jira] [Updated] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"

2019-11-04 Thread Michael Stack (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Stack updated HDFS-13613:
-
Attachment: 0001-HDFS-13613-RegionServer-log-is-flooded-with-Executio.patch







[jira] [Comment Edited] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"

2019-11-04 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967021#comment-16967021
 ] 

Michael Stack edited comment on HDFS-13613 at 11/4/19 9:40 PM:
---

Just ran into this one. Thread dump showed loads of threads BLOCKED here:

{code}
"RpcServer.default.FPBQ.Fifo.handler=85,queue=25,port=16020" #137 daemon prio=5 
os_prio=0 cpu=85786.24ms elapsed=157927.35s tid=0x7f3dddad6000 nid=0xf390 
waiting for monitor entry  [0x7f3dd21a9000]
  java.lang.Thread.State: BLOCKED (on object monitor)
   at org.apache.log4j.Category.callAppenders(Category.java:204)
   - waiting to lock <0x8080c258> (a 
org.apache.log4j.spi.RootLogger)
   at org.apache.log4j.Category.forcedLog(Category.java:391)
   at org.apache.log4j.Category.log(Category.java:856)
   at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
   at 
org.apache.hadoop.hdfs.DFSClient$2.rejectedExecution(DFSClient.java:2904)
{code}

i.e. trying to log the above useless message.

I turned off logging, but we still enter the logging system and hit the BLOCKED 
section.

The RS backs up and fills all call queues. Nothing can come in the front door. We 
start to burn all CPUs.

For context: running heavy load, HDFS slows down, and HBase spews complaints that 
syncs are costing > 100ms. Then this phenomenon takes off.

When load lightens, we seem to get past the torrent, but while it is going on, the 
backed-up RS is not allowing access to any of its hosted data.

This was in a hadoop3 derivative.
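One mitigation for this failure mode — hundreds of handler threads serializing on the log4j appender monitor — is to rate-limit before ever calling the logger. A minimal sketch (hypothetical helper, not Hadoop code; a real caller would pass System.nanoTime() instead of the simulated clock used in the demo):

```java
import java.util.concurrent.atomic.AtomicLong;

/** Allow at most one log line per interval so a rejection storm
 *  cannot serialize every handler thread on the appender lock. */
public class RateLimitedLog {
    private final long intervalNanos;
    private final AtomicLong nextAllowed = new AtomicLong(0);  // demo clock starts at 0
    private final AtomicLong suppressed = new AtomicLong();

    public RateLimitedLog(long intervalMillis) {
        this.intervalNanos = intervalMillis * 1_000_000L;
    }

    /** Returns true if the caller may emit a log line at time nowNanos. */
    public boolean shouldLog(long nowNanos) {
        long next = nextAllowed.get();
        if (nowNanos - next >= 0
                && nextAllowed.compareAndSet(next, nowNanos + intervalNanos)) {
            return true;                    // one winner per interval
        }
        suppressed.incrementAndGet();       // everyone else skips the logger entirely
        return false;
    }

    public long suppressedCount() { return suppressed.get(); }

    /** 10,000 rejections arriving within a millisecond: one log line survives. */
    public static long[] demo() {
        RateLimitedLog limiter = new RateLimitedLog(60_000);    // one line per minute
        long logged = 0;
        for (long t = 0; t < 10_000; t++) {                     // t in nanoseconds
            if (limiter.shouldLog(t)) {
                logged++;                   // only here would log4j be touched
            }
        }
        return new long[] { logged, limiter.suppressedCount() };
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println("logged=" + r[0] + " suppressed=" + r[1]);
    }
}
```

The check is a lock-free compare-and-set, so the suppressed threads never block on the logging system at all.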









[jira] [Comment Edited] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"

2019-11-04 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967021#comment-16967021
 ] 

Michael Stack edited comment on HDFS-13613 at 11/4/19 9:39 PM:
---

Just ran into this one. Thread dump showed loads of threads BLOCKED here:

{code}
"RpcServer.default.FPBQ.Fifo.handler=85,queue=25,port=16020" #137 daemon prio=5 
os_prio=0 cpu=85786.24ms elapsed=157927.35s tid=0x7f3dddad6000 nid=0xf390 
waiting for monitor entry  [0x7f3dd21a9000]
  java.lang.Thread.State: BLOCKED (on object monitor)
   at org.apache.log4j.Category.callAppenders(Category.java:204)
   - waiting to lock <0x8080c258> (a 
org.apache.log4j.spi.RootLogger)
   at org.apache.log4j.Category.forcedLog(Category.java:391)
   at org.apache.log4j.Category.log(Category.java:856)
   at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
   at 
org.apache.hadoop.hdfs.DFSClient$2.rejectedExecution(DFSClient.java:2904)
{code}

i.e. trying to log the above useless message.

I turned off logging, but we still enter the logging system and hit the BLOCKED 
section.

The RS backs up and fills all call queues. Nothing can come in the front door. We 
start to burn all CPUs.

For context: running heavy load, HDFS slows down, and HBase spews complaints that 
syncs are costing > 100ms. Then this phenomenon takes off.

When load lightens, we seem to get past the torrent, but while it is going on, the 
backed-up RS is not allowing access to any of its hosted data.









[jira] [Commented] (HDFS-13613) RegionServer log is flooded with "Execution rejected, Executing in current thread"

2019-11-04 Thread Michael Stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967021#comment-16967021
 ] 

Michael Stack commented on HDFS-13613:
--

Just ran into this one. Thread dump showed loads of threads BLOCKED here:

{code}
"RpcServer.default.FPBQ.Fifo.handler=85,queue=25,port=16020" #137 daemon prio=5 
os_prio=0 cpu=85786.24ms elapsed=157927.35s tid=0x7f3dddad6000 nid=0xf390 
waiting for monitor entry  [0x7f3dd21a9000]
  java.lang.Thread.State: BLOCKED (on object monitor)
   at org.apache.log4j.Category.callAppenders(Category.java:204)
   - waiting to lock <0x8080c258> (a 
org.apache.log4j.spi.RootLogger)
   at org.apache.log4j.Category.forcedLog(Category.java:391)
   at org.apache.log4j.Category.log(Category.java:856)
   at org.slf4j.impl.Log4jLoggerAdapter.info(Log4jLoggerAdapter.java:305)
   at 
org.apache.hadoop.hdfs.DFSClient$2.rejectedExecution(DFSClient.java:2904)
{code}

i.e. trying to log the above useless message.

I turned off logging, but we still enter the logging system and hit the BLOCKED 
section.

The RS backs up and fills all call queues. Nothing can come in the front door. We 
start to burn all CPUs.







[jira] [Commented] (HDFS-14837) Review of Block.java

2019-09-09 Thread stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926301#comment-16926301
 ] 

stack commented on HDFS-14837:
--

One question: is Long.hashCode the same as (int)(blockId^(blockId>>>32))? (I've not 
looked.)
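For the record: yes. Long.hashCode() is specified as (int)(value ^ (value >>> 32)), and the static Long.hashCode(long) added in Java 8 is the same expression, so the two are interchangeable. A quick check (throwaway class name):

```java
public class LongHashCheck {
    /** Verify (int)(v ^ (v >>> 32)) matches Long.hashCode for edge cases. */
    public static boolean allMatch() {
        long[] samples = { 0L, 1L, -1L, 42L,
                           Long.MAX_VALUE, Long.MIN_VALUE, 0xCAFEBABEL };
        for (long blockId : samples) {
            int manual = (int) (blockId ^ (blockId >>> 32));
            // Long.hashCode(long) is specified as exactly this expression.
            if (manual != Long.hashCode(blockId)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allMatch() ? "all match" : "mismatch");
    }
}
```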

> Review of Block.java
> 
>
> Key: HDFS-14837
> URL: https://issues.apache.org/jira/browse/HDFS-14837
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.2.0
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Minor
> Attachments: HDFS-14837.1.patch
>
>
> The {{Block}} class is such a core class in the project, I just wanted to 
> make sure it was super clean and documentation was correct.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14837) Review of Block.java

2019-09-09 Thread stack (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926214#comment-16926214
 ] 

stack commented on HDFS-14837:
--

+1
nice cleanup







[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-09 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881339#comment-16881339
 ] 

stack commented on HDFS-14483:
--

Reverted from branch-2 and branch-2.9 and then reapplied to both branches with an 
amended commit message. The original was missing the JIRA id.

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, 
> HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-09 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-14483:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.3
   2.10.0
   Status: Resolved  (was: Patch Available)

Pushed to branch-2.9 and branch-2. Thanks for the patch [~leosun08]. Mind 
filling out the release note on what this patch adds? Thanks.
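For anyone reading along, the backport adds a positional-read-into-ByteBuffer interface. The shape of such an API can be sketched with java.nio's FileChannel standing in for the HDFS input stream (illustration only, not the actual HDFS-14111/HDFS-3246 code):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ByteBufferPreadSketch {
    /** Positional read into a ByteBuffer: it never moves the channel's
     *  own position, so concurrent readers (e.g. hedged reads against
     *  the same stream) don't interfere with each other. */
    static int pread(FileChannel ch, long position, ByteBuffer buf) throws IOException {
        return ch.read(buf, position);
    }

    public static String demo() throws IOException {
        Path p = Files.createTempFile("pread", ".bin");
        try {
            Files.write(p, "hello, hdfs".getBytes());
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(4);
                int n = pread(ch, 7, buf);     // read "hdfs" at offset 7
                buf.flip();
                byte[] out = new byte[buf.remaining()];
                buf.get(out);
                return n + " " + new String(out);
            }
        } finally {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());            // prints "4 hdfs"
    }
}
```

Reading into a caller-supplied ByteBuffer also lets clients use direct (off-heap) buffers and avoid an extra copy, which is the main point of the backported interface.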

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, 
> HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch
>
>







[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-07 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880012#comment-16880012
 ] 

stack commented on HDFS-14483:
--

[~leosun08] Thanks. Looking at the history of hdfs builds, I see that it fails in 
the build just before this one, for the HDFS-13694 patch. Unrelated then. Let me 
push.

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, 
> HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch
>
>







[jira] [Comment Edited] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-07 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880012#comment-16880012
 ] 

stack edited comment on HDFS-14483 at 7/8/19 3:52 AM:
--

[~leosun08] Thanks. Looking at the history of hdfs builds, I see that it fails in 
the build just before this one, for the HDFS-13694 patch. Unrelated then. Let me 
push. Will do tomorrow in case someone else wants to comment in the meantime.



> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, 
> HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch
>
>







[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-07 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880008#comment-16880008
 ] 

stack commented on HDFS-14483:
--

...and +1 on the patch. Let's just figure out the story on this last flakey test, 
and then I'll commit (unless there's an objection). Thanks [~leosun08]

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, 
> HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch
>
>







[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-07 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-14483:
-
Attachment: HDFS-14585.branch-2.9.v3.patch

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, 
> HDFS-14585.branch-2.9.v3.patch, HDFS-14585.branch-2.9.v3.patch
>
>







[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-07 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16880004#comment-16880004
 ] 

stack commented on HDFS-14483:
--

Thanks for fixing the short-circuit unit test [~leosun08]. Seems to have worked. As 
said above...

hadoop.hdfs.web.TestWebHdfsTimeouts
hadoop.hdfs.server.datanode.TestDirectoryScanner

...are for sure flakey. TestJournalNodeRespectsBindHostKeys I'm not so sure about; 
will do a survey of recent test history. Meantime, let me get another run in.

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, 
> HDFS-14585.branch-2.9.v3.patch
>
>







[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-06 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-14483:
-
Attachment: HDFS-14585.branch-2.9.v3.patch

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v3.patch, 
> HDFS-14585.branch-2.9.v3.patch
>
>







[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-03 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-14483:
-
Attachment: HDFS-14483.branch-2.9.v2 (2).patch

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2 (2).patch, HDFS-14483.branch-2.9.v2.patch, 
> HDFS-14483.branch-2.9.v2.patch
>
>







[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877963#comment-16877963
 ] 

stack commented on HDFS-14483:
--

I looked back over recent hdfs QA builds at 
https://builds.apache.org/job/PreCommit-HDFS-Build/. I see that

 TestWebHdfsTimeouts
 TestDirectoryScanner
 
... are definitely flakey.

The others I am not so sure about. If I go back in the build history, they seem to 
fail only with this patch in place (I went back through all builds before the first 
build above, at 
https://builds.apache.org/job/PreCommit-HDFS-Build/). Let me retry the patch.



> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14483.branch-2.9.v2.patch
>
>







[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877402#comment-16877402
 ] 

stack commented on HDFS-14483:
--

Are the test failures related [~leosun08]? Thanks.

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14483.branch-2.9.v2.patch
>
>







[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-02 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-14483:
-
Attachment: HDFS-14483.branch-2.9.v2.patch

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2.patch, HDFS-14483.branch-2.9.v2.patch
>
>







[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-02 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877023#comment-16877023
 ] 

stack commented on HDFS-14483:
--

What about my other comments, [~leosun08]? Mind responding to them?

There is no overlap in failures between the two test runs. Let me try another run in 
the meantime.

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch, 
> HDFS-14483.branch-2.9.v2.patch
>
>







[jira] [Commented] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876673#comment-16876673
 ] 

stack commented on HDFS-14483:
--

Retry. All but one of the failures look like they could be related. Let's see.

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch
>
>







[jira] [Updated] (HDFS-14483) Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-14483:
-
Attachment: HDFS-14483.branch-2.9.v1.patch

> Backport HDFS-14111,HDFS-3246 ByteBuffer pread interface to branch-2.9
> --
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch, 
> HDFS-14483.branch-2.9.v1.patch, HDFS-14483.branch-2.9.v1.patch
>
>







[jira] [Resolved] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-14585.
--
Resolution: Fixed

Reapplied w/ proper commit message. Re-resolving.

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>







[jira] [Reopened] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HDFS-14585:
--

Reopening. The commit message was missing the JIRA # so reverting and reapplying 
with a fixed commit message.

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>







[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-07-01 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-14585:
-
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 2.9.3
   2.10.0
   Status: Resolved  (was: Patch Available)

Pushed to branch-2 and branch-2.9. Thanks for the patch [~leosun08] (and review 
[~jojochuang]). Shout if I mangled this (it has been a while).

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Fix For: 2.10.0, 2.9.3
>
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>







[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-06-29 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875570#comment-16875570
 ] 

stack commented on HDFS-14585:
--

Yes. It passed on the second attempt. Flaky. The findbugs warning has a covering 
JIRA, HADOOP-16386, filed by the mighty [~jojochuang].

I'll commit this after Monday unless objection. Thanks [~leosun08].

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>







[jira] [Updated] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-06-28 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-14585:
-
Attachment: HDFS-14585.branch-2.9.v2.patch

> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.9.v2.patch, 
> HDFS-14585.branch-2.v1.patch
>
>







[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9

2019-06-28 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16875047#comment-16875047
 ] 

stack commented on HDFS-14585:
--

bq. Sorry, I don't quite understand thse meaning of this comments. 

I am offering praise on aspects of your work. No response required.

Test failure looks unrelated but let me retry.

Reviewing the patch, v2 looks good to me.







> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch2.9
> -
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.9.v2.patch, HDFS-14585.branch-2.v1.patch
>
>







[jira] [Commented] (HDFS-14483) Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x

2019-06-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874426#comment-16874426
 ] 

stack commented on HDFS-14483:
--

Test failures seem related, no?

nit: Was expecting ByteBufferPositionedReadable to be a sub-interface of 
ByteBufferReadable/PositionedReadable. Probably fine as is but mildly 
'surprising'.

Comparing the BB-based decrypt to the byte[] version, this confuses me:

buf.limit(start + len + Math.min(n - len, localInBuffer.remaining()));

In the byte[] version, it's len - n vs n - len figuring how much to decrypt. I 
see the byte[] decrypt loop is n < length vs the bb decrypt which is len < n... 
so I think it's fine, but take a look please.

Should these resets be in the finally block just below?

425   buf.position(start + len);
426   buf.limit(limit);
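The reset-in-finally suggestion above can be sketched as follows. Here `decryptRange` is a hypothetical stand-in for the patch's decrypt path, not the actual HDFS method; the point is that the position/limit restores run even if the operation in the middle throws:

```java
import java.nio.ByteBuffer;

/**
 * Sketch of the review suggestion: save a ByteBuffer's limit before a partial
 * operation and restore position/limit in a finally block, so the buffer is
 * left consistent even on an exception. decryptRange is illustrative only.
 */
public class BufferResetSketch {
  static void decryptRange(ByteBuffer buf, int start, int len) {
    int limit = buf.limit();        // remember the original limit
    try {
      buf.position(start);
      buf.limit(start + len);       // narrow the window to [start, start+len)
      // ... decrypt buf.remaining() bytes in place here ...
    } finally {
      buf.position(start + len);    // the resets the comment asks about
      buf.limit(limit);             // now run even when decryption throws
    }
  }

  public static void main(String[] args) {
    ByteBuffer buf = ByteBuffer.allocate(16);
    decryptRange(buf, 4, 8);
    System.out.println(buf.position() + " " + buf.limit()); // prints "12 16"
  }
}
```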

Nice javadoc in ByteBufferPositionedReadable

Appreciate the improvement in testPositionedRead

nit: more permutations in testPositionedReadWithByteBuffer would be 
nice-to-have -- though the testByteBufferPread addition is good. Maybe a 
follow-on? Could test more edge cases... a failed read or a read that does not 
fill the read request amount? The positionedReadCheckWithByteBuffer is nice.

Nice addition of the strncmp check in the test_libhdfs_ops.c file and bulking 
up of the pread tests.

Yeah, skip the re-formatting of unrelated code (especially when it adds a mistake, 
as in '916 method = "open";')... which adds an offset.

Nice comments added to c function names.

Patch looks great.





> Backport HDFS-3246 ByteBuffer pread interface to branch-2.8.x
> -
>
> Key: HDFS-14483
> URL: https://issues.apache.org/jira/browse/HDFS-14483
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Zheng Hu
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14483.branch-2.8.v1.patch
>
>







[jira] [Commented] (HDFS-14585) Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch-2 and branch2.9

2019-06-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874403#comment-16874403
 ] 

stack commented on HDFS-14585:
--

nit: why the change in import order for nonnull and Preconditions? Is it about 
respecting checkstyle import ordering rules?
nit: Best to not make formatting changes that are unrelated to your patch: e.g. 
wrapping exception declaration on blockSeekTo (there are a few of this type of 
change). Formatting bulks up your patch and distract the reviewer.
nit: ByteBuffer bb = ByteBuffer.wrap(buffer, offset, length); is offset.

Some nice cleanup and duplication removal; e.g. using BB to keep account on the 
read buffer (offsets and length), purge of the HDFS-8703 unused EC version of 
actualGetFromOneDataNode, pulling out long targetEnd = targetStart + 
bytesToRead - 1, etc. 

Is this going to be ok: 1249  tmp.limit(tmp.position() + len); ? The 
EC version of actualGetFromOneDataNode had a checkReadPortions. Should there be 
a check that we don't go over the end of the buffer here?
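The guard being asked for could look something like the sketch below; the names are illustrative, not the patch's, and the check is just "len must not exceed what the buffer has left" before narrowing the limit:

```java
import java.nio.ByteBuffer;

/**
 * Sketch of a bounds check before tmp.limit(tmp.position() + len): verify len
 * does not run past the buffer's remaining bytes. sliceForRead is a
 * hypothetical helper name, not code from the HDFS-14585 patch.
 */
public class LimitCheckSketch {
  static ByteBuffer sliceForRead(ByteBuffer tmp, int len) {
    if (len < 0 || len > tmp.remaining()) {
      throw new IndexOutOfBoundsException(
          "len=" + len + " exceeds remaining=" + tmp.remaining());
    }
    ByteBuffer dup = tmp.duplicate();    // independent position/limit
    dup.limit(dup.position() + len);     // safe: len <= remaining
    return dup;
  }

  public static void main(String[] args) {
    ByteBuffer tmp = ByteBuffer.allocate(8);
    System.out.println(sliceForRead(tmp, 4).remaining()); // prints 4
  }
}
```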

Why do we drop the below in the patch?

1268  updateReadStatistics(readStatistics, nread, reader);  
1269  dfsClient.updateFileSystemReadStats(  
1270  reader.getNetworkDistance(), nread);

Thanks.






> Backport HDFS-8901 Use ByteBuffer in DFSInputStream#read to branch-2 and 
> branch2.9
> --
>
> Key: HDFS-14585
> URL: https://issues.apache.org/jira/browse/HDFS-14585
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Lisheng Sun
>Assignee: Lisheng Sun
>Priority: Major
> Attachments: HDFS-14585.branch-2.9.v1.patch, 
> HDFS-14585.branch-2.v1.patch
>
>







[jira] [Commented] (HDFS-3246) pRead equivalent for direct read path

2019-03-01 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-3246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781902#comment-16781902
 ] 

stack commented on HDFS-3246:
-

This patch looks beautiful going by the interface. Being able to ask HDFS to fill 
ByteBuffers rather than byte arrays will benefit downstreamers.

> pRead equivalent for direct read path
> -
>
> Key: HDFS-3246
> URL: https://issues.apache.org/jira/browse/HDFS-3246
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client, performance
>Affects Versions: 3.0.0-alpha1
>Reporter: Henry Robinson
>Assignee: Sahil Takiar
>Priority: Major
> Attachments: HDFS-3246.001.patch, HDFS-3246.002.patch, 
> HDFS-3246.003.patch, HDFS-3246.004.patch
>
>
> There is no pread equivalent in ByteBufferReadable. We should consider adding 
> one. It would be relatively easy to implement for the distributed case 
> (certainly compared to HDFS-2834), since DFSInputStream does most of the 
> heavy lifting.
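For readers following along, the idea in the description -- a positioned read that fills a ByteBuffer without moving the stream's own offset -- could look roughly like this. The interface and the toy in-memory implementation below are illustrations of the contract, not what HDFS-3246 actually committed:

```java
import java.io.IOException;
import java.nio.ByteBuffer;

/** Sketch of a ByteBuffer pread contract, by analogy with PositionedReadable. */
public class PreadSketch {
  interface ByteBufferPread {
    /**
     * Read up to buf.remaining() bytes from the given file position into buf,
     * without moving the stream's offset. Returns bytes read, or -1 at EOF.
     */
    int read(long position, ByteBuffer buf) throws IOException;
  }

  /** Toy in-memory implementation just to show the contract. */
  static class ArrayPread implements ByteBufferPread {
    private final byte[] data;
    ArrayPread(byte[] data) { this.data = data; }
    @Override public int read(long position, ByteBuffer buf) {
      if (position >= data.length) return -1;                  // EOF
      int n = Math.min(buf.remaining(), data.length - (int) position);
      buf.put(data, (int) position, n);                        // no seek involved
      return n;
    }
  }

  public static void main(String[] args) throws IOException {
    ByteBufferPread in = new ArrayPread(new byte[]{1, 2, 3, 4, 5});
    ByteBuffer buf = ByteBuffer.allocate(3);
    System.out.println(in.read(2, buf)); // prints 3 (bytes at offsets 2..4 copied)
  }
}
```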






[jira] [Commented] (HDFS-13702) HTrace hooks taking 10-15% CPU in DFS client when disabled

2018-06-27 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525612#comment-16525612
 ] 

stack commented on HDFS-13702:
--

bq. I do want a trace layer in there; I do want it broader than just HDFS, and 
I do want it to be used from the layers above. 

Me too.

bq. Otherwise: it'll get cut, nobody will replace it, and it'll get lost in 
folklore.

Better this than a dead, disabled lib burning everyone's CPU to no end. Even 
when enabled, as is, it is of little to no value. Tracing in hdfs is in need of 
work, but it has been suffering neglect since Colin's original addition.

bq. This is something to talk about at a broader level than a JIRA;

I can start a thread. Suggest this discussion not block this patch? Or, add in 
placeholders/comments for the trace points removed here?

Thanks [~ste...@apache.org]






> HTrace hooks taking 10-15% CPU in DFS client when disabled
> --
>
> Key: HDFS-13702
> URL: https://issues.apache.org/jira/browse/HDFS-13702
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: performance
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Attachments: hdfs-13702.patch, hdfs-13702.patch, hdfs-13702.patch
>
>
> I am seeing DFSClient.newReaderTraceScope take ~15% CPU in a teravalidate 
> workload even when HTrace is disabled. This is because it stringifies several 
> integers. We should avoid all allocation and stringification when htrace is 
> disabled.
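The core idea of the fix described above -- pay for stringification only when tracing is actually on -- can be sketched like this. The names below are stand-ins for the HTrace hooks and the DFSClient method, not the exact patch:

```java
/**
 * Sketch of guarding trace-annotation work behind an enabled check so the
 * disabled hot path allocates nothing. traceEnabled stands in for a sampler
 * check; newReaderTraceScope mimics the method named in the issue.
 */
public class TraceGuardSketch {
  static boolean traceEnabled = false;   // stand-in for sampler.isSampled()
  static int stringifications = 0;       // counts how often we build Strings

  static void newReaderTraceScope(String src, long pos, long len) {
    if (!traceEnabled) {
      return;                            // hot path: no boxing, no Strings
    }
    // Only now pay for stringifying the longs.
    String description = "read src=" + src + " pos=" + pos + " len=" + len;
    stringifications++;
    // ... open the real trace scope with `description` here ...
  }

  public static void main(String[] args) {
    for (int i = 0; i < 1_000; i++) {
      newReaderTraceScope("/f", i, 4096); // disabled: zero string churn
    }
    System.out.println(stringifications); // prints 0
  }
}
```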






[jira] [Commented] (HDFS-13643) Implement basic async rpc client

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524429#comment-16524429
 ] 

stack commented on HDFS-13643:
--

Yeah, we can take a look at [~daryn]'s stuff when it shows up.

On the patch, checkstyles?

No need of this since it's default  44 compile ?

Otherwise, classes could do w/ a bit of class javadoc situating them (though 
they are @Private audience and it's kinda plain what they are about, adding a 
basic client on netty). Fine in a follow-up.

+1 to commit on branch from me.

We should do a writeup on the general approach as an entrance for those who might 
be trying to follow along.

Good stuff.


> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, 
> HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Commented] (HDFS-13702) HTrace hooks taking 10-15% CPU in DFS client when disabled

2018-06-26 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524156#comment-16524156
 ] 

stack commented on HDFS-13702:
--

bq. what do you think?

I think we need to be able to trace end-to-end where time is being spent.

I think that if htrace is not enabled, it should not add friction.

I think Harley-Davidsons are awful motorcycles, but even they don't deserve the 
abuse they are getting.

I think those numbers you posted for the difference your patch makes in 
throughput by stripping htrace are radical.

Poor htrace has been added to the apache attic. It got no loving.

htrace in hdfs got no loving either post initial-commit; it was added and then 
left to fester.

I think we should commit this patch, +1, and  then we can file another to 
review how to move forward with tracing in light of recent developments in 
htrace project; i.e. purge all other htrace references, look into alternatives, 
etc.

> HTrace hooks taking 10-15% CPU in DFS client when disabled
> --
>
> Key: HDFS-13702
> URL: https://issues.apache.org/jira/browse/HDFS-13702
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: performance
>Affects Versions: 3.0.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
>Priority: Major
> Attachments: hdfs-13702.patch, hdfs-13702.patch
>
>
> I am seeing DFSClient.newReaderTraceScope take ~15% CPU in a teravalidate 
> workload even when HTrace is disabled. This is because it stringifies several 
> integers. We should avoid all allocation and stringification when htrace is 
> disabled.






[jira] [Commented] (HDFS-13643) Implement basic async rpc client

2018-06-04 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500387#comment-16500387
 ] 

stack commented on HDFS-13643:
--

[~daryn] Thanks boss. Would like to see it.

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643-v1.patch, HDFS-13643-v2.patch, 
> HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Updated] (HDFS-13643) Implement basic async rpc client

2018-05-31 Thread stack (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-13643:
-
Fix Version/s: HDFS-13572

> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Fix For: HDFS-13572
>
> Attachments: HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Commented] (HDFS-13643) Implement basic async rpc client

2018-05-31 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497356#comment-16497356
 ] 

stack commented on HDFS-13643:
--

Thanks for the input [~daryn]

bq. It's a nice POC but not implementing security is a non-starter

[~daryn] you mean it's a non-starter come merge vote, right? If so, I agree, but 
if you are suggesting a 'basic rpc client' needs security to be committed on a 
feature branch, this seems a bit much.

bq. ...a divergent ipc client...

It's async. It's going to diverge, no? We'd like to make a pure async client 
untethered by the creaky synchronous predecessor, if that's ok.

Thanks.





> Implement basic async rpc client
> 
>
> Key: HDFS-13643
> URL: https://issues.apache.org/jira/browse/HDFS-13643
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ipc
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
> Attachments: HDFS-13643.patch
>
>
> Implement the basic async rpc client so we can start working on the DFSClient 
> implementation ASAP.






[jira] [Commented] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3

2018-05-31 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16497220#comment-16497220
 ] 

stack commented on HDFS-13572:
--

I made a branch named for this issue by cloning trunk ('trunk' is head and 
where 3.2 will be cut from; there is no branch-3 as a base for 3.0 
releases...).

$ git checkout origin/trunk -b HDFS-13572
$ git push -u origin HDFS-13572

Let me now start a vote up on PMC to get [~Apache9] on this new feature branch.

> [umbrella] Non-blocking HDFS Access for H3
> --
>
> Key: HDFS-13572
> URL: https://issues.apache.org/jira/browse/HDFS-13572
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs async
>Affects Versions: 3.0.0
>Reporter: stack
>Priority: Major
> Attachments: Nonblocking HDFS Access.pdf
>
>
> An umbrella JIRA for supporting non-blocking HDFS access in h3.
> This issue has provenance in the stalled HDFS-9924 but would like to vault 
> over what was going on over there, in particular, focus on an async API for 
> hadoop3+ unencumbered by worries about how to make it work in hadoop2.
> Let me post a WIP design. Would love input/feedback (We make mention of the 
> HADOOP-12910 call for spec but as future work -- hopefully thats ok). Was 
> thinking of cutting a feature branch if all good after a bit of chat.






[jira] [Commented] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3

2018-05-30 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495291#comment-16495291
 ] 

stack commented on HDFS-13572:
--

Unless objection, was planning on creating a branch to work on this issue. Will 
file sub-issues here. Was going to get [~Apache9] as committer on the branch.

> [umbrella] Non-blocking HDFS Access for H3
> --
>
> Key: HDFS-13572
> URL: https://issues.apache.org/jira/browse/HDFS-13572
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs async
>Affects Versions: 3.0.0
>Reporter: stack
>Priority: Major
> Attachments: Nonblocking HDFS Access.pdf
>
>
> An umbrella JIRA for supporting non-blocking HDFS access in h3.
> This issue has provenance in the stalled HDFS-9924 but would like to vault 
> over what was going on over there, in particular, focus on an async API for 
> hadoop3+ unencumbered by worries about how to make it work in hadoop2.
> Let me post a WIP design. Would love input/feedback (We make mention of the 
> HADOOP-12910 call for spec but as future work -- hopefully thats ok). Was 
> thinking of cutting a feature branch if all good after a bit of chat.






[jira] [Resolved] (HDFS-13565) [um

2018-05-16 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-13565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HDFS-13565.
--
Resolution: Invalid

Smile [~ebadger]

Yeah, sorry about that lads. Bad wifi. Resolving as invalid.



> [um
> ---
>
> Key: HDFS-13565
> URL: https://issues.apache.org/jira/browse/HDFS-13565
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: stack
>Priority: Major
>







[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2018-05-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16476812#comment-16476812
 ] 

stack commented on HDFS-9924:
-

I moved the design doc over to a new issue, HDFS-13572, for the new effort 
(hadoop3+ basis).

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Duo Zhang
>Priority: Major
> Attachments: Async-HDFS-Performance-Report.pdf, 
> AsyncHdfs20160510.pdf, HDFS-9924-POC.patch
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.
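The call shape the proposal describes -- return a Future immediately, fetch the result later -- might look like the sketch below. `AsyncFs` and `rename` are hypothetical names for illustration, not the API this umbrella ended up with:

```java
import java.util.concurrent.CompletableFuture;

/**
 * Sketch of a nonblocking call: the method returns a Future at once and the
 * caller fetches the result with get()/join() whenever it likes. The AsyncFs
 * interface and its toy implementation are illustrative only.
 */
public class AsyncCallSketch {
  interface AsyncFs {
    CompletableFuture<Boolean> rename(String src, String dst);
  }

  public static void main(String[] args) throws Exception {
    // Toy implementation: completes on a background thread.
    AsyncFs fs = (src, dst) -> CompletableFuture.supplyAsync(() -> true);

    CompletableFuture<Boolean> f = fs.rename("/a", "/b"); // returns immediately
    // The caller can issue many more calls here instead of blocking per call.
    System.out.println(f.get()); // prints true
  }
}
```

This is the pattern that lets a single thread keep many independent calls in flight, which is the inefficiency the description calls out with the blocking API.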






[jira] [Created] (HDFS-13572) [umbrella] Non-blocking HDFS Access for H3

2018-05-15 Thread stack (JIRA)
stack created HDFS-13572:


 Summary: [umbrella] Non-blocking HDFS Access for H3
 Key: HDFS-13572
 URL: https://issues.apache.org/jira/browse/HDFS-13572
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: fs async
Affects Versions: 3.0.0
Reporter: stack


An umbrella JIRA for supporting non-blocking HDFS access in h3.

This issue has provenance in the stalled HDFS-9924 but would like to vault over 
what was going on over there, in particular, focus on an async API for hadoop3+ 
unencumbered by worries about how to make it work in hadoop2.

Let me post a WIP design. Would love input/feedback (We make mention of the 
HADOOP-12910 call for spec but as future work -- hopefully that's ok). Was 
thinking of cutting a feature branch if all good after a bit of chat.








[jira] [Created] (HDFS-13565) [um

2018-05-15 Thread stack (JIRA)
stack created HDFS-13565:


 Summary: [um
 Key: HDFS-13565
 URL: https://issues.apache.org/jira/browse/HDFS-13565
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: stack









[jira] [Commented] (HDFS-7240) Object store in HDFS

2017-11-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16240584#comment-16240584
 ] 

stack commented on HDFS-7240:
-

The posted document needs an author, a date, and a ref to this issue. Can it be 
made a google doc so we can comment inline rather than here?

I skipped to the end, "So why put the Ozone in HDFS and not keep it a separate 
project". There is no argument here on why Ozone needs to be part of Apache 
Hadoop. As per [~shv] above, Ozone as a separate project does not preclude its 
being brought in instead as a dependency, nor does it dictate the shape of 
deploy (Bullet #3 is an aspiration, not an argument).




> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: HDFS Scalability and Ozone.pdf, HDFS-7240.001.patch, 
> HDFS-7240.002.patch, HDFS-7240.003.patch, HDFS-7240.003.patch, 
> HDFS-7240.004.patch, Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12711) deadly hdfs test

2017-11-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238181#comment-16238181
 ] 

stack commented on HDFS-12711:
--

bq. For now, though, I'm sort of tired at looking at this problem and will go 
work on something else for a while.

Thanks for putting Hadoop in a box.

> deadly hdfs test
> 
>
> Key: HDFS-12711
> URL: https://issues.apache.org/jira/browse/HDFS-12711
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.9.0, 2.8.2
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt
>
>







[jira] [Commented] (HDFS-12711) deadly hdfs test

2017-11-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238056#comment-16238056
 ] 

stack commented on HDFS-12711:
--

This is excellent work.

Would a kill -QUIT before you do the actual kill of the errant processes be of use? 
It'd do a dump of the stack traces before the process goes away (the processes might 
not be connected to stdout/stderr anymore?). Thanks.

> deadly hdfs test
> 
>
> Key: HDFS-12711
> URL: https://issues.apache.org/jira/browse/HDFS-12711
> Project: Hadoop HDFS
>  Issue Type: Test
>Affects Versions: 2.9.0, 2.8.2
>Reporter: Allen Wittenauer
>Priority: Critical
> Attachments: HDFS-12711.branch-2.00.patch, fakepatch.branch-2.txt
>
>







[jira] [Commented] (HDFS-12711) deadly hdfs test

2017-10-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16219178#comment-16219178
 ] 

stack commented on HDFS-12711:
--

Rah rah [~aw]! Thanks for digging in.

> deadly hdfs test
> 
>
> Key: HDFS-12711
> URL: https://issues.apache.org/jira/browse/HDFS-12711
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Allen Wittenauer
> Attachments: HDFS-12711.branch-2.00.patch
>
>







[jira] [Commented] (HDFS-11644) DFSStripedOutputStream should not implement Syncable

2017-04-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968113#comment-15968113
 ] 

stack commented on HDFS-11644:
--

bq. "The current behavior of throwing an exception is safer."

... but changes precedent?

Semantics in general are messy here around sync, et al. It is a reflection of 
the tortuous journey taken by sync/flush/hflush/hsync in HDFS.

The blessed [~ste...@apache.org] tried writing a spec for DFS and got far on 
the read-side. Helps. Write-side is to do. I like Steve's comment that rather 
than "probe for interface, cast, query, maintain.." at each point at which 
we encounter a feature, there'd be an upfront query that could be run 
before engaging w/ the fs implementation (though how does this work if tiering 
changes the underlying storage on us at runtime?).

Meantime, having DFSStripedOutputStream throw an exception, breaking all that 
runs on top (with no means of querying whether it is supported or not), seems disruptive.
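The "probe for interface, cast, query" pattern under discussion can be sketched as below. All the types here (Syncable, the stream classes, hsyncIfPossible) are illustrative stand-ins, not the real org.apache.hadoop.fs API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Stand-in for Hadoop's Syncable; illustrative only.
interface Syncable {
  void hsync() throws IOException;
}

class PlainStream extends ByteArrayOutputStream { }            // no hsync support

class DurableStream extends ByteArrayOutputStream implements Syncable {
  public void hsync() { /* a real stream would force data to disk here */ }
}

public class SyncProbe {
  // Every caller that cares about durability repeats this dance:
  // probe for the interface, cast, call -- or silently fall back.
  public static boolean hsyncIfPossible(OutputStream out) {
    if (out instanceof Syncable) {
      try {
        ((Syncable) out).hsync();
        return true;
      } catch (IOException e) {
        return false;
      }
    }
    // Not Syncable: best effort only, no durability guarantee.
    try { out.flush(); } catch (IOException ignored) { }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(hsyncIfPossible(new DurableStream())); // true
    System.out.println(hsyncIfPossible(new PlainStream()));   // false
  }
}
```

An upfront capability query would replace the instanceof probe with a single answerable question asked once, before engaging with the filesystem.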

> DFSStripedOutputStream should not implement Syncable
> 
>
> Key: HDFS-11644
> URL: https://issues.apache.org/jira/browse/HDFS-11644
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-alpha1
>Reporter: Andrew Wang
>Assignee: Manoj Govindassamy
>  Labels: hdfs-ec-3.0-must-do
>
> FSDataOutputStream#hsync checks if a stream implements Syncable, and if so, 
> calls hsync. Otherwise, it just calls flush. This is used, for instance, by 
> YARN's FileSystemTimelineWriter.
> DFSStripedOutputStream extends DFSOutputStream, which implements Syncable. 
> However, DFSStripedOS throws a runtime exception when the Syncable methods 
> are called.
> We should refactor the inheritance structure so DFSStripedOS does not 
> implement Syncable.






[jira] [Commented] (HDFS-11170) Add create API in filesystem public class to support assign parameter through builder

2017-03-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927487#comment-15927487
 ] 

stack commented on HDFS-11170:
--

I took a look at 03

 * CreateBuilder.java needs license.
 * I was going to suggest that CreateBuilder is too generic a name but it seems 
like we are misusing builder and that is why the confusion. For example:

{code}
DistributedFileSystemCreateBuilder builder =
    fs.newCreateBuilder(testFilePath).build();
FSDataOutputStream out = fs.create(builder);
{code}

i.e. I build a 'create' builder to pass to a create function that then 'builds' 
the wanted object.

I'd think that when I called build on the builder, that I'd get back a 
FSDataOutputStream -- not a 'Builder' (this is what [~xiaobingo] says above now 
I've read those comments).
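For contrast, a minimal sketch of the shape being argued for, where build() itself yields the finished stream rather than another builder. Every name here (CreateBuilder, FakeOutputStream, bufferSize) is invented for illustration and is not the real Hadoop signature:

```java
// Stand-in for the object the builder produces; illustrative only.
class FakeOutputStream {
  final String path;
  final int bufferSize;
  FakeOutputStream(String path, int bufferSize) {
    this.path = path;
    this.bufferSize = bufferSize;
  }
}

class CreateBuilder {
  private final String path;
  private int bufferSize = 4096;   // a default the caller may override

  CreateBuilder(String path) { this.path = path; }

  CreateBuilder bufferSize(int n) { this.bufferSize = n; return this; }

  // The point of the comment above: build() returns the wanted object,
  // so there is no separate fs.create(builder) call.
  FakeOutputStream build() { return new FakeOutputStream(path, bufferSize); }
}

public class BuilderSketch {
  public static void main(String[] args) {
    FakeOutputStream out = new CreateBuilder("/tmp/f").bufferSize(8192).build();
    System.out.println(out.path + " " + out.bufferSize);
  }
}
```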





> Add create API in filesystem public class to support assign parameter through 
> builder
> -
>
> Key: HDFS-11170
> URL: https://issues.apache.org/jira/browse/HDFS-11170
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: SammiChen
>Assignee: Wei Zhou
>  Labels: hdfs-ec-3.0-nice-to-have
> Attachments: HDFS-11170-00.patch, HDFS-11170-01.patch, 
> HDFS-11170-02.patch, HDFS-11170-03.patch
>
>
> FileSystem class supports multiple create functions to help users create files. 
> Some create functions have many parameters; it's hard for users to remember 
> these parameters and their order exactly. This task is to add builder-based 
> create functions to help users create files more easily.






[jira] [Commented] (HDFS-6450) Support non-positional hedged reads in HDFS

2017-03-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902444#comment-15902444
 ] 

stack commented on HDFS-6450:
-

[~elgoiri] Ignore my comment above. I thought this was the resolved positional 
hedged read issue. My bad.

> Support non-positional hedged reads in HDFS
> ---
>
> Key: HDFS-6450
> URL: https://issues.apache.org/jira/browse/HDFS-6450
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.4.0
>Reporter: Colin P. McCabe
>Assignee: Liang Xie
> Attachments: HDFS-6450-like-pread.txt
>
>
> HDFS-5776 added support for hedged positional reads.  We should also support 
> hedged non-position reads (aka regular reads).






[jira] [Commented] (HDFS-6450) Support non-positional hedged reads in HDFS

2017-03-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15901691#comment-15901691
 ] 

stack commented on HDFS-6450:
-

Open a new issue I'd say [~elgoiri]. Link it back here. 

> Support non-positional hedged reads in HDFS
> ---
>
> Key: HDFS-6450
> URL: https://issues.apache.org/jira/browse/HDFS-6450
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.4.0
>Reporter: Colin P. McCabe
>Assignee: Liang Xie
> Attachments: HDFS-6450-like-pread.txt
>
>
> HDFS-5776 added support for hedged positional reads.  We should also support 
> hedged non-position reads (aka regular reads).






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2017-02-03 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15851725#comment-15851725
 ] 

stack commented on HDFS-9924:
-

[~Apache9] Good luck (I didn't fully grok #3. It would be coolio if you could 
interpolate async access by implementing the pb Service async interface using 
its callback and controller).

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf, Async-HDFS-Performance-Report.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.
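The proposed call shape can be sketched as follows. AsyncFs, rename(), and join() are stand-ins, not the async HDFS API under review; the method returns a Future immediately and the caller blocks only when collecting results:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncSketch {
  static class AsyncFs {
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    // Returns at once; the (simulated) RPC runs on a pool thread.
    // Pretend the rename succeeds unless src and dst are the same path.
    Future<Boolean> rename(String src, String dst) {
      return pool.submit(() -> !src.equals(dst));
    }

    // Small helper so callers can join without handling checked exceptions.
    boolean join(Future<Boolean> f) {
      try {
        return f.get();
      } catch (Exception e) {
        throw new RuntimeException(e);
      }
    }

    void close() { pool.shutdown(); }
  }

  public static void main(String[] args) {
    AsyncFs fs = new AsyncFs();
    Future<Boolean> f1 = fs.rename("/a", "/b");   // caller is not blocked here
    Future<Boolean> f2 = fs.rename("/c", "/c");
    System.out.println(fs.join(f1) + " " + fs.join(f2));  // block only on join
    fs.close();
  }
}
```

Note the Future-only shape forces the caller to poll or block on get(); it offers no completion callback, which is the limitation debated later in this thread.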






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2017-01-25 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839393#comment-15839393
 ] 

stack commented on HDFS-9924:
-

bq. Add a port unification service in front of the grpc server and the old rpc 
server to support both grpc client and old client.

When you say port unification service, what are you thinking? It'd be 
in-process, listening on the DN port and reading a few bytes to figure out which RPC?

Reading https://www.cockroachlabs.com/blog/a-tale-of-two-ports/ would advocate 
listening on a new port altogether; an option 5 which is probably too much 
to ask. We should probably perf test grpc (going by the citation).

Thanks [~Apache9]
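The peek-a-few-bytes dispatch could be sketched as below. Hadoop's RPC connection header does begin with the ASCII bytes "hrpc", and an HTTP/2 (gRPC) connection begins with the client preface "PRI * HTTP/2.0"; the class and method names are otherwise illustrative:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PortUnifier {
  enum Protocol { HADOOP_RPC, GRPC, UNKNOWN }

  // Classify a connection from its first bytes; a unifying listener would
  // peek these off the socket and hand the stream to the matching server.
  public static Protocol classify(byte[] firstBytes) {
    if (startsWith(firstBytes, "hrpc")) return Protocol.HADOOP_RPC;  // Hadoop RPC magic
    if (startsWith(firstBytes, "PRI ")) return Protocol.GRPC;        // HTTP/2 client preface
    return Protocol.UNKNOWN;
  }

  private static boolean startsWith(byte[] buf, String prefix) {
    byte[] p = prefix.getBytes(StandardCharsets.US_ASCII);
    return buf.length >= p.length
        && Arrays.equals(Arrays.copyOf(buf, p.length), p);
  }

  public static void main(String[] args) {
    byte[] header = "hrpc....".getBytes(StandardCharsets.US_ASCII);
    System.out.println(classify(header));   // HADOOP_RPC
  }
}
```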




> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf, Async-HDFS-Performance-Report.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Created] (HDFS-11368) LocalFS does not allow setting storage policy so spew running in local mode

2017-01-25 Thread stack (JIRA)
stack created HDFS-11368:


 Summary: LocalFS does not allow setting storage policy so spew 
running in local mode
 Key: HDFS-11368
 URL: https://issues.apache.org/jira/browse/HDFS-11368
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Minor


commit f92a14ade635e4b081f3938620979b5864ac261f
Author: Yu Li 
Date:   Mon Jan 9 09:52:58 2017 +0800

HBASE-14061 Support CF-level Storage Policy

...added setting storage policy, which is nice. Being able to set a storage policy 
came in with hdfs 2.6.0 (HDFS-6584 Support Archival Storage), but you can only do 
this for DFS, not for local FS.

Upshot is that starting up hbase in standalone mode, which uses localfs, you 
get this exception every time:

{code}
2017-01-25 12:26:53,400 WARN  [StoreOpener-93375c645ef2e649620b5d8ed9375985-1] 
fs.HFileSystem: Failed to set storage policy of 
[file:/var/folders/d8/8lyxycpd129d4fj7lb684dwhgp/T/hbase-stack/hbase/data/hbase/namespace/93375c645ef2e649620b5d8ed9375985/info]
 to [HOT]
java.lang.UnsupportedOperationException: Cannot find specified method 
setStoragePolicy
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:209)
at 
org.apache.hadoop.hbase.fs.HFileSystem.setStoragePolicy(HFileSystem.java:161)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:207)
at 
org.apache.hadoop.hbase.regionserver.HRegionFileSystem.setStoragePolicy(HRegionFileSystem.java:198)
at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:237)
at 
org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:5265)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:988)
at org.apache.hadoop.hbase.regionserver.HRegion$1.call(HRegion.java:985)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoSuchMethodException: 
org.apache.hadoop.fs.LocalFileSystem.setStoragePolicy(org.apache.hadoop.fs.Path,
 java.lang.String)
at java.lang.Class.getMethod(Class.java:1786)
at 
org.apache.hadoop.hbase.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:205)
...
{code}

It is distracting at the least. Let me fix.
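The fix wanted here is roughly a guarded reflection probe that degrades quietly when the filesystem lacks the method, instead of logging a full stack trace on every store open. This is a sketch under invented names (trySetStoragePolicy, FakeDfs); the real HBase code passes a Path rather than a String:

```java
import java.lang.reflect.Method;

public class StoragePolicyGuard {
  // Look the optional method up via reflection; return false (rather than
  // throwing) when the filesystem, e.g. LocalFileSystem, does not have it.
  public static boolean trySetStoragePolicy(Object fs, String path, String policy) {
    try {
      Method m = fs.getClass().getMethod("setStoragePolicy", String.class, String.class);
      m.invoke(fs, path, policy);
      return true;
    } catch (NoSuchMethodException e) {
      return false;               // unsupported: caller can log once, at DEBUG
    } catch (ReflectiveOperationException e) {
      return false;               // invocation failed for some other reason
    }
  }

  // A stand-in filesystem that does support the method.
  public static class FakeDfs {
    public String lastPolicy;
    public void setStoragePolicy(String path, String policy) { lastPolicy = policy; }
  }

  public static void main(String[] args) {
    System.out.println(trySetStoragePolicy(new FakeDfs(), "/d", "HOT"));  // true
    System.out.println(trySetStoragePolicy(new Object(), "/d", "HOT"));   // false
  }
}
```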






[jira] [Commented] (HDFS-11303) Hedged read might hang infinitely if read data from all DN failed

2017-01-24 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836371#comment-15836371
 ] 

stack commented on HDFS-11303:
--

Hey [~alicezhangchen] You got an email when Andrew made state changes to this 
issue. He set its state to 'patch available', which triggered a run of the CI 
system. It looks like one test failed. Do you think it is related? (Let me trigger 
a rerun... some tests are flaky and fail on occasion regardless of what the 
attached patch does.) Andrew also flagged this JIRA as affecting 3.0.0-alpha1, 
which probably makes it fit for commit as a fix for the next hadoop3 release.

Let me trigger another run and see how the patch does. I'll leave this issue 
open another few days in the hope that someone else will chime in with a 
review. Will commit whether-or-which in a day or two. Thanks for the patch.

> Hedged read might hang infinitely if read data from all DN failed 
> --
>
> Key: HDFS-11303
> URL: https://issues.apache.org/jira/browse/HDFS-11303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha1
>Reporter: Chen Zhang
>Assignee: Chen Zhang
> Attachments: HDFS-11303-001.patch
>
>
> Hedged read reads from one DN first; on timeout, it then reads from other DNs 
> simultaneously.
> If reads from all DNs fail, this bug leaves the future list non-empty (the 
> first timed-out request is left in the list), and the loop hangs infinitely.
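The failure mode can be reduced to a toy loop. This is a sketch of the control flow only, not the DFSInputStream code; drainCompleted and the types here are invented. A future that merely timed out must stay in the list to be polled again, while a future whose read *failed* must be removed; forgetting the latter is what left the loop spinning forever once every DN read had failed:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HedgedReadSketch {
  // Poll every outstanding hedged read once; return how many finished.
  public static int drainCompleted(List<Future<byte[]>> futures) {
    int done = 0;
    for (Iterator<Future<byte[]>> it = futures.iterator(); it.hasNext(); ) {
      Future<byte[]> f = it.next();
      try {
        f.get(10, TimeUnit.MILLISECONDS);
        done++;
        it.remove();                  // finished successfully
      } catch (TimeoutException e) {
        // still running: leave it in the list and poll it again later
      } catch (InterruptedException | ExecutionException e) {
        it.remove();                  // a failed read must leave the list too
      }
    }
    return done;
  }

  public static void main(String[] args) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    List<Future<byte[]>> futures = new ArrayList<>();
    futures.add(pool.submit(() -> new byte[16]));   // a read that succeeds
    Thread.sleep(50);                                // let it complete
    System.out.println(drainCompleted(futures) + " left=" + futures.size());
    pool.shutdown();
  }
}
```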






[jira] [Updated] (HDFS-11303) Hedged read might hang infinitely if read data from all DN failed

2017-01-24 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HDFS-11303:
-
Attachment: HDFS-11303-001.patch

Retry

> Hedged read might hang infinitely if read data from all DN failed 
> --
>
> Key: HDFS-11303
> URL: https://issues.apache.org/jira/browse/HDFS-11303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha1
>Reporter: Chen Zhang
>Assignee: Chen Zhang
> Attachments: HDFS-11303-001.patch, HDFS-11303-001.patch
>
>
> Hedged read reads from one DN first; on timeout, it then reads from other DNs 
> simultaneously.
> If reads from all DNs fail, this bug leaves the future list non-empty (the 
> first timed-out request is left in the list), and the loop hangs infinitely.






[jira] [Commented] (HDFS-11303) Hedged read might hang infinitely if read data from all DN failed

2017-01-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-11303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812572#comment-15812572
 ] 

stack commented on HDFS-11303:
--

Patch LGTM. Your patch allows that the primary read might still complete before 
the new hedged reads, whereas what was there previously would discard anything 
that came in after timeout. Good. The test is just to verify we time out? W/o 
your fix, the test hangs?

> Hedged read might hang infinitely if read data from all DN failed 
> --
>
> Key: HDFS-11303
> URL: https://issues.apache.org/jira/browse/HDFS-11303
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha1
>Reporter: Chen Zhang
> Attachments: HDFS-11303-001.patch
>
>
> Hedged read reads from one DN first; on timeout, it then reads from other DNs 
> simultaneously.
> If reads from all DNs fail, this bug leaves the future list non-empty (the 
> first timed-out request is left in the list), and the loop hangs infinitely.






[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-28 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15531228#comment-15531228
 ] 

stack commented on HDFS-10690:
--

Skimmed. Patch LGTM. Unfortunate we leave behind some perf, but I agree on 
avoiding a custom data structure unless there is a large benefit. Nice work.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the 
> cached replicas.
> private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
> private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new 
> TreeMap<>();
> TreeMap employs a Red-Black tree for sorting. This isn't an issue when using 
> a traditional HDD. But when using high-performance SSD/PCIe Flash, the cost of 
> inserting/removing an entry becomes considerable.
> To mitigate it, we designed a new list-based structure for replica tracking.
> The list is a double-linked FIFO. FIFO ordering is time-based, thus insertion is a 
> very low-cost operation. On the other hand, a list is not lookup-friendly. To 
> address this issue, we introduce two references into the ShortCircuitReplica 
> object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, lookup is not needed when removing a replica from the list. We 
> only need to modify its predecessor's and successor's references in the lists.
> Our tests showed up to 15-50% performance improvement when using PCIe flash 
> as storage media.
> The original patch is against 2.6.4, now I am porting to Hadoop trunk, and 
> patch will be posted soon.
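The intrusive double-linked FIFO described above can be sketched as follows. Each replica carries its own prev/next references, so removal needs no lookup and no tree rebalancing; the class and field names mirror the description, not the actual ShortCircuitCache code:

```java
public class ReplicaFifo {
  static class Replica {
    final long id;
    Replica prev, next;   // intrusive links owned by the list
    Replica(long id) { this.id = id; }
  }

  private Replica head, tail;   // head = oldest entry (FIFO eviction order)
  private int size;

  // O(1): a new entry is always the newest, so it goes on the tail.
  void append(Replica r) {
    r.prev = tail;
    r.next = null;
    if (tail != null) tail.next = r; else head = r;
    tail = r;
    size++;
  }

  // O(1): relink the neighbors; no search needed to find the entry.
  void remove(Replica r) {
    if (r.prev != null) r.prev.next = r.next; else head = r.next;
    if (r.next != null) r.next.prev = r.prev; else tail = r.prev;
    r.prev = r.next = null;
    size--;
  }

  Replica oldest() { return head; }
  int size() { return size; }

  public static void main(String[] args) {
    ReplicaFifo fifo = new ReplicaFifo();
    Replica a = new Replica(1), b = new Replica(2), c = new Replica(3);
    fifo.append(a); fifo.append(b); fifo.append(c);
    fifo.remove(b);   // just relinks a <-> c
    System.out.println(fifo.oldest().id + " size=" + fifo.size());
  }
}
```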






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2016-06-17 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336228#comment-15336228
 ] 

stack commented on HDFS-9924:
-

[~steve_l] Thanks for the context. Need to make sure this usecase and its 
variants are talked up loudly over in HADOOP-12910

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: Async-HDFS-Performance-Report.pdf, AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2016-06-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15335420#comment-15335420
 ] 

stack commented on HDFS-9924:
-

[~xiaobingo] Thanks for posting the compare. It helps. It looks like the 
difference between 'async' and thread pool is negligible; 10% at the extreme. 
Is that how you interpret the results? As per [~andrew.wang], I would be 
interested in what happens with fewer threads (especially as the NN is set up with 
300 handlers...); the tendency seems to be the fewer threads you use, the better it 
does. Thanks for the report.

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: Async-HDFS-Performance-Report.pdf, AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Nonblocking HDFS Access

2016-06-16 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333922#comment-15333922
 ] 

stack commented on HDFS-9924:
-

+1 for a branch. Otherwise we have this mess going on late in a mature branch 
where folks are piecemealing APIs and renaming stuff on the fly because there 
is no consensus.

> [umbrella] Nonblocking HDFS Access
> --
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-15 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332670#comment-15332670
 ] 

stack commented on HDFS-9924:
-

bq. Quoting Tsz Wo Nicholas Sze words, I understand your concern but it is a 
different problem. We should not protect NN by making the client slow. We 
should add protection in NN instead

The above quote is magical-thinking (see the response to the above quote given 
by Daryn, an operator of one of our largest deploys). We are talking branch-2 
here for this Future hack. The NN is not going to sprout scale of a sudden in 
the branch-2 line to support 'thousands' of concurrent ops coming in from an 
adjacent, Hive metadata server blame-shifting. Some form of parsimony, concern 
for NN loading, is in order.

Rereading this issue from the top down (including the design doc -- it needs 
numbers... what is a large number of calls?; why wouldn't a thread pool work 
given you need to throttle) and seeing where we have arrived, this issue is not 
about 'Asynchronous HDFS Access' as the summary and original description 
advertises but instead is an expedient hack-for-hive, for late in branch-2 
only. The 'change' will have a short shelf-life it seems given it arrives in 
2.9.0+ (?) and branch-3 is looking to be a different API (See discussion on 
HADOOP-12910).  The two distinct positions I discern in the discussion so far 
-- those who want a true async API on HDFS and those working on a hive fix -- 
are having trouble finding a common ground. If this characterization is 
correct, I'd suggest we just call this issue a hack-for-hive explicitly and 
annotate it as such. A good few of the participants in this issue are likely 
not much interested in the latter (e.g. myself) as long as this work does not 
get in the way of our having a 'real' async API (HADOOP-12910) or confuse 
downstreamers on what the async story on HDFS is.








> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330867#comment-15330867
 ] 

stack commented on HDFS-9924:
-

I see. Thank you. I see what you want now.

You just need renames, or do you need more than rename? You want to do thousands of 
concurrent renames this way? Is that even going to work? Are you going to knock 
over the NN? Or won't you just have a bunch of outstanding calls blocked on 
remote NN locks? Won't you want to constrict how many ongoing calls there are?

> [umbrella] Asynchronous HDFS Access
> ---
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.






[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330173#comment-15330173
 ] 

stack commented on HDFS-9924:
-

bq. There are multiple comments from both sides indicating that 
CompletableFuture is the ideal option for 3.x.

[~arpiagariu] Please leave off concluding a discussion that is still ongoing 
(CF is not 'ideal' and is not a given). It doesn't help sir.

bq. You mean just like we recently added 'avoid local nodes' because another 
downstream component wanted to try it? 

You misrepresent, again. HBase ran for years with a workaround while waiting on 
the behavior to show up in HDFS; i.e. the hbase project did not have an 
'interest' in 'avoid local nodes'; they required this behavior of the 
filesystem and ran with a suboptimal hack until it showed up.

In this case all we have is 'interest' and requests for technical justification 
go unanswered.

bq. The Hive engineers think they can make it work for them and there was a 
compromise proposed to introduce the API as unstable.

I'm interested in how Hive will do async w/ only a Future and in how this 
suboptimal API in particular will solve their issue (is it described 
anywhere?). In my experience, a bunch of rigging (threads) for polling, rather 
than notification, is required when all you have is a Future to work with.







[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328974#comment-15328974
 ] 

stack commented on HDFS-9924:
-

Your summary and characterization of where the discussion is at is not correct 
[~arpit99]. The discussion is ongoing still (CompletableFuture is a significant 
undertaking; ListenableFuture copied locally, or something like it, is a possible 
candidate; etc.)

bq. Since some downstream developers have expressed an interest in trying out a 
2.x Future-based API even if it's tagged as Unstable/Experimental, is there a 
compelling reason to deny it?

I'd hope that it takes more than 'interest' to get code committed to HDFS.

bq. If Future turns to be of no use to anyone we can evolve the API in a later 
2.x release or just revert it completely while the way forward (3.x) remains 
unaffected.

If a technical argument on why Future will fix a codebase's scaling problem 
can't be produced, we can just skip the above evolutions and reverts altogether.





[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328290#comment-15328290
 ] 

stack commented on HDFS-9924:
-

[~ashutoshc] Can you make a bit of a better argument than citing the mighty 
[~steve_l], please? Dealing with a mess of returned Futures will also complicate 
the Hive codebase, no? Can you explain why a half-async HDFS API would be 
easier for you to deal with? Thanks.




[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-09 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15322864#comment-15322864
 ] 

stack commented on HDFS-9924:
-

bq. This JIRA and the current implementation originally aim to support a basic 
async access to HDFS without callback support and without chaining support.

The JIRA is about HDFS async access. Nothing in the summary or description 
rules out the basic async callback primitive. You could rule it 
out via fiat -- can you even call it an 'async' API if it doesn't do callbacks? 
-- but why not do it right from the get-go, and do it only once.

bq. When we change to return XxxFuture in the future, it is a backward 
compatible change.

You and [~jnp] have said this a few times, but for downstreamers, a Future-only 
API is not worth engaging with. It means each of us has to build parking 
structures to keep the unfinished Futures in, polling to look for 
completions to react to. This is a performance-killer. Been there. Done that.
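
The contrast drawn here can be shown with JDK types. With a bare Future the caller must hold it somewhere and poll {{isDone()}}; a CompletableFuture lets completion drive the follow-up work instead. The method name below is invented for illustration, not an actual HDFS API:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicLong;

public class CallbackSketch {
    // Invented stand-in for an async HDFS call; were this a bare Future, the
    // caller would have to park it and poll isDone() in a loop to react.
    static CompletableFuture<Long> fileLengthAsync(String path) {
        return CompletableFuture.supplyAsync(() -> (long) path.length());
    }

    public static void main(String[] args) {
        AtomicLong total = new AtomicLong();
        // Notification style: the callback runs when the result is ready;
        // no parking structure or polling thread is required.
        CompletableFuture<Void> done =
            fileLengthAsync("/wal/log.1").thenAccept(total::addAndGet);
        done.join();
        System.out.println(total.get());
    }
}
```

The callback form is what makes chaining many independent calls in one thread practical.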

I like the [~mingma] summary/suggestion with the [~andrew.wang] caveat; revert 
and dev in a feature branch against trunk. I know of a few downstreamers that 
are interested, myself included, and would be up for helping out. Thanks.




[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15317717#comment-15317717
 ] 

stack commented on HDFS-9924:
-

Ugh. I meant to say, we risk having different 'async' APIs/implementations if we 
piecemeal the implementation ahead of figuring out a general approach for 
async'ing the Filesystem. What is committed currently is inadequate according 
to the discussion so far, since it is missing callback support. Retrofitting 
callback on Future, as I understand it, will require a different implementation; 
therefore the commits are premature. Reverting in the meantime seems like the 
right thing to do.

bq. I'm much more worried about API correctness ... Waiting a while to actually 
let more folks play with it before pushing it into a release (including the 3.x 
release that we're working to cut from trunk) just seems like an obvious, 
common sense thing to do.

Above makes sense to me.




[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

2016-06-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15316850#comment-15316850
 ] 

stack commented on HDFS-9924:
-

I'd suggest we work out a coherent, global filesystem async API/strategy before 
we start committing implementations (piecemeal); otherwise we will frustrate our 
users.




[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302344#comment-15302344
 ] 

stack commented on HDFS-7240:
-

bq. It is unfair to say that you are being rebuffed.

Can we please move to discussion of the design. Back and forth on what is 
'fair', 'tone', and how folks got commit bits is corrosive and derails what is 
important here; i.e. landing this big one.

> Object store in HDFS
> 
>
> Key: HDFS-7240
> URL: https://issues.apache.org/jira/browse/HDFS-7240
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Jitendra Nath Pandey
>Assignee: Jitendra Nath Pandey
> Attachments: Ozone-architecture-v1.pdf, Ozonedesignupdate.pdf, 
> ozone_user_v0.pdf
>
>
> This jira proposes to add object store capabilities into HDFS. 
> As part of the federation work (HDFS-1052) we separated block storage as a 
> generic storage layer. Using the Block Pool abstraction, new kinds of 
> namespaces can be built on top of the storage layer i.e. datanodes.
> In this jira I will explore building an object store using the datanode 
> storage, but independent of namespace metadata.
> I will soon update with a detailed design document.






[jira] [Commented] (HDFS-7240) Object store in HDFS

2016-05-26 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15302191#comment-15302191
 ] 

stack commented on HDFS-7240:
-

bq. Now, can people stop being territorial or making any form of criticism of 
each other. It is fundamentally against the ASF philosophy of collaborative, 
community development, doesn't help long term collaboration and makes the 
entire project look bad. Thanks.

Amen.

Thanks for posting design [~anu]

bq. Datanodes provide a shared generic storage service called the container 
layer .

Is this the HDFS Datanode? We'd add block manager functionality to the Datanode? 
(Did we answer [~zhz]'s question, "Why an object store as part of HDFS?")

Thanks





[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-29 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15264590#comment-15264590
 ] 

stack commented on HDFS-3702:
-

Any chance of getting this on 2.8 branch? Thanks.

> Add an option for NOT writing the blocks locally if there is a datanode on 
> the same box as the client
> -
>
> Key: HDFS-3702
> URL: https://issues.apache.org/jira/browse/HDFS-3702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.5.1
>Reporter: Nicolas Liochon
>Assignee: Lei (Eddy) Xu
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 3.0.0, 2.9.0
>
> Attachments: HDFS-3702.000.patch, HDFS-3702.001.patch, 
> HDFS-3702.002.patch, HDFS-3702.003.patch, HDFS-3702.004.patch, 
> HDFS-3702.005.patch, HDFS-3702.006.patch, HDFS-3702.007.patch, 
> HDFS-3702.008.patch, HDFS-3702.009.patch, HDFS-3702.010.patch, 
> HDFS-3702.011.patch, HDFS-3702.012.patch, HDFS-3702_Design.pdf
>
>
> This is useful for Write-Ahead-Logs: these files are written for recovery 
> only, and are not read when there are no failures.
> Taking HBase as an example, these files will be read only if the process that 
> wrote them (the 'HBase regionserver') dies. This will likely come from a 
> hardware failure, hence the corresponding datanode will be dead as well. So 
> we're writing 3 replicas, but in reality only 2 of them are really useful.






[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15239291#comment-15239291
 ] 

stack commented on HDFS-3702:
-

bq. Suppose we find that the CreateFlag.NO_LOCAL_WRITE is bad. How do we remove 
it, i.e. what is the procedure to remove it? I believe we cannot simply remove 
it since it probably will break HBASE compilation.

Just remove it. HBase has loads of practice dealing with stuff being 
moved/removed and changed under it by HDFS.

You could also just leave the flag in place since there is no obligation that 
any filesystem respect the flag. It is a suggestion only (See 
http://linux.die.net/man/2/open / create for the long, interesting set of flags 
it has) 

 bq. Another possible case: suppose that we find the disfavorNodes feature is 
very useful later on. How do we add it?

Same way you'd add any feature... and HBase would look for it the way it does 
now: peeking for the presence of extra facilities with if/else on hdfs versions, 
reflection, try/catches of NoSuchMethodException, etc. We have lots of practice 
doing this also. We'd keep using the NO_LOCAL_WRITE flag though, unless it is 
purged, since it does what we want. As I understand it, disfavoredNodes would 
require a lot more work of hbase to get the same functionality as 
NO_LOCAL_WRITE provides.
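
The "peeking" described here amounts to probing for an optional method via reflection and falling back when it is absent. A minimal sketch of the pattern, probing a JDK class rather than the real HDFS client classes:

```java
import java.lang.reflect.Method;

public class FeatureProbe {
    // Returns true if the class exposes a public method with the given name.
    static boolean hasMethod(Class<?> clazz, String name) {
        for (Method m : clazz.getMethods()) {
            if (m.getName().equals(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A downstream project would probe the filesystem class the same way,
        // taking the new code path only when the method is present and
        // keeping the old workaround otherwise.
        boolean newApi = hasMethod(java.util.List.class, "copyOf");
        System.out.println(newApi ? "use new API" : "fall back to the old hack");
    }
}
```

This is the kind of defensive rigging a downstream project carries while waiting for a feature to land upstream.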

bq. It seems that the "whatever proofing" is to let the community try the 
features for a period of time. Then, we may add it to the FileSystem API.

Sorry. 'whatever proofing' is overly expansive. We are just adding a flag. I 
just meant, if the tests added here are not sufficient or you want some other 
proof it works, pre-commit, just say. No problem.

Also, the community has been running with this 'feature' for years (See 
HBASE-6435) so no need of our taking the suggested disruptive 'indirection' 
just to add a filesystem 'hint' with attendant mess in HDFS -- extra params on 
create -- that cannot subsequently be removed.

Thanks [~szetszwo]

What do you think of our adding the attributes LimitedPrivate and Evolving to 
the flag. Would that be indicator enough for you?



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-12 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237558#comment-15237558
 ] 

stack commented on HDFS-3702:
-

bq. I suggest HBase should do the same way of how it today is using 
favoredNodes.

Thanks [~szetszwo] for the response, but you did not answer the question. The 
question was whether you think the process of first staging a 'hidden' API is a 
fair burden to put on your favorite downstream project (not to mention the mess 
it makes inside HDFS -- see the note above this one for graphic detail).

Lets back up. I think it will help us make some progress here again.

You say:

bq. So let add this flag later so that it allows us to test the feature and see 
if it is good enough or we may actually need disfavoredNodes. Sound good?

No. It does not sound good. There is no need to stage a feature as hidden 
first, one that is reasonable (see the discussion above, with the opinions of 
many) and has an immediate need/user. If there is any concern that the feature 
is lacking or does not work as advertised, let's do whatever proofing of the 
feature is needed here as part of this issue and just get it done. If the 
bundled tests are unsatisfactory, or if you'd like me to try and report the 
result of running this facility at scale, just say... no problem. If the 
implementation has a bug, let's fix it in a follow-up, as we would any other 
feature in HDFS.

On your concern that a new 'hint' to the create method exposes new API, an API 
that by definition does not oblige any FS implementation to implement the 
suggested operation -- i.e. the amount of new API 'surface' is minuscule -- it 
has been suggested above that we flag it 
@InterfaceAudience.LimitedPrivate(HBase) for a probationary period. How about 
we also add @InterfaceStability.Evolving on the flag so it can be yanked at any 
time if, for some unforeseen reason, it is a total mistake. Would this assuage 
your exposure concern [~szetszwo]? Thanks for your time.



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-08 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233363#comment-15233363
 ] 

stack commented on HDFS-3702:
-

bq. I suggest adding a new create(..) method to DistributedFileSystem, either 
with a new boolean or with the AddBlockFlag, in this JIRA so that the community 
can try out the feature. We may add the CreateFlag.NO_LOCAL_WRITE once the 
feature has been stabilized and we has decided that it is the right API.

Tell me more about how this process would work, please, [~szetszwo]? IIUC, a 
downstream project, say HBase which already has an awful hack in place to try 
and simulate a poor-man's version of this feature, would via reflection, look 
first for the presence of this new create override IFF the implementation is 
HDFS (don't look if LocalFS or S3, etc.)? If HDFS and if present, we'd drop our 
hack and use the new method (via reflection). Later, after it is 'proven' that 
a feature, one that hbase has wanted for years now, has 'merit', we would then 
add a new path w/ more reflection (IFF the FS implementation is HDFS) that 
would use the NO_LOCAL_WRITE when it becomes available? (Would we remove the 
create override when the NO_LOCAL_WRITE FS hint gets added?) Are you suggesting 
that downstream projects do this?

Regarding favoredNodes, that's an unfinished topic and of a different character 
from what is being suggested here, as it added overrides rather than a 'hint' 
flag as this patch does.

Thanks [~szetszwo]



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-04-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231686#comment-15231686
 ] 

stack commented on HDFS-3702:
-

bq. So let add this flag later so that it allows us to test the feature and see 
if it is good enough or we may actually need disfavoredNodes. Sound good?

[~szetszwo] Isn't CreateFlag.NO_LOCAL_WRITE how this facility gets exposed to 
clients? If it is not present, how does the feature get exercised at all? 
Thanks.



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209177#comment-15209177
 ] 

stack commented on HDFS-3702:
-

I skimmed the #9 patch. Seems good to me other than the issues [~szetszwo] 
raises (we are using AddBlockFlag rather than the client flag... and I think 
AddBlockFlag should be in hdfs as he suggests, given your remark above on the 
difference between the client-facing flag and the hdfs flag). Thanks.



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-23 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209164#comment-15209164
 ] 

stack commented on HDFS-3702:
-

bq. stack, let's add a boolean noLoaclWrite to DistributedFileSystem or just 
reuse the new AddBlockFlag there. 

On adding a flag to DFS, taking a look, it would be 'odd' given what is there 
currently, and adding a public method to set a hint for a particular operation 
only would be tough to explain to the reader of the API ("Why flag here... when 
create takes flags already..."). Then there is the fact that the user has to do 
{code}if HDFS, then{code} and if we are on GPFS, an FS supported by one of our 
committers, then it is {code} if HDFS || GPFS{code} and so on.

I think you also mean 'and' in the above rather than 'or'. AddBlockFlag is 
internal to HDFS and marked Private, so it is not usable by clients... maybe you 
are talking about how it will be implemented. I'm not sure what you are 
suggesting here. Pardon me.

bq. You know, once it is in FileSystem, it is forever.

I know that for the client to ask for a behavior that is not there presently, 
yes, FileSystem has to change. We are talking about a self-described advisory, 
not a required new operation of the underlying FS.
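
To make the 'advisory hint' shape concrete, here is a toy sketch in plain Java -- invented names ({{HintDemo}}, {{CreateHint}}), not the Hadoop API -- showing why a hint-style flag is safe across filesystems: an FS that understands the hint honors it, while one that does not simply ignores it.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.stream.Collectors;

public class HintDemo {

    // Toy stand-in for an advisory create flag: a request, never a requirement.
    enum CreateHint { NO_LOCAL_WRITE }

    // A filesystem that understands the hint honors it by dropping the
    // writer's own host from the candidate replica targets...
    static List<String> hdfsTargets(List<String> nodes, String writerHost,
                                    EnumSet<CreateHint> hints) {
        if (hints.contains(CreateHint.NO_LOCAL_WRITE)) {
            return nodes.stream()
                        .filter(n -> !n.equals(writerHost))
                        .collect(Collectors.toList());
        }
        return nodes;
    }

    // ...while a filesystem with no concept of locality silently ignores it.
    static List<String> localFsTargets(List<String> nodes, String writerHost,
                                       EnumSet<CreateHint> hints) {
        return nodes; // hint dropped; no error, no harm done
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn1", "dn2", "dn3");
        EnumSet<CreateHint> hints = EnumSet.of(CreateHint.NO_LOCAL_WRITE);
        System.out.println(hdfsTargets(nodes, "dn1", hints));    // [dn2, dn3]
        System.out.println(localFsTargets(nodes, "dn1", hints)); // [dn1, dn2, dn3]
    }
}
```

Because the caller never branches on the filesystem type, the {code}if HDFS || GPFS{code} problem above disappears: every FS receives the same flags and reacts as it sees fit.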



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207327#comment-15207327
 ] 

stack commented on HDFS-3702:
-

And more (getting emotional): a downstreamer has been hampered, spending 
unnecessary i/o and cpu for years now, and the patch is being blocked because 
we'd add an enum to the public API! Help us out mighty [~szetszwo]!  Thanks.



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207307#comment-15207307
 ] 

stack commented on HDFS-3702:
-

bq. I am very uncomfortable to add CreateFlag.NO_LOCAL_WRITE and AddBlockFlag 
since we cannot remove them once they are added to the public FileSystem API.

The AddBlockFlag would have @InterfaceAudience.Private so it is not being added 
to the public API.

The CreateFlag.NO_LOCAL_WRITE is an advisory enum. Something has to be 
available in the API for users like HBase to pull on. This seems to be the most 
minimal intrusion possible. Being a hint by nature, it could be undone later.

Thanks for your consideration [~szetszwo]



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207267#comment-15207267
 ] 

stack commented on HDFS-3702:
-

[~szetszwo]

[~arpitagarwal] is -0 if

bq. AddBlockFlag should be tagged as @InterfaceAudience.Private if we proceed 
with the .008 patch.

... and then what if  CreateFlag.NO_LOCAL_WRITE was marked LimitedPrivate with 
HBase denoted as the consumer? Would that be sufficient accommodation of your 
concern?



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-22 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207034#comment-15207034
 ] 

stack commented on HDFS-3702:
-

bq. If the region server has write permissions on /hbase/.logs, which I assume 
it does, it should be able to set policies on that directory.

Makes sense [~arpitagarwal], thanks. We can mess with this stuff when/if an 
accommodating block policy shows up. Meantime, are you still -0 on this patch 
going in?

[~szetszwo] Are you still against commit, sir? @nkeywal reminds me of the price 
we are currently paying by not being able to ask HDFS to avoid local replicas. 
Seems easy enough to revisit given the way this is implemented, should 
favoredNodes stabilize, and then a subsequent disfavoredNodes facility. Thanks.



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205361#comment-15205361
 ] 

stack commented on HDFS-3702:
-

bq. ...could you comment on the usability of providing node lists to this API? 

Usually nodes and the NN can agree on what they call machines, but we've all 
seen plenty of clusters where this is not so. Both HDFS and HBase have their 
own means of insulating themselves against dodgily-named setups. These systems 
are not in alignment.

bq. My impression was that tracking this in HBase was onerous, and is part of 
why favored nodes fell out of favor.

No. It was never fully plumbed in HBase (it was plumbed into a balancer that no 
one used and that would not swap into place because the default was 
featureful). Regarding the FB experience, we need to get them to do us a 
post-mortem.



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205343#comment-15205343
 ] 

stack commented on HDFS-3702:
-

bq. Hi stack, the attribute could be set by an installer script or an API call 
at process startup

[~arpitagarwal] Thanks. Yeah, vendors could ensure installers set the 
attribute. There are a significant set of installs where HBase shows up 
post-HDFS install and/or where HBase does not have sufficient permissions to 
set attributes on HDFS. I don't know the percentage. It would just be easier 
all around if it could be managed internally by HBase, with no need to get 
scripts and/or operators involved. 

bq. ...so if you think HBase needs a solution now, ...

Smile. The issue was opened in July 2012, so we're not holding our breath 
(smile). Would be cool if we could ask HDFS to not write local. Anyone doing 
WAL-on-HDFS will appreciate this in HDFS.

Thanks [~arpitagarwal]



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205337#comment-15205337
 ] 

stack commented on HDFS-3702:
-

bq. How would DFSClient know which nodes are disfavored nodes? How could it 
enforce disfavored nodes?

You postulated an application that wanted to 'distribute its files uniformly in 
a cluster.' I was just trying to suggest that users would prefer that HDFS just 
do it for them. HDFS would know how to do it better, being the arbiter of what 
is happening in the cluster; an application will do a poor job by comparison. 
'Distribute its files uniformly...' sounds like a good feature to implement 
with a block placement policy.

bq. Since we already have favoredNodes, adding disfavoredNodes seems more 
natural than adding a flag.

As noted above at 'stack added a comment - 12/Mar/16 15:20', favoredNodes is an 
unexercised feature that has actually been disavowed by the originators of the 
idea, FB, because it proved broken in practice. I'd suggest we not build more 
atop a feature-under-review, as adding disfavoredNodes would (or at least until 
we hear of successful use of favoredNodes -- apparently our Y! colleagues are 
trying it).

bq. In addition, the new FileSystem CreateFlag does not look clean to me since 
it is too specific to HDFS. How would other FileSystems such as LocalFileSystem 
implement it?

The flag added by the attached patch is qualified throughout as a 'hint'. When 
set against LFS, it'll just be ignored. No harm done. The 'hint' didn't take.

If we went your suggested route and added disfavoredNodes, things get a 
bit interesting when hbase, say, passes localhost. What'll happen? Does the 
user now have to check the FS implementation type before they select the 
DFSClient method to call?

I don't think you are objecting to the passing of flags on create, given this 
seems pretty standard fare in FSs.
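
The contrast argued above can be sketched with a toy model -- invented names, not actual HDFS code; the real decision would live in the namenode's block placement policy -- showing how a flag keeps 'which node is local' on the namenode's side instead of asking the client to name disfavored nodes:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PlacementSketch {

    // Toy namenode-side target selection. With a flag, the client never names
    // any node: the NN, which already knows the writer's address, decides
    // what 'local' means and skips that datanode itself.
    static List<String> chooseTargets(Set<String> liveNodes, String writerAddr,
                                      boolean noLocalWrite, int replication) {
        return liveNodes.stream()
                .sorted() // deterministic order for the sketch only
                .filter(n -> !noLocalWrite || !n.equals(writerAddr))
                .limit(replication)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4");
        // With the hint set, the writer's own datanode never gets a replica.
        System.out.println(chooseTargets(live, "10.0.0.1", true, 3));
        // Without it, placement is free to favor the local node.
        System.out.println(chooseTargets(live, "10.0.0.1", false, 3));
    }
}
```

A disfavoredNodes list would instead push the naming problem onto the client -- including deciding what string like 'localhost' means on each filesystem -- which is the interoperability wrinkle raised above.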



[jira] [Commented] (HDFS-3702) Add an option for NOT writing the blocks locally if there is a datanode on the same box as the client

2016-03-21 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15205142#comment-15205142
 ] 

stack commented on HDFS-3702:
-

For uniform distribution of files over a cluster, I think users would prefer 
that DFSClient managed it for them (a new flag on CreateFlag?) rather than 
doing the calculation of how to populate favoredNodes and disfavoredNodes using 
imperfect knowledge of the cluster -- something the NN will always do better at.

Unless you have other possible uses, disfavoredNodes seems like a more 
intrusive and roundabout route -- with its overrides, possible builders, and 
global interpretation of the 'localhost' string -- compared to the clean flag 
this patch carries?

What you think [~szetszwo]? Thanks Nicolas.


