[jira] [Commented] (HDFS-9081) False-positive ACK slow log in DFSClient

2016-11-08 Thread static-max (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15647839#comment-15647839
 ] 

static-max commented on HDFS-9081:
--

I also get this warning in my client logs, but not on any namenode or datanode, 
so I think this is a rather annoying bug. It appears with one client 
application (Apache Flink) that has very low throughput (1 MB/minute); all my 
other Hadoop applications work without any problem.

Is any work planned to fix this?

> False-positive ACK slow log in DFSClient
> 
>
> Key: HDFS-9081
> URL: https://issues.apache.org/jira/browse/HDFS-9081
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: He Tianyi
> Assignee: He Tianyi
> Priority: Minor
>
> This issue relates to the code below:
> {noformat}
> if (duration > dfsclientSlowLogThresholdMs
>     && ack.getSeqno() != Packet.HEART_BEAT_SEQNO) {
>   DFSClient.LOG
>       .warn("Slow ReadProcessor read fields took " + duration
>           + "ms (threshold=" + dfsclientSlowLogThresholdMs + "ms); ack: "
>           + ack + ", targets: " + Arrays.asList(targets));
> } else if (DFSClient.LOG.isDebugEnabled()) {
>   DFSClient.LOG.debug("DFSClient " + ack);
> }
> {noformat}
> DFSClient logs a slow-ack warning when the wait for an ack exceeds the 
> configured threshold (usually 3 ms). This is normally a good indicator of a 
> network or I/O performance issue.
> However, there is a scenario in which this warning is a false positive. For 
> example, a reducer first (StageA) iterates over records with an identical key, 
> which takes an arbitrary amount of time but generates no output. Then (StageB) 
> it outputs an arbitrary number of records when it meets a different key.
> If a single StageA lasts longer than 3 ms (as in the example above), one or 
> more slow-log warnings are generated that are unrelated to any HDFS 
> performance issue.
> In general, users should not see this warning in such cases, as it could 
> mislead them.
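
The false positive described above can be illustrated with a small timing model. This is a minimal sketch, not Hadoop code: the class name, method names, and the numbers in `main` (including the 100 ms threshold) are assumptions chosen for illustration. It only shows that the time a reader spends blocked on the next ack includes the time the producer spent idle before sending the packet.

```java
/** Hypothetical timing model of the quoted check; this is not Hadoop code. */
final class AckReadTimingSketch {

    /**
     * Time the reader stays blocked waiting for the next ack.
     * The ack cannot arrive before the packet has been sent, so any idle
     * time on the producer side is counted as part of the blocked read.
     */
    static long blockedReadMs(long readStartMs, long packetSentMs, long ackLatencyMs) {
        long ackArrivalMs = packetSentMs + ackLatencyMs;
        return Math.max(0L, ackArrivalMs - readStartMs);
    }

    /** Same comparison as the quoted condition: warn when the duration exceeds the threshold. */
    static boolean wouldWarn(long blockedMs, long thresholdMs) {
        return blockedMs > thresholdMs;
    }

    public static void main(String[] args) {
        long thresholdMs = 100;  // assumed threshold, for illustration only
        long readStartMs = 0;    // reader starts waiting right after the previous ack
        long stageAMs = 500;     // StageA: producer computes but writes nothing
        long ackLatencyMs = 2;   // the pipeline itself answers quickly

        long blocked = blockedReadMs(readStartMs, readStartMs + stageAMs, ackLatencyMs);
        System.out.println("blocked=" + blocked + "ms, ackLatency=" + ackLatencyMs
            + "ms, warns=" + wouldWarn(blocked, thresholdMs));
    }
}
```

In this model the warning fires even though the ack itself took only a couple of milliseconds, because the measured duration starts when the reader begins waiting rather than when the packet is sent.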



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9081) False-positive ACK slow log in DFSClient

2015-10-02 Thread He Tianyi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942008#comment-14942008
 ] 

He Tianyi commented on HDFS-9081:
-

I think the code below in {{waitForAckedSeqno}} is already a good indicator of 
slow acks:
{noformat}
  long duration = Time.monotonicNow() - begin;
  if (duration > dfsclientSlowLogThresholdMs) {
    LOG.warn("Slow waitForAckedSeqno took " + duration
        + "ms (threshold=" + dfsclientSlowLogThresholdMs + "ms)");
  }
{noformat}

By contrast, the slow log in {{ResponseProcessor}} can indicate either a slow 
data producer or a slow ack. I'd suggest making that log message more 
informative.
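
One way a more informative message could look is sketched below. This is only an illustration under stated assumptions, not the change made in Hadoop: the class, its methods, and the message wording are hypothetical. The idea is to remember when each packet was sent, so that a long blocked read can be split into producer idle time and actual ack latency.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a slow log that separates producer idle time from ack latency. */
final class InformativeSlowLogSketch {
    private final long thresholdMs;
    // Send time per packet sequence number, recorded by the (assumed) sender side.
    private final Map<Long, Long> sentAtMs = new HashMap<>();

    InformativeSlowLogSketch(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /** Record the time at which a packet left the client. */
    void packetSent(long seqno, long nowMs) {
        sentAtMs.put(seqno, nowMs);
    }

    /**
     * Called when the ack for seqno has been read.
     * Returns the warning text, or null when the read was not slow.
     */
    String ackReceived(long seqno, long readStartMs, long nowMs) {
        Long sent = sentAtMs.remove(seqno);
        long blockedMs = nowMs - readStartMs;
        if (blockedMs <= thresholdMs) {
            return null;
        }
        if (sent == null) {
            return "Slow read of ack " + seqno + " took " + blockedMs
                + "ms (threshold=" + thresholdMs + "ms); send time unknown";
        }
        long ackLatencyMs = nowMs - sent;
        long producerIdleMs = Math.max(0L, sent - readStartMs);
        String cause = ackLatencyMs > thresholdMs ? "slow ack" : "idle producer";
        return "Slow read of ack " + seqno + " took " + blockedMs
            + "ms (threshold=" + thresholdMs + "ms); cause: " + cause
            + ", ackLatency=" + ackLatencyMs + "ms, producerIdle=" + producerIdleMs + "ms";
    }
}
```

With a message like this, a reader of the client log could tell a long StageA apart from a genuinely slow pipeline without consulting datanode logs.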
