[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2017-09-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182800#comment-16182800
 ] 

Hadoop QA commented on HDFS-8088:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} | {color:red} HDFS-8088 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-8088 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12724103/HDFS-8088.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/21383/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> Reduce the number of HTrace spans generated by HDFS reads
> -
>
> Key: HDFS-8088
> URL: https://issues.apache.org/jira/browse/HDFS-8088
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Colin P. McCabe
>Assignee: Colin P. McCabe
> Attachments: HDFS-8088.001.patch
>
>
> HDFS generates too many trace spans on read right now.  Every call to read() 
> we make generates its own span, which is not very practical for things like 
> HBase or Accumulo that do many such reads as part of a single operation.  
> Instead of tracing every call to read(), we should only trace the cases where 
> we refill the buffer inside a BlockReader.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2017-09-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182239#comment-16182239
 ] 

Steve Loughran commented on HDFS-8088:
--

Anyone fancy bringing this up to sync with trunk so we can look at it with a goal 
of getting it into 3.1?




[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2017-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805967#comment-15805967
 ] 

Hadoop QA commented on HDFS-8088:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  5s{color} | {color:red} HDFS-8088 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-8088 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12724103/HDFS-8088.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/18073/console |
| Powered by | Apache Yetus 0.5.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.






[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-16 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498655#comment-14498655
 ] 

Billie Rinaldi commented on HDFS-8088:
--

To add to Josh's data, I ran some tests with and without the HDFS-8026 patch 
(both started from a clean Accumulo instance).  The patch definitely reduced 
the 0ms spans; I saw about a 4x improvement.  There are still a lot of 0ms spans, 
though.
{noformat}
Before HDFS-8026:
tserver:DFSOutputStream#writeChunk={type='HDFS', nonzeroCount=5224, zeroCount=2564098, numTraces=163, log10SpanLength=[2564098, 5114, 85, 24, 1, 0, 0]}
After HDFS-8026:
tserver:DFSOutputStream#write={type='HDFS', nonzeroCount=15263, zeroCount=2383993, numTraces=667, log10SpanLength=[2383993, 15037, 172, 52, 2, 0, 0]}
{noformat}
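
For readers unfamiliar with the summary format, here is a minimal Java sketch of how such per-method counters could be accumulated. It is not the tool used to produce the numbers above. The class name {{SpanLengthSummary}} is hypothetical, and the bucket layout is an assumption inferred from the data: slot 0 holds the 0ms spans, and slot i (i >= 1) holds durations with i decimal digits.

```java
import java.util.Arrays;

/** Hypothetical summary of span durations; not the tool used in this thread. */
final class SpanLengthSummary {
    static final int BUCKETS = 7;

    long zeroCount;
    long nonzeroCount;
    // Assumed layout: [0] = 0 ms, [1] = 1-9 ms, [2] = 10-99 ms, and so on.
    final long[] log10SpanLength = new long[BUCKETS];

    /** Records one span duration, given in milliseconds. */
    void add(long durationMs) {
        if (durationMs <= 0) {
            zeroCount++;
            log10SpanLength[0]++;
            return;
        }
        nonzeroCount++;
        // The number of decimal digits selects the bucket; very long spans
        // are clamped into the last bucket.
        int bucket = Long.toString(durationMs).length();
        log10SpanLength[Math.min(bucket, BUCKETS - 1)]++;
    }

    @Override
    public String toString() {
        return "nonzeroCount=" + nonzeroCount
            + ", zeroCount=" + zeroCount
            + ", log10SpanLength=" + Arrays.toString(log10SpanLength);
    }
}
```

Under that reading, the nonzero buckets in each line above sum to nonzeroCount (for example 5114 + 85 + 24 + 1 = 5224). Dividing zeroCount by numTraces gives roughly 15,700 zero-length spans per trace before the patch and roughly 3,600 after, which appears consistent with the "about 4x" figure.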



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-14 Thread Billie Rinaldi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494799#comment-14494799
 ] 

Billie Rinaldi commented on HDFS-8088:
--

I would also lean towards goal #1, figuring out what parts of an operation took 
the longest.  I have it on my to-do list to analyze the spans Accumulo is 
creating and cut down on the ones that don't seem to be adding much 
information.  I think targeting 2.7.1 for changes in HDFS tracing will be fine 
for Accumulo.  I've made our sampling configurable and made our span receiver 
able to configurably drop 0ms spans to handle any excess of spans from HDFS 
tracing.  (I know dropping some of the spans in a trace is not a recommended 
procedure, but it ends up being okay since nearly all 0ms spans have no 
children.)
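
As an illustration of the idea only, a receiver-side filter along these lines might look like the sketch below. It does not use the HTrace {{SpanReceiver}} API or Accumulo's actual code. {{ZeroLengthSpanFilter}} and {{SpanRecord}} are hypothetical names, and the sketch is slightly stricter than what is described above: it assumes we want to keep any 0ms span that still has children, so the remaining spans never point at a missing parent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical filter; the span type is a stand-in, not the HTrace Span. */
final class ZeroLengthSpanFilter {
    static final class SpanRecord {
        final long id;
        final long parentId; // 0 means "no parent" in this sketch
        final long startMs;
        final long stopMs;

        SpanRecord(long id, long parentId, long startMs, long stopMs) {
            this.id = id;
            this.parentId = parentId;
            this.startMs = startMs;
            this.stopMs = stopMs;
        }

        long durationMs() {
            return stopMs - startMs;
        }
    }

    /**
     * Returns the spans to keep. When dropZeroLength is true, 0 ms spans
     * without children are dropped; everything else is kept.
     */
    static List<SpanRecord> filter(List<SpanRecord> trace, boolean dropZeroLength) {
        if (!dropZeroLength) {
            return new ArrayList<>(trace);
        }
        // Collect every id that some span names as its parent.
        Set<Long> parents = new HashSet<>();
        for (SpanRecord s : trace) {
            parents.add(s.parentId);
        }
        List<SpanRecord> kept = new ArrayList<>();
        for (SpanRecord s : trace) {
            boolean zeroLeaf = s.durationMs() == 0 && !parents.contains(s.id);
            if (!zeroLeaf) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

Making the behaviour a boolean keeps it configurable, in the spirit of the configurable drop described above.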



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-14 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494783#comment-14494783
 ] 

Josh Elser commented on HDFS-8088:
--

bq. Keep in mind that doing a write in HDFS just hands the data off to a 
background thread called DataStreamer, which writes it out asynchronously

Ahh, good point. I didn't fully connect the dots in my head when I was 
initially guessing.

bq. I'm inclined to lean more towards goal #1 (figure out why specific requests 
had high latency) than goal #2

Agreed.

bq.  I do think maybe we should target 2.7.1 for some of these changes since I 
need to think through everything

I would be very happy to see this (as well as HDFS-8026) land into 2.7.1.

bq.  I'd also like to run some patches by you guys to see if it improves the 
usefulness of HTrace to you.

Happy to do so. I'm sure [~billie.rinaldi], as well as some other Accumulo 
folks, would be interested.

bq. also I am at a conference now, so I apologize if my replies are slow!

No worries! Assuming you're at ApacheCon (and presumably speaking?), I hope it 
goes well. Enjoy, and we can catch up when you're on a normal schedule again.



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494760#comment-14494760
 ] 

Colin Patrick McCabe commented on HDFS-8088:


bq. I re-ran my test on Hadoop-2.7.1-SNAP with your patch applied, Colin, and 
things are much happier. The performance is much closer to what I previously 
saw with 2.6.0 (without any quantitative measurements). +1 (non-binding, ofc)

Thanks, Josh.  I discovered that we are reading non-trivial amounts of remote 
data inside the {{DFSInputStream#blockSeekTo}} method, so I think we'll also 
need to create a trace span for that one.  Also, the {{BlockReader}} trace 
scopes will need to use the {{DFSClient#traceSampler}} (currently they don't) 
or else we will never get any trace spans from reads.  I think that is what we 
would need to get the patch on this JIRA committed.

bq. Giving a very quick look at the code (and making what's possibly a bad 
guess), perhaps all of the 0ms length spans (denoted by zeroCount in the above, 
as opposed to the nonzeroCount) are when DFSOutputStream#writeChunk is only 
appending data into the current packet and not actually submitting that packet 
for the data streamer to process? With some more investigation into the 
hierarchy, I bet I could definitively determine that.

Keep in mind that doing a write in HDFS just hands the data off to a background 
thread called {{DataStreamer}}, which writes it out asynchronously.  The only 
reason why {{writeChunk}} would ever have a time much higher than 0 is that 
there was lock contention (the {{DataStreamer#waitAndQueuePacket}} method 
couldn't get the {{DataStreamer#dataQueue}} lock immediately) or that there 
were more than {{dfs.client.write.max-packets-in-flight}} unacked messages in 
flight already.  (HDFS calls "messages" by the name of "packets" even though 
each message is typically multiple ethernet packets.)
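
To make the handoff concrete, here is a toy Java model of a writer queueing packets for a background streamer. It is a sketch, not the HDFS implementation: the class {{PacketHandoffSketch}} is hypothetical, the method and field names only echo the ones mentioned above, and the exact threshold comparison is an assumption. The point is that the writer returns immediately unless the in-flight limit has been reached.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a writer handing packets to a background streamer. */
final class PacketHandoffSketch {
    private final Deque<byte[]> dataQueue = new ArrayDeque<>();
    private final int maxPacketsInFlight;
    private int unacked; // packets queued or sent but not yet acknowledged

    PacketHandoffSketch(int maxPacketsInFlight) {
        if (maxPacketsInFlight <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.maxPacketsInFlight = maxPacketsInFlight;
    }

    /** Writer side. Returns true only if the caller had to wait for an ack. */
    boolean waitAndQueuePacket(byte[] packet) {
        synchronized (dataQueue) {
            boolean waited = false;
            while (unacked >= maxPacketsInFlight) {
                waited = true;
                try {
                    dataQueue.wait(); // released again by ack()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            dataQueue.addLast(packet);
            unacked++;
            dataQueue.notifyAll();
            return waited;
        }
    }

    /** Streamer side: the next packet to send, or null if none is queued. */
    byte[] pollPacket() {
        synchronized (dataQueue) {
            return dataQueue.pollFirst();
        }
    }

    /** Streamer side: an acknowledgement arrived, so one slot frees up. */
    void ack() {
        synchronized (dataQueue) {
            if (unacked > 0) {
                unacked--;
            }
            dataQueue.notifyAll();
        }
    }

    int unackedCount() {
        synchronized (dataQueue) {
            return unacked;
        }
    }
}
```

In this model a span around {{waitAndQueuePacket}} would be close to 0ms whenever the method returns false, which is the common case described above.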

I guess we have to step back and ask what the end goal is for HTrace.  If the 
end goal is figuring out why some requests had a high latency, it makes sense 
to only trace parts of the program that we think will take a non-trivial amount 
of time.  In that case, we should probably only trace the handoff of the full 
packet to the {{DataStreamer}}.  If the end goal is understanding the 
downstream consequences of all operations, then we have to connect up the dots 
for all operations.  That's why I originally had all calls to write() and 
read() create trace spans.

I'm inclined to lean more towards goal #1 (figure out why specific requests had 
high latency) than goal #2.  I think that looking at the high-latency outliers 
will naturally lead us to fix the biggest performance issues (such as locking 
contention, disk issues, network issues, etc.).  Also, if all calls to write() 
and read() create trace spans, then this will have a "multiplicative" effect on 
our top-level sampling rate which I think is undesirable.
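
One way to read the "multiplicative" remark is as simple arithmetic: the number of spans emitted is the number of sampled requests times the number of spans each sampled request creates. The Java sketch below spells that out. The class name {{SpanVolumeEstimate}}, the "one in N" sampling model, and the numbers in the usage note are illustrative assumptions, not measurements from this JIRA.

```java
/** Back-of-the-envelope estimate of span volume; all inputs are assumptions. */
final class SpanVolumeEstimate {
    /**
     * @param requests               top-level requests issued
     * @param sampleOneIn            sampling rate expressed as "1 in N"
     * @param spansPerSampledRequest spans created while serving one sampled request
     * @return the expected number of spans emitted
     */
    static long expectedSpans(long requests, long sampleOneIn, long spansPerSampledRequest) {
        if (sampleOneIn <= 0) {
            throw new IllegalArgumentException("sampleOneIn must be positive");
        }
        long sampledRequests = requests / sampleOneIn;
        return sampledRequests * spansPerSampledRequest;
    }
}
```

For example, with 1,000,000 requests sampled at 1 in 1,000, a request that creates one span yields 1,000 spans, while a request that creates 15,000 read() or write() spans yields 15,000,000, even though the top-level sampling rate is unchanged.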

bq. That being said, I hope I'm not being too much of a bother with all this. I 
was just really excited to see this functionality in HDFS and want to make sure 
we're getting good data coming back out. Thanks for bearing with me and for the 
patches you've already made!

We definitely appreciate all the input.  I think it's very helpful.  I do think 
maybe we should target 2.7.1 for some of these changes since I need to think 
through everything.  I know that's frustrating, but hopefully if we maintain a 
reasonable Hadoop release cadence it won't be too bad.  I'd also like to run 
some patches by you guys to see if it improves the usefulness of HTrace to you. 
 And I am doing a bunch of testing internally which I think will turn up a lot 
more potential improvements to HTrace and to its integration into HDFS.  
Use-cases really should be very helpful in motivating us here.



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-14 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494761#comment-14494761
 ] 

Colin Patrick McCabe commented on HDFS-8088:


also I am at a conference now, so I apologize if my replies are slow!



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-13 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493363#comment-14493363
 ] 

Josh Elser commented on HDFS-8088:
--

Ran a write-heavy test with the patch here as well as the HDFS-8026 patch on 
top of 2.7.1-SNAPSHOT, and found one last span "hotspot" (wonderfully 
formatted, courtesy of [~billie.rinaldi])

{noformat}
# Total spans from HDFS
{type='HDFS', nonzeroCount=6941, zeroCount=77221, numTraces=338, log10SpanLength=[77221, 5336, 1594, 11, 0, 0, 0]}, total 84162

# Offender
DFSOutputStream#write={type='HDFS', nonzeroCount=4252, zeroCount=75000, numTraces=24, log10SpanLength=[75000, 3598, 654, 0, 0, 0, 0]}
{noformat}

Giving a very quick look at the code (and making what's possibly a bad guess), 
perhaps all of the 0ms length spans (denoted by zeroCount in the above, as 
opposed to the nonzeroCount) are when {{DFSOutputStream#writeChunk}} is only 
appending data into the current packet and not actually submitting that packet 
for the data streamer to process? With some more investigation into the 
hierarchy, I bet I could definitively determine that.
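
The guess above can be pictured with a small counting model. This is a sketch of the hypothesis, not of the real {{DFSOutputStream}}: {{PacketAccumulatorSketch}} is a hypothetical class, and the number of chunks per packet is whatever the caller passes in.

```java
/** Counts how many writeChunk-style calls only append versus hand off a packet. */
final class PacketAccumulatorSketch {
    private final int chunksPerPacket;
    private int chunksInCurrentPacket;

    int appendOnlyCalls; // calls that only added a chunk to the current packet
    int handoffCalls;    // calls that filled the packet and handed it off

    PacketAccumulatorSketch(int chunksPerPacket) {
        if (chunksPerPacket <= 0) {
            throw new IllegalArgumentException("chunksPerPacket must be positive");
        }
        this.chunksPerPacket = chunksPerPacket;
    }

    /** Models one chunk being written by the client. */
    void writeChunk() {
        chunksInCurrentPacket++;
        if (chunksInCurrentPacket == chunksPerPacket) {
            // The packet is full: this is the only call that would hand
            // work to the streamer and could plausibly take measurable time.
            chunksInCurrentPacket = 0;
            handoffCalls++;
        } else {
            appendOnlyCalls++;
        }
    }
}
```

If a packet held, say, 126 chunks (an illustrative figure, not one taken from this thread), then 125 of every 126 calls would only append, which would be one plausible source of a large zeroCount.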

That being said, I hope I'm not being too much of a bother with all this. I was 
just really excited to see this functionality in HDFS and want to make sure we're 
getting good data coming back out. Thanks for bearing with me and for the 
patches you've already made!



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-13 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492997#comment-14492997
 ] 

Josh Elser commented on HDFS-8088:
--

I re-ran my test on Hadoop-2.7.1-SNAP with your patch applied, Colin, and 
things are much happier. The performance is much closer to what I previously 
saw with 2.6.0 (without any quantitative measurements). +1 (non-binding, ofc)

Billie pointed me at HDFS-8026 as well. I'll pull that one down and test with 
it, as DFSOutputStream#writeChunk appeared to be the next most painfully traced 
method.



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-13 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492586#comment-14492586
 ] 

Josh Elser commented on HDFS-8088:
--

Thanks for putting up a patch, Colin. I'll try to play with this today and see 
how it works out.

{quote}
bq. does it really degrade the performance of DFSInputStream a lot?

It doesn't degrade the performance at all if tracing is turned off.
{quote}

But it does quite terribly if tracing is enabled :)



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-09 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487757#comment-14487757
 ] 

Colin Patrick McCabe commented on HDFS-8088:


Thanks for looking at this.

bq. Colin, thanks for working on this, I have not gone through HDFS-8069, does 
it really degrade the performance of DFSInputStream a lot? If so, I think this 
is a blocker issue and let's get it in ASAP.

It doesn't degrade the performance at all if tracing is turned off.

bq. \[The change to hedgedReadId\] is not necessary.

True, but I think it's more intuitive to start the count at 1 than 0.  Just for 
some background, {{hedgedReadId}} is something I introduced, and which is only 
used for tracing.



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-09 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487376#comment-14487376
 ] 

Yi Liu commented on HDFS-8088:
--

The patch needs to be rebased.
I agree that we don't need to have a trace span for each read, which would affect 
performance.  I have checked that we have not added a trace span for {{pread}}. 
The patch looks good to me.
BTW,
{code}
-int hedgedReadId = 0;
+int hedgedReadId = 1;
{code}
This change is not necessary.



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-08 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486605#comment-14486605
 ] 

Yi Liu commented on HDFS-8088:
--

Colin, it seems today is the deadline for the 2.7 release? I have not gone through 
HDFS-8069; does it really degrade the performance of DFSInputStream a lot?  If 
so, I think this is a blocker issue and we should get this in today.
Let me look at the patch now.



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486600#comment-14486600
 ] 

Colin Patrick McCabe commented on HDFS-8088:


I have to look at this a bit more; I didn't get the tracing I wanted on my 
sample cluster.  The shorter names are very nice, though.



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486590#comment-14486590
 ] 

Hadoop QA commented on HDFS-8088:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12724103/HDFS-8088.001.patch
  against trunk revision dc0282d.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HDFS-Build/10224//console

This message is automatically generated.



[jira] [Commented] (HDFS-8088) Reduce the number of HTrace spans generated by HDFS reads

2015-04-08 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486519#comment-14486519
 ] 

Colin Patrick McCabe commented on HDFS-8088:


* Shorten the method names we're tracing.  {{ClientProtocol#create}} instead of 
{{org.apache.hadoop.hdfs.protocol.ClientProtocol.create}}, etc.

* Don't create trace spans for {{DFSInputStream#read(final byte buf[], int off, 
int len)}} and {{int read(final ByteBuffer buf)}}.  Note that we still create 
trace spans inside the block readers, when refilling the block reader buffers.

* RemoteBlockReader2.java: include the block ID as a key/value annotation, not 
in the name of the trace span itself.

* TestTracing.java: explain which trace spans we couldn't find, if we can't 
find some trace spans we are looking for.
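
The first three points above can be illustrated with a small self-contained Java sketch. It is not the patch and does not use the HTrace API: {{RefillTracingSketch}}, {{SpanRecord}} and {{BufferedBlockReader}} are hypothetical stand-ins, and the "remote" data is just an in-memory array. The sketch shows a short span name, a span created only when the buffer is refilled, and the block ID carried as a key/value annotation rather than in the span name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RefillTracingSketch {
    /** Hypothetical span record: a short name plus key/value annotations. */
    static final class SpanRecord {
        final String name;
        final Map<String, String> annotations = new LinkedHashMap<>();

        SpanRecord(String name) {
            this.name = name;
        }
    }

    /** Turns "pkg.Class.method" into "Class#method". */
    static String shortName(String qualifiedMethod) {
        int methodDot = qualifiedMethod.lastIndexOf('.');
        if (methodDot < 0) {
            return qualifiedMethod;
        }
        String owner = qualifiedMethod.substring(0, methodDot);
        String simpleClass = owner.substring(owner.lastIndexOf('.') + 1);
        return simpleClass + "#" + qualifiedMethod.substring(methodDot + 1);
    }

    /** Toy block reader over an in-memory stand-in for remote block data. */
    static final class BufferedBlockReader {
        private final byte[] remote;
        private final long blockId;
        private final byte[] buf;
        private int bufPos;
        private int bufLen;
        private int remotePos;
        final List<SpanRecord> spans = new ArrayList<>();

        BufferedBlockReader(byte[] remote, long blockId, int bufSize) {
            if (bufSize <= 0) {
                throw new IllegalArgumentException("bufSize must be positive");
            }
            this.remote = remote;
            this.blockId = blockId;
            this.buf = new byte[bufSize];
        }

        /** Untraced: most calls are served straight from the buffer. */
        int read() {
            if (bufPos == bufLen && !refill()) {
                return -1;
            }
            return buf[bufPos++] & 0xff;
        }

        /** Traced: the only place that touches the "remote" data. */
        private boolean refill() {
            if (remotePos >= remote.length) {
                return false;
            }
            SpanRecord span = new SpanRecord(shortName("org.example.BufferedBlockReader.refill"));
            span.annotations.put("blockId", Long.toString(blockId));
            spans.add(span);
            bufLen = Math.min(buf.length, remote.length - remotePos);
            System.arraycopy(remote, remotePos, buf, 0, bufLen);
            remotePos += bufLen;
            bufPos = 0;
            return true;
        }
    }
}
```

Reading 10,000 bytes one at a time through a 4,096-byte buffer in this model records three spans instead of 10,000, which is the reduction the description of this issue is after.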
