Great, 30% is definitely in the ballpark of what we'd expect.
No worries on finding time, of course. Thanks for the reply.
On 6/7/17 1:51 AM, Sean Busbey wrote:
I read through all the internal notes I could find from back in that
testing, and I don't see any mention of changing the durability
settings on meta or root.
So that's a plausible source for the perf hit. I don't know when I'll
have time to run through some tests to verify.
On Tue, Jun 6, 2017 at 2:43 PM, Josh Elser <josh.el...@gmail.com> wrote:
(spinning off from the other thread)
The backstory on Sean's testing can be found in [1]. Essentially, in his
testing, he observed some cases where there was an unexplained ~30%
performance impact.
<quote>
Batch write performance for Accumulo 1.7.2-cdh5.5.0 shows a regression of up
to approximately 30 percent, depending on table shape, when compared to
Accumulo 1.6.0-cdh5.1.4. The performance decrease is more severe for
exceptionally large cells (100k and larger) or exceptionally wide rows (10k
columns). Carefully consider the performance impact for your environment
when deciding to upgrade to Accumulo 1.7.2-cdh5.5.0.
</quote>
Since it came up again, I was hoping we could put this concern to rest,
chalking it up to the WAL flush/sync calls that changed between 1.6 and 1.7
as documented by our Keith[2]. Hopefully, Sean's notes are sufficient for us
to reconstruct his environment :)
- Josh
[1] https://www.cloudera.com/documentation/other/accumulo/latest/PDF/Apache-Accumulo-Installation-Guide-1-7-2.pdf
[2] https://accumulo.apache.org/blog/2016/11/02/durability-performance.html
-------- Forwarded Message --------
Subject: Re: [DISCUSS] Question about 1.7 bugfix releases
Date: Tue, 6 Jun 2017 14:20:27 -0400
From: Josh Elser <josh.el...@gmail.com>
To: dev@accumulo.apache.org
On 6/6/17 2:13 PM, Sean Busbey wrote:
On Tue, Jun 6, 2017 at 12:07 PM, Josh Elser <josh.el...@gmail.com> wrote:
On 6/6/17 12:39 PM, Sean Busbey wrote:
For example, has anyone done perf comparisons between 1.7 and 1.8.z?
When it came time for me to start telling folks that it was "safe" to
upgrade to 1.7.z I ran into something like a 40-60% perf degradation
on writes compared to 1.6 across the board. A little bit of this was
already fixed in 1.8 at the time, but a substantial amount required a
non-trivial refactoring because just no one had looked[1]. Even after
all of that, I still had to caveat things because I still saw a
~15-30% perf drop on random writes in the presence of lots of columns.
At the risk of derailing an otherwise good discussion on releases: do you
recall whether you had accounted for the following, Sean? (Notably, the
last code snippet.)
https://accumulo.apache.org/blog/2016/11/02/durability-performance.html
I know that "set durability to flush and not sync" was one of the
parameters for the comparison, but I don't remember what was done
specifically during the testing back in September, tbh.
I can probably dig it out if you'd like; I think we were pretty good
at keeping notes. Probably something for a different thread?
Agreed. Just wanted to ask before I forgot again. Saw some relevance in the
worry of perf regressions 1.7->1.8 based on the existence of those you saw
1.6->1.7, but def don't want to derail further here.
If you have the time and the notes, would be happy to review.