1.6 to 1.7 performance regression

Josh Elser Tue, 06 Jun 2017 12:44:00 -0700

(spinning off from the other thread)

The backstory on Sean's testing can be found in [1]. Essentially, in his testing, he observed some cases where there was an unexplained ~30% drop in write performance.


<quote>

Batch write performance for Accumulo 1.7.2‐cdh5.5.0 shows a regression of up to approximately 30 percent, depending on table shape, when compared to Accumulo 1.6.0‐cdh5.1.4. The performance decrease is more severe for exceptionally large cells (100k and larger) or exceptionally wide rows (10k columns). Carefully consider the performance impact for your environment when deciding to upgrade to Accumulo 1.7.2‐cdh5.5.0.

</quote>

Since it came up again, I was hoping we could put this concern to rest, chalking it up to the WAL flush/sync calls that changed between 1.6 and 1.7, as documented by our Keith [2]. Hopefully, Sean's notes are sufficient for us to reconstruct his environment :)


- Josh

[1] https://www.cloudera.com/documentation/other/accumulo/latest/PDF/Apache-Accumulo-Installation-Guide-1-7-2.pdf

[2] https://accumulo.apache.org/blog/2016/11/02/durability-performance.html


-------- Forwarded Message --------
Subject: Re: [DISCUSS] Question about 1.7 bugfix releases
Date: Tue, 6 Jun 2017 14:20:27 -0400
From: Josh Elser <[email protected]>
To: [email protected]

On 6/6/17 2:13 PM, Sean Busbey wrote:
>
> On Tue, Jun 6, 2017 at 12:07 PM, Josh Elser <[email protected]> wrote:
>>
>> On 6/6/17 12:39 PM, Sean Busbey wrote:
>>>
>>>
>>> For example, has anyone done perf comparisons between 1.7 and 1.8.z?
>>>
>>> When it came time for me to start telling folks that it was "safe" to
>>> upgrade to 1.7.z, I ran into something like a 40-60% perf degradation
>>> on writes compared to 1.6 across the board. A little bit of this was
>>> already fixed in 1.8 at the time, but a substantial amount required a
>>> non-trivial refactoring just because no one had looked[1]. Even after
>>> all of that, I still had to caveat things because I saw a ~15-30%
>>> perf drop on random writes in the presence of lots of columns.
>>
>>
>>
>> At the risk of derailing an otherwise good discussion on releases: do
>> you recall if you had accounted for the following, Sean? (notably, the
>> last code snippet)
>>
>> https://accumulo.apache.org/blog/2016/11/02/durability-performance.html
>
>
> I know that "set durability to flush and not sync" was one of the
> parameters for the comparison, but I don't remember what was done
> specifically during the testing back in September, tbh.
>
> I can probably dig it out if you'd like; I think we were pretty good
> at keeping notes. Probably something for a different thread?

Agreed. Just wanted to ask before I forgot again. Saw some relevance to the worry about perf regressions in 1.7->1.8, given the ones you saw in 1.6->1.7, but def don't want to derail further here.


If you have the time and the notes, would be happy to review.
