Interesting! Yeah, I'm not sure anymore (assuming I even knew at one
time). I think they've moved away from it entirely and the code wasn't
open-sourced, so it may just be lost to the ages :)
Either way, thanks for the extra details!
Adam J. Shook wrote:
Maybe. I'd be interested in what they'd done to get the big gains. My
approach was, for a full table scan (vs. using the index), the connector
creates one Presto split for each tablet. The Accumulo metadata table is
scanned to get the tablet location, and the host hint for the Presto split
is set to where the tablet is located. Using this approach, the query
times were nearly identical whether the workers were co-located with tablet
servers or not.
On Mon, Jun 13, 2016 at 11:55 AM, Josh Elser<josh.el...@gmail.com> wrote:
Adam J. Shook wrote:
A few clarifications:
- Presto supports hash-based distributed joins as well as broadcast joins
- Presto metadata is stored in ZooKeeper, but metadata storage is
pluggable
and could be stored in Accumulo instead
- The connector does use tablet locality when scanning Accumulo, but our
testing has shown you get better performance by giving Accumulo and Presto
their own dedicated machines, making locality a moot point. This will
certainly change based on types of queries, data sizes, network quality,
etc.
I'm a little confused by this. Co-locating presto workers and tservers
doesn't necessarily mean that you're going to get local reads/writes at the
Accumulo layer. I remember this is something that the Argyle Data folks had
found was a big gain for them when they were doing Presto+Accumulo. Maybe
your findings were more based on your specific circumstances?
- You can insert the results of a query into a Presto table using INSERT
INTO foo SELECT ..., as well as create a table from the results of a query
(CTAS). Though, for large inserts, it is typically best to bypass the
Presto layer and insert directly into the Accumulo tables using the
PrestoBatchWriter API
Cheers,
--Adam
On Mon, Jun 13, 2016 at 7:20 AM, Christopher<ctubb...@apache.org> wrote:
Thanks for that summary, Dylan! Very helpful.
On Mon, Jun 13, 2016, 01:36 Dylan Hutchison<dhutc...@cs.washington.edu>
wrote:
Thanks for sharing Sean. Here are some notes I wrote after reading the
article on Presto-Accumulo design. I have a research interest in the
relationship between relational (SQL) and non-relational (Accumulo)
systems, so I couldn't resist reading the post in detail.
- Places the primary key in the Accumulo row.
- Performs row-at-a-time processing (each tuple is one row in
Accumulo) using WholeRowIterator behavior.
- Relational table metadata is stored in the Presto infrastructure
(as
opposed to an Accumulo table).
- Supports the creation of index tables for any attributes. These
index tables speed up queries that filter on indexed attributes. It
is
standard secondary indexing, which provides speedups when the
selectivity
of the query is roughly<10% of the original table.
- Only database->client querying is supported. You cannot run
"select
... into result_table".
- As far as I can see, Presto only has one join strategy: *broadcast
join*. The right table of every join is scanned into one of the
Presto worker's memory. Subsequently the size of the right table is
limited by worker memory.
- There is one Presto worker for each Accumulo tablet, which enables
good scaling.
- The Presto bridge classes track internal Accumulo information such
as the assignment of tablets to tablet servers by reading Accumulo's
Metadata table. Presto uses tablet locations to provide better
locality.
- The Presto bridge comes with several Accumulo server-side
iterators
for filtering and aggregating.
- The code is quite nice and clean.
This image below gives Presto's architecture. Accumulo takes the role
of
the DB icon in the bottom-right corner.
[image: Inline image 2]
Bloomberg ran 13 out of the 22 TPC-H queries. There is no fundamental
reason why they cannot run all the queries; they just have not
implemented
everything required ('exists' clauses, non-equi join, etc.).
The interface looks like this, though they use a compiled java jar to
insert entries from a csv file (it wraps around a BatchWriter).
[image: Inline image 3]
Here are performance results. They don't say what hardware or data
sizes
they use. Whatever it is, they must have the ability to fit the smaller
table of any join into memory as a result of Presto's broadcast join
strategy. The strong scaling looks very nice.
[image: Inline image 4]
They have one other plot that shows how secondary indexing speeds up
some
queries with low selectivity.
Cheers, Dylan
On Sun, Jun 12, 2016 at 7:06 PM, Sean Busbey<bus...@cloudera.com>
wrote:
Bloomberg have a post about a connector they made to query Accumulo from
Presto:
http://www.bloomberg.com/company/announcements/open-source-at-bloomberg-reducing-application-development-time-via-presto-accumulo/
--
Sean Busbey