Thanks for clarifying Adam. I am interested in learning more about the hash-join strategy, in case you're familiar with them. Suppose we want to join the PartSupp and Supplier table on ps_suppkey = s_suppkey. The s_suppkey is a primary key of the Supplier table and it is stored in the Accumulo row. The ps_suppkey is neither a key nor stored in the row of the PartSupp table. (The PartSupp table's row is a UUID.)
Is the hash-join strategy to (1) scan tuples (whole rows) from PartSupp to a Presto worker, (2) for a batch of PartSupp tuples fetch the matching Supplier tuples, (3) repeat until all tuples are read from PartSupp? Regards, Dylan On Mon, Jun 13, 2016 at 8:24 AM, Adam J. Shook <adamjsh...@gmail.com> wrote: > A few clarifications: > > - Presto supports hash-based distributed joins as well as broadcast joins > > - Presto metadata is stored in ZooKeeper, but metadata storage is pluggable > and could be stored in Accumulo instead > > - The connector does use tablet locality when scanning Accumulo, but our > testing has shown you get better performance by giving Accumulo and Presto > their own dedicated machines, making locality a moot point. This will > certainly change based on types of queries, data sizes, network quality, > etc. > > - You can insert the results of a query into a Presto table using INSERT > INTO foo SELECT ..., as well as create a table from the results of a query > (CTAS). Though, for large inserts, it is typically best to bypass the > Presto layer and insert directly into the Accumulo tables using the > PrestoBatchWriter API > > Cheers, > --Adam > > On Mon, Jun 13, 2016 at 7:20 AM, Christopher <ctubb...@apache.org> wrote: > > > Thanks for that summary, Dylan! Very helpful. > > > > On Mon, Jun 13, 2016, 01:36 Dylan Hutchison <dhutc...@cs.washington.edu> > > wrote: > > > > > Thanks for sharing Sean. Here are some notes I wrote after reading the > > > article on Presto-Accumulo design. I have a research interest in the > > > relationship between relational (SQL) and non-relational (Accumulo) > > > systems, so I couldn't resist reading the post in detail. > > > > > > - Places the primary key in the Accumulo row. > > > - Performs row-at-a-time processing (each tuple is one row in > > > Accumulo) using WholeRowIterator behavior. > > > - Relational table metadata is stored in the Presto infrastructure > (as > > > opposed to an Accumulo table). > > > - Supports the creation of index tables for any attributes. These > > > index tables speed up queries that filter on indexed attributes. It > > is > > > standard secondary indexing, which provides speedups when the > > selectivity > > > of the query is roughly <10% of the original table. > > > - Only database->client querying is supported. You cannot run > "select > > > ... into result_table". > > > - As far as I can see, Presto only has one join strategy: *broadcast > > > join*. The right table of every join is scanned into one of the > > > Presto worker's memory. Subsequently the size of the right table is > > > limited by worker memory. > > > - There is one Presto worker for each Accumulo tablet, which enables > > > good scaling. > > > - The Presto bridge classes track internal Accumulo information such > > > as the assignment of tablets to tablet servers by reading Accumulo's > > > Metadata table. Presto uses tablet locations to provide better > > locality. > > > - The Presto bridge comes with several Accumulo server-side > iterators > > > for filtering and aggregating. > > > - The code is quite nice and clean. > > > > > > This image below gives Presto's architecture. Accumulo takes the role > of > > > the DB icon in the bottom-right corner. > > > > > > [image: Inline image 2] > > > > > > Bloomberg ran 13 out of the 22 TPC-H queries. There is no fundamental > > > reason why they cannot run all the queries; they just have not > > implemented > > > everything required ('exists' clauses, non-equi join, etc.). > > > > > > The interface looks like this, though they use a compiled java jar to > > > insert entries from a csv file (it wraps around a BatchWriter). > > > > > > [image: Inline image 3] > > > > > > Here are performance results. They don't say what hardware or data > sizes > > > they use. Whatever it is, they must have the ability to fit the > smaller > > > table of any join into memory as a result of Presto's broadcast join > > > strategy. The strong scaling looks very nice. > > > > > > [image: Inline image 4] > > > > > > They have one other plot that shows how secondary indexing speeds up > some > > > queries with low selectivity. > > > > > > Cheers, Dylan > > > > > > > > > > > > On Sun, Jun 12, 2016 at 7:06 PM, Sean Busbey <bus...@cloudera.com> > > wrote: > > > > > >> Bloomberg have a post about a connector they made to query Accumulo > from > > >> Presto: > > >> > > >> > > >> > > > http://www.bloomberg.com/company/announcements/open-source-at-bloomberg-reducing-application-development-time-via-presto-accumulo/ > > >> > > >> -- > > >> Sean Busbey > > >> > > > > > > > > >