I'm always looking for places to help out and integrate/share designs &
ideas. I look forward to chatting with you about Q4A at the hackathon
tomorrow!

Have you, by chance, seen the Spark SQL adapter for the Accumulo Recipes
Event & Entity Stores [1]? At the very least, it's a good example of using
Spark's SQL abstraction over Accumulo. As Mike Drob pointed out, Spark SQL
has a pretty robust query planning / optimization layer. The Event/Entity
stores in Accumulo Recipes also have a pluggable query
planning/optimization layer.


[1]
https://github.com/calrissian/accumulo-recipes/blob/master/thirdparty/spark/src/test/scala/org/calrissian/accumulorecipes/spark/sql/EventStoreCatalystTest.scala

On Mon, Apr 27, 2015 at 9:38 PM, Mike Drob <mad...@cloudera.com> wrote:

> Andrew,
>
> This is a cool thing to work on, I hope you have great success!
>
> A couple of questions about the motivations behind this, if you don't mind:
> - There are several SQL implementations already in the Hadoop ecosystem.
> In what ways do you expect this to improve upon
> Hive/Impala/Phoenix/Presto/Spark SQL? I haven't looked at the code, so it
> is quite possible you're already using one of those technologies.
> - In a conversation with some HP engineers earlier this year, they
> mentioned that building a SQL-92 layer is the easy part, and that a mature
> optimization engine is the really hard part. This is where Oracle may still
> be leaps and bounds ahead of its nearest competitors. Do you have plans for
> a query planner? If not, you might be back to writing MapReduce jobs sooner
> than you think.
>
> Look forward to seeing more!
>
> Mike
>
> On Mon, Apr 27, 2015 at 7:37 PM, Andrew Wells <awe...@clearedgeit.com>
> wrote:
>
>> I have been working on a project, tentatively called Q4A (Query for
>> Accumulo). Another possible name is ASQ (Accumulo Streaming Query) [discuss].
>>
>> It is a streaming query because results are produced as a stream; it should
>> never group data in memory. To batch, intermediate results would be written
>> back to Accumulo temporarily.
>>
>>
>> The *primary goal* is to have a complete SQL implementation native to
>> Accumulo.
>>
>> *Why do this?*
>> I am getting tired of writing bad Java code to query a database. I would
>> rather write bad SQL code. Also, people should be able to get queries out
>> faster and it shouldn't take a developer.
>>
>>
>> *Native To Accumulo*:
>>
>>    - There should be no special format to read a database created by Q4A
>>    - There should be no special format for Q4A to query a table
>>    - All tables are tables available to Q4A
>>    - Any special tables are stored away from the users' databases
>>    (indexes, column definitions, etc)
>>
>> *Other Goals*:
>>
>>    - Implement the entire SQL definition (currently all of SQLite)
>>    - Create JDBC Driver/Server
>>    - Push down Expressions to the Tablet Servers
>>    - Install-less queries: use the Q4A jar directly against any Accumulo
>>    cluster (minus push-down expressions)
>>    - documentation :o
>>    - testing ;)
>>
>> *Does it work?*
>> Not yet; the project is still a work in progress, and I will be working
>> on it at the Accumulo Summit this year. Progress is slow as I am getting
>> married in about a month and some change.
>>
>> *Questions:*
>> If you have questions about Q4A, ask here. I will be at the Accumulo Summit
>> @ ClearEdgeIT Table and Hackathon.
>>
>> *WHERE IS TEH LINK?!1!*
>> Oh.... here: https://github.com/agwells0714/q4a
>>
>> --
>> *Andrew George Wells*
>> *Software Engineer*
>> *awe...@clearedgeit.com <awe...@clearedgeit.com>*
>>
>>
>
