Hi All,

I am very impressed with the work done on Spark SQL, but when I have to
pick something to serve real-time queries I am in a dilemma, for the
following reasons.

1. Even though Spark SQL has logical plans, physical plans, runtime code
generation and so on, it still doesn't look like the tool to serve
real-time queries the way we normally do from a database. I tend to think
this is because queries first have to go through job submission. I don't
want to call this overhead, but that is what it appears to be. Compare
this with having the data we want to serve sitting in a database, where
we simply issue an SQL query and get the response back. For that use
case, what would be an appropriate tool? I tend to think it's Drill, but
I would like to hear any interesting arguments.

2. I can see a case for Spark SQL with queries that need to be expressed
in an iterative fashion, for example graph traversals such as BFS or DFS,
or even simple pre-order, in-order, and post-order traversals of a BST.
All of these are very hard to express in a declarative syntax like SQL. I
also tend to think ad-hoc distributed joins (by ad-hoc I mean one is not
certain about one's query patterns) are better expressed in map-reduce
style than in SQL, unless one knows the query patterns well enough ahead
of time that the chance of queries requiring redistribution is low. I am
sure there are plenty of other cases where Spark SQL will excel, but I
wanted to ask: what is a good choice for simply serving the data?
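To make the iterative point concrete, here is a minimal single-machine sketch (plain Python, hypothetical graph and names) of a BFS as a frontier-expanding loop. Each pass extends the traversal by one hop, which maps naturally onto imperative or map-reduce-style code but is awkward to state in plain SQL without recursive extensions:

```python
from collections import deque

def bfs_order(graph, start):
    """Return vertices in BFS order from `start`.
    `graph` maps each vertex to a list of neighbors."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for nbr in graph.get(v, []):
            if nbr not in visited:
                visited.add(nbr)
                queue.append(nbr)
    return order

# Hypothetical graph, purely for illustration.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_order(graph, "a"))
```

The loop body is the part that would have to become repeated self-joins (or a recursive CTE) in SQL, which is exactly the mismatch described above.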

Any suggestions are appreciated.

Thanks!
