[ https://issues.apache.org/jira/browse/SPARK-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16860976#comment-16860976 ]

Lai Zhou edited comment on SPARK-9983 at 6/11/19 12:18 PM:
-----------------------------------------------------------

[~rxin], we have used Calcite to build a high-performance Hive SQL engine, and
it has now been released as open source.

see [https://github.com/51nb/marble]

It works well for real-time ML scenarios in our financial business, but I don't
think it's the best solution.

Adding a single-node version of DataFrame to Spark may be the best solution,
because Spark SQL is naturally compatible with Hive SQL, and people could enjoy
the benefits of its excellent optimizer, vectorized execution, code generation,
etc.

> Local physical operators for query execution
> --------------------------------------------
>
>                 Key: SPARK-9983
>                 URL: https://issues.apache.org/jira/browse/SPARK-9983
>             Project: Spark
>          Issue Type: Story
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>            Priority: Major
>
> In distributed query execution, there are two kinds of operators:
> (1) operators that exchange data between different executors or threads,
> such as broadcast and shuffle;
> (2) operators that process data in a single thread, such as project, filter,
> and group by.
> This ticket proposes clearly differentiating them and creating local
> operators in Spark. This brings many benefits: easier testing, easier
> optimization of data exchange, a better design (single responsibility), and
> potentially even a hyper-optimized single-node version of DataFrame.
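
To make the two kinds of operators concrete, here is a minimal Python sketch under stated assumptions. It is not Spark code: the classes `Scan`, `Filter`, `Project`, `HashAggregate` and the function `hash_exchange` are hypothetical names that only loosely echo Spark's physical operators, and rows are plain tuples. Local operators are single-threaded, pull-based iterators; the exchange only moves rows between partitions. The same local operators run a query on a single node, or, combined with the exchange, as a two-stage distributed plan.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical row representation: a plain tuple of column values.
Row = Tuple


# ---- Kind (2): local operators. Each runs in one thread and only pulls rows
# from its child; none of them knows about executors or partitions.
class Scan:
    def __init__(self, rows: Iterable[Row]):
        self.rows = rows

    def __iter__(self) -> Iterator[Row]:
        return iter(self.rows)


class Filter:
    def __init__(self, child: Iterable[Row], predicate: Callable[[Row], bool]):
        self.child, self.predicate = child, predicate

    def __iter__(self) -> Iterator[Row]:
        return (r for r in self.child if self.predicate(r))


class Project:
    def __init__(self, child: Iterable[Row], exprs: Sequence[Callable[[Row], object]]):
        self.child, self.exprs = child, exprs

    def __iter__(self) -> Iterator[Row]:
        return (tuple(e(r) for e in self.exprs) for r in self.child)


class HashAggregate:
    """GROUP BY row[key], SUM(row[value]); emits (key, sum) rows sorted by key."""

    def __init__(self, child: Iterable[Row], key: int, value: int):
        self.child, self.key, self.value = child, key, value

    def __iter__(self) -> Iterator[Row]:
        sums: Dict[object, int] = {}
        for r in self.child:
            sums[r[self.key]] = sums.get(r[self.key], 0) + r[self.value]
        return iter(sorted(sums.items()))


# ---- Kind (1): an exchange operator. It only routes rows between partitions
# (here, a hash shuffle on one column) and does no per-row computation.
def hash_exchange(partitions: List[List[Row]], key: int, num_out: int) -> List[List[Row]]:
    out: List[List[Row]] = [[] for _ in range(num_out)]
    for part in partitions:
        for r in part:
            out[hash(r[key]) % num_out].append(r)
    return out


# Logical query for both plans: SELECT c1, SUM(c2) FROM t WHERE c2 > 1 GROUP BY c1
def pred(r: Row) -> bool:
    return r[2] > 1


EXPRS = [lambda r: r[1], lambda r: r[2]]


def run_single_node(rows: List[Row]) -> List[Row]:
    # Local operators only: no exchange is needed on a single node.
    return list(HashAggregate(Project(Filter(Scan(rows), pred), EXPRS), key=0, value=1))


def run_distributed(partitions: List[List[Row]], num_out: int = 2) -> List[Row]:
    # Stage 1: the same local Filter/Project, run independently per input partition.
    stage1 = [list(Project(Filter(Scan(p), pred), EXPRS)) for p in partitions]
    # Exchange: rows with equal group keys end up in the same output partition.
    shuffled = hash_exchange(stage1, key=0, num_out=num_out)
    # Stage 2: the same local HashAggregate, run per output partition.
    return sorted(row for part in shuffled for row in HashAggregate(Scan(part), key=0, value=1))


data = [(1, "a", 10), (2, "b", 5), (3, "a", 7), (4, "c", 1), (5, "b", 2)]
print(run_single_node(data))                  # [('a', 17), ('b', 7)]
print(run_distributed([data[:3], data[3:]]))  # [('a', 17), ('b', 7)]
```

Both plans reuse the same local operator classes, and only the exchange step differs. In this sketch each local operator can therefore be tested on an in-memory list without any cluster, which mirrors the testability argument in the ticket.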



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
