[jira] [Commented] (HBASE-14796) Enhance the Gets in the connector

Zhan Zhang (JIRA) Wed, 23 Dec 2015 14:55:33 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15070286#comment-15070286
 ]


Zhan Zhang commented on HBASE-14796:
------------------------------------

Thanks [~ted.m] for the quick review. It is reasonable to have a performance 
test, and I will try to get access to a physical cluster for it. It may take 
some time, as I don't have a physical cluster available for this. 

On the other hand, I do think we should change it to perform BulkGet in 
executors regardless of the performance impact (although I expect it to improve 
performance rather than hurt it), because:

1. The current implementation does gather-scatter in the driver, which 
increases network overhead and latency if the number of gets is large.

2. Failure recovery. It is hard to do failure recovery because the gets are 
performed in the driver, which is a single point of failure.

The above two have been discussed in detail. But I just realized there is 
another potential issue: the current implementation may go against the Spark 
SQL engine design, as described below.

3. Currently, the bulkGet happens during query planning (buildScan), and the 
results stay in the driver (1st). The results are distributed to executors 
during query execution (2nd). 
  3.1 1st and 2nd do not always happen as a pair. Even worse, sometimes only 
1st happens: for example, a user calls plan.explain but may never trigger the 
plan execution. 
  3.2 Memory taken by table.get may never get released in the driver, which 
increases the driver memory overhead.
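
To make point 3 concrete, here is a toy sketch in Python. It is not the 
connector code and does not use any Spark or HBase API: FakeTable, EagerPlan, 
and LazyPlan are hypothetical names made up for this illustration. It only 
shows the difference between fetching when the plan is built and fetching when 
the plan is executed.

```python
from typing import Dict, List


class FakeTable:
    """Hypothetical in-memory stand-in for an HBase table (illustration only)."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(rows)
        self.get_calls = 0  # counts how many bulk gets were issued

    def bulk_get(self, keys: List[str]) -> List[str]:
        self.get_calls += 1
        return [self.rows[k] for k in keys if k in self.rows]


class EagerPlan:
    """Fetches while the plan is built, like the behavior described in point 3."""

    def __init__(self, table: FakeTable, keys: List[str]) -> None:
        self.keys = list(keys)
        # The get runs at planning time and the plan keeps holding the results.
        self.results = table.bulk_get(self.keys)

    def explain(self) -> str:
        return f"BulkGet({len(self.keys)} keys)"

    def execute(self) -> List[str]:
        return self.results


class LazyPlan:
    """Only records what to fetch; the get runs when the plan is executed."""

    def __init__(self, table: FakeTable, keys: List[str]) -> None:
        self.table = table
        self.keys = list(keys)

    def explain(self) -> str:
        return f"BulkGet({len(self.keys)} keys)"

    def execute(self) -> List[str]:
        return self.table.bulk_get(self.keys)


if __name__ == "__main__":
    rows = {"123": "a", "456": "b"}

    eager_table = FakeTable(rows)
    EagerPlan(eager_table, ["123", "456"]).explain()
    print("eager gets after explain only:", eager_table.get_calls)

    lazy_table = FakeTable(rows)
    LazyPlan(lazy_table, ["123", "456"]).explain()
    print("lazy gets after explain only:", lazy_table.get_calls)
```

In this toy model, calling explain on EagerPlan has already issued a get and 
keeps the results alive for as long as the plan object lives, while LazyPlan 
issues nothing until execute is called. Moving the BulkGet into executors 
would correspond to the lazy variant.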

[~ted.m] Please let me know what you think, and correct me if my 
understanding is wrong.

> Enhance the Gets in the connector
> ---------------------------------
>
>                 Key: HBASE-14796
>                 URL: https://issues.apache.org/jira/browse/HBASE-14796
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Malaska
>            Assignee: Zhan Zhang
>            Priority: Minor
>         Attachments: HBASE-14976.patch
>
>
> Currently the Spark-Module Spark SQL implementation gets records from HBase 
> from the driver if there is something like the following found in the SQL.
> rowkey = 123
> The reason for this originally was that normal SQL will not have many equal 
> operations in a single where clause.
> Zhan had brought up two points that have value.
> 1. The SQL may be generated and may have many many equal statements in it so 
> moving the work to an executor protects the driver from load
> 2. In the current implementation the driver is connecting to HBase and 
> exceptions may cause trouble with the Spark application and not just with 
> a single task execution



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
