I wouldn't be too surprised if the chain Spark SQL -> JDBC data source ->
Phoenix JDBC server -> HBase worked better.
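A minimal sketch of that chain, assuming the Phoenix Query Server (thin
driver) is up and its client jar is on the Thriftserver classpath; the host,
port, and table name are taken from Mich's examples further down and are
illustrative only:

CREATE TABLE tsco_phoenix
USING org.apache.spark.sql.jdbc
OPTIONS (
  -- assumed: Phoenix Query Server thin-driver URL and driver class
  url "jdbc:phoenix:thin:url=http://rhes564:8765;serialization=PROTOBUF",
  driver "org.apache.phoenix.queryserver.client.Driver",
  dbtable "tsco"
);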

Without naming specifics, there are at least four or five different
implementations of HBase sources, each at a different level of maturity and
with different requirements (HBase release version, Kerberos support, etc.).


_____________________________
From: Benjamin Kim <bbuil...@gmail.com>
Sent: Saturday, October 8, 2016 11:26 AM
Subject: Re: Spark SQL Thriftserver with HBase
To: Mich Talebzadeh <mich.talebza...@gmail.com>
Cc: <user@spark.apache.org>, Felix Cheung <felixcheun...@hotmail.com>


Mich,

Are you talking about the Phoenix JDBC Server? If so, I forgot about that 
alternative.

Thanks,
Ben


On Oct 8, 2016, at 11:21 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

I don't think it will work.

You can use Phoenix on top of HBase:

hbase(main):336:0> scan 'tsco', 'LIMIT' => 1
ROW            COLUMN+CELL
 TSCO-1-Apr-08 column=stock_daily:Date, timestamp=1475866783376, value=1-Apr-08
 TSCO-1-Apr-08 column=stock_daily:close, timestamp=1475866783376, value=405.25
 TSCO-1-Apr-08 column=stock_daily:high, timestamp=1475866783376, value=406.75
 TSCO-1-Apr-08 column=stock_daily:low, timestamp=1475866783376, value=379.25
 TSCO-1-Apr-08 column=stock_daily:open, timestamp=1475866783376, value=380.00
 TSCO-1-Apr-08 column=stock_daily:stock, timestamp=1475866783376, value=TESCO PLC
 TSCO-1-Apr-08 column=stock_daily:ticker, timestamp=1475866783376, value=TSCO
 TSCO-1-Apr-08 column=stock_daily:volume, timestamp=1475866783376, value=49664486

And the same data through Phoenix on top of the HBase table:

0: jdbc:phoenix:thin:url=http://rhes564:8765> select
    substr(to_char(to_date("Date",'dd-MMM-yy')),1,10) AS TradeDate,
    "close" AS "Day's close", "high" AS "Day's High", "low" AS "Day's Low",
    "open" AS "Day's Open", "ticker", "volume",
    (to_number("low")+to_number("high"))/2 AS "AverageDailyPrice"
  from "tsco"
  where to_number("volume") > 0 and "high" != '-'
    and to_date("Date",'dd-MMM-yy') > to_date('2015-10-06','yyyy-MM-dd')
  order by to_date("Date",'dd-MMM-yy') limit 1;
+-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
|  TRADEDATE  | Day's close  | Day's High  | Day's Low  | Day's Open  | ticker  |  volume   | AverageDailyPrice  |
+-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
| 2015-10-07  | 197.00       | 198.05      | 184.84     | 192.20      | TSCO    | 30046994  | 191.445            |
+-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+


HTH




Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss,
damage or destruction of data or any other property which may arise from
relying on this email's technical content is explicitly disclaimed. The
author will in no case be liable for any monetary damages arising from such
loss, damage or destruction.



On 8 October 2016 at 19:05, Felix Cheung <felixcheun...@hotmail.com> wrote:
Great. Then those packages, as Spark data sources, should let you do exactly
that (replace org.apache.spark.sql.jdbc with an HBase one).

It would be great to have more examples around this, though. Please do share
your experience with it!


_____________________________
From: Benjamin Kim <bbuil...@gmail.com>
Sent: Saturday, October 8, 2016 11:00 AM
Subject: Re: Spark SQL Thriftserver with HBase
To: Felix Cheung <felixcheun...@hotmail.com>
Cc: <user@spark.apache.org>


Felix,

My goal is to use the Spark SQL JDBC Thriftserver to access HBase tables
using just SQL. In the past, I have been able to CREATE tables with a
statement like the one below:

CREATE TABLE <table-name>
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:postgresql://<hostname>:<port>/dm?user=<username>&password=<password>",
  dbtable "dim.dimension_acamp"
);

After doing this, I can access the PostgreSQL table using Spark SQL JDBC 
Thriftserver using SQL statements (SELECT, UPDATE, INSERT, etc.). I want to do 
the same with HBase tables. We tried this using Hive and HiveServer2, but the 
response times are just too long.
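For comparison, a hedged sketch of what the HBase equivalent might look like
with one of the HBase data source packages (this one follows the catalog
shape of the Hortonworks SHC connector); the data source class, catalog JSON,
and column mapping are assumptions based on the 'tsco' table shown earlier in
this thread, not a tested statement:

CREATE TABLE tsco_hbase
USING org.apache.spark.sql.execution.datasources.hbase
OPTIONS (
  -- assumed SHC-style catalog mapping the row key and stock_daily columns
  catalog '{"table":{"namespace":"default","name":"tsco"},"rowkey":"key","columns":{"key":{"cf":"rowkey","col":"key","type":"string"},"close":{"cf":"stock_daily","col":"close","type":"string"},"volume":{"cf":"stock_daily","col":"volume","type":"string"}}}'
);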

Thanks,
Ben


On Oct 8, 2016, at 10:53 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

Ben,

I'm not sure I'm following completely.

Is your goal to use Spark to create or access tables in HBase? If so, the
link below and several packages out there support that by providing an HBase
data source for Spark, and that link also has examples of what the Spark code
looks like. On that note, you should also be able to use an HBase data source
from a pure SQL (Spark SQL) query, which should work with the Spark SQL JDBC
Thrift Server (with USING; see
http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10).
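For reference, a minimal sketch of that USING pattern, shown here with the
built-in JSON source as a stand-in; an HBase data source package and its own
options would slot in the same way:

CREATE TEMPORARY VIEW jsonTable
USING org.apache.spark.sql.json
OPTIONS (
  path "examples/src/main/resources/people.json"
);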


_____________________________
From: Benjamin Kim <bbuil...@gmail.com<mailto:bbuil...@gmail.com>>
Sent: Saturday, October 8, 2016 10:40 AM
Subject: Re: Spark SQL Thriftserver with HBase
To: Felix Cheung <felixcheun...@hotmail.com<mailto:felixcheun...@hotmail.com>>
Cc: <user@spark.apache.org<mailto:user@spark.apache.org>>


Felix,

The only alternative I can think of is to create a stored procedure (a UDF,
in database terms) that runs Spark Scala code underneath. That way, I could
use the Spark SQL JDBC Thriftserver to execute it with SQL, passing the
key/values I want to UPSERT. I wonder if this is possible, since I cannot
CREATE a wrapper table on top of an HBase table in Spark SQL.
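If the Thriftserver were running with Hive support, the call pattern might
look like the sketch below; the UDF class is hypothetical, and it would have
to open an HBase connection and issue the Put itself, which is untested here:

-- hypothetical class; com.example.HBaseUpsertUDF is illustrative only
CREATE TEMPORARY FUNCTION hbase_upsert AS 'com.example.HBaseUpsertUDF';
SELECT hbase_upsert('TSCO-1-Apr-08', 'stock_daily', 'close', '405.25');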

What do you think? Is this the right approach?

Thanks,
Ben

On Oct 8, 2016, at 10:33 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:

HBase has released support for Spark
hbase.apache.org/book.html#spark

And if you search you should find several alternative approaches.





On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <bbuil...@gmail.com> wrote:

Does anyone know if Spark can work with HBase tables using Spark SQL? I know in 
Hive we are able to create tables on top of an underlying HBase table that can 
be accessed using MapReduce jobs. Can the same be done using HiveContext or 
SQLContext? We are trying to set up a way to GET and POST data to and from the
HBase table using the Spark SQL JDBC thriftserver from our RESTful API 
endpoints and/or HTTP web farms. If we can get this to work, then we can load 
balance the thriftservers. In addition, this will benefit us in giving us a way 
to abstract the data storage layer away from the presentation layer code. There 
is a chance that we will swap out the data storage technology in the future. We 
are currently experimenting with Kudu.
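For context, the Hive-on-HBase mapping described above is the standard
HBaseStorageHandler DDL; the table and column names below echo the 'tsco'
example elsewhere in this thread and are illustrative only:

-- Hive table backed by an existing HBase table (names are illustrative)
CREATE EXTERNAL TABLE tsco_hive (key string, close string, volume string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,stock_daily:close,stock_daily:volume"
)
TBLPROPERTIES ("hbase.table.name" = "tsco");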

Thanks,
Ben
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org









