Mich,

Unfortunately, we are moving away from Hive and unifying on Spark, using CDH 5.8 
as our distro. And Tableau has released a Spark ODBC/JDBC driver too. I will 
either try the Phoenix JDBC Server for HBase or push to move faster to Kudu with 
Impala. We will use Impala as the JDBC go-between until the Kudu team completes 
Spark SQL support for JDBC.

Thanks for the advice.

Cheers,
Ben


> On Oct 8, 2016, at 12:35 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> 
> Sure. But essentially you are looking at batch data for analytics for your 
> Tableau users, so Hive may be a better choice, with its rich SQL and an 
> ODBC/JDBC connection to Tableau already in place.
> 
> I would go for Hive, especially as the new release will have an in-memory 
> offering as well for frequently accessed data :)
> 
> 
> Dr Mich Talebzadeh
>  
> LinkedIn: 
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> 
> http://talebzadehmich.wordpress.com
> 
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
> damage or destruction of data or any other property which may arise from 
> relying on this email's technical content is explicitly disclaimed. The 
> author will in no case be liable for any monetary damages arising from such 
> loss, damage or destruction.
>  
> 
> On 8 October 2016 at 20:15, Benjamin Kim <bbuil...@gmail.com> wrote:
> Mich,
> 
> First and foremost, we have visualization servers that run Tableau for 
> external user reports. Second, we have ad servers and REST endpoints for 
> cookie sync and segmentation data exchange. These will use JDBC directly 
> when they are within the same data center; when not colocated, they will 
> still connect to the database server using JDBC. Either way, using JDBC 
> everywhere simplifies and unifies the code around the JDBC industry 
> standard, along the lines of the sketch below.
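> 
> As a minimal, hypothetical sketch (host, port, credentials, and table name 
> are invented for illustration), any of these services can reach the Spark 
> SQL Thriftserver with the standard Hive JDBC driver:
> 
> import java.sql.DriverManager
> 
> // The Thriftserver speaks the HiveServer2 wire protocol, so the stock
> // Hive JDBC driver is all a client needs; no Spark-specific code.
> Class.forName("org.apache.hive.jdbc.HiveDriver")
> val conn = DriverManager.getConnection(
>   "jdbc:hive2://thriftserver-host:10000/default", "user", "")
> val stmt = conn.createStatement()
> val rs = stmt.executeQuery("SELECT count(*) FROM some_table")
> while (rs.next()) println(rs.getLong(1))
> conn.close()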
> 
> Does this make sense?
> 
> Thanks,
> Ben
> 
> 
>> On Oct 8, 2016, at 11:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> 
>> Like any other design, what are your presentation layer and end users?
>> 
>> Are they SQL-centric users from a Tableau background, or might they use 
>> Spark functional programming?
>> 
>> It is best to describe the use case.
>> 
>> HTH
>> 
>> Dr Mich Talebzadeh
>>  
>> 
>> On 8 October 2016 at 19:40, Felix Cheung <felixcheun...@hotmail.com> wrote:
>> I wouldn't be too surprised if Spark SQL -> JDBC data source -> Phoenix JDBC 
>> server -> HBASE would work better.
>> 
>> Without naming specifics, there are at least 4 or 5 different 
>> implementations of HBASE sources, each at a varying level of development and 
>> with different requirements (HBASE release version, Kerberos support, etc.).
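>> 
>> As a sketch of that chain from the Spark SQL side (the host, port, and 
>> table name are assumptions, and the thin driver targets the Phoenix Query 
>> Server):
>> 
>> CREATE TABLE tsco_via_phoenix
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>>   url "jdbc:phoenix:thin:url=http://phoenix-host:8765;serialization=PROTOBUF",
>>   driver "org.apache.phoenix.queryserver.client.Driver",
>>   dbtable "tsco"
>> );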
>> 
>> 
>> _____________________________
>> From: Benjamin Kim <bbuil...@gmail.com>
>> Sent: Saturday, October 8, 2016 11:26 AM
>> Subject: Re: Spark SQL Thriftserver with HBase
>> To: Mich Talebzadeh <mich.talebza...@gmail.com>
>> Cc: <user@spark.apache.org>, Felix Cheung <felixcheun...@hotmail.com>
>> 
>> 
>> 
>> Mich,
>> 
>> Are you talking about the Phoenix JDBC Server? If so, I forgot about that 
>> alternative.
>> 
>> Thanks,
>> Ben
>> 
>> 
>> On Oct 8, 2016, at 11:21 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>> 
>> I don't think it will work.
>> 
>> You can use Phoenix on top of HBase:
>> 
>> hbase(main):336:0> scan 'tsco', 'LIMIT' => 1
>> ROW            COLUMN+CELL
>>  TSCO-1-Apr-08  column=stock_daily:Date, timestamp=1475866783376, value=1-Apr-08
>>  TSCO-1-Apr-08  column=stock_daily:close, timestamp=1475866783376, value=405.25
>>  TSCO-1-Apr-08  column=stock_daily:high, timestamp=1475866783376, value=406.75
>>  TSCO-1-Apr-08  column=stock_daily:low, timestamp=1475866783376, value=379.25
>>  TSCO-1-Apr-08  column=stock_daily:open, timestamp=1475866783376, value=380.00
>>  TSCO-1-Apr-08  column=stock_daily:stock, timestamp=1475866783376, value=TESCO PLC
>>  TSCO-1-Apr-08  column=stock_daily:ticker, timestamp=1475866783376, value=TSCO
>>  TSCO-1-Apr-08  column=stock_daily:volume, timestamp=1475866783376, value=49664486
>> 
>> And the same via Phoenix on top of the HBase table:
>> 
>> 0: jdbc:phoenix:thin:url=http://rhes564:8765> select 
>>   substr(to_char(to_date("Date",'dd-MMM-yy')),1,10) AS TradeDate, 
>>   "close" AS "Day's close", "high" AS "Day's High", "low" AS "Day's Low", 
>>   "open" AS "Day's Open", "ticker", "volume", 
>>   (to_number("low")+to_number("high"))/2 AS "AverageDailyPrice" 
>> from "tsco" 
>> where to_number("volume") > 0 and "high" != '-' 
>>   and to_date("Date",'dd-MMM-yy') > to_date('2015-10-06','yyyy-MM-dd') 
>> order by to_date("Date",'dd-MMM-yy') limit 1;
>> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
>> |  TRADEDATE  | Day's close  | Day's High  | Day's Low  | Day's Open  | ticker  |  volume   | AverageDailyPrice  |
>> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
>> | 2015-10-07  | 197.00       | 198.05      | 184.84     | 192.20      | TSCO    | 30046994  | 191.445            |
>> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
>> 
>> HTH
>> 
>> Dr Mich Talebzadeh
>>  
>> 
>> On 8 October 2016 at 19:05, Felix Cheung <felixcheun...@hotmail.com> wrote:
>> Great, then I think those packages, as Spark data sources, should allow you 
>> to do exactly that (replace org.apache.spark.sql.jdbc with an HBASE one; a 
>> sketch follows).
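>> 
>> For instance, a hypothetical variant of your CREATE TABLE against the 
>> hbase-spark module's data source (table and column names are invented, and 
>> hbase-site.xml is assumed to be on the classpath):
>> 
>> CREATE TABLE hbase_dim
>> USING org.apache.hadoop.hbase.spark
>> OPTIONS (
>>   hbase.table "dim_table",
>>   hbase.columns.mapping "id STRING :key, name STRING cf:name"
>> );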
>> 
>> I do think it will be great to get more examples around this though. Would 
>> be great if you could share your experience with this!
>> 
>> 
>> _____________________________
>> From: Benjamin Kim <bbuil...@gmail.com>
>> Sent: Saturday, October 8, 2016 11:00 AM
>> Subject: Re: Spark SQL Thriftserver with HBase
>> To: Felix Cheung <felixcheun...@hotmail.com>
>> Cc: <user@spark.apache.org>
>> 
>> 
>> Felix,
>> 
>> My goal is to use the Spark SQL JDBC Thriftserver to access HBase tables 
>> using just SQL. In the past, I have been able to CREATE tables using a 
>> statement like the one below:
>> 
>> CREATE TABLE <table-name>
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>>   url 
>> "jdbc:postgresql://<hostname>:<port>/dm?user=<username>&password=<password>",
>>   dbtable "dim.dimension_acamp"
>> );
>> 
>> After doing this, I can access the PostgreSQL table through the Spark SQL 
>> JDBC Thriftserver with SQL statements (SELECT, UPDATE, INSERT, etc.). I want 
>> to do the same with HBase tables. We tried this using Hive and HiveServer2, 
>> but the response times are just too long.
>> 
>> Thanks,
>> Ben
>> 
>> 
>> On Oct 8, 2016, at 10:53 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>> 
>> Ben,
>> 
>> I'm not sure I'm following completely.
>> 
>> Is your goal to use Spark to create or access tables in HBASE? If so, the 
>> link below and several packages out there support that by providing an HBASE 
>> data source for Spark, and that link also has some examples of what the 
>> Spark code looks like. On that note, you should also be able to use the 
>> HBASE data source from a pure SQL (Spark SQL) query, which should work in 
>> the case of the Spark SQL JDBC Thrift Server (with USING; see 
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10). 
>> A rough sketch of the Spark-code side is below.
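>> 
>> (A sketch only, assuming the hbase-spark module with hbase-site.xml on the 
>> classpath; the table and column names are invented:)
>> 
>> // Expose an HBase table as a DataFrame, then as a temp table that can be
>> // queried over the Thriftserver with plain SQL.
>> val df = sqlContext.read
>>   .format("org.apache.hadoop.hbase.spark")
>>   .options(Map(
>>     "hbase.table" -> "dim_table",
>>     "hbase.columns.mapping" -> "id STRING :key, name STRING cf:name"))
>>   .load()
>> df.registerTempTable("dim_table")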
>> 
>> 
>> _____________________________
>> From: Benjamin Kim <bbuil...@gmail.com>
>> Sent: Saturday, October 8, 2016 10:40 AM
>> Subject: Re: Spark SQL Thriftserver with HBase
>> To: Felix Cheung <felixcheun...@hotmail.com>
>> Cc: <user@spark.apache.org>
>> 
>> 
>> Felix,
>> 
>> The only alternative I see is to create a stored procedure (a UDF, in 
>> database terms) that would run Spark Scala code underneath. That way, I can 
>> use the Spark SQL JDBC Thriftserver to execute it from SQL, passing the keys 
>> and values I want to UPSERT. I wonder if this is possible, since I cannot 
>> CREATE a wrapper table on top of an HBase table in Spark SQL? Something like 
>> the sketch below is what I have in mind.
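>> 
>> (Untested, all names invented; a real version would reuse the HBase 
>> connection rather than open one per call:)
>> 
>> import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
>> import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
>> import org.apache.hadoop.hbase.util.Bytes
>> 
>> // A side-effecting UDF that upserts one cell into HBase; it could then
>> // be invoked over the Thriftserver as:
>> //   SELECT hbase_upsert('rowkey1', 'value1');
>> sqlContext.udf.register("hbase_upsert", (key: String, value: String) => {
>>   val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
>>   try {
>>     val table = conn.getTable(TableName.valueOf("some_table"))
>>     table.put(new Put(Bytes.toBytes(key))
>>       .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("v"), Bytes.toBytes(value)))
>>     table.close()
>>     true
>>   } finally conn.close()
>> })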
>> 
>> What do you think? Is this the right approach?
>> 
>> Thanks,
>> Ben
>> 
>> On Oct 8, 2016, at 10:33 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>> 
>> HBase has released support for Spark
>> hbase.apache.org/book.html#spark
>> 
>> And if you search you should find several alternative approaches.
>> 
>> 
>> 
>> 
>> 
>> On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <bbuil...@gmail.com> wrote:
>> 
>> Does anyone know if Spark can work with HBase tables using Spark SQL? I know 
>> in Hive we are able to create tables on top of an underlying HBase table 
>> that can be accessed using MapReduce jobs. Can the same be done using 
>> HiveContext or SQLContext? We are trying to set up a way to GET and POST 
>> data to and from the HBase table using the Spark SQL JDBC Thriftserver from 
>> our RESTful API endpoints and/or HTTP web farms. If we can get this to work, 
>> then we can load balance the Thriftservers. In addition, this will give us a 
>> way to abstract the data storage layer away from the presentation layer 
>> code. There is a chance that we will swap out the data storage technology in 
>> the future. We are currently experimenting with Kudu.
>> 
>> Thanks,
>> Ben
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> 
>> 
> 
> 
