Mich,

Unfortunately, we are moving away from Hive and unifying on Spark, using CDH 5.8 as our distro. Also, Tableau has released a Spark ODBC/JDBC driver. I will either try the Phoenix JDBC Server for HBase or push to move faster to Kudu with Impala. We will use Impala as the JDBC layer in between until the Kudu team completes Spark SQL support for JDBC.
Thanks for the advice.

Cheers,
Ben

> On Oct 8, 2016, at 12:35 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Sure. But essentially you are looking at batch data for analytics for your Tableau users, so Hive may be a better choice with its rich SQL and its existing ODBC/JDBC connectivity to Tableau.
>
> I would go for Hive, especially as the new release will have an in-memory offering for frequently accessed data :)
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
>
> On 8 October 2016 at 20:15, Benjamin Kim <bbuil...@gmail.com> wrote:
> Mich,
>
> First and foremost, we have visualization servers that run Tableau for external user reports. Second, we have servers that are ad servers and REST endpoints for cookie sync and segmentation data exchange. These will use JDBC directly within the same data-center. When not colocated in the same data-center, they will connect to a colocated database server using JDBC. Either way, using JDBC everywhere simplifies and unifies the code around the JDBC industry standard.
>
> Does this make sense?
>
> Thanks,
> Ben
>
>> On Oct 8, 2016, at 11:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Like any other design, what is your presentation layer and who are your end users?
>>
>> Are they SQL-centric users from a Tableau background, or might they use Spark functional programming?
>>
>> It is best to describe the use case.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>> On 8 October 2016 at 19:40, Felix Cheung <felixcheun...@hotmail.com> wrote:
>> I wouldn't be too surprised if Spark SQL - JDBC data source - Phoenix JDBC server - HBase would work better.
>>
>> Without naming specifics, there are at least 4 or 5 different implementations of HBase sources, each at a varying level of development and with different requirements (HBase release version, Kerberos support, etc.).
>>
>> _____________________________
>> From: Benjamin Kim <bbuil...@gmail.com>
>> Sent: Saturday, October 8, 2016 11:26 AM
>> Subject: Re: Spark SQL Thriftserver with HBase
>> To: Mich Talebzadeh <mich.talebza...@gmail.com>
>> Cc: <user@spark.apache.org>, Felix Cheung <felixcheun...@hotmail.com>
>>
>> Mich,
>>
>> Are you talking about the Phoenix JDBC Server? If so, I forgot about that alternative.
>>
>> Thanks,
>> Ben
>>
>> On Oct 8, 2016, at 11:21 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> I don't think it will work.
>>
>> You can use Phoenix on top of HBase:
>>
>> hbase(main):336:0> scan 'tsco', 'LIMIT' => 1
>> ROW                COLUMN+CELL
>>  TSCO-1-Apr-08     column=stock_daily:Date, timestamp=1475866783376, value=1-Apr-08
>>  TSCO-1-Apr-08     column=stock_daily:close, timestamp=1475866783376, value=405.25
>>  TSCO-1-Apr-08     column=stock_daily:high, timestamp=1475866783376, value=406.75
>>  TSCO-1-Apr-08     column=stock_daily:low, timestamp=1475866783376, value=379.25
>>  TSCO-1-Apr-08     column=stock_daily:open, timestamp=1475866783376, value=380.00
>>  TSCO-1-Apr-08     column=stock_daily:stock, timestamp=1475866783376, value=TESCO PLC
>>  TSCO-1-Apr-08     column=stock_daily:ticker, timestamp=1475866783376, value=TSCO
>>  TSCO-1-Apr-08     column=stock_daily:volume, timestamp=1475866783376, value=49664486
>>
>> And the same data through Phoenix on top of the HBase table:
>>
>> 0: jdbc:phoenix:thin:url=http://rhes564:8765> select substr(to_char(to_date("Date",'dd-MMM-yy')),1,10) AS TradeDate, "close" AS "Day's close", "high" AS "Day's High", "low" AS "Day's Low", "open" AS "Day's Open", "ticker", "volume", (to_number("low")+to_number("high"))/2 AS "AverageDailyPrice" from "tsco" where to_number("volume") > 0 and "high" != '-' and to_date("Date",'dd-MMM-yy') > to_date('2015-10-06','yyyy-MM-dd') order by to_date("Date",'dd-MMM-yy') limit 1;
>> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
>> |  TRADEDATE  | Day's close  | Day's High  | Day's Low  | Day's Open  | ticker  |  volume   | AverageDailyPrice  |
>> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
>> | 2015-10-07  | 197.00       | 198.05      | 184.84     | 192.20      | TSCO    | 30046994  | 191.445            |
>> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
>>
>> HTH
>>
>> Dr Mich Talebzadeh
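To tie Felix's suggestion to Mich's example: the Phoenix table above could, in principle, be registered in the Spark SQL Thriftserver through the generic JDBC data source, in the same way Ben does for PostgreSQL further down the thread. This is only a rough, untested sketch; the thin-client driver class, the exact JDBC URL options, and whether the lowercase "tsco" name resolves correctly in Phoenix are assumptions that depend on the Phoenix version and table definition.

CREATE TABLE tsco_via_phoenix
USING org.apache.spark.sql.jdbc
OPTIONS (
  -- Phoenix Query Server (thin client); the thin-client jar must be on the
  -- Thriftserver classpath, and some versions also need a serialization option
  url "jdbc:phoenix:thin:url=http://rhes564:8765",
  driver "org.apache.phoenix.queryserver.client.Driver",
  dbtable "tsco"
);

-- Any JDBC/ODBC client (e.g. Tableau) connected to the Thriftserver could then run:
SELECT * FROM tsco_via_phoenix LIMIT 10;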
>>
>> On 8 October 2016 at 19:05, Felix Cheung <felixcheun...@hotmail.com> wrote:
>> Great, then I think those packages as Spark data sources should allow you to do exactly that (replace org.apache.spark.sql.jdbc with an HBase one).
>>
>> I do think it would be great to get more examples around this, though. Would be great if you could share your experience with this!
>>
>> _____________________________
>> From: Benjamin Kim <bbuil...@gmail.com>
>> Sent: Saturday, October 8, 2016 11:00 AM
>> Subject: Re: Spark SQL Thriftserver with HBase
>> To: Felix Cheung <felixcheun...@hotmail.com>
>> Cc: <user@spark.apache.org>
>>
>> Felix,
>>
>> My goal is to use the Spark SQL JDBC Thriftserver to access HBase tables using just SQL. I have been able to CREATE tables using this statement below in the past:
>>
>> CREATE TABLE <table-name>
>> USING org.apache.spark.sql.jdbc
>> OPTIONS (
>>   url "jdbc:postgresql://<hostname>:<port>/dm?user=<username>&password=<password>",
>>   dbtable "dim.dimension_acamp"
>> );
>>
>> After doing this, I can access the PostgreSQL table through the Spark SQL JDBC Thriftserver using SQL statements (SELECT, UPDATE, INSERT, etc.). I want to do the same with HBase tables. We tried this using Hive and HiveServer2, but the response times are just too long.
>>
>> Thanks,
>> Ben
>>
>> On Oct 8, 2016, at 10:53 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>> Ben,
>>
>> I'm not sure I'm following completely.
>>
>> Is your goal to use Spark to create or access tables in HBase? If so, the link below and several packages out there support that by providing an HBase data source for Spark. There are some examples of what the Spark code looks like in that link as well. On that note, you should also be able to use the HBase data source from a pure SQL (Spark SQL) query, which should work in the case of the Spark SQL JDBC Thrift Server (with USING, see http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10).
>>
>> _____________________________
>> From: Benjamin Kim <bbuil...@gmail.com>
>> Sent: Saturday, October 8, 2016 10:40 AM
>> Subject: Re: Spark SQL Thriftserver with HBase
>> To: Felix Cheung <felixcheun...@hotmail.com>
>> Cc: <user@spark.apache.org>
>>
>> Felix,
>>
>> The only alternative way is to create a stored procedure (UDF), in database terms, that would run Spark Scala code underneath. That way, I could use the Spark SQL JDBC Thriftserver to execute it with SQL, passing the keys and values I want to UPSERT. I wonder if this is possible, since I cannot CREATE a wrapper table on top of an HBase table in Spark SQL.
>>
>> What do you think? Is this the right approach?
>>
>> Thanks,
>> Ben
>>
>> On Oct 8, 2016, at 10:33 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>
>> HBase has released support for Spark: hbase.apache.org/book.html#spark
>>
>> And if you search, you should find several alternative approaches.
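For what Felix describes (swapping org.apache.spark.sql.jdbc for an HBase data source), the DDL might look roughly like the sketch below. It assumes one particular connector style, the catalog-JSON approach used by the shc package; the data source class name, the "catalog" option, and the JSON layout are assumptions that differ between the HBase connector implementations mentioned above, and the connector jar has to be on the Thriftserver classpath.

CREATE TABLE stock_daily_hbase
USING org.apache.spark.sql.execution.datasources.hbase
OPTIONS (
  -- hypothetical catalog mapping the HBase table 'tsco' and a few of its
  -- stock_daily columns; the exact JSON schema is connector-specific
  catalog '{"table":{"namespace":"default","name":"tsco"},"rowkey":"key",
            "columns":{"rowkey":{"cf":"rowkey","col":"key","type":"string"},
                       "close":{"cf":"stock_daily","col":"close","type":"string"},
                       "volume":{"cf":"stock_daily","col":"volume","type":"string"}}}'
);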
>>
>> On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <bbuil...@gmail.com> wrote:
>>
>> Does anyone know if Spark can work with HBase tables using Spark SQL? I know that in Hive we are able to create tables on top of an underlying HBase table, which can then be accessed using MapReduce jobs. Can the same be done using HiveContext or SQLContext? We are trying to set up a way to GET and POST data to and from an HBase table using the Spark SQL JDBC Thriftserver from our RESTful API endpoints and/or HTTP web farms. If we can get this to work, then we can load-balance the Thriftservers. In addition, this will give us a way to abstract the data storage layer away from the presentation-layer code. There is a chance that we will swap out the data storage technology in the future; we are currently experimenting with Kudu.
>>
>> Thanks,
>> Ben
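For completeness, the Hive-over-HBase approach referred to in this first message is normally set up with the Hive HBase storage handler. A minimal sketch follows, with the Hive table name made up and the HBase table/column-family names borrowed from Mich's example purely for illustration; the hive-hbase-handler and HBase jars must be on the classpath.

-- Hive external table backed by the HBase table 'tsco'
CREATE EXTERNAL TABLE hbase_stock_daily (
  rowkey STRING,
  close  STRING,
  volume STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,stock_daily:close,stock_daily:volume")
TBLPROPERTIES ("hbase.table.name" = "tsco");

Once the table is in the shared Hive metastore, a HiveContext (or a Thriftserver built with Hive support) can query it like any other Hive table, e.g. SELECT * FROM hbase_stock_daily LIMIT 10, although, as noted earlier in the thread, going through Hive/HiveServer2 was too slow for this use case.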