Please also keep in mind that Tableau Server can store data in-memory and refresh that in-memory data only when needed. This means you can import data from any source and let your users work only on the in-memory data in Tableau Server.
On Sun, Oct 9, 2016 at 9:22 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> Cloudera 5.8 has a very old version of Hive without Tez, but Mich already
> provided a good alternative. However, you should check whether it contains
> a recent version of HBase and Phoenix. That being said, I wonder what the
> dataflow, data model and planned analysis are. Maybe completely different
> solutions are possible. In particular, single inserts, upserts, etc.
> should be avoided as much as possible in the Big Data (analysis) world
> with any technology, because they do not perform well.
>
> Hive with LLAP will provide an in-memory cache for interactive analytics.
> You can also put full tables in-memory with Hive using the Ignite HDFS
> in-memory solution. All of this only makes sense if you do not use MR as
> the engine, and if you use the right input format (ORC, Parquet) and a
> recent Hive version.
>
> On 8 Oct 2016, at 21:55, Benjamin Kim <bbuil...@gmail.com> wrote:
>
> Mich,
>
> Unfortunately, we are moving away from Hive and unifying on Spark using
> CDH 5.8 as our distro. And Tableau released a Spark ODBC/JDBC driver too.
> I will either try the Phoenix JDBC Server for HBase or push to move
> faster to Kudu with Impala. We will use Impala as the JDBC in-between
> until the Kudu team completes Spark SQL support for JDBC.
>
> Thanks for the advice.
>
> Cheers,
> Ben
>
>
> On Oct 8, 2016, at 12:35 PM, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> Sure. But essentially you are looking at batch data for analytics for
> your Tableau users, so Hive may be a better choice, with its rich SQL and
> its existing ODBC/JDBC connection to Tableau.
>
> I would go for Hive, especially since the new release will have an
> in-memory offering for frequently accessed data as well :)
>
>
> Dr Mich Talebzadeh
>
>
> LinkedIn *
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising
> from such loss, damage or destruction.
>
>
>
> On 8 October 2016 at 20:15, Benjamin Kim <bbuil...@gmail.com> wrote:
>
>> Mich,
>>
>> First and foremost, we have visualization servers that run Tableau for
>> external user reports. Second, we have servers that act as ad servers
>> and REST endpoints for cookie sync and segmentation data exchange. These
>> will use JDBC directly within the same data-center. When not colocated
>> in the same data-center, they will connect to a colocated database
>> server using JDBC. Either way, using JDBC everywhere simplifies and
>> unifies the code on the JDBC industry standard.
>>
>> Does this make sense?
>>
>> Thanks,
>> Ben
>>
>>
>> On Oct 8, 2016, at 11:47 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>> Like any other design: what are your presentation layer and end users?
>>
>> Are they SQL-centric users from a Tableau background, or might they use
>> Spark functional programming?
>>
>> It is best to describe the use case.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> On 8 October 2016 at 19:40, Felix Cheung <felixcheun...@hotmail.com>
>> wrote:
>>
>>> I wouldn't be too surprised if Spark SQL - JDBC data source - Phoenix
>>> JDBC server - HBASE would work better.
>>>
>>> Without naming specifics, there are at least 4 or 5 different
>>> implementations of HBASE sources, each at a varying level of
>>> development and with different requirements (HBASE release version,
>>> Kerberos support, etc.).
>>>
>>>
>>> _____________________________
>>> From: Benjamin Kim <bbuil...@gmail.com>
>>> Sent: Saturday, October 8, 2016 11:26 AM
>>> Subject: Re: Spark SQL Thriftserver with HBase
>>> To: Mich Talebzadeh <mich.talebza...@gmail.com>
>>> Cc: <user@spark.apache.org>, Felix Cheung <felixcheun...@hotmail.com>
>>>
>>>
>>>
>>> Mich,
>>>
>>> Are you talking about the Phoenix JDBC Server? If so, I forgot about
>>> that alternative.
>>>
>>> Thanks,
>>> Ben
>>>
>>>
>>> On Oct 8, 2016, at 11:21 AM, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> I don't think it will work.
>>>
>>> You can use Phoenix on top of HBase:
>>>
>>> hbase(main):336:0> scan 'tsco', 'LIMIT' => 1
>>> ROW            COLUMN+CELL
>>> TSCO-1-Apr-08  column=stock_daily:Date,   timestamp=1475866783376, value=1-Apr-08
>>> TSCO-1-Apr-08  column=stock_daily:close,  timestamp=1475866783376, value=405.25
>>> TSCO-1-Apr-08  column=stock_daily:high,   timestamp=1475866783376, value=406.75
>>> TSCO-1-Apr-08  column=stock_daily:low,    timestamp=1475866783376, value=379.25
>>> TSCO-1-Apr-08  column=stock_daily:open,   timestamp=1475866783376, value=380.00
>>> TSCO-1-Apr-08  column=stock_daily:stock,  timestamp=1475866783376, value=TESCO PLC
>>> TSCO-1-Apr-08  column=stock_daily:ticker, timestamp=1475866783376, value=TSCO
>>> TSCO-1-Apr-08  column=stock_daily:volume, timestamp=1475866783376, value=49664486
>>>
>>> And the same with Phoenix on top of the HBase table:
>>>
>>> 0: jdbc:phoenix:thin:url=http://rhes564:8765> select
>>> substr(to_char(to_date("Date",'dd-MMM-yy')),1,10) AS TradeDate,
>>> "close" AS "Day's close", "high" AS "Day's High", "low" AS "Day's Low",
>>> "open" AS "Day's Open", "ticker", "volume",
>>> (to_number("low")+to_number("high"))/2 AS "AverageDailyPrice"
>>> from "tsco"
>>> where to_number("volume") > 0 and "high" != '-'
>>> and to_date("Date",'dd-MMM-yy') > to_date('2015-10-06','yyyy-MM-dd')
>>> order by to_date("Date",'dd-MMM-yy') limit 1;
>>>
>>> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
>>> |  TRADEDATE  | Day's close  | Day's High  | Day's Low  | Day's Open  | ticker  |  volume   | AverageDailyPrice  |
>>> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
>>> | 2015-10-07  | 197.00       | 198.05      | 184.84     | 192.20      | TSCO    | 30046994  | 191.445            |
>>> +-------------+--------------+-------------+------------+-------------+---------+-----------+--------------------+
>>>
>>> HTH
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> On 8 October 2016 at 19:05, Felix Cheung <felixcheun...@hotmail.com>
>>> wrote:
>>>
>>>> Great, then I think those packages as a Spark data source should allow
>>>> you to do exactly that (replace org.apache.spark.sql.jdbc with the
>>>> HBASE one).
>>>>
>>>> I do think it would be great to get more examples around this, though.
>>>> It would be great if you could share your experience with this!
>>>>
>>>>
>>>> _____________________________
>>>> From: Benjamin Kim <bbuil...@gmail.com>
>>>> Sent: Saturday, October 8, 2016 11:00 AM
>>>> Subject: Re: Spark SQL Thriftserver with HBase
>>>> To: Felix Cheung <felixcheun...@hotmail.com>
>>>> Cc: <user@spark.apache.org>
>>>>
>>>>
>>>> Felix,
>>>>
>>>> My goal is to use the Spark SQL JDBC Thriftserver to access HBase
>>>> tables using just SQL. I have been able to CREATE tables using this
>>>> statement below in the past:
>>>>
>>>> CREATE TABLE <table-name>
>>>> USING org.apache.spark.sql.jdbc
>>>> OPTIONS (
>>>>   url "jdbc:postgresql://<hostname>:<port>/dm?user=<username>&password=<password>",
>>>>   dbtable "dim.dimension_acamp"
>>>> );
>>>>
>>>> After doing this, I can access the PostgreSQL table through the Spark
>>>> SQL JDBC Thriftserver using SQL statements (SELECT, UPDATE, INSERT,
>>>> etc.). I want to do the same with HBase tables. We tried this using
>>>> Hive and HiveServer2, but the response times are just too long.
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>>
>>>> On Oct 8, 2016, at 10:53 AM, Felix Cheung <felixcheun...@hotmail.com>
>>>> wrote:
>>>>
>>>> Ben,
>>>>
>>>> I'm not sure I'm following completely.
>>>>
>>>> Is your goal to use Spark to create or access tables in HBASE? If so,
>>>> the link below and several packages out there support that by
>>>> providing an HBASE data source for Spark. There are some examples of
>>>> what the Spark code looks like in that link as well. On that note, you
>>>> should also be able to use the HBASE data source from a pure SQL
>>>> (Spark SQL) query, which should work in the case of the Spark SQL JDBC
>>>> Thrift Server (with USING,
>>>> http://spark.apache.org/docs/latest/sql-programming-guide.html#tab_sql_10).
>>>>
>>>>
>>>> _____________________________
>>>> From: Benjamin Kim <bbuil...@gmail.com>
>>>> Sent: Saturday, October 8, 2016 10:40 AM
>>>> Subject: Re: Spark SQL Thriftserver with HBase
>>>> To: Felix Cheung <felixcheun...@hotmail.com>
>>>> Cc: <user@spark.apache.org>
>>>>
>>>>
>>>> Felix,
>>>>
>>>> The only alternative way is to create a stored procedure (UDF), in
>>>> database terms, that would run Spark Scala code underneath. That way,
>>>> I could use the Spark SQL JDBC Thriftserver to execute it using SQL
>>>> code, passing the key/values I want to UPSERT. I wonder if this is
>>>> possible, since I cannot CREATE a wrapper table on top of an HBase
>>>> table in Spark SQL.
>>>>
>>>> What do you think? Is this the right approach?
>>>>
>>>> Thanks,
>>>> Ben
>>>>
>>>> On Oct 8, 2016, at 10:33 AM, Felix Cheung <felixcheun...@hotmail.com>
>>>> wrote:
>>>>
>>>> HBase has released support for Spark:
>>>> hbase.apache.org/book.html#spark
>>>>
>>>> And if you search you should find several alternative approaches.
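For concreteness, an HBase analogue of Ben's PostgreSQL `CREATE TABLE ... USING` statement above might look like the sketch below. This is an assumption, not something demonstrated in the thread: it presumes a third-party HBase connector such as the Hortonworks shc package is on the Thriftserver's classpath, and the `catalog` column mapping for the `tsco` table is illustrative only.

```sql
-- Sketch only (assumes the shc HBase data source is available).
-- The catalog JSON maps the HBase row key and the stock_daily
-- column family, seen in Mich's scan output, to SQL columns.
CREATE TABLE tsco_hbase
USING org.apache.spark.sql.execution.datasources.hbase
OPTIONS (
  catalog '{
    "table":{"namespace":"default", "name":"tsco"},
    "rowkey":"key",
    "columns":{
      "rowkey":{"cf":"rowkey",      "col":"key",    "type":"string"},
      "close": {"cf":"stock_daily", "col":"close",  "type":"string"},
      "high":  {"cf":"stock_daily", "col":"high",   "type":"string"},
      "low":   {"cf":"stock_daily", "col":"low",    "type":"string"},
      "volume":{"cf":"stock_daily", "col":"volume", "type":"string"}
    }
  }'
);
```

Once registered, the table should be queryable with plain SELECTs through the Thriftserver, which is exactly the access pattern Ben describes; whether UPDATE/UPSERT works depends on the connector, per Felix's caveat about the varying maturity of HBase sources.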
>>>>
>>>>
>>>> On Fri, Oct 7, 2016 at 7:56 AM -0700, "Benjamin Kim" <
>>>> bbuil...@gmail.com> wrote:
>>>>
>>>> Does anyone know if Spark can work with HBase tables using Spark SQL?
>>>> I know that in Hive we are able to create tables on top of an
>>>> underlying HBase table that can be accessed using MapReduce jobs. Can
>>>> the same be done using HiveContext or SQLContext? We are trying to set
>>>> up a way to GET and POST data to and from the HBase table using the
>>>> Spark SQL JDBC thriftserver from our RESTful API endpoints and/or HTTP
>>>> web farms. If we can get this to work, then we can load balance the
>>>> thriftservers. In addition, this will benefit us by giving us a way to
>>>> abstract the data storage layer away from the presentation layer code.
>>>> There is a chance that we will swap out the data storage technology in
>>>> the future. We are currently experimenting with Kudu.
>>>>
>>>> Thanks,
>>>> Ben
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
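The route Felix suggests earlier (Spark SQL → generic JDBC data source → Phoenix Query Server → HBase) could be wired up roughly as below. The host and port are taken from Mich's Phoenix example in this thread; the thin-client driver class is Phoenix's standard one, but the exact URL parameters vary by Phoenix version, so treat the whole statement as a sketch to verify.

```sql
-- Sketch: register the Phoenix "tsco" table behind the Spark SQL
-- Thriftserver via the generic JDBC data source. Requires the Phoenix
-- thin-client jar on the Thriftserver classpath.
CREATE TABLE tsco_phoenix
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:phoenix:thin:url=http://rhes564:8765;serialization=PROTOBUF",
  driver "org.apache.phoenix.queryserver.client.Driver",
  dbtable "tsco"
);

-- Any Thriftserver client can then read it:
SELECT * FROM tsco_phoenix LIMIT 10;
```

Clients such as Tableau, or `beeline -u jdbc:hive2://<thriftserver-host>:10000`, would then see `tsco_phoenix` as an ordinary table, which matches the load-balanced, storage-agnostic JDBC layer Ben is after. Note that the generic JDBC source is read-oriented; UPSERTs would still have to go to Phoenix directly.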