Hi,
I need to store terabytes of data that will be used by BI tools such as
QlikView.
Queries may filter on any column.
Currently, we are using Redshift for this purpose.
I am exploring alternatives to Redshift.
Is it possible to gain better performance in
Yes, you can easily configure the Spark Thrift server and connect BI tools to it.
Here's an example
https://hadoopi.wordpress.com/2014/12/31/spark-connect-tableau-desktop-to-sparksql/
showing how to integrate SparkSQL with Tableau dashboards.
Thanks
Best Regards
On Thu, Mar 26, 2015 at 3:56 PM, kundan
You can also preaggregate results for the queries your users run. Depending
on what queries they use, this might be necessary with any underlying
technology.
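
To illustrate the preaggregation idea above, here is a minimal sketch in Python that is not tied to any particular engine. The sample rows, the column layout, and the `preaggregate` helper are all hypothetical; in practice the same grouping would run inside the warehouse or in Spark at load time.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, day, revenue)
rows = [
    ("EU", "widget", "2015-03-25", 120.0),
    ("EU", "widget", "2015-03-26", 80.0),
    ("US", "widget", "2015-03-26", 200.0),
    ("US", "gadget", "2015-03-26", 50.0),
]

REVENUE = 3  # index of the measure column in each row


def preaggregate(rows, key_columns):
    """Sum revenue per distinct combination of the chosen key columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        totals[key] += row[REVENUE]
    return dict(totals)


# Built once during load; dashboards then read these small tables
# instead of scanning the raw rows for every query.
revenue_by_region = preaggregate(rows, key_columns=(0,))
revenue_by_region_product = preaggregate(rows, key_columns=(0, 1))
```

The trade-off is that each aggregate only serves the groupings chosen up front, so this works best when the common dashboard queries are known.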
On 26 March 2015 at 11:27, kundan kumar iitr.kun...@gmail.com wrote:
Hi,
I need to store terabytes of data which will be used for BI tools
I was looking for some options and came across
http://www.jethrodata.com/
On Thu, Mar 26, 2015 at 5:47 PM, Jörn Franke jornfra...@gmail.com wrote:
You can also preaggregate results for the queries by the user - depending
on what queries they use this might be necessary for any underlying
I was looking for some options and came across JethroData.
http://www.jethrodata.com/
It stores the data while maintaining indexes over all columns, which seems
good, and it claims better performance than Impala.
Earlier I had tried Apache Phoenix because of its secondary indexing
feature. But the
BTW, a tool that I have been using to help preaggregate data with
HyperLogLog in combination with Spark is AtScale (http://atscale.com/).
It builds the aggregations and makes use of the speed of SparkSQL - all
within the context of a model that is accessible by Tableau or Qlik.
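
For readers unfamiliar with HyperLogLog, here is a rough Python sketch of why it suits preaggregation: distinct-count sketches built per partition can be merged later without revisiting the raw data. This is a simplified textbook version, not how AtScale or Spark implement it; the class name, the register count, and the SHA-256 hashing are assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog with 2**p registers (illustrative only)."""

    def __init__(self, p=10):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        width = 64 - self.p
        idx = h >> width                           # top p bits pick a register
        rest = h & ((1 << width) - 1)              # remaining bits
        rank = width - rest.bit_length() + 1       # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Union of two sketches: element-wise maximum of the registers."""
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting)
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical per-day sketches of user ids, merged for a two-day distinct count
day1, day2 = HyperLogLog(), HyperLogLog()
for user_id in range(0, 6000):
    day1.add(user_id)
for user_id in range(4000, 10000):
    day2.add(user_id)
day1.merge(day2)
approx_distinct_users = day1.estimate()
```

The merged sketch has the same registers as one built over both days directly, which is what lets a BI layer roll up stored sketches instead of recounting.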
On
As I wrote previously, indexing is not your only choice. You can
preaggregate data during load or, depending on your needs, consider
other data structures such as graphs, HyperLogLog, Bloom filters,
etc. (these are a challenge to integrate with standard BI tools).
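
As one illustration of these alternative structures, a Bloom filter answers "might this value be present?" in a small fixed amount of memory, so a query can skip data blocks that definitely do not contain a filter value. The sketch below is a minimal Python version; the class name, the bit and hash counts, and the SHA-256 hashing are illustrative assumptions rather than any product's implementation.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the hash with a seed
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical use: one filter per data block, checked before scanning it
block_filter = BloomFilter()
for customer in ("customer-1", "customer-42", "customer-99"):
    block_filter.add(customer)
```

A `might_contain` result of `False` is definitive, so the block can be skipped; `True` only means the block has to be scanned to find out.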
On 26 March 2015 at 13:34, kundan