Handling Big data for interactive BI tools

2015-03-26 Thread kundan kumar
Hi, I need to store terabytes of data that will be used by BI tools like QlikView. Queries can filter on any column. Currently, we are using Redshift for this purpose, and I am trying to explore alternatives to Redshift. Is it possible to gain better performance in

Re: Handling Big data for interactive BI tools

2015-03-26 Thread Akhil Das
Yes, you can easily configure the Spark Thrift server and connect BI tools to it. Here's an example showing how to integrate Spark SQL with Tableau dashboards: https://hadoopi.wordpress.com/2014/12/31/spark-connect-tableau-desktop-to-sparksql/ Thanks, Best Regards

Re: Handling Big data for interactive BI tools

2015-03-26 Thread Jörn Franke
You can also pre-aggregate the results of the users' queries. Depending on which queries they run, this might be necessary with any underlying technology.
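A minimal sketch of load-time pre-aggregation, in Python purely for illustration: the table, its column names, and the choice of (region, product) as the rollup grain are all hypothetical. The idea is that the BI tool queries a small, already-summed table instead of scanning the raw rows on every dashboard refresh.

```python
from collections import defaultdict

# Hypothetical raw fact rows: (region, product, day, revenue).
RAW_ROWS = [
    ("EU", "widget", "2015-03-25", 120.0),
    ("EU", "widget", "2015-03-26", 80.0),
    ("EU", "gadget", "2015-03-26", 50.0),
    ("US", "widget", "2015-03-26", 200.0),
]


def preaggregate(rows):
    """Roll raw rows up to the (region, product) grain once, at load time."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for region, product, _day, revenue in rows:
        key = (region, product)
        totals[key] += revenue
        counts[key] += 1
    return {k: {"revenue": totals[k], "rows": counts[k]} for k in totals}


def revenue_by_region(agg, region):
    """Answer a dashboard query from the small aggregate, not the raw rows."""
    return sum(v["revenue"] for (r, _p), v in agg.items() if r == region)


AGG = preaggregate(RAW_ROWS)
print(revenue_by_region(AGG, "EU"))  # 250.0
```

The trade-off is that only queries answerable at the chosen grain benefit: a filter on `day`, for instance, would need a different aggregate or the raw data, which is why the right aggregates depend on what queries the users actually run.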

Re: Handling Big data for interactive BI tools

2015-03-26 Thread kundan kumar
I was looking for some options and came across http://www.jethrodata.com/

Re: Handling Big data for interactive BI tools

2015-03-26 Thread kundan kumar
I was looking for some options and came across JethroData (http://www.jethrodata.com/). It stores the data while maintaining indexes over all the columns, which seems good, and it claims better performance than Impala. Earlier I had tried Apache Phoenix because of its secondary indexing feature. But the
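To illustrate why indexing every column helps when queries can filter on any column, here is a toy in-memory inverted index in Python. It is a sketch of the general technique under simplifying assumptions (equality filters only, everything in memory), not how JethroData or Apache Phoenix actually store their indexes; the table and column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical table: each row maps column name -> value.
ROWS = [
    {"country": "IN", "device": "mobile", "plan": "free"},
    {"country": "US", "device": "desktop", "plan": "paid"},
    {"country": "IN", "device": "desktop", "plan": "paid"},
    {"country": "IN", "device": "mobile", "plan": "paid"},
]


def build_indexes(rows):
    """Build one inverted index per column: value -> set of row ids."""
    indexes = defaultdict(lambda: defaultdict(set))
    for row_id, row in enumerate(rows):
        for column, value in row.items():
            indexes[column][value].add(row_id)
    return indexes


def filter_rows(indexes, **predicates):
    """Return sorted ids of rows matching every column=value predicate."""
    if not predicates:
        raise ValueError("at least one predicate is required")
    matches = None
    for column, value in predicates.items():
        ids = indexes[column].get(value, set())
        # Intersect the posting sets instead of scanning every row.
        matches = set(ids) if matches is None else matches & ids
        if not matches:
            break
    return sorted(matches)


IDX = build_indexes(ROWS)
print(filter_rows(IDX, country="IN", plan="paid"))  # [2, 3]
```

Each filter becomes a lookup plus a set intersection rather than a full scan; the cost is extra storage and index maintenance on every load.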

Re: Handling Big data for interactive BI tools

2015-03-26 Thread Denny Lee
BTW, a tool I have been using to help pre-aggregate data with HyperLogLog in combination with Spark is AtScale (http://atscale.com/). It builds the aggregations and uses the speed of Spark SQL, all within the context of a model that is accessible from Tableau or Qlik.
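HyperLogLog fits pre-aggregation because distinct counts are not additive, but HLL sketches are mergeable: you can keep one small sketch per day (or per partition) and combine them for any range. Below is a toy Python version of the standard algorithm, purely illustrative and not AtScale's or Spark's implementation; the precision `p=12` and SHA-1 hashing are arbitrary assumptions, and the large-range correction is omitted.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog distinct-count sketch with 2**p registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        width = 64 - self.p
        idx = h >> width                 # first p bits select a register
        rest = h & ((1 << width) - 1)    # remaining 64-p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 if `rest` is all zeros).
        rank = width - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        """Sketch of the union: element-wise max of the registers."""
        if self.p != other.p:
            raise ValueError("sketches must share the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


# Hypothetical usage: one sketch per day, merged to count distinct users
# over both days without re-reading the raw events.
day1, day2 = HyperLogLog(), HyperLogLog()
for user in range(0, 10000):
    day1.add(user)
for user in range(5000, 15000):
    day2.add(user)
print(day1.merge(day2).count())
```

The merged estimate should land close to the true 15,000 distinct users (the standard error at `p=12` is roughly 1.6%), while each stored sketch is only 4,096 small registers regardless of how many rows it summarizes.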

Re: Handling Big data for interactive BI tools

2015-03-26 Thread Jörn Franke
As I wrote previously, indexing is not your only choice. You can pre-aggregate data during load or, depending on your needs, consider other data structures such as graphs, HyperLogLog, Bloom filters, etc. (these can be challenging to integrate into standard BI tools).
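As one example of these "other data structures", a Bloom filter kept per data block or partition lets a query skip blocks that definitely do not contain a filtered value. The toy Python sketch below uses assumed parameters (`m=8192` bits, `k=5` hash positions, SHA-256 double hashing) and a hypothetical partition-pruning scenario; it is not tied to any particular BI tool or storage engine.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: m bits, k hash positions per item."""

    def __init__(self, m=8192, k=5):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k positions from two 64-bit halves of SHA-256.
        digest = hashlib.sha256(str(item).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical partition holding 1,000 customer ids.
partition_filter = BloomFilter()
for customer_id in range(1000):
    partition_filter.add(f"cust-{customer_id}")

print(partition_filter.might_contain("cust-42"))  # True (no false negatives)
```

Because a Bloom filter only answers "definitely not" or "maybe", it prunes work but the underlying data must still be read to confirm a match. That split between a probabilistic pre-check and the real query is part of why such structures are awkward to expose directly in standard BI tools.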