BTW, a tool I have been using to preaggregate data with HyperLogLog in
combination with Spark is AtScale (http://atscale.com/). It builds the
aggregations and takes advantage of the speed of Spark SQL, all within the
context of a model that is accessible from Tableau or Qlik.
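
A minimal sketch of the preaggregation idea, in plain Python rather than
Spark, and not AtScale's actual implementation: one HyperLogLog sketch is
built per group at load time, and sketches are merged at query time to
estimate distinct counts across groups without rescanning the raw rows. The
class, the preaggregate() helper, the register count (p=12) and the sample
events are all assumptions made up for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not tuned for production)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from the first 8 bytes of a SHA-1 digest.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Sketches are mergeable: take the register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


def preaggregate(events, p=12):
    """Build one sketch per group key at load time; events yields (key, user_id)."""
    sketches = {}
    for key, user in events:
        sketches.setdefault(key, HyperLogLog(p)).add(user)
    return sketches


# Hypothetical sample data: two days with overlapping users (10000 distinct overall).
events = [("2015-03-25", f"user{i}") for i in range(0, 6000)]
events += [("2015-03-26", f"user{i}") for i in range(4000, 10000)]

daily = preaggregate(events)
both = daily["2015-03-25"].merge(daily["2015-03-26"])
print(round(both.count()))  # approximate distinct users across both days
```

The point of the design is that only the small per-day sketches need to be
stored and merged, so a BI query over any date range can be answered from the
aggregates. With 4096 registers the estimate is typically within a few
percent of the true count.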

On Thu, Mar 26, 2015 at 8:55 AM Jörn Franke <jornfra...@gmail.com> wrote:

> As I wrote previously, indexing is not your only choice. You can
> preaggregate data during load or, depending on your needs, consider
> other data structures such as graphs, HyperLogLog, Bloom filters,
> etc. (these are a challenge to integrate into standard BI tools).
> On 26 March 2015 at 13:34, "kundan kumar" <iitr.kun...@gmail.com> wrote:
>
>> I was looking for some options and came across JethroData.
>>
>> http://www.jethrodata.com/
>>
>> It stores the data while maintaining indexes over all the columns. It seems
>> good and claims to have better performance than Impala.
>>
>> Earlier I had tried Apache Phoenix because of its secondary indexing
>> feature. But the major challenge I faced there was that secondary indexing
>> was not supported for the bulk loading process.
>> Only the sequential loading process supported secondary indexes, and it
>> took longer.
>>
>>
>> Any comments on this?
>>
>>
>>
>>
>> On Thu, Mar 26, 2015 at 5:59 PM, kundan kumar <iitr.kun...@gmail.com>
>> wrote:
>>
>>> I was looking for some options and came across
>>>
>>> http://www.jethrodata.com/
>>>
>>> On Thu, Mar 26, 2015 at 5:47 PM, Jörn Franke <jornfra...@gmail.com>
>>> wrote:
>>>
>>>> You can also preaggregate results for the queries your users run.
>>>> Depending on which queries they use, this might be necessary with any
>>>> underlying technology.
>>>> On 26 March 2015 at 11:27, "kundan kumar" <iitr.kun...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I need to store terabytes of data which will be used by BI tools like
>>>>> QlikView.
>>>>>
>>>>> The queries can filter on any column.
>>>>>
>>>>> Currently, we are using Redshift for this purpose.
>>>>>
>>>>> I am trying to explore alternatives to Redshift.
>>>>>
>>>>> Is it possible to get better performance with Spark than with
>>>>> Redshift?
>>>>>
>>>>> If yes, please suggest the best way to achieve this.
>>>>>
>>>>>
>>>>> Thanks!!
>>>>> Kundan
>>>>>
>>>>
>>>
>>
