Re: Few Questions about Kylin Ability

Li Yang Thu, 07 Jul 2016 20:24:07 -0700

Hi Santosh

Kylin supports most of your requirements, with the following limitations/notes.


- Streaming cubing is still at an experimental stage. KYLIN-1726
<https://issues.apache.org/jira/browse/KYLIN-1726#> will improve real-time
analysis significantly, but it may take a few months to complete.
- Kylin does not store raw data by default. There is a RAW measure
<http://kylin.apache.org/blog/2016/05/29/raw-measure-in-kylin/> that can
store raw data to some extent, but it also has a volume limitation.
- Query speed will depend largely on your dimension cardinality (not
the data volume) and on whether the cube is well defined and optimized for
your queries.
- Finally the capacity of your cluster always plays an important part.

Kylin uses micro-batches to build a streaming cube. A small job is kicked
off every 5 minutes, for example, and builds the cube incrementally from the
input of the last 5 minutes. The job currently runs on a single node, which
is not very scalable. KYLIN-1726 will solve this problem.
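
To illustrate the micro-batch idea only (this is not Kylin code, and the
function names below are hypothetical), here is a minimal Python sketch
that assigns incoming events to fixed 5-minute windows, so that each
window's events could feed one small incremental build:

```python
from datetime import datetime, timedelta

# Assumed interval, taken from the "every 5 minutes" example above.
BATCH_INTERVAL = timedelta(minutes=5)
EPOCH = datetime(1970, 1, 1)


def batch_window(ts, interval=BATCH_INTERVAL):
    """Return the half-open [start, end) window that contains ts."""
    n = (ts - EPOCH) // interval  # whole intervals elapsed since the epoch
    start = EPOCH + n * interval
    return start, start + interval


def group_into_batches(events, interval=BATCH_INTERVAL):
    """Group (timestamp, value) pairs by window start time.

    Each group stands in for the input of one incremental build job.
    """
    batches = {}
    for ts, value in events:
        start, _ = batch_window(ts, interval)
        batches.setdefault(start, []).append(value)
    return dict(sorted(batches.items()))
```

With this windowing, an event stamped 10:04:59 falls into the 10:00 batch
and an event stamped 10:05:00 starts the next one.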

Kylin supports all kinds of SQL queries as long as they stay within the
defined data model.

Cheers
Yang

On Sat, Jul 2, 2016 at 5:55 PM, Santosh Akhilesh <[email protected]>
wrote:

> Hi All ,
> Last year I had done a PoC for one of our products using Kylin. Our
> distributed architecture journey was on hold for some time but now we are
> back again to rearchitect our system to distributed. I am writing this mail
> to understand how and whether Kylin can fit in to our requirements.
> Let me give background of our requirement.
> Ours is a network performance management solution which needs to handle
> following scenes.
>
> 1. Collect data from network elements in granularity between 30 sec to 5
> minute period. Every period we collect around 150Million KPIs Which are
> distributed across different service type. The service types are model
> driven and can change over period of time.
> 2. Data which we collect needs to available for Adhoc and OLAP type query
> ASAP. For example data collected between 10:00 and 10:05 for 5 mins period
> should be available for reports to fire query by 10:06. Query will involve
> joining performance data with inventory data and also have filters like
> query data for Area = Area1 and we also need sort by KPI or property of
> inventory with order by Clause
> 3. We also need OLAP type query like group by area , province , country
> etc... and needs to apply sum , max , min , avg aggregator. We also need to
> generate Top talkers report which means we need Top N function.
> 4. There will be background machine learning jobs which need to scan raw
> and aggregated data.
> 5. We would be generating around 5-10 TB of data every day and In future
> may be more.
> Now my questions are these. We need to retain data for several days and
> months based on aggregation period.
> 6. Adhoc and OLAP query from report should take < 2 seconds.
> So my questions are;
>
> 1. Which of the use cases Kylin can support?
> 2. How long cube building takes and how does it handle the data which will
> be appended every 30 sec or 5 minutes.
> 3. Can Kylin support both Adhoc query and OLAP query ?
>
> I have several other questions but I would like to initiate the discussion
> with these.
> We plan to start a test next week with Kylin I am just setting up a
> cluster now. We don't plan to use cloud era or Horton work sandbox as our
> company has its own sandbox.
>
> Appreciate response from Kylin experts.
>
> Regards
> Santosh
>
>
> Sent from my iPhone
