Hi Santosh,

Kylin supports most of your requirements, with the following limitations/notes:
- Streaming cubing is still at the experimental stage. KYLIN-1726 <https://issues.apache.org/jira/browse/KYLIN-1726#> will improve real-time analysis significantly, but it may take a few months to complete.
- Kylin does not store raw data by default. There is a RAW measure <http://kylin.apache.org/blog/2016/05/29/raw-measure-in-kylin/> that can store raw data to some extent, but it also has volume limitations.
- Query speed will depend largely on your dimension cardinality (not the data volume) and on whether the cube is well defined and optimized for your queries.
- Finally, the capacity of your cluster always plays an important part.

Kylin uses micro-batches to build a streaming cube. For example, a small job is kicked off every 5 minutes and builds the cube incrementally from the input of the last 5 minutes. The job currently runs on a single node, which is not very scalable. KYLIN-1726 will solve this problem.

Kylin supports all kinds of SQL queries as long as they are within the defined data model.

Cheers
Yang

On Sat, Jul 2, 2016 at 5:55 PM, Santosh Akhilesh <[email protected]> wrote:
> Hi All,
> Last year I did a PoC for one of our products using Kylin. Our
> distributed architecture journey was on hold for some time, but now we are
> back again to re-architect our system to be distributed. I am writing this mail
> to understand whether and how Kylin can fit our requirements.
> Let me give some background on our requirements.
> Ours is a network performance management solution which needs to handle the
> following scenarios.
>
> 1. Collect data from network elements at a granularity of between 30 seconds
> and 5 minutes. Every period we collect around 150 million KPIs, which are
> distributed across different service types. The service types are model
> driven and can change over time.
> 2. The data we collect needs to be available for ad hoc and OLAP-type queries
> ASAP. For example, data collected between 10:00 and 10:05 for a 5-minute period
> should be available for reports to query by 10:06. Queries will involve
> joining performance data with inventory data, and will also have filters such as
> querying data for Area = Area1. We also need to sort by KPI or by a property of
> inventory with an ORDER BY clause.
> 3. We also need OLAP-type queries such as group by area, province, country,
> etc., applying sum, max, min, and avg aggregators. We also need to
> generate a Top Talkers report, which means we need a Top N function.
> 4. There will be background machine learning jobs which need to scan raw
> and aggregated data.
> 5. We will be generating around 5-10 TB of data every day, and maybe more in
> future. We need to retain data for several days to months, depending on the
> aggregation period.
> 6. Ad hoc and OLAP queries from reports should take < 2 seconds.
>
> So my questions are:
>
> 1. Which of these use cases can Kylin support?
> 2. How long does cube building take, and how does it handle data that is
> appended every 30 seconds or 5 minutes?
> 3. Can Kylin support both ad hoc queries and OLAP queries?
>
> I have several other questions, but I would like to start the discussion
> with these.
> We plan to start a test with Kylin next week; I am just setting up a
> cluster now. We don't plan to use the Cloudera or Hortonworks sandbox, as our
> company has its own sandbox.
>
> I would appreciate a response from Kylin experts.
>
> Regards
> Santosh
>
>
> Sent from my iPhone
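As a rough illustration of the micro-batch scheme described in the reply (a small job every 5 minutes that builds from the previous 5 minutes of input), here is a minimal Python sketch of how collection timestamps map to batch windows. It is not Kylin code: the names `window_for`, `windows_between`, and `ready_time` are hypothetical, and it assumes fixed 5-minute windows aligned to the Unix epoch plus an assumed build duration.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple

# Hypothetical sketch of micro-batch windowing; NOT Kylin's scheduler code.
# Assumes fixed-size windows aligned to the Unix epoch.
BATCH = timedelta(minutes=5)  # assumed interval, matching the "every 5 minutes" example
EPOCH = datetime(1970, 1, 1)

Window = Tuple[datetime, datetime]


def window_for(ts: datetime, batch: timedelta = BATCH) -> Window:
    """Return the [start, end) micro-batch window that contains ts."""
    n = (ts - EPOCH) // batch  # whole batches elapsed since the epoch (an int)
    start = EPOCH + n * batch
    return start, start + batch


def windows_between(start: datetime, end: datetime,
                    batch: timedelta = BATCH) -> Iterator[Window]:
    """Yield each window an incremental builder would process, one small job per window."""
    cur, _ = window_for(start, batch)
    while cur < end:
        yield cur, cur + batch
        cur += batch


def ready_time(ts: datetime, build_time: timedelta,
               batch: timedelta = BATCH) -> datetime:
    """Earliest time data stamped ts is queryable: its window closes, then the job runs."""
    _, end = window_for(ts, batch)
    return end + build_time


if __name__ == "__main__":
    # A KPI sample collected at 10:03:20 falls into the 10:00-10:05 window.
    print(window_for(datetime(2016, 7, 2, 10, 3, 20)))
    # With an assumed 1-minute build, it becomes queryable at 10:06.
    print(ready_time(datetime(2016, 7, 2, 10, 3, 20), timedelta(minutes=1)))
```

Under this simplified model, a sample's latency is the time remaining until its window closes plus the build duration, so meeting the 10:06 target in requirement 2 would need each 5-minute incremental build to finish within about a minute.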
