Cool. It extends the Kylin scenario into real-time query.

With warm regards,
Billy Liu

ShaoFeng Shi <[email protected]> wrote on Fri, Nov 2, 2018 at 7:17 PM:
> Hi Gang, I appreciate your hard work!
>
> Ma Gang <[email protected]> wrote on Thu, Nov 1, 2018 at 3:29 PM:
>> Hi ShaoFeng,
>> For streaming ingest/query performance, there is a doc:
>> https://drive.google.com/file/d/1GSBMpRuVQRmr8Ev2BWvssfMd-Rck9vsH/view?ths=true
>> It is also in the 'performance' section of the design doc attached to the
>> JIRA: https://issues.apache.org/jira/browse/KYLIN-3654
>> For stability, it is very stable in our environment, but currently it is
>> not widely used in eBay, so it is hard to say.
>> I will start to merge the code to the master branch. It may take some time
>> because our current version is Kylin 2.1.0. I hope it can be done before
>> Nov. 30, but I cannot guarantee it; there is a lot of other work to do.
>>
>> At 2018-11-01 15:08:12, "ShaoFeng Shi" <[email protected]> wrote:
>>> Hi Gang,
>>>
>>> Thank you for the information, that is helpful for understanding the
>>> overall design and implementation.
>>>
>>> Do you have some statistical information, like performance, throughput,
>>> stability, etc.? Besides, what's the plan for contributing it to the
>>> community? Thanks!
>>>
>>> Ma Gang <[email protected]> wrote on Thu, Nov 1, 2018 at 2:45 PM:
>>>> Thanks Xiaoxiang,
>>>> Very good questions! Please see my comments starting with [Gang]:
>>>>
>>>> 1. Is it possible to use Yarn as the cluster manager for index tasks?
>>>> The coordinator process would set them up at a specified period.
>>>> [Gang] I think it is possible, but in the current design the indexing
>>>> task is a long-running task that can also provide query service. This
>>>> makes the whole system very simple and efficient; I don't think we need
>>>> to stop/start the indexing task from time to time. But using Yarn to
>>>> manage resources is possible; we need to redesign the existing
>>>> coordinator to make it easy to deploy to Yarn, Kubernetes, etc.
>>>> I hope this can be done after the contribution to the community.
>>>>
>>>> 2. As I know, eBay's New Kylin Streaming Solution uses a replica set to
>>>> ensure that incoming messages won't be lost if some processes are lost.
>>>> I think a replica set is a set of Kafka consumer processes which are
>>>> responsible for ingesting messages and building the base cuboid in
>>>> memory. Could you please show me some detail about how the replica set
>>>> provides the HA guarantee? How do I configure it? A link / paper is OK.
>>>> I found one, but I don't know if it has the same meaning as your replica
>>>> set.
>>>>
>>>> [Gang] Yes, it is similar to MongoDB replication, but currently we don't
>>>> replicate data from a primary node. We just assign the same Kafka
>>>> topic/partitions to the receivers in a ReplicaSet, and all receivers in
>>>> a ReplicaSet consume data from Kafka. If one receiver is down, the other
>>>> receivers in the ReplicaSet are still consuming the same Kafka data, so
>>>> consuming/querying will not be impacted. We don't guarantee that the
>>>> receivers in a ReplicaSet have the same consuming rate, but we can
>>>> guarantee that the user views data consistently by sticking the queries
>>>> for one cube to one receiver.
>>>> The HA implementation is a little bit naive, but it is simple and it
>>>> works. Maybe in the future we can do HA by replication, to support other
>>>> streaming sources that don't support multiple consumers and don't have a
>>>> persistent store.
>>>>
>>>> 3. How do I add or remove a node of a replica set in a production env?
>>>> How do I monitor the health/pressure of the replica set cluster?
>>>> [Gang] Currently we have a UI/RESTful API to let admins add/remove nodes
>>>> to/from a ReplicaSet, and a simple UI to let admins monitor the health
>>>> and consuming rate of each receiver/cube.
>>>> Also, all metrics are collected using the Yammer metrics framework, so
>>>> it is easy to expose them to other monitoring systems.
>>>>
>>>> 4. Are all measures supported in eBay's New Kylin Streaming Solution?
>>>> What about count distinct (bitmap)?
>>>> [Gang] Most measures are supported, but precise count distinct (bitmap)
>>>> is not supported when the distinct dimension is not of int type. As you
>>>> know, supporting precise count distinct for a non-int dimension requires
>>>> building a global dictionary, which is not possible in the streaming env.
>>>>
>>>> 5. It seems eBay's New Kylin Streaming Solution uses a custom columnar
>>>> storage. Why not use a mature open-source columnar storage solution?
>>>> Have you ever compared the performance of your custom columnar storage
>>>> with an open-source columnar storage solution?
>>>>
>>>> [Gang] Most open-source columnar formats, like Parquet and ORC, are
>>>> designed for use in a Hadoop env, whereas the streaming data is on local
>>>> disk, so I didn't consider them at the beginning. It is not very hard to
>>>> define a columnar format to store Kylin-specific data. With a customized
>>>> columnar storage, you can use mmap files to scan data and add a
>>>> row-level inverted index for all dimensions, so I think the performance
>>>> will be better than with a common columnar format. I didn't compare the
>>>> performance, but the storage engine is pluggable; you may contribute a
>>>> Parquet storage if you are interested.
>>>>
>>>> At 2018-11-01 12:42:25, "Xiaoxiang Yu" <[email protected]> wrote:
>>>>> Hi Gang, I am so glad to know that eBay has a solution for real-time
>>>>> OLAP on Kylin. I have some small questions:
>>>>>
>>>>> 1. Is it possible to use Yarn as the cluster manager for index tasks?
>>>>> The coordinator process would set them up at a specified period.
>>>>> Yarn will manage:
>>>>>
>>>>> a) retrying these tasks if some fail
>>>>>
>>>>> b) resource allocation
>>>>>
>>>>> c) log collection
>>>>>
>>>>> 2. As I know, eBay's New Kylin Streaming Solution uses a replica set to
>>>>> ensure that incoming messages won't be lost if some processes are lost.
>>>>> I think a replica set is a set of Kafka consumer processes which are
>>>>> responsible for ingesting messages and building the base cuboid in
>>>>> memory. Could you please show me some detail about how the replica set
>>>>> provides the HA guarantee? How do I configure it? A link / paper is OK.
>>>>> I found one, but I don't know if it has the same meaning as your
>>>>> replica set.
>>>>>
>>>>> a) [MongoDB replication](https://docs.mongodb.com/manual/replication/).
>>>>>
>>>>> 3. How do I add or remove a node of a replica set in a production env?
>>>>> How do I monitor the health/pressure of the replica set cluster?
>>>>>
>>>>> 4. Are all measures supported in eBay's New Kylin Streaming Solution?
>>>>> What about count distinct (bitmap)?
>>>>>
>>>>> 5. It seems eBay's New Kylin Streaming Solution uses a custom columnar
>>>>> storage. Why not use a mature open-source columnar storage solution?
>>>>> Have you ever compared the performance of your custom columnar storage
>>>>> with an open-source columnar storage solution?
>>>>>
>>>>> ----------------
>>>>> Best wishes,
>>>>> Xiaoxiang Yu
>>>>>
>>>>> From: Ma Gang <[email protected]>
>>>>> Reply-To: "[email protected]" <[email protected]>
>>>>> Date: Tuesday, October 30, 2018 15:24
>>>>> To: "[email protected]" <[email protected]>
>>>>> Subject: [DISCUSS] New Kylin Streaming Solution From eBay
>>>>>
>>>>> Hi all,
>>>>>
>>>>> The eBay Kylin team has developed a new Kylin streaming solution. The
>>>>> basic idea is to build a streaming cluster to ingest data from a
>>>>> streaming source (Kafka) and provide queries over real-time data. The
>>>>> data preparation latency is milliseconds, which means the data is
>>>>> queryable almost as soon as it is ingested. Attached is the
>>>>> architecture design doc.
>>>>> We would like to contribute the feature to the community; please let
>>>>> us know if you have any concerns.
>>>>>
>>>>> Thanks,
>>>>> Gang (Allen) Ma
>>>
>>> --
>>> Best regards,
>>>
>>> Shaofeng Shi 史少锋
>
> --
> Best regards,
>
> Shaofeng Shi 史少锋
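
For readers of this thread, here is a minimal sketch of the ReplicaSet behaviour Gang describes in his answer to question 2: every receiver in a ReplicaSet is assigned the same Kafka topic/partitions, and queries for a given cube stick to one receiver. Everything in it is an assumption made for illustration, not Kylin code: the class and method names (Receiver, ReplicaSet, assign, route) are hypothetical, and the use of rendezvous hashing to pick the sticky receiver is my own choice, since the thread does not say how the receiver is chosen.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Receiver:
    """Hypothetical stand-in for one streaming receiver process."""
    name: str
    alive: bool = True
    partitions: tuple = ()


def _score(cube: str, receiver_name: str) -> str:
    # Deterministic per (cube, receiver) score used for rendezvous hashing.
    return hashlib.sha256(f"{cube}:{receiver_name}".encode()).hexdigest()


@dataclass
class ReplicaSet:
    """Hypothetical model of a ReplicaSet as described in the thread."""
    receivers: list

    def assign(self, partitions):
        # No primary/secondary replication: every receiver is given the
        # same Kafka partitions and consumes them independently.
        for receiver in self.receivers:
            receiver.partitions = tuple(partitions)

    def route(self, cube: str) -> Receiver:
        # Sticky routing: among the live receivers, pick the one with the
        # highest score for this cube. The choice only changes when the
        # chosen receiver itself goes down.
        live = [r for r in self.receivers if r.alive]
        if not live:
            raise RuntimeError("no live receiver in this ReplicaSet")
        return max(live, key=lambda r: _score(cube, r.name))


# Example usage with made-up receiver, partition, and cube names.
rs = ReplicaSet([Receiver("receiver-1"), Receiver("receiver-2")])
rs.assign(["events-0", "events-1"])
chosen = rs.route("sales_cube")
```

Because receivers may consume at different rates, pinning a cube to one receiver is what keeps repeated queries from seeing data move backwards; the sketch only models that pinning and the failover to another receiver, not the consumption itself.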
