Interesting. Is there any other English documentation about its purpose
and architecture?


On Mon, Aug 12, 2013 at 2:25 AM, 子落 <[email protected]> wrote:

> Its address is https://github.com/alibaba/mdrill. I think some of the
> information or design may be helpful for Apache Drill development.
>
>
>
> mdrill is similar to Apache Drill or Google PowerDrill, and it is based on
> Hadoop, Lucene, Solr, and JStorm.
>
>
>
> My project currently has 10 tables, 47760506482 rows, and 80~400 columns
> (running on 10 machines; per machine: 48 GB RAM, 12 x 2 TB disks).
>
>
>
> Some example queries are shown below:
>
>
>
> select count(*) from r_rpt_cps_luna_item where thedate >='20130416' and
> thedate <'20130811' limit 0,100
>
> totalRecords: 1
>
> count(*)
> 11108914892
>
> Time taken: 4.031 seconds
>
>
>
>
>
> select sum(landing_uv) from r_rpt_cps_luna_item where thedate >='20130416'
> and  thedate <'20130811' limit 0,100
>
> totalRecords: 1
>
> sum(landing_uv)
> 2.07678497E8
>
> Time taken: 56.081 seconds
>
>
>
> select dist(user_id) from r_rpt_cps_luna_item where thedate >='20130416'
> and thedate <'20130811' limit 0,100
>
> totalRecords: 1
>
> dist(user_id)
> 1483008.0
>
> Time taken: 246.147 seconds
>
>
>
> select thedate,count(*) as cnt from r_rpt_cps_luna_item where
> thedate >='20130416' and  thedate <'20130811' group by thedate
> order by cnt desc limit 0,3
>
> totalRecords: 118
>
> thedate   cnt
> 20130803  158301304
> 20130802  157748487
> 20130725  157047045
>
> Time taken: 34.727 seconds
>
>
>
> select thedate,user_id,count(*) as cnt from r_rpt_cps_luna_item where
> thedate >='20130416' and  thedate <'20130811' group by thedate,user_id
> order by cnt desc limit 0,3
>
> totalRecords: 10010
>
> thedate   user_id    cnt
> 20130725  725677994  194397
> 20130725  101450072  192650
> 20130701  101450072  189107
>
> Time taken: 149.316 seconds
>
>
>
> select thedate,category_level1,count(*) as cnt from r_rpt_cps_luna_item
> where thedate >='20130416' and  thedate <'20130811' group by
> thedate,category_level1 order by cnt desc limit 0,3
>
> totalRecords: 10010
>
> thedate   category_level1  cnt
> 20130803  16               26487658
> 20130802  16               26306163
> 20130725  16               26128576
>
> Time taken: 94.989 seconds
>
>
>
> select thedate,category_level1,category_level2,count(*) as cnt from
> r_rpt_cps_luna_item where thedate >='20130416' and  thedate <'20130811'
> group by thedate,category_level1,category_level2 order by cnt desc limit
> 0,3
>
> totalRecords: 10010
>
> thedate   category_level1  category_level2  cnt
> 20130725  16               50010850         7315606
> 20130803  16               50010850         7006255
> 20130802  16               50010850         6936059
>
> Time taken: 288.885 seconds
>
>
>
>
>
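> Chinese introduction (translated):
> 1: mdrill aims to help users analyze data on the scale of ten billion
> rows, across arbitrary combinations of dimensions, within a few seconds
> to a few tens of seconds.
> 2: mdrill is a distributed online analytical query system. It is built on
> open-source systems such as Hadoop, Lucene, Solr, and JStorm, and uses a
> SQL-based query syntax. mdrill is a software framework for distributed
> processing of large volumes of data. mdrill is fast: its underlying layer
> uses indexes, columnar storage, and in-memory caching, which greatly
> increases data scan speed. mdrill is distributed: it works in parallel and
> uses parallel processing to speed up queries.
> 3: The adhoc project, an application built on mdrill, uses 10 machines and
> stores 40 billion rows.
>   ==> Each query scans 3 billion rows, with a response time of roughly
> 20~120 seconds (depending on the query conditions and the number of
> columns scanned).
>   ==> On 10 billion rows, count(*) takes 2 seconds, a single-column sum
> takes 25 seconds, count and sum grouped by date take 47 seconds, and
> grouping by user id with a top-N sorted by number of transactions takes
> 243 seconds.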
>
>
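
For a rough sense of scale, the reported timings can be turned into effective
rows per second. This is only a back-of-the-envelope sketch: it assumes every
query covers the same 11108914892 rows returned by count(*) for the shared date
range, which the message does not state (an index-backed count may not scan
rows at all). The names ROWS_MATCHED, REPORTED_SECONDS, rows_per_second, and
summarize are made up for this illustration and are not part of mdrill.

```python
# Figures copied from the quoted results; nothing here is measured independently.
ROWS_MATCHED = 11_108_914_892   # count(*) result for the shared date range
MACHINES = 10                   # cluster size stated in the message

# Reported wall-clock time per query, in seconds.
REPORTED_SECONDS = {
    "count(*)": 4.031,
    "sum(landing_uv)": 56.081,
    "dist(user_id)": 246.147,
    "group by thedate": 34.727,
    "group by thedate,user_id": 149.316,
    "group by thedate,category_level1": 94.989,
    "group by thedate,category_level1,category_level2": 288.885,
}


def rows_per_second(rows: int, seconds: float) -> float:
    """Effective rows handled per second for one query."""
    return rows / seconds


def summarize(rows: int = ROWS_MATCHED, machines: int = MACHINES) -> dict:
    """Map each query to (cluster rows/s, per-machine rows/s).

    Assumption: each query touches `rows` rows, which is a guess based on
    the shared WHERE clause, not something the message confirms.
    """
    return {
        query: (rows_per_second(rows, secs), rows_per_second(rows, secs) / machines)
        for query, secs in REPORTED_SECONDS.items()
    }


if __name__ == "__main__":
    for query, (cluster, per_machine) in summarize().items():
        print(f"{query:50s} {cluster:12.3e} rows/s  {per_machine:12.3e} rows/s/machine")
```

Under that assumption, count(*) works out to a little under 2.8 billion rows
per second across the cluster, while the slower group-by queries are one to two
orders of magnitude below that.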
