Interesting. Is there any other English documentation about its purpose and architecture?
On Mon, Aug 12, 2013 at 2:25 AM, 子落 <[email protected]> wrote:
> Its address is https://github.com/alibaba/mdrill. I think some of its
> information or design may be helpful for Apache Drill development.
>
> mdrill is similar to Apache Drill and Google PowerDrill. It is based on
> Hadoop, Lucene, Solr, and JStorm.
>
> My current project has 10 tables, 47760506482 rows, and 80 to 400 columns.
> It runs on 10 machines (per machine: 48 GB RAM, 12 x 2 TB disks).
>
> Some example queries are below:
>
> select count(*) from r_rpt_cps_luna_item where thedate >='20130416' and thedate <'20130811' limit 0,100
> _____
> totalRecords:1
>
> count(*)
> 11108914892
>
> times taken 4.031 seconds
>
> select sum(landing_uv) from r_rpt_cps_luna_item where thedate >='20130416' and thedate <'20130811' limit 0,100
> _____
> totalRecords:1
>
> sum(landing_uv)
> 2.07678497E8
>
> times taken 56.081 seconds
>
> select dist(user_id) from r_rpt_cps_luna_item where thedate >='20130416' and thedate <'20130811' limit 0,100
> _____
> totalRecords:1
>
> dist(user_id)
> 1483008.0
>
> times taken 246.147 seconds
>
> select thedate,count(*) as cnt from r_rpt_cps_luna_item where thedate >='20130416' and thedate <'20130811' group by thedate order by cnt desc limit 0,3
> _____
> totalRecords:118
>
> thedate   cnt
> 20130803  158301304
> 20130802  157748487
> 20130725  157047045
>
> times taken 34.727 seconds
>
> select thedate,user_id,count(*) as cnt from r_rpt_cps_luna_item where thedate >='20130416' and thedate <'20130811' group by thedate,user_id order by cnt desc limit 0,3
> _____
> totalRecords:10010
>
> thedate   user_id    cnt
> 20130725  725677994  194397
> 20130725  101450072  192650
> 20130701  101450072  189107
>
> times taken 149.316 seconds
>
> select thedate,category_level1,count(*) as cnt from r_rpt_cps_luna_item where thedate >='20130416' and thedate <'20130811' group by thedate,category_level1 order by cnt desc limit 0,3
> _____
> totalRecords:10010
>
> thedate   category_level1  cnt
> 20130803  16               26487658
> 20130802  16               26306163
> 20130725  16               26128576
>
> times taken 94.989 seconds
>
> select thedate,category_level1,category_level2,count(*) as cnt from r_rpt_cps_luna_item where thedate >='20130416' and thedate <'20130811' group by thedate,category_level1,category_level2 order by cnt desc limit 0,3
> _____
> totalRecords:10010
>
> thedate   category_level1  category_level2  cnt
> 20130725  16               50010850         7315606
> 20130803  16               50010850         7006255
> 20130802  16               50010850         6936059
>
> times taken 288.885 seconds
>
> Introduction (translated from Chinese):
> 1: mdrill aims to help users analyze data at the scale of tens of billions
> of rows, over any combination of dimensions, within a few seconds to a few
> tens of seconds.
> 2: mdrill is a distributed online analytical query system, implemented on
> top of open-source systems such as Hadoop, Lucene, Solr, and JStorm, with an
> SQL-based query syntax. mdrill is a software framework for distributed
> processing of large amounts of data. It is fast and high-performance: its
> lower layers use indexes, columnar storage, and in-memory caching, which
> greatly increase data scan speed. mdrill is distributed and works in
> parallel, using parallel processing to speed up queries.
> 3: The adhoc project built on mdrill uses 10 machines and stores 40 billion
> rows of data.
> ==> Each query scans about 3 billion rows, with response times of roughly
> 20 to 120 seconds (depending on the query conditions and the number of
> columns scanned).
> ==> count(*) over 10 billion rows takes 2 seconds; a single-column sum takes
> 25 seconds; grouping by date to compute count and sum takes 47 seconds;
> grouping by user ID and sorting by number of transactions to take the TopN
> takes 243 seconds.
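The description above (columnar storage plus parallel, distributed scanning) can be illustrated with a small scatter-gather sketch of the "group by thedate order by cnt desc limit 0,3" query. This is a minimal sketch under stated assumptions, not mdrill's actual implementation: the in-memory shard layout, the names `example_shards`, `scan_shard`, and `group_by_count_topn`, and the use of threads in place of separate machines are all hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical columnar shards: each column is stored as its own list, so a
# query that only needs "thedate" never touches the other columns.
example_shards = [
    {"thedate": ["20130801", "20130802", "20130802"], "user_id": [11, 12, 13]},
    {"thedate": ["20130802", "20130801", "20130901"], "user_id": [14, 15, 16]},
]


def scan_shard(shard, lo, hi):
    """Partial count per thedate for one shard, for lo <= thedate < hi."""
    counts = Counter()
    for d in shard["thedate"]:  # read only the column the query needs
        if lo <= d < hi:  # YYYYMMDD strings compare in date order
            counts[d] += 1
    return counts


def group_by_count_topn(shards, lo, hi, n):
    """Scatter the scan to every shard, then gather and merge partial counts.

    Threads stand in for separate machines here; in CPython they will not
    speed up this CPU-bound loop, they only show the scatter-gather shape.
    """
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: scan_shard(s, lo, hi), shards))
    total = Counter()
    for partial in partials:
        total.update(partial)  # merge step: add per-shard counts together
    return total.most_common(n)  # order by cnt desc limit n


print(group_by_count_topn(example_shards, "20130416", "20130811", 2))
# [('20130802', 3), ('20130801', 2)]
```

Only the small per-shard partial counts travel back to the merge step, which is why this kind of design scales with the number of machines rather than with the size of the raw data.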
