Hi William,

Exactly! Your understanding is correct.
The community is currently developing the sort_columns feature, which lets users specify the columns that make up the MDK; the PR number is 635. I invite all of you to review the PR code.

Regards
Liang

2017-03-26 9:15 GMT+05:30 william <allwefant...@gmail.com>:

> 1. Dictionary encoding makes column storage more efficient, with smaller
> size and improved search performance.
> 2. When searching, the MDK and min-max indexes can be used to do
> block/blocklet pruning in order to reduce IO. For now, the MDK is composed
> of the dimensions in the order they are declared in the CREATE TABLE
> statement.
>
> On Thu, Mar 23, 2017 at 11:51 PM, Liang Chen <chenliang6...@gmail.com> wrote:
>
> > Hi
> >
> > 1. The system builds the MDK index on dimensions (string columns are
> > dimensions, numeric columns are measures), so you have to specify at
> > least one dimension (string column) for building the MDK index.
> >
> > 2. You can set a numeric column with DICTIONARY_INCLUDE or
> > DICTIONARY_EXCLUDE to build the MDK index.
> > For case 2, you can change the script like:
> > carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_INCLUDE'='a')");
> >
> > Regards
> > Liang
> >
> > 2017-03-23 18:39 GMT+05:30 Jin Zhou <eman1...@163.com>:
> >
> > > Exception info:
> > > scala> carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata'");
> > > org.apache.carbondata.spark.exception.MalformedCarbonCommandException:
> > > Table default.test can not be created without key columns.
> > > Please use DICTIONARY_INCLUDE or DICTIONARY_EXCLUDE to set at least
> > > one key column if all specified columns are numeric types
> > >   at org.apache.spark.sql.catalyst.CarbonDDLSqlParser.prepareTableModel(CarbonDDLSqlParser.scala:240)
> > >   at org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:162)
> > >   at org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:60)
> > >   at org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateTableContext.accept(SqlBaseParser.java:503)
> > >   at org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
> > >   at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
> > >   at org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
> > >   at org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93)
> > >   at org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:65)
> > >   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54)
> > >   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
> > >   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:82)
> > >   at org.apache.spark.sql.parser.CarbonSparkSqlParser.parse(CarbonSparkSqlParser.scala:56)
> > >   at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:53)
> > >   at org.apache.spark.sql.parser.CarbonSparkSqlParser.parsePlan(CarbonSparkSqlParser.scala:46)
> > >   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
> > >   ... 50 elided
> > >
> > > I didn't notice "if all specified columns are numeric types" in the
> > > exception info. So I did more tests and found the issue only occurs
> > > when all columns are numeric types.
> > >
> > > Below are the cases I tested:
> > >
> > > case 1:
> > > carbon.sql("create table if not exists test(a string, b string, c string) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_EXCLUDE'='a,b,c')");
> > > ====> ok, no dictionary column
> > >
> > > case 2:
> > > carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata'");
> > > ====> fail
> > >
> > > case 3:
> > > carbon.sql("create table if not exists test(a integer, b integer, c integer) STORED BY 'carbondata' TBLPROPERTIES ('DICTIONARY_INCLUDE'='a')");
> > > ====> ok, at least one dictionary column
> > >
> > > One small problem with case 2 is that there is no proper dictionary
> > > column when all columns have high cardinality.
> >
> > --
> > Regards
> > Liang
>
> --
> Best Regards
> _______________________________________________________________
> Broaden your horizons, focus on development
> WilliamZhu 祝海林 zh...@csdn.net
> Product Division - Infrastructure Platform - Search & Data Mining
> Mobile: 18601315052
> MSN: zhuhailin...@hotmail.com
> Weibo: @PrinceCharmingJ http://weibo.com/PrinceCharmingJ
> Address: 12/F, Tower B, Fuma Building, No. 33 Guangshun North Street, Chaoyang District, Beijing
> _______________________________________________________________
> http://www.csdn.net - the world's largest Chinese IT developer community
> http://www.iteye.net - a community for in-depth developer exchange

--
Regards
Liang
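[Editor's note] William's point 1 above (dictionary encoding shrinks column storage and speeds up search) can be sketched with a toy example. This is illustrative only, not CarbonData's actual implementation: each distinct string value is assigned a small integer surrogate key, so the column stores integers instead of repeated strings, and equality filters compare integers.

```python
# Toy dictionary encoding: assign each distinct value a surrogate key
# in first-seen order, then store the column as the list of keys.
column = ["beijing", "shanghai", "beijing", "beijing", "shanghai"]

dictionary = {}   # value -> surrogate key
encoded = []
for value in column:
    if value not in dictionary:
        dictionary[value] = len(dictionary)  # next unused key
    encoded.append(dictionary[value])

print(dictionary)  # {'beijing': 0, 'shanghai': 1}
print(encoded)     # [0, 1, 0, 0, 1]
```

A filter such as `city = 'beijing'` is then evaluated as a scan for the integer 0, which is both cheaper to compare and cheaper to read from disk than the repeated strings.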
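[Editor's note] The remark that the MDK is composed of the dimensions in the order they are declared in the CREATE TABLE statement can also be sketched. The column names and the tuple representation below are hypothetical, chosen only to illustrate the idea of a composite, sortable key built from dictionary codes:

```python
# Rows whose dimension values are already dictionary codes (integers).
rows = [
    {"country": 1, "city": 2},
    {"country": 0, "city": 3},
    {"country": 1, "city": 0},
]

# Order taken from the (hypothetical) CREATE TABLE declaration.
declared_order = ["country", "city"]

def mdk(row):
    # Compose the multi-dimensional key in declared column order.
    return tuple(row[d] for d in declared_order)

rows.sort(key=mdk)  # data is stored sorted by the composite key
print([mdk(r) for r in rows])  # [(0, 3), (1, 0), (1, 2)]
```

Because rows are laid out in MDK order, a predicate on a leading dimension maps to a contiguous key range, which is what makes block-level pruning effective.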
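[Editor's note] Finally, the min-max pruning mentioned in point 2 can be illustrated with a minimal sketch. The block layout and names below are made up for illustration; the point is only that a block whose [min, max] range cannot contain the filter value is skipped without any IO:

```python
# Per-block min/max statistics, as an index might record them.
blocks = [
    {"name": "block0", "min": 0,  "max": 30},
    {"name": "block1", "min": 31, "max": 60},
    {"name": "block2", "min": 61, "max": 100},
]

def blocks_to_scan(blocks, value):
    # Keep only blocks whose value range could contain `value`.
    return [b["name"] for b in blocks if b["min"] <= value <= b["max"]]

print(blocks_to_scan(blocks, 42))   # ['block1']
print(blocks_to_scan(blocks, 101))  # [] -> nothing needs to be read
```

For a filter like `col = 42`, only block1 is read; the other two blocks are pruned purely from their metadata, which is the IO reduction the thread describes.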