Hi all,
  The first step of a cube merge fails with this error:

   java.lang.RuntimeException: Too big dictionary, dictionary cannot be bigger than 2GB
       at org.apache.kylin.dict.TrieDictionaryBuilder.buildTrieBytes(TrieDictionaryBuilder.java:421)
       at org.apache.kylin.dict.TrieDictionaryBuilder.build(TrieDictionaryBuilder.java:408)
       at org.apache.kylin.dict.DictionaryGenerator$StringDictBuilder.build(DictionaryGenerator.java:165)
       at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:81)
       at org.apache.kylin.dict.DictionaryGenerator.buildDictionary(DictionaryGenerator.java:73)
       at org.apache.kylin.dict.DictionaryGenerator.mergeDictionaries(DictionaryGenerator.java:102)
       at org.apache.kylin.dict.DictionaryManager.mergeDictionary(DictionaryManager.java:268)
       at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.mergeDictionaries(MergeDictionaryStep.java:145)
       at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.makeDictForNewSegment(MergeDictionaryStep.java:135)
       at org.apache.kylin.engine.mr.steps.MergeDictionaryStep.doWork(MergeDictionaryStep.java:67)
       at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
       at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57)
       at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:113)
       at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:136)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)


     Column "SALE_ORD_ID" cardinality: 157,644,463
     Measure: SALE, COUNT_DISTINCT, Value: SALE_ORD_ID, Type: column, bitmap

I'm wondering: does this mean that high-cardinality columns cannot support an
accurate (precise) count_distinct measure?
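A rough back-of-envelope check (not from the original post; the average key
length of 15 bytes is an assumption) suggests why the dictionary blows past
the limit: at this cardinality even the raw key bytes alone approach the 2 GB
ceiling that a Java byte array (and Kylin's serialized TrieDictionary) can hold,
before any per-node overhead is added.

```java
public class DictSizeEstimate {
    public static void main(String[] args) {
        long cardinality = 157_644_463L;   // cardinality of SALE_ORD_ID from the post
        int avgKeyBytes = 15;              // ASSUMED average encoded key length
        long rawBytes = cardinality * avgKeyBytes;
        long limit = 2L * 1024 * 1024 * 1024;  // 2 GB serialized-dictionary ceiling

        // Even before trie node overhead, the raw key bytes already
        // exceed the 2 GB limit under this assumption.
        System.out.println("estimated raw key bytes: " + rawBytes);
        System.out.println("exceeds 2 GB limit: " + (rawBytes > limit));
    }
}
```

Under that assumption the estimate lands above 2 GB, which matches the
"Too big dictionary" failure in the merge step; a trie shares key prefixes but
its serialized form still cannot exceed a single 2 GB byte array.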


