[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374678#comment-15374678 ]

wuyu commented on KYLIN-1834:
-----------------------------

I get the same exception when the lookup table is large (about 1,000,000+
rows) and the join field (the lookup table's id) is of integer type. In step
#3 (Step Name: Build Dimension Dictionary), it always fails with this
exception:
java.lang.IllegalArgumentException: Value not exists!
        at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
        at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
        at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
        at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
--------------------------------------------------------------------------------------------------------------------------
When I changed the join field's type to string and built again, the build
succeeded. Is there anything different between the integer and string types
in 'Build Dimension Dictionary'?

By the way, my fact table is order detail and my lookup table is deals
(merchandise), so the deals table has a large number of rows.



> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
> ----------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1834
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1834
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.2, v1.5.2.1
>            Reporter: Richard Calaba
>            Priority: Blocker
>         Attachments: job_2016_06_28_09_59_12-value-not-found.zip
>
>
> Getting exception in Step 4 - Build Dimension Dictionary:
> java.lang.IllegalArgumentException: Value not exists!
>       at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
>       at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
>       at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
>       at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
>       at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
>       at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
>       at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
>       at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
>       at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
>       at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>       at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
>       at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>       at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
>       at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
>       at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> result code:2
> The code which generates the exception is:
> org.apache.kylin.dimension.Dictionary.java:
>     /**
>      * A lower level API, return ID integer from raw value bytes. In case of not found
>      * <p>
>      * - if roundingFlag=0, throw IllegalArgumentException; <br>
>      * - if roundingFlag<0, the closest smaller ID integer if exist; <br>
>      * - if roundingFlag>0, the closest bigger ID integer if exist. <br>
>      * <p>
>      * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value).
>      *
>      * @throws IllegalArgumentException
>      *             if value is not found in dictionary and rounding is off;
>      *             or if rounding cannot find a smaller or bigger ID
>      */
>     final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
>         if (isNullByteForm(value, offset, len))
>             return nullId();
>         else {
>             int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
>             if (id < 0)
>                 throw new IllegalArgumentException("Value not exists!");
>             return id;
>         }
>     }
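To make the rounding contract in the Javadoc above concrete, here is a minimal sketch that applies the same three rules to a plain sorted array, where the ID of a value is simply its index. This is a hypothetical illustration (the class `SortedArrayDict` and its method `idOf` are invented here), not Kylin's `TrieDictionary`; it only mirrors the documented semantics, including throwing `IllegalArgumentException("Value not exists!")` when no ID can be returned.

```java
import java.util.Arrays;

// Hypothetical sketch: a "dictionary" over a sorted array of distinct strings,
// where each value's ID is its position in the array. Not Kylin code.
public class SortedArrayDict {
    private final String[] sorted;

    public SortedArrayDict(String... values) {
        this.sorted = values.clone();
        Arrays.sort(this.sorted);
    }

    // Mirrors the documented roundingFlag rules:
    //  0 -> exact match only, otherwise throw
    // <0 -> closest smaller ID if one exists
    // >0 -> closest bigger ID if one exists
    public int idOf(String value, int roundingFlag) {
        int pos = Arrays.binarySearch(sorted, value);
        int id;
        if (pos >= 0) {
            id = pos; // exact hit
        } else {
            int insertion = -pos - 1; // index of the first element greater than value
            if (roundingFlag < 0) {
                id = insertion - 1; // -1 when no smaller value exists
            } else if (roundingFlag > 0) {
                id = insertion < sorted.length ? insertion : -1; // -1 when no bigger value exists
            } else {
                id = -1; // rounding off: not found
            }
        }
        if (id < 0)
            throw new IllegalArgumentException("Value not exists!");
        return id;
    }

    public static void main(String[] args) {
        SortedArrayDict dict = new SortedArrayDict("10", "20", "30");
        System.out.println(dict.idOf("20", 0));  // 1 (exact match)
        System.out.println(dict.idOf("25", -1)); // 1 (rounds down to "20")
        System.out.println(dict.idOf("25", 1));  // 2 (rounds up to "30")
        try {
            dict.idOf("25", 0); // no exact match and rounding is off
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Value not exists!
        }
    }
}
```

Note that the snapshot path in the stack trace goes through `getIdFromValue`, which per the Javadoc is the cached counterpart of this lower-level call; the exception therefore means an exact lookup found no entry for some value.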
> ==========================================================
> The cube is big: the fact table has 110 million rows and the largest
> dimension (customer) has 10 million rows. I have increased the JVM -Xmx to
> 16 GB and set kylin.table.snapshot.max_mb=2048 in kylin.properties to make
> sure the cube build doesn't fail (previously we were getting an exception
> complaining about the 300 MB limit on the dimension dictionary size;
> approx. 700 MB was required).
> ==========================================================
> Before that, we were getting an exception about a dictionary encoding
> problem, "Too high cardinality is not suitable for dictionary -- cardinality: 10873977".
> We resolved this by changing the affected dimension/row key encoding from
> "dict" to "int; length=8" in the Advanced Settings of the cube.
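The switch from "dict" to "int; length=8" presumably avoids the cardinality limit because the value itself is written into a fixed number of bytes, so no dictionary has to be built for that column. The sketch below shows a generic fixed-width, big-endian encoding of a `long` into 8 bytes with `java.nio.ByteBuffer`. It is an illustration of the principle under that assumption, not Kylin's integer encoding; the class name `FixedWidthIntEncoding` is hypothetical and Kylin's actual byte layout may differ.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical illustration of a fixed-width integer row key encoding.
// Not Kylin's implementation; it only shows the general idea.
public class FixedWidthIntEncoding {
    static final int LENGTH = 8; // plays the role of "length=8" in this sketch

    // Writes the value as 8 big-endian bytes; no dictionary lookup is involved,
    // so the number of distinct values does not matter.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(LENGTH).putLong(value).array();
    }

    // Reads the 8 bytes back into the original value.
    static long decode(byte[] bytes) {
        if (bytes.length != LENGTH)
            throw new IllegalArgumentException("expected " + LENGTH + " bytes");
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        long customerId = 1234567L; // arbitrary example ID
        byte[] key = encode(customerId);
        System.out.println(Arrays.toString(key));
        System.out.println(decode(key));
    }
}
```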
> ==========================================================
> We have 2 high-cardinality fields (one from the fact table and one from the
> big customer dimension, see above) that we need to use in count_distinct
> measures for our calculations. I wonder whether this "Value not exists!"
> exception is somehow related. One count_distinct measure is defined with
> return type "bitmap" (exact precision, Int columns only) and the second with
> return type "hllc16" (error rate <= 1.22%).
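The reason a "bitmap" count distinct can be exact but limited to integer columns is visible in the basic idea behind bitmap-based distinct counting: each integer value sets the bit at its own position, and the distinct count is the number of set bits. The sketch below illustrates this with `java.util.BitSet`. It is a sketch of the principle only; Kylin's measure may be implemented differently (for example with compressed bitmaps), and the class name `BitmapCountDistinct` is hypothetical. The restriction to non-negative values here comes from `BitSet` itself.

```java
import java.util.BitSet;

// Illustrative exact count distinct over integer values using a bitmap.
// Not Kylin's measure implementation.
public class BitmapCountDistinct {
    private final BitSet bits = new BitSet();

    // Each value maps directly to one bit position, so only integers
    // (non-negative ones, for BitSet) can be represented.
    void add(int value) {
        if (value < 0)
            throw new IllegalArgumentException("BitSet indexes must be non-negative");
        bits.set(value);
    }

    // Duplicates set the same bit again, so the count is exact.
    int distinctCount() {
        return bits.cardinality();
    }

    // Merging two separately computed partial results is a bitwise OR.
    void merge(BitmapCountDistinct other) {
        bits.or(other.bits);
    }

    public static void main(String[] args) {
        BitmapCountDistinct a = new BitmapCountDistinct();
        for (int id : new int[] {5, 7, 5, 42}) a.add(id);
        BitmapCountDistinct b = new BitmapCountDistinct();
        for (int id : new int[] {7, 100}) b.add(id);
        a.merge(b);
        System.out.println(a.distinctCount()); // 4 (values 5, 7, 42, 100)
    }
}
```

An approximate measure such as "hllc16" trades this exactness for a fixed-size sketch, which is why it is not restricted to integer inputs.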
> ==========================================================
> I am looking for any clues for debugging the cause of this error and for a
> way to circumvent it.
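One way to gather such clues is to check, outside Kylin, which lookup-table values would fail an exact dictionary lookup. The stack trace shows `SnapshotTable.takeSnapshot` calling `Dictionary.getIdFromValue`, so every non-null value of the column has to be present in the dictionary. The sketch below assumes you have exported two lists yourself: the distinct values the dictionary was built from and the values of the same column in the lookup table. The class and method names are hypothetical, and the whitespace check is only one example of a formatting difference that could make a value "not exist"; it is not a diagnosis of this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical debugging helper: which lookup values have no exact match
// among the values a dictionary was built from? Not part of Kylin.
public class MissingValueFinder {

    // Returns lookup values (first-seen order, no duplicates) that would fail
    // an exact lookup. Nulls are skipped, since the quoted getIdFromValueBytes
    // maps the null form to nullId() instead of throwing.
    static List<String> findMissing(Iterable<String> dictValues, Iterable<String> lookupValues) {
        Set<String> dict = new HashSet<>();
        for (String v : dictValues) dict.add(v);
        Set<String> missing = new LinkedHashSet<>();
        for (String v : lookupValues) {
            if (v != null && !dict.contains(v)) missing.add(v);
        }
        return new ArrayList<>(missing);
    }

    public static void main(String[] args) {
        List<String> dictValues = Arrays.asList("1", "2", "3");
        // " 2" has a leading space; "4" was never added to the dictionary.
        List<String> lookupValues = Arrays.asList("1", " 2", "4", "4", null);
        for (String v : findMissing(dictValues, lookupValues)) {
            // Flag values that would match if surrounding whitespace were trimmed.
            String hint = dictValues.contains(v.trim()) ? " (matches after trim)" : "";
            System.out.println("[" + v + "]" + hint);
        }
    }
}
```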



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
