[ 
https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1834:
----------------------------------
    Description: 
Getting exception in Step 4 - Build Dimension Dictionary:

java.lang.IllegalArgumentException: Value not exists!
        at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
        at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
        at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
        at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
        at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
        at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
        at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
        at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
        at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
        at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
        at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
        at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

result code:2


The code that throws the exception, in org.apache.kylin.dimension.Dictionary:

    /**
     * A lower-level API: returns the ID integer for raw value bytes. If the
     * value is not found:
     * <p>
     * - if roundingFlag=0, throws IllegalArgumentException; <br>
     * - if roundingFlag<0, returns the closest smaller ID, if one exists; <br>
     * - if roundingFlag>0, returns the closest bigger ID, if one exists. <br>
     * <p>
     * This bypasses the cache layer, so it can be significantly slower than
     * getIdFromValue(T value).
     *
     * @throws IllegalArgumentException
     *             if the value is not found in the dictionary and rounding is off,
     *             or if rounding cannot find a smaller or bigger ID
     */
    final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
        if (isNullByteForm(value, offset, len))
            return nullId();
        else {
            int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
            if (id < 0)
                throw new IllegalArgumentException("Value not exists!");
            return id;
        }
    }
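For illustration, here is a minimal, self-contained sketch (not Kylin code; all names are hypothetical) of the failure mode this stack trace suggests: the snapshot step asks an already-built dictionary for a value it never encoded, and the miss surfaces as exactly this IllegalArgumentException.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a frozen dictionary: ids are assigned once at build
// time; any value not seen at build time has no id.
public class DictMissDemo {
    private final Map<String, Integer> idByValue = new HashMap<>();

    public DictMissDemo(String... values) {
        int id = 0;
        for (String v : values)
            idByValue.put(v, id++);
    }

    // Mirrors the contract of getIdFromValueBytes with roundingFlag=0:
    // a miss surfaces as IllegalArgumentException.
    public int getIdFromValueBytes(byte[] value, int offset, int len) {
        String key = new String(value, offset, len, StandardCharsets.UTF_8);
        Integer id = idByValue.get(key);
        if (id == null)
            throw new IllegalArgumentException("Value not exists!");
        return id;
    }

    public static void main(String[] args) {
        // Dictionary as built from the lookup table at dictionary-build time.
        DictMissDemo dict = new DictMissDemo("CUST_A", "CUST_B");
        byte[] ok = "CUST_B".getBytes(StandardCharsets.UTF_8);
        System.out.println(dict.getIdFromValueBytes(ok, 0, ok.length)); // 1

        // A value that appears later (e.g. a row added to the Hive table
        // between the dictionary build and the snapshot build) reproduces
        // the reported failure.
        byte[] missing = "CUST_C".getBytes(StandardCharsets.UTF_8);
        try {
            dict.getIdFromValueBytes(missing, 0, missing.length);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Value not exists!
        }
    }
}
```

If this is what is happening, one thing worth checking is whether the source tables changed between the dictionary-build step and the snapshot step of the same job.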

==========================================================

The Cube is big: the fact table has 110 million rows, and the largest dimension (customer) has 15 million rows. I have increased the JVM -Xmx to 16 GB and set kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the Cube build doesn't fail (previously we were getting an exception complaining about the 300 MB limit on the dimension dictionary size; approximately 700 MB was required).
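For reference, the relevant kylin.properties entry as described above (a sketch; the -Xmx setting itself lives in the Kylin server's JVM options, not in this file):

```properties
# Raise the cap on lookup-table snapshot / dictionary size;
# the dictionary in question needed roughly 700 MB.
kylin.table.snapshot.max_mb=2048
```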

==========================================================

Before that, we were getting an exception complaining about a dictionary encoding problem: "Too high cardinality is not suitable for dictionary -- cardinality: 10873977". We resolved it by changing the affected encoding from "dict" to "int; length=8".
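For context, the same change expressed as a fragment of the cube descriptor JSON (a sketch; the column name CUSTOMER_ID is hypothetical, and "int:8" is assumed to be the descriptor form of the "int; length=8" shown in the web UI):

```json
{
  "rowkey": {
    "rowkey_columns": [
      { "column": "CUSTOMER_ID", "encoding": "int:8" }
    ]
  }
}
```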

==========================================================

We need to use those two high-cardinality fields (one from the fact table and one from the big customer dimension, see above) in a distinct_count measure for our calculations. I wonder whether this is related.

==========================================================

I am looking for any clues to debug the cause of this error, and for a way to circumvent it.



> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build 
> Dimension Dictionary
> ----------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1834
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1834
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.2, v1.5.2.1
>            Reporter: Richard Calaba
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
