[ https://issues.apache.org/jira/browse/KYLIN-1834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Calaba updated KYLIN-1834:
----------------------------------
    Description:
Getting an exception in Step 4 - Build Dimension Dictionary:

java.lang.IllegalArgumentException: Value not exists!
	at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
	at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
	at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
	at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
	at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
	at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
	at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
	at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
	at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
	at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
	at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
	at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

result code:2

The code which generates the exception is in org.apache.kylin.dimension.Dictionary.java:

    /**
     * A lower level API, return ID integer from raw value bytes. In case of not found
     * <p>
     * - if roundingFlag=0, throw IllegalArgumentException; <br>
     * - if roundingFlag<0, the closest smaller ID integer if exist; <br>
     * - if roundingFlag>0, the closest bigger ID integer if exist. <br>
     * <p>
     * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value).
     *
     * @throws IllegalArgumentException
     *             if value is not found in dictionary and rounding is off;
     *             or if rounding cannot find a smaller or bigger ID
     */
    final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
        if (isNullByteForm(value, offset, len))
            return nullId();
        else {
            int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
            if (id < 0)
                throw new IllegalArgumentException("Value not exists!");
            return id;
        }
    }

==========================================================

The cube is big: the fact table has 110 million rows and the largest dimension (customer) has 15 million rows. I have increased the JVM -Xmx to 16 GB and set kylin.table.snapshot.max_mb=2048 in kylin.properties to make sure the cube build doesn't fail (previously we were getting an exception complaining about the 300 MB limit for the dimension dictionary size; approx. 700 MB is required).

==========================================================

Before that we were getting an exception complaining about a dictionary encoding problem - "Too high cardinality is not suitable for dictionary -- cardinality: 10873977" - which we resolved by changing the affected encoding from "dict" to "int; length=8".

==========================================================

We need to use those 2 high-cardinality fields (one from the fact table and one from the big dimension, see above) in a distinct_count measure for our calculations. I wonder if this is somehow related?

==========================================================

I am looking for any clues to debug the cause of this error and a way to circumvent it.
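
To make the roundingFlag contract from the Javadoc above concrete, here is a minimal sketch of the same lookup rules. It is not Kylin's TrieDictionary: the class RoundingLookupSketch and its method getId are hypothetical names, and the "dictionary" is assumed to be just a sorted array of strings whose index serves as the ID. It only illustrates why an exact lookup (roundingFlag=0) of a value that was never added to the dictionary ends in "Value not exists!".

```java
import java.util.Arrays;

// Hypothetical toy dictionary: sorted distinct values, ID = array index.
// This is an illustration of the rounding rules only, not Kylin code.
final class RoundingLookupSketch {
    private final String[] sortedValues;

    RoundingLookupSketch(String... values) {
        sortedValues = values.clone();
        Arrays.sort(sortedValues);
    }

    // roundingFlag = 0: exact match or exception;
    // roundingFlag < 0: closest smaller ID if one exists;
    // roundingFlag > 0: closest bigger ID if one exists.
    int getId(String value, int roundingFlag) {
        int pos = Arrays.binarySearch(sortedValues, value);
        if (pos >= 0)
            return pos; // exact hit

        int insertion = -(pos + 1); // index of the first element greater than value
        int id;
        if (roundingFlag < 0)
            id = insertion - 1; // becomes -1 when nothing smaller exists
        else if (roundingFlag > 0)
            id = insertion < sortedValues.length ? insertion : -1;
        else
            id = -1; // rounding is off

        if (id < 0)
            throw new IllegalArgumentException("Value not exists!");
        return id;
    }

    public static void main(String[] args) {
        RoundingLookupSketch dict = new RoundingLookupSketch("b", "d", "f");
        System.out.println(dict.getId("d", 0));  // exact match
        System.out.println(dict.getId("c", -1)); // rounds down to "b"
        System.out.println(dict.getId("c", 1));  // rounds up to "d"
        try {
            dict.getId("c", 0); // not in the dictionary, rounding off
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the exception only occurs when the looked-up value is absent from the set the dictionary was built from, so one thing worth checking is whether the snapshot step reads values that differ from those used to build the dictionary.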

> java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
> ----------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-1834
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1834
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v1.5.2, v1.5.2.1
>            Reporter: Richard Calaba
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)