Richard Calaba created KYLIN-1834:
-------------------------------------
Summary: java.lang.IllegalArgumentException: Value not exists! - in Step 4 - Build Dimension Dictionary
Key: KYLIN-1834
URL: https://issues.apache.org/jira/browse/KYLIN-1834
Project: Kylin
Issue Type: Bug
Affects Versions: v1.5.2, v1.5.2.1
Reporter: Richard Calaba
Priority: Critical
Getting exception in Step 4 - Build Dimension Dictionary:
java.lang.IllegalArgumentException: Value not exists!
    at org.apache.kylin.dimension.Dictionary.getIdFromValueBytes(Dictionary.java:160)
    at org.apache.kylin.dict.TrieDictionary.getIdFromValueImpl(TrieDictionary.java:158)
    at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:96)
    at org.apache.kylin.dimension.Dictionary.getIdFromValue(Dictionary.java:76)
    at org.apache.kylin.dict.lookup.SnapshotTable.takeSnapshot(SnapshotTable.java:96)
    at org.apache.kylin.dict.lookup.SnapshotManager.buildSnapshot(SnapshotManager.java:106)
    at org.apache.kylin.cube.CubeManager.buildSnapshotTable(CubeManager.java:215)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:59)
    at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:42)
    at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:56)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:60)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
    at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:50)
    at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:114)
    at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:124)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
result code:2
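Reading the trace bottom-up, SnapshotTable.takeSnapshot passes a value to Dictionary.getIdFromValue, and the dictionary has no ID for it. The sketch below is a hypothetical, simplified model of that lookup (a sorted array stands in for Kylin's TrieDictionary; the class and method names are invented for illustration and are not Kylin APIs). Instead of failing on the first miss, it collects every value that has no ID, which is the kind of list that could help narrow down which cell triggers the error.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model: NOT Kylin's TrieDictionary, just a sorted array of distinct values.
public class SnapshotEncodeSketch {

    private final String[] sortedValues; // ID of a value = its index in this array

    public SnapshotEncodeSketch(Iterable<String> values) {
        TreeSet<String> distinct = new TreeSet<>();
        for (String v : values) {
            if (v != null) distinct.add(v); // TreeSet rejects null, so skip it
        }
        this.sortedValues = distinct.toArray(new String[0]);
    }

    /** Returns the ID of value, or -1 if absent (the "id < 0" case that Kylin turns into an exception). */
    public int idOf(String value) {
        if (value == null) return -1; // null handling is out of scope for this sketch
        int pos = Arrays.binarySearch(sortedValues, value);
        return pos >= 0 ? pos : -1;
    }

    /** Debugging variant: collects all non-null cells that have no ID instead of throwing. */
    public Set<String> findMissing(List<String[]> rows) {
        Set<String> missing = new LinkedHashSet<>();
        for (String[] row : rows) {
            for (String cell : row) {
                if (cell != null && idOf(cell) < 0) missing.add(cell);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        SnapshotEncodeSketch dict = new SnapshotEncodeSketch(Arrays.asList("C001", "C002", "DE"));
        List<String[]> rows = Arrays.asList(
                new String[] {"C001", "DE"},
                new String[] {"C003", "US"});
        System.out.println(dict.findMissing(rows)); // [C003, US]
    }
}
```

In a real investigation the rows would come from the lookup table being snapshotted; how to read them is outside this sketch.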
The code that throws the exception is in org.apache.kylin.dimension.Dictionary (Dictionary.java):
    /**
     * A lower level API, return ID integer from raw value bytes. In case of not found
     * <p>
     * - if roundingFlag=0, throw IllegalArgumentException; <br>
     * - if roundingFlag<0, the closest smaller ID integer if exist; <br>
     * - if roundingFlag>0, the closest bigger ID integer if exist. <br>
     * <p>
     * Bypassing the cache layer, this could be significantly slower than getIdFromValue(T value).
     *
     * @throws IllegalArgumentException
     *             if value is not found in dictionary and rounding is off;
     *             or if rounding cannot find a smaller or bigger ID
     */
    final public int getIdFromValueBytes(byte[] value, int offset, int len, int roundingFlag) throws IllegalArgumentException {
        if (isNullByteForm(value, offset, len))
            return nullId();
        else {
            int id = getIdFromValueBytesImpl(value, offset, len, roundingFlag);
            if (id < 0)
                throw new IllegalArgumentException("Value not exists!");
            return id;
        }
    }
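The Javadoc defines three outcomes for a value that is not in the dictionary, depending on roundingFlag. The following is a minimal sketch of that documented contract over a sorted array of strings. The class name and the array-based storage are assumptions for illustration only; Kylin's TrieDictionary stores values in a trie and works on raw bytes, and null handling (nullId()) is omitted here.

```java
import java.util.Arrays;

// Hypothetical illustration of the roundingFlag contract; not Kylin's implementation.
public class RoundingLookupSketch {

    private final String[] sorted; // distinct values in ascending order; ID = array index

    public RoundingLookupSketch(String... values) {
        this.sorted = Arrays.stream(values).distinct().sorted().toArray(String[]::new);
    }

    /** Returns the ID, or -1 when the value is absent and rounding cannot produce a neighbour. */
    int idImpl(String value, int roundingFlag) {
        int pos = Arrays.binarySearch(sorted, value);
        if (pos >= 0) return pos;                 // exact match
        int insertion = -pos - 1;                 // index of the first element greater than value
        if (roundingFlag < 0) return insertion - 1;                              // closest smaller, -1 if none
        if (roundingFlag > 0) return insertion < sorted.length ? insertion : -1; // closest bigger, -1 if none
        return -1;                                                               // rounding off
    }

    /** Mirrors getIdFromValueBytes: a negative ID becomes IllegalArgumentException. */
    public int getId(String value, int roundingFlag) {
        int id = idImpl(value, roundingFlag);
        if (id < 0) throw new IllegalArgumentException("Value not exists!");
        return id;
    }

    public static void main(String[] args) {
        RoundingLookupSketch dict = new RoundingLookupSketch("apple", "cherry", "plum");
        System.out.println(dict.getId("cherry", 0));  // 1
        System.out.println(dict.getId("banana", -1)); // 0 (apple)
        System.out.println(dict.getId("banana", 1));  // 1 (cherry)
        // dict.getId("banana", 0) would throw IllegalArgumentException("Value not exists!")
    }
}
```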
==========================================================
The cube is big: the fact table has 110 million rows, and the largest dimension (customer) has 15 million entries. I have increased the JVM -Xmx to 16 GB and set kylin.table.snapshot.max_mb=20148 in kylin.properties so that the cube build does not fail (previously we were getting an exception complaining about the 300 MB limit on the dimension dictionary size; approx. 700 MB was required).
==========================================================
Before that, we were getting an exception about dictionary encoding: "Too high cardinality is not suitable for dictionary -- cardinality: 10873977". We resolved this by changing the encoding of the affected column from "dict" to "int; length=8".
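Switching to "int; length=8" means the column is no longer mapped through a dictionary; each value is written directly as a fixed-width integer. The sketch below illustrates only that general idea. It is not Kylin's encoder, and the class and method names are hypothetical: a non-negative long is written big-endian into exactly 8 bytes, so an unsigned byte-wise comparison of encoded keys agrees with the numeric order of the values.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of fixed-length integer encoding; not Kylin's "int" encoder.
public class FixedLengthIntSketch {

    static final int LENGTH = 8; // corresponds to "length=8" in the encoding setting

    /** Encodes a non-negative long into exactly LENGTH bytes (ByteBuffer is big-endian by default). */
    static byte[] encode(long value) {
        if (value < 0) throw new IllegalArgumentException("negative values are not handled in this sketch");
        return ByteBuffer.allocate(LENGTH).putLong(value).array();
    }

    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    /** Unsigned lexicographic comparison, as a byte-wise key comparison would do. */
    static int compareBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int x = a[i] & 0xFF, y = b[i] & 0xFF;
            if (x != y) return Integer.compare(x, y);
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long sample = 123_456_789L; // arbitrary sample value
        System.out.println(decode(encode(sample)) == sample);               // true
        System.out.println(compareBytes(encode(255L), encode(256L)) < 0);   // true: order preserved
    }
}
```

Unlike a dictionary, this encoding needs no global set of distinct values, which is why cardinality does not limit it; the trade-off is a fixed 8 bytes per value regardless of how many distinct values exist.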
==========================================================
We need to use those two high-cardinality fields (one from the fact table and one from the large customer dimension described above) in a COUNT_DISTINCT measure for our calculations. I wonder whether this is related.
==========================================================
I am looking for any clues to help debug the cause of this error and for a way to work around it.