[ https://issues.apache.org/jira/browse/KYLIN-4106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894676#comment-16894676 ]
langdamao commented on KYLIN-4106:
----------------------------------

It's my pleasure :D

> Illegal partition for SelfDefineSortableKey when “Extract Fact Table Distinct Columns”
> --------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4106
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4106
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.6.1, v2.6.2
>            Reporter: langdamao
>            Assignee: langdamao
>            Priority: Critical
>              Labels: easyfix
>             Fix For: v2.6.4
>
>
> We got this error in the "Extract Fact Table Distinct Columns" step on Kylin 2.6.1:
> {code:java}
> Error: java.io.IOException: Illegal partition for org.apache.kylin.engine.mr.steps.SelfDefineSortableKey@6b69761b (254)
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1096)
> 	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:727)
> 	at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
> 	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
> 	at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper.writeFieldValue(FactDistinctColumnsMapper.java:281)
> 	at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper.doMap(FactDistinctColumnsMapper.java:186)
> 	at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1685)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> {code}
> I've found the problem in the following code in
> *FactDistinctColumnsReducerMapping.java – engine-mr*:
> {code:java}
> public int getReducerIdForCol(int colId, Object fieldValue) {
>     int begin = colIdToReducerBeginId[colId];
>     int span = colIdToReducerBeginId[colId + 1] - begin;
>
>     if (span == 1)
>         return begin;
>
>     int hash = fieldValue == null ? 0 : fieldValue.hashCode();
>     return begin + Math.abs(hash) % span;
> }
> {code}
> For the failing row key, begin=1, span=5 and hash=-2147483648 (Integer.MIN_VALUE). Math.abs(-2147483648) returns -2147483648, because +2147483648 does not fit in an int, so the code above returns 1 + (-2147483648 % 5) = 1 + (-3) = -2 (which is 254 when read as an unsigned byte).
> The same overflow also causes the problem below when getReducerIdForCol returns -1 (begin=1, span=3, hash=-2147483648): the value written to the row-key reducer is empty_text, but reducer No. -1 expects a value text, so reading it fails:
> {code:java}
> Error: java.nio.BufferUnderflowException
> 	at java.nio.Buffer.nextGetIndex(Buffer.java:500)
> 	at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:135)
> 	at org.apache.kylin.measure.hllc.HLLCounter.readRegisters(HLLCounter.java:327)
> 	at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doReduce(FactDistinctColumnsReducer.java:145)
> 	at org.apache.kylin.engine.mr.steps.FactDistinctColumnsReducer.doReduce(FactDistinctColumnsReducer.java:60)
> 	...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
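The overflow in the issue can be reproduced outside Kylin with plain ints. The following is a minimal standalone sketch, assuming only the arithmetic of the quoted getReducerIdForCol matters; the class ReducerIdDemo and its methods are hypothetical names for illustration. fixedReducerId shows one possible correction using Math.floorMod, which is an assumption and not necessarily the patch committed for v2.6.4.

```java
// Hypothetical standalone sketch (not Kylin code): mirrors the reducer-id
// arithmetic of FactDistinctColumnsReducerMapping.getReducerIdForCol so the
// Integer.MIN_VALUE overflow can be checked without a Kylin build.
class ReducerIdDemo {

    // Same arithmetic as the quoted code. Math.abs(Integer.MIN_VALUE) overflows
    // and stays negative, and Java's % takes the sign of the dividend, so the
    // returned id can fall below `begin`.
    static int buggyReducerId(int begin, int span, int hash) {
        if (span == 1) {
            return begin;
        }
        return begin + Math.abs(hash) % span;
    }

    // One possible fix (an assumption, not necessarily the committed patch):
    // Math.floorMod returns a value in [0, span) for any hash when span > 0,
    // so the id always stays in [begin, begin + span).
    static int fixedReducerId(int begin, int span, int hash) {
        if (span == 1) {
            return begin;
        }
        return begin + Math.floorMod(hash, span);
    }

    public static void main(String[] args) {
        int hash = Integer.MIN_VALUE; // -2147483648, the hash from the failing row

        System.out.println(Math.abs(hash));                    // -2147483648
        System.out.println(buggyReducerId(1, 5, hash));        // -2
        System.out.println(buggyReducerId(1, 5, hash) & 0xFF); // 254, -2 read as an unsigned byte
        System.out.println(buggyReducerId(1, 3, hash));        // -1
        System.out.println(fixedReducerId(1, 5, hash));        // 3
        System.out.println(fixedReducerId(1, 3, hash));        // 2
    }
}
```

An equivalent alternative is begin + Math.abs(hash % span): the remainder always lies strictly between -span and span, so its absolute value cannot overflow and stays below span.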