[ https://issues.apache.org/jira/browse/KYLIN-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785204#comment-16785204 ]
zhao jintao commented on KYLIN-3845:
------------------------------------

I have fixed this bug. May I push my code to the Kylin master branch?

> Kylin build error If the Kafka data source lacks selected dimensions or metrics in the kylin stream build.
> ----------------------------------------------------------------------------------------------------------
>
> Key: KYLIN-3845
> URL: https://issues.apache.org/jira/browse/KYLIN-3845
> Project: Kylin
> Issue Type: Bug
> Components: Job Engine
> Affects Versions: v2.5.2
> Environment: Fusion Insight
> Reporter: zhao jintao
> Priority: Major
> Labels: easyfix
> Fix For: Future
>
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> Hi dear team,
> I'm developing an OLAP platform based on Kylin 2.5.2. During my work, I built a streaming cube from a Kafka source using the Kafka demo.
> In my streaming project, I set country and currency as dimensions and userId as a metric. But the cube build failed in the 3rd step ("Extract Fact Table Distinct Columns"). The exception is java.lang.ArrayIndexOutOfBoundsException.
> These are the logs:
> 2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
> 2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Total rows: 127
> 2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
> 2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.YarnChild: Exception running child: java.lang.ArrayIndexOutOfBoundsException:2
> 2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
>     at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper.doMap(FactDistinctColumnsMapper.java:177)
>     at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
>     at org.apache.hadoop.mapreduce.Mapper.run(MapperTask.java:146)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:187)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:180)
>
> Then I found that in the Kafka data source, some streaming records lack the userId column. Most records (country, currency, userId) look like ("China","CNY","843c4d"), but a small number lack userId, for example ("China","CNY"). So when the 3rd step ("Extract Fact Table Distinct Columns") runs, the MR engine throws the exception whenever a record lacks userId.
> Then I checked the Kylin source, FactDistinctColumnsMapper.java:
>     public void doMap(KEYIN key, Object record, Context context) throws IOException, InterruptedException {
>         Collection<String[]> rowCollection = flatTableInputFormat.parseMapperInput(record);
>         for (String[] row : rowCollection) {
>             context.getCounter(RawDataCounter.BYTES).increment(countSizeInBytes(row));
>             for (int i = 0; i < allCols.size(); i++) {
>                 String fieldValue = row[columnIndex[i]];
>                 if (fieldValue == null)
>                     continue;
>                 final DataType type = allCols.get(i).getType();
>                 ...
> I found that columnIndex[i] equals the length of row when a streaming record lacks one column, so row[columnIndex[i]] throws the ArrayIndexOutOfBoundsException. I changed this code to compare columnIndex[i] with the length of row: if columnIndex[i] is equal to or larger than the length of row, I set fieldValue to an empty value. After this change, the 3rd step ("Extract Fact Table Distinct Columns") runs successfully.
> That is what I found; it may cause problems for other developers.
> What do you think?
> Best regards,
> jintao

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
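
For readers following the issue, the bounds check described in the quoted report can be sketched roughly as below. This is only an illustration under assumptions, not the actual patch: the class MissingColumnSketch and the methods fieldOrDefault and readColumns are hypothetical names that do not exist in Kylin, and using an empty string as the fallback is one reading of "empty value" in the report.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Kylin: it illustrates the bounds check only.
final class MissingColumnSketch {

    // Returns row[index], or the fallback when the row is too short to hold that column.
    static String fieldOrDefault(String[] row, int index, String fallback) {
        if (index < 0 || index >= row.length) {
            return fallback;
        }
        return row[index];
    }

    // Mirrors the shape of the inner loop quoted in the report: one value per selected column.
    // Assumption: a missing column is replaced by an empty string.
    static List<String> readColumns(String[] row, int[] columnIndex) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < columnIndex.length; i++) {
            values.add(fieldOrDefault(row, columnIndex[i], ""));
        }
        return values;
    }

    public static void main(String[] args) {
        int[] columnIndex = {0, 1, 2}; // country, currency, userId
        String[] complete = {"China", "CNY", "843c4d"};
        String[] missingUserId = {"China", "CNY"}; // record without userId
        System.out.println(readColumns(complete, columnIndex));
        System.out.println(readColumns(missingUserId, columnIndex));
    }
}
```

Whether the fallback should be an empty string or null matters in the quoted loop, since a null fieldValue is skipped by the existing "if (fieldValue == null) continue;" check while an empty string is processed like any other value.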