[jira] [Commented] (KYLIN-3845) Kylin build error If the Kafka data source lacks selected dimensions or metrics in the kylin stream build.

zhao jintao (JIRA) Tue, 05 Mar 2019 20:07:39 -0800


    [ 
https://issues.apache.org/jira/browse/KYLIN-3845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785204#comment-16785204
 ]


zhao jintao commented on KYLIN-3845:
------------------------------------

I have fixed this bug. Can I push my code to the Kylin master branch?

> Kylin build error If the Kafka data source lacks selected dimensions or 
> metrics in the kylin stream build.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3845
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3845
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v2.5.2
>         Environment: Fusion Insight
>            Reporter: zhao jintao
>            Priority: Major
>              Labels: easyfix
>             Fix For: Future
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> Hi team,
> I'm developing an OLAP platform based on Kylin 2.5.2. During this work, I built a
> streaming cube from a Kafka source using the Kafka demo.
> In my streaming project, I set country and currency as dimensions and userId as
> a metric. But the cube build failed in the 3rd step ("Extract Fact Table Distinct
> Columns") with java.lang.ArrayIndexOutOfBoundsException.
> Here are the logs:
> 2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
> 2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Total rows: 127
> 2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
> 2019-03-02 14:21:01,492 INFO [main] org.apache.hadoop.mapred.YarnChild: Exception running child: java.lang.ArrayIndexOutOfBoundsException: 2
> 2019-03-02 14:21:01,492 INFO [main] org.apache.kylin.engine.mr.KylinReducer: Do cleanup, available memory: 1334m
>  at org.apache.kylin.engine.mr.steps.FactDistinctColumnsMapper.doMap(FactDistinctColumnsMapper.java:177)
>  at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:187)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:180)
>  
> Then I found that in the Kafka data source, some streaming records lack the
> userId column. Most records (country, currency, userId) look like
> ("China", "CNY", "843c4d"), but a small number lack userId, e.g.
> ("China", "CNY"). So when the 3rd step ("Extract Fact Table Distinct Columns")
> runs, the MR engine throws an exception on every record that lacks userId.
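
A minimal, self-contained Java sketch of this failure mode, assuming (for illustration only) that each parsed record arrives as a String[] and that userId sits at column index 2, as in the example records above. The class MissingColumnDemo and method readColumns are hypothetical names, not Kylin code; readColumns only mirrors the unchecked lookup row[columnIndex[i]].

```java
import java.util.Arrays;
import java.util.List;

public class MissingColumnDemo {
    // Hypothetical positions of (country, currency, userId) in the parsed row.
    static final int[] COLUMN_INDEX = {0, 1, 2};

    // Mirrors the unchecked lookup row[columnIndex[i]]: no length check.
    static String[] readColumns(String[] row) {
        String[] values = new String[COLUMN_INDEX.length];
        for (int i = 0; i < COLUMN_INDEX.length; i++) {
            values[i] = row[COLUMN_INDEX[i]]; // throws when the record is too short
        }
        return values;
    }

    public static void main(String[] args) {
        List<String[]> records = Arrays.asList(
                new String[] {"China", "CNY", "843c4d"},
                new String[] {"China", "CNY"}); // record missing userId
        for (String[] row : records) {
            try {
                System.out.println(Arrays.toString(readColumns(row)));
            } catch (ArrayIndexOutOfBoundsException e) {
                System.out.println("failed on " + Arrays.toString(row));
            }
        }
    }
}
```

Running main prints the complete record and then "failed on [China, CNY]", which matches the index-2 exception in the log.
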
> Then I checked the Kylin source, FactDistinctColumnsMapper.java:
> public void doMap(KEYIN key, Object record, Context context) throws IOException, InterruptedException {
>     Collection<String[]> rowCollection = flatTableInputFormat.parseMapperInput(record);
>     for (String[] row : rowCollection) {
>         context.getCounter(RawDataCounter.BYTES).increment(countSizeInBytes(row));
>         for (int i = 0; i < allCols.size(); i++) {
>             String fieldValue = row[columnIndex[i]];
>             if (fieldValue == null)
>                 continue;
>             final DataType type = allCols.get(i).getType();
>             ...
> I found that columnIndex[i] equals the length of row when a streaming record
> lacks one column, so row[columnIndex[i]] throws ArrayIndexOutOfBoundsException.
> So I changed this code to compare columnIndex[i] with the length of row: if
> columnIndex[i] is greater than or equal to row.length, I set fieldValue to an
> empty value. After this change, the 3rd step ("Extract Fact Table Distinct
> Columns") runs successfully.
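
A minimal sketch of the bounds check described above, assuming the same String[] row and int[] columnIndex shapes as in the quoted mapper. The helper safeField is hypothetical and this is not the submitted Kylin patch. It returns null rather than an empty string so that the mapper's existing `if (fieldValue == null) continue;` check skips the missing column; whether the actual fix uses null or "" is an assumption.

```java
import java.util.Arrays;

public class SafeFieldLookup {
    // Hypothetical helper: returns the value at the given index, or null when
    // the record is shorter than expected (e.g. a Kafka message missing userId).
    static String safeField(String[] row, int index) {
        if (row == null || index < 0 || index >= row.length) {
            return null;
        }
        return row[index];
    }

    public static void main(String[] args) {
        int[] columnIndex = {0, 1, 2}; // hypothetical (country, currency, userId)
        String[][] rows = {
                {"China", "CNY", "843c4d"},
                {"China", "CNY"}
        };
        for (String[] row : rows) {
            int present = 0;
            for (int i = 0; i < columnIndex.length; i++) {
                String fieldValue = safeField(row, columnIndex[i]);
                if (fieldValue == null) {
                    continue; // same skip as the original mapper's null check
                }
                present++;
            }
            System.out.println(Arrays.toString(row) + " -> " + present + " columns read");
        }
    }
}
```

Returning null keeps the change local to the lookup: short records are processed for the columns they do have instead of failing the whole map task.
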
> That is what I found; this issue can cause problems for other developers.
> What do you think?
> Best regards,
> jintao



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
