[ https://issues.apache.org/jira/browse/KYLIN-4683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17173887#comment-17173887 ]
ASF GitHub Bot commented on KYLIN-4683:
---------------------------------------

hit-lacus edited a comment on pull request #1351:
URL: https://github.com/apache/kylin/pull/1351#issuecomment-671059337

Before we scale up the topic partitions, we should do the following steps:

1. Disable the cube, so that all consumption tasks will be cancelled.

2. Use the REST API `http://${KYLIN_INSTANCE_IP}:7236/kylin/cubes/view/${CUBE_NAME}/instancejson` to check the `CubeInstance` JSON. You will find that each READY segment has a property named `stream_source_checkpoint`. Here is part of its content:

```json
{
  "uuid": "aab58181-29f7-0593-d7f8-ca9d9b97d49b",
  "name": "20200809200000_20200809210000",
  "storage_location_identifier": "APACHE:REALTIME_OLAP_ISH2P3HM30",
  "date_range_start": 1597003200000,
  "date_range_end": 1597006800000,
  "source_offset_start": 0,
  "source_offset_end": 0,
  "status": "READY",
  "size_kb": 102984,
  "is_merged": false,
  "estimate_ratio": null,
  "input_records": 118216,
  "input_records_size": 0,
  "last_build_time": 1596982095554,
  "last_build_job_id": "8f3660b8-72f7-7aac-e3bd-1e06ea30d8e2",
  "create_time_utc": 1596981735548,
  "cuboid_shard_nums": {},
  "total_shards": 1,
  "blackout_cuboids": [],
  "binary_signature": null,
  "dictionaries": {
    "USERACTIONSTREAM.DEVICE_BRAND": "/dict/APACHE.USERACTIONSTREAM/DEVICE_BRAND/afb80e63-4fef-d575-3483-1e0314bf4bef.dict",
    "USERACTIONSTREAM.DEVIDE_TYPE": "/dict/APACHE.USERACTIONSTREAM/DEVIDE_TYPE/61bf9051-3cdd-ec82-bc0f-1bb3226bb411.dict",
    "USERACTIONSTREAM.LOCATION_CITY": "/dict/APACHE.USERACTIONSTREAM/LOCATION_CITY/292ce446-62a1-0b12-e4d0-13bc249b0dbe.dict",
    "USERACTIONSTREAM.PAGE_ID": "/dict/APACHE.USERACTIONSTREAM/PAGE_ID/12fa188b-0f89-db3d-560a-8aea5b970349.dict",
    "USERACTIONSTREAM.NETWORK_TYPE": "/dict/APACHE.USERACTIONSTREAM/NETWORK_TYPE/99d38dea-25ef-c03d-73b7-19a6fda2ce4c.dict",
    "USERACTIONSTREAM.STR_MINUTE_SECOND": "/dict/APACHE.USERACTIONSTREAM/STR_MINUTE_SECOND/2b23e9aa-dee0-b88f-7e3e-0a3d74d89f89.dict",
    "USERACTIONSTREAM.ACT_TYPE": "/dict/APACHE.USERACTIONSTREAM/ACT_TYPE/e77f008d-6bd8-bc1f-ba94-ff62764c3e14.dict",
    "USERACTIONSTREAM.UID": "/dict/APACHE.USERACTIONSTREAM/UID/665546b1-424a-fc42-a35b-1a58fcd1fb5f.dict"
  },
  "snapshots": null,
  "rowkey_stats": [
    ["ACT_TYPE", 10, 1],
    ["NETWORK_TYPE", 4, 1],
    ["LOCATION_CITY", 7, 1],
    ["STR_MINUTE_SECOND", 3600, 2],
    ["PAGE_ID", 50, 1],
    ["DEVICE_BRAND", 5, 1],
    ["DEVIDE_TYPE", 60, 1],
    ["UID", 19543, 2]
  ],
  "stream_source_checkpoint": "{\"0\":363171,\"1\":363198,\"2\":363249,\"3\":363171,\"4\":363199,\"5\":363250,\"6\":363170,\"7\":363196,\"8\":363250,\"9\":363170}"
}
```
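For anyone scripting this check, here is a minimal Python sketch (not part of the original comment) that pulls the `CubeInstance` JSON from step 2 and prints each READY segment's `stream_source_checkpoint`, then prints the checkpoint of the newest READY segment as the position consumption is expected to resume from (per step 4 below). The host, port, credentials, and the assumption that segments sit under a top-level `segments` key are placeholders inferred from this comment, not verified API details:

```python
# Sketch only: fetch the CubeInstance JSON described in step 2 and print the
# Kafka offsets stored in stream_source_checkpoint for every READY segment.
# Assumptions: endpoint/port/credentials are deployment specific, and the
# segment list is exposed under a top-level "segments" key, as suggested by
# the JSON fragment above. Requires the third-party `requests` package.
import json
import requests

KYLIN_INSTANCE_IP = "127.0.0.1"   # assumption: replace with your Kylin node
CUBE_NAME = "UserAnalysisCube"    # cube name taken from the receiver log in step 3
AUTH = ("ADMIN", "KYLIN")         # assumption: default Kylin credentials

# Endpoint as given in step 2 of this comment
url = f"http://{KYLIN_INSTANCE_IP}:7236/kylin/cubes/view/{CUBE_NAME}/instancejson"

resp = requests.get(url, auth=AUTH, timeout=30)
resp.raise_for_status()
cube = resp.json()

ready_segments = [s for s in cube.get("segments", []) if s.get("status") == "READY"]

# stream_source_checkpoint is a JSON string mapping partition id -> offset
for seg in ready_segments:
    checkpoint = json.loads(seg["stream_source_checkpoint"])
    print(seg["name"], "->", checkpoint)

# Per step 4 below, after the cube is re-enabled the receivers are expected to
# resume consumption from the checkpoint of the newest READY segment.
if ready_segments:
    latest = max(ready_segments, key=lambda s: s["date_range_end"])
    print("expected resume position:",
          json.loads(latest["stream_source_checkpoint"]))
```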
3. Check `${KYLIN_RECEIVER_HOME}/logs/kylin_streaming_receiver.log`; you will find output like the following:

```
2020-08-09 22:34:05,381 INFO [UserAnalysisCube_channel] storage.StreamingSegmentManager:645 : Print check point for cube UserAnalysisCube ,CheckPoint{sourceConsumePosition='{"0":381733,"1":381763,"2":381820,"3":381733,"4":381764,"5":381820,"6":381732,"7":381760,"8":381821,"9":381733}', persistedIndexes={1597006800000=13, 1597010400000=8}, longLatencyInfo=LongLatencyInfo{longLatencyEventCnts={20200808000000_20200808010000=3, 20200808060000_20200808070000=2, 20200809000000_20200809010000=2, 20200809060000_20200809070000=2}, totalLongLatencyEventCnt=9}, segmentSourceStartPosition={1597006800000={"0":363171,"1":363198,"2":363249,"3":363171,"4":363199,"5":363250,"6":363170,"7":363196,"8":363250,"9":363170}, 1597010400000={"0":375013,"1":375040,"2":375099,"3":375013,"4":375041,"5":375099,"6":375012,"7":375038,"8":375100,"9":375013}}, checkPointTime=1596983645381, totalCount=3817689, checkPointCount=5801}
```

These logs indicate that the data ingested and indexed on the receiver side is checkpointed at the following position:

```json
{
  "0": 375013,
  "1": 375040,
  "2": 375099,
  "3": 375013,
  "4": 375041,
  "5": 375099,
  "6": 375012,
  "7": 375038,
  "8": 375100,
  "9": 375013
}
```

4. When the cube is disabled, the data that has been ingested and indexed on the receiver side will be removed, so after scaling up we expect the receiver to continue its consumption after the following position:

```json
{
  "0": 363171,
  "1": 363198,
  "2": 363249,
  "3": 363171,
  "4": 363199,
  "5": 363250,
  "6": 363170,
  "7": 363196,
  "8": 363250,
  "9": 363170
}
```

5. So let's check whether that is correct.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Fail to consume kafka when partition number get larger
> ------------------------------------------------------
>
>                 Key: KYLIN-4683
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4683
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v3.0.2
>            Reporter: tianhui
>            Priority: Major
>         Attachments: image-2020-08-05-17-20-37-270.png
>
>
> I ran a test streaming cube with Kafka. At first, the topic had 3 partitions and the cube ran smoothly. But after I altered the Kafka topic to 7 partitions, all receivers stopped consuming. !image-2020-08-05-17-20-37-270.png!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)