[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1422075410

@danny0405 No. I mean the job gets stuck after executing the plan and can't enter the second round. In our scenario, the compaction plan is generated by the streaming ingestion job, and the async table service in the job manager seems to be problematic. I suggest moving the plan discovery code, together with the service mode loop, into the source function.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1416264406

I think it's better to move the code that generates the compaction plan into the source function. I can make a PR if needed.
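To illustrate the proposal, here is a minimal, language-neutral sketch in Python of a source that owns both plan discovery and the service-mode loop. It is not Hudi or Flink code: `plan_source`, `discover`, and `max_rounds` are hypothetical names, and a real source would wait for the configured interval between rounds instead of stopping after a fixed count.

```python
import itertools
from typing import Callable, Iterator, List


def plan_source(discover: Callable[[], List[str]],
                service_mode: bool,
                max_rounds: int = 3) -> Iterator[str]:
    """Emit compaction instants, re-discovering plans at the start of each round."""
    seen = set()
    for round_no in itertools.count(1):
        # Plan discovery happens inside the source, once per round.
        for instant in discover():
            if instant not in seen:  # skip instants already handed downstream
                seen.add(instant)
                yield instant
        # Without service mode the source ends after a single round.
        # max_rounds only bounds this sketch; a real service would keep looping.
        if not service_mode or round_no >= max_rounds:
            return
```

Because the loop lives in the source, a new round only needs another call to `discover()`, and the job does not have to be resubmitted to pick up plans written by the ingestion job.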
[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1416200519

@danny0405 I think I may know what's going on. It does not work at all on an HA standalone cluster such as Aliyun VVP.

**`compact()` doesn't quit after the previous patch; it just does nothing after the task has finished.**
- With service mode disabled, the job never reaches "done". It hangs there doing nothing.
- With service mode enabled, the job never enters the second round. It hangs after the first round, so the timeline service rolls back again and again.

**The compaction succeeds.**
There is a commit file under the `.hoodie` directory after the first round, although there are no logs at all.

So maybe it doesn't work when a new `StreamExecutionEnvironment` is used to execute the job.
[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1415619208

@danny0405 I've noticed a warning and a "recovered" log entry, and I don't know whether they are related. I don't see these log entries when service mode is disabled.

```LOG
2023-02-02 22:05:18,868 INFO org.apache.flink.api.java.typeutils.TypeExtractor[] - class org.apache.hudi.common.model.CompactionOperation does not contain a setter for field baseInstantTime
2023-02-02 22:05:18,869 INFO org.apache.flink.api.java.typeutils.TypeExtractor[] - Class class org.apache.hudi.common.model.CompactionOperation cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
2023-02-02 22:05:18,884 WARN org.apache.flink.resourceplan.applyagent.StreamGraphModifier [] - Path of resource plan is not specified, do nothing.
2023-02-02 22:05:18,884 INFO org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - Job 6e03ee3092954b338d7b984d6918ce32 was recovered successfully.
```
[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1413832186

@danny0405 Here is the case. The command-line arguments used to start the offline program (service mode):

```SHELL
--path oss://dengine-lake-zjk/cloudcode_prod/dwd_egc_adv_resp_intra
--compaction-max-memory 3072
--archive-min-commits 180
--archive-max-commits 2016
--seq LIFO
--compaction-tasks 16
--plan-select-strategy num_instants
--max-num-plans 16
--min-compaction-interval-seconds 30
--spillable_map_path /opt/flink/flink-tmp-dir/
--service
-Dhadoop.fs.AbstractFileSystem.oss.impl=com.aliyun.jindodata.oss.OSS
-Dhadoop.fs.oss.impl=com.aliyun.jindodata.oss.JindoOssFileSystem
-Dhadoop.fs.oss.endpoint=cn-zhangjiakou.oss.aliyuncs.com
-Dhadoop.fs.oss.credentials.provider=com.aliyun.jindodata.oss.auth.SimpleCredentialsProvider
-Dhadoop.fs.oss.accessKeyId=
-Dhadoop.fs.oss.accessKeySecret=***
```

Screenshot: https://user-images.githubusercontent.com/5518468/216350958-72f669a3-a721-4c34-ab6c-533c74840d6d.png

The job runs the first round (16 instants, 120 files). Then the TM gets stuck while the JM keeps rolling back the compaction again and again; the compaction is never committed. The issue should be reopened.

[tm-compaction.log](https://github.com/apache/hudi/files/10569942/tm-compaction.log)
[jm-compaction.log](https://github.com/apache/hudi/files/10569944/jm-compaction.log)
[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1373053740

@danny0405 Here is the situation I've run into:

1. I start a job with service mode enabled. It's quite a large job (200+ file groups of 1 GB+ each, 100+ compaction tasks).
2. The first round (loading all instants) finishes, and the second round (newly added compaction tasks) starts rolling back the tasks that were just completed in the first round.
3. Looking into the log, I found that nothing is committed after each compaction task. So on entering the second round, the job has to roll back all the tasks that were just completed and run them again (the files have been created, but there is no instant.commit file).
4. The dirty files keep the second-round compaction failing (the final parquet file already exists). I have to replace CREATE with OVERWRITE in the code to avoid the failure.
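The failure in step 4 can be reproduced in miniature. The sketch below is not Hudi code: Python's exclusive-create mode (`"xb"`) and truncating mode (`"wb"`) stand in for the CREATE and OVERWRITE write modes mentioned above, and `write_base_file` and the file name are hypothetical.

```python
import os
import tempfile


def write_base_file(path: str, data: bytes, overwrite: bool) -> None:
    # "xb" raises FileExistsError if the file is already there (create-only);
    # "wb" truncates and replaces whatever exists (overwrite).
    mode = "wb" if overwrite else "xb"
    with open(path, mode) as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "base-file.parquet")

    # Round 1 writes the file, but the instant is never committed.
    write_base_file(target, b"round 1", overwrite=False)

    # Round 2 retries the same task and hits the leftover file.
    try:
        write_base_file(target, b"round 2", overwrite=False)
        retry_failed = False
    except FileExistsError:
        retry_failed = True

    # Switching to overwrite lets the retry go through.
    write_base_file(target, b"round 2", overwrite=True)
    with open(target, "rb") as f:
        final_content = f.read()
```

Overwriting only hides the symptom: the retry exists because the first round's result was never committed.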
[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1372153368

@danny0405 I know. There is only one job doing offline compaction, and this job uses multiple slots (parallelism) to do the compaction. You can see that there is no commit after the compaction finishes, which is abnormal compared with running with service mode disabled.
[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1371761701

@danny0405 It seems the compaction is not committed after it finishes in service mode. A lot of rollbacks show up in the logs.

[logs.zip](https://github.com/apache/hudi/files/10349252/logs.zip)
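A toy model of why a missing commit leads to repeated rollbacks, assuming a simplified timeline in which each compaction instant is requested, inflight, or completed, and each round first rolls inflight instants back to requested. The names `State` and `run_round` are hypothetical, and this is not how Hudi implements its timeline.

```python
from enum import Enum
from typing import Dict, List


class State(Enum):
    REQUESTED = "requested"
    INFLIGHT = "inflight"
    COMPLETED = "completed"


def run_round(timeline: Dict[str, State], commit: bool) -> List[str]:
    """Run one compaction round and return the instants that were executed."""
    # Inflight instants left behind by an earlier round are rolled back first.
    for instant in list(timeline):
        if timeline[instant] is State.INFLIGHT:
            timeline[instant] = State.REQUESTED

    executed = [i for i, s in timeline.items() if s is State.REQUESTED]
    for instant in executed:
        timeline[instant] = State.INFLIGHT
        if commit:
            # Only a commit moves the instant out of the inflight state.
            timeline[instant] = State.COMPLETED
    return executed


# Without a commit, the second round redoes everything from the first.
no_commit = {"001": State.REQUESTED, "002": State.REQUESTED}
first = run_round(no_commit, commit=False)
second = run_round(no_commit, commit=False)

# With a commit, the second round finds nothing left to do.
with_commit = {"001": State.REQUESTED, "002": State.REQUESTED}
run_round(with_commit, commit=True)
after_commit = run_round(with_commit, commit=True)
```

In this model `first` and `second` contain the same instants, which matches the repeated rollbacks in the logs, while `after_commit` is empty.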
[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1368700671

@danny0405 Thanks for the fix, I will give it a try. Yes, I'm using Aliyun VVP/VVR, but I compiled the bundle myself and provide it manually as an individual jar, so it does not depend on the environment.
[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.
Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1368346154

@danny0405 @yihua Is there any solution right now? We have a source that produces 100k+ TPS, and the TaskManager keeps crashing with timeouts when using online compaction.