[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2023-02-07 Thread via GitHub


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1422075410

   @danny0405 
   No. I mean the job gets stuck after the plan is executed and never enters
the second round.
   
   In our scenario, the compaction plan is generated by the streaming ingestion job.
   The async table service in the job manager seems to be problematic.
   I suggest moving the plan discovery code, together with the service mode
loop, into the source function.
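
To make the suggestion above concrete, here is a minimal sketch of a source-side loop that both discovers pending plans and drives the service mode cycle. It is plain Python rather than Flink or Hudi code. The names `plan_source`, `discover_pending`, `interval_seconds` and `max_rounds` are hypothetical, and skipping instants that were already emitted is an assumption of this sketch, not something the compactor is known to do.

```python
import time
from typing import Callable, Iterator, List


def plan_source(
    discover_pending: Callable[[], List[str]],
    interval_seconds: float,
    max_rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Hypothetical source loop: each round discovers pending plans and emits them."""
    emitted = set()  # assumption: an instant is handed downstream only once
    for round_no in range(max_rounds):
        # Plan discovery happens inside the source, once per round.
        for instant in discover_pending():
            if instant not in emitted:
                emitted.add(instant)
                yield instant
        # The service mode cycle also lives here: wait, then look again.
        if round_no < max_rounds - 1:
            sleep(interval_seconds)
```

With this shape the driver only has to submit the job once; the second round starts from inside the running source instead of from a new job submission.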
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2023-02-03 Thread via GitHub


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1416264406

   I think it's better to move the compaction plan generation code inside the
source function.
   
   I can open a PR if needed.





[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2023-02-03 Thread via GitHub


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1416200519

   @danny0405 Maybe I know what's going on.
   
   It does not work at all in an HA standalone cluster such as Aliyun VVP.
   
   **`compact()` doesn't exit after the previous patch; it just does nothing
after the task has finished.**
   With service mode disabled, the job never reaches the "done" state. It
hangs there, doing nothing.
   
   With service mode enabled, the job never enters the second round. It hangs
after the first round, so the timeline service rolls back again and again.
   
   
   
   **The compaction itself succeeds.**
   There is a commit file under the .hoodie directory after the first round,
   although there are no logs at all.
   
   
   So maybe it doesn't work when a new stream environment is used to execute the job.
   
   





[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2023-02-03 Thread via GitHub


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1415619208

   @danny0405 I've noticed a warning and a "recovered" log entry, and I don't
know whether they are related.
   
   I don't see these logs when service mode is disabled.
   ```LOG
   2023-02-02 22:05:18,868 INFO  org.apache.flink.api.java.typeutils.TypeExtractor[] - class org.apache.hudi.common.model.CompactionOperation does not contain a setter for field baseInstantTime
   2023-02-02 22:05:18,869 INFO  org.apache.flink.api.java.typeutils.TypeExtractor[] - Class class org.apache.hudi.common.model.CompactionOperation cannot be used as a POJO type because not all fields are valid POJO fields, and must be processed as GenericType. Please read the Flink documentation on "Data Types & Serialization" for details of the effect on performance.
   2023-02-02 22:05:18,884 WARN  org.apache.flink.resourceplan.applyagent.StreamGraphModifier [] - Path of resource plan is not specified, do nothing.
   2023-02-02 22:05:18,884 INFO  org.apache.flink.client.deployment.application.executors.EmbeddedExecutor [] - Job 6e03ee3092954b338d7b984d6918ce32 was recovered successfully.
   ```





[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2023-02-02 Thread via GitHub


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1413832186

   @danny0405 Here is the case.
   The command line used to start the offline program (service mode):
   
   ```SHELL
   --path oss://dengine-lake-zjk/cloudcode_prod/dwd_egc_adv_resp_intra 
   --compaction-max-memory 3072 
   --archive-min-commits 180 
   --archive-max-commits 2016 
   --seq LIFO 
   --compaction-tasks 16 
   --plan-select-strategy num_instants 
   --max-num-plans 16 
   --min-compaction-interval-seconds 30 
   --spillable_map_path /opt/flink/flink-tmp-dir/ 
   --service 
   -Dhadoop.fs.AbstractFileSystem.oss.impl=com.aliyun.jindodata.oss.OSS 
   -Dhadoop.fs.oss.impl=com.aliyun.jindodata.oss.JindoOssFileSystem 
   -Dhadoop.fs.oss.endpoint=cn-zhangjiakou.oss.aliyuncs.com 
   -Dhadoop.fs.oss.credentials.provider=com.aliyun.jindodata.oss.auth.SimpleCredentialsProvider 
   -Dhadoop.fs.oss.accessKeyId= 
   -Dhadoop.fs.oss.accessKeySecret=***
   ```
   https://user-images.githubusercontent.com/5518468/216350958-72f669a3-a721-4c34-ab6c-533c74840d6d.png
   The job runs the first round (16 instants for 120 files).
   
   Then the TM gets stuck here while the JM keeps rolling back the compaction
again and again; the compaction is never committed.
   
   
   The issue should be reopened.
   
   [tm-compaction.log](https://github.com/apache/hudi/files/10569942/tm-compaction.log)
   
   [jm-compaction.log](https://github.com/apache/hudi/files/10569944/jm-compaction.log)
   
   





[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2023-01-05 Thread GitBox


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1373053740

   @danny0405 Here is the situation I've run into:
   
   1. Start a job with service mode enabled. It's quite a large job (200+
file groups of 1 GB+ each, 100+ compaction tasks).
   2. The first round (loading all instants) finishes, and the second round
(newly added compaction tasks) starts rolling back the tasks that were just
completed in the first round.
   3. Looking into the log, I found that there is no commit after each
compaction task. So on entering the second round, the job has to roll back all
the tasks that were just done and run them again (the files have been created,
but there is no instant.commit file).
   4. The dirty files keep failing the second-round compaction (the final
parquet file already exists). I had to replace CREATE with OVERWRITE in the
code to avoid the failure.
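
The failure in step 4 comes down to create-only versus overwrite write semantics. The sketch below reproduces that behaviour with ordinary local files in Python. It is not Hudi code: `write_base_file`, `retry_after_uncommitted_round` and the file name `base-file.parquet` are made up for illustration.

```python
import os
import tempfile


def write_base_file(path: str, data: bytes, overwrite: bool) -> None:
    # "xb" is exclusive create: it raises FileExistsError if the file exists.
    # "wb" truncates and replaces any existing file.
    mode = "wb" if overwrite else "xb"
    with open(path, mode) as f:
        f.write(data)


def retry_after_uncommitted_round(overwrite: bool) -> str:
    """Write the same target twice, as a retried compaction task would."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "base-file.parquet")
        # First round: the file is written but the round is never committed.
        write_base_file(target, b"round-1", overwrite)
        try:
            # Second round: the same task is retried against the same path.
            write_base_file(target, b"round-2", overwrite)
        except FileExistsError:
            return "failed: file already exists"
        with open(target, "rb") as f:
            return f.read().decode()
```

Calling `retry_after_uncommitted_round(False)` hits the leftover file and fails, while `retry_after_uncommitted_round(True)` replaces it. Overwriting only hides the symptom; the missing commit after the first round is still the underlying problem.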





[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2023-01-05 Thread GitBox


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1372153368

   @danny0405 I know. There is only one job doing offline compaction, and this
job uses multiple slots (parallelism) to do the compaction.
   
   You can see there is no commit after the compaction finishes, which is
abnormal compared with running with service mode disabled.





[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2023-01-04 Thread GitBox


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1371761701

   @danny0405 It seems the compaction is not committed after it finishes in
service mode. A lot of rollbacks show up in the logs.
   [logs.zip](https://github.com/apache/hudi/files/10349252/logs.zip)
   





[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2023-01-01 Thread GitBox


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1368700671

   @danny0405 Thanks for the fix, I will give it a try.
   Yeah, I'm using Aliyun VVP/VVR, but the bundle is compiled by myself and
provided manually as an individual jar, so it does not depend on the environment.
   





[GitHub] [hudi] Leoyzen commented on issue #7546: [SUPPORT]Fail to execute offline flink compactor in service mode.

2022-12-31 Thread GitBox


Leoyzen commented on issue #7546:
URL: https://github.com/apache/hudi/issues/7546#issuecomment-1368346154

   @danny0405 @yihua Is there any solution right now? We have a source that
produces 100k+ TPS, and the taskmanager keeps crashing with timeouts when using
online compaction.

