Hi JianFeng,
It seems that something is wrong with the image, so I cannot see it on
my side. I am still glad to share some information about your first
question.
The name of a base file is composed as {fileID}_{writeToken}_{instant}.
For the write token, the method makeWriteToken in
org.apache.hudi.common.fs.FSUtils shows how it is generated from three
pieces of Spark task information. As far as I know, the write token is
designed to distinguish files in the same file group that were generated
by different task attempts.
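To make the naming concrete, here is a small Python sketch that splits a
base file name into its parts, using the file name from your message as
input. It is only an illustration, not Hudi code: parse_base_file_name and
BaseFileName are hypothetical names, and I am assuming from memory that the
three parts of the write token are task partition id, stage id and task
attempt id, in that order. Please check makeWriteToken to confirm.

```python
from typing import NamedTuple


class BaseFileName(NamedTuple):
    file_id: str
    task_partition_id: int  # assumed meaning of the 1st write token part
    stage_id: int           # assumed meaning of the 2nd write token part
    task_attempt_id: int    # assumed meaning of the 3rd write token part
    instant: str


def parse_base_file_name(name: str) -> BaseFileName:
    """Split '{fileID}_{writeToken}_{instant}.ext' into its components."""
    stem = name.rsplit(".", 1)[0]  # drop the file extension
    # The fileID contains '-' but no '_', so a plain split on '_' is enough.
    file_id, write_token, instant = stem.split("_")
    partition_id, stage_id, attempt_id = write_token.split("-")
    return BaseFileName(
        file_id, int(partition_id), int(stage_id), int(attempt_id), instant
    )


parsed = parse_base_file_name(
    "813800cd-1aaf-43ea-829f-4feef4a51cb3-0_19-2672-4427765_20211006051032.parquet"
)
print(parsed)
```

Under that assumption, 2672-4427765 in your file name would be the stage id
and the task attempt id of the task that wrote the file.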
Let me share a scenario. In a Spark compaction job, speculation is
allowed. Two task attempts may try to generate a base file for the same
file group, but only the file written by the successful attempt is finally
picked up by Hudi. The file names returned by the successful tasks are used
to select the files we want. The reconcileAgainstMarkers method in class
HoodieTable shows how this process works.
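The idea can be modelled in a few lines of Python. This is a simplified
sketch of the reconciliation idea only, not the actual logic of
reconcileAgainstMarkers; the function name find_invalid_files and the file
names below are hypothetical.

```python
from typing import Iterable, Set


def find_invalid_files(
    files_created: Iterable[str], files_from_successful_tasks: Iterable[str]
) -> Set[str]:
    """Return files that were created but not reported by a successful task.

    In this simplified model, these are the leftovers of failed or
    speculative task attempts and would be cleaned up before the commit.
    """
    return set(files_created) - set(files_from_successful_tasks)


# Two attempts wrote a base file for the same file group and instant;
# they differ only in the write token (hypothetical names).
created = [
    "f1-0_0-10-100_001.parquet",  # attempt that succeeded
    "f1-0_0-10-101_001.parquet",  # speculative attempt that lost
]
reported = ["f1-0_0-10-100_001.parquet"]

invalid = find_invalid_files(created, reported)
print(invalid)
```

If a file from a losing attempt is not cleaned up this way, two base files
with the same fileID and instant but different write tokens would remain,
which looks like what you describe.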
I have no idea how this problem occurs; it should not happen with the
default config on HDFS. I hope this information helps.
By the way, there is a WeChat account that has shared some excellent
articles in Chinese about Hudi. For those who read Chinese, the following
article may provide more information. Many thanks to the author.
https://mp.weixin.qq.com/s?__biz=MzIyMzQ0NjA0MQ==&mid=2247484306&idx=1&sn=1d853469159a600d82050c17e6a2a075&chksm=e81f56e4df68dff2da417109c4a971aef54f056bc0519558c58e23fe60b90dc6e4f8d7e92774&token=1688466117&lang=zh_CN#rd
On Wed, Oct 6, 2021 at 1:35 PM Jian Feng <[email protected]> wrote:
> When I run DeltaStreamer (version 0.9) to ingest data from Kafka into an
> HBase-indexed MOR table, after a few commits I hit this error while
> compaction is running:
> [image: image.png]
>
> In HDFS there is a file that has the same fileId and commit instant but
> differs in the middle:
> hdfs://tl5/projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_db__item_v4_tab_newHbase/BR/2021-10/813800cd-1aaf-43ea-829f-4feef4a51cb3-0_19-2672-4427765_
> *20211006051032*.parquet
>
> Below is the content of 20211006051032.commit:
>
>
> [image: image.png]
>
>
> What do 2672-4427765 and 2657-4368242 mean? And how can I fix this error?
>
> I tried recreating the table, but it happened again.
>
>
> --
> *Jian Feng,冯健*
> Shopee | Engineer | Data Infrastructure
>