Re: [DISCUSS] Hudi data TTL

2022-10-18 Thread Jian Feng
Good idea,
this is definitely worth an RFC.
Btw, should it only depend on Hudi's partitions? I feel it should be a more
general feature, since sometimes customers' data cannot be updated across
partitions.


On Wed, Oct 19, 2022 at 11:07 AM stream2000 <18889897...@163.com> wrote:

> Hi all, we have implemented partition-based data TTL management, with
> which we can manage TTL for Hudi partitions by size, expiration time, and
> sub-partition count. When a partition is detected as outdated, we use the
> delete-partition interface to delete it, which generates a replace
> commit to mark the data as deleted. The real deletion is then done by the
> clean service.
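For illustration, the delete-partition call described above looks roughly like
this through the Spark datasource (a minimal sketch; option names are from
recent Hudi releases, and the table name, base path, and partition values are
hypothetical):

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SaveMode;
    import org.apache.spark.sql.SparkSession;

    public class ExpirePartitionsSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("hudi-partition-ttl").getOrCreate();

            // The operation only needs the list of partitions, so an empty dataframe is enough.
            Dataset<Row> empty = spark.emptyDataFrame();

            empty.write().format("hudi")
                 .option("hoodie.table.name", "my_table")                                 // hypothetical
                 .option("hoodie.datasource.write.operation", "delete_partition")
                 .option("hoodie.datasource.write.partitions.to.delete", "2021/01/01,2021/01/02")
                 // record key / precombine and other required write configs omitted for brevity
                 .mode(SaveMode.Append)
                 .save("/path/to/hudi/table");                                            // hypothetical

            spark.stop();
        }
    }

This produces the replace commit that marks the partitions as deleted; the clean
service later removes the files physically, as described above.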
>
>
> If the community is interested in this idea, maybe we can propose an RFC to
> discuss it in detail.
>
>
> > On Oct 19, 2022, at 10:06, Vinoth Chandar  wrote:
> >
> > +1, would love to discuss this on an RFC proposal.
> >
> > On Tue, Oct 18, 2022 at 13:11 Alexey Kudinkin 
> wrote:
> >
> >> That's a very interesting idea.
> >>
> >> Do you want to take a stab at writing a full proposal (in the form of
> RFC)
> >> for it?
> >>
> >> On Tue, Oct 18, 2022 at 10:20 AM Bingeng Huang 
> >> wrote:
> >>
> >>> Hi all,
> >>>
> >>> Do we have a plan to integrate data TTL into Hudi, so we don't have to
> >>> schedule an offline Spark job to delete outdated data? We would just set a
> >>> TTL config, and then the writer or some offline service would delete old
> >>> data as expected.
> >>>
> >>
>
>

-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure


Re: [DISCUSS] New RFC to support Lock-free concurrency control on Merge-on-read tables

2022-03-24 Thread Jian Feng
Sure, I'm working on it. I will add you as a co-author when I create a PR.

On Fri, Mar 25, 2022 at 1:17 AM Vinoth Chandar  wrote:

> +1. Love to be a co-author on the RFC, if you are open to it.
>
> On Mon, Mar 21, 2022 at 12:31 PM 冯健  wrote:
>
> > Hi team,
> >
> > The situation is that Optimistic Concurrency Control (OCC) has some
> > limitations:
> >
> >- When conflicts do occur, they may waste massive resources on every
> >attempt (lakehouse-concurrency-control-are-we-too-optimistic:
> >https://hudi.apache.org/blog/2021/12/16/lakehouse-concurrency-control-are-we-too-optimistic).
> >- Multiple writers may cause data duplicates when records with the same
> >new record key arrive (multi-writer-guarantees:
> >https://hudi.apache.org/docs/concurrency_control#multi-writer-guarantees).
> >
> > Some background: with OCC, we assume multiple writers won't write data to
> > the same file ID most of the time; if there is a file-ID-level conflict,
> > the commit will be rolled back. A file-ID-level conflict check also can't
> > guarantee no duplicates if two records with the same new record key arrive
> > at multiple writers, since the key-to-bucket mapping is not consistent with
> > the Bloom index.
> >
> > What I plan to do is support lock-free concurrency control with a
> > no-duplicates guarantee in Hudi (only for Merge-On-Read tables):
> >
> >- With an index that can index log files (canIndexLogFiles), multiple
> >writers ingesting data into Merge-On-Read tables only append data to
> >delta logs. This is a lock-free process if we can make sure they don't
> >write data to the same log file (the plan is to create multiple marker
> >files to achieve this). And with the log merge API (the preCombine logic
> >in the Payload class), data in the log files can be read properly.
> >- Since Hudi already has an index type, the bucket index, which maps
> >keys to buckets in a consistent way, data duplicates can be eliminated
> >(see the config sketch below).
> >
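A rough sketch of the bucket-index write configs this plan builds on (config
names as in recent Hudi releases; the bucket count and hash field are
illustrative):

    import java.util.HashMap;
    import java.util.Map;

    public class BucketIndexConfigSketch {
        public static Map<String, String> bucketIndexOptions() {
            Map<String, String> opts = new HashMap<>();
            opts.put("hoodie.datasource.write.table.type", "MERGE_ON_READ");
            // The bucket index maps a record key to a bucket (file group)
            // deterministically, so two writers hashing the same key always
            // target the same file group.
            opts.put("hoodie.index.type", "BUCKET");
            opts.put("hoodie.bucket.index.num.buckets", "256");        // illustrative
            opts.put("hoodie.bucket.index.hash.field", "record_key");  // hypothetical field name
            return opts;
        }
    }

Since writers only append delta log files under this plan, the remaining piece
is making sure two writers never append to the same log file, which is what the
marker-file change above is meant to guarantee.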
> >
> > Thanks,
> > Jian Feng
> >
>


-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure


Re: [DISCUSS] Trino Plugin for Hudi

2021-10-20 Thread Jian Feng
When will Trino support snapshot queries on Merge-on-Read tables?

On Mon, Oct 18, 2021 at 9:06 PM 周康  wrote:

> +1. I have sent a message on the Trino Slack; I really appreciate the new
> Trino plugin/connector.
> https://trinodb.slack.com/archives/CP1MUNEUX/p1623838591370200
>
> Looking forward to the RFC and more discussion.
>
> On 2021/10/17 06:06:09 sagar sumit wrote:
> > Dear Hudi Community,
> >
> > I would like to propose the development of a new Trino plugin/connector
> for
> > Hudi.
> >
> > Today, Hudi supports snapshot queries on Copy-On-Write (COW) tables and
> > read-optimized queries on Merge-On-Read tables with Trino, through the
> > input format based integration in the Hive connector [1
> > <https://github.com/prestodb/presto/commits?author=vinothchandar>]. This
> > approach has known performance limitations with very large tables, which
> > have since been fixed on PrestoDB [2
> > <https://prestodb.io/blog/2020/08/04/prestodb-and-hudi>]. We are
> working on
> > replicating the same fixes on Trino as well [3
> > <https://github.com/trinodb/trino/pull/9641>].
> >
> > However, as Hudi keeps getting better, a new plugin to provide access to
> > Hudi data and metadata will help in unlocking those capabilities for the
> > Trino users. Just to name a few benefits, metadata-based listing, full
> > schema evolution, etc [4
> > <
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution
> >].
> > Moreover, a separate Hudi connector would allow its independent evolution
> > without having to worry about hacking/breaking the Hive connector.
> >
> > A separate connector also falls in line with our vision [5
> > <
> https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform#timeline-metaserver
> >]
> > when we think of a standalone timeline server or a lake cache to balance
> > the tradeoff between writing and querying. Imagine users having read and
> > write access to data and metadata in Hudi directly through Trino.
> >
> > I did some prototyping to get the snapshot queries on a Hudi COW table
> > working with a new plugin [6
> > <https://github.com/codope/trino/tree/hudi-plugin>], and I feel the
> effort
> > is worth it. High-level approach is to implement the connector SPI [7
> > <https://trino.io/docs/current/develop/connectors.html>] provided by
> Trino
> > such as:
> > a) HudiMetadata implements ConnectorMetadata to fetch table metadata.
> > b) HudiSplit and HudiSplitManager implement ConnectorSplit and
> > ConnectorSplitManager to produce logical units of data partitioning, so
> > that Trino can parallelize reads and writes.
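For a sense of the shape of (a), here is a bare-bones sketch against the Trino
SPI (the interfaces are Trino's; the Hudi-specific bodies are placeholders
rather than the actual prototype code, and the exact method set varies across
Trino versions):

    import io.trino.spi.connector.ConnectorMetadata;
    import io.trino.spi.connector.ConnectorSession;
    import io.trino.spi.connector.ConnectorTableHandle;
    import io.trino.spi.connector.SchemaTableName;

    import java.util.List;

    public class HudiMetadataSketch implements ConnectorMetadata {
        @Override
        public List<String> listSchemaNames(ConnectorSession session) {
            // Would delegate to the metastore backing the Hudi tables.
            return List.of("default");
        }

        @Override
        public ConnectorTableHandle getTableHandle(ConnectorSession session, SchemaTableName tableName) {
            // Would resolve the Hudi table's base path and type (COW/MOR) here.
            return null;
        }

        // ...remaining ConnectorMetadata methods (table/column metadata, etc.) omitted.
    }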
> >
> > Let me know your thoughts on the proposal. I can draft an RFC for the
> > detailed design discussion once we have consensus.
> >
> > Regards,
> > Sagar
> >
> > References:
> > [1] https://github.com/prestodb/presto/commits?author=vinothchandar
> > [2] https://prestodb.io/blog/2020/08/04/prestodb-and-hudi
> > [3] https://github.com/trinodb/trino/pull/9641
> > [4]
> >
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+33++Hudi+supports+more+comprehensive+Schema+Evolution
> > [5]
> >
> https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform#timeline-metaserver
> > [6] https://github.com/codope/trino/tree/hudi-plugin
> > [7] https://trino.io/docs/current/develop/connectors.html
> >
>


-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure


Re: Is there a solution to the HBase data skew issue?

2021-10-17 Thread Jian Feng
I have a question: when the Delta Streamer does a delta commit with the Bloom
index, if data is new, it may need to append records to an existing file group.
Meanwhile, this may cause a concurrency issue with the async compaction thread
if the compaction plan contains the same file group. How does Hudi avoid that?

On Fri, Oct 15, 2021 at 12:50 AM Vinoth Chandar  wrote:

> Yeah all the rate limiting code in HBaseIndex is working around for these
> large bulk writes.
>
> On Tue, Oct 5, 2021 at 11:16 AM Jian Feng  wrote:
>
> > Actually, I met this problem when bootstrapping a huge table; after I
> > changed the region key split strategy, the problem was solved.
> > I'm glad to hear that the HFile solution will work in the future. Since the
> > Bloom index cannot index MOR log files, newly inserted data is still
> > written into Parquet; that's why I chose the HBase index, to get better
> > performance.
> >
> > Vinoth Chandar wrote on Tue, Oct 5, 2021 at 7:29 PM:
> >
> > > +1 on that answer. It's pretty spot on.
> > >
> > > Even as random prefix helps with HBase balancing, the issue then
> becomes
> > > that you lose all the key ordering inside the Hudi table, which
> > > can be a nice thing if you even want range pruning/indexing to be
> > > effective.
> > >
> > > To paint a picture of all the work being done around this area: this
> > > work, driven by Uber engineers (https://github.com/apache/hudi/pull/3508),
> > > could technically solve the issue by directly reading HFiles
> > > for the indexing, avoiding going to HBase servers. But obviously, it
> > could
> > > be less performant for small upsert batches than HBase (given the
> region
> > > servers will cache etc).
> > > If your backing storage is a cloud/object storage, which again
> throttles
> > by
> > > prefixes etc, then we could run into the same hotspotting problem
> again.
> > > Otherwise, for larger batches, this would be far more scalable.
> > >
> > >
> > > On Mon, Oct 4, 2021 at 7:06 PM 管梓越 
> wrote:
> > >
> > > > Hi Jianfeng,
> > > >   As far as I know, there may not be a solution on the Hudi side yet.
> > > > However, I have met this problem before, so I hope my experience can
> > > > help. Just like with other usages of HBase, adding a random prefix to
> > > > the rowkey may be the most universal solution to this problem.
> > > > We can change the primary key for Hudi by adding such a prefix before
> > > > the data is ingested into Hudi. A new column could be added to save the
> > > > original primary key for queries and hide Hudi's primary key.
> > > > Alternatively, we can make a small modification to the HBase index:
> > > > copy the code of the HBase index and add the prefix when querying and
> > > > updating HBase. This way, the primary key in HBase will be different
> > > > from the one in Hudi, but such logic will be transparent to the business
> > > > logic. I have adopted this method in a prod environment. Using the
> > > > withIndexClass config in IndexConfig, you can specify a custom index,
> > > > which allows changing the index without recompiling the whole Hudi
> > > > project.
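A rough illustration of the prefixing idea (plain Java, not tied to any Hudi
API; the prefix here is a hash of the key rather than truly random so that
index lookups can recompute it, and the bucket count is arbitrary):

    public final class SaltedKeySketch {
        private static final int SALT_BUCKETS = 100; // e.g. roughly one salt value per region

        static String salt(String recordKey) {
            int bucket = Math.floorMod(recordKey.hashCode(), SALT_BUCKETS);
            return String.format("%02d:%s", bucket, recordKey); // spreads rows across regions
        }

        static String unsalt(String saltedKey) {
            return saltedKey.substring(saltedKey.indexOf(':') + 1); // recover the original key
        }

        public static void main(String[] args) {
            String salted = salt("itemid:12345");
            System.out.println(salted + " -> " + unsalt(salted));
        }
    }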
> > > >
> > > > On Mon, Oct 4, 2021, 11:29 PM  wrote:
> > > > When I bootstrapped a huge HBase-indexed table, I found all keys have
> > > > the prefix 'itemid:', which caused data skew: there are 100 region
> > > > servers in HBase but only one was handling data. Is there any way to
> > > > avoid this issue on the Hudi side? -- *Jian Feng,冯健* Shopee | Engineer
> > > > | Data Infrastructure
> > > >
> > >
> > --
> > Jian Feng
> >
>


-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure


Re: [Delta Streamer] file name mismatch with meta when compaction running

2021-10-06 Thread Jian Feng
Can you see the pictures here? https://github.com/apache/hudi/issues/3755
Thanks! Let me read that article. I'm trying to create another Bloom-indexed
MOR table to see if the problem still exists.

On Wed, Oct 6, 2021 at 2:54 PM 管梓越  wrote:

> Hi Jianfeng,
> It seems that there might be something wrong with the image, as I'm not able
> to see it on my side. I'm pleased to share some info about your first
> question.
> The name of a base file is composed as {fileID}_writeToken_instant. For the
> write token, the method makeWriteToken in org.apache.hudi.common.fs.FSUtils
> shows how it is generated from three pieces of Spark task information. As far
> as I know, the write token is designed to distinguish files in the same file
> group generated by different task attempts.
> Let me share a scenario. In a Spark compaction job, speculation is allowed.
> Two task attempts may try to generate a base file for the same file group, so
> only the file written by the task that succeeded is finally picked by Hudi.
> We use the file name returned by the succeeded task to get the one we want.
> The reconcileAgainstMarkers method in the HoodieTable class shows how this
> process works.
> I have no idea how this problem occurred; it should not happen with the
> default config and HDFS. I hope this info helps you.
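For illustration, the reported base-file name breaks down roughly like this (a
minimal parsing sketch, not the actual FSUtils code; the write-token
interpretation is my reading of makeWriteToken):

    public class BaseFileNameSketch {
        public static void main(String[] args) {
            String name = "813800cd-1aaf-43ea-829f-4feef4a51cb3-0_19-2672-4427765_20211006051032.parquet";

            String[] parts = name.substring(0, name.lastIndexOf('.')).split("_");
            String fileId     = parts[0]; // identifies the file group
            String writeToken = parts[1]; // "19-2672-4427765": Spark task partition id, stage id, task attempt id
            String instant    = parts[2]; // "20211006051032": commit instant time

            // So "2672-4427765" vs "2657-4368242" points at two different task
            // attempts writing the same file group for the same instant; the
            // marker-file reconciliation decides which file is the valid one.
            System.out.println(fileId + " | " + writeToken + " | " + instant);
        }
    }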
> By the way, there is a WeChat account that has shared some excellent articles
> in Chinese about Hudi. For those who read Chinese, the following article may
> provide more information. Many thanks to the author.
>
>
> https://mp.weixin.qq.com/s?__biz=MzIyMzQ0NjA0MQ===2247484306=1=1d853469159a600d82050c17e6a2a075=e81f56e4df68dff2da417109c4a971aef54f056bc0519558c58e23fe60b90dc6e4f8d7e92774=1688466117=zh_CN#rd
>
> On Wed, Oct 6, 2021 at 1:35 PM Jian Feng  wrote:
>
> > When I run the Delta Streamer (version 0.9) to ingest data from Kafka into
> > an HBase-indexed MOR table, after a few commits I hit this error while
> > compaction was running:
> > [image: image.png]
> >
> > In HDFS there is a file that has the same fileId and commit instant but
> > differs in the middle:
> >
> > hdfs://tl5/projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_db__item_v4_tab_newHbase/BR/2021-10/813800cd-1aaf-43ea-829f-4feef4a51cb3-0_19-2672-4427765_*20211006051032*.parquet
> >
> > Below is the content of 20211006051032.commit:
> >
> >
> > [image: image.png]
> >
> >
> > What do 2672-4427765 and 2657-4368242 mean, and how can I fix this error?
> >
> > I tried recreating the table, and it happens again.
> >
> >
> > --
> > *Jian Feng,冯健*
> > Shopee | Engineer | Data Infrastructure
> >
>


-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure


[Delta Streamer] file name mismatch with meta when compaction running

2021-10-05 Thread Jian Feng
When I run the Delta Streamer (version 0.9) to ingest data from Kafka into an
HBase-indexed MOR table, after a few commits I hit this error while compaction
was running:
[image: image.png]

In HDFS there is a file that has the same fileId and commit instant but differs
in the middle:
hdfs://tl5/projects/data_vite/mysql_ingestion/rti_vite/shopee_item_v4_db__item_v4_tab_newHbase/BR/2021-10/813800cd-1aaf-43ea-829f-4feef4a51cb3-0_19-2672-4427765_*20211006051032*.parquet

Below is the content of 20211006051032.commit:


[image: image.png]


What do 2672-4427765 and 2657-4368242 mean, and how can I fix this error?

I tried recreating the table, and it happens again.


-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure


Re: Is there a solution to the HBase data skew issue?

2021-10-05 Thread Jian Feng
Actually, I met this problem when bootstrapping a huge table; after I changed
the region key split strategy, the problem was solved.
I'm glad to hear that the HFile solution will work in the future. Since the
Bloom index cannot index MOR log files, newly inserted data is still written
into Parquet; that's why I chose the HBase index, to get better performance.

Vinoth Chandar wrote on Tue, Oct 5, 2021 at 7:29 PM:

> +1 on that answer. It's pretty spot on.
>
> Even as random prefix helps with HBase balancing, the issue then becomes
> that you lose all the key ordering inside the Hudi table, which
> can be a nice thing if you even want range pruning/indexing to be
> effective.
>
> To paint a picture of all the work being done around this area: this work,
> driven by Uber engineers (https://github.com/apache/hudi/pull/3508), could
> technically solve the issue by directly reading HFiles
> for the indexing, avoiding going to HBase servers. But obviously, it could
> be less performant for small upsert batches than HBase (given the region
> servers will cache etc).
> If your backing storage is a cloud/object storage, which again throttles by
> prefixes etc, then we could run into the same hotspotting problem again.
> Otherwise, for larger batches, this would be far more scalable.
>
>
> On Mon, Oct 4, 2021 at 7:06 PM 管梓越  wrote:
>
> > Hi Jianfeng,
> >   As far as I know, there may not be a solution on the Hudi side yet.
> > However, I have met this problem before, so I hope my experience can help.
> > Just like with other usages of HBase, adding a random prefix to the rowkey
> > may be the most universal solution to this problem.
> > We can change the primary key for Hudi by adding such a prefix before the
> > data is ingested into Hudi. A new column could be added to save the
> > original primary key for queries and hide Hudi's primary key.
> > Alternatively, we can make a small modification to the HBase index: copy
> > the code of the HBase index and add the prefix when querying and updating
> > HBase. This way, the primary key in HBase will be different from the one in
> > Hudi, but such logic will be transparent to the business logic. I have
> > adopted this method in a prod environment. Using the withIndexClass config
> > in IndexConfig, you can specify a custom index, which allows changing the
> > index without recompiling the whole Hudi project.
> >
> > On Mon, Oct 4, 2021, 11:29 PM  wrote:
> > When I bootstrapped a huge HBase-indexed table, I found all keys have the
> > prefix 'itemid:', which caused data skew: there are 100 region servers in
> > HBase but only one was handling data. Is there any way to avoid this issue
> > on the Hudi side? -- *Jian Feng,冯健* Shopee | Engineer | Data Infrastructure
> >
>
-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure


Is there a solution to the HBase data skew issue?

2021-10-04 Thread Jian Feng
When I bootstrapped a huge HBase-indexed table, I found all keys have the
prefix 'itemid:', which caused data skew: there are 100 region servers in HBase
but only one was handling data.

Is there any way to avoid this issue on the Hudi side?
-- 
*Jian Feng,冯健*
Shopee | Engineer | Data Infrastructure


How to read Hudi files with MapReduce?

2021-08-11 Thread Jian Feng
Hi all, can anyone give me a sample?
-- 

FengJian

Data Infrastructure Team

Mobile +65 90388153

Address 5 Science Park Drive, Shopee Building, Singapore 118265


Error when ingesting an Avro nested array field

2021-07-22 Thread Jian Feng
Can anyone help take a look at this issue?
https://github.com/apache/hudi/issues/3327
When ingesting data into a Hudi table, I found that if the Avro schema has an
array field with a nested array field and no other fields, an error happens,

but if I add a dummy field, ingestion works fine.
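For context, the failing shape looks roughly like this (an illustrative schema
built from the description above, not the exact one from the issue):

    import org.apache.avro.Schema;

    public class NestedArraySchemaSketch {
        public static void main(String[] args) {
            // A record whose only field is an array of records that themselves
            // contain only a nested array field.
            String json = "{"
                + "\"type\":\"record\",\"name\":\"Outer\",\"fields\":["
                + "  {\"name\":\"items\",\"type\":{\"type\":\"array\",\"items\":{"
                + "    \"type\":\"record\",\"name\":\"Inner\",\"fields\":["
                + "      {\"name\":\"values\",\"type\":{\"type\":\"array\",\"items\":\"string\"}}"
                + "    ]}}}"
                + "]}";
            Schema schema = new Schema.Parser().parse(json);
            System.out.println(schema.toString(true));
        }
    }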
-- 

FengJian

Data Infrastructure Team

Mobile +65 90388153

Address 5 Science Park Drive, Shopee Building, Singapore 118265


What is the difference between append-only and insert in Flink streaming?

2021-07-10 Thread Jian Feng
I saw a PR here: https://github.com/apache/hudi/pull/3252
-- 

FengJian

Data Infrastructure Team

Mobile +65 90388153

Address 5 Science Park Drive, Shopee Building, Singapore 118265