kbuci commented on code in PR #11555:
URL: https://github.com/apache/hudi/pull/11555#discussion_r1828525835


##########
rfc/rfc-79/rfc-79.md:
##########
@@ -0,0 +1,140 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Add support for cancellable table service plans
+
+## Proposers
+
+
+## Approvers
+
+## Status
+
+JIRA: HUDI-7946
+
+
+## Abstract
+Table service plans can delay ingestion writes from updating a dataset with 
recent data if potential write conflicts are detected. Furthermore, a table 
service plan that isn't executed to completion for a long time (due to 
repeated failures, application misconfiguration, or insufficient resources) 
will degrade the read/write performance of a dataset by delaying clean, 
archival, and metadata table compaction. This is because HUDI table service 
plans, once scheduled, must currently be executed to completion. In addition, 
a scheduled plan prevents any ingestion write targeting the same files from 
succeeding (because it poses a write conflict), and can prevent new table 
service plans from targeting the same files. Enabling a user to configure a 
table service plan as "cancellable" can prevent frequent or repeatedly failing 
table service plans from delaying ingestion. Support for cancellable plans will 
give HUDI an avenue to fully cancel a table service plan and allow other table 
service and ingestion writers to proceed.
+
+
+## Background
+### Execution of table services 
+The table service operations compact and cluster are by default "immutable" 
plans, meaning that once a plan is scheduled it will stay as a pending 
instant until a caller invokes the table service execute API on the table 
service instant and successfully completes it. Specifically, if an execution 
fails after transitioning the instant to inflight, the next execution attempt 
will implicitly create and execute a rollback plan (which will delete all new 
instant/data files), but will keep the table service plan. This process will 
repeat until the instant is completed. The visualization below captures these 
transitions at a high level.
+
+![table service lifecycle 
(1)](https://github.com/user-attachments/assets/4a656bde-4046-4d37-9398-db96144207aa)
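
The retry behaviour described above can be sketched as a small state machine. 
This is only an illustrative model, not HUDI code: the class, state names, and 
the `attempt_succeeds` flag are hypothetical and stand in for the real timeline 
and execute API.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"   # plan scheduled, no attempt running
    INFLIGHT = "inflight"     # an execution attempt has started
    COMPLETED = "completed"   # plan executed to completion


class TableServiceInstant:
    """Toy model of an immutable compact/cluster plan (hypothetical names)."""

    def __init__(self, instant_time):
        self.instant_time = instant_time
        self.state = State.REQUESTED
        self.rollbacks = 0

    def execute(self, attempt_succeeds):
        if self.state is State.COMPLETED:
            return self.state
        # A previous attempt failed while inflight: roll back its partial
        # output first, but keep the plan itself.
        if self.state is State.INFLIGHT:
            self.rollbacks += 1
            self.state = State.REQUESTED
        self.state = State.INFLIGHT
        if attempt_succeeds:
            self.state = State.COMPLETED
        return self.state


plan = TableServiceInstant("001")
plan.execute(attempt_succeeds=False)  # fails, instant stays inflight
plan.execute(attempt_succeeds=False)  # rolls back, retries, fails again
plan.execute(attempt_succeeds=True)   # rolls back, retries, completes
```

In this model the plan is never removed: every failed attempt costs one 
rollback and the instant stays pending until some attempt succeeds.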
+
+### Clean and rollback of failed writes
+The clean table service, in addition to performing a clean action, is 
responsible for rolling back any failed ingestion writes 
(non-clustering/non-compaction inflight instants that are not being 
concurrently executed by a writer). This means that table service plans are 
not currently subject to clean's rollback of failed writes. As detailed below, 
this proposal for supporting cancellable table service plans will benefit from 
enabling clean to target table service plans.
+
+## Goals
+### (A) A cancellable table service plan should be capable of preventing 
itself from committing in the presence of a write conflict
+The current requirement that HUDI execute a table service plan to completion 
forces ingestion writers to abort a commit if a table service plan is 
conflicting. Because an ingestion writer typically determines the exact file 
groups it will be updating/replacing only after building a workload profile 
and performing record tagging, the writer may have already spent a lot of time 
and resources before realizing that it needs to abort. In the face of frequent 
table service plans or an old inflight plan, this will delay adding recent 
upstream records to the dataset and unnecessarily take resources (such as 
Spark executors in the case of the Spark engine) away from other applications 
in the data lake. A cancellable table service plan should avoid this situation 
by cancelling itself, rather than committing, if a conflicting ingestion job 
has already been committed. In conjunction, any ingestion writer or 
non-cancellable table service writer should be able to infer that a 
conflicting inflight table service plan is cancellable, and can therefore be 
ignored when attempting to commit the instant.
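
A minimal sketch of the commit-time rule in goal (A), assuming a conflict 
simply means overlapping file groups. The `Instant` type, `overlaps`, and 
`can_commit` are hypothetical names for illustration and are not part of 
HUDI's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    time: str
    file_groups: frozenset
    cancellable: bool = False  # only meaningful for table service plans


def overlaps(a, b):
    # Assumption: two instants conflict when they touch a common file group.
    return bool(a.file_groups & b.file_groups)


def can_commit(candidate, committed_ingestion, inflight_plans):
    """Return True if `candidate` may commit, False if it must abort/cancel."""
    if candidate.cancellable:
        # A cancellable plan yields to conflicting ingestion already committed.
        return not any(overlaps(candidate, c) for c in committed_ingestion)
    # Other writers ignore cancellable inflight plans and are blocked only
    # by conflicting plans that must run to completion.
    blocking = [p for p in inflight_plans if not p.cancellable]
    return not any(overlaps(candidate, p) for p in blocking)


clustering = Instant("001", frozenset({"fg1", "fg2"}), cancellable=True)
immutable = Instant("003", frozenset({"fg2"}), cancellable=False)
ingest = Instant("002", frozenset({"fg2"}))

ingest_vs_cancellable = can_commit(ingest, [], [clustering])
ingest_vs_immutable = can_commit(ingest, [], [immutable])
clustering_after_ingest = can_commit(clustering, [ingest], [])
```

Here the ingestion write proceeds past the cancellable clustering plan, is 
blocked by the immutable one, and the cancellable plan declines to commit once 
the conflicting ingestion write has been committed.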

Review Comment:
   Sure, let's take this up as an enhancement once we start working on 
implementation



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
