eugenegujing commented on PR #5643:
URL: https://github.com/apache/texera/pull/5643#issuecomment-4705572243

   @Yicong-Huang Thanks! Your idea helps a lot. I'm thinking of changing this 
PR so that, instead of deleting uncommitted trash files, it delivers an audit 
summary over a retention window.
   
   For this PR, I am considering the safest first step:
   
   A. Audit-only scheduled scan
   - The job still scans expired upload sessions and stale uncommitted LakeFS 
objects using the same candidate-detection logic that cleanup would need.
   - It does not abort multipart uploads, delete DB rows, or reset LakeFS 
objects.
   - It emits only a per-round summary, for example: expiredSessionsFound, 
staleObjectsFound, errors, truncated.
   - Candidate-level details can stay at debug level so we can inspect them 
when needed without making normal logs noisy.
   - Cleanup stays disabled / non-destructive by default.
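
As a rough illustration of A, here is a minimal Python sketch (for brevity only, not the actual Texera code). Only the four summary field names come from the list above. The function name, the logger name, the `retention` and `max_candidates` parameters, and the shape of the inputs are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timedelta, timezone
import logging

log = logging.getLogger("upload_audit")  # hypothetical logger name


@dataclass
class AuditSummary:
    expiredSessionsFound: int = 0
    staleObjectsFound: int = 0
    errors: int = 0
    truncated: bool = False


def run_audit_round(sessions, objects, now, retention, max_candidates=1000):
    """Scan only: count candidates, never abort, delete, or reset anything.

    sessions: iterable of (session_id, expires_at)       -- assumed shape
    objects:  iterable of (object_path, last_modified)   -- assumed shape
    """
    summary = AuditSummary()
    found = 0
    # A session is a candidate once it has expired; an uncommitted object is
    # a candidate once it is older than the retention window.
    scans = (("session", sessions, now), ("object", objects, now - retention))
    for kind, items, limit in scans:
        for ident, ts in items:
            if found >= max_candidates:
                summary.truncated = True  # stopped early, more may exist
                break
            try:
                is_candidate = ts < limit
            except TypeError:  # e.g. missing or non-comparable timestamp
                summary.errors += 1
                continue
            if not is_candidate:
                continue
            found += 1
            if kind == "session":
                summary.expiredSessionsFound += 1
            else:
                summary.staleObjectsFound += 1
            # Candidate-level detail stays at debug level.
            log.debug("audit candidate kind=%s id=%s ts=%s", kind, ident, ts)
        if summary.truncated:
            break
    # One summary line per round at normal log level.
    log.info("audit round summary: %s", asdict(summary))
    return summary
```

The function takes no handle that could delete or abort anything, so the scan path cannot become destructive by accident.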
   
   The important connection is that A is not a throwaway version: it 
establishes the candidate model and the scheduled scan path, but keeps the 
output purely observational. Follow-up PRs can then gradually evolve the same 
audit output into B:
   
   B. Persisted audit history + optional cleanup action
   - Persist audit runs and candidates in DB so admins can inspect historical 
trends and exact candidates.
   - Build on the same fields emitted by the audit summary in A, instead of 
introducing a separate cleanup path later.
   - Add an explicit admin-facing cleanup action or config-gated cleanup mode 
only after the persisted audit flow exists.
   - If real deletion is added, it should be opt-in and reviewed in a separate 
PR, not enabled by default.
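
To make the "opt-in, not enabled by default" point concrete, here is a small Python sketch of a config-gated mode. The config key `upload.cleanup.mode`, the mode names, and the `delete_fn` callback are hypothetical and exist only for illustration.

```python
from enum import Enum


class CleanupMode(Enum):
    AUDIT_ONLY = "audit-only"
    CLEANUP = "cleanup"


def resolve_mode(config):
    """Return the configured mode, falling back to audit-only.

    A missing or unrecognized value never enables deletion.
    """
    raw = config.get("upload.cleanup.mode", CleanupMode.AUDIT_ONLY.value)
    try:
        return CleanupMode(raw)
    except ValueError:
        return CleanupMode.AUDIT_ONLY


def handle_candidates(candidates, mode, delete_fn):
    """Call delete_fn on each candidate only when cleanup is explicitly on.

    Returns the number of candidates passed to delete_fn.
    """
    if mode is not CleanupMode.CLEANUP:
        return 0
    deleted = 0
    for candidate in candidates:
        delete_fn(candidate)
        deleted += 1
    return deleted
```

With an empty config, `resolve_mode({})` yields `CleanupMode.AUDIT_ONLY`, so `handle_candidates` returns 0 and never calls `delete_fn`.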
   
   So this PR would focus on non-destructive observability first, while keeping 
the implementation shaped so that later PRs can develop it incrementally into 
persisted audit history and, eventually, an explicit cleanup action. What do 
you think?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
