gongxun0928 opened a new pull request, #1397:
URL: https://github.com/apache/cloudberry/pull/1397

   Previously, statistics (min-max, sum, count, etc.) were computed synchronously during data insertion, which slowed writes significantly.
   
   This change maintains statistics asynchronously instead:
   - Add a GUC parameter, `pax.enable_sync_collect_stats`, to control statistics collection during writes (disabled by default)
   - Skip statistics computation during INSERT when the parameter is off, to keep writes fast
   - Update statistics asynchronously during VACUUM on PAX tables by scanning file metadata
   - Re-read a file and refresh its statistics only when its metadata indicates they are stale
   
   ```
   -- t1: min-max statistics declared on all six columns
   create table t1(c1 int, c2 int, c3 int, c4 int, c5 int, c6 int) using pax with(minmax_columns='c1,c2,c3,c4,c5,c6');
   set pax.enable_sync_collect_stats to on; -- collect stats synchronously
   insert into t1 select i,i,i,i,i,i from generate_series(1,1000000) i;
   INSERT 0 1000000
   Time: 2733.731 ms (00:02.734)

   -- t2: same data, no minmax_columns declared
   create table t2(c1 int, c2 int, c3 int, c4 int, c5 int, c6 int) using pax;
   insert into t2 select i,i,i,i,i,i from generate_series(1,1000000) i;
   INSERT 0 1000000
   Time: 1816.836 ms (00:01.817)
   ```
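
   The asynchronous path described above could be exercised roughly as follows. This is a sketch, not output from a real run: the table name `t3` is hypothetical, and it assumes that a plain `VACUUM` on the table is enough to trigger the statistics refresh.

   ```
   -- hypothetical table t3: min-max columns declared, sync collection left off
   create table t3(c1 int, c2 int) using pax with(minmax_columns='c1,c2');
   set pax.enable_sync_collect_stats to off; -- the default: skip stats during INSERT
   insert into t3 select i,i from generate_series(1,1000000) i;
   vacuum t3; -- assumed to refresh stale statistics by scanning file metadata
   ```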
   