yjhjstz opened a new issue, #1293:
URL: https://github.com/apache/cloudberry/issues/1293

   ### Apache Cloudberry version
   
   2.1.0-devel
   
   ### What happened
   
   When attempting to collect extended statistics (dependencies) on a large table with 10 million rows, ANALYZE fails with the following error:
   
   `ERROR:  too many sample rows received from gp_acquire_sample_rows (analyze.c:2841)`

   This appears to be a failure in the sampling process used by extended statistics.
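
Read literally, the message describes a size check on the coordinator. Segments return sample rows through `gp_acquire_sample_rows`, and the coordinator rejects any rows beyond what its sample buffer holds. The Python sketch below models only that reading. It is not Cloudberry's code: `merge_segment_samples`, `targrows`, and `SampleOverflowError` are hypothetical names chosen for illustration.

```python
class SampleOverflowError(Exception):
    """Hypothetical stand-in for the ERROR raised in analyze.c."""


def merge_segment_samples(segment_samples, targrows):
    """Collect per-segment sample rows into a buffer of at most targrows rows.

    Hypothetical model of the coordinator side of sampling: it raises as
    soon as the segments together return more rows than the buffer holds.
    """
    rows = []
    for seg_rows in segment_samples:
        for row in seg_rows:
            # The buffer is already full, so this row is one too many.
            if len(rows) >= targrows:
                raise SampleOverflowError(
                    "too many sample rows received from gp_acquire_sample_rows")
            rows.append(row)
    return rows
```

Under this model, the error means the segments together returned more rows than the coordinator budgeted for. In other words, the two sides disagree on the sample size for this ANALYZE. For example, with a 30,000-row buffer, three segments returning 10,000 rows each fit exactly, while a single extra row raises the error.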
   
   ### What you think should happen instead
   
   ANALYZE should successfully collect dependency statistics for the specified columns.
   
   ### How to reproduce
   
   ```sql
   -- Step 1: Create test table
   CREATE TABLE tbl (
       col1 int,
       col2 int
   );
   
   -- Step 2: Insert 10 million rows with grouped values
   INSERT INTO tbl
   SELECT i / 10000, i / 100000
   FROM generate_series(1, 10000000) s(i);
   
   -- Step 3: Run initial ANALYZE
   ANALYZE tbl;
   
   -- Step 4: Create extended statistics on col1, col2
   CREATE STATISTICS s1 (dependencies) ON col1, col2 FROM tbl;
   
   -- Step 5: Trigger extended stats collection
   ANALYZE tbl;
   ```
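
The reproduction data contains a strict functional dependency, which is why `dependencies` statistics are a meaningful target here. For non-negative integers, `i / 100000` equals `(i / 10000) / 10`, so each `col1` value maps to exactly one `col2` value. The small Python check below demonstrates this. It assumes that PostgreSQL's truncating integer division matches Python's `//` for the positive values produced by `generate_series(1, 10000000)`. `repro_rows` and `check_dependency` are illustrative helpers, not part of Cloudberry.

```python
def repro_rows(n=10_000_000):
    """Yield (col1, col2) pairs like the INSERT in Step 2.

    PostgreSQL integer division truncates toward zero; for the positive i
    used here that matches Python's floor division.
    """
    for i in range(1, n + 1):
        yield i // 10000, i // 100000


def check_dependency(rows):
    """Return (#distinct col1, #distinct col2, whether col1 -> col2 holds)."""
    col1_to_col2 = {}
    col2_values = set()
    holds = True
    for col1, col2 in rows:
        col2_values.add(col2)
        # A col1 value already seen with a different col2 breaks the dependency.
        if col1_to_col2.setdefault(col1, col2) != col2:
            holds = False
    return len(col1_to_col2), len(col2_values), holds
```

For the full 10 million rows, `check_dependency(repro_rows())` returns `(1001, 101, True)`. That means 1,001 distinct `col1` values, 101 distinct `col2` values, and `col1` fully determines `col2`.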
   
   ### Operating System
   
   CentOS 9
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes, I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/cloudberry/blob/main/CODE_OF_CONDUCT.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.