This is an automated email from the ASF dual-hosted git repository.
djwang pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/cloudberry-site.git
The following commit(s) were added to refs/heads/main by this push:
new a92a44f598 Blog: what's new in Cloudberry 2.0.0
a92a44f598 is described below
commit a92a44f5983fb456e4cd3907b4a0cb100ff49911
Author: Dianjin Wang <[email protected]>
AuthorDate: Tue Aug 26 18:15:22 2025 +0800
Blog: what's new in Cloudberry 2.0.0
---
...5-08-28-whats-new-in-apache-cloudberry-2.0.0.md | 628 +++++++++++++++++++++
.../blog/whats-new-in-apache-cloudberry-2.0.0.png | Bin 0 -> 195901 bytes
2 files changed, 628 insertions(+)
diff --git a/blog/2025-08-28-whats-new-in-apache-cloudberry-2.0.0.md
b/blog/2025-08-28-whats-new-in-apache-cloudberry-2.0.0.md
new file mode 100644
index 0000000000..c803e247fe
--- /dev/null
+++ b/blog/2025-08-28-whats-new-in-apache-cloudberry-2.0.0.md
@@ -0,0 +1,628 @@
+---
+slug: whats-new-in-apache-cloudberry-2.0.0
+title: "What's New in Apache Cloudberry 2.0.0"
+description: "Dive into the new features and enhancements in Apache Cloudberry
2.0.0"
+authors: [asfcloudberry]
+tags: [Release]
+image: /img/blog/whats-new-in-apache-cloudberry-2.0.0.png
+---
+
+Apache Cloudberry (Incubating) 2.0.0 is the project's first Apache release since
joining the ASF Incubator. This major version delivers significant enhancements to the
database kernel, representing a substantial leap forward in performance,
reliability, and manageability. The release also includes hundreds of bug fixes
and stability improvements.
+
+In this article, we highlight only the most important features, enhancements,
and fixes to help you quickly understand the key improvements in this release.
For more details, please refer to the [Apache Cloudberry 2.0.0
Changelog](https://cloudberry.apache.org/releases/2.0.0-incubating/).
+
+## New features
+
+### Query processing and optimization
+
+#### Index and scan
+
+##### Enhanced index-only scan capabilities
+
+- Supports index-only scans on a broader range of index types when using the
GPORCA optimizer, including those with covering indexes using `INCLUDE`
columns. This helps improve query performance.
+
+- Supports dynamic index-only scan when using the GPORCA optimizer to
accelerate queries on partitioned tables. This feature combines partition
pruning with index-only access to avoid heap lookups, significantly reducing
I/O and improving performance. It is ideal for wide tables with narrow covering
indexes and can be enabled using `SET optimizer_enable_dynamicindexonlyscan =
on`.
+
+- Supports index-only scans when using the GPORCA optimizer on append-only
(AO) tables and PAX tables, enabling faster query execution by avoiding block
access when possible. This improves performance in scenarios where traditional
index scans on AO and PAX tables were previously inefficient.
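+
+As a rough illustration of the covering-index and dynamic index-only scan cases above, the following sketch uses a hypothetical partitioned `sales` table and hypothetical index names. Only the `optimizer_enable_dynamicindexonlyscan` setting is taken from this release; the plan actually chosen depends on statistics and cost estimates.
+
+```sql
+-- Hypothetical table and index names, for illustration only.
+CREATE TABLE sales (
+    sale_id   bigint,
+    sale_date date,
+    amount    numeric
+)
+PARTITION BY RANGE (sale_date);
+
+CREATE TABLE sales_2025_h1 PARTITION OF sales
+    FOR VALUES FROM ('2025-01-01') TO ('2025-07-01');
+
+-- Covering index: "amount" is stored in the index but is not a key column.
+CREATE INDEX sales_date_idx ON sales (sale_date) INCLUDE (amount);
+
+-- Index-only scans rely on the visibility map, so vacuum and analyze first.
+VACUUM ANALYZE sales;
+
+SET optimizer = on;  -- use GPORCA
+SET optimizer_enable_dynamicindexonlyscan = on;
+
+-- The query touches only indexed columns, so GPORCA may pick a
+-- dynamic index-only scan over the partitions that survive pruning.
+EXPLAIN
+SELECT sale_date, amount
+FROM sales
+WHERE sale_date BETWEEN '2025-03-01' AND '2025-03-31';
+```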
+
+##### Improved index scan performance and flexibility
+
+- Supports backward index scans when using the GPORCA optimizer for queries
with `ORDER BY ... DESC`, eliminating the need for explicit sorting when a
B-tree index exists in the opposite order. This optimization reduces resource
usage and improves performance, especially for top-N and pagination queries.
+
+- The GPORCA optimizer supports triggering Bitmap Index Scans using array
comparison predicates like `col IN (...)` or `col = ANY(array)`, including for
hash indexes. This improves query performance on large datasets by enabling
more efficient multi-value matching. The optimizer automatically chooses the
bitmap scan path based on cost estimation.
+
+- The GPORCA optimizer now considers the width of `INCLUDE` columns when
costing index-only scans, favoring narrower indexes that return fewer unused
columns. This improves plan selection for queries where multiple covering
indexes are available. The cost model also more accurately estimates I/O by
refining how `relallvisible` is used in index-only scan costing.
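+
+The backward-scan and array-predicate cases above can be tried with a sketch like the following. The `events` table and its indexes are hypothetical, and whether GPORCA picks a backward index scan or a bitmap scan depends on its cost estimates.
+
+```sql
+-- Hypothetical table; both indexes are plain ascending B-tree indexes.
+CREATE TABLE events (event_id bigint, created_at timestamp, payload text);
+CREATE INDEX events_created_idx ON events (created_at);
+CREATE INDEX events_id_idx ON events (event_id);
+ANALYZE events;
+
+SET optimizer = on;  -- use GPORCA
+
+-- Top-N query in descending order: the ascending index can be read
+-- backward, so no explicit sort is needed.
+EXPLAIN
+SELECT event_id, created_at
+FROM events
+ORDER BY created_at DESC
+LIMIT 10;
+
+-- Array comparison predicate: a candidate for a bitmap index scan.
+EXPLAIN
+SELECT *
+FROM events
+WHERE event_id = ANY (ARRAY[10, 20, 30]::bigint[]);
+```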
+
+##### BRIN index enhancements
+
+- Redesigns BRIN index internals for AO/CO tables to replace the `UPPER` page
structure with a more efficient chaining model. This significantly reduces disk
space usage for empty indexes and improves performance by avoiding unnecessary
page access. The new design better handles the unique layout of AO/CO tables
while maintaining correctness and compatibility.
+
+- BRIN indexes on AO/CO tables now support summarizing specific logical heap
block ranges using `brin_summarize_range()`, enabling more precise control
during index maintenance and testing. This enhancement also adds improved
coverage for scenarios involving aborted rows, increasing robustness and
correctness in edge cases.
+
+- Supports generating `IndexScan` plans when using the GPORCA optimizer with
`ScalarArrayOp` qualifiers (for example, `col = ANY(array)`) for B-tree
indexes. This enhancement aligns ORCA with the planner's behavior and allows
more efficient execution of array comparison queries, as long as the predicate
column is the first key in a multicolumn index.
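+
+A minimal sketch of summarizing a single block range with `brin_summarize_range()` on an append-optimized table follows. The table and index names are hypothetical, and the `appendoptimized` storage option is assumed to be available as in Greenplum-derived systems.
+
+```sql
+-- Hypothetical append-optimized table with a BRIN index.
+CREATE TABLE metrics (ts timestamp, val int)
+WITH (appendoptimized = true);
+
+CREATE INDEX metrics_ts_brin ON metrics USING brin (ts);
+
+-- Summarize only the range that contains logical heap block 0,
+-- rather than every unsummarized range in the index.
+SELECT brin_summarize_range('metrics_ts_brin', 0);
+```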
+
+#### View and materialized view
+
+- Improves performance of `REFRESH MATERIALIZED VIEW WITH NO DATA` by avoiding
full query execution. The command now behaves like a `TRUNCATE`, significantly
reducing execution time while preserving proper dispatch to segments.
+
+#### Join
+
+- Supports left join pruning when using the GPORCA optimizer, allowing
unnecessary left joins to be eliminated during query optimization. This applies
when the query only uses columns from the outer table and the join condition
fully covers the inner table's unique or primary keys. This can lead to more
efficient query plans.
+
+- Supports `FULL JOIN` using the `Hash Full Join` strategy when using the
GPORCA optimizer. This approach avoids sorting join keys and reduces data
redistribution, making it suitable for large datasets or joins on non-aligned
distribution keys. All `FULL JOIN` queries now use `Hash Full Join`.
+
+- The GPORCA optimizer now avoids unnecessary data redistribution for
multi-way self joins using left or right outer joins when the join keys are
symmetric. This optimization improves performance by recognizing that such
joins preserve data colocation, eliminating redundant motion operations.
+
+- The GPORCA optimizer no longer penalizes broadcast plans for `NOT IN`
queries (Left Anti Semi Join), regardless of the
`optimizer_penalize_broadcast_threshold` setting. This change improves
performance and avoids potential OOM issues by enabling parallel execution
instead of concentrating large tables on the coordinator node.
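+
+To make the left join pruning condition concrete, here is a sketch with two hypothetical tables. The query reads only columns from the outer table, and the join condition covers the inner table's primary key, so the join is a candidate for elimination.
+
+```sql
+-- Hypothetical schema, for illustration only.
+CREATE TABLE customers (customer_id int PRIMARY KEY, name text);
+CREATE TABLE orders (order_id int, customer_id int, total numeric);
+
+SET optimizer = on;  -- use GPORCA
+
+-- No column of "customers" is referenced in the select list, and each
+-- order matches at most one customer, so the join cannot change the result.
+EXPLAIN
+SELECT o.order_id, o.total
+FROM orders o
+LEFT JOIN customers c ON o.customer_id = c.customer_id;
+```
+
+If pruning applies, the `EXPLAIN` output should show a scan of `orders` with no join node.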
+
+#### Function & aggregate
+
+- Supports intermediate aggregates when using the GPORCA optimizer, enabling
more efficient execution of queries that include both `DISTINCT` aggregates and
regular aggregates. This ensures correct handling of aggregation stages using
`AGGSPLIT`. In addition, ORCA introduces an optimization for `MIN()` and
`MAX()` functions by using index scans with a limit, instead of full table
scans with regular aggregation. This optimization also supports `IS NULL` and
`IS NOT NULL` conditions on ind [...]
+
+- Enables more `HashAggregate` plan alternatives for queries that include
`DISTINCT` aggregates when using the GPORCA optimizer. By generating a
two-stage aggregation plan that avoids placing `DISTINCT` functions in
hash-based nodes, ORCA ensures compatibility with the executor and expands the
range of supported query plans. This improvement enhances optimization choices
for group-by queries.
+
+- Supports queries using `GROUP BY CUBE`, enabling multi-dimensional grouping
sets in query plans. This expands analytic query capabilities. Note that
optimization time for `CUBE` queries may be high due to the large number of
generated plan alternatives.
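+
+For reference, a `GROUP BY CUBE` query over two columns produces every combination of grouping sets. The `sales_facts` table below is hypothetical.
+
+```sql
+-- Hypothetical fact table.
+CREATE TABLE sales_facts (region text, product text, amount numeric);
+
+-- CUBE (region, product) expands to four grouping sets:
+-- (region, product), (region), (product), and the grand total ().
+SELECT region, product, sum(amount) AS total
+FROM sales_facts
+GROUP BY CUBE (region, product)
+ORDER BY region, product;
+```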
+
+#### Preprocessing
+
+- Inlines Common Table Expressions (CTEs) that contain outer references,
allowing such queries to be planned and explained successfully. Previously,
these queries would fall back to the legacy planner due to limitations in
handling shared scans with outer references. This change improves compatibility
and enables ORCA to optimize a broader range of CTE-based queries.
+
+- No longer rewrites `IN` queries to `EXISTS` when the inner subquery contains
a set-returning function. This prevents invalid query transformations that
could previously result in execution errors. The change ensures correct
handling of queries like `a IN (SELECT generate_series(1, a))`.
+
+#### Optimization and performance enhancements
+
+##### Dynamic table
+
+- Introduces dynamic tables, a new feature that enables automatic, scheduled
refresh of query results. Dynamic tables are similar to materialized views but
are designed for scenarios requiring up-to-date data, such as real-time
analytics, lakehouse architectures, and automated ETL pipelines.
+- Supports creating dynamic tables from base tables, external tables, or
materialized views. Users can define refresh intervals using standard cron
expressions, ensuring data is kept current without manual intervention.
+- For more details, see the [official
documentation](https://cloudberry.apache.org/docs/performance/use-dynamic-tables/).
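+
+The sketch below shows roughly what a dynamic table with a cron schedule might look like. The table names are hypothetical, and the statement form shown here is an assumption that should be checked against the documentation linked above.
+
+```sql
+-- Hypothetical base table.
+CREATE TABLE page_views (page text, viewed_at timestamp);
+
+-- Assumed syntax: refresh the aggregated result every 5 minutes,
+-- using a standard cron expression.
+CREATE DYNAMIC TABLE page_view_counts
+SCHEDULE '*/5 * * * *'
+AS
+SELECT page, count(*) AS views
+FROM page_views
+GROUP BY page;
+
+-- Query it like an ordinary table.
+SELECT * FROM page_view_counts ORDER BY views DESC LIMIT 10;
+```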
+
+##### Plan hint
+
+- Supports plan hints for scan types and join row estimates when using the
GPORCA optimizer, enabling users to guide query planning using
`pg_hint_plan`-style comments. Supported scan hints include `SeqScan`,
`IndexScan`, `BitmapScan`, and their negations, while row hints allow users to
specify expected join cardinalities.
+
+- The `plan hint` field is now required in the ORCA optimizer configuration.
This change simplifies internal parsing logic and ensures consistent handling
of optimizer configuration files.
+
+- Supports join order hints for left and right outer joins when using the
GPORCA optimizer, extending the existing hint framework beyond inner joins.
This enhancement allows users to guide the optimizer's join order more
precisely in complex queries involving outer joins, improving plan control and
potentially execution performance.
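+
+As a sketch of what `pg_hint_plan`-style comments look like, the example below combines a scan hint with a row hint. The tables, aliases, and index name are hypothetical, and any setup needed to enable hint processing in a given deployment is not shown.
+
+```sql
+-- Hypothetical schema.
+CREATE TABLE customers (customer_id int PRIMARY KEY, name text);
+CREATE TABLE orders (order_id int, customer_id int, total numeric);
+CREATE INDEX orders_customer_idx ON orders (customer_id);
+
+SET optimizer = on;  -- use GPORCA
+
+-- IndexScan(o orders_customer_idx): ask for an index scan on alias "o".
+-- Rows(o c #100): tell the optimizer to assume the o-c join yields 100 rows.
+/*+ IndexScan(o orders_customer_idx) Rows(o c #100) */
+EXPLAIN
+SELECT o.order_id, c.name
+FROM orders o
+JOIN customers c ON o.customer_id = c.customer_id
+WHERE o.customer_id = 42;
+```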
+
+##### Enhancements to ORCA
+
+- Supports table aliases in query plans when using the GPORCA optimizer,
making `EXPLAIN` outputs more descriptive and aligned with user-defined query
syntax. In addition, ORCA adds support for query parameters, including those
used in functions and prepared statements, enabling better compatibility with
parameterized workloads and dynamic SQL execution.
+
+- When using the GPORCA optimizer, supports generating plans for queries on
tables with row-level security (RLS) enabled. Security policies are enforced
during plan generation, ensuring only permitted rows are visible to each user.
ORCA still falls back to the planner for RLS queries with sublinks, foreign
tables, or for `INSERT` and `UPDATE` statements.
+
+- The GPORCA optimizer now gracefully falls back to the Postgres planner when
a function in the `FROM` clause uses `WITH ORDINALITY`, which is not currently
supported. The fallback includes a clear error message indicating the
unsupported feature.
+
+- When using the GPORCA optimizer, supports pushing down filters with
`BETWEEN` predicates when combined with constant filters, enabling more
effective predicate propagation. This enhancement can reduce the number of rows
processed during joins, improving query performance in applicable cases.
+
+- When using the GPORCA optimizer, supports hashed subplans when the subquery
expression is hashable and contains no outer references. This enhancement can
significantly improve query performance by reducing execution time in
applicable cases.
+
+- ORCA now supports executing foreign tables with `mpp_execute='ANY'` on
either the coordinator or segments, depending on cost. This allows more
flexible and efficient execution plans for foreign data sources. A new
"Universal" distribution type is introduced to support this behavior, similar
to how `generate_series()` is handled.
+
+- ORCA now supports direct dispatch for randomly distributed tables when the
query includes a filter on `gp_segment_id`. This enhancement improves query
performance by routing execution directly to the relevant segment, reducing
unnecessary data processing across the cluster.
+
+- ORCA now supports generating plans with the `ProjectSet` node, enabling
correct execution of queries that include set-returning functions (SRFs) in the
target list. This enhancement prevents fallback to the legacy planner and
ensures compatibility with PostgreSQL 11+ behavior.
+
+- ORCA now supports the `FIELDSELECT` node, which allows it to optimize a
broader range of queries involving composite data types. Previously, such
queries would fall back to the legacy planner. This enhancement improves
compatibility and reduces unnecessary planner fallbacks.
+
+- ORCA now derives statistics only for the columns used in `UNION ALL`
queries, instead of all output columns from the input tables. This optimization
reduces unnecessary computation and can improve planning performance for large
queries.
+
+- Updates naming in logs and `EXPLAIN` output to refer to the optimizers as
"GPORCA" and "Postgres based planner" for improved clarity and consistency.
+
+- Optimizes ORCA's `Union All` performance by deriving statistics only for
columns used in the query output. This reduces unnecessary computation and
improves planning efficiency for queries with unused columns.
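+
+The direct dispatch enhancement for randomly distributed tables can be observed with a query like the one below. The `clicks` table is hypothetical, and the exact `EXPLAIN` wording may differ.
+
+```sql
+-- Hypothetical randomly distributed table.
+CREATE TABLE clicks (user_id int, url text) DISTRIBUTED RANDOMLY;
+
+SET optimizer = on;  -- use GPORCA
+
+-- Filtering on gp_segment_id lets the query be sent to a single segment
+-- instead of being dispatched to every segment in the cluster.
+EXPLAIN
+SELECT count(*)
+FROM clicks
+WHERE gp_segment_id = 0;
+```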
+
+### Transaction management
+
+#### Lock management
+
+- Updates logic to ignore invalidated slots while computing the oldest catalog
Xmin, reducing the risk of deadlocks and improving transaction concurrency.
+
+- Performs serializable isolation checks early for AO/CO tables, ensuring
stricter consistency guarantees and reducing the likelihood of isolation
conflicts.
+
+- Enhances the index creation process to prevent deadlocks by ensuring the
coordinator acquires an `AccessShareLock` on `pg_index` before dispatching a
synchronization query to segments, thus aligning `indcheckxmin` and avoiding
conflicts that GDD cannot resolve.
+
+#### Transaction performance and reliability
+
+- Avoids replaying DTX information in checkpoints for newly expanded segments,
preventing potential inconsistencies during recovery.
+
+- Adds `gp_stat_progress_dtx_recovery` for better observability of distributed
transaction recovery progress.
+
+- Improves error reporting for DTX protocol command dispatch errors, making it
easier to diagnose and resolve issues.
+
+- Allows utility mode on the coordinator to skip upgrading locks for `SELECT`
locking clauses, improving efficiency for maintenance operations.
+
+### Storage
+
+#### PAX table format
+
+- Introduces support for the PAX (Partition Attributes Across) storage format,
a hybrid approach that combines the advantages of row-based and column-based
storage. PAX is designed to deliver high performance for both data writes and
analytical queries, making it well-suited for OLAP workloads and large-scale
data analysis. For more details, see the [official
documentation](https://cloudberry.apache.org/docs/operate-with-data/pax-table-format/).
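+
+A minimal sketch of creating a PAX table follows. It assumes the build includes PAX support and that the access method is exposed as `pax`; the table itself is hypothetical, and the linked documentation remains the authority on available options.
+
+```sql
+-- Hypothetical table stored in the PAX format (assumed access method name).
+CREATE TABLE trips (
+    trip_id     bigint,
+    started_at  timestamp,
+    distance_km numeric
+) USING pax;
+
+INSERT INTO trips VALUES (1, '2025-08-01 09:00', 12.5);
+
+SELECT count(*), avg(distance_km) FROM trips;
+```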
+
+#### AO/CO table enhancements
+
+- Optimizes `CREATE INDEX` operations on AO tables with scan progress
reporting, enhancing the efficiency of index creation.
+
+- Declares the connected variable as "volatile" to ensure proper handling
across `PG_TRY` and `PG_CATCH` blocks, mirroring PostgreSQL's best practices
for exception-safe variable usage in transaction control.
+
+#### Partitioning
+
+- Extends ORCA's planning capabilities to include support for foreign
partitions, enabling optimized query execution for tables with a mix of foreign
and non-foreign partitions. The implementation introduces new logical and
physical operators for foreign partitions, supports static and dynamic
partition elimination, and integrates with any compatible foreign data wrapper,
enhancing performance and flexibility for external data queries.
+
+- Optimizes the analysis of leaf partitions in multi-level partition tables to
avoid unnecessary resampling of intermediate partitions.
+
+- Supports dynamic partition elimination (DPE) when using the GPORCA optimizer
for plans involving duplicate-sensitive random motions. This allows partition
selectors to pass through segment filters, enabling more efficient query plans
and reducing the number of scanned partitions.
+
+- Adds Dynamic Partition Elimination for Hash Right Joins, which enhances the
efficiency of join operations on partitioned tables.
+
+- Supports boolean static partition pruning in ORCA, enhancing the efficiency
of partition pruning during query optimization.
+
+- Enhances ORCA's query planning by checking the partition key's opfamily
during partition pruning. Aligning predicate operators with the opfamily of the
distribution or partition key ensures that motions are triggered and partitions
are scanned correctly. This addresses issues with missing motions, incorrect
direct dispatch, and ineffective partition pruning.
+
+- Caches the last found partition in `ExecFindPartition` to improve
performance for repeated partition lookups.
+
+- Enables ORCA to derive dynamic table scan cardinality from leaf partitions,
addressing limitations in handling date and time-related data types by changing
their internal representation to doubles.
+
+- Enhances the DPv2 algorithm to include distribution spec information with
partition selectors, improving the efficiency of distributed query execution.
+
+- Introduces a new Non-Replicated distribution specification to optimize join
operations in database processing. By relaxing the enforcement of singleton
distribution for outer tables when the inner table is universally distributed,
it aims to reduce unnecessary data gathering and duplicate-sensitive motions,
thereby generating more efficient execution plans.
+
+#### Memory management
+
+- Implements a custom allocator to enable ORCA to use standard C++ containers,
addressing heap allocation management.
+
+- Refactors ORCA's memory pool by making several methods static and adds
assertions to ensure pointer safety.
+
+- Optimizes serialization of IMDId objects in ORCA to be lazy, improving
performance by deferring serialization until necessary. This reduces
optimization time when loading objects into the relcache, particularly for
queries on large, wide partitioned tables.
+
+- Ensures that strings returned by `GetDatabasePath` are always freed using
`pfree`, preventing memory leaks.
+
+- Enables MPP (Massively Parallel Processing) support for `pg_buffercache` and
builds it by default, making buffer cache management more scalable and
efficient in distributed environments.
+
+- Introduces `pg_buffercache_summary()` to offer a high-level overview of
buffer cache activity.
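+
+Assuming the extension is installed as in PostgreSQL, the summary function can be called as sketched below; the exact set of output columns is not shown here.
+
+```sql
+-- The extension is built by default, but it still has to be created
+-- in each database where it is used.
+CREATE EXTENSION IF NOT EXISTS pg_buffercache;
+
+-- One-row overview of shared buffer usage.
+SELECT * FROM pg_buffercache_summary();
+```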
+
+#### Metadata and access methods
+
+- Allows the definition of lock modes for custom reloptions, providing more
control over table and index access.
+
+- Supports specification of reloptions when switching storage models, allowing
seamless transitions between different storage formats.
+
+- Introduces a new struct member in `CreateStmt` to indicate the origin of the
statement, specifying if it was generated from GP style classic partitioning
syntax.
+
+- Adds syscache lookup for `pg_attribute_encoding` and `pg_appendonly`,
improving performance and efficiency in metadata access.
+
+- Introduces a new catalog entry in `pg_aggregate` to store replication safety
information for aggregates, allowing users to mark specific aggregates as safe
for execution on replicated slices via an optional repsafe parameter during the
`CREATE AGGREGATE` command. This helps optimize performance by avoiding
unnecessary broadcasts on large replicated datasets.
+
+- Enhances the dispatch of `ALTER DATABASE` commands by allowing options like
`ALLOW_CONNECTIONS` and `IS_TEMPLATE` to be dispatched to segments, ensuring
catalog changes are reflected everywhere.
+
+### Data loading and external tables
+
+#### External table enhancements
+
+- Adds clearer restrictions and warnings when exchanging or attaching external
tables. Writable external tables can no longer be used as partitions, and
attaching readable external tables without validation now triggers a warning
instead of requiring a no-op clause.
+
+- Disables `SET DISTRIBUTED REPLICATED` for `ALTER EXTERNAL TABLE` to prevent
misuse and ensure consistency.
+
+#### Foreign data wrapper
+
+- Improves performance and stability for `gpfdist` external tables. Adds TCP
keepalive support for more reliable reads, and increases the default buffer
size to enhance write throughput for writable external tables.
+
+- ORCA now falls back to the planner for queries involving foreign partitions
using `greenplum_fdw`, preventing crashes caused by incompatible execution
behavior. Queries on non-partitioned foreign tables using `greenplum_fdw`
remain supported by ORCA.
+
+### High availability and high reliability
+
+#### Backup and disaster recovery
+
+- Improves archiver performance when handling many `.ready` files by reducing
redundant directory scans. This change speeds up WAL archiving, especially when
`archive_command` has been failing and many files have accumulated.
+
+- `gp_create_restore_point()` can only be executed on the coordinator node.
Calling this function on a segment node will result in an error. The function
returns a structured record value, including the restore point name and LSN,
which you can view directly by running `SELECT * FROM
gp_create_restore_point()`.
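+
+For example, assuming the function takes the restore point name as a text argument (as in Greenplum), a call from the coordinator might look like this; the name `before_maintenance` is arbitrary.
+
+```sql
+-- Run on the coordinator only; calling this on a segment raises an error.
+-- Returns a record that includes the restore point name and LSN.
+SELECT * FROM gp_create_restore_point('before_maintenance');
+```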
+
+#### WAL
+
+- Improves WAL replication management by restricting a coordinator-specific
tracking mechanism to the coordinator only. This change simplifies primary
segment behavior and aligns replication practices more closely across segments.
No functional change for users, but helps reduce unnecessary complexity in WAL
retention logic.
+
+- Enhances WAL retention logic to improve reliability of incremental recovery
using `pg_rewind`. Physical replication slots now retain WAL files up to the
last common checkpoint, reducing risk of missing WAL during recovery. This
change also simplifies the underlying logic and adds test coverage for WAL
recycling.
+
+- Switches WAL replication connections to use the standard libpq protocol
instead of a legacy internal one. This improves compatibility and reliability
of replication behavior. The change also fixes test failures and improves error
handling for replication connections.
+
+### Security
+
+#### DB operations
+
+- `REFRESH MATERIALIZED VIEW CONCURRENTLY` runs all internal operations in the
correct security context to prevent potential privilege escalation. This change
ensures safer execution by restricting operations to the appropriate permission
level.
+
+- Improves internal handling of new `aoseg` and `aocsseg` tuples by aligning
tuple freezing behavior with other catalog operations. This change enhances
consistency with upstream PostgreSQL practices and removes the need for
`CatalogTupleFrozenInsert`.
+
+#### System processes
+
+- Orphaned file checks now exclude idle sessions during safety validation.
This prevents unnecessary errors when persistent connections from services are
active, allowing the detection process to complete successfully.
+
+- Adds a safety check in backend signal handlers to ensure signals are handled
by the correct process. This prevents unintended shared memory access by child
processes and improves overall process isolation and stability.
+
+- Improves process safety by preventing child processes spawned via `system()`
from calling `proc_exit()`. This avoids potential corruption of shared memory
structures and ensures only the parent process performs cleanup operations.
+
+- Removes the permission check for `cpu.pressure` when using
`gp_resource_manager='group-v2'`. This prevents startup failures on systems
where PSI is disabled, without affecting resource management functionality.
+
+#### Replication/Mirrorless clusters
+
+- Improves replication error reporting by setting persistent `WalSndError`
when a replication slot is invalidated. This ensures accurate error visibility
in `gp_stat_replication`.
+
+#### Permission management
+
+- Strengthens security by rejecting extension schema or owner substitutions
containing potentially unsafe characters like `$`, `'`, or `\`. This prevents
SQL injection in extension scripts and protects against privilege escalation in
certain non-bundled extensions.
+
+- Creating or assigning roles to the `system_group` resource group now results
in an error, as this group is reserved for internal system processes only.
+
+- Reverts the restriction requiring superuser privileges to set the
`gp_resource_group_bypass` GUC. This allows applications like GPCC to function
more easily while still limiting resource impact.
+
+- Altering the `mpp_execute` option of a foreign server or wrapper is now
disallowed to prevent inconsistencies in foreign table distribution policies.
Changing these options previously could result in outdated cached plans and
incorrect query execution. This update ensures plan correctness by enforcing
cache invalidation only when appropriate.
+
+#### pgcrypto
+
+- Adds support for FIPS mode in `pgcrypto`, controlled by a GUC. This allows
Cloudberry to operate in FIPS-compliant environments when linked with a
supported FIPS-enabled OpenSSL version. Certain ciphers are disabled in this
mode to comply with FIPS requirements.
+
+- `pgcrypto` now allows enabling FIPS mode even on systems where FIPS is not
pre-enabled by the OS or environment. This change removes the dependency on
`FIPS_mode()` checks, offering more flexibility in managing FIPS compliance
through the database.
+
+### Resource management
+
+#### Resource group management
+
+- Renames the `memory_limit` parameter to `memory_quota` in `CREATE/ALTER
RESOURCE GROUP` to clarify its meaning and unit.
+
+- Adds a new system view `gp_toolkit.gp_resgroup_status_per_segment` to
monitor memory usage per resource group on each segment. This view helps
database administrators track real-time vmem consumption (in MB) when resource
group-based resource management is enabled.
+
+- Improves logging behavior when memory usage reaches Vmem or resource group
limits. The system now prints log messages directly to stderr to avoid stack
overflow errors during allocation failures.
+
+- Removes unnecessary permission check for `cpu.pressure` when using the
`group-v2` resource manager. This prevents startup failures on systems where
PSI is not enabled, improving compatibility across Linux distributions.
+
+#### Logging and monitoring
+
+- Adds additional log messages for GDD backends to help investigate
memory-related issues. These logs provide better visibility into backend
behavior during high memory usage scenarios.
+
+- Adds a log ignore rule for "terminating connection" messages to reduce noise
in test outputs. This helps avoid unnecessary diffs in CI for tests that
involve connection termination.
+
+- Adds more verbose logging to `ResCheckSelfDeadlock()`.
+
+- Logs queue IDs and portal IDs in resource queue logs.
+
+- Dumps more information when releasing resource queue locks to aid in
troubleshooting and monitoring.
+
+- Uses `ERROR` for dispatcher liveness checks.
+
+- Enhances logging for dispatch connection liveness checks to improve clarity
during connection failures. Logs now include more accurate error messages based
on socket state and system errors.
+
+#### Platform compatibility and build
+
+- Improves `gp_sparse_vector` compatibility with ARM platforms by fixing type
handling in serialization logic. This ensures consistent behavior across
different architectures.
+
+- Adds support for `sigaction()` on Windows to align signal handling behavior
with other platforms. This reduces platform-specific differences and improves
code consistency.
+
+- Updates ACL mode type in ORCA to match the parser's definition, ensuring
consistent type usage.
+
+#### System views and statistics
+
+- Improves join cardinality estimation for projected columns that preserve the
number of distinct values (NDVs), such as additions or subtractions with
constants. This allows the optimizer to use underlying column histograms for
more accurate estimates, improving plan quality for queries with scalar
projections in join conditions.
+
+- Increases precision for frequency and NDV values in ORCA when processing
metadata population scripts (MDPs). This change ensures consistent behavior
between MDPs and live database queries, reducing discrepancies caused by
rounding small values.
+
+- ORCA now considers null value skew when costing redistribute motions,
improving plan accuracy for queries involving columns with many nulls. This
helps avoid performance issues caused by data being unevenly distributed across
segments.
+
+- ORCA now supports extended statistics to improve cardinality estimation for
queries with correlated columns. This allows the optimizer to use real
data-driven correlation factors instead of relying on arbitrary GUC settings,
leading to more accurate query plans.
+
+- Introduces `gp_log_backend_memory_contexts` to log memory contexts across
segments, with optional targeting by content ID. This enhances observability
and helps diagnose memory issues in distributed queries.
+
+- ORCA now supports statistics derivation for predicates involving different
time-related data types, such as date and timestamp. This improves plan
accuracy and performance for queries comparing mixed temporal types.
+
+- Autostats now uses `SKIP LOCKED` for `ANALYZE` operations to avoid blocking
on locks, reducing the risk of deadlocks and improving predictability. This
behavior is enabled by default and can be controlled using the
`gp_autostats_lock_wait` GUC.
+
+- ORCA now supports `STATS_EXT_NDISTINCT` extended statistics for estimating
cardinality on correlated columns. This improves accuracy for queries using
`GROUP BY` or `DISTINCT` on such columns.
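+
+A short sketch of how extended statistics could be defined for correlated columns follows. The `addresses` table and statistics name are hypothetical; `CREATE STATISTICS` is the standard PostgreSQL command.
+
+```sql
+-- Hypothetical table where "city" and "zip" are strongly correlated.
+CREATE TABLE addresses (city text, zip text, street text);
+
+-- Collect multi-column n-distinct and functional dependency statistics.
+CREATE STATISTICS addresses_city_zip (ndistinct, dependencies)
+    ON city, zip FROM addresses;
+
+ANALYZE addresses;
+
+SET optimizer = on;  -- use GPORCA
+
+-- The group count estimate can now use the multi-column n-distinct value
+-- instead of multiplying the per-column estimates.
+EXPLAIN
+SELECT city, zip, count(*)
+FROM addresses
+GROUP BY city, zip;
+```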
+
+#### Network connections
+
+- Marks `gp_reject_internal_tcp_connection` as defunct to improve reliability
of internal QD-to-entry DB connections. These connections over TCP/IP are now
treated as authenticated by default, preventing authentication errors caused by
`pg_hba.conf` settings.
+
+### Tools and utilities
+
+#### analyzedb
+
+- `analyzedb` now includes materialized views in its list of tables to
analyze. This keeps their statistics current, which can improve the performance
of queries on materialized views once the analysis completes.
+
+#### gpexpand
+
+- `gpexpand` now includes a cluster health check to ensure all segments are up
and in their preferred roles before proceeding. This prevents incorrect port
assignments and avoids potential issues during expansion when nodes are not in
a stable state.
+
+#### gp_toolkit
+
+- Adds an update path for the `gp_toolkit` extension to version 1.6. This
update renames the column `memory_limit` to `memory_quota` in the
`gp_resgroup_config` view for improved clarity. Users can apply the update
using `ALTER EXTENSION gp_toolkit UPDATE TO '1.6'`.
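+
+After applying the update, the renamed column can be checked with a query like the one below. The `groupname` column is assumed to be present in the view, as in Greenplum.
+
+```sql
+-- Apply the update path described above.
+ALTER EXTENSION gp_toolkit UPDATE TO '1.6';
+
+-- The view now exposes memory_quota instead of memory_limit.
+SELECT groupname, memory_quota
+FROM gp_toolkit.gp_resgroup_config;
+```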
+
+## Bug fixes
+
+- Fixed data loss caused by incorrect shared snapshot handling.
+- Fixed memory corruption during AOCO ADD COLUMN abort.
+- Fixed checkpoint WAL replay failure.
+- Fixed incorrect results when using UNION for RECURSIVE_CTE.
+- Fixed incorrect results from hash joins on char columns.
+- Fixed incorrect results produced by WITH RECURSIVE queries.
+- Fixed incorrect results when a REPLICATED table is unioned with a
DISTRIBUTED table.
+- Fixed incorrect results when the outer query had ORDER BY after a LATERAL
subquery.
+- Fixed incorrect behavior of DELETE with split update.
+- Fixed incorrect results when using direct dispatch.
+- Fixed memory leaks in ORCA and various components.
+- Fixed long-running execution with bitmap indexes.
+- Fixed redundant SORT enforcement on group aggregates.
+- Fixed incorrect index position in target list in ExecTupleSplit.
+- Fixed incorrect value in the cpu_usage column returned by
`pg_resgroup_get_status()`.
+- Fixed incorrect behavior of gp_toolkit.gp_move_orphaned_files.
+- Fixed incorrect results in multi-stage aggregate queries.
+- Fixed incorrect plan and output in multi-stage aggregate queries.
+- Fixed incorrect reltuples value after VACUUM.
+- Fixed incorrect index->reltuples value after VACUUM.
+- Fixed a vulnerability where LDAP leaked user information.
+- Fixed incorrect permissions warning on the pgpass file.
+- Fixed incorrect handling of ONLY keyword for multiple tables in GRANT/REVOKE
statements.
+- Fixed incorrect permissions in resource management DDL.
+- Fixed incorrect security context in REFRESH MATERIALIZED VIEW CONCURRENTLY.
+- Fixed deadlock between coordinator and segments.
+- Fixed race condition in CTE reader-writer communication.
+- Fixed race condition when invalidating obsolete replication slots.
+- Fixed deadlock by allowing concurrent creation of non-first indexes on AO
tables.
+- Fixed locking issue when opening range tables inside `ExecInitModifyTable()`.
+- Fixed incorrect unlock mode in DefineRelation.
+- Fixed incorrect locking in partition distribution policies.
+- Fixed issues with rle_type when converting a table from AO to AOCO.
+- Fixed incorrect handling of empty ranges and NULL values in BRIN indexes.
+- Fixed incorrect handling of NULL values when merging BRIN summaries.
+- Fixed incorrect TIDs order when building bitmap indexes.
+- Fixed possible inconsistency between bitmap LOV table and its index.
+- Fixed incorrect behavior of VACUUM in AO tables with indexes.
+- Fixed incorrect handling of TOAST values for invisible AppendOptimized
tuples during VACUUM.
+- Fixed ORCA's invalid processing of nested SubLinks under aggregates.
+- Fixed ORCA's invalid processing of nested SubLinks referenced in GROUP BY
clauses.
+- Fixed ORCA's invalid processing of nested SubLinks with GROUP BY attributes.
+- Fixed incorrect predicate pushdown when using casted columns.
+- Fixed incorrect join condition loss after pulling up sublinks to join nodes.
+- Fixed incorrect hash-key generation for Redistribute Motion in multi-DQA
expressions.
+- Fixed incorrect plan generation for SEMI JOIN with RANDOM distributed tables.
+- Fixed incorrect behavior of gp_stat_bgwriter.
+- Fixed incorrect monitoring in pg_stat_slru.
+- Fixed incorrect monitoring in gp_stat_progress_dtx_recovery.
+- Fixed incorrect monitoring in pg_resgroup_get_status().
+- Fixed incorrect monitoring in gp_toolkit.gp_resgroup_config.
+- Fixed compilation issues on various platforms.
+- Fixed documentation and comment typos.
+- Fixed build system and Makefile issues.
+- Fixed various memory leaks and resource management issues.
+- Fixed various error handling and logging issues.
+- Fixed mismatched types.
+- Fixed the ORCA preprocess step for queries with the Select-Project-NaryJoin
pattern.
+- Fixed the missing discard_output variable in shared scan node functions.
+- Fixed the crash caused by running VACUUM AO_AUX_ONLY on an AO-partitioned
table.
+- Fixed an obvious memory leak in _bitmap_xlog_insert_bitmapwords().
+- Fixed a memory leak in the merge join implementation.
+- Fixed the issue where the token for user ID xxx did not exist.
+- Fixed the issue where plan hints could not derive table descriptors.
+- Fixed the issue where inject_fault suspend could not be canceled.
+- Fixed fallback in debug builds due to scalars with invalid return types.
+- Fixed relptr encoding of the base address.
+- Fixed visimap consults for unique checks during UPDATE operations.
+- Fixed the issue where external table location URIs containing | caused
errors.
+- Fixed handling of the time command output containing commas.
+- Fixed a small overestimation of the output length of base64 encoding.
+- Fixed gp_toolkit.__gp_aocsseg_history crash on non-AO columnar tables.
+- Fixed a race condition between termination and resqueue wakeup.
+- Fixed a statement leak involving self-deadlocks.
+- Fixed the detection of child output columns when the parent is a UNION
during join pruning.
+- Fixed a query crash when using a negative memory_limit value in resource
groups.
+- Fixed issues in pgarch new directory-scanning logic.
+- Fixed a memory leak in the FTS PROBE process.
+- Fixed check_multi_column_list_partition_keys.
+- Fixed a memory leak caught via ICW with memory check enabled.
+- Fixed query hang and fallback issues involving CTEs on replicated tables.
+- Fixed the unrecognized join type error with LASJ Not-In and network types.
+- Fixed issues in upgrade_adapt.sql related to queries using WITH OIDS.
+- Fixed the double declaration of check_ok() in pg_upgrade.h.
+- Fixed logic error with subdirectories generated by pg_upgrade for internal
files.
+- Fixed a typo in the pg_upgrade file header.
+- Fixed the bug where PL/Python functions caused the master process to reset.
+- Fixed the Shared Scan hang issue involving initplans.
+- Fixed motion toast error.
+- Fixed a memory leak related to fsync in AO tables.
+- Fixed CDatumSortedSet handling of empty arrays that caused errors in ORCA.
+- Fixed ORCA returning incorrect column type modifier information.
+- Fixed DbgStr output when printing DP structs in ORCA.
+- Fixed the comment on performDtxProtocolPrepare.
+- Fixed a memory leak in Dynamic Index, IndexOnly, and BitmapIndex scans
during execution.
+- Fixed the memory accounting bug when moving MemoryContext under another
accounting node.
+- Fixed the ALTER TABLE ALTER COLUMN TYPE issue that reuses an incorrect index.
+- Fixed query fallback when a subquery is present within LEAST() or GREATEST().
+- Fixed the typo in timestamp.
+- Fixed unexpected warnings related to pg_stat_statements node types.
+- Fixed the crash involving initplan in MPP.
+- Fixed LeftJoinPruning pruning essential LEFT JOINs.
+- Fixed the SET command that incorrectly sends DTX protocol commands.
+- Fixed the segmentation fault in addOneOption().
+- Fixed parallel_retrieve_cursor diffs.
+- Fixed gpdiff.pl to ignore information when EXPLAIN ignores costs.
+- Fixed the uninitialized-use warning in CTranslatorDXLToPlStmt.cpp.
+- Fixed the bug where the LOCALE flag cannot be used with a string pattern.
+- Fixed a typo in cdbmutate.c.
+- Fixed CColRefSet debug printing.
+- Fixed ORCA producing incorrect plans when handling SEMI JOIN with RANDOM
distributed tables.
+- Fixed orphaned temp tables on the coordinator.
+- Fixed the segmentation fault caused by concurrent INSERT ON CONFLICT and
DROP TABLE.
+- Fixed redundant columns in a multi-stage aggregate plan.
+- Fixed the import of ICU collations in pg_import_system_collations().
+- Fixed the error: "Cannot add cell to table content: total cell count of XXX
exceeded."
+- Fixed orphaned temporary namespace catalog entries left on the coordinator.
+- Fixed REFRESH MATERIALIZED VIEW on AO tables with indexes.
+- Fixed the use of PORTNAME in the gp_toolkit Makefile.
+- Fixed pg_stat_activity display for bypassed and unassigned queries.
+- Fixed the recursive CTE MergeJoin that involved a motion on WTS.
+- Fixed the column width display for partitioned tables.
+- Fixed the LDAP crash when ldaptls=1 and ldapscheme is not set.
+- Fixed the gpstop pipeline flakiness after the referenced change.
+- Fixed the ANALYZE bug in expand_vacuum_rels.
+- Fixed the compilation error.
+- Fixed the ORCA crash due to improper colref mapping with CTEs.
+- Fixed the bug where gpload insert mode was not included in a transaction.
+- Fixed the bug where resgroup total wait time was always zero.
+- Fixed the gpcheckcat error against pg_description.
+- Fixed flakiness caused by waiting for a different number of fault triggers.
+- Fixed the bug involving RelabelType in the GROUP BY clause.
+- Fixed the planner error with multiple copies of an AlternativeSubPlan.
+- Fixed the issue with bitmap indexes.
+- Fixed the bug in HashAgg related to selective-column-spilling logic.
+- Fixed the bug in disk-based hash aggregation.
+- Fixed the pipeline stall issue in LookupTupleHashEntryHash().
+- Fixed the use of version in ArgumentParser, which is deprecated.
+- Fixed the use of BaseException.message, which has been deprecated since
Python 2.6.
+- Fixed the case pg_rewind_fail_missing_xlog.
+- Fixed the compiler warning for gcc-12.
+- Fixed support for the DEFERRABLE keyword on primary and unique keys.
+- Fixed the unlocking of pruned partitions in partitioned tables.
+- Fixed the crash in ORCA involving skip-level correlated queries.
+- Fixed the removal of Assert statements in release builds.
+- Fixed the typo in comments: JOIN_SEMI_DEDUP/JOIN_SEMI_DEDUP_REVERSE.
+- Fixed the issue where REORGANIZE=TRUE did not redistribute
randomly-distributed tables.
+- Fixed the core dump caused by concurrent updates on partition tables in
DynamicScan.
+- Fixed the typo: ANALZE to ANALYZE.
+- Fixed the issue where cgroup v1 cpu_quota_us cannot be larger than its
parent's value.
+- Fixed indentation and trailing whitespace in UDFs in
resgroup/resgroup_auxiliary_tools_v1.
+- Fixed the name of cpu_hard_quota_limit in resgroup_syntax.sql.
+- Fixed multi-row DEFAULT handling in INSERT ... SELECT rules.
+- Fixed invalid function references in several comments.
+- Fixed the bug where COPY FROM does not throw ERROR: extra data after last
expected column.
+- Fixed the issue where file .204800 was not being checked in
ao_foreach_extent_file.
+- Fixed the issue of incorrectly incrementing the command counter.
+- Fixed the coordinator crash in MPPnoticeReceiver.
+- Fixed the dangling pointer in ExecDynamicIndexScan().
+- Fixed the ORCA bug that incorrectly removed required redistribution motion
when using GROUP BY over gp_segment_id.
+- Fixed header handling in url_curl.c.
+- Fixed ao_filehandler to support new attnum to filenum mapping changes.
+- Fixed pg_aocsseg to work with attnum to filenum mapping.
+- Fixed a comment in pg_dump.
+- Fixed the ORCA build break.
+- Fixed the gpconfig SSH retry undefined parameter issue.
+- Fixed the stale gp_default_storage_options comment.
+- Fixed the bug: unrecognized node type: 147.
+- Fixed spelling errors identified by lintian.
+- Fixed the bypass catalog unit test.
+- Fixed erroneous Valgrind markings in AllocSetRealloc.
+- Fixed the legacy bug in the DatabaseFrozenIds lock.
+- Fixed the mirror checkpointer error on the ALTER DATABASE query.
+- Fixed the bug: get_ao_compression_ratio() failed on root partitioned tables
with AO children.
+- Fixed the issue where InterruptHoldoffCount was not being reset.
+- Fixed gpexpand failure caused by an event trigger.
+- Fixed missing redistribute for CTAS or INSERT INTO on randomly distributed
tables when using ORCA.
+- Fixed the double free of remapper->typmodmap in TeardownUDPIFCInterconnect().
+- Fixed the bug in the upstream-merged COMMIT AND CHAIN feature.
+- Fixed inconsistency between gp_fastsequence row and index after a crash.
+- Fixed the typo allocatd to allocated.
+- Fixed the error: unrecognized node type: 145 in transformExpr.
+- Fixed build error caused by unused variable.
+- Fixed the issue where the distribution key was missing when creating a stage
table.
+- Fixed the regex for etc/environment.d.
+- Fixed the string comparison warning.
+- Fixed obsolete references to SnapshotNow in comments.
+- Fixed pull-up error when the target list contains a RelabelType node.
+- Fixed the issue where index DDL operations were recorded in QEs'
pg_last_stat_operation.
+- Fixed two compiler warnings.
+- Fixed the wrong value of maxAttrNum in TupleSplitState.
+- Fixed the bug of incorrect index position in target list in ExecTupleSplit.
+- Fixed the format error of the library name on Mac M1.
+- Fixed the pg_resgroup_get_status_kv() function.
+- Fixed interconnect bugs in ic_proxy_ibuf_push().
+- Fixed memory leaks in auto_explain.
+- Fixed ic_proxy compilation when HOST_NAME_MAX is unavailable.
+- Fixed duplicate filters caused by reversed operator argument order.
+- Fixed pg_rewind when the log file is a symbolic link.
+- Fixed and enabled 64-bit bitmapset and updated visimap.
+- Fixed the hang caused by multi-DQA with filters in the planner.
+- Fixed the bogus ORCA plan that incorrectly joins a CTE and a REPLICATED
table.
+- Fixed the error in ATSETAM when applied to ao_column with a dropped column.
+- Fixed the LWLockHeldByMe assert failure in SharedSnapshotDump.
+- Fixed the KeepLogSeg() unit test.
+- Fixed the race condition when invalidating obsolete replication slots.
+- Fixed the uninitialized value in segno calculation.
+- Fixed issues in the invalidation logic for obsolete replication slots.
+- Fixed checkpoint signalling.
+- Fixed memory overrun when querying pg_stat_slru.
+- Fixed the bug where ORCA fails to decorrelate subqueries ordered by outer
references.
+- Fixed unused variable compile warnings.
+- Fixed the bug where NestLoop join fails to materialize the inner child in
some cases.
+- Fixed COPY execution via FDW on coordinator as executor.
+- Fixed inFunction usage for auto_stats in CTAS.
+- Fixed a compiler warning.
+- Fixed the syntax error with CREATE MATERIALIZED VIEW.
+- Fixed the issue preventing temporary table creation LIKE existing tables
with comments.
+- Fixed and rewrote IndexOpProperties API.
+- Removed redundant Get/SetStaticPruneResult usage.
+- Fixed EPQ handling for DML operations.
+- Fixed gpcheckperf failure when using -V with -f option.
+- Fixed possible mirror startup failure triggered by FTS promotion.
+- Fixed the parallel retrieve cursor issue when selecting transient record
types.
+- Fixed the resource management DDL warning: unrecognized node type when
log_statement='ddl'.
+- Fixed the resgroup init error when many cores are present in cpuset.cpus.
+- Fixed resqueue malfunction when using JDBC extended protocol.
+- Fixed the missing LOCKING CLAUSE on foreign tables when ORCA is enabled.
+- Fixed the test_consume_xids behavior where it consumes one more transaction
ID than expected.
+- Fixed the ONLY keyword handling for multiple tables in GRANT/REVOKE
statements.
+- Fixed the regression test to ignore memory usage values in JSON format
EXPLAIN output.
+- Fixed relcache lookup in ORCA when selecting from sequences.
+- Fixed missing WAL files required by pg_rewind.
+- Fixed the gp_dqa test to explicitly ANALYZE tables.
+- Fixed the crash of AggNode in the executor caused by an ORCA plan.
+- Fixed the resource group cpuset test case.
+- Fixed the compiler warning caused by gpfdist with compressed external tables.
+- Fixed link issues on macOS and Windows.
+- Fixed failure when DynamicSeqScan contains a SubPlan.
+- Fixed the error: cache lookup failed for type 0.
+- Fixed the multi-level correlated subquery bug.
+- Fixed the check for BufFileRead() in ExecHashJoinGetSavedTuple().
+- Fixed the test extension to allow executing SQL code inside a Portal.
+- Fixed resgroup view test cases.
+- Fixed incorrect DISTKEY assignment when copying partitions on segments.
+- Fixed ic-proxy mis-disconnecting addresses after reloading the config file.
+- Fixed the gpcheckcat check on partition distribution policies.
+- Fixed colid remapping in disjunctive constraints.
+- Fixed the Makefile by removing the tablespace-step target from all.
+- Fixed CBitSet intersection logic in ORCA.
+- Fixed the query preprocessor for nested Select-Project-NaryJoin patterns.
+- Fixed the upgrade process for external tables with dropped columns.
+- Fixed the formatting issue in SECURITY.md.
+- Fixed gp_gettmid to return the correct startup timestamp.
+- Fixed the gpload regression test failure when the OS user is not gpadmin.
+- Fixed the compiler warning in appendonlyblockdirectory.c.
+- Fixed missing reloptions in partition roots created using Cloudberry syntax.
+- Fixed the crash when calling get_ao_compression_ratio on HEAP tables.
+- Fixed incorrect sortOp and eqOp values generated by
IsCorrelatedEqualityOpExpr.
+- Fixed the dependency bug involving minirepro and materialized views.
+- Fixed recursion handling in ALTER TABLE ... ENABLE/DISABLE TRIGGER.
+- Fixed SPE plans to display Partitions selected: 1 (out of 5).
+- Fixed incorrect hash-key generation for Redistribute Motion when creating
paths for multi-DQA expressions.
+- Removed gp_enable_sort_distinct and noduplicates optimizations.
+- Fixed gpinitsystem Behave tests that use environment variables.
+- Fixed false alarms in gpcheckcat for pg_default_acl.
+- Fixed gpinitsystem failure with custom locale settings.
+- Fixed a panic in the greenplum_fdw test.
+- Fixed the failure in bitmap index null-array condition.
+- Fixed the compilation warning in gram.y.
+- Fixed multiple issues related to DistributedTransaction handling.
+- Fixed compile-time warnings in pg_basebackup code.
+- Fixed gplogfilter to correctly generate CSV output.
+- Fixed the assert in the OpExecutor node.
+- Fixed improper copying of group statistics in ORCA.
+- Fixed error reporting after ioctl() call in pg_upgrade --clone mode.
+- Fixed replay of CREATE DATABASE records on standby.
+- Fixed a minor memory leak in pg_dump.
+- Fixed parallel restore of foreign keys to partitioned tables.
+- Fixed the issue where the pg_appendonly entry was not removed during
AO-to-HEAP table conversion.
+- Fixed assertion failure and segmentation fault in the backup code.
+- Fixed fallback behavior for non-default collations.
+- Fixed the subtransaction test for Python 3.10.
+- Fixed Windows client compilation of libpgcommon.
+- Fixed compiler warnings introduced by the Dynamic Scan commit.
+- Fixed the issue where CREATE OR REPLACE TRANSFORM failed.
+- Fixed compiler warnings for non-assert builds.
+- Fixed lock assertions in dshash.c.
+- Fixed \watch interaction with libedit on ^C.
\ No newline at end of file
diff --git a/static/img/blog/whats-new-in-apache-cloudberry-2.0.0.png
b/static/img/blog/whats-new-in-apache-cloudberry-2.0.0.png
new file mode 100644
index 0000000000..5d46da82e8
Binary files /dev/null and
b/static/img/blog/whats-new-in-apache-cloudberry-2.0.0.png differ