Feng Zhang created SEDONA-630:
---------------------------------
Summary: Improve ST_Union_Aggr performance
Key: SEDONA-630
URL: https://issues.apache.org/jira/browse/SEDONA-630
Project: Apache Sedona
Issue Type: Improvement
Reporter: Feng Zhang
The function ST_Union_Aggr is slow. This is because this function is
implemented in the most inefficient way: union geometry one after another.
```scala
class ST_Union_Aggr extends Aggregator[Geometry, Geometry, Geometry] with
TraitSTAggregateExec {
def reduce(buffer: Geometry, input: Geometry): Geometry = {
if (buffer.equalsExact(initialGeometry)) input
else buffer.union(input)
}
def merge(buffer1: Geometry, buffer2: Geometry): Geometry = {
if (buffer1.equals(initialGeometry)) buffer2
else if (buffer2.equals(initialGeometry)) buffer1
else buffer1.union(buffer2)
}
}
```
There are multiple improvements in JTS that improved the performance of union
multiple geometry objects:
- Cascaded union:
https://lin-ear-th-inking.blogspot.com/2007/11/fast-polygon-merging-in-jts-using.html
- Better overlay implementation (OverlayNG):
https://lin-ear-th-inking.blogspot.com/2020/05/jts-overlay-next-generation.html
We should switch to the new `OverlayNGRobust.*union*` interface to get the best
thing JTS provides. This also requires us to keep all reduced geometries in
memory, we may maintain a buffer with maximum size, and run
OverlayNGRobust.union when the maximum size is exceeded. This will help
balancing the speed and memory usage.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)