jiayuasu commented on pull request #488: URL: https://github.com/apache/incubator-sedona/pull/488#issuecomment-731953167
Hi all @Imbruced @netanel246 @Sarwat, I managed to fix all the issues. In a nutshell:

## Changes on JTS side

This PR is relevant to two JTS PRs made by me:

* Add the check of userData in equals(object o): locationtech/jts#633: this PR is no longer needed.
* Change the access modifiers of tree indexes and add setter/getters: https://github.com/locationtech/jts/pull/634: this PR has been updated. We only need to change a few variables to package-private. I am confident that JTS committers will accept this PR soon.

## Changes on Sedona side

1. The following spatial partitioning methods have been removed because they can no longer yield correct results after the change in JTS: Equal grids, R-Tree, Voronoi, and Hilbert curve. Only Quad-Tree and KDB-Tree are kept.
2. JoinQuery.SpatialJoinQuery/DistanceJoinQuery now returns <Geometry, List<Geometry>> instead of <Geometry, HashSet<Geometry>> because we can no longer use HashSet in Sedona for duplicate removal.
3. The duplicate-preserving strategy in JoinQuery.SpatialJoinQuery/DistanceJoinQuery has changed.
   1. After this PR, duplicate geometries present in the input queryWindowRDD, regardless of their non-spatial attributes, will not be reflected in the join results. Duplicate geometries present in the input spatialRDD, regardless of their non-spatial attributes, will be reflected in the join results.
   2. Before this PR, duplicate geometries present in the input queryWindowRDD were reflected in the final result if their non-spatial attributes were also the same. Duplicate geometries present in the input spatialRDD were not reflected in the final result if their non-spatial attributes were also the same.
4. Sedona Core GeomUtils is now used for Geometry print and equalityCheck. Adapter.scala in Sedona SQL has been updated accordingly, but no API change is needed.
5. JoinQuery.SpatialJoinQueryFlat/DistanceJoinQueryFlat still have the same signature. There is no change in terms of API and behavior. Duplicate geometries present in both input datasets will be reflected in the output. The query correctness is still guaranteed.
6. The IndexSerializers of the JTS Quad-Tree and R-Tree are now under org.locationtech.jts.index.quadtree and strtree. The purpose is to leverage "package private" to minimize the Sedona changes to JTS core.
7. The JTS version has been upgraded to 1.18.0-SNAPSHOT to get ready for the next JTS release, which includes my JTS PR.
8. GeoTools has been upgraded to 24.0 and Jts2GeoJSON to 1.4.3. Jts2GeoJSON 1.4.3 (the latest version) does not support JTS 1.17+. To overcome this, I copied "GeoJSONWRITER" from Jts2GeoJSON to Sedona as "GeoJSONWRITERNew". This fixed the version conflict, but we may need to list Jts2GeoJSON's MIT license somewhere in Sedona.

## To-Dos

1. Wait for JTS to accept the PR and release JTS 1.18 on Maven Central. Then I will change the JTS dependency in Sedona to its Maven coordinate. I expect this to be done in the coming days.
2. @Imbruced Can you please fix the Python APIs based on the aforementioned Changes 1, 2, 3, and 4? I am sure that the Python APIs cannot compile now, and some test cases may fail because of Changes 1 and 3. For Changes 6, 7, and 8, I am not sure whether the Python APIs should be updated. Please check.
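
To make the new duplicate handling in Change 3 on the Sedona side easier to check against the Python tests, here is a toy model in plain Python. It is only a sketch of the behavior described above, not Sedona code: `toy_spatial_join`, the `(wkt, attributes)` tuples, and the `matches` predicate are all hypothetical names, and geometries are compared by their WKT string as a stand-in for a geometry-only equality check.

```python
def toy_spatial_join(windows, objects, matches):
    """Toy model of the post-PR duplicate handling (hypothetical, not Sedona's API).

    windows, objects: lists of (geometry_wkt, attributes) tuples.
    matches: hypothetical predicate taking (window_wkt, object_wkt).
    Returns a dict mapping each distinct window geometry to a list of matches.
    """
    result = {}
    for window_wkt, _window_attrs in windows:
        if window_wkt in result:
            # Duplicate window geometry: collapsed, whatever its attributes are.
            continue
        # A list (not a set) keeps every matching object, duplicates included.
        result[window_wkt] = [obj for obj in objects if matches(window_wkt, obj[0])]
    return result


# Two identical query windows with different attributes,
# and two fully identical objects.
windows = [
    ("POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))", "window-1"),
    ("POLYGON ((0 0, 2 0, 2 2, 0 2, 0 0))", "window-2"),
]
objects = [
    ("POINT (1 1)", "a"),
    ("POINT (1 1)", "a"),
]

# The predicate is stubbed to always match; a real join would test the geometries.
joined = toy_spatial_join(windows, objects, lambda window_wkt, object_wkt: True)
```

In this model `joined` has a single key, because the duplicate window is dropped, and its list holds both copies of the point, which is the shape a `<Geometry, List<Geometry>>` result would take under the rules in Change 3.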
