Re: [I] Breaking change between 1.5.3 and 1.6.0 affecting RASTER functions java.lang.NoSuchMethodError: void org.geotools.coverage.grid.GridGeometry2D [sedona]

2024-06-14 Thread via GitHub


golfalot closed issue #1477: Breaking change between 1.5.3 and 1.6.0 affecting 
RASTER functions java.lang.NoSuchMethodError: void 
org.geotools.coverage.grid.GridGeometry2D
URL: https://github.com/apache/sedona/issues/1477





[I] Breaking change between 1.5.3 and 1.6.0 affecting RASTER functions java.lang.NoSuchMethodError: void org.geotools.coverage.grid.GridGeometry2D [sedona]

2024-06-13 Thread via GitHub


golfalot opened a new issue, #1477:
URL: https://github.com/apache/sedona/issues/1477

   ## Expected behavior
   
   return result rows/table
   
   ## Actual behavior
   crash with stack trace
   
   ```
   java.lang.NoSuchMethodError: 'void org.geotools.coverage.grid.GridGeometry2D.<init>(org.opengis.coverage.grid.GridEnvelope, org.opengis.referencing.datum.PixelInCell, org.opengis.referencing.operation.MathTransform, org.opengis.referencing.crs.CoordinateReferenceSystem, org.geotools.util.factory.Hints)'
   ```
   
   ## Steps to reproduce the problem
   ```python
   from sedona.spark import SedonaContext

   config = SedonaContext.builder(). \
       config('spark.jars.packages',
              'org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.6.0,'
              'org.datasyslab:geotools-wrapper:1.6.0-28.2'). \
       getOrCreate()
   sedona = SedonaContext.create(config)
   ```
   
   ```python
   from pyspark.sql import functions as f
   df = sedona.read.format("binaryFile").load("/raw/GIS_Raster_Data/samples/test.nc")
   df2 = df.withColumn("raster", f.expr("RS_FromNetCDF(content, 'O3')"))
   df2.createOrReplaceTempView("raster_table")
   
   # this command throws the error
   sedona.sql("SELECT RS_Value(raster, 3, 4, 1) FROM raster_table").show()
   ```
   Raster source from: https://github.com/apache/sedona/blob/master/spark/common/src/test/resources/raster/netcdf/test.nc
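   The `NoSuchMethodError` above usually means the JVM resolved `GridGeometry2D` from a GeoTools build older than the one Sedona 1.6.0 was compiled against. A minimal diagnostic sketch, assuming the `sedona` session created above (plain Java reflection via py4j, not a Sedona-specific API):

   ```python
   # Print the jar that provides GridGeometry2D on the driver classpath.
   # A path ending in geotools-wrapper-1.5.3-28.2.jar would indicate a stale
   # jar still on the pool alongside the 1.6.0 wrapper.
   jvm = sedona.sparkContext._jvm
   cls = jvm.java.lang.Class.forName("org.geotools.coverage.grid.GridGeometry2D")
   print(cls.getProtectionDomain().getCodeSource().getLocation().toString())
   ```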
   
   
   
   
   
   
   ## Settings
   
   Sedona version = 1.6.0
   
   Apache Spark version = 3.4
   
   API type = Python
   
   Scala version = 2.12.17
   
   Java version = 11
   
   Python version = 3.10
   
   Environment = Azure Synapse Spark Pool
   
   # Additional background
   
   We're using Azure Synapse with DEP (data exfiltration protection) enabled, which means no outbound internet access, so all packages must be obtained manually and uploaded as "Workspace packages" before they can be enabled on the Spark pools.
   
   ## A configuration that works (no error)
   
   ### Spark pool
   
   - Apache Spark version = 3.4
   - Scala version = 2.12.17
   - Java version = 11
   - Python version = 3.10
   
   ### Packages

   #### Java
   
   - sedona-spark-shaded-3.4_2.12-1.5.3.jar
   - geotools-wrapper-1.5.3-28.2.jar
   
   #### Python
   
   - apache_sedona-1.5.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
   - shapely-2.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
   
   ## A configuration that causes the error
   
   ### Spark pool (identical to above)
   
   - Apache Spark version = 3.4
   - Scala version = 2.12.17
   - Java version = 11
   - Python version = 3.10
   
   ### Packages
   
   #### Java
   
   - sedona-spark-shaded-3.4_2.12-1.6.0.jar
   - geotools-wrapper-1.6.0-28.2.jar
   
   #### Python
   
   - click_plugins-1.1.1-py2.py3-none-any.whl
   - affine-2.4.0-py3-none-any.whl
   - apache_sedona-1.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
   - cligj-0.7.2-py3-none-any.whl
   - rasterio-1.3.10-cp310-cp310-manylinux2014_x86_64.whl
   - shapely-2.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
   - snuggs-1.4.7-py3-none-any.whl
   
   **Stating the obvious:** there are many packages listed in the failing scenario. See below for the convoluted steps needed to establish which packages are required beyond a baseline Synapse Spark pool.
   
   
   # How to establish Python package dependencies for a Synapse Spark pool
   ## Identify Operating System
   
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-34-runtime
   
   => Mariner 2.0
   
   ## Create a VM and apply baseline configuration
   
https://github.com/microsoft/azurelinux/blob/2.0/toolkit/docs/quick_start/quickstart.md
   
   ### Get conda
   ```bash
   wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
   sudo bash Miniforge3-Linux-x86_64.sh -b -p /usr/lib/miniforge3
   export PATH="/usr/lib/miniforge3/bin:$PATH"
   ```
   
   ### Apply baseline Synapse configuration
   ```bash
   sudo tdnf -y install gcc g++
   wget https://raw.githubusercontent.com/Azure-Samples/Synapse/main/Spark/Python/Synapse-Python310-CPU.yml
   conda env create -n synapse-env -f Synapse-Python310-CPU.yml
   source activate synapse-env
   ```
   
   ### Install pip packages and determine which packages are downloaded over and above the baseline

   #### requirements.txt
   ```bash
   # echo "apache-sedona==1.5.3" > input-user-req.txt
   echo "apache-sedona==1.6.0" > input-user-req.txt
   ```
   
   #### Install apache-sedona and dependencies
   ```bash
   pip install -r input-user-req.txt > pip_output.txt
   ```
   
   #### List the downloaded packages
   ```bash
   cat pip_output.txt | grep Downloading
   ```
   
   Use the above output to identify the `.whl` files to download and add to Synapse.
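   An alternative to grepping pip's output is to snapshot the installed distributions before and after the install and diff the two lists. A minimal sketch using only the standard library; the script name and output file names are illustrative, not part of the Synapse workflow:

   ```python
   # snapshot.py (hypothetical helper): write "name==version" for every
   # installed distribution to the file named by the first argument.
   # Run once in the bare baseline env (e.g. baseline.txt), pip-install
   # apache-sedona, run again (e.g. after.txt), then diff the two files
   # to find exactly which wheels must be uploaded as workspace packages.
   import sys
   from importlib.metadata import distributions

   names = sorted({f"{d.metadata['Name']}=={d.version}" for d in distributions()})
   with open(sys.argv[1], "w") as out:
       out.write("\n".join(names) + "\n")
   ```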
   
   
   # Full stack trace of error
   ```python
   ---
   Py4JJavaError