golfalot opened a new issue, #1477:
URL: https://github.com/apache/sedona/issues/1477
## Expected behavior
The query returns result rows/a table.
## Actual behavior
The query crashes with the following error:
```
java.lang.NoSuchMethodError: 'void org.geotools.coverage.grid.GridGeometry2D.<init>(org.opengis.coverage.grid.GridEnvelope, org.opengis.referencing.datum.PixelInCell, org.opengis.referencing.operation.MathTransform, org.opengis.referencing.crs.CoordinateReferenceSystem, org.geotools.util.factory.Hints)'
```
## Steps to reproduce the problem
```python
from sedona.spark import SedonaContext

config = SedonaContext.builder(). \
    config('spark.jars.packages',
           'org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.6.0,'
           'org.datasyslab:geotools-wrapper:1.6.0-28.2'). \
    getOrCreate()
sedona = SedonaContext.create(config)
```
```python
from pyspark.sql import functions as f

df = sedona.read.format("binaryFile").load("/raw/GIS_Raster_Data/samples/test.nc")
df2 = df.withColumn("raster", f.expr("RS_FromNetCDF(content, 'O3')"))
df2.createOrReplaceTempView("raster_table")
# this command throws the error
sedona.sql("SELECT RS_Value(raster, 3, 4, 1) FROM raster_table").show()
```
The raster source file comes from:
https://github.com/apache/sedona/blob/master/spark/common/src/test/resources/raster/netcdf/test.nc
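A `NoSuchMethodError` like this typically means the `GridGeometry2D` class that loaded at runtime comes from a GeoTools build whose constructor signature differs from the one Sedona 1.6.0 was compiled against. A diagnostic sketch (not part of the repro) that lists the constructors actually visible on the driver classpath, assuming the `sedona` session from above; `_jvm` is an internal PySpark handle:
```python
# Diagnostic sketch: inspect the GridGeometry2D class that actually loaded,
# to see which GeoTools build won and which constructors it exposes.
jvm = sedona.sparkContext._jvm  # internal PySpark accessor for the JVM gateway
cls = jvm.java.lang.Class.forName("org.geotools.coverage.grid.GridGeometry2D")
for ctor in cls.getConstructors():
    print(ctor.toString())
# Print the jar the class was loaded from
print(cls.getProtectionDomain().getCodeSource().getLocation().toString())
```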
## Settings
Sedona version = 1.6.0
Apache Spark version = 3.4
API type = Python
Scala version = 2.12.17
Java version = 11
Python version = 3.10
Environment = Azure Synapse Spark Pool
# Additional background
We're using Azure Synapse with DEP (data exfiltration protection) enabled,
which means no outbound internet access, so all packages must be obtained
manually and uploaded as "Workspace packages", which can then be enabled
on the Spark pools.
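For reference, a minimal sketch of how such a pool can be pointed at the pre-uploaded jars directly, rather than resolving coordinates from Maven as in the repro above. The jar paths below are illustrative placeholders, not real Synapse paths:
```python
from sedona.spark import SedonaContext

# Sketch only: with DEP enabled there is no outbound internet, so instead of
# spark.jars.packages (which resolves from Maven) the session references the
# uploaded jar files directly. Paths are hypothetical.
config = SedonaContext.builder(). \
    config('spark.jars',
           '/path/to/sedona-spark-shaded-3.4_2.12-1.6.0.jar,'
           '/path/to/geotools-wrapper-1.6.0-28.2.jar'). \
    getOrCreate()
sedona = SedonaContext.create(config)
```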
## A configuration that works (no error)
### Spark pool
- Apache Spark version = 3.4
- Scala version = 2.12.17
- Java version = 11
- Python version = 3.10
### Packages
#### Java
- sedona-spark-shaded-3.4_2.12-1.5.3.jar
- geotools-wrapper-1.5.3-28.2.jar
#### Python
- apache_sedona-1.5.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- shapely-2.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
## A configuration that causes the error
### Spark pool (identical to above)
- Apache Spark version = 3.4
- Scala version = 2.12.17
- Java version = 11
- Python version = 3.10
### Packages
#### Java
- sedona-spark-shaded-3.4_2.12-1.6.0.jar
- geotools-wrapper-1.6.0-28.2.jar
#### Python
- click_plugins-1.1.1-py2.py3-none-any.whl
- affine-2.4.0-py3-none-any.whl
- apache_sedona-1.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- cligj-0.7.2-py3-none-any.whl
- rasterio-1.3.10-cp310-cp310-manylinux2014_x86_64.whl
- shapely-2.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- snuggs-1.4.7-py3-none-any.whl
**Stating the obvious:** there are many more packages listed in the failing
scenario. See below for the convoluted steps needed to establish which packages
are required on top of a baseline Synapse Spark pool.
# How to establish Python package dependencies for a Synapse Spark pool
## Identify Operating System
https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-34-runtime
=> Mariner 2.0
## Create a VM and apply baseline configuration
https://github.com/microsoft/azurelinux/blob/2.0/toolkit/docs/quick_start/quickstart.md
### Get conda
```bash
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
sudo bash Miniforge3-Linux-x86_64.sh -b -p /usr/lib/miniforge3
export PATH="/usr/lib/miniforge3/bin:$PATH"
```
### Apply baseline Synapse configuration
```bash
sudo tdnf -y install gcc g++
wget https://raw.githubusercontent.com/Azure-Samples/Synapse/main/Spark/Python/Synapse-Python310-CPU.yml
conda env create -n synapse-env -f Synapse-Python310-CPU.yml
source activate synapse-env
```
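Optionally, snapshot the baseline environment before installing anything, so the extra packages are easy to diff afterwards. A hedged helper, run inside the activated `synapse-env`, not part of the official Microsoft procedure:
```python
# Hypothetical helper: record every package in the freshly created baseline
# environment, so anything pip adds for apache-sedona stands out in a diff.
import importlib.metadata as md

baseline = sorted(
    f"{dist.metadata['Name']}=={dist.version}" for dist in md.distributions()
)
with open("baseline-packages.txt", "w") as fh:
    fh.write("\n".join(baseline) + "\n")
```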
### Install pip packages and determine which packages are downloaded above and beyond the baseline
#### requirements.txt
```bash
# echo "apache-sedona==1.5.3" > input-user-req.txt
echo "apache-sedona==1.6.0" > input-user-req.txt
```
#### Install apache-sedona and dependencies
```bash
pip install -r input-user-req.txt > pip_output.txt
```
#### Identify the packages pip downloaded
```bash
grep Downloading pip_output.txt
```
Use the above output to identify the `.whl` files to download and add to Synapse.
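The same extraction can be scripted. A small hedged helper that pulls the wheel file names out of pip's output, assuming pip's default "Downloading" log format:
```python
# Hypothetical helper: extract the .whl file names pip reported downloading,
# giving the exact list of wheels to fetch and upload as Workspace packages.
import re

with open("pip_output.txt") as fh:
    wheels = re.findall(r"Downloading\s+(\S+\.whl)", fh.read())

for wheel in sorted(set(wheels)):
    print(wheel)
```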
# Full stack trace of error
```python
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
Cell In[19], line 1
----> 1 sedona.sql("SELECT RS_Value(raster, ST_Point(507573, 103477)) FROM raster_table").show()

File /opt/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py:899, in DataFrame.show(self, n, truncate, vertical)
    893     raise PySparkTypeError(
    894         error_class="NOT_BOOL",
    895         message_parameters={"arg_name": "vertical", "arg_type": type(vertical).__name__},
    896     )
    898 if isinstance(truncate, bool) and truncate:
--> 899     print(self._jdf.showString(n, 20, vertical))
    900 else:
    901     try:

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /opt/spark/python/lib/pyspark.zip/pyspark/errors/exceptions/captured.py:169, in capture_sql_exception.<locals>.deco(*a, **kw)
    167 def deco(*a: Any, **kw: Any) -> Any:
    168     try:
--> 169         return f(*a, **kw)
    170     except Py4JJavaError as e:
    171         converted = convert_exception(e.java_exception)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o4346.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 5) (vm-32f63676 executor 1): java.lang.NoSuchMethodError: 'void org.geotools.coverage.grid.GridGeometry2D.<init>(org.opengis.coverage.grid.GridEnvelope, org.opengis.referencing.datum.PixelInCell, org.opengis.referencing.operation.MathTransform, org.opengis.referencing.crs.CoordinateReferenceSystem, org.geotools.util.factory.Hints)'
    at org.apache.sedona.common.raster.RasterConstructors.makeNonEmptyRaster(RasterConstructors.java:375)
    at org.apache.sedona.common.raster.netcdf.NetCdfReader.getRasterHelper(NetCdfReader.java:282)
    at org.apache.sedona.common.raster.netcdf.NetCdfReader.getRaster(NetCdfReader.java:77)
    at org.apache.sedona.common.raster.RasterConstructors.fromNetCDF(RasterConstructors.java:79)
    at org.apache.spark.sql.sedona_sql.expressions.raster.RS_FromNetCDF$$anonfun$$lessinit$greater$17.apply(RasterConstructors.scala:196)
    at org.apache.spark.sql.sedona_sql.expressions.raster.RS_FromNetCDF$$anonfun$$lessinit$greater$17.apply(RasterConstructors.scala:196)
    at org.apache.spark.sql.sedona_sql.expressions.InferrableFunctionConverter$.$anonfun$inferrableFunction2$2(InferrableFunctionConverter.scala:53)
    at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.evalWithoutSerialization(InferredExpression.scala:70)
    at org.apache.spark.sql.sedona_sql.expressions.raster.implicits$RasterInputExpressionEnhancer.toRaster(implicits.scala:32)
    at org.apache.spark.sql.sedona_sql.expressions.InferrableRasterTypes$.rasterExtractor(InferrableRasterTypes.scala:43)
    at org.apache.spark.sql.sedona_sql.expressions.InferredRasterExpression$.$anonfun$rasterExtractor$2(InferredRasterExpression.scala:48)
    at org.apache.spark.sql.sedona_sql.expressions.InferrableFunctionConverter$.$anonfun$inferrableFunction2$2(InferrableFunctionConverter.scala:50)
    at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.eval(InferredExpression.scala:69)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:425)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2799)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2735)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2734)
    at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
    at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2734)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1218)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1218)
    at scala.Option.foreach(Option.scala:407)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1218)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2998)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2937)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2926)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:977)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2418)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2439)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2458)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:566)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:519)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:61)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:4203)
    at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:3174)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$2(Dataset.scala:4193)
    at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:642)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:4191)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:125)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:214)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:100)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:827)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:67)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:4191)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:3174)
    at org.apache.spark.sql.Dataset.take(Dataset.scala:3395)
    at org.apache.spark.sql.Dataset.getRows(Dataset.scala:297)
    at org.apache.spark.sql.Dataset.showString(Dataset.scala:336)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.NoSuchMethodError: 'void org.geotools.coverage.grid.GridGeometry2D.<init>(org.opengis.coverage.grid.GridEnvelope, org.opengis.referencing.datum.PixelInCell, org.opengis.referencing.operation.MathTransform, org.opengis.referencing.crs.CoordinateReferenceSystem, org.geotools.util.factory.Hints)'
    at org.apache.sedona.common.raster.RasterConstructors.makeNonEmptyRaster(RasterConstructors.java:375)
    at org.apache.sedona.common.raster.netcdf.NetCdfReader.getRasterHelper(NetCdfReader.java:282)
    at org.apache.sedona.common.raster.netcdf.NetCdfReader.getRaster(NetCdfReader.java:77)
    at org.apache.sedona.common.raster.RasterConstructors.fromNetCDF(RasterConstructors.java:79)
    at org.apache.spark.sql.sedona_sql.expressions.raster.RS_FromNetCDF$$anonfun$$lessinit$greater$17.apply(RasterConstructors.scala:196)
    at org.apache.spark.sql.sedona_sql.expressions.raster.RS_FromNetCDF$$anonfun$$lessinit$greater$17.apply(RasterConstructors.scala:196)
    at org.apache.spark.sql.sedona_sql.expressions.InferrableFunctionConverter$.$anonfun$inferrableFunction2$2(InferrableFunctionConverter.scala:53)
    at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.evalWithoutSerialization(InferredExpression.scala:70)
    at org.apache.spark.sql.sedona_sql.expressions.raster.implicits$RasterInputExpressionEnhancer.toRaster(implicits.scala:32)
    at org.apache.spark.sql.sedona_sql.expressions.InferrableRasterTypes$.rasterExtractor(InferrableRasterTypes.scala:43)
    at org.apache.spark.sql.sedona_sql.expressions.InferredRasterExpression$.$anonfun$rasterExtractor$2(InferredRasterExpression.scala:48)
    at org.apache.spark.sql.sedona_sql.expressions.InferrableFunctionConverter$.$anonfun$inferrableFunction2$2(InferrableFunctionConverter.scala:50)
    at org.apache.spark.sql.sedona_sql.expressions.InferredExpression.eval(InferredExpression.scala:69)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:425)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:895)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:895)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:57)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:139)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    ... 1 more
```