MDiakhate12 opened a new issue, #1685: URL: https://github.com/apache/sedona/issues/1685
## Expected behavior

I want to use Apache Sedona in PySpark in an AWS Glue environment.

## Actual behavior

The Sedona library does not work when following the steps described in the documentation: https://sedona.apache.org/latest-snapshot/setup/glue/

Error raised:

```python
Traceback (most recent call last):
  File "/path/to/my/file.py", line 23, in <module>
    sedona = SedonaContext.create(spark)
  File "/home/glue_user/.local/lib/python3.10/site-packages/sedona/spark/SedonaContext.py", line 38, in create
    spark._jvm.SedonaContext.create(spark._jsparkSession, "python")
TypeError: 'JavaPackage' object is not callable
```

The error happens on the line `sedona = SedonaContext.create(spark)`:

```python
# -*- coding: utf-8 -*-
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from sedona.spark import SedonaContext

# JOB CONTEXT SETUP
args = getResolvedOptions(
    sys.argv,
    ["JOB_NAME", "environment", "additional-python-modules", "extra-jars", "extra-py-files"],
)
print(args)

# Method 1
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)

sedona = SedonaContext.create(spark)
print(SedonaContext)
print(sedona)
```

It seems that apache-sedona is not able to find the JAR files (a diagnostic sketch for checking this is included after the Settings section below).

## Steps to reproduce the problem

Create a PySpark ETL job on the AWS Glue web console:

1. Go to AWS Glue > Data Integration and ETL > ETL Jobs.
2. Click Create job > Script editor > Engine = Spark.
3. Follow the documentation steps: from your job's page, navigate to the "Job details" tab. At the bottom of the page, expand the "Advanced properties" section. In the "Dependent JARs path" field, add the paths to the JARs, separated by a comma (they correspond to **[sedona-spark-shaded-3.3_2.12-1.6.1.jar](https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.3_2.12/1.6.1/sedona-spark-shaded-3.3_2.12-1.6.1.jar)** and **[geotools-wrapper-1.6.1-28.2.jar](https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.6.1-28.2/geotools-wrapper-1.6.1-28.2.jar)**).
4. Add the Sedona Python package by navigating to the "Job Parameters" section and adding a new parameter with the key `--additional-python-modules` and the value `apache-sedona==1.6.1`.
5. Use the code shown above.
6. Click Save and Run.

You can reproduce the same steps using AWS Glue locally in Docker by following [this official documentation on AWS](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html#develop-local-docker-image-setup-visual-studio) and then running your script inside the container with:

```bash
/usr/local/bin/python3 "$PYTHON_SCRIPT" \
    --JOB_NAME "$JOB_NAME" \
    --environment "$ENVIRONMENT" \
    --enable-glue-datacatalog \
    --extra-jars https://repo1.maven.org/maven2/org/apache/sedona/sedona-spark-shaded-3.3_2.12/1.6.1/sedona-spark-shaded-3.3_2.12-1.6.1.jar,https://repo1.maven.org/maven2/org/datasyslab/geotools-wrapper/1.6.1-28.2/geotools-wrapper-1.6.1-28.2.jar \
    --additional-python-modules apache-sedona==1.6.1 \
    --job-language python
```

## Settings

Sedona version = 1.6.1
Apache Spark version = 3.3.0
Apache Flink version = ?
API type = Python
Scala version = 2.12
Python version = 3.12
Environment = AWS Glue
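For reference, here is a hypothetical diagnostic sketch (not part of the job script above) that can be run inside the Glue job to check whether the JARs actually reached the JVM classpath. It assumes the Sedona Scala entry point lives at `org.apache.sedona.spark.SedonaContext`, which matches Sedona 1.6.1:

```python
# Hypothetical diagnostic snippet, not part of the original job script.
from pyspark.context import SparkContext

sc = SparkContext.getOrCreate()

# listJars() reports the JARs Spark distributed to this application; the
# Sedona and GeoTools JARs should appear here if Glue picked them up.
print(sc._jsc.sc().listJars())

# Touch the Sedona entry point through py4j. If the shaded JAR is on the
# classpath this resolves to a JavaClass; if not, it is a plain
# JavaPackage, which is exactly what produces
# "'JavaPackage' object is not callable" when called.
entry = sc._jvm.org.apache.sedona.spark.SedonaContext
print(type(entry))
```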
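A possible workaround is a minimal sketch along the lines of the Sedona docs' `SedonaContext.builder()` pattern, letting Spark resolve the packages itself instead of relying on the "Dependent JARs path" field. This assumes the Glue workers can reach Maven Central and that no SparkContext exists yet (`spark.jars.packages` is ignored once a session is already running), so it may not apply to every Glue setup:

```python
# Minimal sketch of an alternative setup; assumes Maven Central is
# reachable and no SparkContext has been created yet.
from awsglue.context import GlueContext
from sedona.spark import SedonaContext

config = (
    SedonaContext.builder()
    .config(
        "spark.jars.packages",
        "org.apache.sedona:sedona-spark-shaded-3.3_2.12:1.6.1,"
        "org.datasyslab:geotools-wrapper:1.6.1-28.2",
    )
    .getOrCreate()
)

# Register Sedona's types and functions on the session, then build the
# GlueContext from the same underlying SparkContext.
sedona = SedonaContext.create(config)
glue_context = GlueContext(sedona.sparkContext)
```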