adamaps opened a new issue, #1345: URL: https://github.com/apache/sedona/issues/1345
## Expected behavior

`ShapefileReader.readToGeometryRDD(sedona_context, shp_file)` should use the `sedona.global.charset` configuration property set in the spark session when reading shapefiles containing non-ASCII characters.

E.g. A shapefile containing an attribute value `"Ariñiz/Aríñez"` should appear in a dataframe as `"Ariñiz/Aríñez"`.

## Actual behavior

`ShapefileReader.readToGeometryRDD(sedona_context, shp_file)` is not using the charset configuration property set in the spark context.

E.g. A shapefile containing an attribute value `"Ariñiz/Aríñez"` appears in a dataframe as `"Ariñiz/ArÃñez"` instead.

## Steps to reproduce the problem

```python
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from sedona.core.formatMapper.shapefileParser import ShapefileReader
from sedona.spark import SedonaContext
from sedona.utils.adapter import Adapter

conf = SparkConf()
conf.set("sedona.global.charset", "utf8")

spark = SparkSession.builder.config(conf=conf).getOrCreate()
sedona = SedonaContext.create(spark)
sedona_context = sedona.sparkContext

shp_file = '[aws s3 path to shapefile]'
shp_rdd = ShapefileReader.readToGeometryRDD(sedona_context, shp_file)
shp_df = Adapter.toDf(shp_rdd, sedona)
```

I can confirm that `("sedona.global.charset", "utf8")` appears in the configuration settings by using:

```python
print(sedona_context.getConf().getAll())
```

I also tried setting the charset property after creating the sedona context as follows (although this appears to be an older solution):

```python
sedona_context.setSystemProperty("sedona.global.charset", "utf8")
```

Please confirm how to set this configuration property correctly.

## Settings

Sedona version = 1.5.1

Apache Spark version = 3.3.0

API type = Python

Python version = 3.10

Environment = AWS Glue 4.0 using `sedona-spark-shaded-3.0_2.12-1.5.1.jar` and `geotools-wrapper-1.5.1-28.2.jar`

--
This is an automated message from the Apache Git Service.
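For context on the symptom reported in this issue: a stray `Ã` is what typically shows up when UTF-8 bytes are decoded with a single-byte charset such as ISO-8859-1. The sketch below reproduces that effect in plain Python. It assumes the attribute value is stored as UTF-8 in the shapefile's `.dbf` (not verified against the file in this report), and it does not use or model Sedona's reader.

```python
# Illustration only: plain Python, no Spark or Sedona involved.
original = "Ariñiz/Aríñez"

# Assumption: the attribute bytes on disk are UTF-8 encoded.
raw = original.encode("utf-8")

# Decoding UTF-8 bytes with a single-byte charset turns each two-byte
# character into two characters, the first of which is "Ã".
garbled = raw.decode("latin-1")
print(repr(garbled))

# Decoding with the charset the bytes were written in restores the value.
restored = raw.decode("utf-8")
print(restored == original)
```

If the dataframe value matches this pattern, the reader is most likely falling back to a single-byte default charset rather than the configured one.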