adamaps opened a new issue, #1345:
URL: https://github.com/apache/sedona/issues/1345

   ## Expected behavior
   
   `ShapefileReader.readToGeometryRDD(sedona_context, shp_file)` should use the 
`sedona.global.charset` configuration property set in the spark session when 
reading shapefiles containing non-ASCII characters.
   
   E.g. A shapefile containing an attribute value `"Ariñiz/Aríñez"` should 
appear in a dataframe as `"Ariñiz/Aríñez"`.
   
   ## Actual behavior
   
   `ShapefileReader.readToGeometryRDD(sedona_context, shp_file)` does not apply 
the `sedona.global.charset` configuration property set in the spark session.
   
   E.g. A shapefile containing an attribute value `"Ariñiz/Aríñez"` appears in 
a dataframe with its non-ASCII characters garbled (mis-decoded) instead of as 
`"Ariñiz/Aríñez"`.
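
The corruption described above is consistent with UTF-8 bytes being decoded with a single-byte charset. A minimal sketch in plain Python, assuming ISO-8859-1 is the charset wrongly applied (an assumption; it is not confirmed which charset the reader actually falls back to):

```python
# Assumption: the attribute is stored as UTF-8 but decoded as ISO-8859-1.
original = "Ariñiz/Aríñez"

# Bytes as they would be stored in a UTF-8 encoded attribute table.
raw_bytes = original.encode("utf-8")

# Decoding with the wrong charset turns each 2-byte character into 2 characters.
garbled = raw_bytes.decode("iso-8859-1")

# Reversing the wrong decode recovers the text, which shows the bytes are intact.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(repr(garbled))
print(repaired == original)
```

If the values in the dataframe can be repaired this way, the file itself is fine and only the charset used while reading is wrong.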
   
   ## Steps to reproduce the problem
   
   ```python
   from pyspark.conf import SparkConf
   from pyspark.sql import SparkSession
   from sedona.core.formatMapper.shapefileParser import ShapefileReader
   from sedona.spark import SedonaContext
   from sedona.utils.adapter import Adapter
   
   conf = SparkConf()
   conf.set("sedona.global.charset", "utf8")
   spark = SparkSession.builder.config(conf=conf).getOrCreate()
   
   sedona = SedonaContext.create(spark)
   sedona_context = sedona.sparkContext
   
   shp_file = '[aws s3 path to shapefile]'
   shp_rdd = ShapefileReader.readToGeometryRDD(sedona_context, shp_file)
   shp_df = Adapter.toDf(shp_rdd, sedona)
   ```
   
   I can confirm that `("sedona.global.charset", "utf8")` appears in the 
configuration settings by using:
   ```python
   print(sedona_context.getConf().getAll())
   ```
   
   I also tried setting the charset property after creating the sedona context 
as follows (although this appears to be an older solution):
   ```python
   sedona_context.setSystemProperty("sedona.global.charset", "utf8")
   ```
   
   Please confirm how to set this configuration property correctly.
   
   ## Settings
   
   Sedona version = 1.5.1
   
   Apache Spark version = 3.3.0
   
   API type = Python
   
   Python version = 3.10
   
   Environment = AWS Glue 4.0 using `sedona-spark-shaded-3.0_2.12-1.5.1.jar` 
and `geotools-wrapper-1.5.1-28.2.jar`
   

