[GitHub] [sedona] PrathameshDhapodkar opened a new issue, #987: Shapefile attributes do not produce values in the column after joining.

via GitHub Fri, 25 Aug 2023 15:31:05 -0700


PrathameshDhapodkar opened a new issue, #987:
URL: https://github.com/apache/sedona/issues/987


   ## Expected behavior
   | value | current_timestamp | network_operator_name | dl_load_date | isDishNrCell | isSimDish | nrNci | AOI_ID | Cluster_ID |
   | --- | --- | --- | --- | --- | --- | --- | --- | --- |
   | <nestedjsonvalue> | 8/22/2023 5:44:11 PM | Digicel | 8/22/2023 | FALSE | TRUE | 3569856325 | ALB | ALB-01-Downtown |
   
   ## Actual behavior
   | value | current_timestamp | network_operator_name | dl_load_date | isDishNrCell | isSimDish | nrNci | AOI_ID | Cluster_ID |
   | --- | --- | --- | --- | --- | --- | --- | --- | --- |
   | <nestedjsonvalue> | 8/22/2023 5:44:11 PM | Digicel | 8/22/2023 | FALSE | TRUE | 3569856325 |  |  |
   
   ## Steps to reproduce the problem
   The actual DataFrame is a streaming dataset running on a Spark cluster.
   
   1. Create a Spark session.
   2. Read the shapefile from its location (S3 here).
   code:
     def getAoiShapeDf: DataFrame = {
       val aoiShapefileLocation = "s3://ue-bronze-dish-wireless-source-data-np/opensource_loc/top_shp/aoi_oto/"
       val aoiShapeRdd = ShapefileReader.readToGeometryRDD(session.sparkContext, aoiShapefileLocation)
       aoiShapeRdd.CRSTransform("epsg:4326", "epsg:5070", false)
       val aoiShapeDf = Adapter.toDf(aoiShapeRdd, session)
       aoiShapeDf
     }
   
   3. Join the shapefile DataFrame with the actual DataFrame, using ST_Contains as the join condition.
   code:
     def enrichWithAoi(dataframe: DataFrame, clientLatColumn: String, clientLongColumn: String): DataFrame = {
       val networkAoiShape = broadcast(this.getAoiShapeDf.select("geometry", "AOI_ID"))
       val ueDataWithGeom = dataframe.withColumn("aoiGeoPoint",
         expr(s"ST_TRANSFORM(ST_POINT(CAST($clientLatColumn AS DOUBLE), CAST($clientLongColumn AS DOUBLE)), 'EPSG:4326', 'EPSG:5070')"))
       val aoiShapeJoin = ueDataWithGeom.alias("roamingAoiData").join(networkAoiShape.alias("shapeData"),
         expr("ST_Contains(shapeData.geometry,roamingAoiData.aoiGeoPoint)"), "LeftOuter")
       aoiShapeJoin.drop("geometry", "aoiGeoPoint")
     }
   
   I also tried specifying a schema for the shapefiles; the result was the same.
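
   For context on the symptom: a left outer join keeps every left-side row and fills the right-side columns with null whenever the join predicate matches nothing, which is how empty AOI_ID values can appear without any error. The following is a minimal sketch of that behavior in plain Scala. `Point`, `Area`, and `leftOuterJoin` are hypothetical stand-ins for the DataFrames above, and the axis-aligned `contains` is a simplification of ST_Contains, not Sedona's API.

   ```scala
   // Hypothetical stand-ins for the streaming rows and the shapefile polygons;
   // these are not Sedona types.
   final case class Point(x: Double, y: Double)

   final case class Area(minX: Double, minY: Double, maxX: Double, maxY: Double, aoiId: String) {
     // Axis-aligned containment, a simplification of ST_Contains.
     def contains(p: Point): Boolean =
       p.x >= minX && p.x <= maxX && p.y >= minY && p.y <= maxY
   }

   object LeftOuterSketch {
     // Keep every point: one row per matching area, or a single row with None
     // when no area contains the point.
     def leftOuterJoin(points: Seq[Point], areas: Seq[Area]): Seq[(Point, Option[String])] =
       points.flatMap { p =>
         val hits = areas.filter(_.contains(p)).map(a => (p, Option(a.aoiId)))
         if (hits.isEmpty) Seq((p, Option.empty[String])) else hits
       }

     def main(args: Array[String]): Unit = {
       val areas = Seq(Area(0.0, 0.0, 10.0, 10.0, "ALB"))
       // The first point lies inside the area; the second lies outside it.
       val joined = leftOuterJoin(Seq(Point(5.0, 5.0), Point(50.0, 50.0)), areas)
       joined.foreach(println)
     }
   }
   ```

   In this sketch the point outside the area is still returned, but with no AOI attached, which mirrors the empty columns under "Actual behavior".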
   
   ## Settings
   - EMR Serverless 6.9.0
   - Spark 3.3.2
   - Scala 2.12
   - JDK 11
   
   
   Sedona version = ?
   implementation group: 'org.apache.sedona', name: 'sedona-python-adapter-3.0_2.12', version: '1.3.1-incubating'
   implementation group: 'org.apache.sedona', name: 'sedona-viz-3.0_2.12', version: '1.4.1'
   implementation group: 'org.apache.sedona', name: 'sedona-common', version: '1.4.1'
   implementation group: 'org.apache.sedona', name: 'sedona-sql-3.0_2.12', version: '1.4.1'
   
   Apache Spark version = ?
   3.3.2
   
   API type = Scala, Java, Python?
   Scala
   
   Scala version = 2.11, 2.12, 2.13?
   2.12
   
   JRE version = 1.8, 1.11?
   jdk11
   
   Environment = Standalone, AWS EC2, EMR, Azure, Databricks?
   AWS EMR Serverless


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
