[ 
https://issues.apache.org/jira/browse/SPARK-23621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravikumar Ramasamy updated SPARK-23621:
---------------------------------------
    Attachment: sample_data.csv

> DataFrame.insertInto() is persisting all columns for mixed structured 
> data-type
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-23621
>                 URL: https://issues.apache.org/jira/browse/SPARK-23621
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Ravikumar Ramasamy
>            Priority: Major
>         Attachments: sample_data.csv
>
>
> The configuration data is stored in Cassandra as unstructured data: several 
> string columns plus one string column holding a JSON structure. In this case, 
> DataFrame.saveAsTable persists all column values correctly, but insertInto 
> does not store all the columns: the JSON data is truncated and the subsequent 
> column is not stored. 
> To reproduce the issue, I stored the data in a Hive table and read it from 
> there.
>  
> {code:java}
> CREATE TABLE zone_status (
> siteid string, 
> orgid string, 
> groupid string, 
> zoneid string, 
> parkingtype string, 
> capacity int, 
> config string, 
> ts bigint) 
> STORED AS TEXTFILE;
> {code}
> {code:java}
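> import org.apache.spark.sql.{SaveMode, SparkSession}
> import org.apache.spark.sql.functions.{col, lit, unix_timestamp}
>
> // Hive-enabled session with dynamic partitioning allowed
> val spark = SparkSession.builder().appName("Spark SQL Test").
>   config("hive.exec.dynamic.partition", "true").
>   config("hive.exec.dynamic.partition.mode", "nonstrict").
>   enableHiveSupport().getOrCreate()
> val zoneStatus = spark.table("zone_status")
> // saveAsTable creates the target table from the DataFrame schema
> zoneStatus.
>   select(col("siteid"), col("orgid"), col("parkinggroupid"),
>     col("parkingzoneid"), col("parkingtype"), lit(0), col("config"),
>     unix_timestamp().alias("ts")).
>   write.mode(SaveMode.Overwrite).saveAsTable("dwh_zone_status_save")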
> {code}
> Records in the dwh_zone_status_save table:
> {noformat}
> a8f11f90-20c9-11e8-b93e-2fc569d27605  efe5bdb3-baac-5d8e-6cae57771c13 Unknown 
> E657F298-2D96-4C7D-8516-E228153FE010    NonDemarcated   0       
> {"orgid":"efe5bdb3-baac-5d8e-6cae57771c13","nodeid":"N02c00056","parkingzoneid":"E657F298-2D96-4C7D-8516-E228153FE010","siteid":"a8f11f90-20c9-11e8-b93e-2fc569d27605","channel":1,"type":"NonDemarcatedParkingConfig","active":true,"tag":"","configured_date":"2017-10-23
>  
> 23:29:11.20","roi":{"roiid":"7854D5F1-9ECD-4E02-8364-7BFB15C2A01C","name":"Parking_Area_1","image_bounding_box":[[{"x":0.5083333253860474,"y":0.25468748807907104},{"x":0.6277777552604675,"y":0.45781248807907104},{"x":0.855555534362793,"y":0.42656248807907104},{"x":0.7138888835906982,"y":0.17656250298023224}]],"world_bounding_box":[[{"latitude":41.88759132852836,"longitude":-87.62231239554004},{"latitude":41.887652271934634,"longitude":-87.62230098708424},{"latitude":41.88765219325104,"longitude":-87.62227158629935},{"latitude":41.88757153728604,"longitude":-87.62227165116063}]],"vs":[5.0,1.7999999523162842,1.5]}}
>         1520453589{noformat}
>  
> {code:java}
> // insertInto writes into the existing table, matching columns by position
> zoneStatus.
>   select(col("siteid"), col("orgid"), col("parkinggroupid"),
>     col("parkingzoneid"), col("parkingtype"), lit(0), col("config"),
>     unix_timestamp().alias("ts")).
>   write.mode(SaveMode.Overwrite).insertInto("dwh_zone_status_insert")
> {code}
> Records in the dwh_zone_status_insert table:
> {noformat}
> 985feb70-18f4-11e8-9912-e9bbd4db7f62 efe5bdb3-baac-5d8e-6cae57771c13 Unknown 
> 04ABD29C-FA0F-4E4D-BFF2-4EC290DC29AE Demarcated 0 {"description":"" 
> NULL{noformat}
>  
> The JSON string column does not contain its entire content, and the values of 
> the subsequent columns are not stored in the table either. The table is 
> defined in TEXT format only.
> Our environment is:
> Scala 2.11.8
> Spark 2.2.0
> Hive  
> EMR
>  
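
To quantify the truncation described above, one option is to compare the length of the config column in the source table with the length that ends up in the insertInto target. The following is only a diagnostic sketch, not a fix and not part of the original report. It assumes that dwh_zone_status_insert has the same column names as zone_status (its DDL is not shown in the report) and that siteid is unique enough to join on; the object name CompareConfigLengths and the aliases source_len and inserted_len are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, length}

object CompareConfigLengths {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("Compare config lengths")
      .enableHiveSupport()
      .getOrCreate()

    // Length of the JSON string as it exists in the source table.
    val source = spark.table("zone_status")
      .select(col("siteid"), length(col("config")).alias("source_len"))

    // Length of the JSON string after insertInto, plus the column that follows it.
    // Assumption: the target table uses the same column names as zone_status.
    val inserted = spark.table("dwh_zone_status_insert")
      .select(col("siteid"), length(col("config")).alias("inserted_len"), col("ts"))

    // Keep rows where the stored JSON is missing or shorter than the source,
    // or where the trailing ts column came back as NULL.
    // Assumption: siteid identifies a row well enough for this comparison.
    source.join(inserted, Seq("siteid"), "left")
      .where(col("inserted_len").isNull ||
        col("inserted_len") < col("source_len") ||
        col("ts").isNull)
      .show(false)

    spark.stop()
  }
}
```

Running the same comparison against dwh_zone_status_save should return no rows if saveAsTable stores the values intact, as the report states.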



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org