[ https://issues.apache.org/jira/browse/SPARK-20663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
kobefeng updated SPARK-20663: ----------------------------- Docs Text: (was: -- create partition table first $ hadoop fs -mkdir /user/kofeng/partitioned_table $ /apache/spark-2.1.0-bin-hadoop2.7/bin/spark-sql spark-sql> create table kofeng.partitioned_table( > id bigint, > name string, > dt string > ) using parquet options ('compression'='snappy', 'path'='hdfs://ares-lvs-nn-ha/user/kofeng/partitioned_table') > partitioned by (dt); -- add partition with specific location spark-sql> alter table kofeng.partitioned_table add if not exists partition(dt='20170507') location '/user/kofeng/partitioned_table/20170507'; $ hadoop fs -ls /user/kofeng/partitioned_table drwxr-xr-x - kofeng kofeng 0 2017-05-08 17:00 /user/kofeng/partitioned_table/20170507 -- insert overwrite this partition, and the specific location folder gone, data is missing, job is success by attaching _SUCCESS spark-sql> insert overwrite table kofeng.partitioned_table partition(dt='20170507') select 123 as id, "kofeng" as name; $ hadoop fs -ls /user/kofeng/partitioned_table -rw-r--r-- 3 kofeng hdmi-technology 0 2017-05-08 17:06 /user/kofeng/partitioned_table/_SUCCESS ---- ---- -- Then drop this partition and use hive to add partition and insert overwrite this partition data, then verify: spark-sql> alter table kofeng.partitioned_table drop if exists partition(dt='20170507'); hive> alter table kofeng.partitioned_table add if not exists partition(dt='20170507') location '/user/kofeng/partitioned_table/20170507'; OK -- could see hive also drop the specific location but data is preserved on auto-created partition folder hive> insert overwrite table kofeng.partitioned_table partition(dt='20170507') select 123 as id, "kofeng" as name from kofeng.test; Loading data to table kofeng.partitioned_table partition (dt=20170507) Moved: 'hdfs://ares-lvs-nn-ha/user/kofeng/partitioned_table/dt=20170507/000000_0' to trash at: hdfs://ares-lvs-nn-ha/user/kofeng/.Trash/Current Partition kofeng.partitioned_table{dt=20170507} stats: [numFiles=1, numRows=1, totalSize=338, rawDataSize=2] MapReduce Jobs Launched: Stage-Stage-1: Map: 2 Cumulative CPU: 10.61 sec HDFS Read: 9767 HDFS Write: 577 SUCCESS Stage-Stage-3: Map: 1 Cumulative CPU: 12.36 sec HDFS Read: 3635 HDFS Write: 338 SUCCESS hive> select * from kofeng.partitioned_table; OK 123 kofeng 20170507 $ hadoop fs -ls /user/kofeng/partitioned_table/dt=20170507 -rwxr-xr-x 3 kofeng hdmi-technology 338 2017-05-08 17:26 /user/kofeng/partitioned_table/dt=20170507/000000_0) > Data missing after insert overwrite table partition which is created on > specific location > ----------------------------------------------------------------------------------------- > > Key: SPARK-20663 > URL: https://issues.apache.org/jira/browse/SPARK-20663 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.1.0, 2.1.1 > Reporter: kobefeng > Priority: Minor > > Use spark sql to create partition table first, and alter table by adding > partition on specific location, then insert overwrite into this partition by > selection, which will cause data missing compared with HIVE. -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org