Bigicecream created CARBONDATA-4279:
---------------------------------------

             Summary: Insert data to table with a partitions resulting in 'Marked for Delete' segment in Spark in EMR
                 Key: CARBONDATA-4279
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4279
             Project: CarbonData
          Issue Type: Bug
    Affects Versions: 2.3.0
         Environment: Release label: emr-5.24.1
                      Hadoop distribution: Amazon 2.8.5
                      Applications: Hue 4.4.0, Spark 2.4.5, JupyterHub 0.9.6

                      Jar compiled with:
                      apache-carbondata:2.3.0-SNAPSHOT
                      spark:2.4.5
                      hadoop:2.8.3
            Reporter: Bigicecream


As described [here|https://github.com/apache/carbondata/issues/4212]:

After the commit [https://github.com/apache/carbondata/commit/42f69827e0a577b6128417104c0a49cd5bf21ad7] I can successfully create a table with partitions, but when I try to insert data, the job ends with success while the segment is marked as "Marked for Delete".

I am running:
{code:sql}
CREATE TABLE lior_carbon_tests.mark_for_del_bug(
  timestamp string,
  name string
)
STORED AS carbondata
PARTITIONED BY (dt string, hr string)
{code}

{code:sql}
INSERT INTO lior_carbon_tests.mark_for_del_bug
select '2021-07-07T13:23:56.012+00:00','spark','2021-07-07','13'
{code}

{code:sql}
select * from lior_carbon_tests.mark_for_del_bug
{code}
gives:
{code:java}
+---------+----+---+---+
|timestamp|name| dt| hr|
+---------+----+---+---+
+---------+----+---+---+
{code}

And
{code:sql}
show segments for TABLE lior_carbon_tests.mark_for_del_bug
{code}
gives:
{code:java}
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|ID |Status           |Load Start Time        |Load Time Taken|Partition|Data Size|Index Size|File Format|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
|0  |Marked for Delete|2021-09-02 15:24:21.022|11.798S        |NA       |NA       |NA        |columnar_v3|
+---+-----------------+-----------------------+---------------+---------+---------+----------+-----------+
{code}

I took a look at the folder structure in S3 and it seems fine.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)