[jira] [Updated] (HUDI-2390) Create table by hudisql,write data into table by datasource,hudi delete cmd can not delete data

Raymond Xu (Jira) Sun, 19 Sep 2021 21:51:10 -0700


     [ 
https://issues.apache.org/jira/browse/HUDI-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Raymond Xu updated HUDI-2390:
-----------------------------
    Issue Type: Improvement  (was: Bug)

> Create table by hudisql,write data into table by datasource,hudi delete cmd 
> can not delete data
> -----------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2390
>                 URL: https://issues.apache.org/jira/browse/HUDI-2390
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: Spark Integration
>    Affects Versions: 0.9.0
>            Reporter: renhao
>            Priority: Minor
>              Labels: features
>
> Test Case:
> {code:java}
>  import org.apache.hudi.QuickstartUtils._
>  import scala.collection.JavaConversions._
>  import org.apache.spark.sql.SaveMode._
>  import org.apache.hudi.DataSourceReadOptions._
>  import org.apache.hudi.DataSourceWriteOptions._
>  import org.apache.hudi.config.HoodieWriteConfig._{code}
> 1. Prepare the data
>  
> {code:java}
> spark.sql("create table test1(a int,b string,c string) using hudi partitioned by(b) options(primaryKey='a')")
> spark.sql("insert into table test1 select 1,2,3")
> {code}
>  
> 2. Create the Hudi table test2
> {code:java}
> spark.sql("create table test2(a int,b string,c string) using hudi partitioned by(b) options(primaryKey='a')"){code}
> 3. Write data into test2 through the datasource
>  
> {code:java}
> val base_data=spark.sql("select * from testdb.test1")
> base_data.write.format("hudi").
>   option(TABLE_TYPE_OPT_KEY, COW_TABLE_TYPE_OPT_VAL).
>   option(RECORDKEY_FIELD_OPT_KEY, "a").
>   option(PARTITIONPATH_FIELD_OPT_KEY, "b").
>   option(KEYGENERATOR_CLASS_OPT_KEY, "org.apache.hudi.keygen.SimpleKeyGenerator").
>   option(OPERATION_OPT_KEY, "bulk_insert").
>   option(HIVE_SYNC_ENABLED_OPT_KEY, "true").
>   option(HIVE_PARTITION_FIELDS_OPT_KEY, "b").
>   option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, "org.apache.hudi.hive.MultiPartKeysValueExtractor").
>   option(HIVE_DATABASE_OPT_KEY, "testdb").
>   option(HIVE_TABLE_OPT_KEY, "test2").
>   option(HIVE_USE_JDBC_OPT_KEY, "true").
>   option("hoodie.bulkinsert.shuffle.parallelism", 4).
>   option("hoodie.datasource.write.hive_style_partitioning", "true").
>   option(TABLE_NAME, "test2").
>   mode(Append).
>   save(s"/user/hive/warehouse/testdb.db/test2")
> {code}
>  
> At this point, a query returns the following result:
> {code:java}
> +---+---+---+
> |  a|  b|  c|
> +---+---+---+
> |  1|  3|  2|
> +---+---+---+{code}
> 4. Delete one record
> {code:java}
> spark.sql("delete from testdb.test2 where a=1"){code}
> 5. Run the query; the record with a=1 has not been deleted
> {code:java}
> spark.sql("select a,b,c from testdb.test2").show{code}
> {code:java}
> +---+---+---+
> |  a|  b|  c|
> +---+---+---+
> |  1|  3|  2|
> +---+---+---+{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
