Hi Yixu,

Can you please share the DDLs and the data for this problem with us?
Thanks,
Sounak

On Wed, Oct 18, 2017 at 12:44 PM, yixu2001 <yixu2...@163.com> wrote:
> dev
>
> In CarbonData version 1.2.0, an "update" statement with a sub-query fails.
> No rows in either of the two tables are duplicated, and the same statement
> succeeds in CarbonData version 1.1.1.
>
> The test log is as follows:
>
> scala> cc.sql("select count(*), count(distinct id) from qqdata2.c_indextest1").show(100,false);
> +--------+------------------+
> |count(1)|count(DISTINCT id)|
> +--------+------------------+
> |300000  |300000            |
> +--------+------------------+
>
> scala> cc.sql("select count(*), count(distinct id) from qqdata2.c_indextest2").show(100,false);
> +--------+------------------+
> |count(1)|count(DISTINCT id)|
> +--------+------------------+
> |71223220|71223220          |
> +--------+------------------+
>
> scala> cc.sql("update qqdata2.c_indextest2 a set(a.CUST_ORDER_ID,a.ORDER_ITEM_IDATTR_ID,a.ATTR_VALUE_IDATTR_VALUE,a.CREATE_DATE,a.UPDATE_DATE,a.STATUS_CD,a.STATUS_DATE,a.AREA_ID,a.REGION_CD,a.UPDATE_STAFF,a.CREATE_STAFF,a.SHARDING_ID,a.ORDER_ATTR_ID) = (select b.CUST_ORDER_ID,b.ORDER_ITEM_IDATTR_ID,b.ATTR_VALUE_IDATTR_VALUE,b.CREATE_DATE,b.UPDATE_DATE,b.STATUS_CD,b.STATUS_DATE,b.AREA_ID,b.REGION_CD,b.UPDATE_STAFF,b.CREATE_STAFF,b.SHARDING_ID,b.ORDER_ATTR_ID from qqdata2.c_indextest1 b where a.id = b.id)").show(100,false);
> 17/10/18 11:32:46 WARN Utils: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.debug.maxToStringFields' in SparkEnv.conf.
> 17/10/18 11:33:20 AUDIT deleteExecution$: [hdp84.ffcs.cn][bigdata][Thread-1]Delete data operation is failed for qqdata2.c_indextest2
> 17/10/18 11:33:20 ERROR deleteExecution$: main Delete data operation is failed due to failure in creating delete delta file for segment : null block : null
> 17/10/18 11:33:20 ERROR ProjectForUpdateCommand$: main Exception in update operation java.lang.Exception: Multiple input rows matched for same row.
> java.lang.RuntimeException: Update operation failed. Multiple input rows matched for same row.
>         at scala.sys.package$.error(package.scala:27)
>         at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.processData(IUDCommands.scala:239)
>         at org.apache.spark.sql.execution.command.ProjectForUpdateCommand.run(IUDCommands.scala:141)
>         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>         at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>         at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeTake(commands.scala:71)
>         at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
>         at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
>         at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>         at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
>         at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
>         at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
>         at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
>         at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
>         at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
>         at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
>         at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
>         at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
>         at org.apache.spark.sql.Dataset.show(Dataset.scala:640)
>         ... 50 elided
>
>
> yixu2001

--
Thanks
Sounak
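
Judging from the stack trace, the "Multiple input rows matched for same row" error appears to be raised by ProjectForUpdateCommand when the implicit join between the target and source tables yields more than one source row for a single target row. One diagnostic worth running in the same session (a sketch only, assuming the same `cc` context and tables as above) is to count matches per join key directly:

scala> cc.sql("select a.id, count(*) as matches from qqdata2.c_indextest2 a join qqdata2.c_indextest1 b on a.id = b.id group by a.id having count(*) > 1").show(100,false);

If this returns no rows, which the distinct-count checks above suggest it should, then the duplicate-match detection in 1.2.0 would be firing on genuinely unique keys, pointing to a regression rather than a data problem.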