[ https://issues.apache.org/jira/browse/SPARK-40521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Serge Rielau updated SPARK-40521:
---------------------------------
    Description: 
PartitionsAlreadyExistException in the Hive V1 command path reports all requested partitions instead of only the conflicting partition.

When I run the test "partition already exists" in AlterTableAddPartitionSuiteBase for Hive, it fails in my local build ONLY in that mode: the exception reports two partitions as conflicting where there should be only one. In all other modes the test succeeds.
The test passes on master only because it does not check the reported partitions themselves; a stricter check is sketched below.
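
For reference, a stricter assertion might look like this (a sketch only; `withTable`, `sql`, and `intercept` come from the standard Spark test traits, and matching on the message text is my assumption, not the suite's actual checking API):

{code:scala}
import org.apache.spark.sql.catalyst.analysis.PartitionsAlreadyExistException

// Sketch: assert the exception names only the partition that truly conflicts.
test("partition already exists reports only the conflicting partition") {
  withTable("t") {
    sql("create table t(c1 int, c2 int) partitioned by (c1)")
    sql("alter table t add partition (c1 = 2)")

    val e = intercept[PartitionsAlreadyExistException] {
      sql("alter table t add partition (c1 = 1) partition (c1 = 2)")
    }
    // Only c1 = 2 pre-exists, so only it should be reported.
    assert(e.getMessage.contains("Map(c1 -> 2)"))
    assert(!e.getMessage.contains("Map(c1 -> 1)"))
  }
}
{code}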

Repro on master (note that the partition c1 = 1 does not already exist, so it should NOT be listed):

create table t(c1 int, c2 int) partitioned by (c1);

alter table t add partition (c1 = 2);

alter table t add partition (c1 = 1) partition (c1 = 2);

22/09/21 09:30:09 ERROR Hive: AlreadyExistsException(message:Partition already exists: Partition(values:[2], dbName:default, tableName:t, createTime:0, lastAccessTime:0, sd:StorageDescriptor(cols:[FieldSchema(name:c2, type:int, comment:null)], location:file:/Users/serge.rielau/spark/spark-warehouse/t/c1=2, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), parameters:null))
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.startAddPartition(HiveMetaStore.java:2744)
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:2442)
 at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:2560)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
 at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
 at com.sun.proxy.$Proxy31.add_partitions_req(Unknown Source)
 at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:625)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
 at com.sun.proxy.$Proxy32.add_partitions(Unknown Source)
 at org.apache.hadoop.hive.ql.metadata.Hive.createPartitions(Hive.java:2103)
 at org.apache.spark.sql.hive.client.Shim_v0_13.createPartitions(HiveShim.scala:763)
 at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createPartitions$1(HiveClientImpl.scala:631)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:296)
 at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:227)
 at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:226)
 at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:276)
 at org.apache.spark.sql.hive.client.HiveClientImpl.createPartitions(HiveClientImpl.scala:624)
 at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createPartitions$1(HiveExternalCatalog.scala:1039)
 at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
 at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
 at org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:1021)
 at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createPartitions(ExternalCatalogWithListener.scala:201)
 at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:1169)
 at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.$anonfun$run$17(ddl.scala:514)
 at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.$anonfun$run$17$adapted(ddl.scala:513)
 at scala.collection.Iterator.foreach(Iterator.scala:943)
 at scala.collection.Iterator.foreach$(Iterator.scala:943)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
 at org.apache.spark.sql.execution.command.AlterTableAddPartitionCommand.run(ddl.scala:513)
 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
 at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
 at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
 at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
 at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:111)
 at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:171)
 at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
 at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
 at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
 at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
 at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
 at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
 at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
 at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
 at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
 at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
 at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
 at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
 at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
 at org.apache.spark.sql.Dataset.<init>(Dataset.scala:219)
...

 

*The following partitions already exists in table 't' database 'default':*
{color:#de350b}*Map(c1 -> 1)*{color}
{color:#de350b}*===*{color}
*Map(c1 -> 2)*

spark-sql>
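
If the bug is where it appears to be, the Hive V1 path rewraps Hive's AlreadyExistsException with every spec from the ADD PARTITION statement instead of re-checking which ones actually exist. One possible shape for a fix (a sketch under that assumption, not the actual patch; the helper name is hypothetical):

{code:scala}
import org.apache.spark.sql.catalyst.analysis.PartitionsAlreadyExistException
import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec
import org.apache.spark.sql.catalyst.catalog.ExternalCatalog

// Hypothetical helper: when the metastore reports "already exists", re-check
// each requested spec and raise the exception with only the specs that are
// really present, instead of echoing back the whole ADD PARTITION list.
def throwOnlyConflicting(
    catalog: ExternalCatalog,
    db: String,
    table: String,
    requested: Seq[TablePartitionSpec]): Nothing = {
  val conflicting = requested.filter { spec =>
    catalog.getPartitionOption(db, table, spec).isDefined
  }
  throw new PartitionsAlreadyExistException(db, table, conflicting)
}
{code}

With the repro above, this would report only Map(c1 -> 2), matching what the other catalog/command modes already do.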


> PartitionsAlreadyExistException in Hive V1 Command V1 reports all partitions 
> instead of the conflicting partition
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-40521
>                 URL: https://issues.apache.org/jira/browse/SPARK-40521
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 3.4.0
>            Reporter: Serge Rielau
>            Priority: Minor


