[jira] [Commented] (SPARK-33349) ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed

2022-06-07 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17550889#comment-17550889
 ] 

CHC commented on SPARK-33349:
-

Also met this problem on Spark 3.2.1 with `kubernetes-client 5.4.1`.

After the "too old resource version" exception, no more logs are produced and the
app hangs.
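
For context, a minimal sketch of the restart-on-HTTP-410 idea (an illustration only, not Spark's actual fix; it uses the fabric8 4.x `Watcher` API visible in the stack trace below, while in client 5.x `onClose` takes a `WatcherException` instead):
{code:scala}
import java.net.HttpURLConnection

import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientException, Watch, Watcher}
import io.fabric8.kubernetes.client.Watcher.Action

// Hypothetical helper: when the API server reports "too old resource
// version" (HTTP 410 Gone), re-create the watch instead of leaving the
// driver without an executor-pod watcher.
class RestartingPodWatcher(client: KubernetesClient, namespace: String, appId: String) {
  @volatile private var watch: Watch = _

  def start(): Unit = {
    watch = client.pods()
      .inNamespace(namespace)
      .withLabel("spark-app-selector", appId)
      .watch(new Watcher[Pod] {
        override def eventReceived(action: Action, pod: Pod): Unit = {
          // forward the event to the executor pod snapshot store
        }
        override def onClose(cause: KubernetesClientException): Unit = {
          if (cause != null && cause.getCode == HttpURLConnection.HTTP_GONE) {
            start() // resource version expired: restart with a fresh watch
          }
        }
      })
  }

  def stop(): Unit = if (watch != null) watch.close()
}
{code}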

> ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed
> --
>
> Key: SPARK-33349
> URL: https://issues.apache.org/jira/browse/SPARK-33349
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.0.1, 3.0.2, 3.1.0
>Reporter: Nicola Bova
>Priority: Critical
>
> I launch my spark application with the 
> [spark-on-kubernetes-operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator]
>  with the following yaml file:
> {code:yaml}
> apiVersion: sparkoperator.k8s.io/v1beta2
> kind: SparkApplication
> metadata:
>   name: spark-kafka-streamer-test
>   namespace: kafka2hdfs
> spec:
>   type: Scala
>   mode: cluster
>   image: /spark:3.0.2-SNAPSHOT-2.12-0.1.0
>   imagePullPolicy: Always
>   timeToLiveSeconds: 259200
>   mainClass: path.to.my.class.KafkaStreamer
>   mainApplicationFile: spark-kafka-streamer_2.12-spark300-assembly.jar
>   sparkVersion: 3.0.1
>   restartPolicy:
>     type: Always
>   sparkConf:
>     "spark.kafka.consumer.cache.capacity": "8192"
>     "spark.kubernetes.memoryOverheadFactor": "0.3"
>   deps:
>     jars:
>       - my
>       - jar
>       - list
>   hadoopConfigMap: hdfs-config
>   driver:
>     cores: 4
>     memory: 12g
>     labels:
>       version: 3.0.1
>     serviceAccount: default
>     javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
>   executor:
>     instances: 4
>     cores: 4
>     memory: 16g
>     labels:
>       version: 3.0.1
>     javaOptions: "-Dlog4j.configuration=file:///opt/spark/log4j/log4j.properties"
> {code}
>  I have tried with both Spark `3.0.1` and `3.0.2-SNAPSHOT` with the ["Restart 
> the watcher when we receive a version changed from 
> k8s"|https://github.com/apache/spark/pull/29533] patch.
> This is the driver log:
> {code}
> 20/11/04 12:16:02 WARN NativeCodeLoader: Unable to load native-hadoop library 
> for your platform... using builtin-java classes where applicable
> ... // my app log, it's a structured streaming app reading from kafka and 
> writing to hdfs
> 20/11/04 13:12:12 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
> been closed (this is expected if the application is shutting down.)
> io.fabric8.kubernetes.client.KubernetesClientException: too old resource 
> version: 1574101276 (1574213896)
>  at 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$1.onMessage(WatchConnectionManager.java:259)
>  at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:323)
>  at 
> okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
>  at 
> okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
>  at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:274)
>  at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:214)
>  at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203)
>  at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> {code}
> The error above appears after roughly 50 minutes.
> After the exception above, no more logs are produced and the app hangs.






[jira] [Commented] (SPARK-31675) Fail to insert data into a table with a remote location, caused by the Hive encryption check

2022-04-05 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517237#comment-17517237
 ] 

CHC commented on SPARK-31675:
-

Met the same problem; the SQL to reproduce it is shown below:
{code:sql}
CREATE TABLE tmp.spark3_snap(`id` string) PARTITIONED BY (`dt` string)
STORED AS ORC LOCATION 'hdfs://path/to/spark3_snap';

-- The file system of the partition location is different from the file system 
of the table location,
-- one is S3A, the other is HDFS
alter table tmp.spark3_snap add partition (dt='2020-09-10') 
LOCATION 's3a://path/to/spark3_snap/dt=2020-09-10';

insert overwrite table tmp.spark3_snap partition(dt)
select '10' id, '2020-09-09' dt
union
select '20' id, '2020-09-10' dt
;
{code}
And we will get an exception:
{code:none}
java.lang.IllegalArgumentException: Wrong FS: 
s3a://path/to/spark3_snap/dt=2020-09-10, expected: hdfs://cluster1
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:666)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:214)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:816)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:812)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:823)
at 
org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.$anonfun$commitJob$6(HadoopMapReduceCommitProtocol.scala:194)
at 
org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.$anonfun$commitJob$6$adapted(HadoopMapReduceCommitProtocol.scala:194)
at scala.collection.immutable.Set$Set1.foreach(Set.scala:141)
at 
org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(HadoopMapReduceCommitProtocol.scala:194)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$20(FileFormatWriter.scala:240)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:605)
at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:240)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:187)
at ..
{code}
I will submit a PR later to fix renaming and deleting files across different
filesystems in `HadoopMapReduceCommitProtocol`.
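
The rough direction of the fix, as a sketch only (not the actual PR): resolve the FileSystem from each path instead of reusing the table's default FileSystem.
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Sketch: commit-time cleanup that works across filesystems by asking each
// Path for its own FileSystem (HDFS, S3A, ...) instead of issuing every
// delete through the default hdfs:// FileSystem, which fails the "Wrong FS"
// check for s3a:// partition locations.
def deleteAcrossFilesystems(paths: Seq[Path], conf: Configuration): Unit = {
  paths.foreach { p =>
    val fs = p.getFileSystem(conf) // resolved from the path's URI scheme
    fs.delete(p, true)             // recursive delete on the right filesystem
  }
}
{code}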

> Fail to insert data into a table with a remote location, caused by the Hive
> encryption check
> -
>
> Key: SPARK-31675
> URL: https://issues.apache.org/jira/browse/SPARK-31675
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.6, 3.0.0, 3.1.0
>Reporter: Kent Yao
>Priority: Major
>
> Before this fix https://issues.apache.org/jira/browse/HIVE-14380 in Hive
> 2.2.0, when moving files from the staging dir to the final table dir, Hive
> does an encryption check on the srcPaths and destPaths:
> {code:java}
> // Some comments here
>  if (!isSrcLocal) {
> // For NOT local src file, rename the file
> if (hdfsEncryptionShim != null && 
> (hdfsEncryptionShim.isPathEncrypted(srcf) || 
> hdfsEncryptionShim.isPathEncrypted(destf))
> && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf))
> {
>   LOG.info("Copying source " + srcf + " to " + destf + " because HDFS 
> encryption zones are different.");
>   success = FileUtils.copy(srcf.getFileSystem(conf), srcf, 
> destf.getFileSystem(conf), destf,
>   true,// delete source
>   replace, // overwrite destination
>   conf);
> } else {
> {code}
> The hdfsEncryptionShim instance holds a global FileSystem instance belonging
> to the default filesystem. This causes failures when checking a path that
> belongs to a remote filesystem.
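> A minimal Scala illustration of this failure mode (assuming fs.defaultFS is
> hdfs://cluster1):
> {code:scala}
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
>
> // The shim caches the default FileSystem once...
> val conf = new Configuration()       // fs.defaultFS = hdfs://cluster1 (assumed)
> val defaultFs = FileSystem.get(conf) // DistributedFileSystem
>
> // ...and later checks a path from a different filesystem against it:
> val remote = new Path("s3a://path/to/spark3_snap/dt=2020-09-10")
> defaultFs.exists(remote) // java.lang.IllegalArgumentException: Wrong FS
> {code}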
> For example, I have a table whose location is on a remote cluster:
> {code:sql}
> key   int NULL
> # Detailed Table Information
> Database  bdms_hzyaoqin_test_2
> Table abc
> Owner bdms_hzyaoqin
> Created Time  Mon May 11 15:14:15 CST 2020
> Last Access   Thu Jan 01 08:00:00 CST 1970
> Created BySpark 2.4.3
> Type  MANAGED
> Provider  hive
> Table Properties  [transient_lastDdlTime=1589181255]
> Location  hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc
> Serde Library org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> InputFormat   org.apache.hadoop.mapred.TextInputFormat
> OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> Storage Properties[serialization.format=1]
> Partition ProviderCatalog
> Time taken: 0.224 

[jira] [Comment Edited] (SPARK-32838) Cannot insert overwrite a different partition of the same table

2022-03-25 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194644#comment-17194644
 ] 

CHC edited comment on SPARK-32838 at 3/25/22, 10:02 AM:


After spending a long time exploring, I found that
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
converts HiveTableRelation to LogicalRelation, which then matches the case
condition in
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228].

(In Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the
table is partitioned; so for a non-partitioned table, insert overwrite into
itself also fails there.)

This works when:
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

When `spark.sql.hive.convertInsertingPartitionedTable` is set to false, you
will hit [SPARK-33144|https://issues.apache.org/jira/browse/SPARK-33144].
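
For reference, the same workaround from a Spark shell (illustrative only; assumes a Hive-enabled session named `spark`):
{code:scala}
// Disable the HiveTableRelation -> LogicalRelation conversion for inserts,
// then overwrite one partition while reading another.
spark.conf.set("spark.sql.hive.convertInsertingPartitionedTable", "false")
spark.sql(
  """insert overwrite table tmp.spark3_snap partition (dt='2020-09-10')
    |select id from tmp.spark3_snap where dt='2020-09-09'""".stripMargin)
{code}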


was (Author: chenxchen):
After spending a long time exploring, I found that
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
converts HiveTableRelation to LogicalRelation, which then matches the case
condition in
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228].

(In Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the
table is partitioned; so for a non-partitioned table, insert overwrite into
itself also fails there.)

This works when:
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

when

> Cannot insert overwrite a different partition of the same table
> --
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Major
>
> When:
> {code:sql}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 and does not work on Spark 3.0.0.
>   






[jira] [Comment Edited] (SPARK-32838) Cannot insert overwrite a different partition of the same table

2022-03-25 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194644#comment-17194644
 ] 

CHC edited comment on SPARK-32838 at 3/25/22, 10:02 AM:


After spending a long time exploring, I found that
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
converts HiveTableRelation to LogicalRelation, which then matches the case
condition in
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228].

(In Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the
table is partitioned; so for a non-partitioned table, insert overwrite into
itself also fails there.)

This works when:
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

Also, when we set `spark.sql.hive.convertInsertingPartitionedTable=false`, we
will hit SPARK-33144.


was (Author: chenxchen):
After spending a long time exploring, I found that
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
converts HiveTableRelation to LogicalRelation, which then matches the case
condition in
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228].

(In Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the
table is partitioned; so for a non-partitioned table, insert overwrite into
itself also fails there.)

This works when:
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

When `spark.sql.hive.convertInsertingPartitionedTable` is set to false, you
will hit [SPARK-33144|https://issues.apache.org/jira/browse/SPARK-33144].

> Cannot insert overwrite a different partition of the same table
> --
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Major
>
> When:
> {code:sql}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 and does not work on Spark 3.0.0.
>   






[jira] [Comment Edited] (SPARK-32838) Cannot insert overwrite a different partition of the same table

2022-03-25 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194644#comment-17194644
 ] 

CHC edited comment on SPARK-32838 at 3/25/22, 10:01 AM:


After spending a long time exploring, I found that
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
converts HiveTableRelation to LogicalRelation, which then matches the case
condition in
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228].

(In Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the
table is partitioned; so for a non-partitioned table, insert overwrite into
itself also fails there.)

This works when:
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

when


was (Author: chenxchen):
After spending a long time exploring, I found that
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
converts HiveTableRelation to LogicalRelation, which then matches the case
condition in
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228].

(In Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the
table is partitioned; so for a non-partitioned table, insert overwrite into
itself also fails there.)

This works when:
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

> Cannot insert overwrite a different partition of the same table
> --
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Major
>
> When:
> {code:sql}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 and does not work on Spark 3.0.0.
>   






[jira] [Comment Edited] (SPARK-33144) Cannot insert overwrite multiple partitions, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512259#comment-17512259
 ] 

CHC edited comment on SPARK-33144 at 3/25/22, 9:51 AM:
---

Also met this on Spark 3.2.1 when setting
`spark.sql.hive.convertInsertingPartitionedTable=false`:
{code:sql}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}
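
For context, an illustrative Scala sketch (not Hive's actual code) of why the leftover `_temporary/0` staging directory yields an empty partSpec:
{code:scala}
// Hive derives the dynamic-partition spec from the "key=value" segments of
// each directory under the staging path. A leftover _temporary/0 directory
// has no such segments, so the spec comes back empty and loadPartition
// fails with "Value for key name is null or empty".
def partSpecFromPath(dir: String): Map[String, String] =
  dir.split("/").iterator
    .map(_.split("=", 2))
    .collect { case Array(k, v) => k -> v }
    .toMap

partSpecFromPath("name=spark/version=3.0.1") // Map(name -> spark, version -> 3.0.1)
partSpecFromPath("_temporary/0")             // Map() -> triggers the exception
{code}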


was (Author: chenxchen):
Also met this on Spark 3.2.1 when setting
spark.sql.hive.convertInsertingPartitionedTable=false:
{code:sql}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}

> Cannot insert overwrite multiple partitions, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.2.1
> Environment: hadoop 2.7.3 + spark 3.0.1
> hadoop 2.7.3 + spark 3.2.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get the exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> 

[jira] [Comment Edited] (SPARK-33144) Cannot insert overwrite multiple partitions, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512259#comment-17512259
 ] 

CHC edited comment on SPARK-33144 at 3/25/22, 9:51 AM:
---

Also met this on Spark 3.2.1 when setting
spark.sql.hive.convertInsertingPartitionedTable=false:
{code:sql}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}


was (Author: chenxchen):
Also met this on Spark 3.2.1:
{code:sql}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}

> Cannot insert overwrite multiple partitions, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.2.1
> Environment: hadoop 2.7.3 + spark 3.0.1
> hadoop 2.7.3 + spark 3.2.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get the exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 

[jira] [Updated] (SPARK-33144) Cannot insert overwrite multiple partitions, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-33144:

Priority: Major  (was: Critical)

> Cannot insert overwrite multiple partitions, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.2.1
> Environment: hadoop 2.7.3 + spark 3.0.1
> hadoop 2.7.3 + spark 3.2.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get the exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hive/version=2.3.4/part-2-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> 

[jira] [Comment Edited] (SPARK-33144) Cannot insert overwrite multiple partitions, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512259#comment-17512259
 ] 

CHC edited comment on SPARK-33144 at 3/25/22, 9:45 AM:
---

Also met this on Spark 3.2.1:
{code:sql}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}


was (Author: chenxchen):
Also met this on Spark 3.2.1:
{code:sql}
set hive.exec.dynamic.partition.mode=nonstrict;
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}

> Cannot insert overwrite multiple partitions, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.2.1
> Environment: hadoop 2.7.3 + spark 3.0.1
> hadoop 2.7.3 + spark 3.2.1
>Reporter: CHC
>Priority: Critical
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get the exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 

[jira] [Comment Edited] (SPARK-33144) Cannot insert overwrite multiple partitions, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512259#comment-17512259
 ] 

CHC edited comment on SPARK-33144 at 3/25/22, 8:50 AM:
---

Also met this on Spark 3.2.1:
{code:sql}
set hive.exec.dynamic.partition.mode=nonstrict;
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}


was (Author: chenxchen):
Also met this on Spark 3.2.1:
{code:sql}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}

> Cannot insert overwrite multiple partitions, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.2.1
> Environment: hadoop 2.7.3 + spark 3.0.1
> hadoop 2.7.3 + spark 3.2.1
>Reporter: CHC
>Priority: Critical
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get the exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 

[jira] [Updated] (SPARK-33144) Cannot insert overwrite multiple partitions, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-33144:

Environment: 
hadoop 2.7.3 + spark 3.0.1
hadoop 2.7.3 + spark 3.2.1

  was:hadoop 2.7.3 + spark 3.0.1


> Cannot insert overwrite multiple partitions, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.2.1
> Environment: hadoop 2.7.3 + spark 3.0.1
> hadoop 2.7.3 + spark 3.2.1
>Reporter: CHC
>Priority: Critical
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get the exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 

[jira] [Updated] (SPARK-33144) Cannot insert overwrite multiple partitions, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-33144:

Priority: Critical  (was: Major)

> Cannot insert overwrite multiple partitions, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.2.1
> Environment: hadoop 2.7.3 + spark 3.0.1
>Reporter: CHC
>Priority: Critical
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get the exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hive/version=2.3.4/part-2-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> 

[jira] [Updated] (SPARK-33144) Cannot insert overwrite multiple partitions, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-33144:

Affects Version/s: 3.2.1

> Cannot insert overwrite multiple partitions, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1, 3.2.1
> Environment: hadoop 2.7.3 + spark 3.0.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get the exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hive/version=2.3.4/part-2-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> 

[jira] [Comment Edited] (SPARK-33144) Connot insert overwite multiple partition, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512259#comment-17512259
 ] 

CHC edited comment on SPARK-33144 at 3/25/22, 8:47 AM:
---

Also met this on Spark 3.2.1:
{code:sql}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}


was (Author: chenxchen):
Also met this on Spark 3.2.1:
{code:java}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}

> Connot insert overwite multiple partition, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: hadoop 2.7.3 + spark 3.0.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> 

[jira] [Commented] (SPARK-33144) Connot insert overwite multiple partition, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512259#comment-17512259
 ] 

CHC commented on SPARK-33144:
-

Also met this on Spark 3.2.1:
{code:java}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

insert overwrite table tmp.spark_multi_partition partition (name, version)
select
1 as id
, 'hadoop' as name
, '2.7.3' as version
;{code}
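
A comment later in this thread (since deleted) tied this failure to 
`spark.sql.hive.convertInsertingPartitionedTable=false`. As a minimal sketch of 
the implied workaround, assuming the default native insert path is acceptable 
in your environment:
{code:sql}
-- Keep Spark's native writer enabled (true is already the default), so the
-- Hive loadDynamicPartitions path that raises "Value for key name is null or
-- empty" is never taken.
set spark.sql.hive.convertInsertingPartitionedTable=true;

insert overwrite table tmp.spark_multi_partition partition (name, version)
select 1 as id, 'hadoop' as name, '2.7.3' as version;
{code}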

> Connot insert overwite multiple partition, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: hadoop 2.7.3 + spark 3.0.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 

[jira] [Issue Comment Deleted] (SPARK-33144) Connot insert overwite multiple partition, get exception "get partition: Value for key name is null or empty"

2022-03-25 Thread CHC (Jira)


[ https://issues.apache.org/jira/browse/SPARK-33144 ]


CHC deleted comment on SPARK-33144:
-

was (Author: chenxchen):
I ran into SPARK-32838 and changed this configuration:
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
{code}
After changing it, inserting into multiple partitions fails with this exception.
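
For reference, `SET` with a bare key echoes a session's current value, which is 
a quick way to confirm which insert path is in use (a sketch):
{code:sql}
-- Echoes the current value; per this thread, false routes the insert through
-- Hive's dynamic-partition load path, where the failure quoted below occurs.
set spark.sql.hive.convertInsertingPartitionedTable;
{code}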
  
 

> Connot insert overwite multiple partition, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: hadoop 2.7.3 + spark 3.0.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 

[jira] [Commented] (SPARK-32432) Add support for reading ORC/Parquet files with SymlinkTextInputFormat

2022-03-04 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501427#comment-17501427
 ] 

CHC commented on SPARK-32432:
-

In the following example, analyzing the table reports its size as 100 (the 
size of the manifest file) instead of 1, which affects join optimization: 
Spark will attempt to broadcast a large table whenever the `manifest` file 
size is below `spark.sql.autoBroadcastJoinThreshold`. After this PR, analyzing 
the table reports its size as 1.


We have these files:
{code}
size   filepath
100    hdfs:///path/to/table/manifest
   hdfs:///path/to/other/part-1.parquet.orc
1  hdfs:///path/to/other/part-2.parquet.orc
{code}

Content of `hdfs:///path/to/table/manifest`:
{code}
hdfs:///path/to/other/part-1.parquet.orc
hdfs:///path/to/other/part-2.parquet.orc
{code}

Table DDL:
{code:sql}
CREATE EXTERNAL TABLE symlink_orc ( name STRING, version DOUBLE, sort INT )
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.SymlinkTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 'hdfs:///path/to/table';
{code}
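
As a sketch of how those numbers reach the planner (the commands are standard 
Spark SQL; the 10 MB figure is the stock default for this threshold, assumed 
here):
{code:sql}
-- Recompute and inspect the statistics the comment refers to; with
-- SymlinkTextInputFormat the reported size comes from the manifest file
-- rather than the ORC files it lists.
ANALYZE TABLE symlink_orc COMPUTE STATISTICS;
DESCRIBE TABLE EXTENDED symlink_orc;

-- A broadcast join is attempted whenever the reported size falls below this
-- threshold, which is how the misreported size triggers a large-table
-- broadcast.
set spark.sql.autoBroadcastJoinThreshold=10485760;
{code}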

> Add support for reading ORC/Parquet files with SymlinkTextInputFormat
> -
>
> Key: SPARK-32432
> URL: https://issues.apache.org/jira/browse/SPARK-32432
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Noritaka Sekiyama
>Priority: Major
>
> Hive style symlink (SymlinkTextInputFormat) is commonly used in different 
> analytic engines including prestodb and prestosql.
> Currently SymlinkTextInputFormat works with JSON/CSV files but does not work 
> with ORC/Parquet files in Apache Spark (and Apache Hive).
> On the other hand, prestodb and prestosql support SymlinkTextInputFormat with 
> ORC/Parquet files.
> This issue is to add support for reading ORC/Parquet files with 
> SymlinkTextInputFormat in Apache Spark.
>  
> Related links
>  * Hive's SymlinkTextInputFormat: 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java]
>  * prestosql's implementation to add support for reading avro files with 
> SymlinkTextInputFormat: 
> [https://github.com/vincentpoon/prestosql/blob/master/presto-hive/src/main/java/io/prestosql/plugin/hive/BackgroundHiveSplitLoader.java]






[jira] [Updated] (SPARK-32838) Connot insert overwite different partition with same table

2021-05-14 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-32838:

Summary: Connot insert overwite different partition with same table  (was: 
Connot overwite different partition with same table)

> Connot insert overwite different partition with same table
> --
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Major
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 but does not work on Spark 3.0.0
>   






[jira] [Comment Edited] (SPARK-33144) Connot insert overwite multiple partition, get exception "get partition: Value for key name is null or empty"

2020-10-14 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213854#comment-17213854
 ] 

CHC edited comment on SPARK-33144 at 10/14/20, 12:04 PM:
-

I ran into SPARK-32838 and changed this configuration:
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
{code}
After changing it, inserting into multiple partitions fails with this exception.
  
 


was (Author: chenxchen):
{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
{code}

 leads to this problem
 

> Connot insert overwite multiple partition, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: hadoop 2.7.3 + spark 3.0.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> 

[jira] [Issue Comment Deleted] (SPARK-33144) Connot insert overwite multiple partition, get exception "get partition: Value for key name is null or empty"

2020-10-14 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-33144:

Comment: was deleted

(was: set spark.sql.hive.convertInsertingPartitionedTable=false;
leads to this problem)

> Connot insert overwite multiple partition, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: hadoop 2.7.3 + spark 3.0.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 

[jira] [Commented] (SPARK-33144) Connot insert overwite multiple partition, get exception "get partition: Value for key name is null or empty"

2020-10-14 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213854#comment-17213854
 ] 

CHC commented on SPARK-33144:
-

{code:sql}
set spark.sql.hive.convertInsertingPartitionedTable=false;
{code}

 leads to this problem
 

> Connot insert overwite multiple partition, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: hadoop 2.7.3 + spark 3.0.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 

[jira] [Commented] (SPARK-33144) Connot insert overwite multiple partition, get exception "get partition: Value for key name is null or empty"

2020-10-14 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213853#comment-17213853
 ] 

CHC commented on SPARK-33144:
-

set spark.sql.hive.convertInsertingPartitionedTable=false;
leads to this problem

> Connot insert overwite multiple partition, get exception "get partition: 
> Value for key name is null or empty"
> -
>
> Key: SPARK-33144
> URL: https://issues.apache.org/jira/browse/SPARK-33144
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
> Environment: hadoop 2.7.3 + spark 3.0.1
>Reporter: CHC
>Priority: Major
>
> When: 
> {code:sql}
> create table tmp.spark_multi_partition(
> id int
> )
> partitioned by (name string, version string)
> stored as orc
> ;
> set hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition=true;
>  
> set hive.exec.dynamic.partition.mode=nonstrict;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> insert overwrite table tmp.spark_multi_partition partition (name, version)
> select
> *
> from (
>   select
>   1 as id
>   , 'hadoop' as name
>   , '2.7.3' as version
>   union
>   select
>   2 as id
>   , 'spark' as name
>   , '3.0.1' as version
>   union
>   select
>   3 as id
>   , 'hive' as name
>   , '2.3.4' as version
> ) as A;
> {code}
> and get exception:
> {code:bash}
> INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path 
> = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
>  with partSpec {name=spark, version=3.0.1}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-1 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
>  with partSpec {name=hadoop, version=2.7.3}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-2 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
>  with partSpec {name=hive, version=2.3.4}
> 20/10/14 09:15:33 INFO load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1919]: New loading path = 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
>  with partSpec {name=, version=}
> 20/10/14 09:15:33 ERROR load-dynamic-partitions-3 
> [hive.ql.metadata.Hive:1937]: Exception when loading partition with 
> parameters  
> partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
>   table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
> listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
> org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for 
> key name is null or empty
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
>   at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
>   at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
>   at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
>  to trash at: 
> hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
> 20/10/14 09:15:33 INFO load-dynamic-partitions-0 
> [org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it 
> doesn't exist: 
> hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
> 20/10/14 09:15:33 INFO Delete-Thread-0 
> [org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
> 

[jira] [Created] (SPARK-33144) Connot insert overwite multiple partition, get exception "get partition: Value for key name is null or empty"

2020-10-14 Thread CHC (Jira)
CHC created SPARK-33144:
---

 Summary: Connot insert overwite multiple partition, get exception 
"get partition: Value for key name is null or empty"
 Key: SPARK-33144
 URL: https://issues.apache.org/jira/browse/SPARK-33144
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
 Environment: hadoop 2.7.3 + spark 3.0.1
Reporter: CHC


When: 
{code:sql}
create table tmp.spark_multi_partition(
id int
)
partitioned by (name string, version string)
stored as orc
;

set hive.exec.dynamic.partition=true;
set spark.hadoop.hive.exec.dynamic.partition=true;
 
set hive.exec.dynamic.partition.mode=nonstrict;
set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table tmp.spark_multi_partition partition (name, version)
select
*
from (
  select
  1 as id
  , 'hadoop' as name
  , '2.7.3' as version
  union
  select
  2 as id
  , 'spark' as name
  , '3.0.1' as version
  union
  select
  3 as id
  , 'hive' as name
  , '2.3.4' as version
) as A;
{code}
and get exception:
{code:bash}
INFO load-dynamic-partitions-0 [hive.ql.metadata.Hive:1919]: New loading path = 
hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=spark/version=3.0.1
 with partSpec {name=spark, version=3.0.1}
20/10/14 09:15:33 INFO load-dynamic-partitions-1 [hive.ql.metadata.Hive:1919]: 
New loading path = 
hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hadoop/version=2.7.3
 with partSpec {name=hadoop, version=2.7.3}
20/10/14 09:15:33 INFO load-dynamic-partitions-2 [hive.ql.metadata.Hive:1919]: 
New loading path = 
hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/name=hive/version=2.3.4
 with partSpec {name=hive, version=2.3.4}
20/10/14 09:15:33 INFO load-dynamic-partitions-3 [hive.ql.metadata.Hive:1919]: 
New loading path = 
hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0
 with partSpec {name=, version=}
20/10/14 09:15:33 ERROR load-dynamic-partitions-3 [hive.ql.metadata.Hive:1937]: 
Exception when loading partition with parameters  
partPath=hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/.hive-staging_hive_2020-10-14_09-15-27_718_4118806337003279343-1/-ext-1/_temporary/0,
  table=spark_multi_partition,  partSpec={name=, version=},  replace=true,  
listBucketingEnabled=false,  isAcid=false,  hasFollowingStatsTask=false
org.apache.hadoop.hive.ql.metadata.HiveException: get partition: Value for key 
name is null or empty
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2233)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:2181)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1611)
at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1922)
at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:1913)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
20/10/14 09:15:33 INFO Delete-Thread-0 
[org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
 to trash at: 
hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1/part-1-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
20/10/14 09:15:33 INFO load-dynamic-partitions-0 
[org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it doesn't 
exist: 
hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=spark/version=3.0.1
20/10/14 09:15:33 INFO Delete-Thread-0 
[org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 
'hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hive/version=2.3.4/part-2-b745147b-600f-4c79-8ba2-12a99283b0a9.c000'
 to trash at: 
hdfs://namespace/user/hive/.Trash/Current/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hive/version=2.3.4/part-2-b745147b-600f-4c79-8ba2-12a99283b0a9.c000
20/10/14 09:15:33 INFO load-dynamic-partitions-2 
[org.apache.hadoop.hive.common.FileUtils:520]: Creating directory if it doesn't 
exist: 
hdfs://namespace/apps/hive/warehouse/tmp.db/spark_multi_partition/name=hive/version=2.3.4
20/10/14 09:15:33 INFO Delete-Thread-0 
[org.apache.hadoop.fs.TrashPolicyDefault:168]: Moved: 

[jira] [Comment Edited] (SPARK-32838) Connot overwite different partition with same table

2020-09-12 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194644#comment-17194644
 ] 

CHC edited comment on SPARK-32838 at 9/12/20, 7:47 AM:
---

After spending a long time exploring,

I found that 
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
 converts HiveTableRelation to LogicalRelation,

and this then matches the case in 
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228]

(on Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the 
table is partitioned,

so for a non-partitioned table, an insert overwrite that reads from the table 
itself also fails)

This works when:
{code:java}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

 

 


was (Author: chenxchen):
After spending a long time exploring,

I found that 
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
 converts HiveTableRelation to LogicalRelation,

and this then matches the case in 
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228]

(on Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the 
table is partitioned,

so for a non-partitioned table, an insert overwrite that reads from the table 
itself also fails)

This works when:

 
{code:java}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

 

 

> Connot overwite different partition with same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Major
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 but does not work on Spark 3.0.0
>   






[jira] [Commented] (SPARK-32838) Connot overwite different partition with same table

2020-09-12 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194644#comment-17194644
 ] 

CHC commented on SPARK-32838:
-

After spending a long time exploring,

I found that 
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
 converts HiveTableRelation to LogicalRelation,

and this then matches the case in 
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228]

(on Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the 
table is partitioned,

so for a non-partitioned table, an insert overwrite that reads from the table 
itself also fails)

This works when:

 
{code:java}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.
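
For completeness, a staging-table sketch that sidesteps reading and overwriting 
the same path in a single statement; `tmp.spark3_snap_stage` is a hypothetical 
name, not something from this issue:
{code:sql}
-- Materialize the rows first, then overwrite the target partition, so the
-- overwrite never reads from the path it is about to replace.
create table tmp.spark3_snap_stage stored as orc as
select id from tmp.spark3_snap where dt='2020-09-09';

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap_stage;

drop table tmp.spark3_snap_stage;
{code}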

 

 

> Connot overwite different partition with same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Major
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 but does not work on Spark 3.0.0
>   






[jira] [Comment Edited] (SPARK-32838) Connot overwite different partition with same table

2020-09-12 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194644#comment-17194644
 ] 

CHC edited comment on SPARK-32838 at 9/12/20, 7:47 AM:
---

After spending a long time exploring,

I found that 
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
 converts HiveTableRelation to LogicalRelation,

and this then matches the case in 
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228]

(on Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the 
table is partitioned,

so for a non-partitioned table, an insert overwrite that reads from the table 
itself also fails)

This works when:
{code:java}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.


was (Author: chenxchen):
After spending a long time exploring,

I found that 
[HiveStrategies.scala|https://github.com/apache/spark/blob/v3.0.0/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L209-L215]
 converts HiveTableRelation to LogicalRelation,

and this then matches the case in 
[DataSourceAnalysis|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L166-L228]

(on Spark 2.4.3, HiveTableRelation is not converted to LogicalRelation when the 
table is partitioned,

so for a non-partitioned table, an insert overwrite that reads from the table 
itself also fails)

This works when:
{code:java}
set spark.sql.hive.convertInsertingPartitionedTable=false;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
I think this is a bug, because this scenario is a normal use case.

 

 

> Connot overwite different partition with same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Major
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 but does not work on Spark 3.0.0
>   






[jira] [Commented] (SPARK-32838) Connot overwite different partition with same table

2020-09-11 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194448#comment-17194448
 ] 

CHC commented on SPARK-32838:
-

[~dongjoon] Yes, I have tried this and it works, but we have too many SQL 
statements with static partitions written like this... Thanks.
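
The suggestion referred to here is not quoted in this digest; purely as a 
hypothetical illustration, a static-partition overwrite can often be rewritten 
in dynamic form so fewer statements hard-code the partition value (untested 
against this particular bug):
{code:sql}
-- Replace only the partitions the query produces instead of naming them
-- statically; spark.sql.sources.partitionOverwriteMode=dynamic enables this.
set spark.sql.sources.partitionOverwriteMode=dynamic;

insert overwrite table tmp.spark3_snap partition(dt)
select id, '2020-09-10' as dt from tmp.spark3_snap where dt='2020-09-09';
{code}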

> Connot overwite different partition with same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Major
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 but does not work on Spark 3.0.0
>   






[jira] [Commented] (SPARK-32838) Connot overwite different partition with same table

2020-09-11 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194406#comment-17194406
 ] 

CHC commented on SPARK-32838:
-

[~rohitmishr1484] Ok, I get it. Thanks :)

> Connot overwite different partition with same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Major
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 but does not work on Spark 3.0.0
>   






[jira] [Updated] (SPARK-32838) Connot overwite different partition with same table

2020-09-09 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-32838:

Description: 
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194

This works on Spark 2.4.3 but does not work on Spark 3.0.0
  

  was:
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194

This works on Spark 2.4.3 but does not work on Spark 3.0.0
  


> Connot overwite different partition with same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Critical
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 but not on Spark 3.0.0.
>   






[jira] [Updated] (SPARK-32838) Cannot overwrite a different partition of the same table

2020-09-09 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-32838:

Description: 
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194

This works on Spark 2.4.3 but not on Spark 3.0.0.
  

  was:
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194

This works on Spark 2.4.3.
  


> Cannot overwrite a different partition of the same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Critical
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3 but not on Spark 3.0.0.
>   






[jira] [Updated] (SPARK-32838) Cannot overwrite a different partition of the same table

2020-09-09 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-32838:

Description: 
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194

This works on Spark 2.4.3.
  

  was:
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194
 


> Cannot overwrite a different partition of the same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Critical
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
> This works on Spark 2.4.3.
>   






[jira] [Updated] (SPARK-32838) Cannot overwrite a different partition of the same table

2020-09-09 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-32838:

Description: 
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194

  was:
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194


> Cannot overwrite a different partition of the same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Critical
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194






[jira] [Updated] (SPARK-32838) Cannot overwrite a different partition of the same table

2020-09-09 Thread CHC (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

CHC updated SPARK-32838:

Description: 
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194
 

  was:
When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194


> Cannot overwrite a different partition of the same table
> ---
>
> Key: SPARK-32838
> URL: https://issues.apache.org/jira/browse/SPARK-32838
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
> Environment: hadoop 2.7 + spark 3.0.0
>Reporter: CHC
>Priority: Critical
>
> When:
> {code:java}
> CREATE TABLE tmp.spark3_snap (
> id string
> )
> PARTITIONED BY (dt string)
> STORED AS ORC
> ;
> insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
> select 10;
> insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
> select 1;
> insert overwrite table tmp.spark3_snap partition(dt='2020-09-10')
> select id from tmp.spark3_snap where dt='2020-09-09';
> {code}
> and it fails with the error: "Cannot overwrite a path that is also being read 
> from"
> related: https://issues.apache.org/jira/browse/SPARK-24194
>  






[jira] [Created] (SPARK-32838) Cannot overwrite a different partition of the same table

2020-09-09 Thread CHC (Jira)
CHC created SPARK-32838:
---

 Summary: Cannot overwrite a different partition of the same table
 Key: SPARK-32838
 URL: https://issues.apache.org/jira/browse/SPARK-32838
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
 Environment: hadoop 2.7 + spark 3.0.0
Reporter: CHC


When:
{code:java}
CREATE TABLE tmp.spark3_snap (
id string
)
PARTITIONED BY (dt string)
STORED AS ORC
;

insert overwrite tmp.spark3_snap partition(dt='2020-09-09')
select 10;
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select 1;

insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and it fails with the error: "Cannot overwrite a path that is also being read 
from"

related: https://issues.apache.org/jira/browse/SPARK-24194






[jira] [Commented] (SPARK-24194) HadoopFsRelation cannot overwrite a path that is also being read from

2020-09-09 Thread CHC (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17193329#comment-17193329
 ] 

CHC commented on SPARK-24194:
-

I hit this on Spark 3.0.0 too.

When:
{code:java}
insert overwrite tmp.spark3_snap partition(dt='2020-09-10')
select id from tmp.spark3_snap where dt='2020-09-09';
{code}
and get the error: "Error in query: Cannot overwrite a path that is also being read 
from.;"
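A commonly suggested mitigation for Hive-metastore ORC tables like the one above (an assumption, not from this thread: `spark.sql.hive.convertMetastoreOrc` is a standard Spark setting, but whether the insert then succeeds should be verified on your version) is to keep Spark from converting the table to a datasource relation, so the write goes through the Hive SerDe path instead of the one that rejects self-reads:

{code:sql}
-- Hypothetical mitigation sketch: disable the metastore ORC conversion so the
-- insert uses Hive's write path rather than the datasource (HadoopFsRelation)
-- path that performs the "also being read from" check.
SET spark.sql.hive.convertMetastoreOrc=false;

INSERT OVERWRITE TABLE tmp.spark3_snap PARTITION (dt = '2020-09-10')
SELECT id FROM tmp.spark3_snap WHERE dt = '2020-09-09';
{code}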

> HadoopFsRelation cannot overwrite a path that is also being read from
> -
>
> Key: SPARK-24194
> URL: https://issues.apache.org/jira/browse/SPARK-24194
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
> Environment: spark master
>Reporter: yangz
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When
> {code:java}
> INSERT OVERWRITE TABLE territory_count_compare select * from 
> territory_count_compare where shop_count!=real_shop_count
> {code}
> where territory_count_compare is a Parquet table, it fails with the error 
> "Cannot overwrite a path that is also being read from".
>  
> And the file MetastoreDataSourceSuite.scala has a test case:
>  
>  
> {code:java}
> table(tableName).write.mode(SaveMode.Overwrite).insertInto(tableName)
> {code}
>  
> But when territory_count_compare is a plain Hive table, there is 
> no error. 
> So I think the cause is that an insert overwrite into a HadoopFsRelation with 
> a static partition deletes the output partition up front, whereas it should 
> only be deleted when the job commits.
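The datasource-versus-Hive-table difference described above can be sketched as follows (hypothetical, not from this thread: the table names are invented, and `spark.sql.hive.convertMetastoreParquet` controls whether Spark converts Hive Parquet tables to datasource relations; verify against your version):

{code:sql}
-- Datasource (HadoopFsRelation) table: the self-overwrite is rejected during
-- analysis with "Cannot overwrite a path that is also being read from".
CREATE TABLE territory_demo_ds (shop_count INT, real_shop_count INT) USING PARQUET;
INSERT OVERWRITE TABLE territory_demo_ds
SELECT * FROM territory_demo_ds WHERE shop_count != real_shop_count;

-- Hive SerDe table with conversion disabled: the same statement is accepted
-- (per the description above, the Hive path does not hit the read-path check).
SET spark.sql.hive.convertMetastoreParquet=false;
CREATE TABLE territory_demo_hive (shop_count INT, real_shop_count INT) STORED AS PARQUET;
INSERT OVERWRITE TABLE territory_demo_hive
SELECT * FROM territory_demo_hive WHERE shop_count != real_shop_count;
{code}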


