[jira] [Resolved] (SPARK-39153) When we look at spark UI or History, we can see the failed tasks first
[ https://issues.apache.org/jira/browse/SPARK-39153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong resolved SPARK-39153. - Resolution: Not A Problem > When we look at spark UI or History, we can see the failed tasks first > -- > > Key: SPARK-39153 > URL: https://issues.apache.org/jira/browse/SPARK-39153 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 > Environment: spark 3.2.0 >Reporter: jingxiong zhong >Priority: Major > Fix For: 3.2.0 > > > When a task fails, users care most about the failed tasks and the causes of their failures. The current Spark UI and History server sort tasks by "Index" rather than "Errors", so when there are many tasks, users must wait for the table to re-sort before they can find the cause of a failure. Sorting by the "Errors" column by default would improve the user experience. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
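The proposed default ordering can be sketched as a comparator that places tasks with errors first, then falls back to index order. This is a hypothetical model (the `TaskRow` class and field names are illustrative, not the actual Spark UI code):

```java
import java.util.Comparator;

// Hypothetical stand-in for a row in the task table of the Spark UI.
class TaskRow {
    final int index;
    final String error; // null or empty when the task succeeded

    TaskRow(int index, String error) {
        this.index = index;
        this.error = error;
    }
}

public class ErrorFirstSort {
    // Failed tasks (non-empty error) sort before successful ones;
    // within each group the usual "Index" order is preserved.
    static final Comparator<TaskRow> ERRORS_FIRST =
        Comparator.<TaskRow>comparingInt(t -> (t.error == null || t.error.isEmpty()) ? 1 : 0)
                  .thenComparingInt(t -> t.index);
}
```

With such an ordering applied by default, the rows a user actually cares about after a failure appear at the top without any manual re-sort.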
[jira] [Resolved] (SPARK-39967) Instead of using the scalar tasksSuccessful, use the successful array to calculate whether the task is completed
[ https://issues.apache.org/jira/browse/SPARK-39967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong resolved SPARK-39967. - Resolution: Fixed (not reproduced in the new version) > Instead of using the scalar tasksSuccessful, use the successful array to > calculate whether the task is completed > > > Key: SPARK-39967 > URL: https://issues.apache.org/jira/browse/SPARK-39967 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.3, 2.4.6 >Reporter: jingxiong zhong >Priority: Critical > Attachments: spark1-1.png, spark2.png, spark3-1.png > > > When counting the number of successful tasks in a Spark stage, Spark uses the `tasksSuccessful` counter, but whether a task succeeded or failed is actually recorded in the `successful` array. Logging I added shows that the number of failed tasks counted via `tasksSuccessful` can be inconsistent with the state stored in the `successful` array. The `successful` array should be treated as the source of truth.
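The inconsistency described above is the classic counter-versus-source-of-truth problem: a separately maintained scalar can drift from the per-task flags it is supposed to summarize. A minimal sketch of the idea (hypothetical names, not the actual TaskSetManager code) showing why deriving the count from the boolean array is safer:

```java
// Hypothetical illustration of tracking task completion two ways:
// a cached counter versus a count derived from the per-task flags.
class TaskTracker {
    private final boolean[] successful;   // source of truth: one flag per task
    private int tasksSuccessful = 0;      // cached counter; can drift if an update path is missed

    TaskTracker(int numTasks) {
        this.successful = new boolean[numTasks];
    }

    void markSuccessful(int index) {
        if (!successful[index]) {         // guard against double counting duplicate events
            successful[index] = true;
            tasksSuccessful++;
        }
    }

    // Derived from the array: always consistent with the recorded per-task state.
    int successCountFromArray() {
        int count = 0;
        for (boolean done : successful) {
            if (done) count++;
        }
        return count;
    }

    boolean allTasksCompleted() {
        return successCountFromArray() == successful.length;
    }
}
```

Deriving completion from the array costs a linear scan, but it cannot disagree with the per-task state, which is exactly the property the issue asks for.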
[jira] [Created] (SPARK-42392) Add a new case of TriggeredByExecutorDecommissionInfo to remove unnecessary param
jingxiong zhong created SPARK-42392: --- Summary: Add a new case of TriggeredByExecutorDecommissionInfo to remove unnecessary param Key: SPARK-42392 URL: https://issues.apache.org/jira/browse/SPARK-42392 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: jingxiong zhong Add a new case of TriggeredByExecutorDecommissionInfo so that no additional parameter needs to be passed.
[jira] [Updated] (SPARK-42336) Use getOrElse() instead of contains() in
[ https://issues.apache.org/jira/browse/SPARK-42336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-42336: Summary: Use getOrElse() instead of contains() in (was: Use OpenHashMap instead of HashMap) > Use getOrElse() instead of contains() in > -- > > Key: SPARK-42336 > URL: https://issues.apache.org/jira/browse/SPARK-42336 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: jingxiong zhong >Priority: Minor > > In ResourceAllocator, we can use `.getOrElse(address, throw new > SparkException(...))` instead of a separate `contains` check, which avoids a second map lookup and performs better.
[jira] [Updated] (SPARK-42336) Use getOrElse() instead of contains() in ResourceAllocator
[ https://issues.apache.org/jira/browse/SPARK-42336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-42336: Summary: Use getOrElse() instead of contains() in ResourceAllocator (was: Use getOrElse() instead of contains() in ) > Use getOrElse() instead of contains() in ResourceAllocator > --- > > Key: SPARK-42336 > URL: https://issues.apache.org/jira/browse/SPARK-42336 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: jingxiong zhong >Priority: Minor > > In ResourceAllocator, we can use `.getOrElse(address, throw new > SparkException(...))` instead of a separate `contains` check, which avoids a second map lookup and performs better.
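The improvement above is a single-lookup pattern: `contains` followed by an access walks the hash table twice, while Scala's `getOrElse(key, default)` fetches once. A Java sketch of the same idea (illustrative only, not the actual ResourceAllocator code; the method and map names are hypothetical):

```java
import java.util.Map;

public class SingleLookup {
    // Two lookups: containsKey probes the hash table, then get probes it again.
    static int withContains(Map<String, Integer> assigned, String address) {
        if (!assigned.containsKey(address)) {
            throw new IllegalArgumentException("Unknown resource address: " + address);
        }
        return assigned.get(address);
    }

    // One lookup: fetch once, then test the result (the getOrElse-style variant).
    static int withSingleGet(Map<String, Integer> assigned, String address) {
        Integer amount = assigned.get(address);
        if (amount == null) {
            throw new IllegalArgumentException("Unknown resource address: " + address);
        }
        return amount;
    }
}
```

Both variants behave identically for present and missing keys; the second simply halves the number of hash probes on the hot path.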
[jira] [Updated] (SPARK-42336) Use OpenHashMap instead of HashMap
[ https://issues.apache.org/jira/browse/SPARK-42336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-42336: Description: In ResourceAllocator, we can use `.getOrElse(address, throw new SparkException(...))` instead of a separate `contains` check, which avoids a second map lookup and performs better. (was: In ResourceAllocator, we can use OpenHashMap instead of HashMap, which gives better performance.) > Use OpenHashMap instead of HashMap > -- > > Key: SPARK-42336 > URL: https://issues.apache.org/jira/browse/SPARK-42336 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: jingxiong zhong >Priority: Minor > > In ResourceAllocator, we can use `.getOrElse(address, throw new > SparkException(...))` instead of a separate `contains` check, which avoids a second map lookup and performs better.
[jira] [Created] (SPARK-42336) Use OpenHashMap instead of HashMap
jingxiong zhong created SPARK-42336: --- Summary: Use OpenHashMap instead of HashMap Key: SPARK-42336 URL: https://issues.apache.org/jira/browse/SPARK-42336 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: jingxiong zhong In ResourceAllocator, we can use OpenHashMap instead of HashMap, which gives better performance.
[jira] [Commented] (SPARK-41982) When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1`
[ https://issues.apache.org/jira/browse/SPARK-41982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17672320#comment-17672320 ] jingxiong zhong commented on SPARK-41982: - cc [~cloud_fan] [~gurwls223] I would like to hear your opinions. > When the inserted partition type is of string type, similar `dt=01` will be > converted to `dt=1` > --- > > Key: SPARK-41982 > URL: https://issues.apache.org/jira/browse/SPARK-41982 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: jingxiong zhong >Priority: Critical > > While upgrading from Spark 2.4 to Spark 3.2, we read the migration guide carefully and found a case that is not covered: > {code:java} > create table if not exists test_90(a string, b string) partitioned by (dt > string); > desc formatted test_90; > // case1 > insert into table test_90 partition (dt=05) values("1","2"); > // case2 > insert into table test_90 partition (dt='05') values("1","2"); > drop table test_90;{code} > In Spark 2.4.3, it generates one path: > {code:java} > // the path > hdfs://test5/user/hive/db1/test_90/dt=05 > //result > spark-sql> select * from test_90; > 1 2 05 > 1 2 05 > Time taken: 1.316 seconds, Fetched 2 row(s) > spark-sql> show partitions test_90; > dt=05 > Time taken: 0.201 seconds, Fetched 1 row(s) > spark-sql> select * from test_90 where dt='05'; > 1 2 05 > 1 2 05 > Time taken: 0.212 seconds, Fetched 2 row(s) > spark-sql> explain insert into table test_90 partition (dt=05) > values("1","2"); > == Physical Plan == > Execute InsertIntoHiveTable InsertIntoHiveTable `db1`.`test_90`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, > [a, b] > +- LocalTableScan [a#116, b#117] > Time taken: 1.145 seconds, Fetched 1 row(s){code} > In Spark 3.2.0, it generates two paths: > {code:java} > // the path > hdfs://test5/user/hive/db1/test_90/dt=05 > hdfs://test5/user/hive/db1/test_90/dt=5 > // result > spark-sql> select * from test_90; > 1 2 05 > 1 2 5 > Time taken: 2.119 seconds, Fetched 2 row(s) > spark-sql> show partitions test_90; > dt=05 > dt=5 > Time taken: 0.161 seconds, Fetched 2 row(s) > spark-sql> select * from test_90 where dt='05'; > 1 2 05 > Time taken: 0.252 seconds, Fetched 1 row(s) > spark-sql> explain insert into table test_90 partition (dt=05) > values("1","2"); > plan > == Physical Plan == > Execute InsertIntoHiveTable `db1`.`test_90`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b] > +- LocalTableScan [a#109, b#110]{code} > This causes problems reading data after the user switches to Spark 3. > The root cause is that during partition field resolution, Spark 3 forcibly casts the string value, so partition `05` loses its leading `0`. > I see two options: document the risk clearly in the migration guide, or fix this case so that a string-typed partition value is kept as a string regardless of whether single or double quotation marks are used.
[jira] [Updated] (SPARK-41982) When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1`
[ https://issues.apache.org/jira/browse/SPARK-41982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41982: Description: While upgrading from Spark 2.4 to Spark 3.2, we read the migration guide carefully and found a case that is not covered: {code:java} create table if not exists test_90(a string, b string) partitioned by (dt string); desc formatted test_90; // case1 insert into table test_90 partition (dt=05) values("1","2"); // case2 insert into table test_90 partition (dt='05') values("1","2"); drop table test_90;{code} In Spark 2.4.3, it generates one path: {code:java} // the path hdfs://test5/user/hive/db1/test_90/dt=05 //result spark-sql> select * from test_90; 1 2 05 1 2 05 Time taken: 1.316 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 Time taken: 0.201 seconds, Fetched 1 row(s) spark-sql> select * from test_90 where dt='05'; 1 2 05 1 2 05 Time taken: 0.212 seconds, Fetched 2 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); == Physical Plan == Execute InsertIntoHiveTable InsertIntoHiveTable `db1`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, [a, b] +- LocalTableScan [a#116, b#117] Time taken: 1.145 seconds, Fetched 1 row(s){code} In Spark 3.2.0, it generates two paths: {code:java} // the path hdfs://test5/user/hive/db1/test_90/dt=05 hdfs://test5/user/hive/db1/test_90/dt=5 // result spark-sql> select * from test_90; 1 2 05 1 2 5 Time taken: 2.119 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 dt=5 Time taken: 0.161 seconds, Fetched 2 row(s) spark-sql> select * from test_90 where dt='05'; 1 2 05 Time taken: 0.252 seconds, Fetched 1 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); plan == Physical Plan == Execute InsertIntoHiveTable `db1`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b] +- LocalTableScan [a#109, b#110]{code} This causes problems reading data after the user switches to Spark 3. The root cause is that during partition field resolution, Spark 3 forcibly casts the string value, so partition `05` loses its leading `0`. I see two options: document the risk clearly in the migration guide, or fix this case so that a string-typed partition value is kept as a string regardless of whether single or double quotation marks are used. was: While upgrading from Spark 2.4 to Spark 3.2, we read the migration guide carefully and found a case that is not covered: {code:java} // code placeholder create table if not exists test_90(a string, b string) partitioned by (dt string); desc formatted test_90; // case1 insert into table test_90 partition (dt=05) values("1","2"); // case2 insert into table test_90 partition (dt='05') values("1","2"); drop table test_90;{code} In Spark 2.4.3, it generates one path: {code:java} // code placeholder // the path hdfs://test5/user/hive/db1/test_90/dt=05 //result spark-sql> select * from test_90; 1 2 05 1 2 05 Time taken: 1.316 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 Time taken: 0.201 seconds, Fetched 1 row(s) spark-sql> select * from test_90 where dt='05'; 1 2 05 1 2 05 Time taken: 0.212 seconds, Fetched 2 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); == Physical Plan == Execute InsertIntoHiveTable InsertIntoHiveTable `db1`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, [a, b] +- LocalTableScan [a#116, b#117] Time taken: 1.145 seconds, Fetched 1 row(s){code} In Spark 3.2.0, it generates two paths: {code:java} // code placeholder // the path hdfs://test5/user/hive/db1/test_90/dt=05 hdfs://test5/user/hive/db1/test_90/dt=5 // result spark-sql> select * from test_90; 1 2 05 1 2 5 Time taken: 2.119 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 dt=5 Time taken: 0.161 seconds, Fetched 2 row(s) spark-sql> select * from test_90 where dt='05'; 1 2 05 Time taken: 0.252 seconds, Fetched 1 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); plan == Physical Plan == Execute InsertIntoHiveTable `db1`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b] +- LocalTableScan [a#109, b#110]{code} This causes problems reading data after the user switches to Spark 3. The root cause is that during partition field resolution, Spark 3 forcibly casts the string value, which will
[jira] [Updated] (SPARK-41982) When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1`
[ https://issues.apache.org/jira/browse/SPARK-41982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41982: Description: While upgrading from Spark 2.4 to Spark 3.2, we read the migration guide carefully and found a case that is not covered: {code:java} // code placeholder create table if not exists test_90(a string, b string) partitioned by (dt string); desc formatted test_90; // case1 insert into table test_90 partition (dt=05) values("1","2"); // case2 insert into table test_90 partition (dt='05') values("1","2"); drop table test_90;{code} In Spark 2.4.3, it generates one path: {code:java} // code placeholder // the path hdfs://test5/user/hive/db1/test_90/dt=05 //result spark-sql> select * from test_90; 1 2 05 1 2 05 Time taken: 1.316 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 Time taken: 0.201 seconds, Fetched 1 row(s) spark-sql> select * from bigdata_qa.test_90 where dt='05'; 1 2 05 1 2 05 Time taken: 0.212 seconds, Fetched 2 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); == Physical Plan == Execute InsertIntoHiveTable InsertIntoHiveTable `bigdata_qa`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, [a, b] +- LocalTableScan [a#116, b#117] Time taken: 1.145 seconds, Fetched 1 row(s){code} In Spark 3.2.0, it generates two paths: {code:java} // code placeholder // the path hdfs://test5/user/hive/db1/test_90/dt=05 hdfs://test5/user/hive/db1/test_90/dt=5 // result spark-sql> select * from test_90; 1 2 05 1 2 5 Time taken: 2.119 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 dt=5 Time taken: 0.161 seconds, Fetched 2 row(s) spark-sql> select * from bigdata_qa.test_90 where dt='05'; 1 2 05 Time taken: 0.252 seconds, Fetched 1 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); plan == Physical Plan == Execute InsertIntoHiveTable `bigdata_qa`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b] +- LocalTableScan [a#109, b#110]{code} This causes problems reading data after the user switches to Spark 3. The root cause is that during partition field resolution, Spark 3 forcibly casts the string value, so partition `05` loses its leading `0`. I see two options: document the risk clearly in the migration guide, or fix this case so that a string-typed partition value is kept as a string regardless of whether single or double quotation marks are used. was: While upgrading from Spark 2.4 to Spark 3.2, we read the migration guide carefully and found a case that is not covered: {code:java} // code placeholder create table if not exists test_90(a string, b string) partitioned by (dt string); desc formatted test_90; // case1 insert into table test_90 partition (dt=05) values("1","2"); // case2 insert into table test_90 partition (dt='05') values("1","2"); drop table test_90;{code} In Spark 2.4.3, it generates one path: {code:java} // code placeholder // the path hdfs://test5/user/hive/db1/test_90/dt=05 //result spark-sql> select * from test_90; 1 2 05 1 2 05 Time taken: 1.316 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 Time taken: 0.201 seconds, Fetched 1 row(s) spark-sql> select * from bigdata_qa.test_90 where dt='05'; 1 2 05 1 2 05 Time taken: 0.212 seconds, Fetched 2 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); == Physical Plan == Execute InsertIntoHiveTable InsertIntoHiveTable `bigdata_qa`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, [a, b] +- LocalTableScan [a#116, b#117] Time taken: 1.145 seconds, Fetched 1 row(s){code} In Spark 3.2.0, it generates two paths: {code:java} // code placeholder // the path hdfs://test5/user/hive/db1/test_90/dt=05 hdfs://test5/user/hive/db1/test_90/dt=5 // result spark-sql> select * from test_90; 1 2 05 1 2 5 Time taken: 2.119 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 dt=5 Time taken: 0.161 seconds, Fetched 2 row(s) spark-sql> select * from bigdata_qa.test_90 where dt='05'; 1 2 05 Time taken: 0.252 seconds, Fetched 1 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); plan == Physical Plan == Execute InsertIntoHiveTable `bigdata_qa`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b] +- LocalTableScan [a#109, b#110]{code} This causes problems reading data after the user switches to Spark 3. The root cause is that during partition field resolution,
[jira] [Updated] (SPARK-41982) When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1`
[ https://issues.apache.org/jira/browse/SPARK-41982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41982: Description: While upgrading from Spark 2.4 to Spark 3.2, we read the migration guide carefully and found a case that is not covered: {code:java} // code placeholder create table if not exists test_90(a string, b string) partitioned by (dt string); desc formatted test_90; // case1 insert into table test_90 partition (dt=05) values("1","2"); // case2 insert into table test_90 partition (dt='05') values("1","2"); drop table test_90;{code} In Spark 2.4.3, it generates one path: {code:java} // code placeholder // the path hdfs://test5/user/hive/db1/test_90/dt=05 //result spark-sql> select * from test_90; 1 2 05 1 2 05 Time taken: 1.316 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 Time taken: 0.201 seconds, Fetched 1 row(s) spark-sql> select * from bigdata_qa.test_90 where dt='05'; 1 2 05 1 2 05 Time taken: 0.212 seconds, Fetched 2 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); == Physical Plan == Execute InsertIntoHiveTable InsertIntoHiveTable `bigdata_qa`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, [a, b] +- LocalTableScan [a#116, b#117] Time taken: 1.145 seconds, Fetched 1 row(s){code} In Spark 3.2.0, it generates two paths: {code:java} // code placeholder // the path hdfs://test5/user/hive/db1/test_90/dt=05 hdfs://test5/user/hive/db1/test_90/dt=5 // result spark-sql> select * from test_90; 1 2 05 1 2 5 Time taken: 2.119 seconds, Fetched 2 row(s) spark-sql> show partitions test_90; dt=05 dt=5 Time taken: 0.161 seconds, Fetched 2 row(s) spark-sql> select * from bigdata_qa.test_90 where dt='05'; 1 2 05 Time taken: 0.252 seconds, Fetched 1 row(s) spark-sql> explain insert into table test_90 partition (dt=05) values("1","2"); plan == Physical Plan == Execute InsertIntoHiveTable `bigdata_qa`.`test_90`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, [dt=Some(5)], false, false, [a, b] +- LocalTableScan [a#109, b#110]{code} This causes problems reading data after the user switches to Spark 3. The root cause is that during partition field resolution, Spark 3 forcibly casts the string value, so partition `05` loses its leading `0`. I see two options: document the risk clearly in the migration guide, or fix this case so that a string-typed partition value is kept as a string regardless of whether single or double quotation marks are used. > When the inserted partition type is of string type, similar `dt=01` will be > converted to `dt=1` > --- > > Key: SPARK-41982 > URL: https://issues.apache.org/jira/browse/SPARK-41982 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: jingxiong zhong >Priority: Critical > > While upgrading from Spark 2.4 to Spark 3.2, we read the migration guide carefully and found a case that is not covered: > {code:java} > // code placeholder > create table if not exists test_90(a string, b string) partitioned by (dt > string); > desc formatted test_90; > // case1 > insert into table test_90 partition (dt=05) values("1","2"); > // case2 > insert into table test_90 partition (dt='05') values("1","2"); > drop table test_90;{code} > In Spark 2.4.3, it generates one path: > {code:java} > // code placeholder > // the path > hdfs://test5/user/hive/db1/test_90/dt=05 > //result > spark-sql> select * from test_90; > 1 2 05 > 1 2 05 > Time taken: 1.316 seconds, Fetched 2 row(s) > spark-sql> show partitions test_90; > dt=05 > Time taken: 0.201 seconds, Fetched 1 row(s) > spark-sql> select * from bigdata_qa.test_90 where dt='05'; > 1 2 05 > 1 2 05 > Time taken: 0.212 seconds, Fetched 2 row(s) > spark-sql> explain insert into table test_90 partition (dt=05) > values("1","2"); > == Physical Plan == > Execute InsertIntoHiveTable InsertIntoHiveTable `bigdata_qa`.`test_90`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, Map(dt -> Some(05)), false, false, > [a, b] > +- LocalTableScan [a#116, b#117] > Time taken: 1.145 seconds, Fetched 1 row(s){code} > In Spark 3.2.0, it generates two paths: > {code:java} > // code placeholder > // the path > hdfs://test5/user/hive/db1/test_90/dt=05 > hdfs://test5/user/hive/db1/test_90/dt=5 > // result > spark-sql> select * from test_90; > 1 2 05 > 1 2 5 > Time taken: 2.119 seconds, Fetched 2 row(s) > spark-sql>
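The root cause described in this issue is easy to reproduce outside Spark: once an unquoted partition value is routed through an integer type and rendered back to a string, the leading zero is unrecoverable. A minimal Java illustration of the coercion (this is a simulation of the effect, not Spark's actual cast implementation):

```java
public class LeadingZero {
    // Simulates what happens when an unquoted partition value like dt=05
    // is coerced through an integer type and then rendered back to a string.
    static String coerce(String partitionValue) {
        int asInt = Integer.parseInt(partitionValue); // "05" parses to the integer 5
        return String.valueOf(asInt);                 // renders as "5": leading zero is lost
    }
}
```

Because `"05"` and `"5"` are distinct partition directory names, the round trip through an integer silently splits one logical partition into two, which is exactly the `dt=05` / `dt=5` symptom shown above.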
[jira] [Created] (SPARK-41982) When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1`
jingxiong zhong created SPARK-41982: --- Summary: When the inserted partition type is of string type, similar `dt=01` will be converted to `dt=1` Key: SPARK-41982 URL: https://issues.apache.org/jira/browse/SPARK-41982 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Reporter: jingxiong zhong
[jira] [Created] (SPARK-41943) Use Java API to create files and grant permissions in DiskBlockManager
jingxiong zhong created SPARK-41943: --- Summary: Use Java API to create files and grant permissions in DiskBlockManager Key: SPARK-41943 URL: https://issues.apache.org/jira/browse/SPARK-41943 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.0 Reporter: jingxiong zhong For the method {{createDirWithPermission770}}, use the Java API to create files and grant permissions instead of calling shell commands.
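A sketch of what such a Java-API replacement could look like, using java.nio.file instead of shelling out to mkdir/chmod. This is a hypothetical illustration of the approach, not the actual Spark patch (the class name and signature are made up; only `createDirWithPermission770` comes from the issue):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class DirWithPermission {
    // Create a directory with mode 770 (rwxrwx---) using the Java API
    // instead of invoking "mkdir" and "chmod" as external processes.
    static Path createDirWithPermission770(Path dir) throws IOException {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString("rwxrwx---");
        Files.createDirectory(dir);
        // Set permissions explicitly after creation so the process umask
        // cannot mask out the group bits.
        Files.setPosixFilePermissions(dir, perms);
        return dir;
    }
}
```

Avoiding the shell round trip removes the cost of forking a process per directory and works on any POSIX filesystem the JVM supports; on non-POSIX filesystems `setPosixFilePermissions` throws `UnsupportedOperationException`, which a real implementation would need to handle.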
[jira] [Commented] (SPARK-37677) spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute
[ https://issues.apache.org/jira/browse/SPARK-37677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17653733#comment-17653733 ] jingxiong zhong commented on SPARK-37677: - I have fixed this in Hadoop 3.3.5, but that version has not been released yet. Spark will need to upgrade its Hadoop dependency to pick up the fix. [~valux] > spark on k8s, when the user want to push python3.6.6.zip to the pod , but no > permission to execute > -- > > Key: SPARK-37677 > URL: https://issues.apache.org/jira/browse/SPARK-37677 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > In cluster mode, I have another question: when python3.6.6.zip is unpacked in the pod, the extracted files have no execute permission. My submit command is as follows: > {code:sh} > spark-submit \ > --archives ./python3.6.6.zip#python3.6.6 \ > --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ > --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > ./examples/src/main/python/pi.py 100 > {code}
[jira] [Resolved] (SPARK-37521) insert overwrite table but the partition information stored in Metastore was not changed
[ https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong resolved SPARK-37521. - Resolution: Won't Fix > insert overwrite table but the partition information stored in Metastore was > not changed > > > Key: SPARK-37521 > URL: https://issues.apache.org/jira/browse/SPARK-37521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hive2.3.9 > metastore2.3.9 >Reporter: jingxiong zhong >Priority: Major > > I created a partitioned table in Spark SQL, inserted one row, added a regular column, and finally inserted a new row into a partition. The query works in Spark SQL, but in Hive 2.3.9 the newly inserted column's value is NULL. > For example: > create table updata_col_test1(a int) partitioned by (dt string); > insert overwrite table updata_col_test1 partition(dt='20200101') values(1); > insert overwrite table updata_col_test1 partition(dt='20200102') values(1); > insert overwrite table updata_col_test1 partition(dt='20200103') values(1); > alter table updata_col_test1 add columns (b int); > insert overwrite table updata_col_test1 partition(dt) values(1, 2, > '20200101'); fails > insert overwrite table updata_col_test1 partition(dt='20200101') values(1, > 2); fails > insert overwrite table updata_col_test1 partition(dt='20200104') values(1, > 2); succeeds
[jira] [Resolved] (SPARK-41769) Remove useless semicolons
[ https://issues.apache.org/jira/browse/SPARK-41769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong resolved SPARK-41769. - Resolution: Won't Fix > Remove useless semicolons > - > > Key: SPARK-41769 > URL: https://issues.apache.org/jira/browse/SPARK-41769 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: jingxiong zhong >Priority: Trivial >
[jira] [Updated] (SPARK-41769) Remove useless semicolons
[ https://issues.apache.org/jira/browse/SPARK-41769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41769: Component/s: (was: Spark Core) > Remove useless semicolons > - > > Key: SPARK-41769 > URL: https://issues.apache.org/jira/browse/SPARK-41769 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: jingxiong zhong >Priority: Trivial >
[jira] [Created] (SPARK-41769) Remove useless semicolons
jingxiong zhong created SPARK-41769: --- Summary: Remove useless semicolons Key: SPARK-41769 URL: https://issues.apache.org/jira/browse/SPARK-41769 Project: Spark Issue Type: Improvement Components: Spark Core, SQL Affects Versions: 3.4.0 Reporter: jingxiong zhong
[jira] [Commented] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638532#comment-17638532 ] jingxiong zhong commented on SPARK-41236: - I think you can open a PR for it [~huldar] > The renamed field name cannot be recognized after group filtering > - > > Key: SPARK-41236 > URL: https://issues.apache.org/jira/browse/SPARK-41236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > {code:java} > select collect_set(age) as age > from db_table.table1 > group by name > having size(age) > 1 > {code} > A simple SQL query: it works well in Spark 2.4 but fails in Spark 3.2.0. > Is this a bug or a new standard? > h3. *like this:* > {code:sql} > create table db1.table1(age int, name string); > insert into db1.table1 values(1, 'a'); > insert into db1.table1 values(2, 'b'); > insert into db1.table1 values(3, 'c'); > --then run sql like this > select collect_set(age) as age from db1.table1 group by name having size(age) > > 1 ; > {code} > h3. 
Stack Information > org.apache.spark.sql.AnalysisException: cannot resolve 'age' given input > columns: [age]; line 4 pos 12; > 'Filter (size('age, true) > 1) > +- Aggregate [name#2], [collect_set(age#1, 0, 0) AS age#0] >+- SubqueryAlias spark_catalog.db1.table1 > +- HiveTableRelation [`db1`.`table1`, > org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#1, name#2], > Partition Cols: []] > at > org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:54) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:179) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1154) > at > org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1153) > at > org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181) > at > org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) > at > org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263) > at > org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94) > at >
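A note for readers hitting the same error: the `size('age)` in the plan suggests the `HAVING` clause is no longer being resolved against the `SELECT` alias the way Spark 2.4 did. Whatever the root cause turns out to be, two rewrites avoid depending on that alias resolution entirely (a sketch only, not an official fix; table names follow the repro above):

{code:sql}
-- Workaround A: apply size() to the aggregate itself, not the alias
select collect_set(age) as age
from db1.table1
group by name
having size(collect_set(age)) > 1;

-- Workaround B: materialize the alias in a subquery, then filter
select age
from (
  select collect_set(age) as age
  from db1.table1
  group by name
) t
where size(age) > 1;
{code}

Workaround B is the more portable of the two, since it never relies on alias visibility inside `HAVING` and so should behave the same on Spark 2.4 and 3.x.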
[jira] [Comment Edited] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638532#comment-17638532 ] jingxiong zhong edited comment on SPARK-41236 at 11/25/22 7:14 AM: --- I think you can raise a pr for it [~huldar] was (Author: JIRAUSER281124): I think you can a pr for it [~huldar]
[jira] [Updated] (SPARK-41229) When using `db_name.temp_table_name`, an exception will be thrown
[ https://issues.apache.org/jira/browse/SPARK-41229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41229: Description: SQL1: {code:sql} with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1; {code} It throws `org.apache.spark.sql.AnalysisException: Table or view not found: db1.table_hive1;`, but Spark 2.4.3 works well. SQL2: {code:sql} with table_hive1 as(select * from db1.table_hive) select * from table_hive1; {code} It works well. I'm a little confused. Is this syntax with a database-name qualifier not supported? You can reproduce it like this: {code:sql} create table db1.table_hive(age int, name string); insert into db1.table_hive values(1, 'a'); insert into db1.table_hive values(2, 'b'); insert into db1.table_hive values(3, 'c'); with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1; {code} was: SQL1: ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It will throw exception `org.apache.spark.sql.AnalysisException: Table or view not found: db1.table_hive1;`but spark in 2.4.3 work well. SQL2: ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It work well. I'm a little confused. Is this syntax with database name not supported. 
you can run like this: create db1.table_hive(age int, name string); insert into db1.table_hive values(1, 'a'); insert into db1.table_hive values(2, 'b'); insert into db1.table_hive values(3, 'c'); with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1; > When using `db_name.temp_table_name`, an exception will be thrown > --- > > Key: SPARK-41229 > URL: https://issues.apache.org/jira/browse/SPARK-41229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hadoop2.7.3 > hive-ms 2.3.9 >Reporter: jingxiong zhong >Priority: Major > > SQL1: > {code:sql} > with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1; {code} > > It throws `org.apache.spark.sql.AnalysisException: Table or > view not found: db1.table_hive1;`, but Spark 2.4.3 works well. > SQL2: > {code:sql} > with table_hive1 as(select * from db1.table_hive) > select * from table_hive1; {code} > It works well. > I'm a little confused. Is this syntax with a database-name qualifier not supported? > You can reproduce it like this: > {code:sql} > create table db1.table_hive(age int, name string); > insert into db1.table_hive values(1, 'a'); > insert into db1.table_hive values(2, 'b'); > insert into db1.table_hive values(3, 'c'); > with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1; {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
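As context for the error above: a CTE name is scoped to the query that defines it and is not registered in any database, so qualifying it as `db1.table_hive1` forces a metastore lookup, which fails. A sketch of two ways to get a usable name (assuming the same `db1.table_hive` repro table as above):

{code:sql}
-- Refer to the CTE by its bare, unqualified name
with table_hive1 as (select * from db1.table_hive)
select * from table_hive1;

-- If a name reusable across statements is wanted, a temporary view
-- (also referenced unqualified) serves the same purpose
create or replace temporary view table_hive1 as
select * from db1.table_hive;
select * from table_hive1;
{code}

That Spark 2.4.3 accepted the qualified CTE reference appears to be the older, looser resolution behavior rather than a documented feature.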
[jira] [Updated] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41236: Description: `select collect_set(age) as age from db_table.table1 group by name having size(age) > 1 ` a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 Is it a bug or a new standard? like this: spark-sql> create db1.table1(age int, name string); Time taken: 1.709 seconds spark-sql> insert into db1.table1 values(1, 'a'); Time taken: 2.114 seconds spark-sql> insert into db1.table1 values(2, 'b'); Time taken: 10.208 seconds spark-sql> insert into db1.table1 values(3, 'c'); Time taken: 0.673 seconds then run sql like this `select collect_set(age) as age from db1.table1 group by name having size(age) > 1 ;` Stack Information org.apache.spark.sql.AnalysisException: cannot resolve 'age' given input columns: [age]; line 4 pos 12; 'Filter (size('age, true) > 1) +- Aggregate [name#2], [collect_set(age#1, 0, 0) AS age#0] +- SubqueryAlias spark_catalog.bigdata_qa.table1 +- HiveTableRelation [`bigdata_qa`.`table1`, org.apache.hadoop.hive.ql.io.orc.OrcSerde, Data Cols: [age#1, name#2], Partition Cols: []] at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:54) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:179) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$$nestedInanonfun$checkAnalysis$1$2.applyOrElse(CheckAnalysis.scala:175) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$2(TreeNode.scala:535) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:535) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532) at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1128) at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1127) at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:467) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:532) at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1154) at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1153) at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:555) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:532) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsUpWithPruning$1(QueryPlan.scala:181) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:193) at org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:204) at org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:214) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:323) at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:214) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUpWithPruning(QueryPlan.scala:181) at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:161) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:175) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:94) at 
org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:263) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:94) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:91) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:172) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:196) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:192) at
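[Editor's note] The AnalysisException above comes from referencing the output alias `age` inside the HAVING clause, where Spark 3.2 resolves `age` against the input column instead. As a hedged sketch (not from the thread; it reuses the reporter's `db1.table1` reproducer), either filtering in an outer query or repeating the aggregate expression in HAVING avoids the alias resolution entirely:

```sql
-- Option 1: filter in an outer query instead of referencing the alias in HAVING
select * from (
  select collect_set(age) as age
  from db1.table1
  group by name
) t
where size(age) > 1;

-- Option 2: repeat the aggregate expression inside HAVING
select collect_set(age) as age
from db1.table1
group by name
having size(collect_set(age)) > 1;
```

Both forms should resolve on Spark 3.2 because HAVING no longer has to bind the alias `age` against an input column of the same name.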
[jira] [Commented] (SPARK-41229) When using `db_name.temp_table_name`, an exception will be thrown
[ https://issues.apache.org/jira/browse/SPARK-41229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638148#comment-17638148 ] jingxiong zhong commented on SPARK-41229: - Sorry, :( I don't know what a self-contained reproducer is; I added SQL above that can reproduce such errors [~hyukjin.kwon] > When using `db_name.temp_table_name`, an exception will be thrown > --- > > Key: SPARK-41229 > URL: https://issues.apache.org/jira/browse/SPARK-41229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hadoop2.7.3 > hive-ms 2.3.9 >Reporter: jingxiong zhong >Priority: Major > > SQL1: > ```with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1;``` > It throws the exception `org.apache.spark.sql.AnalysisException: Table or view not found: db1.table_hive1;`, but Spark 2.4.3 works well. > SQL2: > ```with table_hive1 as(select * from db1.table_hive) > select * from table_hive1;``` > It works well. > I'm a little confused. Is this syntax with a database name not supported? > You can reproduce it like this: > create table db1.table_hive(age int, name string); > insert into db1.table_hive values(1, 'a'); > insert into db1.table_hive values(2, 'b'); > insert into db1.table_hive values(3, 'c'); > with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1;
[jira] [Updated] (SPARK-41229) When using `db_name.temp_table_name`, an exception will be thrown
[ https://issues.apache.org/jira/browse/SPARK-41229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41229: Description: SQL1: ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It will throw exception `org.apache.spark.sql.AnalysisException: Table or view not found: db1.table_hive1;`but spark in 2.4.3 work well. SQL2: ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It work well. I'm a little confused. Is this syntax with database name not supported. you can run like this: create db1.table_hive(age int, name string); insert into db1.table_hive values(1, 'a'); insert into db1.table_hive values(2, 'b'); insert into db1.table_hive values(3, 'c'); with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1; was: SQL1: ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It will throw exception `org.apache.spark.sql.AnalysisException: Table or view not found: db1.table_hive1;`but spark in 2.4.3 work well. SQL2: ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It work well. I'm a little confused. Is this syntax with database name not supported. 
you can reproduce it like this: create table db1.table_hive(age int, name string); insert into db1.table_hive values(1, 'a'); insert into db1.table_hive values(2, 'b'); insert into db1.table_hive values(3, 'c'); `with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;` > When using `db_name.temp_table_name`, an exception will be thrown > --- > > Key: SPARK-41229 > URL: https://issues.apache.org/jira/browse/SPARK-41229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hadoop2.7.3 > hive-ms 2.3.9 >Reporter: jingxiong zhong >Priority: Major > > SQL1: > ```with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1;``` > It throws the exception `org.apache.spark.sql.AnalysisException: Table or view not found: db1.table_hive1;`, but Spark 2.4.3 works well. > SQL2: > ```with table_hive1 as(select * from db1.table_hive) > select * from table_hive1;``` > It works well. > I'm a little confused. Is this syntax with a database name not supported? > You can reproduce it like this: > create table db1.table_hive(age int, name string); > insert into db1.table_hive values(1, 'a'); > insert into db1.table_hive values(2, 'b'); > insert into db1.table_hive values(3, 'c'); > with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1;
[jira] [Updated] (SPARK-41229) When using `db_name.temp_table_name`, an exception will be thrown
[ https://issues.apache.org/jira/browse/SPARK-41229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41229: Description: SQL1: ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It will throw exception `org.apache.spark.sql.AnalysisException: Table or view not found: db1.table_hive1;`but spark in 2.4.3 work well. SQL2: ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It work well. I'm a little confused. Is this syntax with database name not supported. you can run like this: create db1.table_hive(age int, name string); insert into db1.table_hive values(1, 'a'); insert into db1.table_hive values(2, 'b'); insert into db1.table_hive values(3, 'c'); `with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;` was: SQL1: ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It will throw exception `org.apache.spark.sql.AnalysisException: Table or view not found: db1.table_hive1;`but spark in 2.4.3 work well. SQL2: ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It work well. I'm a little confused. Is this syntax with database name not supported. > When using `db_ name.temp_ table_name`, an exception will be thrown > --- > > Key: SPARK-41229 > URL: https://issues.apache.org/jira/browse/SPARK-41229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hadoop2.7.3 > hive-ms 2.3.9 >Reporter: jingxiong zhong >Priority: Major > > SQL1: > ```with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1;``` > It will throw exception `org.apache.spark.sql.AnalysisException: Table or > view not found: db1.table_hive1;`but spark in 2.4.3 work well. > SQL2: > ```with table_hive1 as(select * from db1.table_hive) > select * from table_hive1;``` > It work well. > I'm a little confused. 
Is this syntax with a database name not supported? > You can reproduce it like this: > create table db1.table_hive(age int, name string); > insert into db1.table_hive values(1, 'a'); > insert into db1.table_hive values(2, 'b'); > insert into db1.table_hive values(3, 'c'); > `with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1;`
[jira] [Updated] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41236: Description: `select collect_set(age) as age from db_table.table1 group by name having size(age) > 1 ` a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 Is it a bug or a new standard? like this: spark-sql> create db1.table1(age int, name string); Time taken: 1.709 seconds spark-sql> insert into db1.table1 values(1, 'a'); Time taken: 2.114 seconds spark-sql> insert into db1.table1 values(2, 'b'); Time taken: 10.208 seconds spark-sql> insert into db1.table1 values(3, 'c'); Time taken: 0.673 seconds then run sql like this `select collect_set(age) as age from db1.table1 group by name having size(age) > 1 ;` was: `select collect_set(age) as age from db_table.table1 group by name having size(age) > 1 ` a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 Is it a bug or a new standard? like this: spark-sql> create db1.table1(age int, name string); Time taken: 1.709 seconds spark-sql> insert into db1.table1 values(1, 'a'); Time taken: 2.114 seconds spark-sql> insert into db1.table1 values(2, 'b'); Time taken: 10.208 seconds spark-sql> insert into db1.table1 values(3, 'c'); Time taken: 0.673 seconds spark-sql> select collect_set(age) as age > from db1.table1 > group by name > having size(age) > 1 ; > The renamed field name cannot be recognized after group filtering > - > > Key: SPARK-41236 > URL: https://issues.apache.org/jira/browse/SPARK-41236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > `select collect_set(age) as age > from db_table.table1 > group by name > having size(age) > 1 ` > a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 > Is it a bug or a new standard? 
> like this: > spark-sql> create table db1.table1(age int, name string); > Time taken: 1.709 seconds > spark-sql> insert into db1.table1 values(1, 'a'); > Time taken: 2.114 seconds > spark-sql> insert into db1.table1 values(2, 'b'); > Time taken: 10.208 seconds > spark-sql> insert into db1.table1 values(3, 'c'); > Time taken: 0.673 seconds > then run SQL like this: `select collect_set(age) as age from db1.table1 group by name having size(age) > 1 ;`
[jira] [Updated] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41236: Description: `select collect_set(age) as age from db_table.table1 group by name having size(age) > 1 ` a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 Is it a bug or a new standard? like this: spark-sql> create db1.table1(age int, name string); Time taken: 1.709 seconds spark-sql> insert into db1.table1 values(1, 'a'); Time taken: 2.114 seconds spark-sql> insert into db1.table1 values(2, 'b'); Time taken: 10.208 seconds spark-sql> insert into db1.table1 values(3, 'c'); Time taken: 0.673 seconds spark-sql> select collect_set(age) as age > from db1.table1 > group by name > having size(age) > 1 ; was: `select collect_set(age) as age from db_table.table1 group by name having size(age) > 1 ` a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 Is it a bug or a new standard? like this: spark-sql> create db1.table1(age int, name string); Time taken: 1.709 seconds spark-sql> insert into db1.table1 values(1, 'a'); Time taken: 2.114 seconds spark-sql> insert into db1.table1 values(2, 'b'); Time taken: 10.208 seconds spark-sql> insert into db1.table1 values(3, 'c'); Time taken: 0.673 seconds spark-sql> select collect_set(age) as age > from db1.table1 > group by name > having size(age) > 1 ; Time taken: 3.022 seconds > The renamed field name cannot be recognized after group filtering > - > > Key: SPARK-41236 > URL: https://issues.apache.org/jira/browse/SPARK-41236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > `select collect_set(age) as age > from db_table.table1 > group by name > having size(age) > 1 ` > a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 > Is it a bug or a new standard? 
> like this: > spark-sql> create table db1.table1(age int, name string); > Time taken: 1.709 seconds > spark-sql> insert into db1.table1 values(1, 'a'); > Time taken: 2.114 seconds > spark-sql> insert into db1.table1 values(2, 'b'); > Time taken: 10.208 seconds > spark-sql> insert into db1.table1 values(3, 'c'); > Time taken: 0.673 seconds > spark-sql> select collect_set(age) as age > > from db1.table1 > > group by name > > having size(age) > 1 ;
[jira] [Commented] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638142#comment-17638142 ] jingxiong zhong commented on SPARK-41236: - Thanks a lot. Is this the right way to write the specific case? [~hyukjin.kwon] > The renamed field name cannot be recognized after group filtering > - > > Key: SPARK-41236 > URL: https://issues.apache.org/jira/browse/SPARK-41236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > `select collect_set(age) as age > from db_table.table1 > group by name > having size(age) > 1 ` > a simple SQL query; it works well in Spark 2.4 but doesn't work in Spark 3.2.0. > Is it a bug or a new standard? > like this: > spark-sql> create table db1.table1(age int, name string); > Time taken: 1.709 seconds > spark-sql> insert into db1.table1 values(1, 'a'); > Time taken: 2.114 seconds > spark-sql> insert into db1.table1 values(2, 'b'); > Time taken: 10.208 seconds > spark-sql> insert into db1.table1 values(3, 'c'); > Time taken: 0.673 seconds > spark-sql> select collect_set(age) as age > > from db1.table1 > > group by name > > having size(age) > 1 ; > Time taken: 3.022 seconds
[jira] [Updated] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41236: Description: `select collect_set(age) as age from db_table.table1 group by name having size(age) > 1 ` a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 Is it a bug or a new standard? like this: spark-sql> create db1.table1(age int, name string); Time taken: 1.709 seconds spark-sql> insert into db1.table1 values(1, 'a'); Time taken: 2.114 seconds spark-sql> insert into db1.table1 values(2, 'b'); Time taken: 10.208 seconds spark-sql> insert into db1.table1 values(3, 'c'); Time taken: 0.673 seconds spark-sql> select collect_set(age) as age > from db1.table1 > group by name > having size(age) > 1 ; Time taken: 3.022 seconds was: `select collect_set(age) as age from db_table.table1 group by name having size(age) > 1 ` a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 Is it a bug or a new standard? > The renamed field name cannot be recognized after group filtering > - > > Key: SPARK-41236 > URL: https://issues.apache.org/jira/browse/SPARK-41236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > `select collect_set(age) as age > from db_table.table1 > group by name > having size(age) > 1 ` > a simple sql, it work well in spark2.4, but doesn't work in spark3.2.0 > Is it a bug or a new standard? 
> like this: > spark-sql> create table db1.table1(age int, name string); > Time taken: 1.709 seconds > spark-sql> insert into db1.table1 values(1, 'a'); > Time taken: 2.114 seconds > spark-sql> insert into db1.table1 values(2, 'b'); > Time taken: 10.208 seconds > spark-sql> insert into db1.table1 values(3, 'c'); > Time taken: 0.673 seconds > spark-sql> select collect_set(age) as age > > from db1.table1 > > group by name > > having size(age) > 1 ; > Time taken: 3.022 seconds
[jira] [Updated] (SPARK-41236) The renamed field name cannot be recognized after group filtering
[ https://issues.apache.org/jira/browse/SPARK-41236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41236: Summary: The renamed field name cannot be recognized after group filtering (was: The renamed field name cannot be recognized after group filtering, but it is the same as the original field name) > The renamed field name cannot be recognized after group filtering > - > > Key: SPARK-41236 > URL: https://issues.apache.org/jira/browse/SPARK-41236 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Blocker > > `select collect_set(age) as age > from db_table.table1 > group by name > having size(age) > 1 ` > a simple SQL query; it works well in Spark 2.4 but doesn't work in Spark 3.2.0. > Is it a bug or a new standard?
[jira] [Created] (SPARK-41236) The renamed field name cannot be recognized after group filtering, but it is the same as the original field name
jingxiong zhong created SPARK-41236: --- Summary: The renamed field name cannot be recognized after group filtering, but it is the same as the original field name Key: SPARK-41236 URL: https://issues.apache.org/jira/browse/SPARK-41236 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: jingxiong zhong `select collect_set(age) as age from db_table.table1 group by name having size(age) > 1 ` a simple SQL query; it works well in Spark 2.4 but doesn't work in Spark 3.2.0. Is it a bug or a new standard?
[jira] [Updated] (SPARK-41229) When using `db_name.temp_table_name`, an exception will be thrown
[ https://issues.apache.org/jira/browse/SPARK-41229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41229: Description: SQL1: ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It will throw exception `org.apache.spark.sql.AnalysisException: Table or view not found: db1.table_hive1;`but spark in 2.4.3 work well. SQL2: ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It work well. I'm a little confused. Is this syntax with database name not supported. was: SQL1: ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It will throw exception `org.apache.spark.sql.AnalysisException: Table or view not found: bigdata_qa.zjx_hive1;`but spark in 2.4.3 work well. SQL2: ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It work well. I'm a little confused. Is this syntax with database name not supported. > When using `db_ name.temp_ table_name`, an exception will be thrown > --- > > Key: SPARK-41229 > URL: https://issues.apache.org/jira/browse/SPARK-41229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hadoop2.7.3 > hive-ms 2.3.9 >Reporter: jingxiong zhong >Priority: Blocker > > SQL1: > ```with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1;``` > It will throw exception `org.apache.spark.sql.AnalysisException: Table or > view not found: db1.table_hive1;`but spark in 2.4.3 work well. > SQL2: > ```with table_hive1 as(select * from db1.table_hive) > select * from table_hive1;``` > It work well. > I'm a little confused. Is this syntax with database name not supported. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41229) When using `db_name.temp_table_name`, an exception will be thrown
[ https://issues.apache.org/jira/browse/SPARK-41229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17637595#comment-17637595 ] jingxiong zhong commented on SPARK-41229: - [~cloud_fan] Could you help me with this? > When using `db_name.temp_table_name`, an exception will be thrown > --- > > Key: SPARK-41229 > URL: https://issues.apache.org/jira/browse/SPARK-41229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hadoop2.7.3 > hive-ms 2.3.9 >Reporter: jingxiong zhong >Priority: Blocker > > SQL1: > ```with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1;``` > It throws the exception `org.apache.spark.sql.AnalysisException: Table or view not found: bigdata_qa.zjx_hive1;`, but Spark 2.4.3 works well. > SQL2: > ```with table_hive1 as(select * from db1.table_hive) > select * from table_hive1;``` > It works well. > I'm a little confused. Is this syntax with a database name not supported?
[jira] [Updated] (SPARK-41229) When using `db_name.temp_table_name`, an exception will be thrown
[ https://issues.apache.org/jira/browse/SPARK-41229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-41229: Description: SQL1: ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It will throw exception `org.apache.spark.sql.AnalysisException: Table or view not found: bigdata_qa.zjx_hive1;`but spark in 2.4.3 work well. SQL2: ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It work well. I'm a little confused. Is this syntax with database name not supported. was: ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It will throw exception `org.apache.spark.sql.AnalysisException: Table or view not found: bigdata_qa.zjx_hive1;`but spark in 2.4.3 work well. ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It work well. I'm a little confused. Is this syntax with database name not supported. > When using `db_ name.temp_ table_name`, an exception will be thrown > --- > > Key: SPARK-41229 > URL: https://issues.apache.org/jira/browse/SPARK-41229 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hadoop2.7.3 > hive-ms 2.3.9 >Reporter: jingxiong zhong >Priority: Blocker > > SQL1: > ```with table_hive1 as(select * from db1.table_hive) > select * from db1.table_hive1;``` > It will throw exception `org.apache.spark.sql.AnalysisException: Table or > view not found: bigdata_qa.zjx_hive1;`but spark in 2.4.3 work well. > SQL2: > ```with table_hive1 as(select * from db1.table_hive) > select * from table_hive1;``` > It work well. > I'm a little confused. Is this syntax with database name not supported. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-41229) When using `db_name.temp_table_name`, an exception will be thrown
jingxiong zhong created SPARK-41229: --- Summary: When using `db_name.temp_table_name`, an exception will be thrown Key: SPARK-41229 URL: https://issues.apache.org/jira/browse/SPARK-41229 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Environment: spark3.2.0 hadoop2.7.3 hive-ms 2.3.9 Reporter: jingxiong zhong ```with table_hive1 as(select * from db1.table_hive) select * from db1.table_hive1;``` It throws the exception `org.apache.spark.sql.AnalysisException: Table or view not found: bigdata_qa.zjx_hive1;`, but Spark 2.4.3 works well. ```with table_hive1 as(select * from db1.table_hive) select * from table_hive1;``` It works well. I'm a little confused. Is this syntax with a database name not supported?
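[Editor's note] In the report above, `table_hive1` is defined by the WITH clause rather than registered in database `db1`, and standard SQL references a CTE by its bare name. A sketch of the working form against the reporter's tables:

```sql
with table_hive1 as (select * from db1.table_hive)
select * from table_hive1;  -- refer to the CTE without the db1. qualifier
```

Spark 2.4 apparently tolerated the qualified reference `db1.table_hive1`, while Spark 3.2 resolves it as a catalog table lookup and fails with the Table or view not found error.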
[jira] [Resolved] (SPARK-40916) UDF could not filter null values, causing an NPE
[ https://issues.apache.org/jira/browse/SPARK-40916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong resolved SPARK-40916. - Resolution: Fixed add --conf spark.sql.subexpressionElimination.enabled=false > udf could not filter null value cause npe > - > > Key: SPARK-40916 > URL: https://issues.apache.org/jira/browse/SPARK-40916 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hadoop2.7.3 > hive2.3.9 >Reporter: jingxiong zhong >Priority: Critical > > ``` > select > t22.uid, > from > ( > SELECT > code, > count(distinct uid) cnt > FROM > ( > SELECT > uid, > code, > lng, > lat > FROM > ( > select > > riskmanage_dw.GEOHASH_ENCODE(manhattan_dw.aes_decode(lng),manhattan_dw.aes_decode(lat),8) > as code, > uid, > lng, > lat, > dt as event_time > from > ( > select > param['timestamp'] as dt, > > get_json_object(get_json_object(param['input'],'$.baseInfo'),'$.uid') uid, > > get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lng') lng, > > get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lat') lat > from manhattan_ods.ods_log_manhattan_fbi_workflow_result_log > and > get_json_object(get_json_object(param['input'],'$.bizExtents'),'$.productId')='2001' > > )a > and lng is not null > and lat is not null > ) t2 > group by uid,code,lng,lat > ) t1 > GROUP BY code having count(DISTINCT uid)>=10 > )t11 > join > ( > SELECT > uid, > code, > lng, > lat > FROM > ( > select > > riskmanage_dw.GEOHASH_ENCODE(manhattan_dw.aes_decode(lng),manhattan_dw.aes_decode(lat),8) > as code, > uid, > lng, > lat, > dt as event_time > from > ( > select > param['timestamp'] as dt, > > get_json_object(get_json_object(param['input'],'$.baseInfo'),'$.uid') uid, > > get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lng') lng, > > get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lat') lat > from manhattan_ods.ods_log_manhattan_fbi_workflow_result_log > and > 
[jira] [Created] (SPARK-40916) udf could not filter null value cause npe
jingxiong zhong created SPARK-40916:
----------------------------------------

             Summary: udf could not filter null value cause npe
                 Key: SPARK-40916
                 URL: https://issues.apache.org/jira/browse/SPARK-40916
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.2.0
         Environment: spark3.2.0 hadoop2.7.3 hive2.3.9
            Reporter: jingxiong zhong

```
select
    t22.uid,
from (
    SELECT code, count(distinct uid) cnt
    FROM (
        SELECT uid, code, lng, lat
        FROM (
            select
                riskmanage_dw.GEOHASH_ENCODE(manhattan_dw.aes_decode(lng),manhattan_dw.aes_decode(lat),8) as code,
                uid, lng, lat, dt as event_time
            from (
                select
                    param['timestamp'] as dt,
                    get_json_object(get_json_object(param['input'],'$.baseInfo'),'$.uid') uid,
                    get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lng') lng,
                    get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lat') lat
                from manhattan_ods.ods_log_manhattan_fbi_workflow_result_log
                and get_json_object(get_json_object(param['input'],'$.bizExtents'),'$.productId')='2001'
            ) a
            and lng is not null
            and lat is not null
        ) t2
        group by uid, code, lng, lat
    ) t1
    GROUP BY code
    having count(DISTINCT uid) >= 10
) t11
join (
    SELECT uid, code, lng, lat
    FROM (
        select
            riskmanage_dw.GEOHASH_ENCODE(manhattan_dw.aes_decode(lng),manhattan_dw.aes_decode(lat),8) as code,
            uid, lng, lat, dt as event_time
        from (
            select
                param['timestamp'] as dt,
                get_json_object(get_json_object(param['input'],'$.baseInfo'),'$.uid') uid,
                get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lng') lng,
                get_json_object(get_json_object(param['input'],'$.envInfo'),'$.lat') lat
            from manhattan_ods.ods_log_manhattan_fbi_workflow_result_log
            and get_json_object(get_json_object(param['input'],'$.bizExtents'),'$.productId')='2001'
        ) a
        and lng is not null
        and lat is not null
    ) t2
    where substr(code,0,6)<>'wx4ey3'
    group by uid, code, lng, lat
) t22 on t11.code = t22.code
group by t22.uid
```

This SQL cannot run because `riskmanage_dw.GEOHASH_ENCODE(manhattan_dw.aes_decode(lng),manhattan_dw.aes_decode(lat),8)` throws an NPE, even though the query filters nulls in its conditions: the UDF manhattan_dw.aes_decode returns null when lng or lat is null, so the `is not null` checks should protect GEOHASH_ENCODE. *But after removing the condition `where substr(code,0,6)<>'wx4ey3'`, the query runs normally.*

Complete stack trace:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to execute method public java.lang.String com.xiaoju.automarket.GeohashEncode.evaluate(java.lang.Double,java.lang.Double,java.lang.Integer) with arguments {null,null,8}:null
	at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:1049)
	at org.apache.spark.sql.hive.HiveSimpleUDF.eval(hiveUDFs.scala:102)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.subExpr_3$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate.eval(Unknown Source)
	at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3(basicPhysicalOperators.scala:275)
	at org.apache.spark.sql.execution.FilterExec.$anonfun$doExecute$3$adapted(basicPhysicalOperators.scala:274)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:515)
	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithKeys_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
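The symptom described above (a null filter that stops protecting the UDF as soon as an outer predicate is added) is consistent with the optimizer collapsing the subqueries and merging predicates, so the UDF can be evaluated before the null checks. A toy Python sketch of that suspected mechanism, with all names and data purely illustrative (this is not Spark internals):

```python
# Stand-in for the Hive UDF: fails on null input, like the reported NPE.
def geohash_encode(lng, lat):
    if lng is None or lat is None:
        raise ValueError("null input")
    return "%.1f/%.1f" % (lng, lat)

rows = [(1.0, 2.0), (None, None), (3.0, 4.0)]

# What the query expresses: filter nulls first, then compute the code.
safe = [geohash_encode(lng, lat)
        for lng, lat in rows
        if lng is not None and lat is not None]
assert len(safe) == 2

# What a merged predicate can do: evaluate the UDF as part of one
# combined condition, before the null checks ever run.
def merged_predicate(lng, lat):
    code = geohash_encode(lng, lat)  # evaluated on every row, nulls included
    return code[:6] != "wx4ey3" and lng is not None and lat is not None

failed = False
try:
    [r for r in rows if merged_predicate(*r)]
except ValueError:
    failed = True  # mirrors the reported failure mode
assert failed
```

Removing the `substr(code,0,6)<>'wx4ey3'` condition removes the merged predicate that references the UDF output, which would explain why the query then runs.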
[jira] [Updated] (SPARK-39967) Instead of using the scalar tasksSuccessful, use the successful array to calculate whether the task is completed
[ https://issues.apache.org/jira/browse/SPARK-39967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-39967: Description: When counting the number of successful tasks in the stage of spark, spark uses the indicator of `tasksSuccessful`, but in fact, the success or failure of tasks is based on the array of `successful`. Through the log I added, it is found that the number of failed tasks counted by `tasksSuccessful` is inconsistent with the number of failures stored in the array of `successful`. We should take `successful` as the standard. (was: When counting the number of successful tasks in the stage of spark, spark uses the indicator of `tasksSuccessful`, but in fact, the success or failure of tasks is based on the array of `successful`. Through the log, it is found that the number of failed tasks counted by `tasksSuccessful` is inconsistent with the number of failures stored in the array of `successful`. We should take `successful` as the standard.) > Instead of using the scalar tasksSuccessful, use the successful array to > calculate whether the task is completed > > > Key: SPARK-39967 > URL: https://issues.apache.org/jira/browse/SPARK-39967 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.3, 2.4.6 >Reporter: jingxiong zhong >Priority: Critical > Attachments: spark1-1.png, spark2.png, spark3-1.png > > > When counting the number of successful tasks in the stage of spark, spark > uses the indicator of `tasksSuccessful`, but in fact, the success or failure > of tasks is based on the array of `successful`. Through the log I added, it > is found that the number of failed tasks counted by `tasksSuccessful` is > inconsistent with the number of failures stored in the array of `successful`. > We should take `successful` as the standard. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39967) Instead of using the scalar tasksSuccessful, use the successful array to calculate whether the task is completed
[ https://issues.apache.org/jira/browse/SPARK-39967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-39967: Attachment: spark1-1.png spark2.png spark3-1.png > Instead of using the scalar tasksSuccessful, use the successful array to > calculate whether the task is completed > > > Key: SPARK-39967 > URL: https://issues.apache.org/jira/browse/SPARK-39967 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.4.3, 2.4.6 >Reporter: jingxiong zhong >Priority: Critical > Attachments: spark1-1.png, spark2.png, spark3-1.png > > > When counting the number of successful tasks in the stage of spark, spark > uses the indicator of `tasksSuccessful`, but in fact, the success or failure > of tasks is based on the array of `successful`. Through the log, it is found > that the number of failed tasks counted by `tasksSuccessful` is inconsistent > with the number of failures stored in the array of `successful`. We should > take `successful` as the standard. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39967) Instead of using the scalar tasksSuccessful, use the successful array to calculate whether the task is completed
jingxiong zhong created SPARK-39967: --- Summary: Instead of using the scalar tasksSuccessful, use the successful array to calculate whether the task is completed Key: SPARK-39967 URL: https://issues.apache.org/jira/browse/SPARK-39967 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 2.4.6, 2.4.3 Reporter: jingxiong zhong When counting the number of successful tasks in the stage of spark, spark uses the indicator of `tasksSuccessful`, but in fact, the success or failure of tasks is based on the array of `successful`. Through the log, it is found that the number of failed tasks counted by `tasksSuccessful` is inconsistent with the number of failures stored in the array of `successful`. We should take `successful` as the standard. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
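The inconsistency described above can be sketched in a few lines: a separately maintained scalar counter can drift from the per-task boolean array, while a count derived from the array cannot. The field names echo Spark's TaskSetManager, but this is an illustration, not Spark source code:

```python
# Per-partition success flags: this array is the ground truth.
successful = [True, False, True, True]

# Scalar counter maintained on the side, shown here after drifting.
tasks_successful = 2

# Deriving the count from the array keeps the two views consistent.
count_from_array = sum(successful)
assert count_from_array == 3
assert count_from_array != tasks_successful  # the mismatch the issue reports
```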
[jira] [Created] (SPARK-39153) When we look at spark UI or History, we can see the failed tasks first
jingxiong zhong created SPARK-39153: --- Summary: When we look at spark UI or History, we can see the failed tasks first Key: SPARK-39153 URL: https://issues.apache.org/jira/browse/SPARK-39153 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.0 Environment: spark 3.2.0 Reporter: jingxiong zhong Fix For: 3.2.0 When a task fails, users care most about which tasks failed and why. The current Spark UI and History server sort tasks by "Index" rather than by "Errors", so when there are many tasks, the user must wait for sorting to finish and then search for the failures. Sorting by the "Errors" column by default would surface failed tasks and their causes immediately and improve the user experience. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
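The requested default ordering can be sketched as a compound sort key: rows with an error sort first, ties broken by task index. The task records below are illustrative, not the UI's actual data model:

```python
tasks = [
    {"index": 0, "status": "SUCCESS", "error": None},
    {"index": 1, "status": "FAILED", "error": "java.lang.OutOfMemoryError"},
    {"index": 2, "status": "SUCCESS", "error": None},
    {"index": 3, "status": "FAILED", "error": "FetchFailedException"},
]

# False sorts before True, so tasks that carry an error surface first;
# within each group, the usual index order is preserved.
ordered = sorted(tasks, key=lambda t: (t["error"] is None, t["index"]))
assert [t["index"] for t in ordered] == [1, 3, 0, 2]
```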
[jira] [Commented] (SPARK-37677) spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute
[ https://issues.apache.org/jira/browse/SPARK-37677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480381#comment-17480381 ] jingxiong zhong commented on SPARK-37677: - [~hyukjin.kwon] Hey sir, I opened another one for this issue. It uses a shell command to do the decompression. > spark on k8s, when the user want to push python3.6.6.zip to the pod , but no > permission to execute > -- > > Key: SPARK-37677 > URL: https://issues.apache.org/jira/browse/SPARK-37677 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > In cluster mode, I have another question: when I unzip python3.6.6.zip in the > pod, there is no permission to execute. My submit command is as follows: > {code:sh} > spark-submit \ > --archives ./python3.6.6.zip#python3.6.6 \ > --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ > --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > ./examples/src/main/python/pi.py 100 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong resolved SPARK-37708. - Resolution: Not A Problem > pyspark adding third-party Dependencies on k8s > -- > > Key: SPARK-37708 > URL: https://issues.apache.org/jira/browse/SPARK-37708 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.2.0 > Environment: pyspark3.2 >Reporter: jingxiong zhong >Priority: Major > > I have a question about that how do I add my Python dependencies to Spark > Job, as following > {code:sh} > spark-submit \ > --archives s3a://path/python3.6.9.tgz#python3.6.9 \ > --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ > --conf "spark.pyspark.python=python3.6.9/bin/python3" \ > --name "piroottest" \ > ./examples/src/main/python/pi.py 10 > {code} > this can't run my job sucessfully,it throw error > {code:sh} > Traceback (most recent call last): > File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in > > from pyspark.sql import SparkSession > File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in > > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, > in > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in > > async def _ag(): > File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", > line 7, in > from _ctypes import Union, Structure, Array > ImportError: libffi.so.6: cannot open shared object file: No such file or > directory > {code} > Or is there another way to add Python dependencies? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
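The `ImportError: libffi.so.6` above means the packaged interpreter's `_ctypes` extension was linked against a libffi that the container image does not ship. A hedged preflight idea: before shipping a relocatable Python archive with `--archives`, confirm inside the target image that the C-extension modules PySpark touches at startup actually resolve (module list is an assumption, not exhaustive):

```python
# Preflight sketch: check that the modules PySpark needs at startup can be
# found by this interpreter. ctypes is what failed in the report; a
# non-empty "missing" list means the archive will fail the same way.
import importlib.util

required = ["ctypes", "zlib", "ssl"]
missing = [m for m in required if importlib.util.find_spec(m) is None]
print("missing modules:", missing)
```

Running this with the archived interpreter inside the image (rather than on the build host) catches OS-library mismatches like the Debian-vs-CentOS one resolved here.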
[jira] [Comment Edited] (SPARK-35715) Option "--files" with local:// prefix is not honoured for Spark on kubernetes
[ https://issues.apache.org/jira/browse/SPARK-35715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475135#comment-17475135 ] jingxiong zhong edited comment on SPARK-35715 at 1/13/22, 7:00 AM: --- It seems that spark 3 does not support the schema using local as the path. You can try file:///etc/xattr.conf was (Author: JIRAUSER281124): t seems that spark 3 does not support the schema using local as the path. You can try file:///etc/xattr.conf > Option "--files" with local:// prefix is not honoured for Spark on kubernetes > - > > Key: SPARK-35715 > URL: https://issues.apache.org/jira/browse/SPARK-35715 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.2, 3.1.2 >Reporter: Pardhu Madipalli >Priority: Major > > When we provide a local file as a dependency using "--files" option, the file > is not getting copied to work directories of executors. > h5. Example 1: > > {code:java} > $SPARK_HOME/bin/spark-submit --master k8s://https:// \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=1 \ > --conf spark.kubernetes.container.image= \ > --conf spark.kubernetes.driver.pod.name=sparkdriverpod \ > --files local:///etc/xattr.conf \ > local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar 100 > {code} > > h6. Content of Spark Executor work-dir: > > {code:java} > ~$ kubectl exec -n default spark-pi-22de6279f6bec01c-exec-1 ls > /opt/spark/work-dir/ > spark-examples_2.12-3.1.2.jar > {code} > > We can notice here that the file _/etc/xattr.conf_ is *NOT* copied to > _/opt/spark/work-dir/ ._ > > > > {{Instead of using "–files", if we use "--jars" option the file is getting > copied as expected.}} > h5. 
Example 2: > {code:java} > $SPARK_HOME/bin/spark-submit --master k8s://https:// \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=1 \ > --conf spark.kubernetes.container.image= \ > --conf spark.kubernetes.driver.pod.name=sparkdriverpod \ > --jars local:///etc/xattr.conf \ > local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar 100 > {code} > h6. Content of Spark Executor work-dir: > > {code:java} > ~$ kubectl exec -n default spark-pi-22de6279f6bec01c-exec-1 ls > /opt/spark/work-dir/ > spark-examples_2.12-3.1.2.jar > xattr.conf > {code} > We can notice here that the file _/etc/xattr.conf_ *IS COPIED* to > _/opt/spark/work-dir/ ._ > > I tested this with versions *3.1.2* and *3.0.2*. It is behaving the same way > in both cases. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35715) Option "--files" with local:// prefix is not honoured for Spark on kubernetes
[ https://issues.apache.org/jira/browse/SPARK-35715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17475135#comment-17475135 ] jingxiong zhong commented on SPARK-35715: - It seems that spark 3 does not support the schema using local as the path. You can try file:///etc/xattr.conf > Option "--files" with local:// prefix is not honoured for Spark on kubernetes > - > > Key: SPARK-35715 > URL: https://issues.apache.org/jira/browse/SPARK-35715 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.2, 3.1.2 >Reporter: Pardhu Madipalli >Priority: Major > > When we provide a local file as a dependency using "--files" option, the file > is not getting copied to work directories of executors. > h5. Example 1: > > {code:java} > $SPARK_HOME/bin/spark-submit --master k8s://https:// \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=1 \ > --conf spark.kubernetes.container.image= \ > --conf spark.kubernetes.driver.pod.name=sparkdriverpod \ > --files local:///etc/xattr.conf \ > local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar 100 > {code} > > h6. Content of Spark Executor work-dir: > > {code:java} > ~$ kubectl exec -n default spark-pi-22de6279f6bec01c-exec-1 ls > /opt/spark/work-dir/ > spark-examples_2.12-3.1.2.jar > {code} > > We can notice here that the file _/etc/xattr.conf_ is *NOT* copied to > _/opt/spark/work-dir/ ._ > > > > {{Instead of using "--files", if we use "--jars" option the file is getting > copied as expected.}} > h5. 
Example 2: > {code:java} > $SPARK_HOME/bin/spark-submit --master k8s://https:// \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=1 \ > --conf spark.kubernetes.container.image= \ > --conf spark.kubernetes.driver.pod.name=sparkdriverpod \ > --jars local:///etc/xattr.conf \ > local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar 100 > {code} > h6. Content of Spark Executor work-dir: > > {code:java} > ~$ kubectl exec -n default spark-pi-22de6279f6bec01c-exec-1 ls > /opt/spark/work-dir/ > spark-examples_2.12-3.1.2.jar > xattr.conf > {code} > We can notice here that the file _/etc/xattr.conf_ *IS COPIED* to > _/opt/spark/work-dir/ ._ > > I tested this with versions *3.1.2* and *3.0.2*. It is behaving the same way > in both cases. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
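A hedged sketch of the workaround suggested in the comment above: replace the `local://` scheme (which only references a path expected to already exist inside the container image) with `file://`, so spark-submit ships the file from the client. The master URL and image name are placeholders, and whether this fits depends on the deployment:

```
# Workaround sketch (placeholders: <k8s-master>, <image>).
$SPARK_HOME/bin/spark-submit --master k8s://https://<k8s-master> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=1 \
  --conf spark.kubernetes.container.image=<image> \
  --files file:///etc/xattr.conf \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar 100
```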
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17471014#comment-17471014 ] jingxiong zhong commented on SPARK-37708: - [~hyukjin.kwon] In the end, we found that the operating system was different and that Python would not run in the image. If we use a CentOS-based image, it works normally > pyspark adding third-party Dependencies on k8s > -- > > Key: SPARK-37708 > URL: https://issues.apache.org/jira/browse/SPARK-37708 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.2.0 > Environment: pyspark3.2 >Reporter: jingxiong zhong >Priority: Major > > I have a question about that how do I add my Python dependencies to Spark > Job, as following > {code:sh} > spark-submit \ > --archives s3a://path/python3.6.9.tgz#python3.6.9 \ > --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ > --conf "spark.pyspark.python=python3.6.9/bin/python3" \ > --name "piroottest" \ > ./examples/src/main/python/pi.py 10 {code} > this can't run my job sucessfully,it throw error > {code:sh} > Traceback (most recent call last): > File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in > > from pyspark.sql import SparkSession > File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in > > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, > in > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in > > async def _ag(): > File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", > line 7, in > from _ctypes import Union, Structure, Array > ImportError: libffi.so.6: cannot open shared object file: No such file or > directory > {code} > Or is there another way to add Python dependencies? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37677) spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute
[ https://issues.apache.org/jira/browse/SPARK-37677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17467251#comment-17467251 ] jingxiong zhong commented on SPARK-37677: - I found that unzip is implemented by decompressing files through Java's IO. When the output file object is created, it receives the default permissions, so the permissions of unzip's output files match the defaults for newly created files and the original mode is lost. Therefore, I tried granting the permissions after the output file is created; alternatively, the permission information stored in the archive could be applied when writing the output file, but that requires modifying the zip code here. > spark on k8s, when the user want to push python3.6.6.zip to the pod , but no > permission to execute > -- > > Key: SPARK-37677 > URL: https://issues.apache.org/jira/browse/SPARK-37677 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > In cluster mode, I have another question: when I unzip python3.6.6.zip in the > pod, there is no permission to execute. My submit command is as follows: > {code:sh} > spark-submit \ > --archives ./python3.6.6.zip#python3.6.6 \ > --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ > --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > ./examples/src/main/python/pi.py 100 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
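The fix described in the comment above can be sketched with Python's zipfile module: plain extraction drops the executable bit, and re-applying the mode stored in each entry's external attributes after extraction restores it. The entry name and payload below are made up for illustration:

```python
import io
import os
import stat
import tempfile
import zipfile

# Build a tiny in-memory archive whose entry carries rwxr-xr-x permissions
# in the ZIP external attributes (upper 16 bits hold the Unix mode).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("bin/python3")
    info.external_attr = (stat.S_IFREG | 0o755) << 16
    zf.writestr(info, b"#!/bin/sh\necho hi\n")

dest = tempfile.mkdtemp()
with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        path = zf.extract(info, dest)        # extract() alone loses the mode
        mode = (info.external_attr >> 16) & 0o777
        if mode:
            os.chmod(path, mode)             # re-apply the stored permissions

# The extracted file is executable again, which is what the Spark archive
# unpacking in the report fails to do.
assert os.access(os.path.join(dest, "bin", "python3"), os.X_OK)
```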
[jira] [Comment Edited] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465060#comment-17465060 ] jingxiong zhong edited comment on SPARK-37708 at 12/24/21, 5:28 PM: [~hyukjin.kwon] I found some packages downloaded from python3.9, such as pandas and NLTK. They would conflict because the operating system is different. Can I change the default operating system debian of dockerFile to Centos6/7? was (Author: JIRAUSER281124): [~hyukjin.kwon] I found some packages downloaded, such as pandas, NLTK.That would be conflict, because the operating system is different. Can I change the default operating system debian of dockerFile to Centos6/7? > pyspark adding third-party Dependencies on k8s > -- > > Key: SPARK-37708 > URL: https://issues.apache.org/jira/browse/SPARK-37708 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.2.0 > Environment: pyspark3.2 >Reporter: jingxiong zhong >Priority: Major > > I have a question about that how do I add my Python dependencies to Spark > Job, as following > {code:sh} > spark-submit \ > --archives s3a://path/python3.6.9.tgz#python3.6.9 \ > --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ > --conf "spark.pyspark.python=python3.6.9/bin/python3" \ > --name "piroottest" \ > ./examples/src/main/python/pi.py 10 > {code} > this can't run my job sucessfully,it throw error > {code:sh} > Traceback (most recent call last): > File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in > > from pyspark.sql import SparkSession > File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in > > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, > in > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in > > async def _ag(): > File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", > line 7, in > from _ctypes import Union, Structure, Array > ImportError: 
libffi.so.6: cannot open shared object file: No such file or > directory > {code} > Or is there another way to add Python dependencies? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465060#comment-17465060 ] jingxiong zhong edited comment on SPARK-37708 at 12/24/21, 5:24 PM: [~hyukjin.kwon] I found some packages downloaded, such as pandas, NLTK.That would be conflict, because the operating system is different. Can I change the default operating system debian of dockerFile to Centos6/7? was (Author: JIRAUSER281124): [~hyukjin.kwon] I found some packages downloaded, such as pandas, NLTK. Can I change the default operating system debian of dockerFile to Centos6/7? > pyspark adding third-party Dependencies on k8s > -- > > Key: SPARK-37708 > URL: https://issues.apache.org/jira/browse/SPARK-37708 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.2.0 > Environment: pyspark3.2 >Reporter: jingxiong zhong >Priority: Major > > I have a question about that how do I add my Python dependencies to Spark > Job, as following > {code:sh} > spark-submit \ > --archives s3a://path/python3.6.9.tgz#python3.6.9 \ > --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ > --conf "spark.pyspark.python=python3.6.9/bin/python3" \ > --name "piroottest" \ > ./examples/src/main/python/pi.py 10 > {code} > this can't run my job sucessfully,it throw error > {code:sh} > Traceback (most recent call last): > File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in > > from pyspark.sql import SparkSession > File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in > > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, > in > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in > > async def _ag(): > File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", > line 7, in > from _ctypes import Union, Structure, Array > ImportError: libffi.so.6: cannot open shared object file: No such file or > directory > {code} > Or is 
there another way to add Python dependencies? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465060#comment-17465060 ] jingxiong zhong commented on SPARK-37708: - [~hyukjin.kwon] I found some packages downloaded, such as pandas, NLTK. Can I change the default operating system debian of dockerFile to Centos6/7? > pyspark adding third-party Dependencies on k8s > -- > > Key: SPARK-37708 > URL: https://issues.apache.org/jira/browse/SPARK-37708 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.2.0 > Environment: pyspark3.2 >Reporter: jingxiong zhong >Priority: Major > > I have a question about that how do I add my Python dependencies to Spark > Job, as following > {code:sh} > spark-submit \ > --archives s3a://path/python3.6.9.tgz#python3.6.9 \ > --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ > --conf "spark.pyspark.python=python3.6.9/bin/python3" \ > --name "piroottest" \ > ./examples/src/main/python/pi.py 10 > {code} > this can't run my job sucessfully,it throw error > {code:sh} > Traceback (most recent call last): > File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in > > from pyspark.sql import SparkSession > File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in > > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, > in > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in > > async def _ag(): > File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", > line 7, in > from _ctypes import Union, Structure, Array > ImportError: libffi.so.6: cannot open shared object file: No such file or > directory > {code} > Or is there another way to add Python dependencies? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17465058#comment-17465058 ] jingxiong zhong commented on SPARK-37708: - I used wget to download and compile the source code, but it seems python3.6 is not supported by spark3.2 > pyspark adding third-party Dependencies on k8s > -- > > Key: SPARK-37708 > URL: https://issues.apache.org/jira/browse/SPARK-37708 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.2.0 > Environment: pyspark3.2 >Reporter: jingxiong zhong >Priority: Major > > I have a question about that how do I add my Python dependencies to Spark > Job, as following > {code:sh} > spark-submit \ > --archives s3a://path/python3.6.9.tgz#python3.6.9 \ > --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ > --conf "spark.pyspark.python=python3.6.9/bin/python3" \ > --name "piroottest" \ > ./examples/src/main/python/pi.py 10 > {code} > this can't run my job sucessfully,it throw error > {code:sh} > Traceback (most recent call last): > File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in > > from pyspark.sql import SparkSession > File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in > > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, > in > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in > > async def _ag(): > File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", > line 7, in > from _ctypes import Union, Structure, Array > ImportError: libffi.so.6: cannot open shared object file: No such file or > directory > {code} > Or is there another way to add Python dependencies? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-37708: Component/s: Kubernetes > pyspark adding third-party Dependencies on k8s > -- > > Key: SPARK-37708 > URL: https://issues.apache.org/jira/browse/SPARK-37708 > Project: Spark > Issue Type: Bug > Components: Kubernetes, PySpark >Affects Versions: 3.2.0 > Environment: pyspark3.2 >Reporter: jingxiong zhong >Priority: Critical > > I have a question about that how do I add my Python dependencies to Spark > Job, as following > {code:sh} > spark-submit \ > --archives s3a://path/python3.6.9.tgz#python3.6.9 \ > --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ > --conf "spark.pyspark.python=python3.6.9/bin/python3" \ > --name "piroottest" \ > ./examples/src/main/python/pi.py 10 > {code} > this can't run my job sucessfully,it throw error > {code:sh} > Traceback (most recent call last): > File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in > > from pyspark.sql import SparkSession > File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in > > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, > in > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in > > async def _ag(): > File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", > line 7, in > from _ctypes import Union, Structure, Array > ImportError: libffi.so.6: cannot open shared object file: No such file or > directory > {code} > Or is there another way to add Python dependencies? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-37708: Description: I have a question about that how do I add my Python dependencies to Spark Job, as following {code:sh} spark-submit \ --archives s3a://path/python3.6.9.tgz#python3.6.9 \ --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ --conf "spark.pyspark.python=python3.6.9/bin/python3" \ --name "piroottest" \ ./examples/src/main/python/pi.py 10 {code} this can't run my job sucessfully,it throw error {code:sh} Traceback (most recent call last): File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in from pyspark.sql import SparkSession File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in async def _ag(): File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", line 7, in from _ctypes import Union, Structure, Array ImportError: libffi.so.6: cannot open shared object file: No such file or directory {code} Or is there another way to add Python dependencies? 
was: I have a question about that how do I add my Python dependencies to Spark Job, as following spark-submit \ --archives s3a://path/python3.6.9.tgz#python3.6.9 \ --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ --conf "spark.pyspark.python=python3.6.9/bin/python3" \ --name "piroottest" \ ./examples/src/main/python/pi.py 10 this can't run my job sucessfully,it throw error Traceback (most recent call last): File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in from pyspark.sql import SparkSession File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in async def _ag(): File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", line 7, in from _ctypes import Union, Structure, Array ImportError: libffi.so.6: cannot open shared object file: No such file or directory Or is there another way to add Python dependencies? 
> pyspark adding third-party Dependencies on k8s > -- > > Key: SPARK-37708 > URL: https://issues.apache.org/jira/browse/SPARK-37708 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 > Environment: pyspark3.2 >Reporter: jingxiong zhong >Priority: Critical > > I have a question about that how do I add my Python dependencies to Spark > Job, as following > {code:sh} > spark-submit \ > --archives s3a://path/python3.6.9.tgz#python3.6.9 \ > --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ > --conf "spark.pyspark.python=python3.6.9/bin/python3" \ > --name "piroottest" \ > ./examples/src/main/python/pi.py 10 > {code} > this can't run my job sucessfully,it throw error > {code:sh} > Traceback (most recent call last): > File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in > > from pyspark.sql import SparkSession > File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in > > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, > in > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in > > async def _ag(): > File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", > line 7, in > from _ctypes import Union, Structure, Array > ImportError: libffi.so.6: cannot open shared object file: No such file or > directory > {code} > Or is there another way to add Python dependencies? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-37708: Description: I have a question about that how do I add my Python dependencies to Spark Job, as following spark-submit \ --archives s3a://path/python3.6.9.tgz#python3.6.9 \ --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ --conf "spark.pyspark.python=python3.6.9/bin/python3" \ --name "piroottest" \ ./examples/src/main/python/pi.py 10 this can't run my job sucessfully,it throw error Traceback (most recent call last): File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in from pyspark.sql import SparkSession File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in async def _ag(): File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", line 7, in from _ctypes import Union, Structure, Array ImportError: libffi.so.6: cannot open shared object file: No such file or directory Or is there another way to add Python dependencies? 
> pyspark adding third-party Dependencies on k8s > -- > > Key: SPARK-37708 > URL: https://issues.apache.org/jira/browse/SPARK-37708 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 > Environment: pyspark3.2 >Reporter: jingxiong zhong >Priority: Critical > > I have a question about that how do I add my Python dependencies to Spark > Job, as following > spark-submit \ > --archives s3a://path/python3.6.9.tgz#python3.6.9 \ > --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ > --conf "spark.pyspark.python=python3.6.9/bin/python3" \ > --name "piroottest" \ > ./examples/src/main/python/pi.py 10 > this can't run my job sucessfully,it throw error > Traceback (most recent call last): > File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in > > from pyspark.sql import SparkSession > File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in > > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, > in > File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in > > async def _ag(): > File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", > line 7, in > from _ctypes import Union, Structure, Array > ImportError: libffi.so.6: cannot open shared object file: No such file or > directory > Or is there another way to add Python dependencies? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37708) pyspark adding third-party Dependencies on k8s
jingxiong zhong created SPARK-37708: --- Summary: pyspark adding third-party Dependencies on k8s Key: SPARK-37708 URL: https://issues.apache.org/jira/browse/SPARK-37708 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.2.0 Environment: pyspark3.2 Reporter: jingxiong zhong -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26404) set spark.pyspark.python or PYSPARK_PYTHON doesn't work in k8s client-cluster mode.
[ https://issues.apache.org/jira/browse/SPARK-26404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463384#comment-17463384 ] jingxiong zhong edited comment on SPARK-26404 at 12/21/21, 5:52 PM: @gollum999Tim Sanders,hey sir, I have a question about that how do I add my Python dependencies to Spark Job, as following {code:sh} spark-submit \ --archives s3a://path/python3.6.9.tgz#python3.6.9 \ --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ --conf "spark.pyspark.python=python3.6.9/bin/python3" \ --name "piroottest" \ ./examples/src/main/python/pi.py 10 {code} this can't run my job sucessfully,it throw error {code:sh} Traceback (most recent call last): File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in from pyspark.sql import SparkSession File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in async def _ag(): File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", line 7, in from _ctypes import Union, Structure, Array ImportError: libffi.so.6: cannot open shared object file: No such file or directory {code} Or is there another way to add Python dependencies? 
was (Author: JIRAUSER281124): @gollum999Tim Sanders,hey sir, I have a question about that how can I add my python dependency into spark job, as following {code:sh} spark-submit \ --archives s3a://path/python3.6.9.tgz#python3.6.9 \ --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ --conf "spark.pyspark.python=python3.6.9/bin/python3" \ --name "piroottest" \ ./examples/src/main/python/pi.py 10 {code} this can't run my job sucessfully,it throw error {code:sh} Traceback (most recent call last): File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in from pyspark.sql import SparkSession File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in async def _ag(): File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", line 7, in from _ctypes import Union, Structure, Array ImportError: libffi.so.6: cannot open shared object file: No such file or directory {code} Or is there another way to add Python dependencies? > set spark.pyspark.python or PYSPARK_PYTHON doesn't work in k8s client-cluster > mode. > --- > > Key: SPARK-26404 > URL: https://issues.apache.org/jira/browse/SPARK-26404 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.4.0 >Reporter: Dongqing Liu >Priority: Major > > Neither > conf.set("spark.executorEnv.PYSPARK_PYTHON", "/opt/pythonenvs/bin/python") > nor > conf.set("spark.pyspark.python", "/opt/pythonenvs/bin/python") > works. > Looks like the executor always picks python from PATH. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26404) set spark.pyspark.python or PYSPARK_PYTHON doesn't work in k8s client-cluster mode.
[ https://issues.apache.org/jira/browse/SPARK-26404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17463384#comment-17463384 ] jingxiong zhong commented on SPARK-26404: - @gollum999 (Tim Sanders), hey sir, I have a question about how I can add my Python dependencies to a Spark job, as follows {code:sh} spark-submit \ --archives s3a://path/python3.6.9.tgz#python3.6.9 \ --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \ --conf "spark.pyspark.python=python3.6.9/bin/python3" \ --name "piroottest" \ ./examples/src/main/python/pi.py 10 {code} This can't run my job successfully; it throws an error {code:sh} Traceback (most recent call last): File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in from pyspark.sql import SparkSession File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, in File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in async def _ag(): File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", line 7, in from _ctypes import Union, Structure, Array ImportError: libffi.so.6: cannot open shared object file: No such file or directory {code} Or is there another way to add Python dependencies? > set spark.pyspark.python or PYSPARK_PYTHON doesn't work in k8s client-cluster > mode. > --- > > Key: SPARK-26404 > URL: https://issues.apache.org/jira/browse/SPARK-26404 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Core >Affects Versions: 2.4.0 >Reporter: Dongqing Liu >Priority: Major > > Neither > conf.set("spark.executorEnv.PYSPARK_PYTHON", "/opt/pythonenvs/bin/python") > nor > conf.set("spark.pyspark.python", "/opt/pythonenvs/bin/python") > works. > Looks like the executor always picks python from PATH. 
> -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37677) spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute
[ https://issues.apache.org/jira/browse/SPARK-37677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461800#comment-17461800 ] jingxiong zhong commented on SPARK-37677: - :D > spark on k8s, when the user want to push python3.6.6.zip to the pod , but no > permission to execute > -- > > Key: SPARK-37677 > URL: https://issues.apache.org/jira/browse/SPARK-37677 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > In cluster mode, I hava another question that when I unzip python3.6.6.zip in > pod , but no permission to execute, my execute operation as follows: > {code:sh} > spark-submit \ > --archives ./python3.6.6.zip#python3.6.6 \ > --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ > --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > ./examples/src/main/python/pi.py 100 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37677) spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute
[ https://issues.apache.org/jira/browse/SPARK-37677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461799#comment-17461799 ] jingxiong zhong commented on SPARK-37677: - Let me have a try. > spark on k8s, when the user want to push python3.6.6.zip to the pod , but no > permission to execute > -- > > Key: SPARK-37677 > URL: https://issues.apache.org/jira/browse/SPARK-37677 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > In cluster mode, I hava another question that when I unzip python3.6.6.zip in > pod , but no permission to execute, my execute operation as follows: > {code:sh} > spark-submit \ > --archives ./python3.6.6.zip#python3.6.6 \ > --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ > --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > ./examples/src/main/python/pi.py 100 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-37677) spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute
[ https://issues.apache.org/jira/browse/SPARK-37677 ] jingxiong zhong deleted comment on SPARK-37677: - was (Author: JIRAUSER281124): :D > spark on k8s, when the user want to push python3.6.6.zip to the pod , but no > permission to execute > -- > > Key: SPARK-37677 > URL: https://issues.apache.org/jira/browse/SPARK-37677 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > In cluster mode, I hava another question that when I unzip python3.6.6.zip in > pod , but no permission to execute, my execute operation as follows: > {code:sh} > spark-submit \ > --archives ./python3.6.6.zip#python3.6.6 \ > --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ > --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > ./examples/src/main/python/pi.py 100 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37677) spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute
[ https://issues.apache.org/jira/browse/SPARK-37677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461797#comment-17461797 ] jingxiong zhong commented on SPARK-37677: - Can we fix this by modifying the file permissions when the file is decompressed? > spark on k8s, when the user want to push python3.6.6.zip to the pod , but no > permission to execute > -- > > Key: SPARK-37677 > URL: https://issues.apache.org/jira/browse/SPARK-37677 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: jingxiong zhong >Priority: Major > > In cluster mode, I have another question: when I unzip python3.6.6.zip in the > pod, there is no permission to execute it. My command is as follows: > {code:sh} > spark-submit \ > --archives ./python3.6.6.zip#python3.6.6 \ > --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ > --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ > --conf spark.kubernetes.container.image.pullPolicy=Always \ > ./examples/src/main/python/pi.py 100 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
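Restoring permissions at decompression time is plausible: the ZIP format records POSIX mode bits in each entry's `external_attr`, but extractors that ignore that field (Python's plain `zipfile.extractall`, for one) drop the executable bit. A self-contained sketch showing both the loss and a restore from the entry metadata:

```python
import os
import stat
import tempfile
import zipfile

# Create an "executable", zip it, then extract it to show how the exec
# bit is lost and how it can be recovered from external_attr.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "bin", "python3")
os.makedirs(os.path.dirname(src))
with open(src, "w") as f:
    f.write("#!/bin/sh\necho hello\n")
os.chmod(src, 0o755)

archive = os.path.join(workdir, "python.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.write(src, "bin/python3")  # stores st_mode in external_attr

out = os.path.join(workdir, "out")
with zipfile.ZipFile(archive) as zf:
    zf.extractall(out)  # mode bits are NOT applied here
    extracted = os.path.join(out, "bin", "python3")
    lost = (os.stat(extracted).st_mode & stat.S_IXUSR) == 0
    for info in zf.infolist():  # restore modes from entry metadata
        mode = (info.external_attr >> 16) & 0o777
        if mode:
            os.chmod(os.path.join(out, info.filename), mode)

restored = (os.stat(extracted).st_mode & stat.S_IXUSR) != 0
```

Repackaging the environment as a .tar.gz sidesteps the problem entirely, since tar extraction preserves mode bits.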
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461405#comment-17461405 ] jingxiong zhong commented on SPARK-36088: - @hyukjin.kwon I filed an issue at https://issues.apache.org/jira/browse/SPARK-37677; I think I can fix it. > 'spark.archives' does not extract the archive file into the driver under > client mode > > > Key: SPARK-36088 > URL: https://issues.apache.org/jira/browse/SPARK-36088 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Submit >Affects Versions: 3.1.2 >Reporter: rickcheng >Priority: Major > > When running spark in the k8s cluster, there are 2 deploy modes: cluster and > client. After my test, in the cluster mode, *spark.archives* can extract the > archive file to the working directory of the executors and driver. But in > client mode, *spark.archives* can only extract the archive file to the > working directory of the executors. > > However, I need *spark.archives* to send the virtual environment tar file > packaged by conda to both the driver and executors under client mode (So that > the executor and the driver have the same python environment). > > Why *spark.archives* does not extract the archive file into the working > directory of the driver under client mode? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37677) spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute
jingxiong zhong created SPARK-37677: --- Summary: spark on k8s, when the user want to push python3.6.6.zip to the pod , but no permission to execute Key: SPARK-37677 URL: https://issues.apache.org/jira/browse/SPARK-37677 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.0 Environment: spark-3.2.0 Reporter: jingxiong zhong In cluster mode, I have another question: when I unzip python3.6.6.zip in the pod, there is no permission to execute it. My command is as follows: {code:sh} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-37521) insert overwrite table but the partition information stored in Metastore was not changed
[ https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452521#comment-17452521 ] jingxiong zhong edited comment on SPARK-37521 at 12/17/21, 10:45 AM: - The schema of metasotre's updated partition was not found in Hive when you execute {code:sql} 'create table updata_col_test1(a int) partitioned by (dt string); insert overwrite table updata_col_test1 partition(dt='20200101') values(1); insert overwrite table updata_col_test1 partition(dt='20200102') values(1); insert overwrite table updata_col_test1 partition(dt='20200103') values(1); alter table updata_col_test1 add columns (b int); insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101'); ' {code} result from two engine HIVE: hive> select * from bigdata_qa.updata_col_test1; OK updata_col_test1.a updata_col_test1.b updata_col_test1.dt 1 NULL 20200101 1 NULL 20200102 1 NULL 20200103 Time taken: 2.985 seconds, Fetched: 3 row(s) hive> desc bigdata_qa.updata_col_test1 partition(dt='20200101'); OK col_name data_type comment a int dt string # Partition Information # col_name data_type comment dt string Time taken: 6.469 seconds, Fetched: 7 row(s) SPARK: spark-sql> select * from bigdata_qa.updata_col_test1; a b dt 1 2 20200101 1 NULL 20200102 1 NULL 20200103 Time taken: 0.357 seconds, Fetched 3 row(s) spark-sql> desc bigdata_qa.updata_col_test1 partition(dt='20200101'); col_name data_type comment a int b int dt string # Partition Information # col_name data_type comment dt string Time taken: 0.196 seconds, Fetched 6 row(s) was (Author: JIRAUSER281124): The schema of metasotre's updated partition was not found in Hive when you execute 'create table updata_col_test1(a int) partitioned by (dt string); insert overwrite table updata_col_test1 partition(dt='20200101') values(1); insert overwrite table updata_col_test1 partition(dt='20200102') values(1); insert overwrite table updata_col_test1 partition(dt='20200103') values(1); alter 
table updata_col_test1 add columns (b int); insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101'); ' result from two engine HIVE: hive> select * from bigdata_qa.updata_col_test1; OK updata_col_test1.a updata_col_test1.b updata_col_test1.dt 1 NULL 20200101 1 NULL 20200102 1 NULL 20200103 Time taken: 2.985 seconds, Fetched: 3 row(s) hive> desc bigdata_qa.updata_col_test1 partition(dt='20200101'); OK col_name data_type comment a int dt string # Partition Information # col_name data_type comment dt string Time taken: 6.469 seconds, Fetched: 7 row(s) SPARK: spark-sql> select * from bigdata_qa.updata_col_test1; a b dt 1 2 20200101 1 NULL 20200102 1 NULL 20200103 Time taken: 0.357 seconds, Fetched 3 row(s) spark-sql> desc bigdata_qa.updata_col_test1 partition(dt='20200101'); col_name data_type comment a int b int dt string # Partition Information # col_name data_type comment dt string Time taken: 0.196 seconds, Fetched 6 row(s) > insert overwrite table but the partition information stored in Metastore was > not changed > > > Key: SPARK-37521 > URL: https://issues.apache.org/jira/browse/SPARK-37521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hive2.3.9 > metastore2.3.9 >Reporter: jingxiong zhong >Priority: Major > > I create a partitioned table in SparkSQL, insert a data entry, add a regular > field, and finally insert a new data entry into the partition,The query is > normal in SparkSQL, but the return value of the newly inserted field is NULL > in Hive 2.3.9 > for example > create table updata_col_test1(a int) partitioned by (dt string); > insert overwrite table updata_col_test1 partition(dt='20200101') values(1); > insert overwrite table updata_col_test1 partition(dt='20200102') values(1); > insert overwrite table updata_col_test1 partition(dt='20200103') values(1); > alter table updata_col_test1 add columns (b int); > insert overwrite table updata_col_test1 partition(dt) values(1, 2, 
> '20200101'); fail > insert overwrite table updata_col_test1 partition(dt='20200101') values(1, > 2); fail > insert overwrite table updata_col_test1
[jira] [Comment Edited] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461230#comment-17461230 ] jingxiong zhong edited comment on SPARK-36088 at 12/17/21, 6:53 AM: In cluster mode, I hava another question that when I unzip python3.6.6.zip in pod , but no permission to execute, my execute operation as follows: {code:sh} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} was (Author: JIRAUSER281124): In cluster mode, I hava another question that when I unzip python3.6.6.zip in pod , but no permission to execute, my execute operation as follows: {code:shell} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} > 'spark.archives' does not extract the archive file into the driver under > client mode > > > Key: SPARK-36088 > URL: https://issues.apache.org/jira/browse/SPARK-36088 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Submit >Affects Versions: 3.1.2 >Reporter: rickcheng >Priority: Major > > When running spark in the k8s cluster, there are 2 deploy modes: cluster and > client. After my test, in the cluster mode, *spark.archives* can extract the > archive file to the working directory of the executors and driver. But in > client mode, *spark.archives* can only extract the archive file to the > working directory of the executors. 
> > However, I need *spark.archives* to send the virtual environment tar file > packaged by conda to both the driver and executors under client mode (So that > the executor and the driver have the same python environment). > > Why *spark.archives* does not extract the archive file into the working > directory of the driver under client mode? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36088) 'spark.archives' does not extract the archive file into the driver under client mode
[ https://issues.apache.org/jira/browse/SPARK-36088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461230#comment-17461230 ] jingxiong zhong commented on SPARK-36088: - In cluster mode, I hava another question that when I unzip python3.6.6.zip in pod , but no permission to execute, my execute operation as follows: {code:shell} spark-submit \ --archives ./python3.6.6.zip#python3.6.6 \ --conf "spark.pyspark.python=python3.6.6/python3.6.6/bin/python3" \ --conf "spark.pyspark.driver.python=python3.6.6/python3.6.6/bin/python3" \ --conf spark.kubernetes.container.image.pullPolicy=Always \ ./examples/src/main/python/pi.py 100 {code} > 'spark.archives' does not extract the archive file into the driver under > client mode > > > Key: SPARK-36088 > URL: https://issues.apache.org/jira/browse/SPARK-36088 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Submit >Affects Versions: 3.1.2 >Reporter: rickcheng >Priority: Major > > When running spark in the k8s cluster, there are 2 deploy modes: cluster and > client. After my test, in the cluster mode, *spark.archives* can extract the > archive file to the working directory of the executors and driver. But in > client mode, *spark.archives* can only extract the archive file to the > working directory of the executors. > > However, I need *spark.archives* to send the virtual environment tar file > packaged by conda to both the driver and executors under client mode (So that > the executor and the driver have the same python environment). > > Why *spark.archives* does not extract the archive file into the working > directory of the driver under client mode? -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
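Until spark.archives also unpacks archives for a client-mode driver, one workaround is to extract the same tarball in the driver process before the SparkSession is created, so the driver and the executors see an identical environment. A sketch using only the standard library (the archive and destination paths are assumptions for illustration):

```python
import os
import tarfile
import tempfile

def extract_archive(path, dest):
    """Unpack an environment tarball into dest, mirroring what executors
    get from spark.archives; tar extraction preserves mode bits."""
    os.makedirs(dest, exist_ok=True)
    with tarfile.open(path) as tf:
        tf.extractall(dest)
    return dest

# Demo with a throwaway archive standing in for a conda/venv tarball.
tmp = tempfile.mkdtemp()
bin_dir = os.path.join(tmp, "bin")
os.makedirs(bin_dir)
with open(os.path.join(bin_dir, "activate"), "w") as f:
    f.write("# placeholder\n")
archive = os.path.join(tmp, "env.tar.gz")
with tarfile.open(archive, "w:gz") as tf:
    tf.add(bin_dir, arcname="bin")
out = extract_archive(archive, os.path.join(tmp, "unpacked"))
```

The unpacked interpreter path can then be exported (e.g. via PYSPARK_DRIVER_PYTHON) before the session starts, matching what spark.pyspark.python points the executors at.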
[jira] [Updated] (SPARK-37521) insert overwrite table but the partition information stored in Metastore was not changed
[ https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-37521: Issue Type: Bug (was: New Bugzilla Project) > insert overwrite table but the partition information stored in Metastore was > not changed > > > Key: SPARK-37521 > URL: https://issues.apache.org/jira/browse/SPARK-37521 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hive2.3.9 > metastore2.3.9 >Reporter: jingxiong zhong >Priority: Major > > I create a partitioned table in Spark SQL, insert a row, add a regular > field, and finally insert a new row into an existing partition. The query > works in Spark SQL, but the newly added field reads as NULL > in Hive 2.3.9. > For example > create table updata_col_test1(a int) partitioned by (dt string); > insert overwrite table updata_col_test1 partition(dt='20200101') values(1); > insert overwrite table updata_col_test1 partition(dt='20200102') values(1); > insert overwrite table updata_col_test1 partition(dt='20200103') values(1); > alter table updata_col_test1 add columns (b int); > insert overwrite table updata_col_test1 partition(dt) values(1, 2, > '20200101'); fails > insert overwrite table updata_col_test1 partition(dt='20200101') values(1, > 2); fails > insert overwrite table updata_col_test1 partition(dt='20200104') values(1, > 2); succeeds -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
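A likely explanation for the NULL readings described above: Hive's `ALTER TABLE ... ADD COLUMNS` without `CASCADE` changes only the table-level schema, while pre-existing partitions keep their original column list in the metastore. Hive DDL offers a `CASCADE` clause that also rewrites the partition schemas; whether Spark SQL's parser accepts it varies by version, so it may need to be issued through Hive directly. A sketch of the reproduction with that candidate fix, where `run_sql` is a hypothetical callable standing in for `spark.sql(...)` or a beeline invocation:

```python
# Reproduction from the report, with CASCADE added so that existing
# partition schemas in the metastore pick up the new column. This is a
# candidate fix, not verified against every Hive/Spark combination.
statements = [
    "create table updata_col_test1(a int) partitioned by (dt string)",
    "insert overwrite table updata_col_test1 partition(dt='20200101') values(1)",
    "alter table updata_col_test1 add columns (b int) cascade",
    "insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2)",
]

def run_all(run_sql, stmts):
    # run_sql is a hypothetical helper, e.g. spark.sql
    for s in stmts:
        run_sql(s)
```

After the CASCADE variant, `desc updata_col_test1 partition(dt='20200101')` should list column b from both engines, since both read the same partition schema.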
[jira] [Updated] (SPARK-37521) insert overwrite table but the partition information stored in Metastore was not changed
[ https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jingxiong zhong updated SPARK-37521: Issue Type: New Bugzilla Project (was: Question) > insert overwrite table but the partition information stored in Metastore was > not changed > > > Key: SPARK-37521 > URL: https://issues.apache.org/jira/browse/SPARK-37521 > Project: Spark > Issue Type: New Bugzilla Project > Components: SQL >Affects Versions: 3.2.0 > Environment: spark3.2.0 > hive2.3.9 > metastore2.3.9 >Reporter: jingxiong zhong >Priority: Major > > I create a partitioned table in SparkSQL, insert a data entry, add a regular > field, and finally insert a new data entry into the partition,The query is > normal in SparkSQL, but the return value of the newly inserted field is NULL > in Hive 2.3.9 > for example > create table updata_col_test1(a int) partitioned by (dt string); > insert overwrite table updata_col_test1 partition(dt='20200101') values(1); > insert overwrite table updata_col_test1 partition(dt='20200102') values(1); > insert overwrite table updata_col_test1 partition(dt='20200103') values(1); > alter table updata_col_test1 add columns (b int); > insert overwrite table updata_col_test1 partition(dt) values(1, 2, > '20200101'); fail > insert overwrite table updata_col_test1 partition(dt='20200101') values(1, > 2); fail > insert overwrite table updata_col_test1 partition(dt='20200104') values(1, > 2); sucessfully -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37521) insert overwrite table but the partition information stored in Metastore was not changed
[ https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jingxiong zhong updated SPARK-37521:
------------------------------------
    Summary: insert overwrite table but the partition information stored in Metastore was not changed  (was: insert overwrite table but didn't change the message of metastore)

> insert overwrite table but the partition information stored in Metastore was
> not changed
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-37521
>                 URL: https://issues.apache.org/jira/browse/SPARK-37521
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 3.2.0
>         Environment: spark 3.2.0
>                      hive 2.3.9
>                      metastore 2.3.9
>            Reporter: jingxiong zhong
>            Priority: Blocker
>
> I create a partitioned table in Spark SQL, insert a row, add a regular
> column, and finally insert a new row into a partition. The query is normal
> in Spark SQL, but the value of the newly added column is NULL when read
> from Hive 2.3.9.
> For example:
> create table updata_col_test1(a int) partitioned by (dt string);
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
> alter table updata_col_test1 add columns (b int);
> insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101');  -- fails
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2);   -- fails
> insert overwrite table updata_col_test1 partition(dt='20200104') values(1, 2);   -- succeeds
[jira] [Commented] (SPARK-37521) insert overwrite table but didn't change the message of metastore
[ https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17452521#comment-17452521 ]

jingxiong zhong commented on SPARK-37521:
-----------------------------------------

Hive does not see the updated partition schema in the metastore after you execute:

create table updata_col_test1(a int) partitioned by (dt string);
insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
alter table updata_col_test1 add columns (b int);
insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101');

Results from the two engines:

HIVE:
hive> select * from bigdata_qa.updata_col_test1;
OK
updata_col_test1.a  updata_col_test1.b  updata_col_test1.dt
1  NULL  20200101
1  NULL  20200102
1  NULL  20200103
Time taken: 2.985 seconds, Fetched: 3 row(s)

hive> desc bigdata_qa.updata_col_test1 partition(dt='20200101');
OK
col_name  data_type  comment
a         int
dt        string

# Partition Information
# col_name  data_type  comment
dt          string
Time taken: 6.469 seconds, Fetched: 7 row(s)

SPARK:
spark-sql> select * from bigdata_qa.updata_col_test1;
a  b     dt
1  2     20200101
1  NULL  20200102
1  NULL  20200103
Time taken: 0.357 seconds, Fetched 3 row(s)

spark-sql> desc bigdata_qa.updata_col_test1 partition(dt='20200101');
col_name  data_type  comment
a         int
b         int
dt        string

# Partition Information
# col_name  data_type  comment
dt          string
Time taken: 0.196 seconds, Fetched 6 row(s)

> insert overwrite table but didn't change the message of metastore
> -----------------------------------------------------------------
>
>                 Key: SPARK-37521
>                 URL: https://issues.apache.org/jira/browse/SPARK-37521
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 3.2.0
>         Environment: spark 3.2.0
>                      hive 2.3.9
>                      metastore 2.3.9
>            Reporter: jingxiong zhong
>            Priority: Blocker
>
> I create a partitioned table in Spark SQL, insert a row, add a regular
> column, and finally insert a new row into a partition. The query is normal
> in Spark SQL, but the value of the newly added column is NULL when read
> from Hive 2.3.9.
> For example:
> create table updata_col_test1(a int) partitioned by (dt string);
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
> alter table updata_col_test1 add columns (b int);
> insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101');  -- fails
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2);   -- fails
> insert overwrite table updata_col_test1 partition(dt='20200104') values(1, 2);   -- succeeds
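A possible workaround for the discrepancy shown in the comment above, assuming the schema change is driven from the Hive side: Hive's ALTER TABLE ... ADD COLUMNS statement accepts a CASCADE clause that propagates the new column into the metadata of all existing partitions, so Hive stops reading the stale per-partition schema. This is a sketch under those assumptions, not a verified fix for this ticket:

```sql
-- Run from Hive (hive CLI or beeline), not from Spark SQL: CASCADE is
-- Hive DDL and rewrites the schema of existing partitions as well as
-- the table-level schema.
ALTER TABLE updata_col_test1 ADD COLUMNS (b int) CASCADE;

-- With the partition metadata updated, the values Spark wrote into
-- column b of dt='20200101' should now be readable from Hive.
SELECT * FROM updata_col_test1 WHERE dt = '20200101';
```

Without CASCADE, Hive defaults to RESTRICT, which changes only the table-level schema and leaves existing partitions untouched, matching the behavior reported here.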
[jira] [Updated] (SPARK-37521) insert overwrite table but didn't change the message of metastore
[ https://issues.apache.org/jira/browse/SPARK-37521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jingxiong zhong updated SPARK-37521:
------------------------------------
    Description:
I create a partitioned table in Spark SQL, insert a row, add a regular column, and finally insert a new row into a partition. The query is normal in Spark SQL, but the value of the newly added column is NULL when read from Hive 2.3.9.

For example:
create table updata_col_test1(a int) partitioned by (dt string);
insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
alter table updata_col_test1 add columns (b int);
insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101');  -- fails
insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2);   -- fails
insert overwrite table updata_col_test1 partition(dt='20200104') values(1, 2);   -- succeeds

  was:
I create a partitioned table in Spark SQL, insert a row, add a regular column, and finally insert a new row into a partition. The query is normal in Spark SQL, but the value of the newly added column is NULL when read from Hive 2.3.9.

For example:
create table updata_col_test1(a int) partitioned by (dt string);
insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
alter table updata_col_test1 add columns (b int);
insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101');  -- fails
insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2);   -- fails
insert overwrite table updata_col_test1 partition(dt='20200104') values(1, 2);   -- succeeds

> insert overwrite table but didn't change the message of metastore
> -----------------------------------------------------------------
>
>                 Key: SPARK-37521
>                 URL: https://issues.apache.org/jira/browse/SPARK-37521
>             Project: Spark
>          Issue Type: Question
>          Components: SQL
>    Affects Versions: 3.2.0
>         Environment: spark 3.2.0
>                      hive 2.3.9
>                      metastore 2.3.9
>            Reporter: jingxiong zhong
>            Priority: Blocker
>
> I create a partitioned table in Spark SQL, insert a row, add a regular
> column, and finally insert a new row into a partition. The query is normal
> in Spark SQL, but the value of the newly added column is NULL when read
> from Hive 2.3.9.
> For example:
> create table updata_col_test1(a int) partitioned by (dt string);
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
> insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
> alter table updata_col_test1 add columns (b int);
> insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101');  -- fails
> insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2);   -- fails
> insert overwrite table updata_col_test1 partition(dt='20200104') values(1, 2);   -- succeeds
[jira] [Created] (SPARK-37521) insert overwrite table but didn't change the message of metastore
jingxiong zhong created SPARK-37521:
------------------------------------

             Summary: insert overwrite table but didn't change the message of metastore
                 Key: SPARK-37521
                 URL: https://issues.apache.org/jira/browse/SPARK-37521
             Project: Spark
          Issue Type: Question
          Components: SQL
    Affects Versions: 3.2.0
         Environment: spark 3.2.0
                      hive 2.3.9
                      metastore 2.3.9
            Reporter: jingxiong zhong

I create a partitioned table in Spark SQL, insert a row, add a regular column, and finally insert a new row into a partition. The query is normal in Spark SQL, but the value of the newly added column is NULL when read from Hive 2.3.9.

For example:
create table updata_col_test1(a int) partitioned by (dt string);
insert overwrite table updata_col_test1 partition(dt='20200101') values(1);
insert overwrite table updata_col_test1 partition(dt='20200102') values(1);
insert overwrite table updata_col_test1 partition(dt='20200103') values(1);
alter table updata_col_test1 add columns (b int);
insert overwrite table updata_col_test1 partition(dt) values(1, 2, '20200101');  -- fails
insert overwrite table updata_col_test1 partition(dt='20200101') values(1, 2);   -- fails
insert overwrite table updata_col_test1 partition(dt='20200104') values(1, 2);   -- succeeds