[jira] [Created] (SPARK-44345) Only log unknown shuffle map output as error when shuffle migration disabled

2023-07-08 Thread Zhongwei Zhu (Jira)
Zhongwei Zhu created SPARK-44345:


 Summary: Only log unknown shuffle map output as error when shuffle 
migration disabled
 Key: SPARK-44345
 URL: https://issues.apache.org/jira/browse/SPARK-44345
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.4.1
Reporter: Zhongwei Zhu


When decommissioning and shuffle migration are enabled, the logs fill with 
error messages like `Asked to update map output for unknown shuffle`.

Because shuffle cleanup and unregistration are done asynchronously by 
ContextCleaner, by the time the target block manager has received a shuffle 
block from a decommissioned block manager and goes to update the shuffle 
location in the map output tracker, the shuffle may already have been 
unregistered. This should not be treated as an error.
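
A minimal sketch of the proposed change, assuming the message is emitted from 
MapOutputTrackerMaster.updateMapOutput and gated on the existing 
spark.storage.decommission.shuffleBlocks.enabled config (names and structure 
are illustrative, not the actual patch):

{code:scala}
// Sketch only: downgrade the "unknown shuffle" message when shuffle
// migration is enabled, since the race with ContextCleaner is expected then.
def updateMapOutput(shuffleId: Int, mapId: Long, bmAddress: BlockManagerId): Unit = {
  shuffleStatuses.get(shuffleId) match {
    case Some(shuffleStatus) =>
      shuffleStatus.updateMapOutput(mapId, bmAddress)
    case None =>
      if (conf.get(config.STORAGE_DECOMMISSION_SHUFFLE_BLOCKS_ENABLED)) {
        // The shuffle may have been unregistered asynchronously while the
        // block was migrating; this is expected, so log quietly.
        logDebug(s"Asked to update map output for unknown shuffle $shuffleId")
      } else {
        logError(s"Asked to update map output for unknown shuffle $shuffleId")
      }
  }
}
{code}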



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44339) spark3-shell errors org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have [SELECT]

2023-07-08 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741340#comment-17741340
 ] 

Yuming Wang commented on SPARK-44339:
-

It seems to be a Cloudera Spark issue.

> spark3-shell errors org.apache.hadoop.hive.ql.metadata.HiveException: Unable 
> to fetch table . Permission denied: user [AD user] does not 
> have [SELECT] privilege on [/] when reads hive view 
> 
>
> Key: SPARK-44339
> URL: https://issues.apache.org/jira/browse/SPARK-44339
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 3.3.0
> Environment: CDP 7.1.7 Ranger, kerberized and hadoop impersonation 
> enabled.
>Reporter: Amar Gurung
>Priority: Critical
>
> *Problem statement* 
> A hive view is created using beeline to restrict the users from accessing the 
> original hive table since the data contains sensitive information. 
> For illustration purposes, consider a sensitive table emp_db.employee with 
> columns id, name, and salary, created through beeline by user '{*}userA{*}'
>  
> {code:java}
> create external table emp_db.employee (id int, name string, salary double) 
> location ''{code}
>  
> A view is created using beeline by the same user '{*}userA{*}'
>  
> {code:java}
> create view empview_db.emp_v as select id, name from emp_db.employee{code}
>  
> From the Ranger UI, we define a policy under Hadoop SQL Policies that lets 
> '{*}userB{*}' access database empview_db and table emp_v with SELECT 
> permission.
>  
> *Steps to replicate* 
>  # ssh to an edge node where beeline is available as *userB*
>  # Try executing the following queries:
>  ## select * from emp_db.employee;
>  ## desc formatted empview_db.emp_v;
>  ## The above queries work fine without any issues.
>  # Now try spark3-shell as *userB* 
> {code:java}
> # spark3-shell --deploy-mode client
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not 
> exist
> Spark context Web UI available at http://xxx:4040
> Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0.3.3.7180.0-274
>       /_/
>          
> Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 
> 1.8.0_181)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> spark.table("empview_db.emp_v").schema
> 23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf 
> hive.execution.engine is 'tez' and will be reset to 'mr' to disable useless 
> hive logic
> Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> employee. Permission denied: user [userB] does not have [SELECT] privilege on 
> [emp_db/employee]
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
>   at 
> org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66)
>   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206)
>   at scala.Option.orElse(Option.scala:447)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1205)
>   at scala.Option.orElse(Option.scala:447)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1197)
>   at 
> 

[jira] [Updated] (SPARK-44339) spark3-shell errors org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have [SELECT] p

2023-07-08 Thread Amar Gurung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Gurung updated SPARK-44339:

Summary: spark3-shell errors 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
. Permission denied: user [AD user] does not have [SELECT] 
privilege on [/] when reads hive view   (was: 
spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: 
Unable to fetch table . Permission denied: user [AD user] does 
not have [SELECT] privilege on [/] to read hive view 
metadata )

> spark3-shell errors org.apache.hadoop.hive.ql.metadata.HiveException: Unable 
> to fetch table . Permission denied: user [AD user] does not 
> have [SELECT] privilege on [/] when reads hive view 
> 
>
> Key: SPARK-44339
> URL: https://issues.apache.org/jira/browse/SPARK-44339
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 3.3.0
> Environment: CDP 7.1.7 Ranger, kerberized and hadoop impersonation 
> enabled.
>Reporter: Amar Gurung
>Priority: Critical
>
> *Problem statement* 
> A hive view is created using beeline to restrict the users from accessing the 
> original hive table since the data contains sensitive information. 
> For illustration purposes, consider a sensitive table emp_db.employee with 
> columns id, name, and salary, created through beeline by user '{*}userA{*}'
>  
> {code:java}
> create external table emp_db.employee (id int, name string, salary double) 
> location ''{code}
>  
> A view is created using beeline by the same user '{*}userA{*}'
>  
> {code:java}
> create view empview_db.emp_v as select id, name from emp_db.employee{code}
>  
> From the Ranger UI, we define a policy under Hadoop SQL Policies that lets 
> '{*}userB{*}' access database empview_db and table emp_v with SELECT 
> permission.
>  
> *Steps to replicate* 
>  # ssh to an edge node where beeline is available as *userB*
>  # Try executing the following queries:
>  ## select * from emp_db.employee;
>  ## desc formatted empview_db.emp_v;
>  ## The above queries work fine without any issues.
>  # Now try spark3-shell as *userB* 
> {code:java}
> # spark3-shell --deploy-mode client
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not 
> exist
> Spark context Web UI available at http://xxx:4040
> Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx).
> Spark session available as 'spark'.
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0.3.3.7180.0-274
>       /_/
>          
> Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 
> 1.8.0_181)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> spark.table("empview_db.emp_v").schema
> 23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf 
> hive.execution.engine is 'tez' and will be reset to 'mr' to disable useless 
> hive logic
> Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> employee. Permission denied: user [userB] does not have [SELECT] privilege on 
> [emp_db/employee]
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
>   at 
> org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66)
>   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206)
>   at scala.Option.orElse(Option.scala:447)
>   at 
> 

[jira] [Resolved] (SPARK-44321) Move ParseException to SQL/API

2023-07-08 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44321.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Move ParseException to SQL/API
> --
>
> Key: SPARK-44321
> URL: https://issues.apache.org/jira/browse/SPARK-44321
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SQL
>Affects Versions: 3.5.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values

2023-07-08 Thread koert kuipers (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740726#comment-17740726
 ] 

koert kuipers edited comment on SPARK-37829 at 7/8/23 4:19 PM:
---

Since this (admittedly somewhat weird) behavior of returning a Row with null 
values has been present since Spark 3.0.x (a major, breaking release, three 
years ago), I would argue it is now the default behavior, and this jira 
introduces a breaking change.

Basically, if one argues this was a breaking change going from Spark 2.x to 
3.x, then I agree, but a major version is allowed to make breaking changes. 
Introducing a fix in 3.4.1 that reverts that breaking change effectively 
introduces a breaking change between 3.4.0 and 3.4.1, which is worse in my 
opinion.

Also, ExpressionEncoders are used for purposes other than Dataset joins, and 
now we find nulls popping up in places they should not. That is how I ran 
into this issue.
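
To make the behavior under discussion concrete, a small illustrative sketch 
(column and variable names are made up for this example):

{code:scala}
val left = Seq((1, "a")).toDF("id", "v")
val right = Seq((2, "b")).toDF("id", "v")
val joined = left.joinWith(right, left("id") === right("id"), "left_outer")
// Spark 3.0.x through 3.4.0: the unmatched right side comes back as a
// Row with null fields, i.e. (Row(1, "a"), Row(null, null)).
// Spark 2.4.x, and 3.4.1+ after this fix: it comes back as a null value,
// i.e. (Row(1, "a"), null).
joined.collect()
{code}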


was (Author: koert):
since this (admittedly somewhat weird) behavior of returning a Row with null 
values has been present since spark 3.0.x (a major breaking release, and 3 
years ago) i would argue this is the default behavior and this jira introduces 
a breaking change.

also expressionencoders are used for other purposes than dataset joins and now 
we find nulls popping up in places they should not.

> An outer-join using joinWith on DataFrames returns Rows with null fields 
> instead of null values
> ---
>
> Key: SPARK-37829
> URL: https://issues.apache.org/jira/browse/SPARK-37829
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
>Reporter: Clément de Groc
>Assignee: Jason Xu
>Priority: Major
> Fix For: 3.3.3, 3.4.1, 3.5.0
>
>
> Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return 
> missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with 
> {{null}} values in Spark 3+.
> The issue can be reproduced with [the following 
> test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5]
>  that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0.
> The problem only arises when working with DataFrames: Datasets of case 
> classes work as expected as demonstrated by [this other 
> test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223].
> I couldn't find an explanation for this change in the Migration guide so I'm 
> assuming this is a bug.
> A {{git bisect}} pointed me to [that 
> commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59].
> Reverting the commit solves the problem.
> A similar solution,  but without reverting, is shown 
> [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a].
> Happy to help if you think of another approach / can provide some guidance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35635) concurrent insert statements from multiple beeline fail with job aborted exception

2023-07-08 Thread jeanlyn (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741309#comment-17741309
 ] 

jeanlyn commented on SPARK-35635:
-

When tasks run concurrently, the "_temporary" directory may be deleted 
multiple times, which can result in job failure. Would it be more appropriate 
to reopen this issue? [~gurwls223] 
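
For context, concurrent writers stage their output under the same 
<table-path>/_temporary directory, so the first job to commit removes it 
while the other jobs still need it. A minimal sketch that tends to reproduce 
the race, using the J1_TBL table from this issue (session setup is 
illustrative):

{code:scala}
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

// Both INSERTs share <table-path>/_temporary; whichever commits first
// deletes it out from under the other, which can abort that job.
val inserts = (1 to 2).map { i =>
  Future { spark.sql(s"INSERT INTO J1_TBL VALUES ($i, $i, 'row$i')") }
}
Await.result(Future.sequence(inserts), 10.minutes)
{code}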

> concurrent insert statements from multiple beeline fail with job aborted 
> exception
> --
>
> Key: SPARK-35635
> URL: https://issues.apache.org/jira/browse/SPARK-35635
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
> Environment: Spark 3.1.1
>Reporter: Chetan Bhat
>Priority: Minor
>
> Create tables - 
> CREATE TABLE J1_TBL (
>  i integer,
>  j integer,
>  t string
> ) USING parquet;
> CREATE TABLE J2_TBL (
>  i integer,
>  k integer
> ) USING parquet;
> From 4 concurrent beeline sessions execute the insert into select queries - 
> INSERT INTO J1_TBL VALUES (1, 4, 'one');
> INSERT INTO J1_TBL VALUES (2, 3, 'two');
> INSERT INTO J1_TBL VALUES (3, 2, 'three');
> INSERT INTO J1_TBL VALUES (4, 1, 'four');
> INSERT INTO J1_TBL VALUES (5, 0, 'five');
> INSERT INTO J1_TBL VALUES (6, 6, 'six');
> INSERT INTO J1_TBL VALUES (7, 7, 'seven');
> INSERT INTO J1_TBL VALUES (8, 8, 'eight');
> INSERT INTO J1_TBL VALUES (0, NULL, 'zero');
> INSERT INTO J1_TBL VALUES (NULL, NULL, 'null');
> INSERT INTO J1_TBL VALUES (NULL, 0, 'zero');
> INSERT INTO J2_TBL VALUES (1, -1);
> INSERT INTO J2_TBL VALUES (2, 2);
> INSERT INTO J2_TBL VALUES (3, -3);
> INSERT INTO J2_TBL VALUES (2, 4);
> INSERT INTO J2_TBL VALUES (5, -5);
> INSERT INTO J2_TBL VALUES (5, -5);
> INSERT INTO J2_TBL VALUES (0, NULL);
> INSERT INTO J2_TBL VALUES (NULL, NULL);
> INSERT INTO J2_TBL VALUES (NULL, 0);
>  
> Issue : concurrent insert statements from multiple beeline fail with job 
> aborted exception.
> 0: jdbc:hive2://10.19.89.222:23040/> INSERT INTO J1_TBL VALUES (8, 8, 
> 'eight');
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.SparkException: Job aborted.
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:366)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3$$Lambda$1781/750578465.apply$mcV$sp(Unknown
>  Source)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:45)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted.
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:109)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:107)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:121)
>  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
>  at 

[jira] [Commented] (SPARK-35635) concurrent insert statements from multiple beeline fail with job aborted exception

2023-07-08 Thread jeanlyn (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741296#comment-17741296
 ] 

jeanlyn commented on SPARK-35635:
-

We encountered the same issue when writing concurrently to different 
partitions of the same table.

> concurrent insert statements from multiple beeline fail with job aborted 
> exception
> --
>
> Key: SPARK-35635
> URL: https://issues.apache.org/jira/browse/SPARK-35635
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.1
> Environment: Spark 3.1.1
>Reporter: Chetan Bhat
>Priority: Minor
>
> Create tables - 
> CREATE TABLE J1_TBL (
>  i integer,
>  j integer,
>  t string
> ) USING parquet;
> CREATE TABLE J2_TBL (
>  i integer,
>  k integer
> ) USING parquet;
> From 4 concurrent beeline sessions execute the insert into select queries - 
> INSERT INTO J1_TBL VALUES (1, 4, 'one');
> INSERT INTO J1_TBL VALUES (2, 3, 'two');
> INSERT INTO J1_TBL VALUES (3, 2, 'three');
> INSERT INTO J1_TBL VALUES (4, 1, 'four');
> INSERT INTO J1_TBL VALUES (5, 0, 'five');
> INSERT INTO J1_TBL VALUES (6, 6, 'six');
> INSERT INTO J1_TBL VALUES (7, 7, 'seven');
> INSERT INTO J1_TBL VALUES (8, 8, 'eight');
> INSERT INTO J1_TBL VALUES (0, NULL, 'zero');
> INSERT INTO J1_TBL VALUES (NULL, NULL, 'null');
> INSERT INTO J1_TBL VALUES (NULL, 0, 'zero');
> INSERT INTO J2_TBL VALUES (1, -1);
> INSERT INTO J2_TBL VALUES (2, 2);
> INSERT INTO J2_TBL VALUES (3, -3);
> INSERT INTO J2_TBL VALUES (2, 4);
> INSERT INTO J2_TBL VALUES (5, -5);
> INSERT INTO J2_TBL VALUES (5, -5);
> INSERT INTO J2_TBL VALUES (0, NULL);
> INSERT INTO J2_TBL VALUES (NULL, NULL);
> INSERT INTO J2_TBL VALUES (NULL, 0);
>  
> Issue : concurrent insert statements from multiple beeline fail with job 
> aborted exception.
> 0: jdbc:hive2://10.19.89.222:23040/> INSERT INTO J1_TBL VALUES (8, 8, 
> 'eight');
> Error: org.apache.hive.service.cli.HiveSQLException: Error running query: 
> org.apache.spark.SparkException: Job aborted.
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:366)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3$$Lambda$1781/750578465.apply$mcV$sp(Unknown
>  Source)
>  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:45)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>  at 
> org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Job aborted.
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:109)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:107)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:121)
>  at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
>  at org.apache.spark.sql.Dataset$$Lambda$1650/1168893915.apply(Unknown Source)
>  at 

[jira] [Created] (SPARK-44344) Prepare RowEncoder for the move to sql/api

2023-07-08 Thread Jira
Herman van Hövell created SPARK-44344:
-

 Summary: Prepare RowEncoder for the move to sql/api
 Key: SPARK-44344
 URL: https://issues.apache.org/jira/browse/SPARK-44344
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SQL
Affects Versions: 3.4.1
Reporter: Herman van Hövell
Assignee: Herman van Hövell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44331) Add bitmap functions to Scala and Python

2023-07-08 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741287#comment-17741287
 ] 

BingKun Pan commented on SPARK-44331:
-

I will work on it.
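
For reference, a sketch of the SQL forms that the new Scala and Python 
wrappers would expose, assuming the bitmap functions already registered for 
3.5.0 (illustrative only):

{code:scala}
// Position of a value within its bitmap bucket, and the bucket number.
spark.sql("SELECT bitmap_bit_position(123), bitmap_bucket_number(123)").show()

// Aggregate positions into a bitmap, then count the set bits.
spark.sql(
  """SELECT bitmap_count(bitmap_construct_agg(bitmap_bit_position(col)))
    |FROM VALUES (1), (2), (3) AS tab(col)""".stripMargin).show()
{code}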

> Add bitmap functions to Scala and Python
> 
>
> Key: SPARK-44331
> URL: https://issues.apache.org/jira/browse/SPARK-44331
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> * bitmap_bucket_number
> * bitmap_bit_position
> * bitmap_construct_agg
> * bitmap_count
> * bitmap_or_agg



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44343) Separate encoder inference from expression encoder generation in ScalaReflection

2023-07-08 Thread Jira
Herman van Hövell created SPARK-44343:
-

 Summary: Separate encoder inference from expression encoder 
generation in ScalaReflection
 Key: SPARK-44343
 URL: https://issues.apache.org/jira/browse/SPARK-44343
 Project: Spark
  Issue Type: New Feature
  Components: Connect, SQL
Affects Versions: 3.4.1
Reporter: Herman van Hövell
Assignee: Herman van Hövell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44329) Add hll_sketch_agg, hll_union_agg, to_varchar, try_aes_decrypt to Scala and Python

2023-07-08 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741285#comment-17741285
 ] 

BingKun Pan commented on SPARK-44329:
-

I will work on it.
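
For reference, a sketch of the SQL forms these Scala and Python APIs would 
wrap, assuming the functions already registered for 3.5.0 (illustrative only):

{code:scala}
// Build an HLL sketch and estimate the distinct count from it.
spark.sql(
  """SELECT hll_sketch_estimate(hll_sketch_agg(col))
    |FROM VALUES (1), (1), (2) AS tab(col)""".stripMargin).show()

// Format a number using a template string.
spark.sql("SELECT to_varchar(78.12, '$99.99')").show()
{code}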

> Add hll_sketch_agg, hll_union_agg, to_varchar, try_aes_decrypt to Scala and 
> Python
> --
>
> Key: SPARK-44329
> URL: https://issues.apache.org/jira/browse/SPARK-44329
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44332) Fix the sorting error of Executor ID Column on Executors UI Page

2023-07-08 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-44332:

Summary: Fix the sorting error of Executor ID Column on Executors UI Page  
(was: Fix the sorting error of Executor ID Column on Executor Page)

> Fix the sorting error of Executor ID Column on Executors UI Page
> 
>
> Key: SPARK-44332
> URL: https://issues.apache.org/jira/browse/SPARK-44332
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44332) Fix the sorting error of Executor ID Column on Executor Page

2023-07-08 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-44332:

Summary: Fix the sorting error of Executor ID Column on Executor Page  
(was: Fix the sorting error of ExecutorID on Executor Page)

> Fix the sorting error of Executor ID Column on Executor Page
> 
>
> Key: SPARK-44332
> URL: https://issues.apache.org/jira/browse/SPARK-44332
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44332) Fix the sorting error of ExecutorID on Executor Page

2023-07-08 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-44332:

Summary: Fix the sorting error of ExecutorID on Executor Page  (was: The 
Executor ID should start with 1 when running on Spark cluster of [N, cores, 
memory] locally mode)

> Fix the sorting error of ExecutorID on Executor Page
> 
>
> Key: SPARK-44332
> URL: https://issues.apache.org/jira/browse/SPARK-44332
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData

2023-07-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741258#comment-17741258
 ] 

ASF GitHub Bot commented on SPARK-44342:


User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/41900

> Replace SQLContext with SparkSession for GenTPCDSData
> -
>
> Key: SPARK-44342
> URL: https://issues.apache.org/jira/browse/SPARK-44342
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The SQLContext is an old API for Spark SQL.
> But GenTPCDSData still uses it directly.
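
A sketch of the kind of migration involved (illustrative; not the actual 
pull request):

{code:scala}
import org.apache.spark.sql.SparkSession

// Before: going through the legacy SQLContext.
// val df = sqlContext.sql("SELECT 1")

// After: use SparkSession, the unified entry point since Spark 2.0.
val spark = SparkSession.builder().getOrCreate()
val df = spark.sql("SELECT 1")
{code}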



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData

2023-07-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741259#comment-17741259
 ] 

ASF GitHub Bot commented on SPARK-44342:


User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/41900

> Replace SQLContext with SparkSession for GenTPCDSData
> -
>
> Key: SPARK-44342
> URL: https://issues.apache.org/jira/browse/SPARK-44342
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The SQLContext is an old API for Spark SQL.
> But GenTPCDSData still uses it directly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData

2023-07-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44342:
---
Description: 
The SQLContext is an old API for Spark SQL.
But 

> Replace SQLContext with SparkSession for GenTPCDSData
> -
>
> Key: SPARK-44342
> URL: https://issues.apache.org/jira/browse/SPARK-44342
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The SQLContext is an old API for Spark SQL.
> But 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData

2023-07-08 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44342:
---
Description: 
The SQLContext is an old API for Spark SQL.
But GenTPCDSData still uses it directly.

  was:
The SQLContext is an old API for Spark SQL.
But 


> Replace SQLContext with SparkSession for GenTPCDSData
> -
>
> Key: SPARK-44342
> URL: https://issues.apache.org/jira/browse/SPARK-44342
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The SQLContext is an old API for Spark SQL.
> But GenTPCDSData still uses it directly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData

2023-07-08 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44342:
--

 Summary: Replace SQLContext with SparkSession for GenTPCDSData
 Key: SPARK-44342
 URL: https://issues.apache.org/jira/browse/SPARK-44342
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44341) Define the computing logic through PartitionEvaluator API and use it in WindowExec

2023-07-08 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741251#comment-17741251
 ] 

jiaan.geng commented on SPARK-44341:


I'm working on it.

> Define the computing logic through PartitionEvaluator API and use it in 
> WindowExec
> --
>
> Key: SPARK-44341
> URL: https://issues.apache.org/jira/browse/SPARK-44341
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in 
> WindowExec



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44341) Define the computing logic through PartitionEvaluator API and use it in WindowExec

2023-07-08 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44341:
--

 Summary: Define the computing logic through PartitionEvaluator API 
and use it in WindowExec
 Key: SPARK-44341
 URL: https://issues.apache.org/jira/browse/SPARK-44341
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng


Define the computing logic through PartitionEvaluator API and use it in 
WindowExec
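
A minimal sketch of the PartitionEvaluator pattern this refers to (class name 
and constructor arguments are placeholders, not the actual WindowExec patch):

{code:scala}
import org.apache.spark.{PartitionEvaluator, PartitionEvaluatorFactory}
import org.apache.spark.sql.catalyst.InternalRow

// The per-partition computation moves into an evaluator created by a
// serializable factory, instead of living in a closure inside doExecute().
class WindowEvaluatorFactory( /* window spec, expressions, ... */ )
  extends PartitionEvaluatorFactory[InternalRow, InternalRow] {

  override def createEvaluator(): PartitionEvaluator[InternalRow, InternalRow] =
    new PartitionEvaluator[InternalRow, InternalRow] {
      override def eval(
          partitionIndex: Int,
          inputs: Iterator[InternalRow]*): Iterator[InternalRow] = {
        // Placeholder: the real evaluator applies the window functions
        // frame by frame over the single input partition.
        inputs.head
      }
    }
}

// WindowExec.doExecute() would then become something like:
//   child.execute().mapPartitionsWithEvaluator(new WindowEvaluatorFactory(...))
{code}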



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43907) Add SQL functions into Scala, Python and R API

2023-07-08 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741246#comment-17741246
 ] 

Ruifeng Zheng commented on SPARK-43907:
---

[~beliefer] [~LuciferYang] [~panbingkun] [~ivoson] many thanks for your 
contributions!
I just went through all registered SQL functions and created subtasks 23 & 24 
for the newly added functions; feel free to pick them up.

> Add SQL functions into Scala, Python and R API
> --
>
> Key: SPARK-43907
> URL: https://issues.apache.org/jira/browse/SPARK-43907
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SparkR, SQL
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See the discussion in dev mailing list 
> (https://lists.apache.org/thread/0tdcfyzxzcv8w46qbgwys2rormhdgyqg).
> This is an umbrella JIRA to implement all SQL functions in Scala, Python and R



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org