[jira] [Created] (SPARK-44345) Only log unknown shuffle map output as error when shuffle migration disabled
Zhongwei Zhu created SPARK-44345: Summary: Only log unknown shuffle map output as error when shuffle migration disabled Key: SPARK-44345 URL: https://issues.apache.org/jira/browse/SPARK-44345 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.1 Reporter: Zhongwei Zhu When decommissioning and shuffle migration are enabled, there are many error messages like `Asked to update map output for unknown shuffle`. Because shuffle cleanup and unregistration are done asynchronously by ContextCleaner, by the time the target block manager has received a shuffle block from a decommissioned block manager and tries to update the shuffle location in the map output tracker, the shuffle may already have been unregistered. This should not be treated as an error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
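The race described in SPARK-44345 can be sketched in a few lines. This is plain Python, not actual Spark code; the class and method names are illustrative only, modeling the idea that an unknown-shuffle update should be logged below ERROR when migration is enabled:

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
logger = logging.getLogger("MapOutputTracker")

class MapOutputTracker:
    """Toy model of the tracker: shuffle IDs map to block-manager locations."""

    def __init__(self, migration_enabled: bool):
        self.migration_enabled = migration_enabled
        self.shuffles = {}  # shuffle_id -> {map_id: location}

    def register_shuffle(self, shuffle_id: int):
        self.shuffles[shuffle_id] = {}

    def unregister_shuffle(self, shuffle_id: int):
        # In Spark this happens asynchronously, driven by ContextCleaner.
        self.shuffles.pop(shuffle_id, None)

    def update_map_output(self, shuffle_id: int, map_id: int, location: str) -> bool:
        if shuffle_id not in self.shuffles:
            # The shuffle may have been cleaned up concurrently; with migration
            # enabled this is an expected race, not an error.
            level = logging.WARNING if self.migration_enabled else logging.ERROR
            logger.log(level, "Asked to update map output for unknown shuffle %d", shuffle_id)
            return False
        self.shuffles[shuffle_id][map_id] = location
        return True

tracker = MapOutputTracker(migration_enabled=True)
tracker.register_shuffle(0)
tracker.unregister_shuffle(0)  # async cleanup wins the race
ok = tracker.update_map_output(0, 0, "BlockManagerId(target)")
```

With migration enabled the stale update is simply refused and logged at a lower severity; only when migration is disabled does an unknown shuffle indicate a genuine error.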
[jira] [Commented] (SPARK-44339) spark3-shell errors org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have [SELECT]
[ https://issues.apache.org/jira/browse/SPARK-44339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741340#comment-17741340 ] Yuming Wang commented on SPARK-44339: - It seems to be a Cloudera Spark issue. > spark3-shell errors org.apache.hadoop.hive.ql.metadata.HiveException: Unable > to fetch table . Permission denied: user [AD user] does not > have [SELECT] privilege on [/] when reads hive view > > > Key: SPARK-44339 > URL: https://issues.apache.org/jira/browse/SPARK-44339 > Project: Spark > Issue Type: Bug > Components: Spark Shell, Spark Submit >Affects Versions: 3.3.0 > Environment: CDP 7.1.7 Ranger, kerberized and hadoop impersonation > enabled. >Reporter: Amar Gurung >Priority: Critical > > *Problem statement* > A hive view is created using beeline to restrict the users from accessing the > original hive table since the data contains sensitive information. > For illustration purposes, let's consider a sensitive table emp_db.employee > with columns id, name, salary created through beeline by user '{*}userA{*}' > > {code:java} > create external table emp_db.employee (id int, name string, salary double) > location ''{code} > > A view is created using beeline by the same user '{*}userA{*}' > > {code:java} > create view empview_db.emp_v as select id,name from emp_db.employee {code} > > From the Ranger UI, we define a policy under Hadoop SQL Policies that lets > '{*}userB{*}' access database - empview_db and table - emp_v with SELECT > permission. > > *Steps to replicate* > # ssh to the edge node where beeline is available using *userB* > # Try executing the following queries > ## select * from emp_db.employee *;* > ## desc formatted empview_db.emp_v; > ## The above queries work fine without any issues. > # Now, try using spark3-shell using *userB* > {code:java} > # spark3-shell --deploy-mode client > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). 
> 23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not > exist > Spark context Web UI available at http://xxx:4040 > Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx). > Spark session available as 'spark'. > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 3.3.0.3.3.7180.0-274 > /_/ > > Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java > 1.8.0_181) > Type in expressions to have them evaluated. > Type :help for more information.scala> spark.table("empview_db.emp_v").schema > 23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf > hive.execution.engine is 'tez' and will be reset to 'mr' to disable useless > hive logic > Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e > org.apache.spark.sql.AnalysisException: > org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table > employee. Permission denied: user [userB] does not have [SELECT] privilege on > [emp_db/employee] > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110) > at > org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877) > at > org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500) > at > org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66) > at > org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311) > at > 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206) > at scala.Option.orElse(Option.scala:447) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1205) > at scala.Option.orElse(Option.scala:447) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1197) > at >
[jira] [Updated] (SPARK-44339) spark3-shell errors org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have [SELECT] p
[ https://issues.apache.org/jira/browse/SPARK-44339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Amar Gurung updated SPARK-44339: Summary: spark3-shell errors org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have [SELECT] privilege on [/] when reads hive view (was: spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have [SELECT] privilege on [/] to read hive view metadata )
[jira] [Resolved] (SPARK-44321) Move ParseException to SQL/API
[ https://issues.apache.org/jira/browse/SPARK-44321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-44321. --- Fix Version/s: 3.5.0 Resolution: Fixed > Move ParseException to SQL/API > -- > > Key: SPARK-44321 > URL: https://issues.apache.org/jira/browse/SPARK-44321 > Project: Spark > Issue Type: New Feature > Components: Connect, SQL >Affects Versions: 3.5.0 >Reporter: Herman van Hövell >Assignee: Herman van Hövell >Priority: Major > Fix For: 3.5.0 > >
[jira] [Comment Edited] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17740726#comment-17740726 ] koert kuipers edited comment on SPARK-37829 at 7/8/23 4:19 PM: --- since this (admittedly somewhat weird) behavior of returning a Row with null values has been present since spark 3.0.x (a major breaking release, and 3 years ago) i would argue this is the default behavior and this jira introduces a breaking change. basically i am saying if one argues this was a breaking change in going from spark 2.x to 3.x then i agree but a major version can make a breaking change. introducing a fix in 3.4.1 that reverts that breaking change is basically introducing a breaking change going from 3.4.0 to 3.4.1 which is worse in my opinion. also expressionencoders are used for other purposes than dataset joins and now we find nulls popping up in places they should not. this is how i ran into this issue. was (Author: koert): since this (admittedly somewhat weird) behavior of returning a Row with null values has been present since spark 3.0.x (a major breaking release, and 3 years ago) i would argue this is the default behavior and this jira introduces a breaking change. also expressionencoders are used for other purposes than dataset joins and now we find nulls popping up in places they should not. > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Assignee: Jason Xu >Priority: Major > Fix For: 3.3.3, 3.4.1, 3.5.0 > > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. 
> The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. > Happy to help if you think of another approach / can provide some guidance.
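The behavioral difference debated in SPARK-37829 can be illustrated without Spark. The sketch below is a plain-Python analogy (dicts standing in for Rows), not the actual encoder code: the question is whether an unmatched side of an outer join should be `None` (Spark 2.4.8 behavior) or a Row whose fields are all null (Spark 3.0 through 3.4.0 DataFrame behavior):

```python
# Two toy "tables" joined on id; the row with id 3 has no match on the right.
left  = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
right = [{"id": 1, "k": 10}, {"id": 2, "k": 20}]

right_by_id = {r["id"]: r for r in right}

def outer_join(row_of_nulls: bool):
    out = []
    for l in left:
        match = right_by_id.get(l["id"])
        if match is None and row_of_nulls:
            # Analogy for Spark 3.0-3.4.0: a Row whose fields are all null.
            match = {"id": None, "k": None}
        out.append((l, match))  # match stays None in the 2.4.8-style behavior
    return out

spark24_style = outer_join(row_of_nulls=False)
spark30_style = outer_join(row_of_nulls=True)
```

The two results agree on matched rows and differ only in how the missing side is represented, which is exactly what makes reverting the behavior in a patch release contentious.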
[jira] [Commented] (SPARK-35635) concurrent insert statements from multiple beeline fail with job aborted exception
[ https://issues.apache.org/jira/browse/SPARK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741309#comment-17741309 ] jeanlyn commented on SPARK-35635: - When tasks run concurrently, the "_temporary" directory can be deleted multiple times, which may result in job failure. Would it be appropriate to reopen this issue? [~gurwls223] > concurrent insert statements from multiple beeline fail with job aborted > exception > -- > > Key: SPARK-35635 > URL: https://issues.apache.org/jira/browse/SPARK-35635 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1 > Environment: Spark 3.1.1 >Reporter: Chetan Bhat >Priority: Minor > > Create tables - > CREATE TABLE J1_TBL ( > i integer, > j integer, > t string > ) USING parquet; > CREATE TABLE J2_TBL ( > i integer, > k integer > ) USING parquet; > From 4 concurrent beeline sessions execute the insert into select queries - > INSERT INTO J1_TBL VALUES (1, 4, 'one'); > INSERT INTO J1_TBL VALUES (2, 3, 'two'); > INSERT INTO J1_TBL VALUES (3, 2, 'three'); > INSERT INTO J1_TBL VALUES (4, 1, 'four'); > INSERT INTO J1_TBL VALUES (5, 0, 'five'); > INSERT INTO J1_TBL VALUES (6, 6, 'six'); > INSERT INTO J1_TBL VALUES (7, 7, 'seven'); > INSERT INTO J1_TBL VALUES (8, 8, 'eight'); > INSERT INTO J1_TBL VALUES (0, NULL, 'zero'); > INSERT INTO J1_TBL VALUES (NULL, NULL, 'null'); > INSERT INTO J1_TBL VALUES (NULL, 0, 'zero'); > INSERT INTO J2_TBL VALUES (1, -1); > INSERT INTO J2_TBL VALUES (2, 2); > INSERT INTO J2_TBL VALUES (3, -3); > INSERT INTO J2_TBL VALUES (2, 4); > INSERT INTO J2_TBL VALUES (5, -5); > INSERT INTO J2_TBL VALUES (5, -5); > INSERT INTO J2_TBL VALUES (0, NULL); > INSERT INTO J2_TBL VALUES (NULL, NULL); > INSERT INTO J2_TBL VALUES (NULL, 0); > > Issue : concurrent insert statements from multiple beeline fail with job > aborted exception. 
> 0: jdbc:hive2://10.19.89.222:23040/> INSERT INTO J1_TBL VALUES (8, 8, > 'eight'); > Error: org.apache.hive.service.cli.HiveSQLException: Error running query: > org.apache.spark.SparkException: Job aborted. > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:366) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$2(SparkExecuteStatementOperation.scala:263) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3$$Lambda$1781/750578465.apply$mcV$sp(Unknown > Source) > at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > at > org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78) > at > org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:45) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:263) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:258) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729) > at > org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:272) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.spark.SparkException: Job aborted. > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:231) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:188) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:109) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:107) > at > org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:121) > at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228) > at
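The failure mode jeanlyn describes — concurrent writers sharing one `_temporary` staging directory — can be reproduced in miniature on a local filesystem. This is only a sketch; Spark's actual committer layout and paths differ:

```python
import shutil
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory() as table_dir:
    staging = Path(table_dir) / "_temporary"

    # Job A and Job B both stage their output under the same directory.
    job_a = staging / "0" / "task_a"
    job_b = staging / "1" / "task_b"
    for d in (job_a, job_b):
        d.mkdir(parents=True)
        (d / "part-00000").write_text("data")

    # Job A finishes first and cleans up the *shared* _temporary directory.
    shutil.rmtree(staging)

    # Job B's staged output is now gone; its commit phase would fail,
    # surfacing as "Job aborted" in the thrift server.
    job_b_files_survive = job_b.exists()
```

Because cleanup deletes the whole shared staging root rather than only the finishing job's subdirectory, whichever insert commits last loses its files.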
[jira] [Commented] (SPARK-35635) concurrent insert statements from multiple beeline fail with job aborted exception
[ https://issues.apache.org/jira/browse/SPARK-35635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741296#comment-17741296 ] jeanlyn commented on SPARK-35635: - We encountered the same issue when writing concurrently to different partitions of the same table.
[jira] [Created] (SPARK-44344) Prepare RowEncoder for the move to sql/api
Herman van Hövell created SPARK-44344: - Summary: Prepare RowEncoder for the move to sql/api Key: SPARK-44344 URL: https://issues.apache.org/jira/browse/SPARK-44344 Project: Spark Issue Type: New Feature Components: Connect, SQL Affects Versions: 3.4.1 Reporter: Herman van Hövell Assignee: Herman van Hövell
[jira] [Commented] (SPARK-44331) Add bitmap functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-44331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741287#comment-17741287 ] BingKun Pan commented on SPARK-44331: - I'm working on it. > Add bitmap functions to Scala and Python > > > Key: SPARK-44331 > URL: https://issues.apache.org/jira/browse/SPARK-44331 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major > > * bitmap_bucket_number > * bitmap_bit_position > * bitmap_construct_agg > * bitmap_count > * bitmap_or_agg
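A rough sketch of what functions like these typically compute, in pure Python. The bucket size and exact formulas below are assumptions for illustration, not taken from Spark's implementation, and the real SQL functions are aggregates over binary columns:

```python
# Assumed bucket size; the actual constant used by Spark's bitmap functions
# may differ -- treat these formulas as illustrative, not authoritative.
BUCKET_SIZE = 32768

def bitmap_bucket_number(value: int) -> int:
    """Which fixed-size bitmap bucket a positive integer falls into (1-based)."""
    return (value - 1) // BUCKET_SIZE + 1

def bitmap_bit_position(value: int) -> int:
    """Offset of the value's bit within its bucket (0-based)."""
    return (value - 1) % BUCKET_SIZE

def bitmap_construct(values) -> int:
    """Build one bucket's bitmap (an int used as a bit set) from its values."""
    bm = 0
    for v in values:
        bm |= 1 << bitmap_bit_position(v)
    return bm

def bitmap_count(bm: int) -> int:
    """Number of set bits, i.e. distinct values represented in the bitmap."""
    return bin(bm).count("1")
```

Splitting values into fixed-size buckets keeps each bitmap bounded, which is what makes distinct counting with OR-merging (the `bitmap_or_agg` step) cheap.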
[jira] [Created] (SPARK-44343) Separate encoder inference from expression encoder generation in ScalaReflection
Herman van Hövell created SPARK-44343: - Summary: Separate encoder inference from expression encoder generation in ScalaReflection Key: SPARK-44343 URL: https://issues.apache.org/jira/browse/SPARK-44343 Project: Spark Issue Type: New Feature Components: Connect, SQL Affects Versions: 3.4.1 Reporter: Herman van Hövell Assignee: Herman van Hövell
[jira] [Commented] (SPARK-44329) Add hll_sketch_agg, hll_union_agg, to_varchar, try_aes_decrypt to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-44329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741285#comment-17741285 ] BingKun Pan commented on SPARK-44329: - I'm working on it. > Add hll_sketch_agg, hll_union_agg, to_varchar, try_aes_decrypt to Scala and > Python > -- > > Key: SPARK-44329 > URL: https://issues.apache.org/jira/browse/SPARK-44329 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Ruifeng Zheng >Priority: Major >
[jira] [Updated] (SPARK-44332) Fix the sorting error of Executor ID Column on Executors UI Page
[ https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-44332: Summary: Fix the sorting error of Executor ID Column on Executors UI Page (was: Fix the sorting error of Executor ID Column on Executor Page) > Fix the sorting error of Executor ID Column on Executors UI Page > > > Key: SPARK-44332 > URL: https://issues.apache.org/jira/browse/SPARK-44332 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor >
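The ticket doesn't state the root cause, but a common sorting bug for an Executor ID column is lexicographic comparison of string IDs, which puts "10" before "2". A plain-Python illustration of the problem and a numeric-aware sort key (the fix in the actual UI code may differ):

```python
# Executor IDs as they appear in the UI: numeric strings plus "driver".
ids = ["driver", "0", "1", "10", "2", "11", "3"]

# Plain string sort interleaves multi-digit IDs: "10" lands before "2".
lexicographic = sorted(ids)

def executor_id_key(eid: str):
    # Sort numeric IDs numerically; put non-numeric IDs (like "driver") first.
    return (0, 0) if not eid.isdigit() else (1, int(eid))

numeric = sorted(ids, key=executor_id_key)
```

The tuple key cleanly separates the non-numeric "driver" entry from the numeric IDs without special-casing it inside the comparison.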
[jira] [Updated] (SPARK-44332) Fix the sorting error of Executor ID Column on Executor Page
[ https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-44332: Summary: Fix the sorting error of Executor ID Column on Executor Page (was: Fix the sorting error of ExecutorID on Executor Page) > Fix the sorting error of Executor ID Column on Executor Page > > > Key: SPARK-44332 > URL: https://issues.apache.org/jira/browse/SPARK-44332 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor >
[jira] [Updated] (SPARK-44332) Fix the sorting error of ExecutorID on Executor Page
[ https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-44332: Summary: Fix the sorting error of ExecutorID on Executor Page (was: The Executor ID should start with 1 when running on Spark cluster of [N, cores, memory] locally mode) > Fix the sorting error of ExecutorID on Executor Page > > > Key: SPARK-44332 > URL: https://issues.apache.org/jira/browse/SPARK-44332 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor >
[jira] [Commented] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData
[ https://issues.apache.org/jira/browse/SPARK-44342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741258#comment-17741258 ] ASF GitHub Bot commented on SPARK-44342: User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/41900 > Replace SQLContext with SparkSession for GenTPCDSData > - > > Key: SPARK-44342 > URL: https://issues.apache.org/jira/browse/SPARK-44342 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > The SQLContext is an old API for Spark SQL. > But GenTPCDSData still uses it directly.
[jira] [Updated] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData
[ https://issues.apache.org/jira/browse/SPARK-44342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-44342: --- Description: The SQLContext is an old API for Spark SQL. But > Replace SQLContext with SparkSession for GenTPCDSData > - > > Key: SPARK-44342 > URL: https://issues.apache.org/jira/browse/SPARK-44342 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > The SQLContext is an old API for Spark SQL. > But
[jira] [Updated] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData
[ https://issues.apache.org/jira/browse/SPARK-44342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-44342: --- Description: The SQLContext is an old API for Spark SQL. But GenTPCDSData still uses it directly. was: The SQLContext is an old API for Spark SQL. But > Replace SQLContext with SparkSession for GenTPCDSData > - > > Key: SPARK-44342 > URL: https://issues.apache.org/jira/browse/SPARK-44342 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > The SQLContext is an old API for Spark SQL. > But GenTPCDSData still uses it directly.
[jira] [Created] (SPARK-44342) Replace SQLContext with SparkSession for GenTPCDSData
jiaan.geng created SPARK-44342: -- Summary: Replace SQLContext with SparkSession for GenTPCDSData Key: SPARK-44342 URL: https://issues.apache.org/jira/browse/SPARK-44342 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: jiaan.geng
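The SQLContext-to-SparkSession migration described in SPARK-44342 can be sketched as follows. This is a minimal illustration of the general API change, not the actual GenTPCDSData patch; the object name and query are made up for the example, and a Spark runtime is assumed on the classpath:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example: SparkSession is the unified entry point that
// supersedes the legacy SQLContext.
object SqlContextMigrationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("migration-sketch")
      .master("local[*]")
      .getOrCreate()

    // Old style (legacy API):
    //   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    //   sqlContext.sql("SELECT 1 AS one")
    // New style: call sql/read/createDataFrame directly on the session.
    val df = spark.sql("SELECT 1 AS one")
    df.show()

    spark.stop()
  }
}
```

The session also exposes the old object via `spark.sqlContext` for code that still needs it, which makes incremental migrations like this one straightforward.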
[jira] [Commented] (SPARK-44341) Define the computing logic through PartitionEvaluator API and use it in WindowExec
[ https://issues.apache.org/jira/browse/SPARK-44341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741251#comment-17741251 ] jiaan.geng commented on SPARK-44341: I'm working on it. > Define the computing logic through PartitionEvaluator API and use it in > WindowExec > -- > > Key: SPARK-44341 > URL: https://issues.apache.org/jira/browse/SPARK-44341 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > > Define the computing logic through PartitionEvaluator API and use it in > WindowExec
[jira] [Created] (SPARK-44341) Define the computing logic through PartitionEvaluator API and use it in WindowExec
jiaan.geng created SPARK-44341: -- Summary: Define the computing logic through PartitionEvaluator API and use it in WindowExec Key: SPARK-44341 URL: https://issues.apache.org/jira/browse/SPARK-44341 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: jiaan.geng Define the computing logic through PartitionEvaluator API and use it in WindowExec
[jira] [Commented] (SPARK-43907) Add SQL functions into Scala, Python and R API
[ https://issues.apache.org/jira/browse/SPARK-43907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17741246#comment-17741246 ] Ruifeng Zheng commented on SPARK-43907: --- [~beliefer] [~LuciferYang] [~panbingkun] [~ivoson] many thanks for your contributions! I just went through all registered SQL functions and created subtasks 23 & 24 for the newly added functions; feel free to pick them up. > Add SQL functions into Scala, Python and R API > -- > > Key: SPARK-43907 > URL: https://issues.apache.org/jira/browse/SPARK-43907 > Project: Spark > Issue Type: Umbrella > Components: PySpark, SparkR, SQL >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Priority: Major > > See the discussion in dev mailing list > (https://lists.apache.org/thread/0tdcfyzxzcv8w46qbgwys2rormhdgyqg). > This is an umbrella JIRA to implement all SQL functions in Scala, Python and R