[jira] [Resolved] (SPARK-44327) Add functions any and len to Scala

2023-07-07 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-44327.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41886
[https://github.com/apache/spark/pull/41886]

> Add functions any and len to Scala
> --
>
> Key: SPARK-44327
> URL: https://issues.apache.org/jira/browse/SPARK-44327
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44333) Move EnhancedLogicalPlan out of ParserUtils

2023-07-07 Thread Jira
Herman van Hövell created SPARK-44333:
-

 Summary: Move EnhancedLogicalPlan out of ParserUtils
 Key: SPARK-44333
 URL: https://issues.apache.org/jira/browse/SPARK-44333
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.4.1
Reporter: Herman van Hövell
Assignee: Herman van Hövell






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44309) Display Add/Remove Time of Executors on ExecutorsTab

2023-07-07 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-44309.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41861
[https://github.com/apache/spark/pull/41861]

> Display Add/Remove Time of Executors on ExecutorsTab
> 
>
> Key: SPARK-44309
> URL: https://issues.apache.org/jira/browse/SPARK-44309
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44309) Display Add/Remove Time of Executors on ExecutorsTab

2023-07-07 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-44309:


Assignee: Kent Yao

> Display Add/Remove Time of Executors on ExecutorsTab
> 
>
> Key: SPARK-44309
> URL: https://issues.apache.org/jira/browse/SPARK-44309
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.5.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44332) The Executor ID should start with 1 when running in local-cluster[N, cores, memory] mode

2023-07-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740956#comment-17740956
 ] 

ASF GitHub Bot commented on SPARK-44332:


User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41887

> The Executor ID should start with 1 when running in local-cluster[N, 
> cores, memory] mode
> 
>
> Key: SPARK-44332
> URL: https://issues.apache.org/jira/browse/SPARK-44332
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>
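
For anyone trying to reproduce this: the "[N, cores, memory]" in the title refers to Spark's pseudo-distributed local-cluster[N, cores, memoryMB] master string. A minimal sketch, assuming a full Spark distribution is available (local-cluster mode is not in pip-only installs; the app name is illustrative):

{code:python}
from pyspark.sql import SparkSession

# Pseudo-distributed mode: 2 executors with 1 core and 1024 MB each.
spark = (
    SparkSession.builder
    .master("local-cluster[2,1,1024]")
    .appName("executor-id-check")  # hypothetical app name
    .getOrCreate()
)

# Run something, then check the Executors tab of the Web UI
# (http://localhost:4040): per this report, executor IDs should start at 1.
spark.range(10).count()
{code}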




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44334) Status of execution w/ error and w/o jobs shall be FAILED not COMPLETED

2023-07-07 Thread Kent Yao (Jira)
Kent Yao created SPARK-44334:


 Summary: Status of execution w/ error and w/o jobs shall be FAILED 
not COMPLETED
 Key: SPARK-44334
 URL: https://issues.apache.org/jira/browse/SPARK-44334
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Web UI
Affects Versions: 3.4.1, 3.3.2, 3.5.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44263) Allow ChannelBuilder extensions -- Scala

2023-07-07 Thread GridGain Integration (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17740973#comment-17740973
 ] 

GridGain Integration commented on SPARK-44263:
--

User 'cdkrot' has created a pull request for this issue:
https://github.com/apache/spark/pull/41880

> Allow ChannelBuilder extensions -- Scala
> 
>
> Key: SPARK-44263
> URL: https://issues.apache.org/jira/browse/SPARK-44263
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Alice Sayutina
>Priority: Major
>
> Follow-up to https://issues.apache.org/jira/browse/SPARK-43332.
> Provide similar extension capabilities in Scala.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44322) Make parser use SqlApiConf

2023-07-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44322.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Make parser use SqlApiConf
> --
>
> Key: SPARK-44322
> URL: https://issues.apache.org/jira/browse/SPARK-44322
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44333) Move EnhancedLogicalPlan out of ParserUtils

2023-07-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44333.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Move EnhancedLogicalPlan out of ParserUtils
> ---
>
> Key: SPARK-44333
> URL: https://issues.apache.org/jira/browse/SPARK-44333
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44326) Move utils that are used from Scala client to the common modules

2023-07-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-44326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-44326.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Move utils that are used from Scala client to the common modules
> 
>
> Key: SPARK-44326
> URL: https://issues.apache.org/jira/browse/SPARK-44326
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44335) SQL parser should have a SQLConf parameter instead of relying on the active conf

2023-07-07 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-44335:
---

 Summary: SQL parser should have a SQLConf parameter instead of 
relying on the active conf
 Key: SPARK-44335
 URL: https://issues.apache.org/jira/browse/SPARK-44335
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44336) Add Python inbuilt functions to DataFrame for ease of use for Python developers

2023-07-07 Thread Stephen Offer (Jira)
Stephen Offer created SPARK-44336:
-

 Summary: Add Python inbuilt functions to DataFrame for ease of use 
for Python developers
 Key: SPARK-44336
 URL: https://issues.apache.org/jira/browse/SPARK-44336
 Project: Spark
  Issue Type: New Feature
  Components: PySpark
Affects Versions: 3.4.1
Reporter: Stephen Offer


Python developers are used to the language's common built-in functions and 
operators, but PySpark doesn't support most of them for DataFrames. PySpark 
already has this functionality for columns, but not for the DataFrame itself. 
Adding this support for DataFrames would simplify some parts of development. 
For example:


{code:java}
if df == df1:   # DataFrame Equality 
if df != df2:   # DataFrame Inequality

df_large = df * 100 # Quickly make a larger dataframe through union of copies
# Very useful for performance testing

df_sub = df1 - df2  # Simple DataFrame subtraction
# Equivalent to df1.subtract(df2)

df4 = df + df1  # Equivalent to df.union(df1)

len(df) # Equivalent to df.count()

for row in df:  # Equivalent to `for row in df.collect():`
    some_work(row)

if "company_name" in df: # Check if item is in the DataFrame

{code}
 

There is an ongoing DataFrame equality function effort in PR 41833; I've also 
built my own.


These are suggestions; any other functions to be added to or removed from this 
list can be discussed.
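
As a rough sense of the mechanics, here is a minimal sketch that prototypes three of the proposed operators by monkey-patching dunder methods onto DataFrame. The semantics here are assumptions for illustration, not the proposal's final API:

{code:python}
from pyspark.sql import DataFrame, SparkSession

# Illustrative monkey-patches only; the proposal would implement these on
# DataFrame itself, presumably with more careful semantics.
DataFrame.__len__ = lambda self: self.count()                     # len(df)
DataFrame.__add__ = lambda self, other: self.union(other)         # df + df1
DataFrame.__contains__ = lambda self, name: name in self.columns  # "x" in df

spark = SparkSession.builder.getOrCreate()
df = spark.range(3)

print(len(df))       # 3
print("id" in df)    # True
print(len(df + df))  # 6
{code}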



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44287) Define the computing logic through PartitionEvaluator API and use it in RowToColumnarExec & ColumnarToRowExec SQL operators.

2023-07-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-44287.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41839
[https://github.com/apache/spark/pull/41839]

> Define the computing logic through  PartitionEvaluator API and use it in 
> RowToColumnarExec & ColumnarToRowExec SQL operators.
> -
>
> Key: SPARK-44287
> URL: https://issues.apache.org/jira/browse/SPARK-44287
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Assignee: Vinod KC
>Priority: Major
> Fix For: 3.5.0
>
>
>    
> Define the computing logic through PartitionEvaluator API and use it in 
> RowToColumnarExec & ColumnarToRowExec SQL operators.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44287) Define the computing logic through PartitionEvaluator API and use it in RowToColumnarExec & ColumnarToRowExec SQL operators.

2023-07-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-44287:
---

Assignee: Vinod KC

> Define the computing logic through  PartitionEvaluator API and use it in 
> RowToColumnarExec & ColumnarToRowExec SQL operators.
> -
>
> Key: SPARK-44287
> URL: https://issues.apache.org/jira/browse/SPARK-44287
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Vinod KC
>Assignee: Vinod KC
>Priority: Major
>
>    
> Define the computing logic through PartitionEvaluator API and use it in 
> RowToColumnarExec & ColumnarToRowExec SQL operators.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44337) Any fields set to Any.getDefaultInstance cause exceptions.

2023-07-07 Thread Raghu Angadi (Jira)
Raghu Angadi created SPARK-44337:


 Summary: Any fields set to Any.getDefaultInstance cause exceptions.
 Key: SPARK-44337
 URL: https://issues.apache.org/jira/browse/SPARK-44337
 Project: Spark
  Issue Type: Task
  Components: Protobuf
Affects Versions: 3.4.1
Reporter: Raghu Angadi
 Fix For: 3.4.2


Protobuf functions added support for converting `Any` fields to JSON strings. 

This uses Protobuf's built-in `JsonFormat` to convert to JSON.

JsonFormat fails to handle these fields when they are set to 
`Any.getDefaultInstance()` in the original message. The failure occurs only 
when using a descriptor set, not when using Java classes. Since descriptor 
files are the common case, this can be a blocker.
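
A minimal sketch of the failing call path, assuming the descriptor-set route; the message name, descriptor path, placeholder bytes, and the "convert.any.fields.to.json" option name are assumptions for illustration:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.protobuf.functions import from_protobuf

spark = SparkSession.builder.getOrCreate()

# Placeholder bytes standing in for a serialized `Event` message whose
# google.protobuf.Any field was set to Any.getDefaultInstance().
serialized = b"..."
df = spark.createDataFrame([(serialized,)], ["value"])

decoded = df.select(
    from_protobuf(
        "value",
        "Event",                         # hypothetical message name
        descFilePath="/tmp/event.desc",  # descriptor set: the failing path
        # Option name assumed from the Any-to-JSON support this follows up on.
        options={"convert.any.fields.to.json": "true"},
    ).alias("event")
)
# Per the report: fails here with a descriptor set, but succeeds when the
# compiled Java classes are used instead.
decoded.show()
{code}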



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44338) fix view schema mismatch error message

2023-07-07 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-44338:
---

 Summary: fix view schema mismatch error message
 Key: SPARK-44338
 URL: https://issues.apache.org/jira/browse/SPARK-44338
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44339) spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table provdatatypeall. Permission denied: user [AD user] does not have [S

2023-07-07 Thread Amar Gurung (Jira)
Amar Gurung created SPARK-44339:
---

 Summary: spark3-shell fails with error 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
provdatatypeall. Permission denied: user [AD user] does not have [SELECT] 
privilege on [/] to read view metadata 
 Key: SPARK-44339
 URL: https://issues.apache.org/jira/browse/SPARK-44339
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell, Spark Submit
Affects Versions: 3.0.3
 Environment: CDP 7.1.7 Ranger, kerberized and hadoop impersonation 
enabled.
Reporter: Amar Gurung


*Problem statement* 

A Hive view is created using Beeline to restrict users from accessing the 
original Hive table, since the data contains sensitive information. 

For illustration purposes, consider a sensitive table emp_db.employee with 
columns id, name, and salary, created through Beeline by user '{*}userA{*}':

 
{code:java}
create external table emp_db.employee (id int, name string, salary double) 
location ''{code}
 

A view is created using beeline by the same user '{*}userA{*}'

 
{code:java}
create view empview_db.emp_v as select id, name from emp_db.employee; -- owned by 'userA'{code}
 

From Ranger UI, we define a policy under Hadoop SQL Policies that lets 
'{*}userB{*}' access database empview_db and table emp_v with SELECT 
permission.

 

*Steps to replicate* 
 # ssh to the edge node where Beeline is available, as *userB*
 # Try executing the following queries:
 ## select * from emp_db.employee;
 ## desc formatted empview_db.emp_v;
 ## The above queries work fine without any issues.
 # Now, try spark3-shell as *userB*:
{code:java}
# spark3-shell --deploy-mode client
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not 
exist
Spark context Web UI available at http://xxx:4040
Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx).
Spark session available as 'spark'.
Welcome to
                    __
     / __/__  ___ _/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0.3.3.7180.0-274
      /_/
         
Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.table("empview_db.emp_v").schema
23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf hive.execution.engine 
is 'tez' and will be reset to 'mr' to disable useless hive logic
Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e
org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
employee. Permission denied: user [userB] does not have [SELECT] privilege on 
[emp_db/employee]
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877)
  at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
  at 
org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66)
  at 
org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206)
  at scala.Option.orElse(Option.scala:447)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1205)
  at scala.Option.orElse(Option.scala:447)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1197)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1068)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1032)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
  at 
org.apache.spark.

[jira] [Updated] (SPARK-44339) spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have

2023-07-07 Thread Amar Gurung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Gurung updated SPARK-44339:

Summary: spark3-shell fails with error 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
. Permission denied: user [AD user] does not have [SELECT] 
privilege on [/] to read view metadata   (was: 
spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: 
Unable to fetch table provdatatypeall. Permission denied: user [AD user] does 
not have [SELECT] privilege on [/] to read view metadata )

> spark3-shell fails with error 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> . Permission denied: user [AD user] does not have [SELECT] 
> privilege on [/] to read view metadata 
> ---
>
> Key: SPARK-44339
> URL: https://issues.apache.org/jira/browse/SPARK-44339
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 3.0.3
> Environment: CDP 7.1.7 Ranger, kerberized and hadoop impersonation 
> enabled.
>Reporter: Amar Gurung
>Priority: Blocker
>
> *Problem statement* 
> A Hive view is created using Beeline to restrict users from accessing the 
> original Hive table, since the data contains sensitive information. 
> For illustration purposes, consider a sensitive table emp_db.employee with 
> columns id, name, and salary, created through Beeline by user '{*}userA{*}':
>  
> {code:java}
> create external table emp_db.employee (id int, name string, salary double) 
> location ''{code}
>  
> A view is created using beeline by the same user '{*}userA{*}'
>  
> {code:java}
> create view empview_db.emp_v as select id, name from emp_db.employee; -- owned by 'userA'{code}
>  
> From Ranger UI, we define a policy under Hadoop SQL Policies that lets 
> '{*}userB{*}' access database empview_db and table emp_v with SELECT 
> permission.
>  
> *Steps to replicate* 
>  # ssh to the edge node where Beeline is available, as *userB*
>  # Try executing the following queries:
>  ## select * from emp_db.employee;
>  ## desc formatted empview_db.emp_v;
>  ## The above queries work fine without any issues.
>  # Now, try spark3-shell as *userB*:
> {code:java}
> # spark3-shell --deploy-mode client
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not 
> exist
> Spark context Web UI available at http://xxx:4040
> Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx).
> Spark session available as 'spark'.
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0.3.3.7180.0-274
>       /_/
>          
> Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 
> 1.8.0_181)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> spark.table("empview_db.emp_v").schema
> 23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf 
> hive.execution.engine is 'tez' and will be reset to 'mr' to disable useless 
> hive logic
> Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> employee. Permission denied: user [userB] does not have [SELECT] privilege on 
> [emp_db/employee]
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
>   at 
> org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66)
>   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206)
>   at scala.Option.orElse(O

[jira] [Updated] (SPARK-44339) spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have

2023-07-07 Thread Amar Gurung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Gurung updated SPARK-44339:

Summary: spark3-shell fails with error 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
. Permission denied: user [AD user] does not have [SELECT] 
privilege on [/] to read hive view metadata   (was: 
spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: 
Unable to fetch table . Permission denied: user [AD user] does 
not have [SELECT] privilege on [/] to read view metadata )

> spark3-shell fails with error 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> . Permission denied: user [AD user] does not have [SELECT] 
> privilege on [/] to read hive view metadata 
> 
>
> Key: SPARK-44339
> URL: https://issues.apache.org/jira/browse/SPARK-44339
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 3.0.3
> Environment: CDP 7.1.7 Ranger, kerberized and hadoop impersonation 
> enabled.
>Reporter: Amar Gurung
>Priority: Blocker
>
> *Problem statement* 
> A Hive view is created using Beeline to restrict users from accessing the 
> original Hive table, since the data contains sensitive information. 
> For illustration purposes, consider a sensitive table emp_db.employee with 
> columns id, name, and salary, created through Beeline by user '{*}userA{*}':
>  
> {code:java}
> create external table emp_db.employee (id int, name string, salary double) 
> location ''{code}
>  
> A view is created using beeline by the same user '{*}userA{*}'
>  
> {code:java}
> create view empview_db.emp_v as select id, name from emp_db.employee; -- owned by 'userA'{code}
>  
> From Ranger UI, we define a policy under Hadoop SQL Policies that lets 
> '{*}userB{*}' access database empview_db and table emp_v with SELECT 
> permission.
>  
> *Steps to replicate* 
>  # ssh to the edge node where Beeline is available, as *userB*
>  # Try executing the following queries:
>  ## select * from emp_db.employee;
>  ## desc formatted empview_db.emp_v;
>  ## The above queries work fine without any issues.
>  # Now, try spark3-shell as *userB*:
> {code:java}
> # spark3-shell --deploy-mode client
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not 
> exist
> Spark context Web UI available at http://xxx:4040
> Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx).
> Spark session available as 'spark'.
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0.3.3.7180.0-274
>       /_/
>          
> Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 
> 1.8.0_181)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> spark.table("empview_db.emp_v").schema
> 23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf 
> hive.execution.engine is 'tez' and will be reset to 'mr' to disable useless 
> hive logic
> Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> employee. Permission denied: user [userB] does not have [SELECT] privilege on 
> [emp_db/employee]
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
>   at 
> org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66)
>   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206)
>   at scala.Option.orElse(O

[jira] [Updated] (SPARK-44339) spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have

2023-07-07 Thread Amar Gurung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Gurung updated SPARK-44339:

Description: 
*Problem statement* 

A Hive view is created using Beeline to restrict users from accessing the 
original Hive table, since the data contains sensitive information. 

For illustration purposes, consider a sensitive table emp_db.employee with 
columns id, name, and salary, created through Beeline by user '{*}userA{*}':

 
{code:java}
create external table emp_db.employee (id int, name string, salary double) 
location ''{code}
 

A view is created using beeline by the same user '{*}userA{*}'

 
{code:java}
create view empview_db.emp_v as select id, name from emp_db.employee;{code}
 

From Ranger UI, we define a policy under Hadoop SQL Policies that lets 
'{*}userB{*}' access database empview_db and table emp_v with SELECT 
permission.

 

*Steps to replicate* 
 # ssh to the edge node where Beeline is available, as *userB*
 # Try executing the following queries:
 ## select * from emp_db.employee;
 ## desc formatted empview_db.emp_v;
 ## The above queries work fine without any issues.
 # Now, try spark3-shell as *userB*:
{code:java}
# spark3-shell --deploy-mode client
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not 
exist
Spark context Web UI available at http://xxx:4040
Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx).
Spark session available as 'spark'.
Welcome to
                    __
     / __/__  ___ _/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0.3.3.7180.0-274
      /_/
         
Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
Type in expressions to have them evaluated.
Type :help for more information.

scala> spark.table("empview_db.emp_v").schema
23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf hive.execution.engine 
is 'tez' and will be reset to 'mr' to disable useless hive logic
Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e
org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
employee. Permission denied: user [userB] does not have [SELECT] privilege on 
[emp_db/employee]
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
  at 
org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877)
  at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514)
  at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
  at 
org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66)
  at 
org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206)
  at scala.Option.orElse(Option.scala:447)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1205)
  at scala.Option.orElse(Option.scala:447)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1197)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1068)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1032)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
  at 
org.apa

[jira] [Updated] (SPARK-44339) spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have

2023-07-07 Thread Amar Gurung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Gurung updated SPARK-44339:

Affects Version/s: 3.3.0
   (was: 3.0.3)

> spark3-shell fails with error 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> . Permission denied: user [AD user] does not have [SELECT] 
> privilege on [/] to read hive view metadata 
> 
>
> Key: SPARK-44339
> URL: https://issues.apache.org/jira/browse/SPARK-44339
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 3.3.0
> Environment: CDP 7.1.7 Ranger, kerberized and hadoop impersonation 
> enabled.
>Reporter: Amar Gurung
>Priority: Blocker
>
> *Problem statement* 
> A Hive view is created using Beeline to restrict users from accessing the 
> original Hive table, since the data contains sensitive information. 
> For illustration purposes, consider a sensitive table emp_db.employee with 
> columns id, name, and salary, created through Beeline by user '{*}userA{*}':
>  
> {code:java}
> create external table emp_db.employee (id int, name string, salary double) 
> location ''{code}
>  
> A view is created using beeline by the same user '{*}userA{*}'
>  
> {code:java}
> create view empview_db.emp_v as select id, name from emp_db.employee;{code}
>  
> From Ranger UI, we define a policy under Hadoop SQL Policies that lets 
> '{*}userB{*}' access database empview_db and table emp_v with SELECT 
> permission.
>  
> *Steps to replicate* 
>  # ssh to the edge node where Beeline is available, as *userB*
>  # Try executing the following queries:
>  ## select * from emp_db.employee;
>  ## desc formatted empview_db.emp_v;
>  ## The above queries work fine without any issues.
>  # Now, try spark3-shell as *userB*:
> {code:java}
> # spark3-shell --deploy-mode client
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not 
> exist
> Spark context Web UI available at http://xxx:4040
> Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx).
> Spark session available as 'spark'.
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0.3.3.7180.0-274
>       /_/
>          
> Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 
> 1.8.0_181)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> spark.table("empview_db.emp_v").schema
> 23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf 
> hive.execution.engine is 'tez' and will be reset to 'mr' to disable useless 
> hive logic
> Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> employee. Permission denied: user [userB] does not have [SELECT] privilege on 
> [emp_db/employee]
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
>   at 
> org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66)
>   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206)
>   at scala.Option.orElse(Option.scala:447)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1205)
>   at scala.Option.orElse(Option.scala:447)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1197)
>   at 
> org.apache.s

[jira] [Updated] (SPARK-44339) spark3-shell fails with error org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table . Permission denied: user [AD user] does not have

2023-07-07 Thread Amar Gurung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amar Gurung updated SPARK-44339:

Priority: Critical  (was: Blocker)

> spark3-shell fails with error 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> . Permission denied: user [AD user] does not have [SELECT] 
> privilege on [/] to read hive view metadata 
> 
>
> Key: SPARK-44339
> URL: https://issues.apache.org/jira/browse/SPARK-44339
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 3.3.0
> Environment: CDP 7.1.7 Ranger, kerberized and hadoop impersonation 
> enabled.
>Reporter: Amar Gurung
>Priority: Critical
>
> *Problem statement* 
> A Hive view is created using Beeline to restrict users from accessing the 
> original Hive table, since the data contains sensitive information. 
> For illustration purposes, consider a sensitive table emp_db.employee with 
> columns id, name, and salary, created through Beeline by user '{*}userA{*}':
>  
> {code:java}
> create external table emp_db.employee (id int, name string, salary double) 
> location ''{code}
>  
> A view is created using beeline by the same user '{*}userA{*}'
>  
> {code:java}
> create view empview_db.emp_v as select id, name from emp_db.employee;{code}
>  
> From Ranger UI, we define a policy under Hadoop SQL Policies that lets 
> '{*}userB{*}' access database empview_db and table emp_v with SELECT 
> permission.
>  
> *Steps to replicate* 
>  # ssh to the edge node where Beeline is available, as *userB*
>  # Try executing the following queries:
>  ## select * from emp_db.employee;
>  ## desc formatted empview_db.emp_v;
>  ## The above queries work fine without any issues.
>  # Now, try spark3-shell as *userB*:
> {code:java}
> # spark3-shell --deploy-mode client
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> 23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not 
> exist
> Spark context Web UI available at http://xxx:4040
> Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx).
> Spark session available as 'spark'.
> Welcome to
>                     __
>      / __/__  ___ _/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 3.3.0.3.3.7180.0-274
>       /_/
>          
> Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 
> 1.8.0_181)
> Type in expressions to have them evaluated.
> Type :help for more information.
>
> scala> spark.table("empview_db.emp_v").schema
> 23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf 
> hive.execution.engine is 'tez' and will be reset to 'mr' to disable useless 
> hive logic
> Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e
> org.apache.spark.sql.AnalysisException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
> employee. Permission denied: user [userB] does not have [SELECT] privilege on 
> [emp_db/employee]
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
>   at 
> org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66)
>   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206)
>   at scala.Option.orElse(Option.scala:447)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1205)
>   at scala.Option.orElse(Option.scala:447)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1197)
>   at 
> org.apache.spark.sql.catalyst.analysis.An

[jira] [Commented] (SPARK-44301) Add Support for TPCH Micro Benchmark

2023-07-07 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741237#comment-17741237
 ] 

Snoot.io commented on SPARK-44301:
--

User 'oss-maker' has created a pull request for this issue:
https://github.com/apache/spark/pull/41856

> Add Support for TPCH Micro Benchmark
> 
>
> Key: SPARK-44301
> URL: https://issues.apache.org/jira/browse/SPARK-44301
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Suraj Naik
>Priority: Trivial
>
> I am proposing this Jira to add benchmark support for TPC-H. There is 
> currently support for TPC-DS, but I couldn't find an equivalent for TPC-H, 
> so I will add one.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44328) Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]

2023-07-07 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741240#comment-17741240
 ] 

Snoot.io commented on SPARK-44328:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/41889

> Assign names to the error class _LEGACY_ERROR_TEMP_[2325-2328]
> --
>
> Key: SPARK-44328
> URL: https://issues.apache.org/jira/browse/SPARK-44328
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43964) Support arrow-optimized Python UDTFs

2023-07-07 Thread Snoot.io (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741241#comment-17741241
 ] 

Snoot.io commented on SPARK-43964:
--

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/41867

> Support arrow-optimized Python UDTFs
> 
>
> Key: SPARK-43964
> URL: https://issues.apache.org/jira/browse/SPARK-43964
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Priority: Major
>
> Support arrow-based evaluation for Python UDTFs by default.
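
A minimal sketch of what an arrow-optimized Python UDTF could look like; the useArrow flag shown here is an assumption about the API this issue tracks:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, udtf

spark = SparkSession.builder.getOrCreate()

# Expands one input row into (end - start + 1) output rows; useArrow=True
# requests arrow-based evaluation for this UDTF.
@udtf(returnType="num: int, squared: int", useArrow=True)
class SquareNumbers:
    def eval(self, start: int, end: int):
        for n in range(start, end + 1):
            yield (n, n * n)

SquareNumbers(lit(1), lit(3)).show()
{code}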



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44340) Define the computing logic through PartitionEvaluator API and use it in WindowGroupLimitExec

2023-07-07 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-44340:
--

 Summary: Define the computing logic through PartitionEvaluator API 
and use it in WindowGroupLimitExec
 Key: SPARK-44340
 URL: https://issues.apache.org/jira/browse/SPARK-44340
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44340) Define the computing logic through PartitionEvaluator API and use it in WindowGroupLimitExec

2023-07-07 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741242#comment-17741242
 ] 

jiaan.geng commented on SPARK-44340:


I'm working on it.

> Define the computing logic through PartitionEvaluator API and use it in 
> WindowGroupLimitExec
> 
>
> Key: SPARK-44340
> URL: https://issues.apache.org/jira/browse/SPARK-44340
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44340) Define the computing logic through PartitionEvaluator API and use it in WindowGroupLimitExec

2023-07-07 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-44340:
---
Description: Define the computing logic through PartitionEvaluator API and 
use it in WindowGroupLimitExec

> Define the computing logic through PartitionEvaluator API and use it in 
> WindowGroupLimitExec
> 
>
> Key: SPARK-44340
> URL: https://issues.apache.org/jira/browse/SPARK-44340
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> Define the computing logic through PartitionEvaluator API and use it in 
> WindowGroupLimitExec



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44061) Add assert_df_equality util function

2023-07-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44061.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41606
[https://github.com/apache/spark/pull/41606]

> Add assert_df_equality util function
> 
>
> Key: SPARK-44061
> URL: https://issues.apache.org/jira/browse/SPARK-44061
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
> Fix For: 3.5.0
>
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
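
A short usage sketch, assuming the merged utility is assertDataFrameEqual in pyspark.testing (the title's assert_df_equality is the working name), so the exact name and defaults here are assumptions:

{code:python}
from pyspark.sql import SparkSession
from pyspark.testing import assertDataFrameEqual

spark = SparkSession.builder.getOrCreate()

actual = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
expected = spark.createDataFrame([(2, "b"), (1, "a")], ["id", "label"])

# Passes: row order is ignored by default; on mismatch it raises an
# assertion error that highlights the differing rows.
assertDataFrameEqual(actual, expected)
{code}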



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44061) Add assert_df_equality util function

2023-07-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44061:


Assignee: Amanda Liu

> Add assert_df_equality util function
> 
>
> Key: SPARK-44061
> URL: https://issues.apache.org/jira/browse/SPARK-44061
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Assignee: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44290) Session-based files and archives in Spark Connect

2023-07-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44290.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 41495
[https://github.com/apache/spark/pull/41495]

> Session-based files and archives in Spark Connect
> -
>
> Key: SPARK-44290
> URL: https://issues.apache.org/jira/browse/SPARK-44290
> Project: Spark
>  Issue Type: Task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.5.0
>
>
> Similar to SPARK-44146 and SPARK-44078. We need session-based artifacts on 
> the executor side for PySpark too, for example for Python UDF execution.
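
A hedged sketch of session-scoped file distribution from the Python client; the addArtifacts call, the SparkFiles resolution inside the UDF, and the file name are assumptions for illustration:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.remote("sc://localhost").getOrCreate()

# Ship a data file to this session only; other Connect sessions on the
# same server should not see it.
spark.addArtifacts("lookup.csv", file=True)

@udf("string")
def first_line(ignored: str) -> str:
    # On executors the file resolves within this session's artifact
    # directory, which is what this change enables for Python UDFs.
    from pyspark import SparkFiles
    with open(SparkFiles.get("lookup.csv")) as f:
        return f.readline().strip()

spark.range(1).select(first_line("id")).show()
{code}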



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44290) Session-based files and archives in Spark Connect

2023-07-07 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44290:


Assignee: Hyukjin Kwon

> Session-based files and archives in Spark Connect
> -
>
> Key: SPARK-44290
> URL: https://issues.apache.org/jira/browse/SPARK-44290
> Project: Spark
>  Issue Type: Task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Similar to SPARK-44146 and SPARK-44078. We need session-based artifacts on 
> the executor side for PySpark too, for example for Python UDF execution.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43907) Add SQL functions into Scala, Python and R API

2023-07-07 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17741246#comment-17741246
 ] 

Ruifeng Zheng commented on SPARK-43907:
---

[~beliefer][~LuciferYang][~panbingkun][~ivoson] many thanks for your 
contributions!
I just went through all registered SQL functions and created subtasks 23 & 24 
for the newly added functions; feel free to pick them up.

> Add SQL functions into Scala, Python and R API
> --
>
> Key: SPARK-43907
> URL: https://issues.apache.org/jira/browse/SPARK-43907
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SparkR, SQL
>Affects Versions: 3.5.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See the discussion on the dev mailing list 
> (https://lists.apache.org/thread/0tdcfyzxzcv8w46qbgwys2rormhdgyqg).
> This is an umbrella JIRA to implement all SQL functions in Scala, Python, 
> and R.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org