[jira] [Created] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage
Dongjoon Hyun created SPARK-45500: - Summary: Show the number of abnormally completed drivers in MasterPage Key: SPARK-45500 URL: https://issues.apache.org/jira/browse/SPARK-45500 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773921#comment-17773921 ] Zhizhen Hou commented on SPARK-45478: - An If expression has three children: predicate, trueValue, and falseValue. There are two execution paths: 1) predicate and trueValue; 2) predicate and falseValue. There are three possible pairings for a common subexpression: 1) predicate and trueValue; 2) predicate and falseValue; 3) trueValue and falseValue. So if all possible common subexpressions are eliminated, 2 of the 3 pairings improve performance. For example, a common subexpression shared by the predicate and falseValue is then executed only once, which improves performance. Only a subexpression shared by trueValue and falseValue alone brings no improvement, but it does not hurt performance either, since exactly one of trueValue and falseValue is executed anyway. So it looks reasonable to check all three children of If. Any suggestions?

> codegen sum(decimal_column / 2) computes div twice
> --
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Zhizhen Hou
> Priority: Minor
>
> *The SQL to reproduce the result:*
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>
> *Reason that may cause the result:*
> The sum function uses an If expression in updateExpressions:
> `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>
> The three children of the If expression look like this:
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true))
> trueValue: input[0, decimal(26,6), true]
> falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
> In subexpression elimination, only the predicate is recursed into, per EquivalentExpressions#childrenToRecurse:
> {code:java}
> private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
>   case _: CodegenFallback => Nil
>   case i: If => i.predicate :: Nil
>   case c: CaseWhen => c.children.head :: Nil
>   case c: Coalesce => c.children.head :: Nil
>   case other => other.children
> } {code}
> I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result.
>
> But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems:
> {code:java}
> // 2. If: common subexpressions will always be evaluated at the beginning, but the true and
> // false expressions in `If` may not get accessed, according to the predicate
> // expression. We should only recurse into the predicate expression. {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
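For clarity, here is a minimal sketch of the change discussed in this thread: recursing into all three children of `If` in `EquivalentExpressions#childrenToRecurse`. It uses the standard Catalyst accessors `i.trueValue`/`i.falseValue`; this is the commenter's proposal, not merged code:

{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  // Recurse into all three children so that a subexpression shared by the
  // predicate and one branch (or by both branches) can be eliminated.
  case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
}
{code}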
[jira] [Updated] (SPARK-45498) Followup: Ignore task completion from old stage after retrying indeterminate stages
[ https://issues.apache.org/jira/browse/SPARK-45498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45498: --- Labels: pull-request-available (was: )

> Followup: Ignore task completion from old stage after retrying indeterminate stages
> ---
>
> Key: SPARK-45498
> URL: https://issues.apache.org/jira/browse/SPARK-45498
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 4.0.0, 3.5.1
> Reporter: Mayur Bhosale
> Priority: Minor
> Labels: pull-request-available
>
> With SPARK-45182, we added a fix that stops laggard tasks from older attempts of an indeterminate stage from marking the partition as completed in the map output tracker.
> When a task completes, the DAG scheduler also notifies all task sets of the stage that the partition is completed, and those task sets will then not schedule such a task if it is not already scheduled. This is not correct for an indeterminate stage, since we want to re-run all the tasks on a re-attempt.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14745) CEP support in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-14745: --- Labels: pull-request-available (was: )

> CEP support in Spark Streaming
> --
>
> Key: SPARK-14745
> URL: https://issues.apache.org/jira/browse/SPARK-14745
> Project: Spark
> Issue Type: New Feature
> Components: DStreams
> Reporter: Mario Briggs
> Priority: Major
> Labels: pull-request-available
> Attachments: SparkStreamingCEP.pdf
>
> Complex Event Processing (CEP) is an often-used feature in streaming applications. Spark Streaming currently does not have a DSL/API for it. This JIRA is about how/what we can add to Spark Streaming to support CEP out of the box.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`
[ https://issues.apache.org/jira/browse/SPARK-45499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45499: --- Labels: pull-request-available (was: ) > Replace `Reference#isEnqueued` with `Reference#refersTo` > > > Key: SPARK-45499 > URL: https://issues.apache.org/jira/browse/SPARK-45499 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45498) Followup: Ignore task completion from old stage after retrying indeterminate stages
Mayur Bhosale created SPARK-45498: - Summary: Followup: Ignore task completion from old stage after retrying indeterminate stages Key: SPARK-45498 URL: https://issues.apache.org/jira/browse/SPARK-45498 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0, 3.5.1 Reporter: Mayur Bhosale

With SPARK-45182, we added a fix that stops laggard tasks from older attempts of an indeterminate stage from marking the partition as completed in the map output tracker.

When a task completes, the DAG scheduler also notifies all task sets of the stage that the partition is completed, and those task sets will then not schedule such a task if it is not already scheduled. This is not correct for an indeterminate stage, since we want to re-run all the tasks on a re-attempt.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`
Yang Jie created SPARK-45499: Summary: Replace `Reference#isEnqueued` with `Reference#refersTo` Key: SPARK-45499 URL: https://issues.apache.org/jira/browse/SPARK-45499 Project: Spark Issue Type: Sub-task Components: Spark Core, Tests Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
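For context, `Reference#isEnqueued` has been deprecated since Java 16, and `Reference#refersTo` (added in Java 16) is the usual replacement for checking whether a referent has been collected. A minimal sketch of the migration pattern, not the actual Spark test code:

{code:java}
import java.lang.ref.WeakReference

object RefersToExample {
  def main(args: Array[String]): Unit = {
    var referent: Object = new Object
    val ref = new WeakReference[Object](referent)
    referent = null // drop the strong reference so the referent becomes collectable
    System.gc()
    // Before (deprecated): ref.isEnqueued
    // After (Java 16+): refersTo(null) is true once the GC has cleared the referent.
    println(ref.refersTo(null))
  }
}
{code}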
[jira] [Commented] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773897#comment-17773897 ] Yang Jie commented on SPARK-45482: -- OK, Let me close this one > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. 
> [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) ---
[jira] [Resolved] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45482. -- Resolution: Won't Fix > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. 
> [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issue
[jira] [Updated] (SPARK-45497) Add a symbolic link file `spark-examples.jar` in K8s Docker images
[ https://issues.apache.org/jira/browse/SPARK-45497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45497: -- Summary: Add a symbolic link file `spark-examples.jar` in K8s Docker images (was: Add an symbolic link file `spark-examples.jar` in K8s Docker images) > Add a symbolic link file `spark-examples.jar` in K8s Docker images > -- > > Key: SPARK-45497 > URL: https://issues.apache.org/jira/browse/SPARK-45497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45497) Add an symbolic link file `spark-examples.jar` in K8s Docker images
Dongjoon Hyun created SPARK-45497: - Summary: Add an symbolic link file `spark-examples.jar` in K8s Docker images Key: SPARK-45497 URL: https://issues.apache.org/jira/browse/SPARK-45497 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45497) Add an symbolic link file `spark-examples.jar` in K8s Docker images
[ https://issues.apache.org/jira/browse/SPARK-45497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45497: --- Labels: pull-request-available (was: ) > Add an symbolic link file `spark-examples.jar` in K8s Docker images > --- > > Key: SPARK-45497 > URL: https://issues.apache.org/jira/browse/SPARK-45497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42881) Codegen Support for get_json_object
[ https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42881: --- Labels: pull-request-available (was: ) > Codegen Support for get_json_object > --- > > Key: SPARK-42881 > URL: https://issues.apache.org/jira/browse/SPARK-42881 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41730) `min` fails on the minimal timestamp
[ https://issues.apache.org/jira/browse/SPARK-41730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41730: --- Labels: pull-request-available (was: )

> `min` fails on the minimal timestamp
> --
>
> Key: SPARK-41730
> URL: https://issues.apache.org/jira/browse/SPARK-41730
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> The code below demonstrates the issue:
> {code:python}
> >>> from datetime import datetime, timezone
> >>> from pyspark.sql.types import TimestampType
> >>> from pyspark.sql import functions as F
> >>> ts = spark.createDataFrame([datetime(1, 1, 1, 0, 0, 0, 0, tzinfo=timezone.utc)], TimestampType()).toDF("test_column")
> >>> ts.select(F.min('test_column')).first()[0]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 2762, in first
>     return self.head()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 2738, in head
>     rs = self.head(1)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 2740, in head
>     return self.take(n)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 1297, in take
>     return self.limit(num).collect()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 1198, in collect
>     return list(_load_from_socket(sock_info, BatchedSerializer(CPickleSerializer(
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", line 152, in load_stream
>     yield self._read_with_length(stream)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", line 174, in _read_with_length
>     return self.loads(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", line 472, in loads
>     return cloudpickle.loads(obj, encoding=encoding)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 2010, in <lambda>
>     return lambda *a: dataType.fromInternal(a)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 1018, in fromInternal
>     values = [
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 1019, in <listcomp>
>     f.fromInternal(v) if c else v
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 667, in fromInternal
>     return self.dataType.fromInternal(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 279, in fromInternal
>     return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
> ValueError: year 0 is out of range
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773892#comment-17773892 ] Dongjoon Hyun commented on SPARK-45482: --- Actually, I'm not sure about those three cases. Why don't we keep them for now because Java 21 keeps them still, [~LuciferYang] ? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. 
> [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} >
[jira] [Comment Edited] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773892#comment-17773892 ] Dongjoon Hyun edited comment on SPARK-45482 at 10/11/23 4:15 AM: - Actually, I'm not sure about those three cases. Yes, let's keep them for now because Java 21 keeps them still, [~LuciferYang] . was (Author: dongjoon): Actually, I'm not sure about those three cases. Why don't we keep them for now because Java 21 keeps them still, [~LuciferYang] ? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. 
> [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = >
[jira] [Updated] (SPARK-45468) More transparent proxy handling for HTTP redirects
[ https://issues.apache.org/jira/browse/SPARK-45468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nobuaki Sukegawa updated SPARK-45468: - Description:

Currently, proxies can be made transparent for hyperlinks in Spark web UIs with spark.ui.proxyRoot or the X-Forwarded-Context header alone. However, HTTP redirects (such as job/stage kill) currently also require an explicit spark.ui.proxyRedirectUri to handle the proxy. This is not ideal, as the proxy hostname may not be known at the time of configuring Spark apps.

This can be mitigated by 1) always prepending spark.ui.proxyRoot to the redirect path for those proxies that intelligently rewrite Location headers, and 2) using a path without hostname (/jobs/, not [https://example.com/jobs/]) for those proxies without Location header rewrites. Redirects would then behave basically the same way as other hyperlinks.

h2. Example
Let's say the proxy URL is [https://example.org/sparkui/]... forwarding to [http://drv.svc/]... and spark.ui.proxyRoot is configured to be /sparkui
h3. Existing behavior (without spark.ui.proxyRedirectUri)
job/stage kill links redirect to [http://drv.svc/jobs/] - likely 404
(other hyperlinks are to paths with the prefix, e.g., /sparkui/executors - works fine)
h3. After change 2)
links redirect to /sparkui/jobs/ - works fine, and consistent with other hyperlinks
NOTE: while the hostname was originally required by RFC 2616 in 1999, since RFC 7231 in 2014 the hostname can be formally omitted, as most browsers already supported it (it is rather hard to find any browser that doesn't support it).

was: Currently, proxies can be made transparent for hyperlinks in Spark web UIs with spark.ui.proxyRoot or X-Forwarded-Context header alone. However, HTTP redirects (such as job/stage kill) currently requires explicit spark.ui.proxyRedirectUri as well for handling proxy. This is not ideal as proxy hostname may not be known at the time configuring Spark apps. This can be mitigated by 1) always prepending spark.ui.proxyRoot to redirect path for those proxies that intelligently rewrite Location headers and 2) by using path without hostname (/jobs/, not https://example.com/jobs/) for those proxies without Location header rewrites. Then redirects behavior would be basically the same way as other hyperlinks. Regarding 2), while hostname was originally required in RFC 2616 in 1999, since RFC 7231 in 2014 hostname can be formally omitted as most browsers already supported it (it is rather hard to find any browser that doesn't support it).

> More transparent proxy handling for HTTP redirects
> --
>
> Key: SPARK-45468
> URL: https://issues.apache.org/jira/browse/SPARK-45468
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 3.5.0
> Reporter: Nobuaki Sukegawa
> Priority: Major
> Labels: pull-request-available
>
> Currently, proxies can be made transparent for hyperlinks in Spark web UIs with spark.ui.proxyRoot or the X-Forwarded-Context header alone. However, HTTP redirects (such as job/stage kill) currently also require an explicit spark.ui.proxyRedirectUri to handle the proxy. This is not ideal, as the proxy hostname may not be known at the time of configuring Spark apps.
> This can be mitigated by 1) always prepending spark.ui.proxyRoot to the redirect path for those proxies that intelligently rewrite Location headers, and 2) using a path without hostname (/jobs/, not [https://example.com/jobs/]) for those proxies without Location header rewrites. Redirects would then behave basically the same way as other hyperlinks.
> h2. Example
> Let's say the proxy URL is [https://example.org/sparkui/]... forwarding to [http://drv.svc/]... and spark.ui.proxyRoot is configured to be /sparkui
> h3. Existing behavior (without spark.ui.proxyRedirectUri)
> job/stage kill links redirect to [http://drv.svc/jobs/] - likely 404
> (other hyperlinks are to paths with the prefix, e.g., /sparkui/executors - works fine)
> h3. After change 2)
> links redirect to /sparkui/jobs/ - works fine, and consistent with other hyperlinks
> NOTE: while the hostname was originally required by RFC 2616 in 1999, since RFC 7231 in 2014 the hostname can be formally omitted, as most browsers already supported it (it is rather hard to find any browser that doesn't support it).
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
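To make change 2) concrete, here is a small illustrative sketch of emitting a host-relative redirect target with the proxy prefix (illustrative only, not the actual Spark UI code; reading the proxy root from a system property is an assumption):

{code:java}
// Build the redirect target from the configured proxy root, omitting scheme
// and host, which RFC 7231 permits in a Location header.
val proxyRoot: String = sys.props.getOrElse("spark.ui.proxyRoot", "")
val location: String = s"$proxyRoot/jobs/"
// e.g. "/sparkui/jobs/" instead of "http://drv.svc/jobs/": the browser resolves
// the path against the proxy host, so the redirect stays behind the proxy.
{code}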
[jira] [Assigned] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils
[ https://issues.apache.org/jira/browse/SPARK-45494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45494: - Assignee: Takuya Ueshin > Introduce read/write a byte array util functions for PythonWorkerUtils > -- > > Key: SPARK-45494 > URL: https://issues.apache.org/jira/browse/SPARK-45494 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils
[ https://issues.apache.org/jira/browse/SPARK-45494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45494. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43321 [https://github.com/apache/spark/pull/43321] > Introduce read/write a byte array util functions for PythonWorkerUtils > -- > > Key: SPARK-45494 > URL: https://issues.apache.org/jira/browse/SPARK-45494 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45310) Mapstatus location type changed from external shuffle service to executor after decommission migration
[ https://issues.apache.org/jira/browse/SPARK-45310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45310. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43112 [https://github.com/apache/spark/pull/43112] > Mapstatus location type changed from external shuffle service to executor > after decommission migration > -- > > Key: SPARK-45310 > URL: https://issues.apache.org/jira/browse/SPARK-45310 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When migrating shuffle blocks during decommission, the updated mapstatus > location doesn't respect the external shuffle service location when external > shuffle service is enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45496) Fix the compilation warning related to other-pure-statement
Yang Jie created SPARK-45496: Summary: Fix the compilation warning related to other-pure-statement Key: SPARK-45496 URL: https://issues.apache.org/jira/browse/SPARK-45496 Project: Spark Issue Type: Sub-task Components: DStreams, Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie

{code:java}
"-Wconf:cat=other-match-analysis&site=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv",
"-Wconf:cat=other-pure-statement&site=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv",
"-Wconf:cat=other-pure-statement&site=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv",
"-Wconf:cat=other-pure-statement&site=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv",
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
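For context, a minimal sketch of the kind of code that triggers the `other-pure-statement` warning (illustrative only; not the actual Spark code behind these suppressions):

{code:java}
def example(): Int = {
  val x = 41
  // warning [other-pure-statement]: a pure expression does nothing in
  // statement position; its result is silently discarded.
  x + 1
  x
}
{code}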
[jira] [Resolved] (SPARK-45416) Sanity check that Spark Connect returns arrow batches in order
[ https://issues.apache.org/jira/browse/SPARK-45416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45416. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43219 [https://github.com/apache/spark/pull/43219] > Sanity check that Spark Connect returns arrow batches in order > -- > > Key: SPARK-45416 > URL: https://issues.apache.org/jira/browse/SPARK-45416 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45416) Sanity check that Spark Connect returns arrow batches in order
[ https://issues.apache.org/jira/browse/SPARK-45416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45416: Assignee: Juliusz Sompolski > Sanity check that Spark Connect returns arrow batches in order > -- > > Key: SPARK-45416 > URL: https://issues.apache.org/jira/browse/SPARK-45416 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45473) Incorrect error message for RoundBase
[ https://issues.apache.org/jira/browse/SPARK-45473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45473: - Assignee: L. C. Hsieh > Incorrect error message for RoundBase > - > > Key: SPARK-45473 > URL: https://issues.apache.org/jira/browse/SPARK-45473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45473) Incorrect error message for RoundBase
[ https://issues.apache.org/jira/browse/SPARK-45473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45473. --- Fix Version/s: 3.5.1 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 43316 [https://github.com/apache/spark/pull/43316] > Incorrect error message for RoundBase > - > > Key: SPARK-45473 > URL: https://issues.apache.org/jira/browse/SPARK-45473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0, 3.4.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45464) [CORE] Fix yarn distribution build
[ https://issues.apache.org/jira/browse/SPARK-45464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45464. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43289 [https://github.com/apache/spark/pull/43289] > [CORE] Fix yarn distribution build > -- > > Key: SPARK-45464 > URL: https://issues.apache.org/jira/browse/SPARK-45464 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Assignee: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [https://github.com/apache/spark/pull/43164] introduced a regression in: > > ``` > ./dev/make-distribution.sh --tgz -Phive -Phive-thriftserver -Pyarn > ``` > > this needs to be fixed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45464) [CORE] Fix yarn distribution build
[ https://issues.apache.org/jira/browse/SPARK-45464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45464: Assignee: Hasnain Lakhani > [CORE] Fix yarn distribution build > -- > > Key: SPARK-45464 > URL: https://issues.apache.org/jira/browse/SPARK-45464 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Assignee: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > > [https://github.com/apache/spark/pull/43164] introduced a regression in: > > ``` > ./dev/make-distribution.sh --tgz -Phive -Phive-thriftserver -Pyarn > ``` > > this needs to be fixed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45495) Support stage level task resource profile for k8s cluster when dynamic allocation disabled
Bobby Wang created SPARK-45495: -- Summary: Support stage level task resource profile for k8s cluster when dynamic allocation disabled Key: SPARK-45495 URL: https://issues.apache.org/jira/browse/SPARK-45495 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.1 Reporter: Bobby Wang Assignee: Bobby Wang Fix For: 4.0.0, 3.5.1

[https://github.com/apache/spark/pull/37268] has introduced a new feature that supports stage-level scheduling of task resource profiles for standalone clusters when dynamic allocation is disabled. It's a really cool feature, especially for ML/DL cases; more details can be found in that PR.

The problem here is that the feature is only available for standalone clusters for now, but most users would also expect it to be usable on other Spark clusters like YARN and K8s.

So I filed this issue to track this task.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45495) Support stage level task resource profile for k8s cluster when dynamic allocation disabled
[ https://issues.apache.org/jira/browse/SPARK-45495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bobby Wang updated SPARK-45495: --- Description:

[https://github.com/apache/spark/pull/37268] has introduced a new feature that supports stage-level scheduling of task resource profiles for standalone clusters when dynamic allocation is disabled. It's a really cool feature, especially for ML/DL cases; more details can be found in that PR.

The problem here is that the feature is only available for standalone and YARN clusters for now, but most users would also expect it to be usable on other Spark clusters like K8s.

So I filed this issue to track this task.

was: [https://github.com/apache/spark/pull/37268] has introduced a new feature that supports stage-level scheduling of task resource profiles for standalone clusters when dynamic allocation is disabled. It's a really cool feature, especially for ML/DL cases; more details can be found in that PR. The problem here is that the feature is only available for standalone clusters for now, but most users would also expect it to be usable on other Spark clusters like YARN and K8s. So I filed this issue to track this task.

> Support stage level task resource profile for k8s cluster when dynamic allocation disabled
> --
>
> Key: SPARK-45495
> URL: https://issues.apache.org/jira/browse/SPARK-45495
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Bobby Wang
> Assignee: Bobby Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
> [https://github.com/apache/spark/pull/37268] has introduced a new feature that supports stage-level scheduling of task resource profiles for standalone clusters when dynamic allocation is disabled. It's a really cool feature, especially for ML/DL cases; more details can be found in that PR.
>
> The problem here is that the feature is only available for standalone and YARN clusters for now, but most users would also expect it to be usable on other Spark clusters like K8s.
>
> So I filed this issue to track this task.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
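For reference, a short sketch of the stage-level task resource profile usage this feature enables. The API shown is the existing stage-level scheduling API; the `gpu` resource name, the amounts, and the pre-existing SparkContext `sc` are assumptions for illustration:

{code:java}
import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}

// Request 1 CPU and 1 GPU per task for this stage only, without changing
// executor resources (a task-resource-only profile, dynamic allocation off).
val taskReqs = new TaskResourceRequests().cpus(1).resource("gpu", 1)
val profile = new ResourceProfileBuilder().require(taskReqs).build()

val doubled = sc.parallelize(1 to 100, numSlices = 4)
  .withResources(profile)
  .mapPartitions(iter => iter.map(_ * 2))
  .collect()
{code}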
[jira] [Updated] (SPARK-43828) Add config to control whether close idle connection
[ https://issues.apache.org/jira/browse/SPARK-43828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43828: --- Labels: pull-request-available (was: ) > Add config to control whether close idle connection > --- > > Key: SPARK-43828 > URL: https://issues.apache.org/jira/browse/SPARK-43828 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils
[ https://issues.apache.org/jira/browse/SPARK-45494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45494: --- Labels: pull-request-available (was: ) > Introduce read/write a byte array util functions for PythonWorkerUtils > -- > > Key: SPARK-45494 > URL: https://issues.apache.org/jira/browse/SPARK-45494 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils
Takuya Ueshin created SPARK-45494: - Summary: Introduce read/write a byte array util functions for PythonWorkerUtils Key: SPARK-45494 URL: https://issues.apache.org/jira/browse/SPARK-45494 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45493) Replace: _LEGACY_ERROR_TEMP_2187 with a better error message
Serge Rielau created SPARK-45493: Summary: Replace: _LEGACY_ERROR_TEMP_2187 with a better error message Key: SPARK-45493 URL: https://issues.apache.org/jira/browse/SPARK-45493 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

{code:java}
def convertHiveTableToCatalogTableError(
    e: SparkException, dbName: String, tableName: String): Throwable = {
  new SparkException(
    errorClass = "_LEGACY_ERROR_TEMP_2187",
    messageParameters = Map(
      "message" -> e.getMessage,
      "dbName" -> dbName,
      "tableName" -> tableName),
    cause = e)
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45492) Replace: _LEGACY_ERROR_TEMP_2152 with a better error class
Serge Rielau created SPARK-45492: Summary: Replace: _LEGACY_ERROR_TEMP_2152 with a better error class Key: SPARK-45492 URL: https://issues.apache.org/jira/browse/SPARK-45492 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

{code:java}
def expressionEncodingError(e: Exception, expressions: Seq[Expression]): SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "_LEGACY_ERROR_TEMP_2152",
    messageParameters = Map(
      "e" -> e.toString(),
      "expressions" -> expressions.map(
        _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")),
    cause = e)
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45491) Replace: _LEGACY_ERROR_TEMP_2196 with a better error class
Serge Rielau created SPARK-45491: Summary: Replace: _LEGACY_ERROR_TEMP_2196 with a better error class Key: SPARK-45491 URL: https://issues.apache.org/jira/browse/SPARK-45491 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

{code:java}
def cannotFetchTablesOfDatabaseError(dbName: String, e: Exception): Throwable = {
  new SparkException(
    errorClass = "_LEGACY_ERROR_TEMP_2196",
    messageParameters = Map(
      "dbName" -> dbName),
    cause = e)
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45490) Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class
Serge Rielau created SPARK-45490: Summary: Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class Key: SPARK-45490 URL: https://issues.apache.org/jira/browse/SPARK-45490 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

{code:java}
def expressionDecodingError(e: Exception, expressions: Seq[Expression]): SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "_LEGACY_ERROR_TEMP_2151",
    messageParameters = Map(
      "e" -> e.toString(),
      "expressions" -> expressions.map(
        _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")),
    cause = e)
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45489) Replace: _LEGACY_ERROR_TEMP_2134 with a regular error class
Serge Rielau created SPARK-45489: Summary: Replace: _LEGACY_ERROR_TEMP_2134 with a regular error class Key: SPARK-45489 URL: https://issues.apache.org/jira/browse/SPARK-45489 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

This is a frequently seen error we should convert:

{code:java}
def cannotParseStringAsDataTypeError(pattern: String, value: String, dataType: DataType)
    : SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "_LEGACY_ERROR_TEMP_2134",
    messageParameters = Map(
      "value" -> toSQLValue(value),
      "pattern" -> toSQLValue(pattern),
      "dataType" -> dataType.toString))
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
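A hedged sketch of what such a conversion could look like. The error class name `CANNOT_PARSE_STRING_AS_DATATYPE` is hypothetical and would need a matching entry in error-classes.json; `toSQLValue` and `toSQLType` are assumed available from `QueryErrorsBase`:

{code:java}
def cannotParseStringAsDataTypeError(pattern: String, value: String, dataType: DataType)
    : SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "CANNOT_PARSE_STRING_AS_DATATYPE", // hypothetical error class name
    messageParameters = Map(
      "value" -> toSQLValue(value),
      "pattern" -> toSQLValue(pattern),
      "dataType" -> toSQLType(dataType)))
}
{code}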
[jira] [Updated] (SPARK-45488) XML: Add support for value in 'rowTag' element
[ https://issues.apache.org/jira/browse/SPARK-45488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45488: --- Labels: pull-request-available (was: )

> XML: Add support for value in 'rowTag' element
> --
>
> Key: SPARK-45488
> URL: https://issues.apache.org/jira/browse/SPARK-45488
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Sandip Agarwala
> Priority: Major
> Labels: pull-request-available
>
> The following XML with rowTag 'book' will yield a schema with just the "_id" column and not the value:
>
> {code:java}
> <book id="1">Great Book</book>{code}
> Let's parse the value as well.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45488) XML: Add support for value in 'rowTag' element
Sandip Agarwala created SPARK-45488: --- Summary: XML: Add support for value in 'rowTag' element Key: SPARK-45488 URL: https://issues.apache.org/jira/browse/SPARK-45488 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Sandip Agarwala The following XML with rowTag 'book' will yield a schema with just "_id" column and not the value: {code:java} <book id="1">Great Book</book>{code} Let's parse the value as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
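To make the gap concrete, a small sketch of how this reads today versus the desired behavior (reader API per the native XML source targeted for 4.0; the _VALUE column name follows the spark-xml valueTag convention and is an assumption here):
{code:java}
val df = spark.read
  .option("rowTag", "book")
  .xml("books.xml")

// Today:   root |-- _id: string          (attribute only; "Great Book" is dropped)
// Desired: root |-- _id: string
//               |-- _VALUE: string       (the element's character data)
{code}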
[jira] [Updated] (SPARK-44076) SPIP: Python Data Source API
[ https://issues.apache.org/jira/browse/SPARK-44076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-44076: - Affects Version/s: 4.0.0 (was: 3.5.0) > SPIP: Python Data Source API > > > Key: SPARK-44076 > URL: https://issues.apache.org/jira/browse/SPARK-44076 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > This proposal aims to introduce a simple API in Python for Data Sources. The > idea is to enable Python developers to create data sources without having to > learn Scala or deal with the complexities of the current data source APIs. > The goal is to make a Python-based API that is simple and easy to use, thus > making Spark more accessible to the wider Python developer community. This > proposed approach is based on the recently introduced Python user-defined > table functions (SPARK-43797) with extensions to support data sources. > {*}SPIP{*}: > [https://docs.google.com/document/d/1oYrCKEKHzznljYfJO4kx5K_Npcgt1Slyfph3NEk7JRU/edit?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45486) [CONNECT][SCALA] Update user agent to recognize SPARK_CONNECT_USER_AGENT and include environment specific attributes
[ https://issues.apache.org/jira/browse/SPARK-45486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45486: --- Labels: pull-request-available (was: ) > [CONNECT][SCALA] Update user agent to recognize SPARK_CONNECT_USER_AGENT and > include environment specific attributes > > > Key: SPARK-45486 > URL: https://issues.apache.org/jira/browse/SPARK-45486 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Robert Dillitz >Priority: Major > Labels: pull-request-available > > Similar to the [Python > client,|https://github.com/apache/spark/blob/2cc1ee4d3a05a641d7a245f015ef824d8f7bae8b/python/pyspark/sql/connect/client/core.py#L284] > the Scala client should: > # Recognize SPARK_CONNECT_USER_AGENT env variable > # Always include the OS, Java version, Scala version, and Spark version -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45477) Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test results/unit tests log` naming
[ https://issues.apache.org/jira/browse/SPARK-45477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45477. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43305 [https://github.com/apache/spark/pull/43305] > Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test > results/unit tests log` naming > --- > > Key: SPARK-45477 > URL: https://issues.apache.org/jira/browse/SPARK-45477 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45477) Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test results/unit tests log` naming
[ https://issues.apache.org/jira/browse/SPARK-45477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45477: - Assignee: Yang Jie > Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test > results/unit tests log` naming > --- > > Key: SPARK-45477 > URL: https://issues.apache.org/jira/browse/SPARK-45477 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45487) Replace: _LEGACY_ERROR_TEMP_3007
Serge Rielau created SPARK-45487: Summary: Replace: _LEGACY_ERROR_TEMP_3007 Key: SPARK-45487 URL: https://issues.apache.org/jira/browse/SPARK-45487 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau {code:java} def checkpointRDDBlockIdNotFoundError(rddBlockId: RDDBlockId): Throwable = { new SparkException( errorClass = "_LEGACY_ERROR_TEMP_3007", messageParameters = Map("rddBlockId" -> s"$rddBlockId"), cause = null ) } {code} This error condition appears to be quite common, so we should convert it to a proper error class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45486) [CONNECT][SCALA] Update user agent to recognize SPARK_CONNECT_USER_AGENT and include environment specific attributes
Robert Dillitz created SPARK-45486: -- Summary: [CONNECT][SCALA] Update user agent to recognize SPARK_CONNECT_USER_AGENT and include environment specific attributes Key: SPARK-45486 URL: https://issues.apache.org/jira/browse/SPARK-45486 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Robert Dillitz Similar to the [Python client,|https://github.com/apache/spark/blob/2cc1ee4d3a05a641d7a245f015ef824d8f7bae8b/python/pyspark/sql/connect/client/core.py#L284] the Scala client should: # Recognize SPARK_CONNECT_USER_AGENT env variable # Always include the OS, Java version, Scala version, and Spark version -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
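A minimal sketch of how the Scala client could assemble such a user agent string, mirroring the linked Python logic (the names and format here are illustrative, not the actual implementation):
{code:java}
// Sketch: honor SPARK_CONNECT_USER_AGENT if set, then always append
// environment attributes, as the Python client does.
private def buildUserAgent(): String = {
  val base = sys.env.getOrElse("SPARK_CONNECT_USER_AGENT", "_SPARK_CONNECT_SCALA")
  Seq(
    base,
    s"spark/${org.apache.spark.SPARK_VERSION}",
    s"scala/${scala.util.Properties.versionNumberString}",
    s"jvm/${System.getProperty("java.version")}",
    s"os/${System.getProperty("os.name").toLowerCase}"
  ).mkString(" ")
}
{code}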
[jira] [Created] (SPARK-45485) make add_artifact idempotent
Alice Sayutina created SPARK-45485: -- Summary: make add_artifact idempotent Key: SPARK-45485 URL: https://issues.apache.org/jira/browse/SPARK-45485 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0, 4.0.0 Reporter: Alice Sayutina # Make the add_artifact request idempotent, i.e. subsequent requests succeed if the same content is provided. This makes retrying safer. # Fix the existing error handling mechanism -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
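One way to get that idempotency, shown only as an illustrative sketch (not the actual Spark Connect implementation): key each artifact by a content hash, so a retried request carrying identical bytes is accepted as a no-op, while the same name with different content is rejected.
{code:java}
import java.security.MessageDigest
import scala.collection.concurrent.TrieMap

object ArtifactStore {
  private val stored = TrieMap.empty[String, String]  // artifact name -> content hash

  private def sha256(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(bytes).map("%02x".format(_)).mkString

  /** Returns true if the artifact is newly added or identical to the existing one. */
  def addArtifact(name: String, bytes: Array[Byte]): Boolean = {
    val hash = sha256(bytes)
    stored.putIfAbsent(name, hash) match {
      case None           => true              // first upload wins
      case Some(existing) => existing == hash  // retry with the same content succeeds
    }
  }
}
{code}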
[jira] [Updated] (SPARK-45204) Allow CommandPlugins to be trackable
[ https://issues.apache.org/jira/browse/SPARK-45204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45204: --- Labels: connect pull-request-available scala (was: connect scala) > Allow CommandPlugins to be trackable > > > Key: SPARK-45204 > URL: https://issues.apache.org/jira/browse/SPARK-45204 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Robert Dillitz >Priority: Major > Labels: connect, pull-request-available, scala > > There is currently no way to track the QueryStatementType & compilation time > for queries executed by a CommandPlugin. I propose to change > SparkConnectPlanner to hold an optional ExecuteHolder that can then be used > by plugins. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45484) Fix the bug that uses incorrect parquet compression codec lz4raw
[ https://issues.apache.org/jira/browse/SPARK-45484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45484: --- Labels: pull-request-available (was: ) > Fix the bug that uses incorrect parquet compression codec lz4raw > > > Key: SPARK-45484 > URL: https://issues.apache.org/jira/browse/SPARK-45484 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0, 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > lz4raw is not a correct parquet compression codec name. > We should use lz4_raw as its name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151
[ https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-45213: Assignee: Deng Ziming (was: Haejoon Lee) > Assign name to _LEGACY_ERROR_TEMP_2151 > -- > > Key: SPARK-45213 > URL: https://issues.apache.org/jira/browse/SPARK-45213 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Assignee: Deng Ziming >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating > dataset"), we are using _LEGACY_ERROR_TEMP_2151. We should use a proper error > class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before starting to work on it, to avoid > two people working on the same ticket at the same time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151
[ https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45213. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43029 [https://github.com/apache/spark/pull/43029] > Assign name to _LEGACY_ERROR_TEMP_2151 > -- > > Key: SPARK-45213 > URL: https://issues.apache.org/jira/browse/SPARK-45213 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating > dataset"), we are using _LEGACY_ERROR_TEMP_2151. We should use a proper error > class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before starting to work on it, to avoid > two people working on the same ticket at the same time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45484) Fix the bug that uses incorrect parquet compression codec lz4raw
Jiaan Geng created SPARK-45484: -- Summary: Fix the bug that uses incorrect parquet compression codec lz4raw Key: SPARK-45484 URL: https://issues.apache.org/jira/browse/SPARK-45484 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng lz4raw is not a correct parquet compression codec name. We should use lz4_raw as its name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
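For reference, the codec is exercised through the normal configuration and write paths; under the fix the accepted spelling would be lz4_raw (sketch, assuming a Parquet build that ships the LZ4_RAW codec):
{code:java}
spark.conf.set("spark.sql.parquet.compression.codec", "lz4_raw")
// or per write:
df.write.option("compression", "lz4_raw").parquet("/tmp/t_lz4_raw")
{code}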
[jira] [Commented] (SPARK-32494) Null Aware Anti Join Optimize Support Multi-Column
[ https://issues.apache.org/jira/browse/SPARK-32494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773659#comment-17773659 ] Runkang He commented on SPARK-32494: Hi [~leanken], is this feature available in Spark's master branch? I can't find it in master now. > Null Aware Anti Join Optimize Support Multi-Column > -- > > Key: SPARK-32494 > URL: https://issues.apache.org/jira/browse/SPARK-32494 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Leanken.Lin >Priority: Major > > In SPARK-32290, we managed to optimize BroadcastNestedLoopJoin into > BroadcastHashJoin in the single-column NAAJ scenario, by using a hash > lookup instead of a loop join. > It is simple to fulfill the "NOT IN" logic when there is a single key, but > a multi-column NOT IN is much more complicated because of all the null-aware > comparisons. > FYI, the expected logic for the single- and multi-column cases is defined in > ~/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/not-in-unit-tests-single-column.sql > ~/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/not-in-unit-tests-multi-column.sql > > Hence, we propose a new type of HashedRelation: NullAwareHashedRelation. > NullAwareHashedRelation: > # does not skip any-null-column keys the way LongHashedRelation and > UnsafeHashedRelation do. > # while being built, has extra keys put into the > relation, so that null-aware column comparison can be done as a hash lookup. > The duplication is 2^numKeys - 1 times; for example, to > support NAAJ with a 3-column join key, the build side is expanded > (2^3 - 1) times, i.e. 7X. > For example, if there is an UnsafeRow key (1,2,3), > in null-aware mode it is expanded into 7 keys: the original plus the extra C(3,1) and > C(3,2) combinations, where each combination duplicates the record with > null padding as follows. > Original record > (1,2,3) > Extra records to be appended into the NullAwareHashedRelation > (null, 2, 3) (1, null, 3) (1, 2, null) > (null, null, 3) (null, 2, null) (1, null, null) > With the expanded data we can extract a common pattern for both the single- and > multi-column cases. allNull refers to an UnsafeRow whose columns are all null. > * build side is empty input => return all rows > * an all-null-column key exists in the build side input => reject all rows > * if streamedSideRow.allNull is true => drop the row > * if streamedSideRow.allNull is false & a match is found in the NullAwareHashedRelation > => drop the row > * if streamedSideRow.allNull is false & no match is found in the > NullAwareHashedRelation => return the row > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
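A small self-contained sketch of the expansion described above (illustrative only, not the NullAwareHashedRelation code itself): every bitmask except the all-null one selects the key positions to replace with null, yielding the 2^n - 1 keys from the example.
{code:java}
def expandNullAware(key: Seq[Any]): Seq[Seq[Any]] = {
  val n = key.length
  val allNullMask = (1 << n) - 1
  // masks 0 .. allNullMask-1: mask 0 keeps the original row, the rest null-pad it;
  // the all-null combination is excluded, matching the 2^n - 1 expansion.
  (0 until allNullMask).map { mask =>
    key.zipWithIndex.map { case (v, i) =>
      if ((mask & (1 << i)) != 0) null else v
    }
  }
}

// expandNullAware(Seq(1, 2, 3)) yields (1,2,3), (null,2,3), (1,null,3),
// (null,null,3), (1,2,null), (null,2,null), (1,null,null): 7 keys in total.
{code}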
[jira] [Resolved] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45479. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43306 [https://github.com/apache/spark/pull/43306] > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45479: Assignee: Hyukjin Kwon > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45204) Allow CommandPlugins to be trackable
[ https://issues.apache.org/jira/browse/SPARK-45204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Dillitz updated SPARK-45204: --- Summary: Allow CommandPlugins to be trackable (was: Extend CommandPlugins to be trackable) > Allow CommandPlugins to be trackable > > > Key: SPARK-45204 > URL: https://issues.apache.org/jira/browse/SPARK-45204 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Robert Dillitz >Priority: Major > Labels: connect, scala > > There is currently no way to track the QueryStatementType & compilation time > for queries executed by a CommandPlugin. I propose to create an extended > CommandPlugin interface that also offers a process() method that accepts a > QueryPlanningTracker. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
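A minimal sketch of what such an extended interface could look like (all names and signatures here are hypothetical; the shipped API may differ, and the later direction on this ticket passes an ExecuteHolder through SparkConnectPlanner instead):
{code:java}
import org.apache.spark.sql.catalyst.QueryPlanningTracker

// Sketch only: a trackable variant of the Spark Connect command plugin.
trait CommandPlugin {
  def process(command: Array[Byte]): Boolean
}

trait TrackableCommandPlugin extends CommandPlugin {
  // Same command, but with a tracker so the plugin can record the
  // statement type and compilation time alongside regular queries.
  def process(command: Array[Byte], tracker: QueryPlanningTracker): Boolean
}
{code}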
[jira] [Updated] (SPARK-45483) Correct the function groups in connect.functions
[ https://issues.apache.org/jira/browse/SPARK-45483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45483: --- Labels: pull-request-available (was: ) > Correct the function groups in connect.functions > > > Key: SPARK-45483 > URL: https://issues.apache.org/jira/browse/SPARK-45483 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45483) Correct the function groups in connect.functions
Ruifeng Zheng created SPARK-45483: - Summary: Correct the function groups in connect.functions Key: SPARK-45483 URL: https://issues.apache.org/jira/browse/SPARK-45483 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773616#comment-17773616 ] Yang Jie edited comment on SPARK-45482 at 10/10/23 9:38 AM: For case 1, the logic can be changed to {code:java} private[serializer] var enableDebugging: Boolean = { !JBoolean.getBoolean("sun.io.serialization.extendedDebugInfo") } {code} For case 3, this is an unused method in Spark; perhaps it can be deleted directly? But for case 2, can we directly clean up the usage of AccessControlContext and AccessController? was (Author: luciferyang): For case 1, the logic can be changed to {code:java} private[serializer] var enableDebugging: Boolean = { !JBoolean.getBoolean("sun.io.serialization.extendedDebugInfo") } {code} For case 3, this is an unused method in Spark; perhaps it can be deleted directly? But for case 2, can we directly clean up the usage of AccessControlContext and AccessController? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773617#comment-17773617 ] Yang Jie commented on SPARK-45482: -- friendly ping [~dongjoon] Do you have any suggestions on this? Or should we keep these usages for now? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773616#comment-17773616 ] Yang Jie commented on SPARK-45482: -- For case 1, the logic can be changed to {code:java} private[serializer] var enableDebugging: Boolean = { !JBoolean.getBoolean("sun.io.serialization.extendedDebugInfo") } {code} For case 3, this is an unused method in Spark; perhaps it can be deleted directly? But for case 2, can we directly clean up the usage of AccessControlContext and AccessController? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
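On the case 2 question, one possible direction, sketched under the assumption of a JDK that provides Subject.current() (added in Java 18 as the Security-Manager-free way to obtain the current Subject):
{code:java}
import javax.security.auth.Subject

// Sketch: obtain the current Subject without touching the deprecated
// AccessControlContext / AccessController pair.
val subject: Subject = Subject.current()
{code}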
[jira] [Updated] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45482: - Summary: Handle the usage of AccessControlContext and AccessController. (was: Clean up the usage of `AccessControlContext` and `AccessController`) > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45482) Clean up the usage of `AccessControlContext` and `AccessController`
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45482: - Description: {code:java} * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class. */ @Deprecated(since="17", forRemoval=true) public final class AccessController { * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class. */ @Deprecated(since="17", forRemoval=true) public final class AccessControlContext { {code} `AccessControlContext` and `AccessController` are marked as deprecated in Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that they do not have corresponding replacements. In Spark, there are three files that use AccessControlContext or AccessController: 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] {code:java} private[serializer] var enableDebugging: Boolean = { !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( "sun.io.serialization.extendedDebugInfo")).booleanValue() } {code} 2. [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] {code:java} public void open() throws TTransportException { try { AccessControlContext context = AccessController.getContext(); Subject subject = Subject.getSubject(context); Subject.doAs(subject, (PrivilegedExceptionAction) () -> { try { wrapped.open(); } catch (TTransportException tte) { // Wrap the transport exception in an RTE, since Subject.doAs() then goes // and unwraps this for us out of the doAs block. We then unwrap one // more time in our catch clause to get back the TTE. (ugh) throw new RuntimeException(tte); } return null; }); } catch (PrivilegedActionException ioe) { throw new RuntimeException("Received an ioe we never threw!", ioe); } catch (RuntimeException rte) { if (rte.getCause() instanceof TTransportException) { throw (TTransportException) rte.getCause(); } else { throw rte; } } } {code} 3. [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] {code:java} public static String getKerberosServiceTicket(String principal, String host, String serverHttpUrl, boolean assumeSubject) throws Exception { String serverPrincipal = ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, host); if (assumeSubject) { // With this option, we're assuming that the external application, // using the JDBC driver has done a JAAS kerberos login already AccessControlContext context = AccessController.getContext(); Subject subject = Subject.getSubject(context); if (subject == null) { throw new Exception("The Subject is not set"); } return Subject.doAs(subject, new HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); } else { // JAAS login from ticket cache to setup the client UserGroupInformation UserGroupInformation clientUGI = ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); } } {code} was: {code:java} * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class.
[jira] [Updated] (SPARK-45482) Clean up the usage of `AccessControlContext` and `AccessController`
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45482: - Description: {code:java} * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class. */ @Deprecated(since="17", forRemoval=true) public final class AccessController { * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class. */ @Deprecated(since="17", forRemoval=true) public final class AccessControlContext { {code} `AccessControlContext` and `AccessController` are marked as deprecated in Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that they do not have corresponding replacements. In Spark, there are three files that use AccessControlContext or AccessController: # [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] {code:java} private[serializer] var enableDebugging: Boolean = { !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( "sun.io.serialization.extendedDebugInfo")).booleanValue() } {code} # [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] {code:java} public void open() throws TTransportException { try { AccessControlContext context = AccessController.getContext(); Subject subject = Subject.getSubject(context); Subject.doAs(subject, (PrivilegedExceptionAction) () -> { try { wrapped.open(); } catch (TTransportException tte) { // Wrap the transport exception in an RTE, since Subject.doAs() then goes // and unwraps this for us out of the doAs block. We then unwrap one // more time in our catch clause to get back the TTE. (ugh) throw new RuntimeException(tte); } return null; }); } catch (PrivilegedActionException ioe) { throw new RuntimeException("Received an ioe we never threw!", ioe); } catch (RuntimeException rte) { if (rte.getCause() instanceof TTransportException) { throw (TTransportException) rte.getCause(); } else { throw rte; } } } {code} # [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] {code:java} public static String getKerberosServiceTicket(String principal, String host, String serverHttpUrl, boolean assumeSubject) throws Exception { String serverPrincipal = ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, host); if (assumeSubject) { // With this option, we're assuming that the external application, // using the JDBC driver has done a JAAS kerberos login already AccessControlContext context = AccessController.getContext(); Subject subject = Subject.getSubject(context); if (subject == null) { throw new Exception("The Subject is not set"); } return Subject.doAs(subject, new HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); } else { // JAAS login from ticket cache to setup the client UserGroupInformation UserGroupInformation clientUGI = ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); } } {code} > Clean up the usage of `AccessControlContext` and `AccessController` > --- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class.
[jira] [Assigned] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45481: -- Assignee: Apache Spark (was: Jiaan Geng) > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Currently, Spark supports most of the parquet compression codecs, but the > codecs parquet supports and the ones Spark supports do not map one-to-one. > There are a lot of magic strings copied from the parquet compression codecs, so > developers need to maintain their consistency manually. This is > error-prone and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: (was: Apache Spark) > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: Apache Spark > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45481: --- Labels: pull-request-available (was: ) > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, Spark supports most of the parquet compression codecs, but the > codecs parquet supports and the ones Spark supports do not map one-to-one. > There are a lot of magic strings copied from the parquet compression codecs, so > developers need to maintain their consistency manually. This is > error-prone and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
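A sketch of the kind of mapper this describes (hypothetical object and mapping, not the merged implementation): centralize the Spark-facing names next to Parquet's own CompressionCodecName so no magic strings are scattered around.
{code:java}
import org.apache.parquet.hadoop.metadata.CompressionCodecName

object ParquetCompressionCodec {
  // Single source of truth: Spark option value -> Parquet codec.
  val mapping: Map[String, CompressionCodecName] = Map(
    "uncompressed" -> CompressionCodecName.UNCOMPRESSED,
    "snappy"       -> CompressionCodecName.SNAPPY,
    "gzip"         -> CompressionCodecName.GZIP,
    "zstd"         -> CompressionCodecName.ZSTD,
    "lz4_raw"      -> CompressionCodecName.LZ4_RAW
  )

  def fromName(name: String): CompressionCodecName =
    mapping.getOrElse(name.toLowerCase,
      throw new IllegalArgumentException(s"Unknown parquet codec: $name"))
}
{code}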
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: (was: Apache Spark) > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: Apache Spark > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45482) Clean up the usage of `AccessControlContext` and `AccessController`
Yang Jie created SPARK-45482: Summary: Clean up the usage of `AccessControlContext` and `AccessController` Key: SPARK-45482 URL: https://issues.apache.org/jira/browse/SPARK-45482 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: (was: Apache Spark) > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: Apache Spark > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43664) Fix TABLE_OR_VIEW_NOT_FOUND from SQLParityTests
[ https://issues.apache.org/jira/browse/SPARK-43664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-43664: -- Assignee: Apache Spark > Fix TABLE_OR_VIEW_NOT_FOUND from SQLParityTests > --- > > Key: SPARK-43664 > URL: https://issues.apache.org/jira/browse/SPARK-43664 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Repro: run `SQLParityTests.test_sql_with_index_col` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43664) Fix TABLE_OR_VIEW_NOT_FOUND from SQLParityTests
[ https://issues.apache.org/jira/browse/SPARK-43664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-43664: -- Assignee: (was: Apache Spark) > Fix TABLE_OR_VIEW_NOT_FOUND from SQLParityTests > --- > > Key: SPARK-43664 > URL: https://issues.apache.org/jira/browse/SPARK-43664 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Repro: run `SQLParityTests.test_sql_with_index_col` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45480) Selectable SQL Plan
[ https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45480: -- Assignee: (was: Apache Spark) > Selectable SQL Plan > --- > > Key: SPARK-45480 > URL: https://issues.apache.org/jira/browse/SPARK-45480 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45480) Selectable SQL Plan
[ https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45480: -- Assignee: Apache Spark > Selectable SQL Plan > --- > > Key: SPARK-45480 > URL: https://issues.apache.org/jira/browse/SPARK-45480 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng reassigned SPARK-45481: -- Assignee: Jiaan Geng > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, Spark supports most of the Parquet compression codecs, but the > codecs Parquet supports and the ones Spark supports do not map one-to-one. > There are many magic strings copied from the Parquet compression codecs, so > developers have to maintain their consistency manually. This is error-prone > and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
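For illustration, such a mapper could look roughly like the sketch below. This is a minimal sketch under stated assumptions, not the actual change: the object name `ParquetCodecMapper` and its members are hypothetical, while `CompressionCodecName` is the real Parquet enum whose values the magic strings currently shadow.
{code:java}
import java.util.Locale

import org.apache.parquet.hadoop.metadata.CompressionCodecName

// Hypothetical single source of truth for the Spark-side codec names,
// so callers stop copying magic strings from Parquet.
object ParquetCodecMapper {
  private val sparkToParquet: Map[String, CompressionCodecName] = Map(
    "none" -> CompressionCodecName.UNCOMPRESSED,
    "uncompressed" -> CompressionCodecName.UNCOMPRESSED,
    "snappy" -> CompressionCodecName.SNAPPY,
    "gzip" -> CompressionCodecName.GZIP,
    "lzo" -> CompressionCodecName.LZO,
    "brotli" -> CompressionCodecName.BROTLI,
    "lz4" -> CompressionCodecName.LZ4,
    "zstd" -> CompressionCodecName.ZSTD)

  // Resolves a Spark option value to the Parquet enum, failing fast on typos.
  def toParquet(sparkName: String): CompressionCodecName =
    sparkToParquet.getOrElse(sparkName.toLowerCase(Locale.ROOT),
      throw new IllegalArgumentException(s"Unsupported parquet codec: $sparkName"))
}
{code}
With one mapping owned in one place, both the write path and tests can resolve codec names through the same method instead of each re-declaring the strings.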
[jira] [Updated] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45481: --- Description: Currently, Spark supports most of the Parquet compression codecs, but the codecs Parquet supports and the ones Spark supports do not map one-to-one. There are many magic strings copied from the Parquet compression codecs, so developers have to maintain their consistency manually. This is error-prone and reduces development efficiency. was: Currently, StorageLevel provides fromString to obtain a StorageLevel instance from its name, so developers or users have to copy the string literal of the StorageLevel name to set or get a StorageLevel instance. Developers therefore need to maintain its consistency manually, which is error-prone and reduces development efficiency. > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Priority: Major > > Currently, Spark supports most of the Parquet compression codecs, but the > codecs Parquet supports and the ones Spark supports do not map one-to-one. > There are many magic strings copied from the Parquet compression codecs, so > developers have to maintain their consistency manually. This is error-prone > and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45356) Optimize the Maven daily test configuration
[ https://issues.apache.org/jira/browse/SPARK-45356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45356: -- Assignee: (was: Apache Spark) > Optimize the Maven daily test configuration > --- > > Key: SPARK-45356 > URL: https://issues.apache.org/jira/browse/SPARK-45356 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45480) Selectable SQL Plan
[ https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45480: --- Labels: pull-request-available (was: ) > Selectable SQL Plan > --- > > Key: SPARK-45480 > URL: https://issues.apache.org/jira/browse/SPARK-45480 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45481) Introduce a mapper for parquet compression codecs
Jiaan Geng created SPARK-45481: -- Summary: Introduce a mapper for parquet compression codecs Key: SPARK-45481 URL: https://issues.apache.org/jira/browse/SPARK-45481 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45481: --- Description: Currently, StorageLevel provides fromString to obtain a StorageLevel instance from its name, so developers or users have to copy the string literal of the StorageLevel name to set or get a StorageLevel instance. Developers therefore need to maintain its consistency manually, which is error-prone and reduces development efficiency. > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Priority: Major > > Currently, StorageLevel provides fromString to obtain a StorageLevel instance > from its name, so developers or users have to copy the string literal of the > StorageLevel name to set or get a StorageLevel instance. Developers therefore > need to maintain its consistency manually, which is error-prone and reduces > development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
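The string coupling this description criticizes is easy to see with the real StorageLevel.fromString API — a minimal spark-shell example (the level name shown is just one valid value):
{code:java}
import org.apache.spark.storage.StorageLevel

// The caller must spell the level name exactly as Spark defines it;
// a typo such as "MEMORY_AND_DISK_SERR" only fails at runtime.
val level: StorageLevel = StorageLevel.fromString("MEMORY_AND_DISK_SER")
{code}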
[jira] [Created] (SPARK-45480) Selectable SQL Plan
Kent Yao created SPARK-45480: Summary: Selectable SQL Plan Key: SPARK-45480 URL: https://issues.apache.org/jira/browse/SPARK-45480 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45479: --- Labels: pull-request-available (was: ) > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
Hyukjin Kwon created SPARK-45479: Summary: Recover Python build in branch-3.3, 3.4 and 3.5 Key: SPARK-45479 URL: https://issues.apache.org/jira/browse/SPARK-45479 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/actions/workflows/build_branch33.yml https://github.com/apache/spark/actions/workflows/build_branch34.yml https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45475) Should use DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils
[ https://issues.apache.org/jira/browse/SPARK-45475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45475. -- Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 43304 [https://github.com/apache/spark/pull/43304] > Should use DataFrame.foreachPartition instead of RDD.foreachPartition in > JdbcUtils > -- > > Key: SPARK-45475 > URL: https://issues.apache.org/jira/browse/SPARK-45475 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > > See https://github.com/apache/spark/pull/39976#issuecomment-1752930380 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
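For context, the change the ticket asks for amounts to the difference sketched below — a minimal illustration, not the actual JdbcUtils code; `savePartition` is a hypothetical stand-in for the real per-partition JDBC write logic:
{code:java}
import org.apache.spark.sql.{DataFrame, Row}

def writeViaJdbc(df: DataFrame)(savePartition: Iterator[Row] => Unit): Unit = {
  // Before: dropping to the RDD API detaches the action from the Dataset path.
  // df.rdd.foreachPartition(savePartition)

  // After: Dataset.foreachPartition keeps the write on the DataFrame API.
  df.foreachPartition(savePartition)
}
{code}
Both calls take an `Iterator[Row] => Unit` function per partition, so the switch is mechanical; see the linked PR comment for the motivation behind preferring the Dataset API here.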
[jira] [Assigned] (SPARK-45475) Should use DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils
[ https://issues.apache.org/jira/browse/SPARK-45475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45475: Assignee: Hyukjin Kwon > Should use DataFrame.foreachPartition instead of RDD.foreachPartition in > JdbcUtils > -- > > Key: SPARK-45475 > URL: https://issues.apache.org/jira/browse/SPARK-45475 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > See https://github.com/apache/spark/pull/39976#issuecomment-1752930380 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhizhen Hou updated SPARK-45478: Description: *The SQL to reproduce the result* {code:java} create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; {code} *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} was: *The SQL to reproduce* {code:java} create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; {code} *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} > codegen sum(decimal_column / 2) computes div twice > -- > > Key: SPARK-45478 > URL: https://issues.apache.org/jira/browse/SPARK-45478 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhizhen Hou >Priority: Minor > > *The SQL to reproduce the result* > {code:java} > create table t_dec (c1 decimal(6,2)); > insert into t_dec values(1.0),(2.0),(null),(3.0); > explain codegen select sum(c1/2) from t_dec; {code} > > *Possible cause of the result:* > > The sum function uses an If expression in updateExpressions: > `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` > > The three children of the If expression look like this: > {code:java} > predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) > trueValue: input[0, decimal(26,6), true] > falseValue: (input[0, decimal(26,6), true] + > cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} > In subexpression elimination, only the predicate is recursed into, in > EquivalentExpressions#childrenToRecurse: > {code:java} > private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match > { > case _: CodegenFallback => Nil > case i: If => i.predicate :: Nil > case c: CaseWhen => c.children.head :: Nil > case c: Coalesce => c.children.head :: Nil > case other => other.children > } {code} > I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => > i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. > > But the following comment in `childrenToRecurse` makes me unsure whether it > would cause any other problems: > {code:java} > // 2. If: common subexpressions will always be evaluated at the beginning, > but the true and > // false expressions in `If` may not get accessed, according to the predicate > // expression. We should only recurse into the predicate expression. {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
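Concretely, the experiment described in the ticket corresponds to the following change to EquivalentExpressions — a sketch with the branch references qualified (`i.trueValue`, `i.falseValue`) so the pattern compiles; whether recursing into the branches is safe in all cases is exactly the open question quoted above:
{code:java}
// Sketch of the modified EquivalentExpressions#childrenToRecurse: recurse
// into all three children of If, so a subexpression shared by the predicate
// and a branch (such as the decimal division above) is computed only once.
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
}
{code}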
[jira] [Commented] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773579#comment-17773579 ] Zhizhen Hou commented on SPARK-45478: - A freshly compiled Spark 4.0.0-SNAPSHOT from the master branch also produces the same result. > codegen sum(decimal_column / 2) computes div twice > -- > > Key: SPARK-45478 > URL: https://issues.apache.org/jira/browse/SPARK-45478 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhizhen Hou >Priority: Minor > > *The SQL to reproduce* > create table t_dec (c1 decimal(6,2)); > insert into t_dec values(1.0),(2.0),(null),(3.0); > explain codegen select sum(c1/2) from t_dec; > > *Possible cause of the result:* > > The sum function uses an If expression in updateExpressions: > `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` > > The three children of the If expression look like this: > {code:java} > predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) > trueValue: input[0, decimal(26,6), true] > falseValue: (input[0, decimal(26,6), true] + > cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} > In subexpression elimination, only the predicate is recursed into, in > EquivalentExpressions#childrenToRecurse: > {code:java} > private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match > { > case _: CodegenFallback => Nil > case i: If => i.predicate :: Nil > case c: CaseWhen => c.children.head :: Nil > case c: Coalesce => c.children.head :: Nil > case other => other.children > } {code} > I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => > i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. > > But the following comment in `childrenToRecurse` makes me unsure whether it > would cause any other problems: > {code:java} > // 2. If: common subexpressions will always be evaluated at the beginning, > but the true and > // false expressions in `If` may not get accessed, according to the predicate > // expression. We should only recurse into the predicate expression. {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhizhen Hou updated SPARK-45478: Description: *The SQL to reproduce* {code:java} create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; {code} *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} was: *The SQL to reproduce* create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} > codegen sum(decimal_column / 2) computes div twice > -- > > Key: SPARK-45478 > URL: https://issues.apache.org/jira/browse/SPARK-45478 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhizhen Hou >Priority: Minor > > *The SQL to reproduce* > {code:java} > create table t_dec (c1 decimal(6,2)); > insert into t_dec values(1.0),(2.0),(null),(3.0); > explain codegen select sum(c1/2) from t_dec; {code} > > *Possible cause of the result:* > > The sum function uses an If expression in updateExpressions: > `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` > > The three children of the If expression look like this: > {code:java} > predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) > trueValue: input[0, decimal(26,6), true] > falseValue: (input[0, decimal(26,6), true] + > cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} > In subexpression elimination, only the predicate is recursed into, in > EquivalentExpressions#childrenToRecurse: > {code:java} > private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match > { > case _: CodegenFallback => Nil > case i: If => i.predicate :: Nil > case c: CaseWhen => c.children.head :: Nil > case c: Coalesce => c.children.head :: Nil > case other => other.children > } {code} > I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => > i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. > > But the following comment in `childrenToRecurse` makes me unsure whether it > would cause any other problems: > {code:java} > // 2. If: common subexpressions will always be evaluated at the beginning, > but the true and > // false expressions in `If` may not get accessed, according to the predicate > // expression. We should only recurse into the predicate expression. {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
Zhizhen Hou created SPARK-45478: --- Summary: codegen sum(decimal_column / 2) computes div twice Key: SPARK-45478 URL: https://issues.apache.org/jira/browse/SPARK-45478 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Zhizhen Hou *The SQL to reproduce* create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org