[jira] [Created] (SPARK-45500) Show the number of abnormally completed drivers in MasterPage
Dongjoon Hyun created SPARK-45500: - Summary: Show the number of abnormally completed drivers in MasterPage Key: SPARK-45500 URL: https://issues.apache.org/jira/browse/SPARK-45500 Project: Spark Issue Type: Improvement Components: Spark Core, Web UI Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773921#comment-17773921 ] Zhizhen Hou commented on SPARK-45478: - An If expression has three children: predicate, trueValue, and falseValue. There are two execution paths: 1) predicate and trueValue; 2) predicate and falseValue. There are three possible pairings for a common subexpression: 1) predicate and trueValue; 2) predicate and falseValue; 3) trueValue and falseValue. So if all possible common subexpressions are eliminated, 2 of the 3 pairings improve performance. For example, a common subexpression shared by the predicate and falseValue is then executed only once, which improves performance. Only a subexpression shared by trueValue and falseValue alone brings no improvement, but it does not hurt performance either, since exactly one of trueValue and falseValue is executed anyway. So it looks reasonable to check all three children of If. Any suggestions?

> codegen sum(decimal_column / 2) computes div twice
> --
>
> Key: SPARK-45478
> URL: https://issues.apache.org/jira/browse/SPARK-45478
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.2.0
> Reporter: Zhizhen Hou
> Priority: Minor
>
> *The SQL to reproduce the result:*
> {code:java}
> create table t_dec (c1 decimal(6,2));
> insert into t_dec values(1.0),(2.0),(null),(3.0);
> explain codegen select sum(c1/2) from t_dec; {code}
>
> *Reason that may cause the result:*
> The sum function uses an If expression in updateExpressions:
> `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))`
>
> The three children of the If expression look like this:
> {code:java}
> predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true))
> trueValue: input[0, decimal(26,6), true]
> falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code}
> In subexpression elimination, only the predicate is recursed into, per EquivalentExpressions#childrenToRecurse:
> {code:java}
> private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
>   case _: CodegenFallback => Nil
>   case i: If => i.predicate :: Nil
>   case c: CaseWhen => c.children.head :: Nil
>   case c: Coalesce => c.children.head :: Nil
>   case other => other.children
> } {code}
> I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result.
>
> But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems:
> {code:java}
> // 2. If: common subexpressions will always be evaluated at the beginning, but the true and
> // false expressions in `If` may not get accessed, according to the predicate
> // expression. We should only recurse into the predicate expression. {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
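For clarity, here is a minimal sketch of the change discussed in this thread: recursing into all three children of `If` in `EquivalentExpressions#childrenToRecurse`. It uses the standard Catalyst accessors `i.trueValue`/`i.falseValue`; this is the commenter's proposal, not merged code:

{code:java}
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  // Recurse into all three children so that a subexpression shared by the
  // predicate and one branch (or by both branches) can be eliminated.
  case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
}
{code}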
[jira] [Updated] (SPARK-45498) Followup: Ignore task completion from old stage after retrying indeterminate stages
[ https://issues.apache.org/jira/browse/SPARK-45498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45498: --- Labels: pull-request-available (was: )

> Followup: Ignore task completion from old stage after retrying indeterminate stages
> ---
>
> Key: SPARK-45498
> URL: https://issues.apache.org/jira/browse/SPARK-45498
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 4.0.0, 3.5.1
> Reporter: Mayur Bhosale
> Priority: Minor
> Labels: pull-request-available
>
> With SPARK-45182, we added a fix that stops laggard tasks from older attempts of an indeterminate stage from marking the partition as completed in the map output tracker.
> When a task completes, the DAG scheduler also notifies all task sets of the stage that the partition is completed, and those task sets will then not schedule such a task if it is not already scheduled. This is not correct for an indeterminate stage, since we want to re-run all the tasks on a re-attempt.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-14745) CEP support in Spark Streaming
[ https://issues.apache.org/jira/browse/SPARK-14745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-14745: --- Labels: pull-request-available (was: )

> CEP support in Spark Streaming
> --
>
> Key: SPARK-14745
> URL: https://issues.apache.org/jira/browse/SPARK-14745
> Project: Spark
> Issue Type: New Feature
> Components: DStreams
> Reporter: Mario Briggs
> Priority: Major
> Labels: pull-request-available
> Attachments: SparkStreamingCEP.pdf
>
> Complex Event Processing (CEP) is an often-used feature in streaming applications. Spark Streaming currently does not have a DSL/API for it. This JIRA is about how/what we can add to Spark Streaming to support CEP out of the box.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`
[ https://issues.apache.org/jira/browse/SPARK-45499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45499: --- Labels: pull-request-available (was: ) > Replace `Reference#isEnqueued` with `Reference#refersTo` > > > Key: SPARK-45499 > URL: https://issues.apache.org/jira/browse/SPARK-45499 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45498) Followup: Ignore task completion from old stage after retrying indeterminate stages
Mayur Bhosale created SPARK-45498: - Summary: Followup: Ignore task completion from old stage after retrying indeterminate stages Key: SPARK-45498 URL: https://issues.apache.org/jira/browse/SPARK-45498 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 4.0.0, 3.5.1 Reporter: Mayur Bhosale

With SPARK-45182, we added a fix that stops laggard tasks from older attempts of an indeterminate stage from marking the partition as completed in the map output tracker.

When a task completes, the DAG scheduler also notifies all task sets of the stage that the partition is completed, and those task sets will then not schedule such a task if it is not already scheduled. This is not correct for an indeterminate stage, since we want to re-run all the tasks on a re-attempt.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45499) Replace `Reference#isEnqueued` with `Reference#refersTo`
Yang Jie created SPARK-45499: Summary: Replace `Reference#isEnqueued` with `Reference#refersTo` Key: SPARK-45499 URL: https://issues.apache.org/jira/browse/SPARK-45499 Project: Spark Issue Type: Sub-task Components: Spark Core, Tests Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
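For context, `Reference#isEnqueued` has been deprecated since Java 16, and `Reference#refersTo` (added in Java 16) is the usual replacement for checking whether a referent has been collected. A minimal sketch of the migration pattern, not the actual Spark test code:

{code:java}
import java.lang.ref.WeakReference

object RefersToExample {
  def main(args: Array[String]): Unit = {
    var referent: Object = new Object
    val ref = new WeakReference[Object](referent)
    referent = null // drop the strong reference so the referent becomes collectable
    System.gc()
    // Before (deprecated): ref.isEnqueued
    // After (Java 16+): refersTo(null) is true once the GC has cleared the referent.
    println(ref.refersTo(null))
  }
}
{code}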
[jira] [Commented] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773897#comment-17773897 ] Yang Jie commented on SPARK-45482: -- OK, Let me close this one > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. 
> [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) ---
[jira] [Resolved] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45482. -- Resolution: Won't Fix > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. 
> [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issue
[jira] [Updated] (SPARK-45497) Add a symbolic link file `spark-examples.jar` in K8s Docker images
[ https://issues.apache.org/jira/browse/SPARK-45497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45497: -- Summary: Add a symbolic link file `spark-examples.jar` in K8s Docker images (was: Add an symbolic link file `spark-examples.jar` in K8s Docker images) > Add a symbolic link file `spark-examples.jar` in K8s Docker images > -- > > Key: SPARK-45497 > URL: https://issues.apache.org/jira/browse/SPARK-45497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45497) Add an symbolic link file `spark-examples.jar` in K8s Docker images
Dongjoon Hyun created SPARK-45497: - Summary: Add an symbolic link file `spark-examples.jar` in K8s Docker images Key: SPARK-45497 URL: https://issues.apache.org/jira/browse/SPARK-45497 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45497) Add an symbolic link file `spark-examples.jar` in K8s Docker images
[ https://issues.apache.org/jira/browse/SPARK-45497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45497: --- Labels: pull-request-available (was: ) > Add an symbolic link file `spark-examples.jar` in K8s Docker images > --- > > Key: SPARK-45497 > URL: https://issues.apache.org/jira/browse/SPARK-45497 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42881) Codegen Support for get_json_object
[ https://issues.apache.org/jira/browse/SPARK-42881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42881: --- Labels: pull-request-available (was: ) > Codegen Support for get_json_object > --- > > Key: SPARK-42881 > URL: https://issues.apache.org/jira/browse/SPARK-42881 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-41730) `min` fails on the minimal timestamp
[ https://issues.apache.org/jira/browse/SPARK-41730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-41730: --- Labels: pull-request-available (was: )

> `min` fails on the minimal timestamp
> --
>
> Key: SPARK-41730
> URL: https://issues.apache.org/jira/browse/SPARK-41730
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.4.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
>
> The code below demonstrates the issue:
> {code:python}
> >>> from datetime import datetime, timezone
> >>> from pyspark.sql.types import TimestampType
> >>> from pyspark.sql import functions as F
> >>> ts = spark.createDataFrame([datetime(1, 1, 1, 0, 0, 0, 0, tzinfo=timezone.utc)], TimestampType()).toDF("test_column")
> >>> ts.select(F.min('test_column')).first()[0]
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 2762, in first
>     return self.head()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 2738, in head
>     rs = self.head(1)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 2740, in head
>     return self.take(n)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 1297, in take
>     return self.limit(num).collect()
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/dataframe.py", line 1198, in collect
>     return list(_load_from_socket(sock_info, BatchedSerializer(CPickleSerializer(
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", line 152, in load_stream
>     yield self._read_with_length(stream)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", line 174, in _read_with_length
>     return self.loads(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/serializers.py", line 472, in loads
>     return cloudpickle.loads(obj, encoding=encoding)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 2010, in <lambda>
>     return lambda *a: dataType.fromInternal(a)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 1018, in fromInternal
>     values = [
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 1019, in <listcomp>
>     f.fromInternal(v) if c else v
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 667, in fromInternal
>     return self.dataType.fromInternal(obj)
>   File "/Users/maximgekk/proj/apache-spark/python/pyspark/sql/types.py", line 279, in fromInternal
>     return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
> ValueError: year 0 is out of range
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773892#comment-17773892 ] Dongjoon Hyun commented on SPARK-45482: --- Actually, I'm not sure about those three cases. Why don't we keep them for now because Java 21 keeps them still, [~LuciferYang] ? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. 
> [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} >
[jira] [Comment Edited] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773892#comment-17773892 ] Dongjoon Hyun edited comment on SPARK-45482 at 10/11/23 4:15 AM: - Actually, I'm not sure about those three cases. Yes, let's keep them for now because Java 21 keeps them still, [~LuciferYang] . was (Author: dongjoon): Actually, I'm not sure about those three cases. Why don't we keep them for now because Java 21 keeps them still, [~LuciferYang] ? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. 
> [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = >
[jira] [Updated] (SPARK-45468) More transparent proxy handling for HTTP redirects
[ https://issues.apache.org/jira/browse/SPARK-45468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nobuaki Sukegawa updated SPARK-45468: - Description:

Currently, proxies can be made transparent for hyperlinks in Spark web UIs with spark.ui.proxyRoot or the X-Forwarded-Context header alone. However, HTTP redirects (such as job/stage kill) currently also require an explicit spark.ui.proxyRedirectUri to handle the proxy. This is not ideal, as the proxy hostname may not be known at the time of configuring Spark apps.

This can be mitigated by 1) always prepending spark.ui.proxyRoot to the redirect path for those proxies that intelligently rewrite Location headers, and 2) using a path without hostname (/jobs/, not [https://example.com/jobs/]) for those proxies without Location header rewrites. Redirects would then behave basically the same way as other hyperlinks.

h2. Example
Let's say the proxy URL is [https://example.org/sparkui/]... forwarding to [http://drv.svc/]... and spark.ui.proxyRoot is configured to be /sparkui
h3. Existing behavior (without spark.ui.proxyRedirectUri)
job/stage kill links redirect to [http://drv.svc/jobs/] - likely 404
(other hyperlinks are to paths with the prefix, e.g., /sparkui/executors - works fine)
h3. After change 2)
links redirect to /sparkui/jobs/ - works fine, and consistent with other hyperlinks
NOTE: while the hostname was originally required by RFC 2616 in 1999, since RFC 7231 in 2014 the hostname can be formally omitted, as most browsers already supported it (it is rather hard to find any browser that doesn't support it).

was: Currently, proxies can be made transparent for hyperlinks in Spark web UIs with spark.ui.proxyRoot or X-Forwarded-Context header alone. However, HTTP redirects (such as job/stage kill) currently requires explicit spark.ui.proxyRedirectUri as well for handling proxy. This is not ideal as proxy hostname may not be known at the time configuring Spark apps. This can be mitigated by 1) always prepending spark.ui.proxyRoot to redirect path for those proxies that intelligently rewrite Location headers and 2) by using path without hostname (/jobs/, not https://example.com/jobs/) for those proxies without Location header rewrites. Then redirects behavior would be basically the same way as other hyperlinks. Regarding 2), while hostname was originally required in RFC 2616 in 1999, since RFC 7231 in 2014 hostname can be formally omitted as most browsers already supported it (it is rather hard to find any browser that doesn't support it).

> More transparent proxy handling for HTTP redirects
> --
>
> Key: SPARK-45468
> URL: https://issues.apache.org/jira/browse/SPARK-45468
> Project: Spark
> Issue Type: Improvement
> Components: Web UI
> Affects Versions: 3.5.0
> Reporter: Nobuaki Sukegawa
> Priority: Major
> Labels: pull-request-available
>
> Currently, proxies can be made transparent for hyperlinks in Spark web UIs with spark.ui.proxyRoot or the X-Forwarded-Context header alone. However, HTTP redirects (such as job/stage kill) currently also require an explicit spark.ui.proxyRedirectUri to handle the proxy. This is not ideal, as the proxy hostname may not be known at the time of configuring Spark apps.
> This can be mitigated by 1) always prepending spark.ui.proxyRoot to the redirect path for those proxies that intelligently rewrite Location headers, and 2) using a path without hostname (/jobs/, not [https://example.com/jobs/]) for those proxies without Location header rewrites. Redirects would then behave basically the same way as other hyperlinks.
> h2. Example
> Let's say the proxy URL is [https://example.org/sparkui/]... forwarding to [http://drv.svc/]... and spark.ui.proxyRoot is configured to be /sparkui
> h3. Existing behavior (without spark.ui.proxyRedirectUri)
> job/stage kill links redirect to [http://drv.svc/jobs/] - likely 404
> (other hyperlinks are to paths with the prefix, e.g., /sparkui/executors - works fine)
> h3. After change 2)
> links redirect to /sparkui/jobs/ - works fine, and consistent with other hyperlinks
> NOTE: while the hostname was originally required by RFC 2616 in 1999, since RFC 7231 in 2014 the hostname can be formally omitted, as most browsers already supported it (it is rather hard to find any browser that doesn't support it).
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
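To make change 2) concrete, here is a small illustrative sketch of emitting a host-relative redirect target with the proxy prefix (illustrative only, not the actual Spark UI code; reading the proxy root from a system property is an assumption):

{code:java}
// Build the redirect target from the configured proxy root, omitting scheme
// and host, which RFC 7231 permits in a Location header.
val proxyRoot: String = sys.props.getOrElse("spark.ui.proxyRoot", "")
val location: String = s"$proxyRoot/jobs/"
// e.g. "/sparkui/jobs/" instead of "http://drv.svc/jobs/": the browser resolves
// the path against the proxy host, so the redirect stays behind the proxy.
{code}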
[jira] [Assigned] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils
[ https://issues.apache.org/jira/browse/SPARK-45494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45494: - Assignee: Takuya Ueshin > Introduce read/write a byte array util functions for PythonWorkerUtils > -- > > Key: SPARK-45494 > URL: https://issues.apache.org/jira/browse/SPARK-45494 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils
[ https://issues.apache.org/jira/browse/SPARK-45494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45494. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43321 [https://github.com/apache/spark/pull/43321] > Introduce read/write a byte array util functions for PythonWorkerUtils > -- > > Key: SPARK-45494 > URL: https://issues.apache.org/jira/browse/SPARK-45494 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45310) Mapstatus location type changed from external shuffle service to executor after decommission migration
[ https://issues.apache.org/jira/browse/SPARK-45310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45310. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43112 [https://github.com/apache/spark/pull/43112] > Mapstatus location type changed from external shuffle service to executor > after decommission migration > -- > > Key: SPARK-45310 > URL: https://issues.apache.org/jira/browse/SPARK-45310 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > When migrating shuffle blocks during decommission, the updated mapstatus > location doesn't respect the external shuffle service location when external > shuffle service is enabled. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45496) Fix the compilation warning related to other-pure-statement
Yang Jie created SPARK-45496: Summary: Fix the compilation warning related to other-pure-statement Key: SPARK-45496 URL: https://issues.apache.org/jira/browse/SPARK-45496 Project: Spark Issue Type: Sub-task Components: DStreams, Spark Core Affects Versions: 4.0.0 Reporter: Yang Jie

{code:java}
"-Wconf:cat=other-match-analysis&site=org.apache.spark.sql.catalyst.catalog.SessionCatalog.lookupFunction.catalogFunction:wv",
"-Wconf:cat=other-pure-statement&site=org.apache.spark.streaming.util.FileBasedWriteAheadLog.readAll.readFile:wv",
"-Wconf:cat=other-pure-statement&site=org.apache.spark.scheduler.OutputCommitCoordinatorSuite:wv",
"-Wconf:cat=other-pure-statement&site=org.apache.spark.sql.streaming.sources.StreamingDataSourceV2Suite.testPositiveCase.\\$anonfun:wv",
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
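For context, a minimal sketch of the kind of code that triggers the `other-pure-statement` warning (illustrative only; not the actual Spark code behind these suppressions):

{code:java}
def example(): Int = {
  val x = 41
  // warning [other-pure-statement]: a pure expression does nothing in
  // statement position; its result is silently discarded.
  x + 1
  x
}
{code}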
[jira] [Resolved] (SPARK-45416) Sanity check that Spark Connect returns arrow batches in order
[ https://issues.apache.org/jira/browse/SPARK-45416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45416. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43219 [https://github.com/apache/spark/pull/43219] > Sanity check that Spark Connect returns arrow batches in order > -- > > Key: SPARK-45416 > URL: https://issues.apache.org/jira/browse/SPARK-45416 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45416) Sanity check that Spark Connect returns arrow batches in order
[ https://issues.apache.org/jira/browse/SPARK-45416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45416: Assignee: Juliusz Sompolski > Sanity check that Spark Connect returns arrow batches in order > -- > > Key: SPARK-45416 > URL: https://issues.apache.org/jira/browse/SPARK-45416 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45473) Incorrect error message for RoundBase
[ https://issues.apache.org/jira/browse/SPARK-45473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45473: - Assignee: L. C. Hsieh > Incorrect error message for RoundBase > - > > Key: SPARK-45473 > URL: https://issues.apache.org/jira/browse/SPARK-45473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45473) Incorrect error message for RoundBase
[ https://issues.apache.org/jira/browse/SPARK-45473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45473. --- Fix Version/s: 3.5.1 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 43316 [https://github.com/apache/spark/pull/43316] > Incorrect error message for RoundBase > - > > Key: SPARK-45473 > URL: https://issues.apache.org/jira/browse/SPARK-45473 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0, 3.4.2 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45464) [CORE] Fix yarn distribution build
[ https://issues.apache.org/jira/browse/SPARK-45464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-45464. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43289 [https://github.com/apache/spark/pull/43289] > [CORE] Fix yarn distribution build > -- > > Key: SPARK-45464 > URL: https://issues.apache.org/jira/browse/SPARK-45464 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Assignee: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [https://github.com/apache/spark/pull/43164] introduced a regression in: > > ``` > ./dev/make-distribution.sh --tgz -Phive -Phive-thriftserver -Pyarn > ``` > > this needs to be fixed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45464) [CORE] Fix yarn distribution build
[ https://issues.apache.org/jira/browse/SPARK-45464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-45464: Assignee: Hasnain Lakhani > [CORE] Fix yarn distribution build > -- > > Key: SPARK-45464 > URL: https://issues.apache.org/jira/browse/SPARK-45464 > Project: Spark > Issue Type: Bug > Components: Spark Core, YARN >Affects Versions: 4.0.0 >Reporter: Hasnain Lakhani >Assignee: Hasnain Lakhani >Priority: Major > Labels: pull-request-available > > [https://github.com/apache/spark/pull/43164] introduced a regression in: > > ``` > ./dev/make-distribution.sh --tgz -Phive -Phive-thriftserver -Pyarn > ``` > > this needs to be fixed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45495) Support stage level task resource profile for k8s cluster when dynamic allocation disabled
Bobby Wang created SPARK-45495: -- Summary: Support stage level task resource profile for k8s cluster when dynamic allocation disabled Key: SPARK-45495 URL: https://issues.apache.org/jira/browse/SPARK-45495 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.4.1 Reporter: Bobby Wang Assignee: Bobby Wang Fix For: 4.0.0, 3.5.1

[https://github.com/apache/spark/pull/37268] has introduced a new feature that supports stage-level scheduling of task resource profiles for standalone clusters when dynamic allocation is disabled. It's a really cool feature, especially for ML/DL cases; more details can be found in that PR.

The problem here is that the feature is only available for standalone clusters for now, but most users would also expect it to be usable on other Spark clusters like YARN and K8s.

So I filed this issue to track this task.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45495) Support stage level task resource profile for k8s cluster when dynamic allocation disabled
[ https://issues.apache.org/jira/browse/SPARK-45495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bobby Wang updated SPARK-45495: --- Description:

[https://github.com/apache/spark/pull/37268] has introduced a new feature that supports stage-level scheduling of task resource profiles for standalone clusters when dynamic allocation is disabled. It's a really cool feature, especially for ML/DL cases; more details can be found in that PR.

The problem here is that the feature is only available for standalone and YARN clusters for now, but most users would also expect it to be usable on other Spark clusters like K8s.

So I filed this issue to track this task.

was: [https://github.com/apache/spark/pull/37268] has introduced a new feature that supports stage-level scheduling of task resource profiles for standalone clusters when dynamic allocation is disabled. It's a really cool feature, especially for ML/DL cases; more details can be found in that PR. The problem here is that the feature is only available for standalone clusters for now, but most users would also expect it to be usable on other Spark clusters like YARN and K8s. So I filed this issue to track this task.

> Support stage level task resource profile for k8s cluster when dynamic allocation disabled
> --
>
> Key: SPARK-45495
> URL: https://issues.apache.org/jira/browse/SPARK-45495
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Bobby Wang
> Assignee: Bobby Wang
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0, 3.5.1
>
> [https://github.com/apache/spark/pull/37268] has introduced a new feature that supports stage-level scheduling of task resource profiles for standalone clusters when dynamic allocation is disabled. It's a really cool feature, especially for ML/DL cases; more details can be found in that PR.
>
> The problem here is that the feature is only available for standalone and YARN clusters for now, but most users would also expect it to be usable on other Spark clusters like K8s.
>
> So I filed this issue to track this task.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
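For reference, a short sketch of the stage-level task resource profile usage this feature enables. The API shown is the existing stage-level scheduling API; the `gpu` resource name, the amounts, and the pre-existing SparkContext `sc` are assumptions for illustration:

{code:java}
import org.apache.spark.resource.{ResourceProfileBuilder, TaskResourceRequests}

// Request 1 CPU and 1 GPU per task for this stage only, without changing
// executor resources (a task-resource-only profile, dynamic allocation off).
val taskReqs = new TaskResourceRequests().cpus(1).resource("gpu", 1)
val profile = new ResourceProfileBuilder().require(taskReqs).build()

val doubled = sc.parallelize(1 to 100, numSlices = 4)
  .withResources(profile)
  .mapPartitions(iter => iter.map(_ * 2))
  .collect()
{code}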
[jira] [Updated] (SPARK-43828) Add config to control whether close idle connection
[ https://issues.apache.org/jira/browse/SPARK-43828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43828: --- Labels: pull-request-available (was: ) > Add config to control whether close idle connection > --- > > Key: SPARK-43828 > URL: https://issues.apache.org/jira/browse/SPARK-43828 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Zhongwei Zhu >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils
[ https://issues.apache.org/jira/browse/SPARK-45494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45494: --- Labels: pull-request-available (was: ) > Introduce read/write a byte array util functions for PythonWorkerUtils > -- > > Key: SPARK-45494 > URL: https://issues.apache.org/jira/browse/SPARK-45494 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45494) Introduce read/write a byte array util functions for PythonWorkerUtils
Takuya Ueshin created SPARK-45494: - Summary: Introduce read/write a byte array util functions for PythonWorkerUtils Key: SPARK-45494 URL: https://issues.apache.org/jira/browse/SPARK-45494 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45493) Replace: _LEGACY_ERROR_TEMP_2187 with a better error message
Serge Rielau created SPARK-45493: Summary: Replace: _LEGACY_ERROR_TEMP_2187 with a better error message Key: SPARK-45493 URL: https://issues.apache.org/jira/browse/SPARK-45493 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

{code:java}
def convertHiveTableToCatalogTableError(
    e: SparkException, dbName: String, tableName: String): Throwable = {
  new SparkException(
    errorClass = "_LEGACY_ERROR_TEMP_2187",
    messageParameters = Map(
      "message" -> e.getMessage,
      "dbName" -> dbName,
      "tableName" -> tableName),
    cause = e)
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45492) Replace: _LEGACY_ERROR_TEMP_2152 with a better error class
Serge Rielau created SPARK-45492: Summary: Replace: _LEGACY_ERROR_TEMP_2152 with a better error class Key: SPARK-45492 URL: https://issues.apache.org/jira/browse/SPARK-45492 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

{code:java}
def expressionEncodingError(e: Exception, expressions: Seq[Expression]): SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "_LEGACY_ERROR_TEMP_2152",
    messageParameters = Map(
      "e" -> e.toString(),
      "expressions" -> expressions.map(
        _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")),
    cause = e)
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45491) Replace: _LEGACY_ERROR_TEMP_2196 with a better error class
Serge Rielau created SPARK-45491: Summary: Replace: _LEGACY_ERROR_TEMP_2196 with a better error class Key: SPARK-45491 URL: https://issues.apache.org/jira/browse/SPARK-45491 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

{code:java}
def cannotFetchTablesOfDatabaseError(dbName: String, e: Exception): Throwable = {
  new SparkException(
    errorClass = "_LEGACY_ERROR_TEMP_2196",
    messageParameters = Map(
      "dbName" -> dbName),
    cause = e)
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45490) Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class
Serge Rielau created SPARK-45490: Summary: Replace: _LEGACY_ERROR_TEMP_2151 with a proper error class Key: SPARK-45490 URL: https://issues.apache.org/jira/browse/SPARK-45490 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

{code:java}
def expressionDecodingError(e: Exception, expressions: Seq[Expression]): SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "_LEGACY_ERROR_TEMP_2151",
    messageParameters = Map(
      "e" -> e.toString(),
      "expressions" -> expressions.map(
        _.simpleString(SQLConf.get.maxToStringFields)).mkString("\n")),
    cause = e)
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45489) Replace: _LEGACY_ERROR_TEMP_2134 with a regular error class
Serge Rielau created SPARK-45489: Summary: Replace: _LEGACY_ERROR_TEMP_2134 with a regular error class Key: SPARK-45489 URL: https://issues.apache.org/jira/browse/SPARK-45489 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau

This is a frequently seen error we should convert:

{code:java}
def cannotParseStringAsDataTypeError(pattern: String, value: String, dataType: DataType)
    : SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "_LEGACY_ERROR_TEMP_2134",
    messageParameters = Map(
      "value" -> toSQLValue(value),
      "pattern" -> toSQLValue(pattern),
      "dataType" -> dataType.toString))
}
{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
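A hedged sketch of what such a conversion could look like. The error class name `CANNOT_PARSE_STRING_AS_DATATYPE` is hypothetical and would need a matching entry in error-classes.json; `toSQLValue` and `toSQLType` are assumed available from `QueryErrorsBase`:

{code:java}
def cannotParseStringAsDataTypeError(pattern: String, value: String, dataType: DataType)
    : SparkRuntimeException = {
  new SparkRuntimeException(
    errorClass = "CANNOT_PARSE_STRING_AS_DATATYPE", // hypothetical error class name
    messageParameters = Map(
      "value" -> toSQLValue(value),
      "pattern" -> toSQLValue(pattern),
      "dataType" -> toSQLType(dataType)))
}
{code}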
[jira] [Updated] (SPARK-45488) XML: Add support for value in 'rowTag' element
[ https://issues.apache.org/jira/browse/SPARK-45488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45488: --- Labels: pull-request-available (was: )

> XML: Add support for value in 'rowTag' element
> --
>
> Key: SPARK-45488
> URL: https://issues.apache.org/jira/browse/SPARK-45488
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Sandip Agarwala
> Priority: Major
> Labels: pull-request-available
>
> The following XML with rowTag 'book' will yield a schema with just the "_id" column and not the value:
>
> {code:java}
> <book id="1">Great Book</book>{code}
> Let's parse the value as well.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45488) XML: Add support for value in 'rowTag' element
Sandip Agarwala created SPARK-45488: --- Summary: XML: Add support for value in 'rowTag' element Key: SPARK-45488 URL: https://issues.apache.org/jira/browse/SPARK-45488 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Sandip Agarwala The following XML with rowTag 'book' will yield a schema with just "_id" column and not the value: {code:java} <book id="1">Great Book</book>{code} Let's parse the value as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
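To make the gap concrete, a small sketch of how this reads today versus the desired behavior (reader API per the native XML source targeted for 4.0; the _VALUE column name follows the spark-xml valueTag convention and is an assumption here):
{code:java}
val df = spark.read
  .option("rowTag", "book")
  .xml("books.xml")

// Today:   root |-- _id: string          (attribute only; "Great Book" is dropped)
// Desired: root |-- _id: string
//               |-- _VALUE: string       (the element's character data)
{code}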
[jira] [Updated] (SPARK-44076) SPIP: Python Data Source API
[ https://issues.apache.org/jira/browse/SPARK-44076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-44076: - Affects Version/s: 4.0.0 (was: 3.5.0) > SPIP: Python Data Source API > > > Key: SPARK-44076 > URL: https://issues.apache.org/jira/browse/SPARK-44076 > Project: Spark > Issue Type: New Feature > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > This proposal aims to introduce a simple API in Python for Data Sources. The > idea is to enable Python developers to create data sources without having to > learn Scala or deal with the complexities of the current data source APIs. > The goal is to make a Python-based API that is simple and easy to use, thus > making Spark more accessible to the wider Python developer community. This > proposed approach is based on the recently introduced Python user-defined > table functions (SPARK-43797) with extensions to support data sources. > {*}SPIP{*}: > [https://docs.google.com/document/d/1oYrCKEKHzznljYfJO4kx5K_Npcgt1Slyfph3NEk7JRU/edit?usp=sharing] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45486) [CONNECT][SCALA] Update user agent to recognize SPARK_CONNECT_USER_AGENT and include environment specific attributes
[ https://issues.apache.org/jira/browse/SPARK-45486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45486: --- Labels: pull-request-available (was: ) > [CONNECT][SCALA] Update user agent to recognize SPARK_CONNECT_USER_AGENT and > include environment specific attributes > > > Key: SPARK-45486 > URL: https://issues.apache.org/jira/browse/SPARK-45486 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Robert Dillitz >Priority: Major > Labels: pull-request-available > > Similar to the [Python > client,|https://github.com/apache/spark/blob/2cc1ee4d3a05a641d7a245f015ef824d8f7bae8b/python/pyspark/sql/connect/client/core.py#L284] > the Scala client should: > # Recognize SPARK_CONNECT_USER_AGENT env variable > # Always include the OS, Java version, Scala version, and Spark version -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45477) Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test results/unit tests log` naming
[ https://issues.apache.org/jira/browse/SPARK-45477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45477. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43305 [https://github.com/apache/spark/pull/43305] > Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test > results/unit tests log` naming > --- > > Key: SPARK-45477 > URL: https://issues.apache.org/jira/browse/SPARK-45477 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45477) Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test results/unit tests log` naming
[ https://issues.apache.org/jira/browse/SPARK-45477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45477: - Assignee: Yang Jie > Use `matrix.java/inputs.java` to replace the hardcoded Java version in `test > results/unit tests log` naming > --- > > Key: SPARK-45477 > URL: https://issues.apache.org/jira/browse/SPARK-45477 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45487) Replace: _LEGACY_ERROR_TEMP_3007
Serge Rielau created SPARK-45487: Summary: Replace: _LEGACY_ERROR_TEMP_3007 Key: SPARK-45487 URL: https://issues.apache.org/jira/browse/SPARK-45487 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Serge Rielau {code:java} def checkpointRDDBlockIdNotFoundError(rddBlockId: RDDBlockId): Throwable = { new SparkException( errorClass = "_LEGACY_ERROR_TEMP_3007", messageParameters = Map("rddBlockId" -> s"$rddBlockId"), cause = null ) } {code} This error condition appears to be quite common, so we should convert it to a proper error class. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45486) [CONNECT][SCALA] Update user agent to recognize SPARK_CONNECT_USER_AGENT and include environment specific attributes
Robert Dillitz created SPARK-45486: -- Summary: [CONNECT][SCALA] Update user agent to recognize SPARK_CONNECT_USER_AGENT and include environment specific attributes Key: SPARK-45486 URL: https://issues.apache.org/jira/browse/SPARK-45486 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Robert Dillitz Similar to the [Python client,|https://github.com/apache/spark/blob/2cc1ee4d3a05a641d7a245f015ef824d8f7bae8b/python/pyspark/sql/connect/client/core.py#L284] the Scala client should: # Recognize SPARK_CONNECT_USER_AGENT env variable # Always include the OS, Java version, Scala version, and Spark version -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
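A minimal sketch of how the Scala client could assemble such a user agent string, mirroring the linked Python logic (the names and format here are illustrative, not the actual implementation):
{code:java}
// Sketch: honor SPARK_CONNECT_USER_AGENT if set, then always append
// environment attributes, as the Python client does.
private def buildUserAgent(): String = {
  val base = sys.env.getOrElse("SPARK_CONNECT_USER_AGENT", "_SPARK_CONNECT_SCALA")
  Seq(
    base,
    s"spark/${org.apache.spark.SPARK_VERSION}",
    s"scala/${scala.util.Properties.versionNumberString}",
    s"jvm/${System.getProperty("java.version")}",
    s"os/${System.getProperty("os.name").toLowerCase}"
  ).mkString(" ")
}
{code}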
[jira] [Created] (SPARK-45485) make add_artifact idempotent
Alice Sayutina created SPARK-45485: -- Summary: make add_artifact idempotent Key: SPARK-45485 URL: https://issues.apache.org/jira/browse/SPARK-45485 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0, 4.0.0 Reporter: Alice Sayutina # Make the add_artifact request idempotent, i.e. subsequent requests succeed if the same content is provided. This makes retrying safer. # Fix the existing error handling mechanism -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
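One way to get that idempotency, shown only as an illustrative sketch (not the actual Spark Connect implementation): key each artifact by a content hash, so a retried request carrying identical bytes is accepted as a no-op, while the same name with different content is rejected.
{code:java}
import java.security.MessageDigest
import scala.collection.concurrent.TrieMap

object ArtifactStore {
  private val stored = TrieMap.empty[String, String]  // artifact name -> content hash

  private def sha256(bytes: Array[Byte]): String =
    MessageDigest.getInstance("SHA-256").digest(bytes).map("%02x".format(_)).mkString

  /** Returns true if the artifact is newly added or identical to the existing one. */
  def addArtifact(name: String, bytes: Array[Byte]): Boolean = {
    val hash = sha256(bytes)
    stored.putIfAbsent(name, hash) match {
      case None           => true              // first upload wins
      case Some(existing) => existing == hash  // retry with the same content succeeds
    }
  }
}
{code}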
[jira] [Updated] (SPARK-45204) Allow CommandPlugins to be trackable
[ https://issues.apache.org/jira/browse/SPARK-45204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45204: --- Labels: connect pull-request-available scala (was: connect scala) > Allow CommandPlugins to be trackable > > > Key: SPARK-45204 > URL: https://issues.apache.org/jira/browse/SPARK-45204 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Robert Dillitz >Priority: Major > Labels: connect, pull-request-available, scala > > There is currently no way to track the QueryStatementType & compilation time > for queries executed by a CommandPlugin. I propose to change > SparkConnectPlanner to hold an optional ExecuteHolder that can then be used > by plugins. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45484) Fix the bug that uses incorrect parquet compression codec lz4raw
[ https://issues.apache.org/jira/browse/SPARK-45484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45484: --- Labels: pull-request-available (was: ) > Fix the bug that uses incorrect parquet compression codec lz4raw > > > Key: SPARK-45484 > URL: https://issues.apache.org/jira/browse/SPARK-45484 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0, 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > lz4raw is not a correct parquet compression codec name. > We should use lz4_raw as its name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151
[ https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-45213: Assignee: Deng Ziming (was: Haejoon Lee) > Assign name to _LEGACY_ERROR_TEMP_2151 > -- > > Key: SPARK-45213 > URL: https://issues.apache.org/jira/browse/SPARK-45213 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Assignee: Deng Ziming >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating > dataset"), we are using _LEGACY_ERROR_TEMP_2151. We should use a proper error > class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before starting to work on it, to avoid > two people working on the same ticket at the same time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45213) Assign name to _LEGACY_ERROR_TEMP_2151
[ https://issues.apache.org/jira/browse/SPARK-45213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-45213. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43029 [https://github.com/apache/spark/pull/43029] > Assign name to _LEGACY_ERROR_TEMP_2151 > -- > > Key: SPARK-45213 > URL: https://issues.apache.org/jira/browse/SPARK-45213 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Deng Ziming >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > In DatasetSuite test("CLASS_UNSUPPORTED_BY_MAP_OBJECTS when creating > dataset"), we are using _LEGACY_ERROR_TEMP_2151. We should use a proper error > class name rather than `_LEGACY_ERROR_TEMP_xxx`. > > *NOTE:* Please reply to this ticket before starting to work on it, to avoid > two people working on the same ticket at the same time -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45484) Fix the bug that uses incorrect parquet compression codec lz4raw
Jiaan Geng created SPARK-45484: -- Summary: Fix the bug that uses incorrect parquet compression codec lz4raw Key: SPARK-45484 URL: https://issues.apache.org/jira/browse/SPARK-45484 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 4.0.0 Reporter: Jiaan Geng Assignee: Jiaan Geng lz4raw is not a correct parquet compression codec name. We should use lz4_raw as its name. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
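For reference, the codec is exercised through the normal configuration and write paths; under the fix the accepted spelling would be lz4_raw (sketch, assuming a Parquet build that ships the LZ4_RAW codec):
{code:java}
spark.conf.set("spark.sql.parquet.compression.codec", "lz4_raw")
// or per write:
df.write.option("compression", "lz4_raw").parquet("/tmp/t_lz4_raw")
{code}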
[jira] [Commented] (SPARK-32494) Null Aware Anti Join Optimize Support Multi-Column
[ https://issues.apache.org/jira/browse/SPARK-32494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773659#comment-17773659 ] Runkang He commented on SPARK-32494: Hi [~leanken], is this feature available in Spark's master branch? I can't find it in master now. > Null Aware Anti Join Optimize Support Multi-Column > -- > > Key: SPARK-32494 > URL: https://issues.apache.org/jira/browse/SPARK-32494 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.0.0 >Reporter: Leanken.Lin >Priority: Major > > In SPARK-32290, we managed to optimize BroadcastNestedLoopJoin into > BroadcastHashJoin in the single-column NAAJ scenario, by using a hash > lookup instead of a loop join. > It is simple to fulfill the "NOT IN" logic when there is a single key, but > a multi-column NOT IN is much more complicated because of all the null-aware > comparisons. > FYI, the expected logic for the single- and multi-column cases is defined in > ~/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/not-in-unit-tests-single-column.sql > ~/sql/core/src/test/resources/sql-tests/inputs/subquery/in-subquery/not-in-unit-tests-multi-column.sql > > Hence, we propose a new type of HashedRelation: NullAwareHashedRelation. > NullAwareHashedRelation: > # does not skip any-null-column keys the way LongHashedRelation and > UnsafeHashedRelation do. > # while being built, has extra keys put into the > relation, so that null-aware column comparison can be done as a hash lookup. > The duplication is 2^numKeys - 1 times; for example, to > support NAAJ with a 3-column join key, the build side is expanded > (2^3 - 1) times, i.e. 7X. > For example, if there is an UnsafeRow key (1,2,3), > in null-aware mode it is expanded into 7 keys: the original plus the extra C(3,1) and > C(3,2) combinations, where each combination duplicates the record with > null padding as follows. > Original record > (1,2,3) > Extra records to be appended into the NullAwareHashedRelation > (null, 2, 3) (1, null, 3) (1, 2, null) > (null, null, 3) (null, 2, null) (1, null, null) > With the expanded data we can extract a common pattern for both the single- and > multi-column cases. allNull refers to an UnsafeRow whose columns are all null. > * build side is empty input => return all rows > * an all-null-column key exists in the build side input => reject all rows > * if streamedSideRow.allNull is true => drop the row > * if streamedSideRow.allNull is false & a match is found in the NullAwareHashedRelation > => drop the row > * if streamedSideRow.allNull is false & no match is found in the > NullAwareHashedRelation => return the row > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
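A small self-contained sketch of the expansion described above (illustrative only, not the NullAwareHashedRelation code itself): every bitmask except the all-null one selects the key positions to replace with null, yielding the 2^n - 1 keys from the example.
{code:java}
def expandNullAware(key: Seq[Any]): Seq[Seq[Any]] = {
  val n = key.length
  val allNullMask = (1 << n) - 1
  // masks 0 .. allNullMask-1: mask 0 keeps the original row, the rest null-pad it;
  // the all-null combination is excluded, matching the 2^n - 1 expansion.
  (0 until allNullMask).map { mask =>
    key.zipWithIndex.map { case (v, i) =>
      if ((mask & (1 << i)) != 0) null else v
    }
  }
}

// expandNullAware(Seq(1, 2, 3)) yields (1,2,3), (null,2,3), (1,null,3),
// (null,null,3), (1,2,null), (null,2,null), (1,null,null): 7 keys in total.
{code}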
[jira] [Resolved] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45479. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 43306 [https://github.com/apache/spark/pull/43306] > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45479: Assignee: Hyukjin Kwon > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45204) Allow CommandPlugins to be trackable
[ https://issues.apache.org/jira/browse/SPARK-45204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Dillitz updated SPARK-45204: --- Summary: Allow CommandPlugins to be trackable (was: Extend CommandPlugins to be trackable) > Allow CommandPlugins to be trackable > > > Key: SPARK-45204 > URL: https://issues.apache.org/jira/browse/SPARK-45204 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Robert Dillitz >Priority: Major > Labels: connect, scala > > There is currently no way to track the QueryStatementType & compilation time > for queries executed by a CommandPlugin. I propose to create an extended > CommandPlugin interface that also offers a process() method that accepts a > QueryPlanningTracker. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
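A minimal sketch of what such an extended interface could look like (all names and signatures here are hypothetical; the shipped API may differ, and the later direction on this ticket passes an ExecuteHolder through SparkConnectPlanner instead):
{code:java}
import org.apache.spark.sql.catalyst.QueryPlanningTracker

// Sketch only: a trackable variant of the Spark Connect command plugin.
trait CommandPlugin {
  def process(command: Array[Byte]): Boolean
}

trait TrackableCommandPlugin extends CommandPlugin {
  // Same command, but with a tracker so the plugin can record the
  // statement type and compilation time alongside regular queries.
  def process(command: Array[Byte], tracker: QueryPlanningTracker): Boolean
}
{code}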
[jira] [Updated] (SPARK-45483) Correct the function groups in connect.functions
[ https://issues.apache.org/jira/browse/SPARK-45483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45483: --- Labels: pull-request-available (was: ) > Correct the function groups in connect.functions > > > Key: SPARK-45483 > URL: https://issues.apache.org/jira/browse/SPARK-45483 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45483) Correct the function groups in connect.functions
Ruifeng Zheng created SPARK-45483: - Summary: Correct the function groups in connect.functions Key: SPARK-45483 URL: https://issues.apache.org/jira/browse/SPARK-45483 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773616#comment-17773616 ] Yang Jie edited comment on SPARK-45482 at 10/10/23 9:38 AM: For case 1, the logic can be changed to {code:java} private[serializer] var enableDebugging: Boolean = { !JBoolean.getBoolean("sun.io.serialization.extendedDebugInfo") } {code} For case 3, this is an unused method in Spark; perhaps it can be deleted directly? But for case 2, can we directly clean up the usage of AccessControlContext and AccessController? was (Author: luciferyang): For case 1, the logic can be changed to {code:java} private[serializer] var enableDebugging: Boolean = { !JBoolean.getBoolean("sun.io.serialization.extendedDebugInfo") } {code} For case 3, this is an unused method in Spark; perhaps it can be deleted directly? But for case 2, can we directly clean up the usage of AccessControlContext and AccessController? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773617#comment-17773617 ] Yang Jie commented on SPARK-45482: -- friendly ping [~dongjoon] Do you have any suggestions on this? Or should we keep these usages for now? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773616#comment-17773616 ] Yang Jie commented on SPARK-45482: -- For case 1, the logic can be changed to {code:java} private[serializer] var enableDebugging: Boolean = { !JBoolean.getBoolean("sun.io.serialization.extendedDebugInfo") } {code} For case 3, this is an unused method in Spark; perhaps it can be deleted directly? But for case 2, can we directly clean up the usage of AccessControlContext and AccessController? > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
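On the case 2 question, one possible direction, sketched under the assumption of a JDK that provides Subject.current() (added in Java 18 as the Security-Manager-free way to obtain the current Subject):
{code:java}
import javax.security.auth.Subject

// Sketch: obtain the current Subject without touching the deprecated
// AccessControlContext / AccessController pair.
val subject: Subject = Subject.current()
{code}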
[jira] [Updated] (SPARK-45482) Handle the usage of AccessControlContext and AccessController.
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45482: - Summary: Handle the usage of AccessControlContext and AccessController. (was: Clean up the usage of `AccessControlContext` and `AccessController`) > Handle the usage of AccessControlContext and AccessController. > -- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessController { > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class. > */ > @Deprecated(since="17", forRemoval=true) > public final class AccessControlContext { {code} > > > `AccessControlContext` and `AccessController` are marked as deprecated in > Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that > they do not have corresponding replacements. > > In Spark, there are three files that use AccessControlContext or > AccessController: > 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] > {code:java} > private[serializer] var enableDebugging: Boolean = { > !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( > "sun.io.serialization.extendedDebugInfo")).booleanValue() > } {code} > > 2. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] > > {code:java} > public void open() throws TTransportException { > try { > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > Subject.doAs(subject, (PrivilegedExceptionAction) () -> { > try { > wrapped.open(); > } catch (TTransportException tte) { > // Wrap the transport exception in an RTE, since Subject.doAs() > then goes > // and unwraps this for us out of the doAs block. We then unwrap one > // more time in our catch clause to get back the TTE. (ugh) > throw new RuntimeException(tte); > } > return null; > }); > } catch (PrivilegedActionException ioe) { > throw new RuntimeException("Received an ioe we never threw!", ioe); > } catch (RuntimeException rte) { > if (rte.getCause() instanceof TTransportException) { > throw (TTransportException) rte.getCause(); > } else { > throw rte; > } > } > } {code} > > 3. > [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] > > {code:java} > public static String getKerberosServiceTicket(String principal, String host, > String serverHttpUrl, boolean assumeSubject) throws Exception { > String serverPrincipal = > ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, > host); > if (assumeSubject) { > // With this option, we're assuming that the external application, > // using the JDBC driver has done a JAAS kerberos login already > AccessControlContext context = AccessController.getContext(); > Subject subject = Subject.getSubject(context); > if (subject == null) { > throw new Exception("The Subject is not set"); > } > return Subject.doAs(subject, new > HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); > } else { > // JAAS login from ticket cache to setup the client UserGroupInformation > UserGroupInformation clientUGI = > > ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); > return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, > serverHttpUrl)); > } > } {code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45482) Clean up the usage of `AccessControlContext` and `AccessController`
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45482: - Description: {code:java} * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class. */ @Deprecated(since="17", forRemoval=true) public final class AccessController { * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class. */ @Deprecated(since="17", forRemoval=true) public final class AccessControlContext { {code} `AccessControlContext` and `AccessController` are marked as deprecated in Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that they do not have corresponding replacements. In Spark, there are three files that use AccessControlContext or AccessController: 1.[https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] {code:java} private[serializer] var enableDebugging: Boolean = { !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( "sun.io.serialization.extendedDebugInfo")).booleanValue() } {code} 2. [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] {code:java} public void open() throws TTransportException { try { AccessControlContext context = AccessController.getContext(); Subject subject = Subject.getSubject(context); Subject.doAs(subject, (PrivilegedExceptionAction) () -> { try { wrapped.open(); } catch (TTransportException tte) { // Wrap the transport exception in an RTE, since Subject.doAs() then goes // and unwraps this for us out of the doAs block. We then unwrap one // more time in our catch clause to get back the TTE. (ugh) throw new RuntimeException(tte); } return null; }); } catch (PrivilegedActionException ioe) { throw new RuntimeException("Received an ioe we never threw!", ioe); } catch (RuntimeException rte) { if (rte.getCause() instanceof TTransportException) { throw (TTransportException) rte.getCause(); } else { throw rte; } } } {code} 3. [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] {code:java} public static String getKerberosServiceTicket(String principal, String host, String serverHttpUrl, boolean assumeSubject) throws Exception { String serverPrincipal = ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, host); if (assumeSubject) { // With this option, we're assuming that the external application, // using the JDBC driver has done a JAAS kerberos login already AccessControlContext context = AccessController.getContext(); Subject subject = Subject.getSubject(context); if (subject == null) { throw new Exception("The Subject is not set"); } return Subject.doAs(subject, new HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); } else { // JAAS login from ticket cache to setup the client UserGroupInformation UserGroupInformation clientUGI = ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); } } {code} was: {code:java} * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class.
[jira] [Updated] (SPARK-45482) Clean up the usage of `AccessControlContext` and `AccessController`
[ https://issues.apache.org/jira/browse/SPARK-45482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-45482: - Description: {code:java} * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class. */ @Deprecated(since="17", forRemoval=true) public final class AccessController { * @deprecated This class is only useful in conjunction with * {@linkplain SecurityManager the Security Manager}, which is deprecated * and subject to removal in a future release. Consequently, this class * is also deprecated and subject to removal. There is no replacement for * the Security Manager or this class. */ @Deprecated(since="17", forRemoval=true) public final class AccessControlContext { {code} `AccessControlContext` and `AccessController` are marked as deprecated in Java 17, with `forRemoval` set to true. From the Javadoc, it can be seen that they do not have corresponding replacements. In Spark, there are three files that use AccessControlContext or AccessController: # [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/core/src/main/scala/org/apache/spark/serializer/SerializationDebugger.scala#L70-L73] {code:java} private[serializer] var enableDebugging: Boolean = { !AccessController.doPrivileged(new sun.security.action.GetBooleanAction( "sun.io.serialization.extendedDebugInfo")).booleanValue() } {code} # [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/TSubjectAssumingTransport.java#L42-L45] {code:java} public void open() throws TTransportException { try { AccessControlContext context = AccessController.getContext(); Subject subject = Subject.getSubject(context); Subject.doAs(subject, (PrivilegedExceptionAction) () -> { try { wrapped.open(); } catch (TTransportException tte) { // Wrap the transport exception in an RTE, since Subject.doAs() then goes // and unwraps this for us out of the doAs block. We then unwrap one // more time in our catch clause to get back the TTE. (ugh) throw new RuntimeException(tte); } return null; }); } catch (PrivilegedActionException ioe) { throw new RuntimeException("Received an ioe we never threw!", ioe); } catch (RuntimeException rte) { if (rte.getCause() instanceof TTransportException) { throw (TTransportException) rte.getCause(); } else { throw rte; } } } {code} # [https://github.com/apache/spark/blob/39cc4abaff73cb49f9d79d1d844fe5c9fa14c917/sql/hive-thriftserver/src/main/java/org/apache/hive/service/auth/HttpAuthUtils.java#L73] {code:java} public static String getKerberosServiceTicket(String principal, String host, String serverHttpUrl, boolean assumeSubject) throws Exception { String serverPrincipal = ShimLoader.getHadoopThriftAuthBridge().getServerPrincipal(principal, host); if (assumeSubject) { // With this option, we're assuming that the external application, // using the JDBC driver has done a JAAS kerberos login already AccessControlContext context = AccessController.getContext(); Subject subject = Subject.getSubject(context); if (subject == null) { throw new Exception("The Subject is not set"); } return Subject.doAs(subject, new HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); } else { // JAAS login from ticket cache to setup the client UserGroupInformation UserGroupInformation clientUGI = ShimLoader.getHadoopThriftAuthBridge().getCurrentUGIWithConf("kerberos"); return clientUGI.doAs(new HttpKerberosClientAction(serverPrincipal, serverHttpUrl)); } } {code} > Clean up the usage of `AccessControlContext` and `AccessController` > --- > > Key: SPARK-45482 > URL: https://issues.apache.org/jira/browse/SPARK-45482 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, SQL >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Minor > > > > {code:java} > * @deprecated This class is only useful in conjunction with > * {@linkplain SecurityManager the Security Manager}, which is > deprecated > * and subject to removal in a future release. Consequently, this class > * is also deprecated and subject to removal. There is no replacement > for > * the Security Manager or this class.
[jira] [Assigned] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45481: -- Assignee: Apache Spark (was: Jiaan Geng) > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Currently, Spark supports most of the parquet compression codecs, but the > codecs parquet supports and the ones Spark supports do not map one-to-one. > There are a lot of magic strings copied from the parquet compression codecs, so > developers need to maintain their consistency manually. This is > error-prone and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: (was: Apache Spark) > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: Apache Spark > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45481: --- Labels: pull-request-available (was: ) > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > Labels: pull-request-available > > Currently, Spark supports most of the parquet compression codecs, but the > codecs parquet supports and the ones Spark supports do not map one-to-one. > There are a lot of magic strings copied from the parquet compression codecs, so > developers need to maintain their consistency manually. This is > error-prone and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
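A sketch of the kind of mapper this describes (hypothetical object and mapping, not the merged implementation): centralize the Spark-facing names next to Parquet's own CompressionCodecName so no magic strings are scattered around.
{code:java}
import org.apache.parquet.hadoop.metadata.CompressionCodecName

object ParquetCompressionCodec {
  // Single source of truth: Spark option value -> Parquet codec.
  val mapping: Map[String, CompressionCodecName] = Map(
    "uncompressed" -> CompressionCodecName.UNCOMPRESSED,
    "snappy"       -> CompressionCodecName.SNAPPY,
    "gzip"         -> CompressionCodecName.GZIP,
    "zstd"         -> CompressionCodecName.ZSTD,
    "lz4_raw"      -> CompressionCodecName.LZ4_RAW
  )

  def fromName(name: String): CompressionCodecName =
    mapping.getOrElse(name.toLowerCase,
      throw new IllegalArgumentException(s"Unknown parquet codec: $name"))
}
{code}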
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: (was: Apache Spark) > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: Apache Spark > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45482) Clean up the usage of `AccessControlContext` and `AccessController`
Yang Jie created SPARK-45482: Summary: Clean up the usage of `AccessControlContext` and `AccessController` Key: SPARK-45482 URL: https://issues.apache.org/jira/browse/SPARK-45482 Project: Spark Issue Type: Sub-task Components: Spark Core, SQL Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: (was: Apache Spark) > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45479: -- Assignee: Apache Spark > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43664) Fix TABLE_OR_VIEW_NOT_FOUND from SQLParityTests
[ https://issues.apache.org/jira/browse/SPARK-43664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-43664: -- Assignee: Apache Spark > Fix TABLE_OR_VIEW_NOT_FOUND from SQLParityTests > --- > > Key: SPARK-43664 > URL: https://issues.apache.org/jira/browse/SPARK-43664 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Repro: run `SQLParityTests.test_sql_with_index_col` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43664) Fix TABLE_OR_VIEW_NOT_FOUND from SQLParityTests
[ https://issues.apache.org/jira/browse/SPARK-43664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-43664: -- Assignee: (was: Apache Spark) > Fix TABLE_OR_VIEW_NOT_FOUND from SQLParityTests > --- > > Key: SPARK-43664 > URL: https://issues.apache.org/jira/browse/SPARK-43664 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 3.5.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > Repro: run `SQLParityTests.test_sql_with_index_col` -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45480) Selectable SQL Plan
[ https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45480: -- Assignee: (was: Apache Spark) > Selectable SQL Plan > --- > > Key: SPARK-45480 > URL: https://issues.apache.org/jira/browse/SPARK-45480 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45480) Selectable SQL Plan
[ https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45480: -- Assignee: Apache Spark > Selectable SQL Plan > --- > > Key: SPARK-45480 > URL: https://issues.apache.org/jira/browse/SPARK-45480 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng reassigned SPARK-45481: -- Assignee: Jiaan Geng > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Assignee: Jiaan Geng >Priority: Major > > Currently, Spark supports most of the Parquet compression codecs, but the > codecs Parquet supports and the ones Spark supports do not map one-to-one. > There are many magic strings copied from the Parquet compression codecs, so > developers have to maintain their consistency manually. This is error-prone > and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
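For illustration, such a mapper could look roughly like the sketch below. This is a minimal sketch under stated assumptions, not the actual change: the object name `ParquetCodecMapper` and its members are hypothetical, while `CompressionCodecName` is the real Parquet enum whose values the magic strings currently shadow.
{code:java}
import java.util.Locale

import org.apache.parquet.hadoop.metadata.CompressionCodecName

// Hypothetical single source of truth for the Spark-side codec names,
// so callers stop copying magic strings from Parquet.
object ParquetCodecMapper {
  private val sparkToParquet: Map[String, CompressionCodecName] = Map(
    "none" -> CompressionCodecName.UNCOMPRESSED,
    "uncompressed" -> CompressionCodecName.UNCOMPRESSED,
    "snappy" -> CompressionCodecName.SNAPPY,
    "gzip" -> CompressionCodecName.GZIP,
    "lzo" -> CompressionCodecName.LZO,
    "brotli" -> CompressionCodecName.BROTLI,
    "lz4" -> CompressionCodecName.LZ4,
    "zstd" -> CompressionCodecName.ZSTD)

  // Resolves a Spark option value to the Parquet enum, failing fast on typos.
  def toParquet(sparkName: String): CompressionCodecName =
    sparkToParquet.getOrElse(sparkName.toLowerCase(Locale.ROOT),
      throw new IllegalArgumentException(s"Unsupported parquet codec: $sparkName"))
}
{code}
With one mapping owned in one place, both the write path and tests can resolve codec names through the same method instead of each re-declaring the strings.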
[jira] [Updated] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45481: --- Description: Currently, Spark supports most of the Parquet compression codecs, but the codecs Parquet supports and the ones Spark supports do not map one-to-one. There are many magic strings copied from the Parquet compression codecs, so developers have to maintain their consistency manually. This is error-prone and reduces development efficiency. was: Currently, StorageLevel provides fromString to obtain a StorageLevel instance from its name, so developers or users have to copy the string literal of the StorageLevel name to set or get a StorageLevel instance. Developers therefore need to maintain its consistency manually, which is error-prone and reduces development efficiency. > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Priority: Major > > Currently, Spark supports most of the Parquet compression codecs, but the > codecs Parquet supports and the ones Spark supports do not map one-to-one. > There are many magic strings copied from the Parquet compression codecs, so > developers have to maintain their consistency manually. This is error-prone > and reduces development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45356) Optimize the Maven daily test configuration
[ https://issues.apache.org/jira/browse/SPARK-45356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45356: -- Assignee: (was: Apache Spark) > Optimize the Maven daily test configuration > --- > > Key: SPARK-45356 > URL: https://issues.apache.org/jira/browse/SPARK-45356 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45480) Selectable SQL Plan
[ https://issues.apache.org/jira/browse/SPARK-45480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45480: --- Labels: pull-request-available (was: ) > Selectable SQL Plan > --- > > Key: SPARK-45480 > URL: https://issues.apache.org/jira/browse/SPARK-45480 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45481) Introduce a mapper for parquet compression codecs
Jiaan Geng created SPARK-45481: -- Summary: Introduce a mapper for parquet compression codecs Key: SPARK-45481 URL: https://issues.apache.org/jira/browse/SPARK-45481 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Jiaan Geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45481) Introduce a mapper for parquet compression codecs
[ https://issues.apache.org/jira/browse/SPARK-45481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiaan Geng updated SPARK-45481: --- Description: Currently, StorageLevel provides fromString to obtain a StorageLevel instance from its name, so developers or users have to copy the string literal of the StorageLevel name to set or get a StorageLevel instance. Developers therefore need to maintain its consistency manually, which is error-prone and reduces development efficiency. > Introduce a mapper for parquet compression codecs > - > > Key: SPARK-45481 > URL: https://issues.apache.org/jira/browse/SPARK-45481 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Jiaan Geng >Priority: Major > > Currently, StorageLevel provides fromString to obtain a StorageLevel instance > from its name, so developers or users have to copy the string literal of the > StorageLevel name to set or get a StorageLevel instance. Developers therefore > need to maintain its consistency manually, which is error-prone and reduces > development efficiency. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
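The string coupling this description criticizes is easy to see with the real StorageLevel.fromString API — a minimal spark-shell example (the level name shown is just one valid value):
{code:java}
import org.apache.spark.storage.StorageLevel

// The caller must spell the level name exactly as Spark defines it;
// a typo such as "MEMORY_AND_DISK_SERR" only fails at runtime.
val level: StorageLevel = StorageLevel.fromString("MEMORY_AND_DISK_SER")
{code}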
[jira] [Created] (SPARK-45480) Selectable SQL Plan
Kent Yao created SPARK-45480: Summary: Selectable SQL Plan Key: SPARK-45480 URL: https://issues.apache.org/jira/browse/SPARK-45480 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
[ https://issues.apache.org/jira/browse/SPARK-45479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45479: --- Labels: pull-request-available (was: ) > Recover Python build in branch-3.3, 3.4 and 3.5 > --- > > Key: SPARK-45479 > URL: https://issues.apache.org/jira/browse/SPARK-45479 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > https://github.com/apache/spark/actions/workflows/build_branch33.yml > https://github.com/apache/spark/actions/workflows/build_branch34.yml > https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45479) Recover Python build in branch-3.3, 3.4 and 3.5
Hyukjin Kwon created SPARK-45479: Summary: Recover Python build in branch-3.3, 3.4 and 3.5 Key: SPARK-45479 URL: https://issues.apache.org/jira/browse/SPARK-45479 Project: Spark Issue Type: Improvement Components: Project Infra Affects Versions: 4.0.0 Reporter: Hyukjin Kwon https://github.com/apache/spark/actions/workflows/build_branch33.yml https://github.com/apache/spark/actions/workflows/build_branch34.yml https://github.com/apache/spark/actions/workflows/build_branch35.yml -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45475) Should use DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils
[ https://issues.apache.org/jira/browse/SPARK-45475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45475. -- Fix Version/s: 3.5.1 4.0.0 Resolution: Fixed Issue resolved by pull request 43304 [https://github.com/apache/spark/pull/43304] > Should use DataFrame.foreachPartition instead of RDD.foreachPartition in > JdbcUtils > -- > > Key: SPARK-45475 > URL: https://issues.apache.org/jira/browse/SPARK-45475 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.1, 4.0.0 > > > See https://github.com/apache/spark/pull/39976#issuecomment-1752930380 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
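For context, the change the ticket asks for amounts to the difference sketched below — a minimal illustration, not the actual JdbcUtils code; `savePartition` is a hypothetical stand-in for the real per-partition JDBC write logic:
{code:java}
import org.apache.spark.sql.{DataFrame, Row}

def writeViaJdbc(df: DataFrame)(savePartition: Iterator[Row] => Unit): Unit = {
  // Before: dropping to the RDD API detaches the action from the Dataset path.
  // df.rdd.foreachPartition(savePartition)

  // After: Dataset.foreachPartition keeps the write on the DataFrame API.
  df.foreachPartition(savePartition)
}
{code}
Both calls take an `Iterator[Row] => Unit` function per partition, so the switch is mechanical; see the linked PR comment for the motivation behind preferring the Dataset API here.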
[jira] [Assigned] (SPARK-45475) Should use DataFrame.foreachPartition instead of RDD.foreachPartition in JdbcUtils
[ https://issues.apache.org/jira/browse/SPARK-45475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45475: Assignee: Hyukjin Kwon > Should use DataFrame.foreachPartition instead of RDD.foreachPartition in > JdbcUtils > -- > > Key: SPARK-45475 > URL: https://issues.apache.org/jira/browse/SPARK-45475 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > > See https://github.com/apache/spark/pull/39976#issuecomment-1752930380 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhizhen Hou updated SPARK-45478: Description: *The SQL to reproduce the result* {code:java} create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; {code} *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} was: *The SQL to reproduce* {code:java} create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; {code} *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} > codegen sum(decimal_column / 2) computes div twice > -- > > Key: SPARK-45478 > URL: https://issues.apache.org/jira/browse/SPARK-45478 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhizhen Hou >Priority: Minor > > *The SQL to reproduce the result* > {code:java} > create table t_dec (c1 decimal(6,2)); > insert into t_dec values(1.0),(2.0),(null),(3.0); > explain codegen select sum(c1/2) from t_dec; {code} > > *Possible cause of the result:* > > The sum function uses an If expression in updateExpressions: > `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` > > The three children of the If expression look like this: > {code:java} > predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) > trueValue: input[0, decimal(26,6), true] > falseValue: (input[0, decimal(26,6), true] + > cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} > In subexpression elimination, only the predicate is recursed into, in > EquivalentExpressions#childrenToRecurse: > {code:java} > private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match > { > case _: CodegenFallback => Nil > case i: If => i.predicate :: Nil > case c: CaseWhen => c.children.head :: Nil > case c: Coalesce => c.children.head :: Nil > case other => other.children > } {code} > I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => > i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. > > But the following comment in `childrenToRecurse` makes me unsure whether it > would cause any other problems: > {code:java} > // 2. If: common subexpressions will always be evaluated at the beginning, > but the true and > // false expressions in `If` may not get accessed, according to the predicate > // expression. We should only recurse into the predicate expression. {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
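Concretely, the experiment described in the ticket corresponds to the following change to EquivalentExpressions — a sketch with the branch references qualified (`i.trueValue`, `i.falseValue`) so the pattern compiles; whether recursing into the branches is safe in all cases is exactly the open question quoted above:
{code:java}
// Sketch of the modified EquivalentExpressions#childrenToRecurse: recurse
// into all three children of If, so a subexpression shared by the predicate
// and a branch (such as the decimal division above) is computed only once.
private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match {
  case _: CodegenFallback => Nil
  case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil
  case c: CaseWhen => c.children.head :: Nil
  case c: Coalesce => c.children.head :: Nil
  case other => other.children
}
{code}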
[jira] [Commented] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773579#comment-17773579 ] Zhizhen Hou commented on SPARK-45478: - A freshly compiled Spark 4.0.0-SNAPSHOT from the master branch also produces the same result. > codegen sum(decimal_column / 2) computes div twice > -- > > Key: SPARK-45478 > URL: https://issues.apache.org/jira/browse/SPARK-45478 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhizhen Hou >Priority: Minor > > *The SQL to reproduce* > create table t_dec (c1 decimal(6,2)); > insert into t_dec values(1.0),(2.0),(null),(3.0); > explain codegen select sum(c1/2) from t_dec; > > *Possible cause of the result:* > > The sum function uses an If expression in updateExpressions: > `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` > > The three children of the If expression look like this: > {code:java} > predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) > trueValue: input[0, decimal(26,6), true] > falseValue: (input[0, decimal(26,6), true] + > cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} > In subexpression elimination, only the predicate is recursed into, in > EquivalentExpressions#childrenToRecurse: > {code:java} > private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match > { > case _: CodegenFallback => Nil > case i: If => i.predicate :: Nil > case c: CaseWhen => c.children.head :: Nil > case c: Coalesce => c.children.head :: Nil > case other => other.children > } {code} > I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => > i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. > > But the following comment in `childrenToRecurse` makes me unsure whether it > would cause any other problems: > {code:java} > // 2. If: common subexpressions will always be evaluated at the beginning, > but the true and > // false expressions in `If` may not get accessed, according to the predicate > // expression. We should only recurse into the predicate expression. {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
[ https://issues.apache.org/jira/browse/SPARK-45478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhizhen Hou updated SPARK-45478: Description: *The SQL to reproduce* {code:java} create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; {code} *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} was: *The SQL to reproduce* create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} > codegen sum(decimal_column / 2) computes div twice > -- > > Key: SPARK-45478 > URL: https://issues.apache.org/jira/browse/SPARK-45478 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Zhizhen Hou >Priority: Minor > > *The SQL to reproduce* > {code:java} > create table t_dec (c1 decimal(6,2)); > insert into t_dec values(1.0),(2.0),(null),(3.0); > explain codegen select sum(c1/2) from t_dec; {code} > > *Possible cause of the result:* > > The sum function uses an If expression in updateExpressions: > `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` > > The three children of the If expression look like this: > {code:java} > predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) > trueValue: input[0, decimal(26,6), true] > falseValue: (input[0, decimal(26,6), true] + > cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), > true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} > In subexpression elimination, only the predicate is recursed into, in > EquivalentExpressions#childrenToRecurse: > {code:java} > private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match > { > case _: CodegenFallback => Nil > case i: If => i.predicate :: Nil > case c: CaseWhen => c.children.head :: Nil > case c: Coalesce => c.children.head :: Nil > case other => other.children > } {code} > I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => > i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. > > But the following comment in `childrenToRecurse` makes me unsure whether it > would cause any other problems: > {code:java} > // 2. If: common subexpressions will always be evaluated at the beginning, > but the true and > // false expressions in `If` may not get accessed, according to the predicate > // expression. We should only recurse into the predicate expression. {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45478) codegen sum(decimal_column / 2) computes div twice
Zhizhen Hou created SPARK-45478: --- Summary: codegen sum(decimal_column / 2) computes div twice Key: SPARK-45478 URL: https://issues.apache.org/jira/browse/SPARK-45478 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Zhizhen Hou *The SQL to reproduce* create table t_dec (c1 decimal(6,2)); insert into t_dec values(1.0),(2.0),(null),(3.0); explain codegen select sum(c1/2) from t_dec; *Possible cause of the result:* The sum function uses an If expression in updateExpressions: `If(child.isNull, sum, sum + KnownNotNull(child).cast(resultType))` The three children of the If expression look like this: {code:java} predicate: isnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) trueValue: input[0, decimal(26,6), true] falseValue: (input[0, decimal(26,6), true] + cast(knownnotnull(CheckOverflow((promote_precision(input[2, decimal(10,0), true]) / 2), DecimalType(16,6), true)) as decimal(26,6))) {code} In subexpression elimination, only the predicate is recursed into, in EquivalentExpressions#childrenToRecurse: {code:java} private def childrenToRecurse(expr: Expression): Seq[Expression] = expr match { case _: CodegenFallback => Nil case i: If => i.predicate :: Nil case c: CaseWhen => c.children.head :: Nil case c: Coalesce => c.children.head :: Nil case other => other.children } {code} I tried replacing `case i: If => i.predicate :: Nil` with `case i: If => i.predicate :: i.trueValue :: i.falseValue :: Nil`, and it produces the correct result. But the following comment in `childrenToRecurse` makes me unsure whether it would cause any other problems: {code:java} // 2. If: common subexpressions will always be evaluated at the beginning, but the true and // false expressions in `If` may not get accessed, according to the predicate // expression. We should only recurse into the predicate expression. {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org