[jira] [Assigned] (SPARK-35813) Add new adaptive config into sql-performance-tuning docs
[ https://issues.apache.org/jira/browse/SPARK-35813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35813: Assignee: Apache Spark > Add new adaptive config into sql-performance-tuning docs > > > Key: SPARK-35813 > URL: https://issues.apache.org/jira/browse/SPARK-35813 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Major > > Describe the new configs `spark.sql.adaptive.autoBroadcastJoinThreshold` and > `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold` in the > sql-performance-tuning docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35813) Add new adaptive config into sql-performance-tuning docs
[ https://issues.apache.org/jira/browse/SPARK-35813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365252#comment-17365252 ] Apache Spark commented on SPARK-35813: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/32960 > Add new adaptive config into sql-performance-tuning docs > > > Key: SPARK-35813 > URL: https://issues.apache.org/jira/browse/SPARK-35813 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Major > > Describe the new configs `spark.sql.adaptive.autoBroadcastJoinThreshold` and > `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold` in the > sql-performance-tuning docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35813) Add new adaptive config into sql-performance-tuning docs
[ https://issues.apache.org/jira/browse/SPARK-35813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35813: Assignee: (was: Apache Spark) > Add new adaptive config into sql-performance-tuning docs > > > Key: SPARK-35813 > URL: https://issues.apache.org/jira/browse/SPARK-35813 > Project: Spark > Issue Type: Improvement > Components: docs >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Major > > Describe the new configs `spark.sql.adaptive.autoBroadcastJoinThreshold` and > `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold` in the > sql-performance-tuning docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35813) Add new adaptive config into sql-performance-tuning docs
XiDuo You created SPARK-35813: - Summary: Add new adaptive config into sql-performance-tuning docs Key: SPARK-35813 URL: https://issues.apache.org/jira/browse/SPARK-35813 Project: Spark Issue Type: Improvement Components: docs Affects Versions: 3.2.0 Reporter: XiDuo You Describe the new configs `spark.sql.adaptive.autoBroadcastJoinThreshold` and `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold` in the sql-performance-tuning docs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
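For context, the tuning docs could show these two adaptive-execution thresholds being passed at submit time. This is a hedged sketch: the threshold values below are illustrative assumptions, not documented defaults.

```python
# Illustrative values only; consult the sql-performance-tuning docs for
# the actual defaults and semantics of each threshold.
adaptive_confs = {
    "spark.sql.adaptive.autoBroadcastJoinThreshold": "10MB",
    "spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold": "64MB",
}

def spark_submit_args(confs):
    """Render the configs as --conf flags, as one would pass to spark-submit."""
    return [f"--conf {k}={v}" for k, v in sorted(confs.items())]

for arg in spark_submit_args(adaptive_confs):
    print(arg)
```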
[jira] [Created] (SPARK-35812) Throw an error if `version` and `timestamp` are used together in DataFrame.to_delta.
Haejoon Lee created SPARK-35812: --- Summary: Throw an error if `version` and `timestamp` are used together in DataFrame.to_delta. Key: SPARK-35812 URL: https://issues.apache.org/jira/browse/SPARK-35812 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee [read_delta|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.read_delta.html#databricks.koalas.read_delta] has arguments named `version` and `timestamp`, but they cannot be used together. We should raise a proper error message when both are specified. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
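The proposed mutual-exclusion check could look roughly like this. A minimal sketch, not the actual implementation: the real `read_delta` signature and error type may differ.

```python
def read_delta(path, version=None, timestamp=None, **options):
    """Sketch of the argument check proposed in SPARK-35812 (illustrative)."""
    if version is not None and timestamp is not None:
        raise ValueError(
            "'version' and 'timestamp' cannot be used together; "
            "specify at most one of them."
        )
    # ... the actual Delta read would happen here ...
    return {"path": path, "version": version, "timestamp": timestamp}

# Passing only one of the two is fine:
read_delta("/tmp/tbl", version=3)
```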
[jira] [Created] (SPARK-35811) Deprecate DataFrame.to_spark_io
Haejoon Lee created SPARK-35811: --- Summary: Deprecate DataFrame.to_spark_io Key: SPARK-35811 URL: https://issues.apache.org/jira/browse/SPARK-35811 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee We should deprecate [DataFrame.to_spark_io|https://docs.google.com/document/d/1RxvQJVf736Vg9XU7uiCaRlNl-P7GdmFGa6U3Ab78JJk/edit#heading=h.todz8y4xdqrx] since it duplicates [DataFrame.spark.to_spark_io|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.spark.to_spark_io.html] and does not exist in pandas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35810) Remove ps.broadcast API
Haejoon Lee created SPARK-35810: --- Summary: Remove ps.broadcast API Key: SPARK-35810 URL: https://issues.apache.org/jira/browse/SPARK-35810 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee We have [ps.broadcast|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.broadcast.html] in pandas API on Spark, but it duplicates [DataFrame.spark.hint|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.spark.hint.html] when that API is used with "broadcast". So we should remove it, along with the [broadcast|http://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.broadcast.html] function in PySpark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
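Before removal, `ps.broadcast` could be kept briefly as a deprecated alias of the hint. The sketch below uses a tiny stand-in DataFrame (an assumption, just to make the delegation pattern concrete) rather than the real pandas-on-Spark classes.

```python
import warnings

class _SparkAccessor:
    """Minimal stand-in for the real `.spark` accessor (illustration only)."""
    def __init__(self, df):
        self._df = df
    def hint(self, name):
        self._df.hints.append(name)
        return self._df

class FakeDataFrame:
    """Stand-in recording applied hints (illustration only)."""
    def __init__(self):
        self.hints = []
    @property
    def spark(self):
        return _SparkAccessor(self)

def broadcast(df):
    """Sketch: ps.broadcast as a deprecated alias of spark.hint('broadcast')."""
    warnings.warn(
        "`broadcast` is deprecated; use `DataFrame.spark.hint('broadcast')`.",
        FutureWarning,
    )
    return df.spark.hint("broadcast")
```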
[jira] [Created] (SPARK-35809) Add `index_col` argument for ps.sql.
Haejoon Lee created SPARK-35809: --- Summary: Add `index_col` argument for ps.sql. Key: SPARK-35809 URL: https://issues.apache.org/jira/browse/SPARK-35809 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee The current behavior of [ps.sql|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.sql.html] always loses the index, so we should add an `index_col` argument to this API so that we can preserve the index. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
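The intended behavior can be sketched in plain Python over rows-as-dicts: without `index_col` a fresh default index is generated (the original index is lost); with it, the named column becomes the index. Names and shapes here are assumptions for illustration, not the ps.sql API.

```python
def apply_index_col(rows, index_col=None):
    """Sketch of the proposed behavior: with `index_col`, that column
    becomes the index instead of being dropped (illustration only)."""
    if index_col is None:
        # current behavior: a fresh default index; any original index is lost
        return list(range(len(rows))), rows
    index = [r[index_col] for r in rows]
    data = [{k: v for k, v in r.items() if k != index_col} for r in rows]
    return index, data

idx, data = apply_index_col(
    [{"id": 10, "x": 1}, {"id": 20, "x": 2}], index_col="id"
)
```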
[jira] [Created] (SPARK-35808) Always enable the `pandas_metadata` in DataFrame.parquet
Haejoon Lee created SPARK-35808: --- Summary: Always enable the `pandas_metadata` in DataFrame.parquet Key: SPARK-35808 URL: https://issues.apache.org/jira/browse/SPARK-35808 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee We have an argument named `pandas_metadata` in [ps.read_parquet|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.read_parquet.html], but it seems we can simply always enable it, so that the pandas metadata is always respected when reading a Parquet file written by pandas. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
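For background, pandas records its index information as a JSON blob under the `pandas` key of the Parquet file metadata; always honoring it is what this ticket proposes. A minimal sketch of reading that metadata (key names follow the pandas Parquet convention; error handling is omitted):

```python
import json

def pandas_index_columns(key_value_metadata):
    """Return the index columns recorded by pandas in Parquet file metadata.

    `key_value_metadata` is assumed to be a plain dict of the file's
    key/value metadata; 'index_columns' is part of the pandas convention.
    """
    raw = key_value_metadata.get("pandas")
    if raw is None:
        return []  # not written by pandas: nothing to restore
    return json.loads(raw).get("index_columns", [])
```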
[jira] [Updated] (SPARK-35807) Rename the `num_files` argument
[ https://issues.apache.org/jira/browse/SPARK-35807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-35807: Description: We should rename the num_files argument in [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] and [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html], because `num_files` does not actually specify the number of files but the number of partitions. Or we can just remove it and use +[DataFrame.spark.repartition|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.spark.repartition.html]+ as a workaround. was: We should rename the num_files argument in [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] and [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html], because `num_files` does not actually specify the number of files but the number of partitions. > Rename the `num_files` argument > --- > > Key: SPARK-35807 > URL: https://issues.apache.org/jira/browse/SPARK-35807 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > We should rename the num_files argument in [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] and > [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html], > because `num_files` does not actually specify the number of > files but the number of partitions.
> Or we can just remove it and use > +[DataFrame.spark.repartition|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.spark.repartition.html]+ > as a workaround. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35807) Rename the `num_files` argument
Haejoon Lee created SPARK-35807: --- Summary: Rename the `num_files` argument Key: SPARK-35807 URL: https://issues.apache.org/jira/browse/SPARK-35807 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee We should rename the num_files argument in [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] and [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html], because `num_files` does not actually specify the number of files but the number of partitions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
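The distinction can be made concrete with a small stand-in: the argument repartitions the data, so it only bounds the number of output files (empty partitions produce no file). The classes and names below are assumptions for illustration, not the pandas-on-Spark API.

```python
class FakeFrame:
    """Tiny stand-in for a pandas-on-Spark DataFrame (illustration only)."""
    def __init__(self, partitions=8):
        self.partitions = partitions
    def repartition(self, n):
        self.partitions = n
        return self

def to_csv(df, path, num_partitions=None):
    """Sketch of the rename: the argument controls partitioning, which
    bounds — but does not fix — the number of output files."""
    if num_partitions is not None:
        df = df.repartition(num_partitions)
    return f"wrote up to {df.partitions} files under {path}"
```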
[jira] [Created] (SPARK-35806) Rename the `mode` argument to avoid confusion with `mode` argument in pandas
Haejoon Lee created SPARK-35806: --- Summary: Rename the `mode` argument to avoid confusion with `mode` argument in pandas Key: SPARK-35806 URL: https://issues.apache.org/jira/browse/SPARK-35806 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee pandas on Spark has an argument named `mode` in the APIs below: * [DataFrame.to_csv|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_csv.html] * [DataFrame.to_json|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_json.html] * [DataFrame.to_table|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_table.html] * [DataFrame.to_delta|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_delta.html] * [DataFrame.to_parquet|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_parquet.html] * [DataFrame.to_orc|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_orc.html] * [DataFrame.to_spark_io|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.DataFrame.to_spark_io.html] pandas has the same argument, but the usage is different, so we should rename the argument to avoid confusion with pandas'. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
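The clash is between pandas' file-open modes (`'w'` truncate, `'a'` append) and Spark's save modes (`'overwrite'`, `'append'`, ...). A rename, or a small translation like the one below, would avoid it; the mapping is an assumption for illustration, not the API's actual behavior.

```python
# Hypothetical translation between the two `mode` vocabularies.
PANDAS_TO_SPARK_MODE = {"w": "overwrite", "a": "append"}

def normalize_mode(mode):
    """Accept either spelling and return the Spark save mode."""
    return PANDAS_TO_SPARK_MODE.get(mode, mode)
```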
[jira] [Updated] (SPARK-35747) Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are not running in a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-35747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-35747: - Description: In a secure YARN cluster, even though the HBase, Kafka, or Hive services are not used in the user application, the YARN client unnecessarily tries to generate delegation tokens from these services. This adds delay when submitting a Spark application to a YARN cluster. Also, during the HBase delegation token generation step in the application submit stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning. {code:java} WARN security.HBaseDelegationTokenProvider: Failed to get token from service hbase java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) more than 100 further lines of stack trace{code} Hence, if these services are not used in the user application, it is better to add a WARN message suggesting that delegation token generation be disabled for those services, i.e. spark.security.credentials.hbase.enabled=false, spark.security.credentials.hive.enabled=false, spark.security.credentials.kafka.enabled=false was: In a secure YARN cluster, even though the HBase, Kafka, or Hive services are not used in the user application, the YARN client unnecessarily tries to generate delegation tokens from these services.
This adds delay when submitting a Spark application to a YARN cluster. Also, during the HBase delegation token generation step in the application submit stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning. {code:java} WARN security.HBaseDelegationTokenProvider: Failed to get token from service hbase java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) more than 100 further lines of stack trace{code} Hence, if these services are not used in the user application, it is better to add a WARN message suggesting that delegation token generation be disabled for those services, i.e. spark.security.credentials.hbase.enabled=false, spark.security.credentials.hive.enabled=false, spark.security.credentials.kafka.enabled=false > Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are > not running in a secure cluster > > > Key: SPARK-35747 > URL: https://issues.apache.org/jira/browse/SPARK-35747 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0, 3.1.2 >Reporter: Vinod KC >Priority: Minor > > In a secure YARN cluster, even though the HBase, Kafka, or Hive services are > not used in the user application, the YARN client unnecessarily tries to > generate delegation tokens from these services. This adds delay > when submitting a Spark application to a YARN cluster > Also, during the HBase delegation token generation step in the application > submit stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning.
> {code:java} > WARN security.HBaseDelegationTokenProvider: Failed to get token from service > hbase > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) > more than 100 further lines of stack trace{code} > Hence, if these services are not used in the user application, it is better > to add a WARN message suggesting that delegation token generation be disabled for those services, > i.e. spark.security.credentials.hbase.enabled=false, spark.security.credentials.hive.enabled=false, spark.security.credentials.kafka.enabled=false
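The three provider toggles mentioned above can be rendered as spark-submit flags. A sketch only: whether disabling each provider is safe depends on whether the application actually uses that service.

```python
# The three credential-provider toggles named in the ticket.
DISABLE_UNUSED_PROVIDERS = {
    "spark.security.credentials.hbase.enabled": "false",
    "spark.security.credentials.hive.enabled": "false",
    "spark.security.credentials.kafka.enabled": "false",
}

# Render as --conf flags for spark-submit.
flags = [f"--conf {k}={v}" for k, v in sorted(DISABLE_UNUSED_PROVIDERS.items())]
```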
[jira] [Assigned] (SPARK-35780) Support DATE/TIMESTAMP literals across the full range
[ https://issues.apache.org/jira/browse/SPARK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35780: Assignee: (was: Apache Spark) > Support DATE/TIMESTAMP literals across the full range > - > > Key: SPARK-35780 > URL: https://issues.apache.org/jira/browse/SPARK-35780 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Priority: Major > > DATE/TIMESTAMP literals support years 0000 to 9999. > However, internally we support a range that is much larger. > I can add or subtract large intervals from a date/timestamp and the system > will happily process and display large negative and positive dates. > Since we obviously cannot put this genie back into the bottle, the only thing > we can do is allow matching DATE/TIMESTAMP literals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
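The gap between the literal syntax and the internal range can be sketched numerically. Spark stores DATE as a day count since the Unix epoch (the storage detail is the only assumption here), and that count reaches far beyond the last date a 4-digit-year literal can spell:

```python
from datetime import date

EPOCH = date(1970, 1, 1)

def days_since_epoch(d):
    """DATE as Spark stores it internally: days relative to 1970-01-01."""
    return (d - EPOCH).days

last_literal_day = days_since_epoch(date(9999, 12, 31))  # last 4-digit-year date
int32_max = 2**31 - 1  # the internal day count extends far past 9999-12-31
```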
[jira] [Assigned] (SPARK-35780) Support DATE/TIMESTAMP literals across the full range
[ https://issues.apache.org/jira/browse/SPARK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35780: Assignee: Apache Spark > Support DATE/TIMESTAMP literals across the full range > - > > Key: SPARK-35780 > URL: https://issues.apache.org/jira/browse/SPARK-35780 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Apache Spark >Priority: Major > > DATE/TIMESTAMP literals support years 0000 to 9999. > However, internally we support a range that is much larger. > I can add or subtract large intervals from a date/timestamp and the system > will happily process and display large negative and positive dates. > Since we obviously cannot put this genie back into the bottle, the only thing > we can do is allow matching DATE/TIMESTAMP literals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35805) Pandas API on Spark improvements
Haejoon Lee created SPARK-35805: --- Summary: Pandas API on Spark improvements Key: SPARK-35805 URL: https://issues.apache.org/jira/browse/SPARK-35805 Project: Spark Issue Type: Umbrella Components: PySpark Affects Versions: 3.2.0 Reporter: Haejoon Lee There are several things that need improvement in pandas on Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35780) Support DATE/TIMESTAMP literals across the full range
[ https://issues.apache.org/jira/browse/SPARK-35780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365234#comment-17365234 ] Apache Spark commented on SPARK-35780: -- User 'linhongliu-db' has created a pull request for this issue: https://github.com/apache/spark/pull/32959 > Support DATE/TIMESTAMP literals across the full range > - > > Key: SPARK-35780 > URL: https://issues.apache.org/jira/browse/SPARK-35780 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Priority: Major > > DATE/TIMESTAMP literals support years 0000 to 9999. > However, internally we support a range that is much larger. > I can add or subtract large intervals from a date/timestamp and the system > will happily process and display large negative and positive dates. > Since we obviously cannot put this genie back into the bottle, the only thing > we can do is allow matching DATE/TIMESTAMP literals. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-35065) Group exception messages in spark/sql (core)
[ https://issues.apache.org/jira/browse/SPARK-35065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-35065: --- Comment: was deleted (was: I'm working on.) > Group exception messages in spark/sql (core) > > > Key: SPARK-35065 > URL: https://issues.apache.org/jira/browse/SPARK-35065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > Group all errors in sql/core/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35747) Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are not running in a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-35747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-35747: - Description: In a secure YARN cluster, even though the HBase, Kafka, or Hive services are not used in the user application, the YARN client unnecessarily tries to generate delegation tokens from these services. This adds delay when submitting a Spark application to a YARN cluster. Also, during the HBase delegation token generation step in the application submit stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning. {code:java} WARN security.HBaseDelegationTokenProvider: Failed to get token from service hbase java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) more than 100 further lines of stack trace{code} Hence, if these services are not used in the user application, it is better to add a WARN message suggesting that delegation token generation be disabled for those services, i.e. spark.security.credentials.hbase.enabled=false, spark.security.credentials.hive.enabled=false, spark.security.credentials.kafka.enabled=false was: In a secure YARN cluster where the HBase service is down, even if the Spark application is not using HBase, HBaseDelegationTokenProvider prints a full exception stack trace during the application submit stage, which causes a noisy warning.
{code:java} WARN security.HBaseDelegationTokenProvider: Failed to get token from service hbase java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) more than 100 further lines of stack trace{code} Also, application submission takes more time as `HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn` retries the connection to the HBase master multiple times before giving up. This slows down the application submission steps. Hence, if HBase is not used in the user application, it is better to suggest that users disable HBase delegation token generation, i.e. spark.security.credentials.hbase.enabled=false > Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are > not running in a secure cluster > > > Key: SPARK-35747 > URL: https://issues.apache.org/jira/browse/SPARK-35747 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0, 3.1.2 >Reporter: Vinod KC >Priority: Minor > > In a secure YARN cluster, even though the HBase, Kafka, or Hive services are > not used in the user application, the YARN client unnecessarily tries to > generate delegation tokens from these services. This adds delay > when submitting a Spark application to a YARN cluster > > Also, during the HBase delegation token generation step in the application submit > stage, HBaseDelegationTokenProvider prints a full exception stack trace, which causes a noisy warning.
> {code:java} > WARN security.HBaseDelegationTokenProvider: Failed to get token from service > hbase > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationTokenProvider.scala:93) > more than 100 further lines of stack trace{code} > Hence, if these services are not used in the user application, it is better > to add a WARN message suggesting that delegation token generation be disabled for those services, > i.e. spark.security.credentials.hbase.enabled=false, > spark.security.credentials.hive.enabled=false, > spark.security.credentials.kafka.enabled=false
[jira] [Commented] (SPARK-35065) Group exception messages in spark/sql (core)
[ https://issues.apache.org/jira/browse/SPARK-35065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365233#comment-17365233 ] Apache Spark commented on SPARK-35065: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/32958 > Group exception messages in spark/sql (core) > > > Key: SPARK-35065 > URL: https://issues.apache.org/jira/browse/SPARK-35065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > Group all errors in sql/core/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35065) Group exception messages in spark/sql (core)
[ https://issues.apache.org/jira/browse/SPARK-35065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35065: Assignee: Apache Spark > Group exception messages in spark/sql (core) > > > Key: SPARK-35065 > URL: https://issues.apache.org/jira/browse/SPARK-35065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > Group all errors in sql/core/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35065) Group exception messages in spark/sql (core)
[ https://issues.apache.org/jira/browse/SPARK-35065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35065: Assignee: (was: Apache Spark) > Group exception messages in spark/sql (core) > > > Key: SPARK-35065 > URL: https://issues.apache.org/jira/browse/SPARK-35065 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > Group all errors in sql/core/src/main/scala/org/apache/spark/sql -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35747) Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are not running in a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-35747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinod KC updated SPARK-35747: - Summary: Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are not running in a secure cluster (was: Avoid printing full Exception stack trace, if HBase service is not running in a secure cluster ) > Avoid printing full Exception stack trace, if HBase/Kafka/Hive services are > not running in a secure cluster > > > Key: SPARK-35747 > URL: https://issues.apache.org/jira/browse/SPARK-35747 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0, 3.1.0, 3.1.2 >Reporter: Vinod KC >Priority: Minor > > In a secure Yarn cluster where HBase service is down, even if the spark > application is not using HBase, during the application submit stage, > HBaseDelegationTokenProvider prints full Exception Stack trace and it causes > a noisy warning. > > {code:java} > WARN security.HBaseDelegationTokenProvider: Failed to get token from service > hbase > java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.security.HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn(HBaseDelegationT > okenProvider.scala:93) > more than 100 line exception stack trace{code} > Also, Application submission taking more time as > `HBaseDelegationTokenProvider.obtainDelegationTokensWithHBaseConn` retries > the connection to HBase master multiple times before it gives up. This slows > down the application submission steps. Hence, if HBase is not used in the > user Application, it is better to suggest user to disable HBase Delegation > Token generation. 
> i.e., spark.security.credentials.hbase.enabled=false > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
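The workaround suggested above can be sketched in code. This is a hedged illustration, not a verified recipe: the config key is the one named in the report, but the session-builder usage and app name are placeholders, and in practice the flag is normally passed at submit time via spark-submit --conf, since delegation tokens are obtained during application submission.

```python
from pyspark.sql import SparkSession

# Sketch only: if the application never touches HBase, disabling the HBase
# token provider stops submission from probing an absent HBase master (and
# from printing the long stack trace and retrying the connection).
spark = (
    SparkSession.builder
    .appName("no-hbase-tokens")  # hypothetical app name
    .config("spark.security.credentials.hbase.enabled", "false")
    .getOrCreate()
)
```

The analogous keys for the other providers mentioned in the title follow the same spark.security.credentials.<service>.enabled pattern.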
[jira] [Updated] (SPARK-30641) Project Matrix: Linear Models revisit and refactor
[ https://issues.apache.org/jira/browse/SPARK-30641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-30641: - Description: We had been refactoring linear models for a long time, and there is still some work remaining. After some discussions among [~huaxingao] [~srowen] [~weichenxu123] [~mengxr] [~podongfeng] , we decided to gather the related work under a sub-project Matrix, which includes: # *Blockification (vectorization of vectors)* ** vectors are stacked into matrices, so that high-level BLAS can be used for better performance (about ~3x faster on sparse datasets, up to ~18x faster on dense datasets; see SPARK-31783 for details). ** Since 3.1.1, LoR/SVC/LiR/AFT support blockification, and we need to blockify KMeans in the future. # *Standardization (virtual centering)* ** The existing implementation of standardization in linear models does NOT center the vectors by removing the means, in order to keep dataset _*sparsity*_. However, this causes feature values with small variance to be scaled to large values, and an underlying solver like LBFGS cannot handle this case efficiently; see SPARK-34448 for details. ** If the internal vectors are centered (as in the well-known GLMNET), the convergence rate is better. In the case in SPARK-34448, the number of iterations to convergence is reduced from 93 to 6. Moreover, the final solution is much closer to the one in GLMNET. ** Luckily, we found a new way to _*virtually*_ center the vectors without densifying the dataset. Good results have been observed in LoR, and we will take it into account in other linear models. 
# _*Initialization (To be discussed)*_ ** Initializing the model coefficients from a given model should be beneficial to: 1, convergence rate (should reduce the number of iterations); 2, model stability (may obtain a new solution closer to the previous one); # _*Early Stopping* *(To be discussed)*_ ** we can compute the test error during training (as tree models do), and stop the training procedure if the test error begins to increase; If you want to add other features to these models, please comment in the ticket. was: We had been refactoring linear models for a long time, and there is still some work remaining. After some discussions among [~huaxingao] [~srowen] [~weichenxu123] [~mengxr] [~podongfeng] , we decided to gather the related work under a sub-project Matrix, which includes: # *Blockification (vectorization of vectors)* ** vectors are stacked into matrices, so that high-level BLAS can be used for better performance (about ~3x faster on sparse datasets, up to ~18x faster on dense datasets; see SPARK-31783 for details). ** Since 3.1.1, LoR/SVC/LiR/AFT support blockification, and we need to blockify KMeans in the future. # *Standardization (virtual centering)* ** The existing implementation of standardization in linear models does NOT center the vectors by removing the means, in order to keep dataset _*sparsity*_. However, this causes feature values with small variance to be scaled to large values, and an underlying solver like LBFGS cannot handle this case efficiently; see SPARK-34448 for details. ** If the internal vectors are centered (as in other well-known implementations, i.e. GLMNET/Scikit-Learn), the convergence rate is better. In the case in SPARK-34448, the number of iterations to convergence is reduced from 93 to 6. Moreover, the final solution is much closer to the one in GLMNET. ** Luckily, we found a new way to _*virtually*_ center the vectors without densifying the dataset. Good results have been observed in LoR, and we will take it into account in other linear models. 
# _*Initialization (To be discussed)*_ ** Initializing the model coefficients from a given model should be beneficial to: 1, convergence rate (should reduce the number of iterations); 2, model stability (may obtain a new solution closer to the previous one); # _*Early Stopping* *(To be discussed)*_ ** we can compute the test error during training (as tree models do), and stop the training procedure if the test error begins to increase; If you want to add other features to these models, please comment in the ticket. > Project Matrix: Linear Models revisit and refactor > -- > > Key: SPARK-30641 > URL: https://issues.apache.org/jira/browse/SPARK-30641 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Affects Versions: 3.1.0, 3.2.0 >Reporter: zhengruifeng >Priority: Major > > We had been refactoring linear models for a long time, and there is still > some work remaining. After some discussions among [~huaxingao] [~srowen] > [~weichenxu123] [~mengxr] [~podongfeng] , we decided to gather the related work > under a sub-project Matrix, which includes: > # *Blockification (vectorization of vectors)* > ** ve
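The "virtual centering" idea described in the standardization item can be illustrated with a toy sketch in plain Python (this is not Spark's actual code): for a linear term w·((x − μ)/σ), the mean-removal part collapses into one precomputed constant, so the sparse vector x never has to be densified.

```python
# Toy illustration of virtual centering (all numbers are made up).
x = [0.0, 3.0, 0.0, 5.0]        # sparse feature vector: zeros stay untouched
mu = [1.0, 2.0, 0.5, 4.0]       # per-feature means
sigma = [2.0, 1.0, 0.5, 2.0]    # per-feature standard deviations
w = [0.3, -0.2, 0.1, 0.4]       # model coefficients

# Explicit standardization densifies x: every (x_i - mu_i) becomes non-zero.
dense = sum(wi * (xi - mi) / si for wi, xi, mi, si in zip(w, x, mu, sigma))

# Virtual centering: fold the means into a single precomputed offset, then
# iterate only over the non-zero entries of x.
scaled_w = [wi / si for wi, si in zip(w, sigma)]
offset = sum(swi * mi for swi, mi in zip(scaled_w, mu))
virtual = sum(swi * xi for swi, xi in zip(scaled_w, x) if xi != 0.0) - offset

assert abs(dense - virtual) < 1e-12  # identical result, no densification
```

The per-model offset is computed once, so each sparse instance costs work proportional to its non-zero count, which is what preserves dataset sparsity.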
[jira] [Resolved] (SPARK-35303) Enable pinned thread mode by default
[ https://issues.apache.org/jira/browse/SPARK-35303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-35303. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32429 [https://github.com/apache/spark/pull/32429] > Enable pinned thread mode by default > > > Key: SPARK-35303 > URL: https://issues.apache.org/jira/browse/SPARK-35303 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.2.0 > > > Pinned thread mode was added at SPARK-22340. We should enable it back to map > Python thread to JVM thread in order to prevent potential issues such as > thread local inheritance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35303) Enable pinned thread mode by default
[ https://issues.apache.org/jira/browse/SPARK-35303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-35303: Assignee: Hyukjin Kwon > Enable pinned thread mode by default > > > Key: SPARK-35303 > URL: https://issues.apache.org/jira/browse/SPARK-35303 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Pinned thread mode was added at SPARK-22340. We should enable it back to map > Python thread to JVM thread in order to prevent potential issues such as > thread local inheritance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35804) can't read external hive table on spark
[ https://issues.apache.org/jira/browse/SPARK-35804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cao zhiyu updated SPARK-35804: -- Description: I created an external Hive table over an HDFS file that is formatted as a JSON string. In the Hive shell I can read the data fields of this Hive table with the help of org.apache.hive.hcatalog.data.JsonSerDe, which is packaged in hive-hcatalog-core.jar. But when I try to use Spark (pyspark, spark-shell, or whatever), I just can't read it. It gives me the error Table: Unable to get field from serde: org.apache.hive.hcatalog.data.JsonSerDe I've copied the jar (hive-hcatalog-core.jar) to $spark_home/jars and the YARN libs and rerun; there is no effect, even with --jars $jar_path/hive-hcatalog-core.jar. But when I browse the web page of the Spark task, I can actually find the jar in the env list. was: I created an external Hive table over an HDFS file that is formatted as a JSON string. In the Hive shell I can read the data fields of this Hive table with the help of org.apache.hive.hcatalog.data.JsonSerDe, which is packaged in hive-hcatalog-core.jar. But when I try to use Spark (pyspark, spark-shell, or whatever), I just can't read it. It gives me the error Table: Unable to get field from serde: org.apache.hive.hcatalog.data.JsonSerDe I've copied the jar (hive-hcatalog-core.jar) to $spark_home/jars and the YARN libs and rerun; there is no effect. Even when I browse the web page of the Spark task, I can actually find the jar in the env list. > can't read external hive table on spark > --- > > Key: SPARK-35804 > URL: https://issues.apache.org/jira/browse/SPARK-35804 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core, Spark Shell >Affects Versions: 2.3.2 > Environment: hdp 3.1.4 > hive-hcatalog-core-3.1.0.3.1.4.0-315.jar & hive-hcatalog-core-3.1.2 both I've > tried > >Reporter: cao zhiyu >Priority: Critical > Labels: JSON, external-tables, hive, spark > > I created an external Hive table over an HDFS file that is formatted as a JSON > string. 
> In the Hive shell I can read the data fields of this Hive table with the help of > org.apache.hive.hcatalog.data.JsonSerDe, which is packaged in > hive-hcatalog-core.jar. > But when I try to use Spark (pyspark, spark-shell, or whatever), I just > can't read it. > It gives me the error Table: Unable to get field from serde: > org.apache.hive.hcatalog.data.JsonSerDe > I've copied the jar (hive-hcatalog-core.jar) to $spark_home/jars and the YARN > libs and rerun; there is no effect, even with --jars > $jar_path/hive-hcatalog-core.jar. But when I browse the web page of the Spark task, > I can actually find the jar in the env list. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35804) can't read external hive table on spark
cao zhiyu created SPARK-35804: - Summary: can't read external hive table on spark Key: SPARK-35804 URL: https://issues.apache.org/jira/browse/SPARK-35804 Project: Spark Issue Type: Bug Components: PySpark, Spark Core, Spark Shell Affects Versions: 2.3.2 Environment: hdp 3.1.4 hive-hcatalog-core-3.1.0.3.1.4.0-315.jar & hive-hcatalog-core-3.1.2 both I've tried Reporter: cao zhiyu I created an external Hive table over an HDFS file that is formatted as a JSON string. In the Hive shell I can read the data fields of this Hive table with the help of org.apache.hive.hcatalog.data.JsonSerDe, which is packaged in hive-hcatalog-core.jar. But when I try to use Spark (pyspark, spark-shell, or whatever), I just can't read it. It gives me the error Table: Unable to get field from serde: org.apache.hive.hcatalog.data.JsonSerDe I've copied the jar (hive-hcatalog-core.jar) to $spark_home/jars and the YARN libs and rerun; there is no effect. Even when I browse the web page of the Spark task, I can actually find the jar in the env list. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
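As a hedged sketch of the configuration being attempted here (the jar path and table name below are placeholders, and this is not a confirmed fix for the HDP setup above), the SerDe jar has to be visible to both the driver and the executors before the Hive-enabled session starts:

```python
from pyspark.sql import SparkSession

# Illustrative only: ship the HCatalog SerDe jar with the application so
# executors can instantiate org.apache.hive.hcatalog.data.JsonSerDe.
spark = (
    SparkSession.builder
    .config("spark.jars", "/path/to/hive-hcatalog-core.jar")  # placeholder path
    .enableHiveSupport()  # needed so Spark resolves the Hive metastore table
    .getOrCreate()
)
df = spark.table("my_json_table")  # hypothetical table name
```

Setting the config after the session (or only on the driver) is a common way for the jar to show up in the environment list while the SerDe class still fails to load.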
[jira] [Commented] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.
[ https://issues.apache.org/jira/browse/SPARK-35472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365215#comment-17365215 ] Apache Spark commented on SPARK-35472: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/32957 > Enable disallow_untyped_defs mypy check for pyspark.pandas.generic. > --- > > Key: SPARK-35472 > URL: https://issues.apache.org/jira/browse/SPARK-35472 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.
[ https://issues.apache.org/jira/browse/SPARK-35472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35472: Assignee: Apache Spark > Enable disallow_untyped_defs mypy check for pyspark.pandas.generic. > --- > > Key: SPARK-35472 > URL: https://issues.apache.org/jira/browse/SPARK-35472 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.
[ https://issues.apache.org/jira/browse/SPARK-35472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365214#comment-17365214 ] Apache Spark commented on SPARK-35472: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/32957 > Enable disallow_untyped_defs mypy check for pyspark.pandas.generic. > --- > > Key: SPARK-35472 > URL: https://issues.apache.org/jira/browse/SPARK-35472 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35472) Enable disallow_untyped_defs mypy check for pyspark.pandas.generic.
[ https://issues.apache.org/jira/browse/SPARK-35472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35472: Assignee: (was: Apache Spark) > Enable disallow_untyped_defs mypy check for pyspark.pandas.generic. > --- > > Key: SPARK-35472 > URL: https://issues.apache.org/jira/browse/SPARK-35472 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35803) Spark SQL does not support creating views using DataSource v2 based data sources
David Rabinowitz created SPARK-35803: Summary: Spark SQL does not support creating views using DataSource v2 based data sources Key: SPARK-35803 URL: https://issues.apache.org/jira/browse/SPARK-35803 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.1.2, 2.4.8 Reporter: David Rabinowitz When a temporary view is created in Spark SQL using an external data source, Spark tries to create the relevant relation using the DataSource.resolveRelation() method. Unlike DataFrameReader.load(), resolveRelation() does not check whether the provided DataSource implements the DataSourceV2 interface and instead tries to use the RelationProvider trait in order to generate the Relation. Furthermore, DataSourceV2Relation is not a subclass of BaseRelation, so it cannot be used in resolveRelation(). Lastly, I tried to implement the RelationProvider trait in my Java implementation of DataSourceV2, but the match inside resolveRelation() did not detect it as a RelationProvider. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35802) Error loading the stages/stage/ page in spark UI
Helt Long created SPARK-35802: - Summary: Error loading the stages/stage/ page in spark UI Key: SPARK-35802 URL: https://issues.apache.org/jira/browse/SPARK-35802 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.1.2, 3.1.1, 3.0.1, 3.0.0 Reporter: Helt Long I try to load the sparkUI page for a specific stage, I get the following error: {quote}Unable to connect to the server. Looks like the Spark application must have ended. Please Switch to the history UI. {quote} Obviously the server is still alive and process new messages. Looking at the network tab shows one of the requests fails: {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable' Error 500 Request failed. HTTP ERROR 500 Problem accessing /api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable. Reason: Request failed.http://eclipse.org/jetty";>Powered by Jetty:// 9.4.z-SNAPSHOT }} requests to any other object that I've tested seem to work, for example {{curl 'http://:8080/proxy/app-20201008130147-0001/api/v1/applications/app-20201008130147-0001/stages/11/0/taskSummary'}} The exception is: {{/api/v1/applications/app-20201008130147-0001/stages/11/0/taskTable javax.servlet.ServletException: java.lang.NullPointerException at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:410) at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319) at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205) at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:873) at org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1623) at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at 
org.sparkproject.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1610) at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540) at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255) at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:753) at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220) at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) at org.sparkproject.jetty.server.Server.handle(Server.java:505) at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:370) at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:103) at org.sparkproject.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168) at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126) at 
org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366) at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:698) at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:804) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.NullPointerException at org.apache.spark.status.api.v1.StagesResource.$anonfun$doPagination$1(StagesResource.scala:175) at org.apache.spark.status.api.v1.BaseAppResource.$anonfun$withUI$1(ApiRootResource.scala:140) at org.apache.spark.ui.SparkUI.withSparkUI(SparkUI.scala:107) at org.apache.spark.status.api.v1.BaseAppResource.withUI(ApiRootResource.scala:135) at org.apache.spark.status.api.v1.BaseAppResour
[jira] [Commented] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses
[ https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365194#comment-17365194 ] Sarth Frey commented on SPARK-22674: I can confirm this is affecting PySpark 3.1.1 > PySpark breaks serialization of namedtuple subclasses > - > > Key: SPARK-22674 > URL: https://issues.apache.org/jira/browse/SPARK-22674 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.0, 2.3.0, 3.1.1 >Reporter: Jonas Amrich >Priority: Major > > Pyspark monkey patches the namedtuple class to make it serializable, however > this breaks serialization of its subclasses. With current implementation, any > subclass will be serialized (and deserialized) as it's parent namedtuple. > Consider this code, which will fail with {{AttributeError: 'Point' object has > no attribute 'sum'}}: > {code} > from collections import namedtuple > Point = namedtuple("Point", "x y") > class PointSubclass(Point): > def sum(self): > return self.x + self.y > rdd = spark.sparkContext.parallelize([[PointSubclass(1, 1)]]) > rdd.collect()[0][0].sum() > {code} > Moreover, as PySpark hijacks all namedtuples in the main module, importing > pyspark breaks serialization of namedtuple subclasses even in code which is > not related to spark / distributed execution. I don't see any clean solution > to this; a possible workaround may be to limit serialization hack only to > direct namedtuple subclasses like in > https://github.com/JonasAmrich/spark/commit/f3efecee28243380ecf6657fe54e1a165c1b7204 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses
[ https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sarth Frey updated SPARK-22674: --- Affects Version/s: 3.1.1 > PySpark breaks serialization of namedtuple subclasses > - > > Key: SPARK-22674 > URL: https://issues.apache.org/jira/browse/SPARK-22674 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.2.0, 2.3.0, 3.1.1 >Reporter: Jonas Amrich >Priority: Major > > Pyspark monkey patches the namedtuple class to make it serializable, however > this breaks serialization of its subclasses. With current implementation, any > subclass will be serialized (and deserialized) as it's parent namedtuple. > Consider this code, which will fail with {{AttributeError: 'Point' object has > no attribute 'sum'}}: > {code} > from collections import namedtuple > Point = namedtuple("Point", "x y") > class PointSubclass(Point): > def sum(self): > return self.x + self.y > rdd = spark.sparkContext.parallelize([[PointSubclass(1, 1)]]) > rdd.collect()[0][0].sum() > {code} > Moreover, as PySpark hijacks all namedtuples in the main module, importing > pyspark breaks serialization of namedtuple subclasses even in code which is > not related to spark / distributed execution. I don't see any clean solution > to this; a possible workaround may be to limit serialization hack only to > direct namedtuple subclasses like in > https://github.com/JonasAmrich/spark/commit/f3efecee28243380ecf6657fe54e1a165c1b7204 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
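The failure mode described in this ticket can be reproduced without Spark at all. The sketch below is a stand-in for PySpark's real monkey patch (which lives in pyspark's serializers module and is more involved): it overrides __reduce__ on the parent namedtuple so that pickling any instance, including a subclass instance, rebuilds the parent type.

```python
import pickle
from collections import namedtuple

Point = namedtuple("Point", "x y")

# Stand-in for PySpark's patch (NOT its actual code): serialize every
# instance by reconstructing the *parent* namedtuple, discarding the
# subclass. Subclasses inherit this __reduce__.
Point.__reduce__ = lambda self: (Point, tuple(self))

class PointSubclass(Point):
    def total(self):
        return self.x + self.y

restored = pickle.loads(pickle.dumps(PointSubclass(1, 2)))
print(type(restored).__name__)  # prints "Point": the subclass identity is gone
assert type(restored) is Point
assert not hasattr(restored, "total")  # the ticket's AttributeError, in miniature
```

This mirrors why rdd.collect()[0][0].sum() in the report fails: the values survive the round trip, but the subclass methods do not.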
[jira] [Updated] (SPARK-35790) Spark Package Python Import does not work for namespace packages
[ https://issues.apache.org/jira/browse/SPARK-35790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Hamilton updated SPARK-35790: -- Description: If one includes Python files within several jars that together comprise a Python "namespace package" [https://www.python.org/dev/peps/pep-0420/] then only one of the packages is imported was: If one includes Python in a jar, it will automatically be added to the classpath, allowing for the distribution of Java + Python packages with a single jar. If one depends on a mixed jar, the Python is not properly loaded Summary: Spark Package Python Import does not work for namespace packages (was: Spark Package Python Import does not work for dependent jars) > Spark Package Python Import does not work for namespace packages > > > Key: SPARK-35790 > URL: https://issues.apache.org/jira/browse/SPARK-35790 > Project: Spark > Issue Type: Bug > Components: Build, PySpark, Spark Submit >Affects Versions: 3.0.0, 3.1.2 >Reporter: Mark Hamilton >Priority: Major > > If one includes Python files within several jars that together comprise a Python > "namespace package" > [https://www.python.org/dev/peps/pep-0420/] > Then only one of the packages is imported -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35469) Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors.
[ https://issues.apache.org/jira/browse/SPARK-35469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365138#comment-17365138 ] Apache Spark commented on SPARK-35469: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/32956 > Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors. > - > > Key: SPARK-35469 > URL: https://issues.apache.org/jira/browse/SPARK-35469 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35469) Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors.
[ https://issues.apache.org/jira/browse/SPARK-35469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35469: Assignee: (was: Apache Spark) > Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors. > - > > Key: SPARK-35469 > URL: https://issues.apache.org/jira/browse/SPARK-35469 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35469) Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors.
[ https://issues.apache.org/jira/browse/SPARK-35469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35469: Assignee: Apache Spark > Enable disallow_untyped_defs mypy check for pyspark.pandas.accessors. > - > > Key: SPARK-35469 > URL: https://issues.apache.org/jira/browse/SPARK-35469 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35801) SPIP: Support MERGE in Data Source V2
Anton Okolnychyi created SPARK-35801: Summary: SPIP: Support MERGE in Data Source V2 Key: SPARK-35801 URL: https://issues.apache.org/jira/browse/SPARK-35801 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Anton Okolnychyi [MERGE INTO|https://en.wikipedia.org/wiki/Merge_(SQL)] is well suited to large-scale workloads because it can express operations to insert, update, or delete multiple rows in a single SQL command. Many updates can be expressed as MERGE INTO queries that would otherwise require much more SQL. Common patterns for updating partitions are to read, union, and overwrite or read, diff, and append. Using MERGE INTO, these operations are easier to express and can be more efficient to run. Hive supports [MERGE INTO|https://blog.cloudera.com/update-hive-tables-easy-way/] and Spark should implement similar support. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
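The kind of statement the SPIP targets has the SQL-standard shape sketched below (the table and column names are invented for illustration). A single command covers the insert, update, and delete cases that would otherwise need the read-union-overwrite or read-diff-append patterns mentioned above; it is kept as a Python string here so it could be handed to spark.sql(...) once MERGE is supported.

```python
# Illustrative MERGE INTO statement (hypothetical tables and columns).
merge_sql = """
MERGE INTO accounts t
USING account_updates s
  ON t.id = s.id
WHEN MATCHED AND s.op = 'delete' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.balance = s.balance
WHEN NOT MATCHED THEN INSERT (id, balance) VALUES (s.id, s.balance)
""".strip()
```

Each WHEN clause expresses one of the row-level actions, which is what makes MERGE more compact and potentially more efficient than the multi-statement alternatives.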
[jira] [Resolved] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-35095. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32953 [https://github.com/apache/spark/pull/32953] > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-35095: Assignee: Kousuke Saruta > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18107) Insert overwrite statement runs much slower in spark-sql than it does in hive-client
[ https://issues.apache.org/jira/browse/SPARK-18107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365072#comment-17365072 ] Hemanth commented on SPARK-18107: - We are seeing this issue still exists on Spark versions 2.3.1 and 2.4.7. > Insert overwrite statement runs much slower in spark-sql than it does in > hive-client > > > Key: SPARK-18107 > URL: https://issues.apache.org/jira/browse/SPARK-18107 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: spark 2.0.0 > hive 2.0.1 >Reporter: snodawn >Assignee: L. C. Hsieh >Priority: Major > Fix For: 2.1.0 > > > I find insert overwrite statement running in spark-sql or spark-shell spends > much more time than it does in hive-client (i start it in > apache-hive-2.0.1-bin/bin/hive ), where spark costs about ten minutes but > hive-client just costs less than 20 seconds. > These are the steps I took. > Test sql is : > insert overwrite table login4game partition(pt='mix_en',dt='2016-10-21') > select distinct account_name,role_id,server,'1476979200' as recdate, 'mix' as > platform, 'mix' as pid, 'mix' as dev from tbllog_login where pt='mix_en' and > dt='2016-10-21' ; > there are 257128 lines of data in tbllog_login with > partition(pt='mix_en',dt='2016-10-21') > ps: > I'm sure it must be "insert overwrite" costing a lot of time in spark, may be > when doing overwrite, it need to spend a lot of time in io or in something > else. > I also compare the executing time between insert overwrite statement and > insert into statement. > 1. insert overwrite statement and insert into statement in spark: > insert overwrite statement costs about 10 minutes > insert into statement costs about 30 seconds > 2. 
insert into statement in spark and insert into statement in hive-client: > spark costs about 30 seconds > hive-client costs about 20 seconds > the difference is little that we can ignore > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35670) Upgrade ZSTD-JNI to 1.5.0-1
[ https://issues.apache.org/jira/browse/SPARK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-35670. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32826 [https://github.com/apache/spark/pull/32826] > Upgrade ZSTD-JNI to 1.5.0-1 > --- > > Key: SPARK-35670 > URL: https://issues.apache.org/jira/browse/SPARK-35670 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: apache.org >Assignee: apache.org >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35670) Upgrade ZSTD-JNI to 1.5.0-1
[ https://issues.apache.org/jira/browse/SPARK-35670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-35670: - Assignee: apache.org > Upgrade ZSTD-JNI to 1.5.0-1 > --- > > Key: SPARK-35670 > URL: https://issues.apache.org/jira/browse/SPARK-35670 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: apache.org >Assignee: apache.org >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35720) Support casting of String to timestamp without time zone type
[ https://issues.apache.org/jira/browse/SPARK-35720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-35720: -- Assignee: Gengliang Wang > Support casting of String to timestamp without time zone type > - > > Key: SPARK-35720 > URL: https://issues.apache.org/jira/browse/SPARK-35720 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Extend the Cast expression and support in casting StringType > toTimestampWithoutTZType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35720) Support casting of String to timestamp without time zone type
[ https://issues.apache.org/jira/browse/SPARK-35720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-35720. Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32936 [https://github.com/apache/spark/pull/32936] > Support casting of String to timestamp without time zone type > - > > Key: SPARK-35720 > URL: https://issues.apache.org/jira/browse/SPARK-35720 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Extend the Cast expression and support in casting StringType > toTimestampWithoutTZType -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35800) Improving testability of GroupState in streaming flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-35800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35800: Assignee: Apache Spark > Improving testability of GroupState in streaming flatMapGroupsWithState > --- > > Key: SPARK-35800 > URL: https://issues.apache.org/jira/browse/SPARK-35800 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Tathagata Das >Assignee: Apache Spark >Priority: Major > > GroupStateImpl is the internal implementation of the GroupState interface > which mean to be not exposed. Thus, it only has a private constructor. Such > access control does benefit encapsulation, however, this introduces > difficulties for unit tests and the users are calling the engine to construct > such GroupState instances in order to test their customized state transition > functions. > The solution is to introduce new interfaces that allow users to create > instances of GroupState but also access internal values of what they have set > (for example, has to state been updated, or removed). This would allow them > to write unit tests of the state transition function with custom GroupState > objects and then verifying whether the state was updated in an expected way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35800) Improving testability of GroupState in streaming flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-35800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365056#comment-17365056 ] Apache Spark commented on SPARK-35800: -- User 'lizhangdatabricks' has created a pull request for this issue: https://github.com/apache/spark/pull/32938 > Improving testability of GroupState in streaming flatMapGroupsWithState > --- > > Key: SPARK-35800 > URL: https://issues.apache.org/jira/browse/SPARK-35800 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Tathagata Das >Priority: Major > > GroupStateImpl is the internal implementation of the GroupState interface > which mean to be not exposed. Thus, it only has a private constructor. Such > access control does benefit encapsulation, however, this introduces > difficulties for unit tests and the users are calling the engine to construct > such GroupState instances in order to test their customized state transition > functions. > The solution is to introduce new interfaces that allow users to create > instances of GroupState but also access internal values of what they have set > (for example, has to state been updated, or removed). This would allow them > to write unit tests of the state transition function with custom GroupState > objects and then verifying whether the state was updated in an expected way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35800) Improving testability of GroupState in streaming flatMapGroupsWithState
[ https://issues.apache.org/jira/browse/SPARK-35800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35800: Assignee: (was: Apache Spark) > Improving testability of GroupState in streaming flatMapGroupsWithState > --- > > Key: SPARK-35800 > URL: https://issues.apache.org/jira/browse/SPARK-35800 > Project: Spark > Issue Type: New Feature > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Tathagata Das >Priority: Major > > GroupStateImpl is the internal implementation of the GroupState interface > which mean to be not exposed. Thus, it only has a private constructor. Such > access control does benefit encapsulation, however, this introduces > difficulties for unit tests and the users are calling the engine to construct > such GroupState instances in order to test their customized state transition > functions. > The solution is to introduce new interfaces that allow users to create > instances of GroupState but also access internal values of what they have set > (for example, has to state been updated, or removed). This would allow them > to write unit tests of the state transition function with custom GroupState > objects and then verifying whether the state was updated in an expected way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35800) Improving testability of GroupState in streaming flatMapGroupsWithState
Tathagata Das created SPARK-35800: - Summary: Improving testability of GroupState in streaming flatMapGroupsWithState Key: SPARK-35800 URL: https://issues.apache.org/jira/browse/SPARK-35800 Project: Spark Issue Type: New Feature Components: Structured Streaming Affects Versions: 3.1.2 Reporter: Tathagata Das GroupStateImpl is the internal implementation of the GroupState interface and is not meant to be exposed; thus, it only has a private constructor. Such access control benefits encapsulation, but it makes unit testing difficult: users must rely on the engine to construct GroupState instances in order to test their customized state transition functions. The solution is to introduce new interfaces that allow users to create instances of GroupState and also to inspect what the function has done to them (for example, has the state been updated or removed). This would allow them to write unit tests of the state transition function with custom GroupState objects and then verify whether the state was updated in the expected way. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
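GroupState is a Scala/Java interface, so any real fix lands in Spark's Scala API. Purely as a language-neutral illustration of the testing pattern the ticket asks for, here is a minimal, self-contained Python sketch (all names invented) of a user-constructible fake state object that records whether a state transition function updated or removed the state:

```python
# Hypothetical sketch of the proposal: let tests construct a state object
# directly, run the user's transition function against it, and inspect the
# result. All names here are invented; they do not mirror Spark's actual API.
class FakeGroupState:
    def __init__(self, initial=None):
        self._value = initial
        self.updated = False   # did the function call update()?
        self.removed = False   # did the function call remove()?

    @property
    def exists(self):
        return self._value is not None and not self.removed

    def get(self):
        if not self.exists:
            raise ValueError("state does not exist")
        return self._value

    def update(self, value):
        self._value = value
        self.updated = True
        self.removed = False

    def remove(self):
        self.removed = True


# An example user transition function: keep a running count per key.
def count_events(key, values, state):
    total = (state.get() if state.exists else 0) + len(values)
    state.update(total)
    return (key, total)


# The unit test needs no engine at all: construct the state, call, inspect.
state = FakeGroupState()
assert count_events("k", [1, 2, 3], state) == ("k", 3)
assert state.updated and state.get() == 3
```

The point of the ticket is exactly this last stanza: today the equivalent Scala test cannot be written, because only the engine can construct a GroupState.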
[jira] [Assigned] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35095: Assignee: Apache Spark > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365052#comment-17365052 ] Apache Spark commented on SPARK-35095: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32953 > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35095) Use ANSI intervals in streaming join tests
[ https://issues.apache.org/jira/browse/SPARK-35095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35095: Assignee: (was: Apache Spark) > Use ANSI intervals in streaming join tests > -- > > Key: SPARK-35095 > URL: https://issues.apache.org/jira/browse/SPARK-35095 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Enable ANSI intervals in the tests: > - StreamingOuterJoinSuite.right outer with watermark range condition > - StreamingOuterJoinSuite.left outer with watermark range condition -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
[ https://issues.apache.org/jira/browse/SPARK-35799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35799: Assignee: (was: Apache Spark) > Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec > --- > > Key: SPARK-35799 > URL: https://issues.apache.org/jira/browse/SPARK-35799 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Priority: Minor > > Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the > operator {{FlatMapGroupsWithStateExec}}, but currently it just > [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] > the iterator creation time. > Fix it to measure similar to how other stateful operators measure. Example > one > [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. > This measurement is not perfect due to the nature of the lazy iterator and > also includes the time the consumer operator spent in processing the current > operator output, but it should give a good signal when comparing the metric > in one microbatch to the metric in another microbatch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
[ https://issues.apache.org/jira/browse/SPARK-35799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35799: Assignee: Apache Spark > Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec > --- > > Key: SPARK-35799 > URL: https://issues.apache.org/jira/browse/SPARK-35799 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Assignee: Apache Spark >Priority: Minor > > Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the > operator {{FlatMapGroupsWithStateExec}}, but currently it just > [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] > the iterator creation time. > Fix it to measure similar to how other stateful operators measure. Example > one > [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. > This measurement is not perfect due to the nature of the lazy iterator and > also includes the time the consumer operator spent in processing the current > operator output, but it should give a good signal when comparing the metric > in one microbatch to the metric in another microbatch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
[ https://issues.apache.org/jira/browse/SPARK-35799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365049#comment-17365049 ] Apache Spark commented on SPARK-35799: -- User 'vkorukanti' has created a pull request for this issue: https://github.com/apache/spark/pull/32952 > Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec > --- > > Key: SPARK-35799 > URL: https://issues.apache.org/jira/browse/SPARK-35799 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Priority: Minor > > Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the > operator {{FlatMapGroupsWithStateExec}}, but currently it just > [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] > the iterator creation time. > Fix it to measure similar to how other stateful operators measure. Example > one > [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. > This measurement is not perfect due to the nature of the lazy iterator and > also includes the time the consumer operator spent in processing the current > operator output, but it should give a good signal when comparing the metric > in one microbatch to the metric in another microbatch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
[ https://issues.apache.org/jira/browse/SPARK-35799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Venki Korukanti updated SPARK-35799: Description: Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the operator {{FlatMapGroupsWithStateExec}}, but currently it just [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] the iterator creation time. Fix it to measure similar to how other stateful operators measure. Example one [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. This measurement is not perfect due to the nature of the lazy iterator and also includes the time the consumer operator spent in processing the current operator output, but it should give a good signal when comparing the metric in one microbatch to the metric in another microbatch. was: Metric `allUpdatesTimeMs` meant to capture the start to end walltime of the operator `FlatMapGroupsWithStateExec`, but currently it just [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] the iterator creation time. Fix it to measure similar to how other stateful operators measure. Example one [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. This measurement is not perfect due to the nature of the lazy iterator and also includes the time the consumer operator spent in processing the current operator output, but it should give a good signal when comparing the metric in one microbatch to the metric in another microbatch. 
> Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec > --- > > Key: SPARK-35799 > URL: https://issues.apache.org/jira/browse/SPARK-35799 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.1.2 >Reporter: Venki Korukanti >Priority: Minor > > Metric {{allUpdatesTimeMs}} meant to capture the start to end walltime of the > operator {{FlatMapGroupsWithStateExec}}, but currently it just > [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] > the iterator creation time. > Fix it to measure similar to how other stateful operators measure. Example > one > [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. > This measurement is not perfect due to the nature of the lazy iterator and > also includes the time the consumer operator spent in processing the current > operator output, but it should give a good signal when comparing the metric > in one microbatch to the metric in another microbatch. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35799) Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec
Venki Korukanti created SPARK-35799: --- Summary: Fix the allUpdatesTimeMs metric measuring in FlatMapGroupsWithStateExec Key: SPARK-35799 URL: https://issues.apache.org/jira/browse/SPARK-35799 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.1.2 Reporter: Venki Korukanti Metric `allUpdatesTimeMs` is meant to capture the start-to-end wall time of the operator `FlatMapGroupsWithStateExec`, but currently it just [captures|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala#L121] the iterator creation time. Fix it to measure the way other stateful operators do; see the example [here|https://github.com/apache/spark/blob/79362c4efcb6bd4b575438330a14a6191cca5e4b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala#L406]. This measurement is not perfect: because of the lazy iterator it also includes the time the consumer operator spends processing the current operator's output, but it should still give a good signal when comparing the metric in one microbatch against another. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
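The lazy-iterator pitfall this ticket describes is language-agnostic: timing only iterator *creation* measures almost nothing, because the work happens while the iterator is consumed. A minimal, self-contained Python sketch (Spark's actual fix is in Scala; names here are invented) of wrapping an iterator so the accumulated time covers consumption:

```python
import time

def slow_items():
    # A generator: the work happens lazily, only during consumption.
    for i in range(3):
        time.sleep(0.05)
        yield i

class TimedIterator:
    """Accumulate wall time spent inside next(), the way a stateful
    operator would measure across the whole lazy iterator."""
    def __init__(self, it):
        self._it = it
        self.elapsed = 0.0
    def __iter__(self):
        return self
    def __next__(self):
        start = time.perf_counter()
        try:
            return next(self._it)
        finally:
            self.elapsed += time.perf_counter() - start

creation_start = time.perf_counter()
it = TimedIterator(slow_items())
creation_time = time.perf_counter() - creation_start  # near zero: nothing ran yet
list(it)  # consumption triggers the real work, recorded in it.elapsed
```

As the ticket notes, a wrapper like this also charges the operator for time the downstream consumer spends between `next()` calls' results, so the number is comparative rather than exact.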
[jira] [Resolved] (SPARK-34898) Send ExecutorMetricsUpdate EventLog appropriately
[ https://issues.apache.org/jira/browse/SPARK-34898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mridul Muralidharan resolved SPARK-34898. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31992 [https://github.com/apache/spark/pull/31992] > Send ExecutorMetricsUpdate EventLog appropriately > - > > Key: SPARK-34898 > URL: https://issues.apache.org/jira/browse/SPARK-34898 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0 > > > In current EventLoggingListener, we won't write > SparkListenerExecutorMetricsUpdate message at all > {code:java} > override def onExecutorMetricsUpdate(event: > SparkListenerExecutorMetricsUpdate): Unit = { > if (shouldLogStageExecutorMetrics) { > event.executorUpdates.foreach { case (stageKey1, newPeaks) => > liveStageExecutorMetrics.foreach { case (stageKey2, metricsPerExecutor) > => > // If the update came from the driver, stageKey1 will be the dummy > key (-1, -1), > // so record those peaks for all active stages. > // Otherwise, record the peaks for the matching stage. > if (stageKey1 == DRIVER_STAGE_KEY || stageKey1 == stageKey2) { > val metrics = metricsPerExecutor.getOrElseUpdate( > event.execId, new ExecutorMetrics()) > metrics.compareAndUpdatePeakValues(newPeaks) > } > } > } > } > } > {code} > It causes this effect that we can't get driver peakMemoryMetrics in SHS. We > can get executor's since it will update with TaskEnd events. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
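The listener logic quoted above is compact but easy to misread: an update arriving under the dummy driver stage key applies to every live stage, while any other update applies only to its matching stage, and the stored per-executor metrics keep element-wise peaks. A small self-contained Python sketch of just that matching/peak-update rule (names invented, metrics reduced to plain lists of numbers):

```python
# Hypothetical stand-alone model of the quoted onExecutorMetricsUpdate logic:
# (-1, -1) is the dummy key meaning "the update came from the driver, apply
# it to all active stages"; peaks are kept as the element-wise maximum.
DRIVER_STAGE_KEY = (-1, -1)

def update_peaks(live_stage_metrics, update_key, exec_id, new_peaks):
    """live_stage_metrics: {stage_key: {exec_id: [peak, peak, ...]}}"""
    for stage_key, per_executor in live_stage_metrics.items():
        if update_key == DRIVER_STAGE_KEY or update_key == stage_key:
            current = per_executor.setdefault(exec_id, [0] * len(new_peaks))
            # compareAndUpdatePeakValues analogue: element-wise max
            per_executor[exec_id] = [max(c, n) for c, n in zip(current, new_peaks)]

# Driver updates fan out to every active stage; stage-tagged ones do not.
live = {(1, 0): {}, (2, 0): {}}
update_peaks(live, DRIVER_STAGE_KEY, "driver", [5, 3])
update_peaks(live, (1, 0), "exec-1", [2, 9])
```

The bug the ticket fixes is upstream of this logic: the event-log listener never wrote the update events at all, so driver peaks could not be recovered in the history server.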
[jira] [Resolved] (SPARK-35789) Lateral join should only be used with subquery
[ https://issues.apache.org/jira/browse/SPARK-35789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35789. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32937 [https://github.com/apache/spark/pull/32937] > Lateral join should only be used with subquery > -- > > Key: SPARK-35789 > URL: https://issues.apache.org/jira/browse/SPARK-35789 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.2.0 > > > This is a follow up for SPARK-34382. Currently the keyword LATERAL can be > used in front of a `relationPrimary`, which consists of more than just > subqueries, for example: > select * from t1, lateral t2 > Such syntax is not allowed in Postgres. LATERAL should only be used in front > of a subquery. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35789) Lateral join should only be used with subquery
[ https://issues.apache.org/jira/browse/SPARK-35789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35789: --- Assignee: Allison Wang > Lateral join should only be used with subquery > -- > > Key: SPARK-35789 > URL: https://issues.apache.org/jira/browse/SPARK-35789 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > This is a follow up for SPARK-34382. Currently the keyword LATERAL can be > used in front of a `relationPrimary`, which consists of more than just > subqueries, for example: > select * from t1, lateral t2 > Such syntax is not allowed in Postgres. LATERAL should only be used in front > of a subquery. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35782) leveldbjni doesn't work in Apple Silicon on macOS
[ https://issues.apache.org/jira/browse/SPARK-35782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365014#comment-17365014 ] DB Tsai commented on SPARK-35782: - [~yikunkero] It will only work for Apple Silicon on Linux but not macOS. For macOS, we need to recompile for this specific OS. > leveldbjni doesn't work in Apple Silicon on macOS > - > > Key: SPARK-35782 > URL: https://issues.apache.org/jira/browse/SPARK-35782 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: DB Tsai >Priority: Major > > leveldbjni doesn't contain the native library for Apple Silicon on macOS. We > will need to build native library for Apple Silicon on macOS, and cut a new > release so Spark can use it. > However, it is not maintained for a long time, and the last release was in > 2016. Per > [discussion|http://apache-spark-developers-list.1001551.n3.nabble.com/leveldbjni-dependency-td30146.html] > in spark dev mailing list, other platform also runs into the same support > issue. Perhaps we should consider rocksdb as a replacement. > Note, here is the rocksdb task to support Apple Silicon, > https://github.com/facebook/rocksdb/issues/7720 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365011#comment-17365011 ] Apache Spark commented on SPARK-33603: -- User 'dgd-contributor' has created a pull request for this issue: https://github.com/apache/spark/pull/32951 > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala| 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17365010#comment-17365010 ] Apache Spark commented on SPARK-33603: -- User 'dgd-contributor' has created a pull request for this issue: https://github.com/apache/spark/pull/32951 > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala| 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33603: Assignee: (was: Apache Spark) > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala| 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33603) Group exception messages in execution/command
[ https://issues.apache.org/jira/browse/SPARK-33603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33603: Assignee: Apache Spark > Group exception messages in execution/command > - > > Key: SPARK-33603 > URL: https://issues.apache.org/jira/browse/SPARK-33603 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/command' > || Filename || Count || > | AnalyzeColumnCommand.scala| 3 | > | AnalyzePartitionCommand.scala | 2 | > | AnalyzeTableCommand.scala | 1 | > | SetCommand.scala | 2 | > | createDataSourceTables.scala | 2 | > | ddl.scala | 1 | > | functions.scala | 4 | > | tables.scala | 7 | > | views.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34054) BlockManagerDecommissioner cleanup
[ https://issues.apache.org/jira/browse/SPARK-34054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34054. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31102 [https://github.com/apache/spark/pull/31102] > BlockManagerDecommissioner cleanup > -- > > Key: SPARK-34054 > URL: https://issues.apache.org/jira/browse/SPARK-34054 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Fix For: 3.2.0 > > > Code cleanup for BlockManagerDecommissioner to fix/improve some issues.
[jira] [Assigned] (SPARK-34054) BlockManagerDecommissioner cleanup
[ https://issues.apache.org/jira/browse/SPARK-34054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34054: --- Assignee: wuyi > BlockManagerDecommissioner cleanup > -- > > Key: SPARK-34054 > URL: https://issues.apache.org/jira/browse/SPARK-34054 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > > Code cleanup for BlockManagerDecommissioner to fix/improve some issues.
[jira] [Resolved] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35798. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 32947 [https://github.com/apache/spark/pull/32947] > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Minor > Fix For: 3.2.0 > > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Assigned] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35798: --- Assignee: Peter Toth > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Resolved] (SPARK-35792) View should not capture configs used in `RelationConversions`
[ https://issues.apache.org/jira/browse/SPARK-35792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-35792. - Fix Version/s: 3.1.3 3.2.0 Resolution: Fixed Issue resolved by pull request 32941 [https://github.com/apache/spark/pull/32941] > View should not capture configs used in `RelationConversions` > - > > Key: SPARK-35792 > URL: https://issues.apache.org/jira/browse/SPARK-35792 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Linhong Liu >Priority: Major > Fix For: 3.2.0, 3.1.3 > > > RelationConversions is actually an optimization rule, yet it is executed in > the analysis phase. A view is designed to capture only semantic configs, > so we should ignore the configs related to `RelationConversions`
[jira] [Assigned] (SPARK-35792) View should not capture configs used in `RelationConversions`
[ https://issues.apache.org/jira/browse/SPARK-35792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-35792: --- Assignee: Linhong Liu > View should not capture configs used in `RelationConversions` > - > > Key: SPARK-35792 > URL: https://issues.apache.org/jira/browse/SPARK-35792 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Linhong Liu >Assignee: Linhong Liu >Priority: Major > > RelationConversions is actually an optimization rule, yet it is executed in > the analysis phase. A view is designed to capture only semantic configs, > so we should ignore the configs related to `RelationConversions`
[jira] [Assigned] (SPARK-35726) Truncate java.time.Duration by fields of day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-35726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35726: Assignee: (was: Apache Spark) > Truncate java.time.Duration by fields of day-time interval type > --- > > Key: SPARK-35726 > URL: https://issues.apache.org/jira/browse/SPARK-35726 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Truncate input java.time.Duration instances using fields of > DayTimeIntervalType. For example, if DayTimeIntervalType has the end field > HOUR, the granularity of DayTimeIntervalType values should be hours too.
[jira] [Commented] (SPARK-35726) Truncate java.time.Duration by fields of day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-35726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364923#comment-17364923 ] Apache Spark commented on SPARK-35726: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/32950 > Truncate java.time.Duration by fields of day-time interval type > --- > > Key: SPARK-35726 > URL: https://issues.apache.org/jira/browse/SPARK-35726 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Truncate input java.time.Duration instances using fields of > DayTimeIntervalType. For example, if DayTimeIntervalType has the end field > HOUR, the granularity of DayTimeIntervalType values should be hours too.
[jira] [Commented] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364922#comment-17364922 ] Apache Spark commented on SPARK-35773: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Assigned] (SPARK-35726) Truncate java.time.Duration by fields of day-time interval type
[ https://issues.apache.org/jira/browse/SPARK-35726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35726: Assignee: Apache Spark > Truncate java.time.Duration by fields of day-time interval type > --- > > Key: SPARK-35726 > URL: https://issues.apache.org/jira/browse/SPARK-35726 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Truncate input java.time.Duration instances using fields of > DayTimeIntervalType. For example, if DayTimeIntervalType has the end field > HOUR, the granularity of DayTimeIntervalType values should be hours too.
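The field-based truncation SPARK-35726 describes can be sketched with plain `java.time`. This is only an illustration of what "granularity by end field" means; the class name is hypothetical and `Duration.truncatedTo` is not Spark's actual implementation:

```java
import java.time.Duration;
import java.time.temporal.ChronoUnit;

public class DurationTruncationDemo {
    public static void main(String[] args) {
        // Anything finer than the interval type's end field is cut off.
        Duration d = Duration.ofHours(2).plusMinutes(35).plusSeconds(17);

        // End field HOUR: minutes and seconds are dropped.
        Duration hourGranularity = d.truncatedTo(ChronoUnit.HOURS);
        // End field MINUTE: only the seconds are dropped.
        Duration minuteGranularity = d.truncatedTo(ChronoUnit.MINUTES);

        System.out.println(hourGranularity);   // PT2H
        System.out.println(minuteGranularity); // PT2H35M
    }
}
```

`truncatedTo` rounds toward zero, which matches the "drop the finer fields" reading of the issue description.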
[jira] [Commented] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364921#comment-17364921 ] Apache Spark commented on SPARK-35773: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Assigned] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35773: Assignee: Apache Spark (was: Kousuke Saruta) > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Commented] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364920#comment-17364920 ] Apache Spark commented on SPARK-35773: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Assigned] (SPARK-35773) Parse year-month interval literals to tightest types
[ https://issues.apache.org/jira/browse/SPARK-35773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35773: Assignee: Kousuke Saruta (was: Apache Spark) > Parse year-month interval literals to tightest types > > > Key: SPARK-35773 > URL: https://issues.apache.org/jira/browse/SPARK-35773 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.2.0 > > > Modify AstBuilder.visitInterval to parse year-month interval literals to > tightest types. For example: > INTERVAL '10' YEAR should be parsed as YearMonthIntervalType(YEAR, YEAR) but > not as YearMonthIntervalType(YEAR, MONTH).
[jira] [Commented] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364919#comment-17364919 ] Apache Spark commented on SPARK-35749: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Assigned] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35749: Assignee: Apache Spark > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Assigned] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35749: Assignee: (was: Apache Spark) > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Commented] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364918#comment-17364918 ] Apache Spark commented on SPARK-35749: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/32949 > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Updated] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35749: --- Description: Currently, unit list interval literals like `interval '1' year '2' months` or `interval '1' day` or `interval '2' hours` are parsed as `CalendarIntervalType`. Such fields should be parsed as `YearMonthIntervalType` or `DayTimeIntervalType`. was: Currently, single unit field interval literals like `interval '1' year '2' months` or `interval '1' day` or `interval '2' hours` are parsed as `CalendarIntervalType`. Such fields should be parsed as `YearMonthIntervalType` or `DayTimeIntervalType`. > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, unit list interval literals like `interval '1' year '2' months` or > `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Updated] (SPARK-35749) Parse unit list interval literals as year-month/day-time interval types
[ https://issues.apache.org/jira/browse/SPARK-35749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-35749: --- Summary: Parse unit list interval literals as year-month/day-time interval types (was: Parse multiple unit fields interval literals as year-month/day-time interval types) > Parse unit list interval literals as year-month/day-time interval types > --- > > Key: SPARK-35749 > URL: https://issues.apache.org/jira/browse/SPARK-35749 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Priority: Major > > Currently, single unit field interval literals like `interval '1' year '2' > months` or `interval '1' day` or `interval '2' hours` are parsed as > `CalendarIntervalType`. > Such fields should be parsed as `YearMonthIntervalType` or > `DayTimeIntervalType`.
[jira] [Commented] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364909#comment-17364909 ] Apache Spark commented on SPARK-35796: -- User 'toujours33' has created a pull request for this issue: https://github.com/apache/spark/pull/32948 > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Priority: Minor > > When I run SparkSubmitSuite on MacOs 10.15.7, I get an AssertionError for the > `handles k8s cluster mode` test after pr > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()`, given an absolute path as > parameter, returns a path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
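The behaviour SPARK-35796 reports can be reproduced with plain `java.io`; the file name is the hypothetical one from the report, and the /System/Volumes/Data prefix only appears on macOS 10.15+, where the read-only system volume firmlinks user data onto a separate volume (on Linux the canonical URI is unchanged):

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;

public class CanonicalUriDemo {
    public static void main(String[] args) throws IOException {
        // getCanonicalFile() resolves symlinks and firmlinks, so on
        // macOS >= 10.15 the path may gain a /System/Volumes/Data prefix.
        File f = new File("/home/testjars.jar");
        URI uri = f.getCanonicalFile().toURI();

        // macOS 10.15+: file:/System/Volumes/Data/home/testjars.jar
        // Linux:        file:/home/testjars.jar
        System.out.println(uri);
    }
}
```

A test that compares such URIs byte-for-byte is therefore platform-dependent, which is why the suite fails only on recent macOS.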
[jira] [Assigned] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35798: Assignee: (was: Apache Spark) > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Commented] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364903#comment-17364903 ] Apache Spark commented on SPARK-35798: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/32947 > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Assigned] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35798: Assignee: Apache Spark > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Apache Spark >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Commented] (SPARK-35798) Fix SparkPlan.sqlContext usage
[ https://issues.apache.org/jira/browse/SPARK-35798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364902#comment-17364902 ] Apache Spark commented on SPARK-35798: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/32947 > Fix SparkPlan.sqlContext usage > -- > > Key: SPARK-35798 > URL: https://issues.apache.org/jira/browse/SPARK-35798 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Minor > > There might be SparkPlan nodes where canonicalization on the executor side can > cause issues. > More details here: https://github.com/apache/spark/pull/32885/files#r651019687
[jira] [Assigned] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35796: Assignee: Apache Spark > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Assignee: Apache Spark >Priority: Minor > > When I run SparkSubmitSuite on MacOs 10.15.7, I get an AssertionError for the > `handles k8s cluster mode` test after pr > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()`, given an absolute path as > parameter, returns a path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
[jira] [Assigned] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35796: Assignee: (was: Apache Spark) > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Priority: Minor > > When I run SparkSubmitSuite on MacOs 10.15.7, I get an AssertionError for the > `handles k8s cluster mode` test after pr > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()`, given an absolute path as > parameter, returns a path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
[jira] [Commented] (SPARK-35796) UT `handles k8s cluster mode` fails on MacOs >= 10.15
[ https://issues.apache.org/jira/browse/SPARK-35796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17364899#comment-17364899 ] Apache Spark commented on SPARK-35796: -- User 'toujours33' has created a pull request for this issue: https://github.com/apache/spark/pull/32946 > UT `handles k8s cluster mode` fails on MacOs >= 10.15 > - > > Key: SPARK-35796 > URL: https://issues.apache.org/jira/browse/SPARK-35796 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 3.1.3 > Environment: MacOs 10.15.7 >Reporter: Yazhi Wang >Priority: Minor > > When I run SparkSubmitSuite on MacOs 10.15.7, I get an AssertionError for the > `handles k8s cluster mode` test after pr > [SPARK-35691|https://issues.apache.org/jira/browse/SPARK-35691], because > `File(path).getCanonicalFile().toURI()`, given an absolute path as > parameter, returns a path beginning with /System/Volumes/Data. > e.g. /home/testjars.jar becomes > [file:/System/Volumes/Data/home/testjars.jar|file:///System/Volumes/Data/home/testjars.jar]
[jira] [Created] (SPARK-35798) Fix SparkPlan.sqlContext usage
Peter Toth created SPARK-35798: -- Summary: Fix SparkPlan.sqlContext usage Key: SPARK-35798 URL: https://issues.apache.org/jira/browse/SPARK-35798 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Peter Toth There might be SparkPlan nodes where canonicalization on the executor side can cause issues. More details here: https://github.com/apache/spark/pull/32885/files#r651019687