[jira] [Comment Edited] (SPARK-33582) Hive partition pruning support not-equals
[ https://issues.apache.org/jira/browse/SPARK-33582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239919#comment-17239919 ]

Yuming Wang edited comment on SPARK-33582 at 11/28/20, 7:52 AM:
----------------------------------------------------------------

This change should come after SPARK-33581; I have prepared the PR: https://github.com/wangyum/spark/tree/SPARK-33582

was (Author: q79969786):
This changed should after SPARK-33581, I have prepared the pr: https://github.com/wangyum/spark/tree/SPARK-33582

> Hive partition pruning support not-equals
> -----------------------------------------
>
> Key: SPARK-33582
> URL: https://issues.apache.org/jira/browse/SPARK-33582
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
>
> https://github.com/apache/hive/blob/b8bd4594bef718b1eeac9fceb437d7df7b480ed1/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java#L2194-L2207
> https://issues.apache.org/jira/browse/HIVE-2702

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33582) Hive partition pruning support not-equals
[ https://issues.apache.org/jira/browse/SPARK-33582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239919#comment-17239919 ]

Yuming Wang commented on SPARK-33582:
-------------------------------------

This change should come after SPARK-33581; I have prepared the PR: https://github.com/wangyum/spark/tree/SPARK-33582

> Hive partition pruning support not-equals
> -----------------------------------------
>
> Key: SPARK-33582
> URL: https://issues.apache.org/jira/browse/SPARK-33582
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
>
> https://github.com/apache/hive/blob/b8bd4594bef718b1eeac9fceb437d7df7b480ed1/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java#L2194-L2207
> https://issues.apache.org/jira/browse/HIVE-2702
[jira] [Created] (SPARK-33582) Hive partition pruning support not-equals
Yuming Wang created SPARK-33582:
-----------------------------------

Summary: Hive partition pruning support not-equals
Key: SPARK-33582
URL: https://issues.apache.org/jira/browse/SPARK-33582
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.1.0
Reporter: Yuming Wang
Assignee: Yuming Wang

https://github.com/apache/hive/blob/b8bd4594bef718b1eeac9fceb437d7df7b480ed1/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java#L2194-L2207
https://issues.apache.org/jira/browse/HIVE-2702
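For context on what "partition pruning support not-equals" means here: Spark's Hive client translates pushable partition predicates into the filter-string language accepted by the Hive metastore's getPartitionsByFilter API, where not-equals is spelled `<>`. The sketch below is a hypothetical Python illustration of that translation, not the actual Scala code in Spark's HiveShim; the function name and quoting rules are assumptions for illustration only.

```python
# Hypothetical sketch: rendering simple partition predicates as Hive
# metastore filter strings, including the not-equals form this issue adds.

def to_metastore_filter(attr, op, value):
    """Render one partition predicate as a metastore filter string.

    Assumed quoting rule: string literals are double-quoted,
    numeric literals are emitted as-is.
    """
    supported = {"=": "=", "<": "<", "<=": "<=", ">": ">", ">=": ">=", "!=": "<>"}
    if op not in supported:
        return None  # unsupported predicates are simply not pushed down
    literal = '"%s"' % value if isinstance(value, str) else str(value)
    return "%s %s %s" % (attr, supported[op], literal)

# Without <> support, a predicate like ds != '20201128' cannot be pushed
# down and all partitions must be fetched; with it, the metastore prunes
# on the server side.
print(to_metastore_filter("ds", "!=", "20201128"))  # ds <> "20201128"
```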
[jira] [Commented] (SPARK-33581) Refactor HivePartitionFilteringSuite
[ https://issues.apache.org/jira/browse/SPARK-33581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239887#comment-17239887 ]

Apache Spark commented on SPARK-33581:
--------------------------------------

User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/30525

> Refactor HivePartitionFilteringSuite
> ------------------------------------
>
> Key: SPARK-33581
> URL: https://issues.apache.org/jira/browse/SPARK-33581
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Priority: Major
>
> Refactor HivePartitionFilteringSuite, to make it easy to maintain.
[jira] [Assigned] (SPARK-33581) Refactor HivePartitionFilteringSuite
[ https://issues.apache.org/jira/browse/SPARK-33581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33581:
------------------------------------

Assignee: Apache Spark

> Refactor HivePartitionFilteringSuite
> ------------------------------------
>
> Key: SPARK-33581
> URL: https://issues.apache.org/jira/browse/SPARK-33581
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Assignee: Apache Spark
> Priority: Major
>
> Refactor HivePartitionFilteringSuite, to make it easy to maintain.
[jira] [Assigned] (SPARK-33581) Refactor HivePartitionFilteringSuite
[ https://issues.apache.org/jira/browse/SPARK-33581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33581:
------------------------------------

Assignee: (was: Apache Spark)

> Refactor HivePartitionFilteringSuite
> ------------------------------------
>
> Key: SPARK-33581
> URL: https://issues.apache.org/jira/browse/SPARK-33581
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Priority: Major
>
> Refactor HivePartitionFilteringSuite, to make it easy to maintain.
[jira] [Updated] (SPARK-33581) Refactor HivePartitionFilteringSuite
[ https://issues.apache.org/jira/browse/SPARK-33581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-33581:
--------------------------------

Description: Refactor HivePartitionFilteringSuite, to make it easy to maintain.  (was: Refactor HivePartitionFilteringSuite, to make it easy maintain.)

> Refactor HivePartitionFilteringSuite
> ------------------------------------
>
> Key: SPARK-33581
> URL: https://issues.apache.org/jira/browse/SPARK-33581
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 3.1.0
> Reporter: Yuming Wang
> Priority: Major
>
> Refactor HivePartitionFilteringSuite, to make it easy to maintain.
[jira] [Created] (SPARK-33581) Refactor HivePartitionFilteringSuite
Yuming Wang created SPARK-33581:
-----------------------------------

Summary: Refactor HivePartitionFilteringSuite
Key: SPARK-33581
URL: https://issues.apache.org/jira/browse/SPARK-33581
Project: Spark
Issue Type: Sub-task
Components: SQL, Tests
Affects Versions: 3.1.0
Reporter: Yuming Wang

Refactor HivePartitionFilteringSuite, to make it easy to maintain.
[jira] [Commented] (SPARK-33580) resolveDependencyPaths should use classifier attribute of artifact
[ https://issues.apache.org/jira/browse/SPARK-33580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239882#comment-17239882 ]

Apache Spark commented on SPARK-33580:
--------------------------------------

User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30524

> resolveDependencyPaths should use classifier attribute of artifact
> ------------------------------------------------------------------
>
> Key: SPARK-33580
> URL: https://issues.apache.org/jira/browse/SPARK-33580
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
>
> `resolveDependencyPaths` currently takes the artifact type to decide whether to add the "-tests" postfix. However, the ivy path pattern in `resolveMavenCoordinates` is "[organization]_[artifact]-[revision](-[classifier]).[ext]". We should use the classifier instead of the type to construct the file path.
[jira] [Assigned] (SPARK-33580) resolveDependencyPaths should use classifier attribute of artifact
[ https://issues.apache.org/jira/browse/SPARK-33580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33580:
------------------------------------

Assignee: L. C. Hsieh (was: Apache Spark)

> resolveDependencyPaths should use classifier attribute of artifact
> ------------------------------------------------------------------
>
> Key: SPARK-33580
> URL: https://issues.apache.org/jira/browse/SPARK-33580
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
>
> `resolveDependencyPaths` currently takes the artifact type to decide whether to add the "-tests" postfix. However, the ivy path pattern in `resolveMavenCoordinates` is "[organization]_[artifact]-[revision](-[classifier]).[ext]". We should use the classifier instead of the type to construct the file path.
[jira] [Assigned] (SPARK-33580) resolveDependencyPaths should use classifier attribute of artifact
[ https://issues.apache.org/jira/browse/SPARK-33580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33580:
------------------------------------

Assignee: Apache Spark (was: L. C. Hsieh)

> resolveDependencyPaths should use classifier attribute of artifact
> ------------------------------------------------------------------
>
> Key: SPARK-33580
> URL: https://issues.apache.org/jira/browse/SPARK-33580
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: L. C. Hsieh
> Assignee: Apache Spark
> Priority: Major
>
> `resolveDependencyPaths` currently takes the artifact type to decide whether to add the "-tests" postfix. However, the ivy path pattern in `resolveMavenCoordinates` is "[organization]_[artifact]-[revision](-[classifier]).[ext]". We should use the classifier instead of the type to construct the file path.
[jira] [Commented] (SPARK-33580) resolveDependencyPaths should use classifier attribute of artifact
[ https://issues.apache.org/jira/browse/SPARK-33580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239881#comment-17239881 ]

Apache Spark commented on SPARK-33580:
--------------------------------------

User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/30524

> resolveDependencyPaths should use classifier attribute of artifact
> ------------------------------------------------------------------
>
> Key: SPARK-33580
> URL: https://issues.apache.org/jira/browse/SPARK-33580
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
>
> `resolveDependencyPaths` currently takes the artifact type to decide whether to add the "-tests" postfix. However, the ivy path pattern in `resolveMavenCoordinates` is "[organization]_[artifact]-[revision](-[classifier]).[ext]". We should use the classifier instead of the type to construct the file path.
[jira] [Updated] (SPARK-33580) resolveDependencyPaths should use classifier attribute of artifact
[ https://issues.apache.org/jira/browse/SPARK-33580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh updated SPARK-33580:
--------------------------------

Description:
`resolveDependencyPaths` currently takes the artifact type to decide whether to add the "-tests" postfix. However, the ivy path pattern in `resolveMavenCoordinates` is "[organization]_[artifact]-[revision](-[classifier]).[ext]". We should use the classifier instead of the type to construct the file path.

(was: `resolveDependencyPaths` now takes artifact type to decide to add -tests postfix. However, the path pattern of ivy in `resolveMavenCoordinates` is "[organization]_[artifact]-[revision](-[classifier]).[ext]". We should use classifier instead of type to construct file path.)

> resolveDependencyPaths should use classifier attribute of artifact
> ------------------------------------------------------------------
>
> Key: SPARK-33580
> URL: https://issues.apache.org/jira/browse/SPARK-33580
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.1.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
>
> `resolveDependencyPaths` currently takes the artifact type to decide whether to add the "-tests" postfix. However, the ivy path pattern in `resolveMavenCoordinates` is "[organization]_[artifact]-[revision](-[classifier]).[ext]". We should use the classifier instead of the type to construct the file path.
[jira] [Created] (SPARK-33580) resolveDependencyPaths should use classifier attribute of artifact
L. C. Hsieh created SPARK-33580:
-----------------------------------

Summary: resolveDependencyPaths should use classifier attribute of artifact
Key: SPARK-33580
URL: https://issues.apache.org/jira/browse/SPARK-33580
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.1.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh

`resolveDependencyPaths` currently takes the artifact type to decide whether to add the -tests postfix. However, the ivy path pattern in `resolveMavenCoordinates` is "[organization]_[artifact]-[revision](-[classifier]).[ext]". We should use the classifier instead of the type to construct the file path.
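To make the path-pattern point above concrete, here is a minimal Python sketch (a hypothetical helper, not Spark's actual `resolveDependencyPaths`) of how the ivy cache pattern "[organization]_[artifact]-[revision](-[classifier]).[ext]" expands, and why the classifier ("tests") rather than the artifact type (which may be something like "test-jar") must drive the file name:

```python
# Hypothetical sketch: expanding the ivy artifact path pattern
# [organization]_[artifact]-[revision](-[classifier]).[ext].

def ivy_artifact_path(org, artifact, revision, classifier=None, ext="jar"):
    # The (-[classifier]) part is optional: it only appears when the
    # artifact actually carries a classifier, e.g. "tests".
    suffix = "-%s" % classifier if classifier else ""
    return "%s_%s-%s%s.%s" % (org, artifact, revision, suffix, ext)

# A test-jar dependency is cached with classifier "tests"; deriving the
# "-tests" suffix from the artifact *type* instead can yield a file name
# that does not exist in the ivy cache.
print(ivy_artifact_path("org.apache.foo", "bar", "1.0", classifier="tests"))
# org.apache.foo_bar-1.0-tests.jar
```

The coordinates `org.apache.foo:bar:1.0` are made up for the example.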
[jira] [Commented] (SPARK-33579) Executors blank page behind proxy
[ https://issues.apache.org/jira/browse/SPARK-33579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239782#comment-17239782 ]

Apache Spark commented on SPARK-33579:
--------------------------------------

User 'pgillet' has created a pull request for this issue: https://github.com/apache/spark/pull/30523

> Executors blank page behind proxy
> ---------------------------------
>
> Key: SPARK-33579
> URL: https://issues.apache.org/jira/browse/SPARK-33579
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.1
> Environment: Spark 3.0.1 on Kubernetes
> Reporter: Pascal GILLET
> Priority: Minor
> Labels: core, ui
>
> When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
> In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
> On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
> I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).
>
> See also [https://github.com/jupyterhub/jupyter-server-proxy/issues/57#issuecomment-699163115]
[jira] [Assigned] (SPARK-33579) Executors blank page behind proxy
[ https://issues.apache.org/jira/browse/SPARK-33579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33579:
------------------------------------

Assignee: (was: Apache Spark)

> Executors blank page behind proxy
> ---------------------------------
>
> Key: SPARK-33579
> URL: https://issues.apache.org/jira/browse/SPARK-33579
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.1
> Environment: Spark 3.0.1 on Kubernetes
> Reporter: Pascal GILLET
> Priority: Minor
> Labels: core, ui
>
> When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
> In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
> On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
> I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).
>
> See also [https://github.com/jupyterhub/jupyter-server-proxy/issues/57#issuecomment-699163115]
[jira] [Commented] (SPARK-33579) Executors blank page behind proxy
[ https://issues.apache.org/jira/browse/SPARK-33579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239780#comment-17239780 ]

Apache Spark commented on SPARK-33579:
--------------------------------------

User 'pgillet' has created a pull request for this issue: https://github.com/apache/spark/pull/30523

> Executors blank page behind proxy
> ---------------------------------
>
> Key: SPARK-33579
> URL: https://issues.apache.org/jira/browse/SPARK-33579
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.1
> Environment: Spark 3.0.1 on Kubernetes
> Reporter: Pascal GILLET
> Priority: Minor
> Labels: core, ui
>
> When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
> In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
> On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
> I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).
>
> See also [https://github.com/jupyterhub/jupyter-server-proxy/issues/57#issuecomment-699163115]
[jira] [Assigned] (SPARK-33579) Executors blank page behind proxy
[ https://issues.apache.org/jira/browse/SPARK-33579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-33579:
------------------------------------

Assignee: Apache Spark

> Executors blank page behind proxy
> ---------------------------------
>
> Key: SPARK-33579
> URL: https://issues.apache.org/jira/browse/SPARK-33579
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.1
> Environment: Spark 3.0.1 on Kubernetes
> Reporter: Pascal GILLET
> Assignee: Apache Spark
> Priority: Minor
> Labels: core, ui
>
> When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
> In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
> On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
> I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).
>
> See also [https://github.com/jupyterhub/jupyter-server-proxy/issues/57#issuecomment-699163115]
[jira] [Updated] (SPARK-33579) Executors blank page behind proxy
[ https://issues.apache.org/jira/browse/SPARK-33579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pascal GILLET updated SPARK-33579:
----------------------------------

Description:
When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).

See also [https://github.com/jupyterhub/jupyter-server-proxy/issues/57#issuecomment-699163115]

was:
When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).

> Executors blank page behind proxy
> ---------------------------------
>
> Key: SPARK-33579
> URL: https://issues.apache.org/jira/browse/SPARK-33579
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.1
> Environment: Spark 3.0.1 on Kubernetes
> Reporter: Pascal GILLET
> Priority: Minor
> Labels: core, ui
>
> When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
> In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
> On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
> I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).
>
> See also [https://github.com/jupyterhub/jupyter-server-proxy/issues/57#issuecomment-699163115]
[jira] [Updated] (SPARK-33579) Executors blank page behind proxy
[ https://issues.apache.org/jira/browse/SPARK-33579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pascal GILLET updated SPARK-33579:
----------------------------------

Description:
When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).

was:
When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
In{{ /core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid the use of location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use {{apiRoot}} global variable.
On one hand, it would allow to build relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}} which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, original URLs become incorrect, and we end up with an executors blank page.
I encounter this bug when accessing the Web UI behind a proxy (in my case a Kubernetes Ingress).

> Executors blank page behind proxy
> ---------------------------------
>
> Key: SPARK-33579
> URL: https://issues.apache.org/jira/browse/SPARK-33579
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.1
> Environment: Spark 3.0.1 on Kubernetes
> Reporter: Pascal GILLET
> Priority: Minor
> Labels: core, ui
>
> When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
> In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
> On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
> I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).
[jira] [Created] (SPARK-33579) Executors blank page behind proxy
Pascal GILLET created SPARK-33579:
-------------------------------------

Summary: Executors blank page behind proxy
Key: SPARK-33579
URL: https://issues.apache.org/jira/browse/SPARK-33579
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.0.1
Environment: Spark 3.0.1 on Kubernetes
Reporter: Pascal GILLET

When accessing the Web UI behind a proxy (e.g. a Kubernetes ingress), the Executors page is blank.
In {{/core/src/main/resources/org/apache/spark/ui/static/utils.js}}, we should avoid using location.origin when constructing URLs for internal API calls within the JavaScript. Instead, we should use the {{apiRoot}} global variable.
On one hand, this allows building relative URLs. On the other hand, {{apiRoot}} reflects the Spark property {{spark.ui.proxyBase}}, which can be set to change the root path of the Web UI. If {{spark.ui.proxyBase}} is actually set, the original URLs become incorrect, and we end up with a blank Executors page.
I encountered this bug when accessing the Web UI behind a proxy (in my case a Kubernetes ingress).
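To illustrate the failure mode described in this issue: a URL built from the browser origin (location.origin) drops the path prefix that the ingress serves the UI under (spark.ui.proxyBase), while an apiRoot-style base path keeps it. The sketch below is a Python illustration of that string-level difference, not the actual utils.js code; the host name and proxy-base value are made up.

```python
# Illustration of absolute-vs-prefixed API URL construction behind a proxy.

def api_url_from_origin(origin, endpoint):
    # Old behavior: absolute URL rooted at the origin;
    # the proxy's path prefix is lost, so the request 404s.
    return origin + "/api/v1/" + endpoint

def api_url_from_api_root(proxy_base, endpoint):
    # apiRoot-style behavior: the configured base path
    # (reflecting spark.ui.proxyBase) survives the proxy.
    return proxy_base + "/api/v1/" + endpoint

origin = "https://cluster.example.com"   # hypothetical browser origin
proxy_base = "/spark-ui"                 # hypothetical spark.ui.proxyBase

print(api_url_from_origin(origin, "applications"))
# https://cluster.example.com/api/v1/applications  (misses the /spark-ui prefix)
print(api_url_from_api_root(proxy_base, "applications"))
# /spark-ui/api/v1/applications  (relative path routed by the ingress)
```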
[jira] [Updated] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly
[ https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon updated SPARK-33571:
--------------------------

Description:
The handling of old dates written with older Spark versions (< 2.4.6) using the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working correctly.

From what I understand it should work like this:
* Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 1900-01-01T00:00:00Z
* Only applies when reading or writing Parquet files
* When reading Parquet files written with Spark < 2.4.6 which contain dates or timestamps before the above-mentioned moments in time, a `SparkUpgradeException` should be raised informing the user to choose either `LEGACY` or `CORRECTED` for `datetimeRebaseModeInRead`
* When reading Parquet files written with Spark < 2.4.6 which contain dates or timestamps before the above-mentioned moments in time and `datetimeRebaseModeInRead` is set to `LEGACY`, the dates and timestamps should show the same values in Spark 3.0.1 (with, for example, `df.show()`) as they did in Spark 2.4.5
* When reading Parquet files written with Spark < 2.4.6 which contain dates or timestamps before the above-mentioned moments in time and `datetimeRebaseModeInRead` is set to `CORRECTED`, the dates and timestamps should show different values in Spark 3.0.1 (with, for example, `df.show()`) than they did in Spark 2.4.5
* When writing Parquet files with Spark > 3.0.0 which contain dates or timestamps before the above-mentioned moments in time, a `SparkUpgradeException` should be raised informing the user to choose either `LEGACY` or `CORRECTED` for `datetimeRebaseModeInWrite`

First of all, I'm not 100% sure all of this is correct. I've been unable to find any clear documentation on the expected behavior. The understanding I have was pieced together from the mailing list ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html]), the blog post linked there, and looking at the Spark code.

From our testing we're seeing several issues:
* Reading Parquet data with Spark 3.0.1 that was written with Spark 2.4.5 and contains fields of type `TimestampType` with timestamps before the above-mentioned moments in time, without `datetimeRebaseModeInRead` set, doesn't raise the `SparkUpgradeException`; it succeeds without any changes to the resulting dataframe compared to that dataframe in Spark 2.4.5
* Reading Parquet data with Spark 3.0.1 that was written with Spark 2.4.5 and contains fields of type `TimestampType` or `DateType` with dates or timestamps before the above-mentioned moments in time, with `datetimeRebaseModeInRead` set to `LEGACY`, results in the same values in the dataframe as when using `CORRECTED`, so it seems like no rebasing is happening

I've made some scripts to help with testing/show the behavior, using pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here: [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the outputs in a comment below as well.

was:
The handling of old dates written with older Spark versions (<2.4.6) using the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working correctly.

From what I understand it should work like this:
* Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 1900-01-01T00:00:00Z
* Only applies when reading or writing parquet files
* When reading parquet files written with Spark < 2.4.6 which contain dates or timestamps before the above mentioned moments in time a `SparkUpgradeException` should be raised informing the user to choose either `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
* When reading parquet files written with Spark < 2.4.6 which contain dates or timestamps before the above mentioned moments in time and `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should show the same values in Spark 3.0.1. with for example `df.show()` as they did in Spark 2.4.5
* When reading parquet files written with Spark < 2.4.6 which contain dates or timestamps before the above mentioned moments in time and `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps should show different values in Spark 3.0.1. with for example `df.show()` as they did in Spark 2.4.5
* When writing parqet files with Spark > 3.0.0 which contain dates or timestamps before the above mentioned moment in time a `SparkUpgradeException` should be raised informing the user to choose either `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`

First of all I'm not 100% sure all of this is correct. I've been unable to find any clear documentation on the expected behavior. The understanding I have was pieced together from the mailing list
[jira] [Updated] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly
[ https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon updated SPARK-33571: -- Component/s: PySpark > Handling of hybrid to proleptic calendar when reading and writing Parquet > data not working correctly > > > Key: SPARK-33571 > URL: https://issues.apache.org/jira/browse/SPARK-33571 > Project: Spark > Issue Type: Bug > Components: PySpark, Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Simon >Priority: Major > > The handling of old dates written with older Spark versions (<2.4.6) using > the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working > correctly. > From what I understand it should work like this: > * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before > 1900-01-01T00:00:00Z > * Only applies when reading or writing parquet files > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead` > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should > show the same values in Spark 3.0.1. with for example `df.show()` as they did > in Spark 2.4.5 > * When reading parquet files written with Spark < 2.4.6 which contain dates > or timestamps before the above mentioned moments in time and > `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps > should show different values in Spark 3.0.1. 
with for example `df.show()` than > they did in Spark 2.4.5 > * When writing parquet files with Spark > 3.0.0 which contain dates or > timestamps before the above mentioned moment in time a > `SparkUpgradeException` should be raised informing the user to choose either > `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite` > First of all I'm not 100% sure all of this is correct. I've been unable to > find any clear documentation on the expected behavior. The understanding I > have was pieced together from the mailing list > ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html]) > the blog post linked there and looking at the Spark code. > From our testing we're seeing several issues: > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5 > that contains fields of type `TimestampType` which contain timestamps before > the above mentioned moments in time without `datetimeRebaseModeInRead` set > doesn't raise the `SparkUpgradeException`, it succeeds without any changes to > the resulting dataframe compared to that dataframe in Spark 2.4.5 > * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5 > that contains fields of type `TimestampType` or `DateType` which contain > dates or timestamps before the above mentioned moments in time with > `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the > dataframe as when using `CORRECTED`, so it seems like no rebasing is > happening. > I've made some scripts to help with testing/show the behavior, they use > pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here > [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the > outputs in a comment below as well. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
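The calendar mismatch behind SPARK-33571 can be made concrete without Spark. The following is a minimal Python sketch (my own illustration, not Spark's rebase code) that converts calendar dates to Julian Day Numbers with the Fliegel-Van Flandern formulas; it shows why a pre-1582 date stored by Spark 2.4 (hybrid Julian/Gregorian calendar) has to be shifted when interpreted under Spark 3.x's proleptic Gregorian calendar:

```python
# Why LEGACY rebasing changes old dates: the same stored day can carry a
# different date label under the Julian vs. proleptic Gregorian calendar.

def _div(a, b):
    """Integer division truncating toward zero, as the Fliegel-Van Flandern
    formulas assume (Python's // floors instead)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

def julian_to_jdn(y, m, d):
    """Julian-calendar date -> Julian Day Number."""
    return (367 * y - _div(7 * (y + 5001 + _div(m - 14, 12)), 4)
            + _div(275 * m, 9) + d + 1729777)

def gregorian_to_jdn(y, m, d):
    """(Proleptic) Gregorian-calendar date -> Julian Day Number."""
    a = _div(m - 14, 12)
    return (_div(1461 * (y + 4800 + a), 4)
            + _div(367 * (m - 2 - 12 * a), 12)
            - _div(3 * _div(y + 4900 + a, 100), 4) + d - 32075)

# Sanity check: the Julian calendar's 1582-10-04 was immediately followed
# by the Gregorian calendar's 1582-10-15 (the cutover the hybrid calendar
# encodes), so their day numbers are adjacent:
assert julian_to_jdn(1582, 10, 4) + 1 == gregorian_to_jdn(1582, 10, 15)

# The label 1001-01-01 names days 6 apart in the two calendars -- the shift
# that LEGACY mode has to apply when reading old parquet data:
shift = julian_to_jdn(1001, 1, 1) - gregorian_to_jdn(1001, 1, 1)
print(shift)  # -> 6
```

With `datetimeRebaseModeInRead=LEGACY` the reader is expected to undo this per-date shift; with `CORRECTED` it reads the stored values as-is, which is why the two modes should produce visibly different `df.show()` output for such old dates.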
[jira] [Assigned] (SPARK-33141) capture SQL configs when creating permanent views
[ https://issues.apache.org/jira/browse/SPARK-33141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33141: --- Assignee: Leanken.Lin > capture SQL configs when creating permanent views > - > > Key: SPARK-33141 > URL: https://issues.apache.org/jira/browse/SPARK-33141 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Leanken.Lin >Priority: Major > > TODO -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33141) capture SQL configs when creating permanent views
[ https://issues.apache.org/jira/browse/SPARK-33141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33141. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30289 [https://github.com/apache/spark/pull/30289] > capture SQL configs when creating permanent views > - > > Key: SPARK-33141 > URL: https://issues.apache.org/jira/browse/SPARK-33141 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Leanken.Lin >Priority: Major > Fix For: 3.1.0 > > > TODO -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid
[ https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33498. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30442 [https://github.com/apache/spark/pull/30442] > Datetime parsing should fail if the input string can't be parsed, or the > pattern string is invalid > -- > > Key: SPARK-33498 > URL: https://issues.apache.org/jira/browse/SPARK-33498 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Leanken.Lin >Priority: Major > Fix For: 3.1.0 > > > Datetime parsing should fail if the input string can't be parsed, or the > pattern string is invalid, when ANSI mode is enabled. This patch should update > GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid
[ https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33498: --- Assignee: Leanken.Lin > Datetime parsing should fail if the input string can't be parsed, or the > pattern string is invalid > -- > > Key: SPARK-33498 > URL: https://issues.apache.org/jira/browse/SPARK-33498 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Leanken.Lin >Assignee: Leanken.Lin >Priority: Major > > Datetime parsing should fail if the input string can't be parsed, or the > pattern string is invalid, when ANSI mode is enabled. This patch should update > GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
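The fail-versus-NULL semantics SPARK-33498 asks for can be sketched in plain Python. This is only an analogy of the described behavior, not Spark code; `parse_timestamp` and its `ansi` flag are hypothetical stand-ins:

```python
# Sketch of ANSI-mode parsing semantics: with ansi=True an unparseable input
# (or invalid pattern) raises, mirroring what GetTimestamp/UnixTimestamp/Cast
# should do; with ansi=False the failure is swallowed and NULL (None) results.
from datetime import datetime

def parse_timestamp(value: str, pattern: str, ansi: bool = False):
    try:
        return datetime.strptime(value, pattern)
    except ValueError:
        if ansi:
            raise        # ANSI mode: fail loudly on bad input or pattern
        return None      # non-ANSI mode: silently produce NULL

print(parse_timestamp("2020-13-40", "%Y-%m-%d"))  # -> None (swallowed)
try:
    parse_timestamp("2020-13-40", "%Y-%m-%d", ansi=True)
except ValueError:
    print("ANSI mode: parse failure raised")
```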
[jira] [Commented] (SPARK-33578) enableHiveSupport is invalid after sparkContext that without hive support created
[ https://issues.apache.org/jira/browse/SPARK-33578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239653#comment-17239653 ] Apache Spark commented on SPARK-33578: -- User 'yui2010' has created a pull request for this issue: https://github.com/apache/spark/pull/30522 > enableHiveSupport is invalid after sparkContext that without hive support > created > -- > > Key: SPARK-33578 > URL: https://issues.apache.org/jira/browse/SPARK-33578 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: steven zhang >Priority: Minor > Fix For: 3.1.0 > > > reproduce with the following code: > SparkConf sparkConf = new SparkConf().setAppName("hello"); > sparkConf.set("spark.master", "local"); > JavaSparkContext jssc = new JavaSparkContext(sparkConf); > spark = SparkSession.builder() > .config("spark.serializer", > "org.apache.spark.serializer.KryoSerializer") > .config("hive.exec.dynamici.partition", > true).config("hive.exec.dynamic.partition.mode", "nonstrict") > .config("hive.metastore.uris", "thrift://hivemetastore:9083") > .enableHiveSupport() > .master("local") > .getOrCreate(); > spark.sql("select * from hudi_db.hudi_test_order").show(); > > it will produce the following exception: > AssertionError: assertion failed: No plan for HiveTableRelation > [`hudi_db`.`hudi_test_order` … (at current master branch) > org.apache.spark.sql.AnalysisException: Table or view not found: > `hudi_db`.`hudi_test_order`; (at spark v2.4.4) > > the reason is that SparkContext#getOrCreate(SparkConf) will return the active context, > with its previous spark config, if one exists, > but the input SparkConf is the newest one, which includes the previous spark config and > new options.
> enableHiveSupport sets the option ("spark.sql.catalogImplementation", > "hive"), but when the spark session is created this conf is missed: > SharedState loads its conf from the sparkContext and so misses the hive catalog -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
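The failure mode described in SPARK-33578 boils down to a get-or-create singleton ignoring config supplied after the first creation. A minimal Python sketch (hypothetical classes, not Spark's API) of that root cause:

```python
# getOrCreate() hands back the already-active context, so config added later
# (like spark.sql.catalogImplementation=hive from enableHiveSupport) never
# reaches the state built from the first context's conf.

class SparkContextSim:
    _active = None

    def __init__(self, conf):
        self.conf = dict(conf)

    @classmethod
    def get_or_create(cls, conf):
        if cls._active is None:
            cls._active = cls(conf)
        return cls._active  # any newer entries in `conf` are silently dropped

# 1. A context is first created without Hive support:
ctx1 = SparkContextSim.get_or_create({"spark.master": "local"})

# 2. Later, a session builder asks for Hive support, but getOrCreate returns
#    the old context, whose conf lacks the catalog setting:
ctx2 = SparkContextSim.get_or_create(
    {"spark.master": "local", "spark.sql.catalogImplementation": "hive"})

print(ctx1 is ctx2)                                       # -> True
print(ctx2.conf.get("spark.sql.catalogImplementation"))   # -> None
```

Anything that, like SharedState in the report, reads its configuration from the first context's conf will therefore never see the Hive catalog setting.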
[jira] [Assigned] (SPARK-33578) enableHiveSupport is invalid after sparkContext that without hive support created
[ https://issues.apache.org/jira/browse/SPARK-33578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33578: Assignee: (was: Apache Spark) > enableHiveSupport is invalid after sparkContext that without hive support > created > -- > > Key: SPARK-33578 > URL: https://issues.apache.org/jira/browse/SPARK-33578 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: steven zhang >Priority: Minor > Fix For: 3.1.0 > > > reproduce as follow code: > SparkConf sparkConf = new SparkConf().setAppName("hello"); > sparkConf.set("spark.master", "local"); > JavaSparkContext jssc = new JavaSparkContext(sparkConf); > spark = SparkSession.builder() > .config("spark.serializer", > "org.apache.spark.serializer.KryoSerializer") > .config("hive.exec.dynamici.partition", > true).config("hive.exec.dynamic.partition.mode", "nonstrict") > .config("hive.metastore.uris", "thrift://hivemetastore:9083") > .enableHiveSupport() > .master("local") > .getOrCreate(); > spark.sql("select * from hudi_db.hudi_test_order").show(); > > it will produce follow Exception > AssertionError: assertion failed: No plan for HiveTableRelation > [`hudi_db`.`hudi_test_order` … (at current master branch) > org.apache.spark.sql.AnalysisException: Table or view not found: > `hudi_db`.`hudi_test_order`; (at spark v2.4.4) > > the reason is SparkContext#getOrCreate(SparkConf) will return activeContext > that include previous spark config if it has > but the input SparkConf is the newest which include previous spark config and > options. > enableHiveSupport will set options (“spark.sql.catalogImplementation", > "hive”) when spark session created it will miss this conf > SharedState load conf from sparkContext and will miss hive catalog -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33578) enableHiveSupport is invalid after sparkContext that without hive support created
[ https://issues.apache.org/jira/browse/SPARK-33578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33578: Assignee: Apache Spark > enableHiveSupport is invalid after sparkContext that without hive support > created > -- > > Key: SPARK-33578 > URL: https://issues.apache.org/jira/browse/SPARK-33578 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: steven zhang >Assignee: Apache Spark >Priority: Minor > Fix For: 3.1.0 > > > reproduce as follow code: > SparkConf sparkConf = new SparkConf().setAppName("hello"); > sparkConf.set("spark.master", "local"); > JavaSparkContext jssc = new JavaSparkContext(sparkConf); > spark = SparkSession.builder() > .config("spark.serializer", > "org.apache.spark.serializer.KryoSerializer") > .config("hive.exec.dynamici.partition", > true).config("hive.exec.dynamic.partition.mode", "nonstrict") > .config("hive.metastore.uris", "thrift://hivemetastore:9083") > .enableHiveSupport() > .master("local") > .getOrCreate(); > spark.sql("select * from hudi_db.hudi_test_order").show(); > > it will produce follow Exception > AssertionError: assertion failed: No plan for HiveTableRelation > [`hudi_db`.`hudi_test_order` … (at current master branch) > org.apache.spark.sql.AnalysisException: Table or view not found: > `hudi_db`.`hudi_test_order`; (at spark v2.4.4) > > the reason is SparkContext#getOrCreate(SparkConf) will return activeContext > that include previous spark config if it has > but the input SparkConf is the newest which include previous spark config and > options. > enableHiveSupport will set options (“spark.sql.catalogImplementation", > "hive”) when spark session created it will miss this conf > SharedState load conf from sparkContext and will miss hive catalog -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33578) enableHiveSupport is invalid after sparkContext that without hive support created
[ https://issues.apache.org/jira/browse/SPARK-33578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] steven zhang updated SPARK-33578: - Description: reproduce as follow code: SparkConf sparkConf = new SparkConf().setAppName("hello"); sparkConf.set("spark.master", "local"); JavaSparkContext jssc = new JavaSparkContext(sparkConf); spark = SparkSession.builder() .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") .config("hive.exec.dynamici.partition", true).config("hive.exec.dynamic.partition.mode", "nonstrict") .config("hive.metastore.uris", "thrift://hivemetastore:9083") .enableHiveSupport() .master("local") .getOrCreate(); spark.sql("select * from hudi_db.hudi_test_order").show(); it will produce follow Exception AssertionError: assertion failed: No plan for HiveTableRelation [`hudi_db`.`hudi_test_order` … (at current master branch) org.apache.spark.sql.AnalysisException: Table or view not found: `hudi_db`.`hudi_test_order`; (at spark v2.4.4) the reason is SparkContext#getOrCreate(SparkConf) will return activeContext that include previous spark config if it has but the input SparkConf is the newest which include previous spark config and options. 
enableHiveSupport will set options (“spark.sql.catalogImplementation", "hive”) when spark session created it will miss this conf SharedState load conf from sparkContext and will miss hive catalog was: reproduce as follow code: SparkConf sparkConf = new SparkConf().setAppName("hello"); sparkConf.set("spark.master", "local"); JavaSparkContext jssc = new JavaSparkContext(sparkConf); spark = SparkSession.builder() .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer") .config("hive.exec.dynamici.partition", true).config("hive.exec.dynamic.partition.mode", "nonstrict") .config("hive.metastore.uris", "thrift://hivemetastore:9083") .enableHiveSupport() .master("local") .getOrCreate(); spark.sql("select * from hudi_db.hudi_test_order").show(); it will produce follow Exception AssertionError: assertion failed: No plan for HiveTableRelation [`hudi_db`.`hudi_test_order` … (at current master branch) org.apache.spark.sql.AnalysisException: Table or view not found: `hudi_db`.`hudi_test_order`; (at spark v2.4.4) The reason is SparkContext#getOrCreate(SparkConf) will return activeContext that include previous spark config if it has but the input SparkConf is the newest which include previous spark config and options. 
enableHiveSupport will set options (“spark.sql.catalogImplementation", "hive”) when spark session created it will miss this conf SharedState load conf from sparkContext and will miss hive catalog > enableHiveSupport is invalid after sparkContext that without hive support > created > -- > > Key: SPARK-33578 > URL: https://issues.apache.org/jira/browse/SPARK-33578 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: steven zhang >Priority: Minor > Fix For: 3.1.0 > > > reproduce as follow code: > SparkConf sparkConf = new SparkConf().setAppName("hello"); > sparkConf.set("spark.master", "local"); > JavaSparkContext jssc = new JavaSparkContext(sparkConf); > spark = SparkSession.builder() > .config("spark.serializer", > "org.apache.spark.serializer.KryoSerializer") > .config("hive.exec.dynamici.partition", > true).config("hive.exec.dynamic.partition.mode", "nonstrict") > .config("hive.metastore.uris", "thrift://hivemetastore:9083") > .enableHiveSupport() > .master("local") > .getOrCreate(); > spark.sql("select * from hudi_db.hudi_test_order").show(); > > it will produce follow Exception > AssertionError: assertion failed: No plan for HiveTableRelation > [`hudi_db`.`hudi_test_order` … (at current master branch) > org.apache.spark.sql.AnalysisException: Table or view not found: > `hudi_db`.`hudi_test_order`; (at spark v2.4.4) > > the reason is SparkContext#getOrCreate(SparkConf) will return activeContext > that include previous spark config if it has > but the input SparkConf is the newest which include previous spark config and > options. > enableHiveSupport will set options (“spark.sql.catalogImplementation", > "hive”) when spark session created it will miss this conf > SharedState
[jira] [Created] (SPARK-33578) enableHiveSupport is invalid after sparkContext that without hive support created
steven zhang created SPARK-33578: Summary: enableHiveSupport is invalid after sparkContext that without hive support created Key: SPARK-33578 URL: https://issues.apache.org/jira/browse/SPARK-33578 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.0 Reporter: steven zhang Fix For: 3.1.0 reproduce with the following code:
SparkConf sparkConf = new SparkConf().setAppName("hello");
sparkConf.set("spark.master", "local");
JavaSparkContext jssc = new JavaSparkContext(sparkConf);
spark = SparkSession.builder()
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("hive.exec.dynamici.partition", true)
    .config("hive.exec.dynamic.partition.mode", "nonstrict")
    .config("hive.metastore.uris", "thrift://hivemetastore:9083")
    .enableHiveSupport()
    .master("local")
    .getOrCreate();
spark.sql("select * from hudi_db.hudi_test_order").show();
it will produce the following exception: AssertionError: assertion failed: No plan for HiveTableRelation [`hudi_db`.`hudi_test_order` … (at current master branch) org.apache.spark.sql.AnalysisException: Table or view not found: `hudi_db`.`hudi_test_order`; (at spark v2.4.4) The reason is that SparkContext#getOrCreate(SparkConf) will return the active context, with its previous spark config, if one exists, but the input SparkConf is the newest one, which includes the previous spark config and new options. enableHiveSupport sets the option ("spark.sql.catalogImplementation", "hive"), but when the spark session is created this conf is missed: SharedState loads its conf from the sparkContext and so misses the hive catalog -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33557) spark.storage.blockManagerSlaveTimeoutMs default value does not follow spark.network.timeout value when the latter was changed
[ https://issues.apache.org/jira/browse/SPARK-33557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239646#comment-17239646 ] Yang Jie commented on SPARK-33557: -- I'm not sure whether the configurations related to "spark.network.timeout" really meet the expected behavior. This needs to be investigated. > spark.storage.blockManagerSlaveTimeoutMs default value does not follow > spark.network.timeout value when the latter was changed > -- > > Key: SPARK-33557 > URL: https://issues.apache.org/jira/browse/SPARK-33557 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Ohad >Priority: Minor > > According to the documentation "spark.network.timeout" is the default timeout > for "spark.storage.blockManagerSlaveTimeoutMs" which implies that when the > user sets "spark.network.timeout" the effective value of > "spark.storage.blockManagerSlaveTimeoutMs" should also be changed if it was > not specifically changed. > However this is not the case since the default value of > "spark.storage.blockManagerSlaveTimeoutMs" is always the default value of > "spark.network.timeout" (120s) > > "spark.storage.blockManagerSlaveTimeoutMs" is defined in the package object > of "org.apache.spark.internal.config" as follows: > {code:java} > private[spark] val STORAGE_BLOCKMANAGER_SLAVE_TIMEOUT = > ConfigBuilder("spark.storage.blockManagerSlaveTimeoutMs") > .version("0.7.0") > .timeConf(TimeUnit.MILLISECONDS) > .createWithDefaultString(Network.NETWORK_TIMEOUT.defaultValueString) > {code} > So it seems like its default value is indeed "fixed" to > "spark.network.timeout" default value. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33557) spark.storage.blockManagerSlaveTimeoutMs default value does not follow spark.network.timeout value when the latter was changed
[ https://issues.apache.org/jira/browse/SPARK-33557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239631#comment-17239631 ] Yang Jie commented on SPARK-33557: -- It seems that changing the value of "spark.network.timeout" doesn't really change the value of STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT; their relationship has to be maintained by code. For example, the treatment of "spark.shuffle.io.connectionTimeout" is as follows:
{code:java}
/** Connect timeout in milliseconds. Default 120 secs. */
public int connectionTimeoutMs() {
  long defaultNetworkTimeoutS = JavaUtils.timeStringAsSec(
    conf.get("spark.network.timeout", "120s"));
  long defaultTimeoutMs = JavaUtils.timeStringAsSec(
    conf.get(SPARK_NETWORK_IO_CONNECTIONTIMEOUT_KEY, defaultNetworkTimeoutS + "s")) * 1000;
  return (int) defaultTimeoutMs;
}
{code}
But it seems that there is no similar treatment for STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT in HeartbeatReceiver and MesosCoarseGrainedSchedulerBackend:
{code:java}
private val executorTimeoutMs = sc.conf.get(config.STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT)
{code}
{code:java}
mesosExternalShuffleClient.get
  .registerDriverWithShuffleService(
    agent.hostname,
    externalShufflePort,
    sc.conf.get(config.STORAGE_BLOCKMANAGER_HEARTBEAT_TIMEOUT),
    sc.conf.get(config.EXECUTOR_HEARTBEAT_INTERVAL))
{code}
This may need to be fixed by code changes.
> spark.storage.blockManagerSlaveTimeoutMs default value does not follow > spark.network.timeout value when the latter was changed > -- > > Key: SPARK-33557 > URL: https://issues.apache.org/jira/browse/SPARK-33557 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Ohad >Priority: Minor > > According to the documentation "spark.network.timeout" is the default timeout > for "spark.storage.blockManagerSlaveTimeoutMs" which implies that when the > user sets "spark.network.timeout" the effective value of > "spark.storage.blockManagerSlaveTimeoutMs" should also be changed if it was > not specifically changed. > However this is not the case since the default value of > "spark.storage.blockManagerSlaveTimeoutMs" is always the default value of > "spark.network.timeout" (120s) > > "spark.storage.blockManagerSlaveTimeoutMs" is defined in the package object > of "org.apache.spark.internal.config" as follows: > {code:java} > private[spark] val STORAGE_BLOCKMANAGER_SLAVE_TIMEOUT = > ConfigBuilder("spark.storage.blockManagerSlaveTimeoutMs") > .version("0.7.0") > .timeConf(TimeUnit.MILLISECONDS) > .createWithDefaultString(Network.NETWORK_TIMEOUT.defaultValueString) > {code} > So it seems like the its default value is indeed "fixed" to > "spark.network.timeout" default value. > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
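The two defaulting styles contrasted in the SPARK-33557 comments can be shown side by side in a small Python sketch (plain dict-based config, hypothetical helper names, not Spark's ConfigBuilder API):

```python
# A static default is baked in at definition time, so a user-set
# spark.network.timeout never reaches it; a dynamic fallback reads the
# *current* spark.network.timeout at lookup time, like connectionTimeoutMs().

NETWORK_TIMEOUT_DEFAULT_MS = 120_000  # "120s" default of spark.network.timeout

def slave_timeout_static(conf: dict) -> int:
    # Mirrors createWithDefaultString(Network.NETWORK_TIMEOUT.defaultValueString):
    # the fallback is the frozen default string, not the live setting.
    return conf.get("spark.storage.blockManagerSlaveTimeoutMs",
                    NETWORK_TIMEOUT_DEFAULT_MS)

def slave_timeout_dynamic(conf: dict) -> int:
    # Mirrors the connectionTimeoutMs() treatment: fall back to whatever
    # spark.network.timeout is set to right now.
    return conf.get("spark.storage.blockManagerSlaveTimeoutMs",
                    conf.get("spark.network.timeout", NETWORK_TIMEOUT_DEFAULT_MS))

conf = {"spark.network.timeout": 300_000}  # user raised the network timeout
print(slave_timeout_static(conf))   # -> 120000 (ignores the user's setting)
print(slave_timeout_dynamic(conf))  # -> 300000 (follows it, as documented)
```

The bug report says the documented behavior is the dynamic one, while the definition quoted above implements the static one.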
[jira] [Commented] (SPARK-28646) Allow usage of `count` only for parameterless aggregate function
[ https://issues.apache.org/jira/browse/SPARK-28646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239619#comment-17239619 ] jiaan.geng commented on SPARK-28646: I will take a look! > Allow usage of `count` only for parameterless aggregate function > > > Key: SPARK-28646 > URL: https://issues.apache.org/jira/browse/SPARK-28646 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark allows calls to `count` even for non parameterless aggregate > function. For example, the following query actually works: > {code:sql}SELECT count() OVER () FROM tenk1;{code} > In PgSQL, on the other hand, the following error is thrown: > {code:sql}ERROR: count(*) must be used to call a parameterless aggregate > function{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-28645) Throw an error on window redefinition
[ https://issues.apache.org/jira/browse/SPARK-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28645. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30512 [https://github.com/apache/spark/pull/30512] > Throw an error on window redefinition > - > > Key: SPARK-28645 > URL: https://issues.apache.org/jira/browse/SPARK-28645 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Assignee: jiaan.geng >Priority: Major > Fix For: 3.1.0 > > > Currently in Spark one could redefine a window. For instance: > {code:sql}select count(*) OVER w FROM tenk1 WINDOW w AS (ORDER BY unique1), w > AS (ORDER BY unique1);{code} > The window `w` is defined two times. In PgSQL, on the other hand, an error > will be thrown: > {code:sql}ERROR: window "w" is already defined{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28645) Throw an error on window redefinition
[ https://issues.apache.org/jira/browse/SPARK-28645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28645: --- Assignee: jiaan.geng > Throw an error on window redefinition > - > > Key: SPARK-28645 > URL: https://issues.apache.org/jira/browse/SPARK-28645 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Assignee: jiaan.geng >Priority: Major > > Currently in Spark one could redefine a window. For instance: > {code:sql}select count(*) OVER w FROM tenk1 WINDOW w AS (ORDER BY unique1), w > AS (ORDER BY unique1);{code} > The window `w` is defined two times. In PgSQL, on the other hand, an error > will be thrown: > {code:sql}ERROR: window "w" is already defined{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
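The check SPARK-28645 asks for is a straightforward duplicate-name guard when collecting named window definitions. A small Python sketch (my own illustration, not Spark's parser) of that behavior:

```python
# When collecting named windows from a WINDOW clause, reject a name that is
# defined twice instead of silently keeping one of the definitions, matching
# PostgreSQL's 'window "w" is already defined' error.

def collect_window_defs(defs):
    """defs: iterable of (name, spec) pairs from a WINDOW clause."""
    windows = {}
    for name, spec in defs:
        if name in windows:
            raise ValueError(f'window "{name}" is already defined')
        windows[name] = spec
    return windows

collect_window_defs([("w", "ORDER BY unique1")])  # fine: one definition
try:
    collect_window_defs([("w", "ORDER BY unique1"),
                         ("w", "ORDER BY unique1")])
except ValueError as e:
    print(e)  # -> window "w" is already defined
```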
[jira] [Assigned] (SPARK-33522) Improve exception messages while handling UnresolvedTableOrView
[ https://issues.apache.org/jira/browse/SPARK-33522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-33522: --- Assignee: Terry Kim > Improve exception messages while handling UnresolvedTableOrView > --- > > Key: SPARK-33522 > URL: https://issues.apache.org/jira/browse/SPARK-33522 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > > Improve exception messages while handling UnresolvedTableOrView. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33522) Improve exception messages while handling UnresolvedTableOrView
[ https://issues.apache.org/jira/browse/SPARK-33522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-33522. - Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 30475 [https://github.com/apache/spark/pull/30475] > Improve exception messages while handling UnresolvedTableOrView > --- > > Key: SPARK-33522 > URL: https://issues.apache.org/jira/browse/SPARK-33522 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Minor > Fix For: 3.1.0 > > > Improve exception messages while handling UnresolvedTableOrView. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33576) PythonException: An exception was thrown from a UDF: 'OSError: Invalid IPC message: negative bodyLength'.
[ https://issues.apache.org/jira/browse/SPARK-33576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darshat updated SPARK-33576:
Description:
Hello, we are using Databricks on Azure to process a large amount of e-commerce data. The Databricks runtime is 7.3, which includes Apache Spark 3.0.1 and Scala 2.12. During processing, a groupby operation on the DataFrame consistently fails with an exception of this type:

PythonException: An exception was thrown from a UDF: 'OSError: Invalid IPC message: negative bodyLength'. Full traceback below:

Traceback (most recent call last):
  File "/databricks/spark/python/pyspark/worker.py", line 654, in main
    process()
  File "/databricks/spark/python/pyspark/worker.py", line 646, in process
    serializer.dump_stream(out_iter, outfile)
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 281, in dump_stream
    timely_flush_timeout_ms=self.timely_flush_timeout_ms)
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 97, in dump_stream
    for batch in iterator:
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 271, in init_stream_yield_batches
    for series in iterator:
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 287, in load_stream
    for batch in batches:
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 228, in load_stream
    for batch in batches:
  File "/databricks/spark/python/pyspark/sql/pandas/serializers.py", line 118, in load_stream
    for batch in reader:
  File "pyarrow/ipc.pxi", line 412, in __iter__
  File "pyarrow/ipc.pxi", line 432, in pyarrow.lib._CRecordBatchReader.read_next_batch
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Invalid IPC message: negative bodyLength

Code that causes this (domain_features adds a couple of computed columns to the DataFrame):

x = df.groupby('providerid').apply(domain_features)
display(x.info())

DataFrame size: 22 million rows, 31 columns. One of the columns is a string ('providerid') on which we do a groupby followed by an apply operation; there are 3 distinct provider ids in this set. While trying to enumerate/count the results, we get this exception. We have put all possible checks in the code for null values and corrupt data, and we cannot trace this to application-level code. We hope to get some help troubleshooting this, as it is a blocker for rolling out at scale. The cluster has 8 nodes plus the driver, all with 28 GB RAM. I can provide any other settings that could be useful.

Thanks,
Darshat Shah
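For context, an "Invalid IPC message: negative bodyLength" error from pyarrow is commonly a symptom of a single Arrow record batch whose serialized size overflows the signed 32-bit length field (roughly 2 GiB); with 22 million rows and only 3 distinct provider ids, each group handed to the pandas UDF is very large. A minimal stdlib-only sketch of the mitigation idea — splitting one oversized group into bounded-size chunks before applying the per-group function (the cap value and the `chunk_rows` helper are hypothetical, not Spark APIs):

```python
from itertools import islice

# Hypothetical cap: keep each materialized chunk well under Arrow's
# signed 32-bit (~2 GiB) IPC message limit.
MAX_CHUNK_ROWS = 1_000_000

def chunk_rows(rows, chunk_size=MAX_CHUNK_ROWS):
    """Yield successive bounded-size lists from an iterable of rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk

# Toy stand-in for one oversized group (the report has ~22M rows over 3 groups):
sizes = [len(chunk) for chunk in chunk_rows(range(2_500_000))]
print(sizes)  # [1000000, 1000000, 500000]
```

In Spark itself the analogous knob is splitting the grouping key (or lowering `spark.sql.execution.arrow.maxRecordsPerBatch`) so that no single Arrow batch exceeds the limit; the sketch above only illustrates the chunking arithmetic.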
[jira] [Commented] (SPARK-33577) Add support for V1Table in stream writer table API
[ https://issues.apache.org/jira/browse/SPARK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239577#comment-17239577 ] Apache Spark commented on SPARK-33577: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/30521 > Add support for V1Table in stream writer table API > -- > > Key: SPARK-33577 > URL: https://issues.apache.org/jira/browse/SPARK-33577 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming > Affects Versions: 3.0.0 > Reporter: Yuanjian Li > Priority: Major > > After SPARK-32896, we have a table API for the stream writer, but it only supports > DataSource v2 tables. Here we add the following enhancements: > * Create non-existing tables by default > * Support both managed and external V1Tables -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33577) Add support for V1Table in stream writer table API
[ https://issues.apache.org/jira/browse/SPARK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33577: Assignee: Apache Spark
[jira] [Assigned] (SPARK-33577) Add support for V1Table in stream writer table API
[ https://issues.apache.org/jira/browse/SPARK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33577: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-33577) Add support for V1Table in stream writer table API
[ https://issues.apache.org/jira/browse/SPARK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239576#comment-17239576 ] Apache Spark commented on SPARK-33577: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/30521
[jira] [Updated] (SPARK-33577) Add support for V1Table in stream writer table API
[ https://issues.apache.org/jira/browse/SPARK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-33577: Description: After SPARK-32896, we have a table API for the stream writer, but it only supports DataSource v2 tables. Here we add the following enhancements: * Create non-existing tables by default * Support both managed and external V1Tables was: After SPARK-32896, we have table API for stream writer but only support DataSource v2 tables. Here we add the following supports: * Create non-existing tables by default * Support both managed and external V1Tables
[jira] [Created] (SPARK-33577) Add support for V1Table in stream writer table API
Yuanjian Li created SPARK-33577: --- Summary: Add support for V1Table in stream writer table API Key: SPARK-33577 URL: https://issues.apache.org/jira/browse/SPARK-33577 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 3.0.0 Reporter: Yuanjian Li After SPARK-32896, we have a table API for the stream writer, but it only supports DataSource v2 tables. Here we add the following: * Create non-existing tables by default * Support both managed and external V1Tables
[jira] [Updated] (SPARK-33486) Collapse Partial and Final Aggregation into Complete Aggregation mode
[ https://issues.apache.org/jira/browse/SPARK-33486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prakhar Jain updated SPARK-33486: - Issue Type: Improvement (was: Task) > Collapse Partial and Final Aggregation into Complete Aggregation mode > - > > Key: SPARK-33486 > URL: https://issues.apache.org/jira/browse/SPARK-33486 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.3.4, 2.4.7, 3.0.0, 3.0.1 > Reporter: Prakhar Jain > Priority: Major > > We should merge the Partial and Final Aggregation into one if there is no > exchange between them. > > Example: select col1, max(col2) from t1 join t2 on col1 group by col1 > > In this case, after the SortMergeJoin, Spark will do a PartialAggregation and > then a FinalAggregation, so it will build hash tables twice, which is not > required. If there is a lot of data after the join with many distinct col1 > values, there is also a possibility of a spill in HashAggregateExec, so the > spill will also happen twice, which can be avoided.
[jira] [Commented] (SPARK-33486) Collapse Partial and Final Aggregation into Complete Aggregation mode
[ https://issues.apache.org/jira/browse/SPARK-33486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17239574#comment-17239574 ] Prakhar Jain commented on SPARK-33486: -- [~dongjoon] Sure. Updating the Issue Type to Improvement.
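To make the cost described in SPARK-33486 concrete, here is a stdlib-only Python sketch (not Spark code) contrasting partial-then-final aggregation, which builds two hash tables, with a single complete-mode pass that builds one. The toy data and the max() aggregate mirror the example query from the issue; function and variable names are illustrative only:

```python
from collections import defaultdict

# Toy rows of (col1, col2), standing in for the post-join output.
rows = [("a", 3), ("b", 7), ("a", 9), ("b", 1), ("c", 5)]

def partial_then_final(rows):
    # Partial aggregation: first hash table, keyed by col1.
    partial = defaultdict(lambda: float("-inf"))
    for key, value in rows:
        partial[key] = max(partial[key], value)
    # Final aggregation: a second hash table merging the partial results.
    # When no exchange separates the two phases, this table is redundant.
    final = defaultdict(lambda: float("-inf"))
    for key, value in partial.items():
        final[key] = max(final[key], value)
    return dict(final)

def complete(rows):
    # Complete mode: one hash table, one pass over the rows.
    result = defaultdict(lambda: float("-inf"))
    for key, value in rows:
        result[key] = max(result[key], value)
    return dict(result)

assert partial_then_final(rows) == complete(rows) == {"a": 9, "b": 7, "c": 5}
```

Both functions produce identical results; collapsing to complete mode simply avoids the second hash table (and, for large inputs, a second potential spill).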