[jira] [Commented] (SPARK-36327) Spark sql creates staging dir inside database directory rather than creating inside table directory
[ https://issues.apache.org/jira/browse/SPARK-36327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390851#comment-17390851 ] Senthil Kumar commented on SPARK-36327: --- Hi [~sunchao] Hive creates .staging directories inside the "/db/table/" location, but spark-sql creates .staging directories inside the "/db/" location when Hadoop federation (viewfs) is used. Spark works as expected (creating .staging inside the /db/table/ location) for other filesystems such as hdfs. HIVE: {{ # beeline > use dicedb; > insert into table part_test partition (j=1) values (1); ... INFO : Loading data to table dicedb.part_test partition (j=1) from **viewfs://cloudera/user/daisuke/dicedb/part_test/j=1/.hive-staging_hive_2021-07-19_13-04-44_989_6775328876605030677-1/-ext-1** }} but Spark's behaviour: {{ spark-sql> use dicedb; spark-sql> insert into table part_test partition (j=2) values (2); 21/07/19 13:07:37 INFO FileUtils: Creating directory if it doesn't exist: **viewfs://cloudera/user/daisuke/dicedb/.hive-staging_hive_2021-07-19_13-07-37_317_5083528872437596950-1** ... }} The reason we require this change: if we allow spark-sql to create the .staging directory inside the /db/ location, we end up with security issues, because we would need to grant permission on the "viewfs:///db/" location to every user who submits Spark jobs.
After this change is applied, spark-sql creates .staging inside /db/table/, similar to Hive, as below: {{ spark-sql> use dicedb; 21/07/28 00:22:47 INFO SparkSQLCLIDriver: Time taken: 0.929 seconds spark-sql> insert into table part_test partition (j=8) values (8); 21/07/28 00:23:25 INFO HiveMetaStoreClient: Closed a connection to metastore, current connections: 1 21/07/28 00:23:26 INFO FileUtils: Creating directory if it doesn't exist: **viewfs://cloudera/user/daisuke/dicedb/part_test/.hive-staging_hive_2021-07-28_00-23-26_109_4548714524589026450-1** }} The reason this issue occurs in Spark-sql but not in Hive: in Hive, a "/db/table/tmp" directory structure is passed as the path, so path.getParent returns "/db/table/". In Spark we pass only "/db/table", so "path.getParent" should not be used for Hadoop federation (viewfs). > Spark sql creates staging dir inside database directory rather than creating > inside table directory > --- > > Key: SPARK-36327 > URL: https://issues.apache.org/jira/browse/SPARK-36327 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 3.1.2 >Reporter: Senthil Kumar >Priority: Minor > > Spark sql creates staging dir inside database directory rather than creating > inside table directory. > > This arises only when viewfs:// is configured; when the location is hdfs://, > it doesn't occur. > > Based on further investigation in the file *SaveAsHiveFile.scala*, I could see > that the directory hierarchy has not been properly handled for the viewFS > condition. > The parent path (db path) is passed rather than the actual directory (table > location).
> {{
> // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
> private def newVersionExternalTempPath(
>     path: Path,
>     hadoopConf: Configuration,
>     stagingDir: String): Path = {
>   val extURI: URI = path.toUri
>   if (extURI.getScheme == "viewfs") {
>     getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
>   } else {
>     new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), "-ext-1")
>   }
> }
> }}
> Please refer to the lines below:
> ===
> if (extURI.getScheme == "viewfs") {
>   getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
> ===
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
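The path arithmetic at issue can be sketched in plain Python ({{posixpath}} stands in for Hadoop's Path; the staging-directory name below is a shortened, hypothetical example, not a real Hive-generated name):

```python
import posixpath

# Table location as Spark passes it (no trailing tmp component, unlike Hive).
table_path = "/user/daisuke/dicedb/part_test"
staging = ".hive-staging_hive_example-1"  # hypothetical shortened name

# Pre-fix viewfs branch: path.getParent drops the table component,
# so the staging dir lands in the database directory.
buggy = posixpath.join(posixpath.dirname(table_path), staging)

# Post-fix behaviour: the staging dir lives under the table directory,
# matching what Hive does.
fixed = posixpath.join(table_path, staging)

print(buggy)  # /user/daisuke/dicedb/.hive-staging_hive_example-1
print(fixed)  # /user/daisuke/dicedb/part_test/.hive-staging_hive_example-1
```

Hive avoids the problem only because it passes "/db/table/tmp", whose parent is still inside the table directory.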
[jira] [Created] (SPARK-36370) Avoid using SelectionMixin._builtin_table which is removed in pandas 1.3
Takuya Ueshin created SPARK-36370: - Summary: Avoid using SelectionMixin._builtin_table which is removed in pandas 1.3 Key: SPARK-36370 URL: https://issues.apache.org/jira/browse/SPARK-36370 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin
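{{SelectionMixin._builtin_table}} mapped Python builtins passed to {{agg()}} onto their aggregation names; with it removed in pandas 1.3, the same lookup can be kept in a public table of one's own. A minimal sketch (hypothetical names, not the actual pandas-on-Spark fix):

```python
# Hypothetical replacement for the removed private pandas table: map the
# builtins a user may pass to agg() onto their aggregation names.
BUILTIN_TABLE = {sum: "sum", min: "min", max: "max"}

def resolve_func(func):
    """Return the aggregation name for a known builtin, else the func itself."""
    return BUILTIN_TABLE.get(func, func)

print(resolve_func(sum))        # sum
print(resolve_func(len) is len) # True (unknown callables pass through)
```

Owning the table removes the dependency on a private pandas attribute that was never part of the public API.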
[jira] [Created] (SPARK-36369) Fix Index.union to follow pandas 1.3
Takuya Ueshin created SPARK-36369: - Summary: Fix Index.union to follow pandas 1.3 Key: SPARK-36369 URL: https://issues.apache.org/jira/browse/SPARK-36369 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin
[jira] [Created] (SPARK-36368) Fix CategoricalOps.astype to follow pandas 1.3
Takuya Ueshin created SPARK-36368: - Summary: Fix CategoricalOps.astype to follow pandas 1.3 Key: SPARK-36368 URL: https://issues.apache.org/jira/browse/SPARK-36368 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin
[jira] [Updated] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-36367: -- Issue Type: Umbrella (was: Improvement) > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior.
[jira] [Assigned] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36367: Assignee: Apache Spark > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior.
[jira] [Commented] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390822#comment-17390822 ] Apache Spark commented on SPARK-36367: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33598 > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior.
[jira] [Assigned] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36367: Assignee: (was: Apache Spark) > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior.
[jira] [Commented] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
[ https://issues.apache.org/jira/browse/SPARK-36367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390821#comment-17390821 ] Apache Spark commented on SPARK-36367: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33598 > Fix the behavior to follow pandas >= 1.3 > > > Key: SPARK-36367 > URL: https://issues.apache.org/jira/browse/SPARK-36367 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > Pandas 1.3 has been released. We should follow the new pandas behavior.
[jira] [Created] (SPARK-36367) Fix the behavior to follow pandas >= 1.3
Takuya Ueshin created SPARK-36367: - Summary: Fix the behavior to follow pandas >= 1.3 Key: SPARK-36367 URL: https://issues.apache.org/jira/browse/SPARK-36367 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin Pandas 1.3 has been released. We should follow the new pandas behavior.
[jira] [Created] (SPARK-36366) Google Kubernetes Engine authentication fails
Tiago Reis created SPARK-36366: -- Summary: Google Kubernetes Engine authentication fails Key: SPARK-36366 URL: https://issues.apache.org/jira/browse/SPARK-36366 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.1.2 Environment: {code} $ kubectl version Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.15", GitCommit:"73dd5c840662bb066a146d0871216333181f4b64", GitTreeState:"clean", BuildDate:"2021-01-13T13:22:41Z", GoVersion:"go1.13.15", Compiler:"gc", Platform:"darwin/amd64"} Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.19-gke.1701", GitCommit:"d7cecefb99b58e8968f59b59d76448eb1e6ea403", GitTreeState:"clean", BuildDate:"2021-06-23T21:51:59Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"} $ spark-submit --version version 3.1.2 Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.10 {code} Reporter: Tiago Reis When connecting to a Google Kubernetes Engine cluster, the command {{gcloud container clusters get-credentials}} is used, which generates a {{~/.kube/config}} file. The distinctive trait in this config file is that it uses an {{auth-provider}} relying on {{gcloud}} to inject the keys {{expiry}} and {{access-token}} from the general Google SDK auth config, as seen here: {code:json} users: - name: gke_my-project_my-region_my-cluster user: auth-provider: config: cmd-args: config config-helper --format=json cmd-path: /Users/reist01/google-cloud-sdk/bin/gcloud expiry-key: '{.credential.token_expiry}' token-key: '{.credential.access_token}' {code} {{kubectl}}, because it uses {{client-go}}, supports the auth-provider and fetches the token and expiry from the JSON returned by config-helper. As Spark uses the fabric8 client, this is not yet supported, and spark-submit fails: {code:java} Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://my-endpoint/api/v1/namespaces/my-namespace/pods. Message: Forbidden! 
User gke_my-project_my-region_my-cluster doesn't have permission. pods is forbidden: User "system:anonymous" cannot create resource "pods" in API group "" in the namespace "my-namespace". {code}
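Until fabric8 supports the gcloud auth-provider, one possible workaround (a sketch, not a confirmed fix) is to fetch a token out of band and hand it to Spark directly. {{spark.kubernetes.authenticate.submission.oauthToken}} is an existing Spark configuration key; the endpoint, image name, and jar path below are placeholders:

```shell
# Fetch a short-lived access token from the Google SDK auth config
# (the same config-helper the auth-provider would have invoked).
TOKEN=$(gcloud config config-helper --format='value(credential.access_token)')

# Pass the token to spark-submit explicitly instead of relying on
# the unsupported auth-provider stanza in ~/.kube/config.
spark-submit \
  --master k8s://https://my-endpoint \
  --deploy-mode cluster \
  --conf spark.kubernetes.authenticate.submission.oauthToken="$TOKEN" \
  --conf spark.kubernetes.container.image=my-image \
  local:///opt/spark/examples/jars/spark-examples.jar
```

Note the token expires, so this suits one-off submissions rather than long-lived tooling.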
[jira] [Updated] (SPARK-36366) Google Kubernetes Engine authentication fails
[ https://issues.apache.org/jira/browse/SPARK-36366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tiago Reis updated SPARK-36366: --- Description: When connecting to a Google Kubernetes Engine cluster, the command {{gcloud container clusters get-credentials}} is used, which generates a {{~/.kube/config}} file. The distinctive trait in this config file is that it uses an {{auth-provider}} relying on {{gcloud}} to inject the keys {{expiry}} and {{access-token}} from the general Google SDK auth config, as seen here: {code:json} users: - name: gke_my-project_my-region_my-cluster user: auth-provider: config: cmd-args: config config-helper --format=json cmd-path: /Users/user/google-cloud-sdk/bin/gcloud expiry-key: '{.credential.token_expiry}' token-key: '{.credential.access_token}' {code} {{kubectl}}, because it uses {{client-go}}, supports the auth-provider and fetches the token and expiry from the JSON returned by config-helper. As Spark uses the fabric8 client, this is not yet supported, and spark-submit fails: {code:java} Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://my-endpoint/api/v1/namespaces/my-namespace/pods. Message: Forbidden! User gke_my-project_my-region_my-cluster doesn't have permission. pods is forbidden: User "system:anonymous" cannot create resource "pods" in API group "" in the namespace "my-namespace". {code} was: When connecting to a Google Kubernetes Engine cluster, the command {{gcloud container clusters get-credentials}} is used, which generates a {{~/.kube/config}} file. 
The distinctive trait in this config file is that it uses an {{auth-provider}} relying on {{gcloud}} to inject the keys {{expiry}} and {{access-token}} from the general Google SDK auth config, as seen here: {code:json} users: - name: gke_my-project_my-region_my-cluster user: auth-provider: config: cmd-args: config config-helper --format=json cmd-path: /Users/reist01/google-cloud-sdk/bin/gcloud expiry-key: '{.credential.token_expiry}' token-key: '{.credential.access_token}' {code} {{kubectl}}, because it uses {{client-go}}, supports the auth-provider and fetches the token and expiry from the JSON returned by config-helper. As Spark uses the fabric8 client, this is not yet supported, and spark-submit fails: {code:java} Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://my-endpoint/api/v1/namespaces/my-namespace/pods. Message: Forbidden! User gke_my-project_my-region_my-cluster doesn't have permission. pods is forbidden: User "system:anonymous" cannot create resource "pods" in API group "" in the namespace "my-namespace". 
{code} > Google Kubernetes Engine authentication fails > - > > Key: SPARK-36366 > URL: https://issues.apache.org/jira/browse/SPARK-36366 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.1.2 > Environment: {code} > $ kubectl version > Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.15", > GitCommit:"73dd5c840662bb066a146d0871216333181f4b64", GitTreeState:"clean", > BuildDate:"2021-01-13T13:22:41Z", GoVersion:"go1.13.15", Compiler:"gc", > Platform:"darwin/amd64"} > Server Version: version.Info{Major:"1", Minor:"18+", > GitVersion:"v1.18.19-gke.1701", > GitCommit:"d7cecefb99b58e8968f59b59d76448eb1e6ea403", GitTreeState:"clean", > BuildDate:"2021-06-23T21:51:59Z", GoVersion:"go1.13.15b4", Compiler:"gc", > Platform:"linux/amd64"} > $ spark-submit --version > version 3.1.2 > Using Scala version 2.12.10, OpenJDK 64-Bit Server VM, 11.0.10 > {code} >Reporter: Tiago Reis >Priority: Minor > Labels: google, kubernetes, kubernetesexecutor, newbie > > When connecting to a Google Kubernetes Engine cluster, the command {{gcloud container > clusters get-credentials}} is used, which generates a {{~/.kube/config}} file. > The distinctive trait in this config file is that it uses an > {{auth-provider}} relying on {{gcloud}} to inject the keys {{expiry}} and > {{access-token}} from the general Google SDK auth config, as seen here: > {code:json} > users: > - name: gke_my-project_my-region_my-cluster > user: > auth-provider: > config: > cmd-args: config config-helper --format=json > cmd-path: /Users/user/google-cloud-sdk/bin/gcloud > expiry-key: '{.credential.token_expiry}' > token-key: '{.credential.access_token}' > {code} > {{kubectl}}, because it uses {{client-go}}, supports the auth-provider and > fetches the token and expiry from the JSON returned by config-helper. As Spark > uses the fabric8 client, this is not yet supported, breaking when
[jira] [Assigned] (SPARK-36365) Remove old workarounds related to null ordering.
[ https://issues.apache.org/jira/browse/SPARK-36365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36365: Assignee: (was: Apache Spark) > Remove old workarounds related to null ordering. > > > Key: SPARK-36365 > URL: https://issues.apache.org/jira/browse/SPARK-36365 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > In pandas-on-Spark, there are still some remaining places to call > {{Column._jc.(asc|desc)_nulls_(first|last)}} as a workaround from Koalas to > support Spark 2.3.
[jira] [Assigned] (SPARK-36365) Remove old workarounds related to null ordering.
[ https://issues.apache.org/jira/browse/SPARK-36365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36365: Assignee: Apache Spark > Remove old workarounds related to null ordering. > > > Key: SPARK-36365 > URL: https://issues.apache.org/jira/browse/SPARK-36365 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > > In pandas-on-Spark, there are still some remaining places to call > {{Column._jc.(asc|desc)_nulls_(first|last)}} as a workaround from Koalas to > support Spark 2.3.
[jira] [Commented] (SPARK-36365) Remove old workarounds related to null ordering.
[ https://issues.apache.org/jira/browse/SPARK-36365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390813#comment-17390813 ] Apache Spark commented on SPARK-36365: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/33597 > Remove old workarounds related to null ordering. > > > Key: SPARK-36365 > URL: https://issues.apache.org/jira/browse/SPARK-36365 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > In pandas-on-Spark, there are still some remaining places to call > {{Column._jc.(asc|desc)_nulls_(first|last)}} as a workaround from Koalas to > support Spark 2.3.
[jira] [Updated] (SPARK-36365) Remove old workarounds related to null ordering.
[ https://issues.apache.org/jira/browse/SPARK-36365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-36365: -- Summary: Remove old workarounds related to null ordering. (was: Remove old workarounds related to ordering.) > Remove old workarounds related to null ordering. > > > Key: SPARK-36365 > URL: https://issues.apache.org/jira/browse/SPARK-36365 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > > In pandas-on-Spark, there are still some remaining places to call > {{Column._jc.(asc|desc)_nulls_(first|last)}} as a workaround from Koalas to > support Spark 2.3.
[jira] [Commented] (SPARK-36338) Move distributed-sequence implementation to Scala side
[ https://issues.apache.org/jira/browse/SPARK-36338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390807#comment-17390807 ] Apache Spark commented on SPARK-36338: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/33596 > Move distributed-sequence implementation to Scala side > -- > > Key: SPARK-36338 > URL: https://issues.apache.org/jira/browse/SPARK-36338 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.2.0 > > > https://github.com/apache/spark/blob/c22f7a4834e6fb7b69c4cc4af87c61c2fbbe0786/python/pyspark/pandas/internal.py#L925-L945 > This can be implemented on the JVM side to make it more performant, avoiding > extra serialization, and to work around the nullability issue.
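The linked internal.py code builds a default index with consecutive values across partitions. The underlying technique (count rows per partition, then add each partition's cumulative offset) can be sketched in pure Python, independent of Spark, which the ticket proposes porting to the JVM side:

```python
from itertools import accumulate

def distributed_sequence(partitions):
    """Assign consecutive 0-based indices across a list of partitions.

    Sketch of the count-then-offset approach: one pass counts rows per
    partition, a prefix sum gives each partition's starting offset, and a
    second pass numbers the rows. No data moves between partitions.
    """
    counts = [len(p) for p in partitions]
    offsets = [0] + list(accumulate(counts))[:-1]
    return [[off + i for i in range(len(part))]
            for off, part in zip(offsets, partitions)]

print(distributed_sequence([["a", "b"], ["c"], ["d", "e"]]))
# [[0, 1], [2], [3, 4]]
```

Doing the same two passes in the JVM avoids shipping every row through Python serialization just to number it.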
[jira] [Assigned] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36345: Assignee: Dongjoon Hyun > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Dongjoon Hyun >Priority: Major > > In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of the "pyspark" job. > > We can reduce the cost of CI by creating an image that has both packages > pre-installed.
[jira] [Resolved] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36345. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33595 [https://github.com/apache/spark/pull/33595] > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > > In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of the "pyspark" job. > > We can reduce the cost of CI by creating an image that has both packages > pre-installed.
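The change amounts to baking the two installs into the CI image instead of running them on every job. A hypothetical image-build fragment (the package specs come from the issue; the Python interpreter path is an assumption):

```shell
# Pre-install in the GHA docker image what the "pyspark" job's
# "List Python packages (Python 3.9)" step otherwise installs on every run.
python3.9 -m pip install "mlflow>=1.0" sklearn
```

Installing once at image-build time removes the repeated download/install cost from each CI run.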
[jira] [Created] (SPARK-36365) Remove old workarounds related to ordering.
Takuya Ueshin created SPARK-36365: - Summary: Remove old workarounds related to ordering. Key: SPARK-36365 URL: https://issues.apache.org/jira/browse/SPARK-36365 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.2.0 Reporter: Takuya Ueshin In pandas-on-Spark, there are still some remaining places to call {{Column._jc.(asc|desc)_nulls_(first|last)}} as a workaround from Koalas to support Spark 2.3.
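Since Spark 2.4, PySpark exposes {{Column.asc_nulls_first()}}, {{asc_nulls_last()}}, {{desc_nulls_first()}}, and {{desc_nulls_last()}} as public API, so reaching through {{Column._jc}} is no longer needed. The null-ordering semantics involved can be sketched in plain Python, without a Spark session:

```python
# Plain-Python sketch of nulls-first vs nulls-last ascending sorts
# (None stands in for SQL NULL; helper names are hypothetical).
def sort_asc_nulls_first(values):
    """Ascending sort with None values placed before everything else."""
    return sorted(values, key=lambda v: (v is not None, v if v is not None else 0))

def sort_asc_nulls_last(values):
    """Ascending sort with None values placed after everything else."""
    return sorted(values, key=lambda v: (v is None, v if v is not None else 0))

print(sort_asc_nulls_first([3, None, 1]))  # [None, 1, 3]
print(sort_asc_nulls_last([3, None, 1]))   # [1, 3, None]
```

In pandas-on-Spark the fix is simply calling the public Column methods, which return a Column rather than a raw JVM object.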
[jira] [Created] (SPARK-36364) Move window and aggregate functions to DataTypeOps
Xinrong Meng created SPARK-36364: Summary: Move window and aggregate functions to DataTypeOps Key: SPARK-36364 URL: https://issues.apache.org/jira/browse/SPARK-36364 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng
[jira] [Resolved] (SPARK-36140) Replace DataTypeOps tests that have operations on different Series
[ https://issues.apache.org/jira/browse/SPARK-36140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng resolved SPARK-36140. -- Resolution: Done > Replace DataTypeOps tests that have operations on different Series > -- > > Key: SPARK-36140 > URL: https://issues.apache.org/jira/browse/SPARK-36140 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Replace DataTypeOps tests that have operations on different Series for a > shorter test duration.
[jira] [Assigned] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36345: Assignee: (was: Apache Spark) > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of the "pyspark" job. > > We can reduce the cost of CI by creating an image that has both packages > pre-installed.
[jira] [Commented] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390791#comment-17390791 ] Apache Spark commented on SPARK-36345: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33595 > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of the "pyspark" job. > > We can reduce the cost of CI by creating an image that has both packages > pre-installed.
[jira] [Commented] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390790#comment-17390790 ] Apache Spark commented on SPARK-36345: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33595 > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of the "pyspark" job. > > We can reduce the cost of CI by creating an image that has both packages > pre-installed.
[jira] [Assigned] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36345: Assignee: Apache Spark > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Apache Spark >Priority: Major > > In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of the "pyspark" job. > > We can reduce the cost of CI by creating an image that has both packages > pre-installed.
[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-35881: -- Fix Version/s: 3.2.0 > [SQL] AQE does not support columnar execution for the final query stage > --- > > Key: SPARK-35881 > URL: https://issues.apache.org/jira/browse/SPARK-35881 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 3.2.0, 3.3.0 > > > In AdaptiveSparkPlanExec, a query is broken down into stages and these stages > are executed until the entire query has been executed. These stages can be > row-based or columnar. However, the final stage, produced by the private > getFinalPhysicalPlan method is always assumed to be row-based. The only way > to execute the final stage is by calling the various doExecute methods on > AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The > supportsColumnar method also always returns false. > In the RAPIDS Accelerator for Apache Spark, we currently call the private > getFinalPhysicalPlan method using reflection and then determine if that plan > is columnar or not, and then call the appropriate doExecute method, bypassing > the doExecute methods on AdaptiveSparkPlanExec. We would like a supported > mechanism for executing a columnar AQE plan so that we do not need to use > reflection.
[jira] [Updated] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-35881: -- Fix Version/s: (was: 3.2.0) > [SQL] AQE does not support columnar execution for the final query stage > --- > > Key: SPARK-35881 > URL: https://issues.apache.org/jira/browse/SPARK-35881 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 3.3.0 > > > In AdaptiveSparkPlanExec, a query is broken down into stages and these stages > are executed until the entire query has been executed. These stages can be > row-based or columnar. However, the final stage, produced by the private > getFinalPhysicalPlan method is always assumed to be row-based. The only way > to execute the final stage is by calling the various doExecute methods on > AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The > supportsColumnar method also always returns false. > In the RAPIDS Accelerator for Apache Spark, we currently call the private > getFinalPhysicalPlan method using reflection and then determine if that plan > is columnar or not, and then call the appropriate doExecute method, bypassing > the doExecute methods on AdaptiveSparkPlanExec. We would like a supported > mechanism for executing a columnar AQE plan so that we do not need to use > reflection. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36363) AKS Spark UI does not have executor tab showing up
Koushik created SPARK-36363: --- Summary: AKS Spark UI does not have executor tab showing up Key: SPARK-36363 URL: https://issues.apache.org/jira/browse/SPARK-36363 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: Koushik The Spark UI Executor tab shows blank, and I see the below error in the network tab: https://keplerfnet-aks-prod.az.3pc.att.com/proxy:10.128.0.76:4043/executors/ Failed to load resource: the server responded with a status of 404 () DevTools failed to load source map: Could not load content for https://keplerfnet-aks-prod.az.3pc.att.com/proxy:10.128.0.76:4043/static/vis.map: HTTP error: status code 502, net::ERR_HTTP_RESPONSE_CODE_FAILURE -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-35881. --- Fix Version/s: 3.3.0 3.2.0 Resolution: Fixed > [SQL] AQE does not support columnar execution for the final query stage > --- > > Key: SPARK-35881 > URL: https://issues.apache.org/jira/browse/SPARK-35881 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 3.2.0, 3.3.0 > > > In AdaptiveSparkPlanExec, a query is broken down into stages and these stages > are executed until the entire query has been executed. These stages can be > row-based or columnar. However, the final stage, produced by the private > getFinalPhysicalPlan method is always assumed to be row-based. The only way > to execute the final stage is by calling the various doExecute methods on > AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The > supportsColumnar method also always returns false. > In the RAPIDS Accelerator for Apache Spark, we currently call the private > getFinalPhysicalPlan method using reflection and then determine if that plan > is columnar or not, and then call the appropriate doExecute method, bypassing > the doExecute methods on AdaptiveSparkPlanExec. We would like a supported > mechanism for executing a columnar AQE plan so that we do not need to use > reflection. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35881) [SQL] AQE does not support columnar execution for the final query stage
[ https://issues.apache.org/jira/browse/SPARK-35881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves reassigned SPARK-35881: - Assignee: Andy Grove > [SQL] AQE does not support columnar execution for the final query stage > --- > > Key: SPARK-35881 > URL: https://issues.apache.org/jira/browse/SPARK-35881 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > > In AdaptiveSparkPlanExec, a query is broken down into stages and these stages > are executed until the entire query has been executed. These stages can be > row-based or columnar. However, the final stage, produced by the private > getFinalPhysicalPlan method is always assumed to be row-based. The only way > to execute the final stage is by calling the various doExecute methods on > AdaptiveSparkPlanExec, and doExecuteColumnar is not implemented. The > supportsColumnar method also always returns false. > In the RAPIDS Accelerator for Apache Spark, we currently call the private > getFinalPhysicalPlan method using reflection and then determine if that plan > is columnar or not, and then call the appropriate doExecute method, bypassing > the doExecute methods on AdaptiveSparkPlanExec. We would like a supported > mechanism for executing a columnar AQE plan so that we do not need to use > reflection. > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36350) Make nanvl work with DataTypeOps
[ https://issues.apache.org/jira/browse/SPARK-36350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-36350. --- Fix Version/s: 3.2.0 Assignee: Takuya Ueshin Resolution: Fixed Issue resolved by pull request 33582 https://github.com/apache/spark/pull/33582 > Make nanvl work with DataTypeOps > > > Key: SPARK-36350 > URL: https://issues.apache.org/jira/browse/SPARK-36350 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.2.0 > > > We can move some logic related to {{F.nanvl}} to {{DataTypeOps}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
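For context on what is being moved: `nanvl` returns its first column unless that value is NaN, in which case it returns the second. A minimal Python sketch of those per-value semantics (illustrative only, not the DataTypeOps refactoring itself):

```python
import math

def nanvl(col1: float, col2: float) -> float:
    """Return col1 if it is not NaN, otherwise col2 -- the per-value
    semantics of Spark SQL's nanvl(), sketched in plain Python."""
    return col2 if math.isnan(col1) else col1
```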
[jira] [Commented] (SPARK-36362) Omnibus Java code static analyzer warning fixes
[ https://issues.apache.org/jira/browse/SPARK-36362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390705#comment-17390705 ] Apache Spark commented on SPARK-36362: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/33594 > Omnibus Java code static analyzer warning fixes > --- > > Key: SPARK-36362 > URL: https://issues.apache.org/jira/browse/SPARK-36362 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL, Tests >Affects Versions: 3.2.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > > Inspired by a recent Java code touch-up, I wanted to fix in one pass several > lingering non-trivial issues with the Java code that a static analyzer turns > up. Only a few of these have material effects, but some do, and figured we > could avoid taking N PRs over time to address these. > * Some int*int multiplications that widen to long maybe could overflow > * Unnecessarily non-static inner classes > * Some tests "catch (AssertionError)" and do nothing > * Manual array iteration vs very slightly faster/simpler foreach > * Incorrect generic types that just happen to not cause a runtime error > * Missed opportunities for try-close > * Mutable enums which shouldn't be > * .. and a few other minor things -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36362) Omnibus Java code static analyzer warning fixes
[ https://issues.apache.org/jira/browse/SPARK-36362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36362: Assignee: Apache Spark (was: Sean R. Owen) > Omnibus Java code static analyzer warning fixes > --- > > Key: SPARK-36362 > URL: https://issues.apache.org/jira/browse/SPARK-36362 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL, Tests >Affects Versions: 3.2.0 >Reporter: Sean R. Owen >Assignee: Apache Spark >Priority: Minor > > Inspired by a recent Java code touch-up, I wanted to fix in one pass several > lingering non-trivial issues with the Java code that a static analyzer turns > up. Only a few of these have material effects, but some do, and figured we > could avoid taking N PRs over time to address these. > * Some int*int multiplications that widen to long maybe could overflow > * Unnecessarily non-static inner classes > * Some tests "catch (AssertionError)" and do nothing > * Manual array iteration vs very slightly faster/simpler foreach > * Incorrect generic types that just happen to not cause a runtime error > * Missed opportunities for try-close > * Mutable enums which shouldn't be > * .. and a few other minor things -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36362) Omnibus Java code static analyzer warning fixes
[ https://issues.apache.org/jira/browse/SPARK-36362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36362: Assignee: Sean R. Owen (was: Apache Spark) > Omnibus Java code static analyzer warning fixes > --- > > Key: SPARK-36362 > URL: https://issues.apache.org/jira/browse/SPARK-36362 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL, Tests >Affects Versions: 3.2.0 >Reporter: Sean R. Owen >Assignee: Sean R. Owen >Priority: Minor > > Inspired by a recent Java code touch-up, I wanted to fix in one pass several > lingering non-trivial issues with the Java code that a static analyzer turns > up. Only a few of these have material effects, but some do, and figured we > could avoid taking N PRs over time to address these. > * Some int*int multiplications that widen to long maybe could overflow > * Unnecessarily non-static inner classes > * Some tests "catch (AssertionError)" and do nothing > * Manual array iteration vs very slightly faster/simpler foreach > * Incorrect generic types that just happen to not cause a runtime error > * Missed opportunities for try-close > * Mutable enums which shouldn't be > * .. and a few other minor things -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36362) Omnibus Java code static analyzer warning fixes
Sean R. Owen created SPARK-36362: Summary: Omnibus Java code static analyzer warning fixes Key: SPARK-36362 URL: https://issues.apache.org/jira/browse/SPARK-36362 Project: Spark Issue Type: Improvement Components: Spark Core, SQL, Tests Affects Versions: 3.2.0 Reporter: Sean R. Owen Assignee: Sean R. Owen Inspired by a recent Java code touch-up, I wanted to fix in one pass several lingering non-trivial issues with the Java code that a static analyzer turns up. Only a few of these have material effects, but some do, and figured we could avoid taking N PRs over time to address these. * Some int*int multiplications that widen to long maybe could overflow * Unnecessarily non-static inner classes * Some tests "catch (AssertionError)" and do nothing * Manual array iteration vs very slightly faster/simpler foreach * Incorrect generic types that just happen to not cause a runtime error * Missed opportunities for try-close * Mutable enums which shouldn't be * .. and a few other minor things -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
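The first bullet, int*int multiplications that should widen to long, can be illustrated in Python by masking to 32 bits to mimic Java's int arithmetic (Java's types are simulated here; Python's own ints never overflow):

```python
INT_MASK = 0xFFFFFFFF
INT_SIGN = 0x80000000

def mul_int32(a: int, b: int) -> int:
    """Simulate Java-style 32-bit int multiplication: wraps on overflow."""
    r = (a * b) & INT_MASK
    return r - 0x100000000 if r & INT_SIGN else r

def mul_widened(a: int, b: int) -> int:
    """Simulate the fixed pattern, (long) a * b: widen before multiplying."""
    return a * b  # Python ints behave like Java's long here (no wrap)
```

With inputs like `100_000 * 100_000`, the unwidened form silently produces a wrong value, which is why a static analyzer flags it.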
[jira] [Resolved] (SPARK-36358) Upgrade Kubernetes Client Version to 5.6.0
[ https://issues.apache.org/jira/browse/SPARK-36358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36358. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33593 [https://github.com/apache/spark/pull/33593] > Upgrade Kubernetes Client Version to 5.6.0 > -- > > Key: SPARK-36358 > URL: https://issues.apache.org/jira/browse/SPARK-36358 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.3.0 > > > This way [Retry HTTP operation in case IOException too (exponential > backoff)|https://github.com/fabric8io/kubernetes-client/pull/3293] will be > included -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
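The retry behavior being pulled in can be sketched as follows. This is an illustrative Python sketch of exponential backoff on I/O errors, with hypothetical names; it is not the kubernetes-client implementation:

```python
import time

def with_retries(op, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    """Retry op() on OSError with exponential backoff: the shape of the
    behavior the linked kubernetes-client PR adds for HTTP operations."""
    for attempt in range(max_attempts):
        try:
            return op()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
```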
[jira] [Assigned] (SPARK-36358) Upgrade Kubernetes Client Version to 5.6.0
[ https://issues.apache.org/jira/browse/SPARK-36358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36358: Assignee: Attila Zsolt Piros (was: Apache Spark) > Upgrade Kubernetes Client Version to 5.6.0 > -- > > Key: SPARK-36358 > URL: https://issues.apache.org/jira/browse/SPARK-36358 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > > This way [Retry HTTP operation in case IOException too (exponential > backoff)|https://github.com/fabric8io/kubernetes-client/pull/3293] will be > included -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36358) Upgrade Kubernetes Client Version to 5.6.0
[ https://issues.apache.org/jira/browse/SPARK-36358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390627#comment-17390627 ] Apache Spark commented on SPARK-36358: -- User 'attilapiros' has created a pull request for this issue: https://github.com/apache/spark/pull/33593 > Upgrade Kubernetes Client Version to 5.6.0 > -- > > Key: SPARK-36358 > URL: https://issues.apache.org/jira/browse/SPARK-36358 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > > This way [Retry HTTP operation in case IOException too (exponential > backoff)|https://github.com/fabric8io/kubernetes-client/pull/3293] will be > included -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36358) Upgrade Kubernetes Client Version to 5.6.0
[ https://issues.apache.org/jira/browse/SPARK-36358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390626#comment-17390626 ] Apache Spark commented on SPARK-36358: -- User 'attilapiros' has created a pull request for this issue: https://github.com/apache/spark/pull/33593 > Upgrade Kubernetes Client Version to 5.6.0 > -- > > Key: SPARK-36358 > URL: https://issues.apache.org/jira/browse/SPARK-36358 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > > This way [Retry HTTP operation in case IOException too (exponential > backoff)|https://github.com/fabric8io/kubernetes-client/pull/3293] will be > included -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36358) Upgrade Kubernetes Client Version to 5.6.0
[ https://issues.apache.org/jira/browse/SPARK-36358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36358: Assignee: Apache Spark (was: Attila Zsolt Piros) > Upgrade Kubernetes Client Version to 5.6.0 > -- > > Key: SPARK-36358 > URL: https://issues.apache.org/jira/browse/SPARK-36358 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Attila Zsolt Piros >Assignee: Apache Spark >Priority: Major > > This way [Retry HTTP operation in case IOException too (exponential > backoff)|https://github.com/fabric8io/kubernetes-client/pull/3293] will be > included -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36360) StreamingSource duplicates appName
[ https://issues.apache.org/jira/browse/SPARK-36360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36360: Assignee: Apache Spark > StreamingSource duplicates appName > -- > > Key: SPARK-36360 > URL: https://issues.apache.org/jira/browse/SPARK-36360 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Marcel Neumann >Assignee: Apache Spark >Priority: Minor > > The StreamingSource includes the appName in its sourceName. This is not > desired for people using a custom namespace for metrics reporting using > {{spark.metrics.namespace}} configuration property as the {{spark.app.name}} > will still be included in the name of the metric. Using a metrics namespace > results in a duplicated indicator for {{spark.app.name}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
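The duplication can be pictured with a small sketch. The metric-name format and names below are illustrative, assuming metrics render as `<namespace>.<sourceName>.<metric>`:

```python
def metric_name(namespace: str, source_name: str, metric: str) -> str:
    """Hypothetical rendering of a metric name as
    <namespace>.<sourceName>.<metric>."""
    return f"{namespace}.{source_name}.{metric}"

app_name = "my-streaming-app"                 # spark.app.name (illustrative)
source_name = f"{app_name}.StreamingMetrics"  # StreamingSource embeds appName

# With spark.metrics.namespace also set to the app name, the app name
# appears twice in every reported metric:
name = metric_name(app_name, source_name, "receivers")
```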
[jira] [Assigned] (SPARK-36360) StreamingSource duplicates appName
[ https://issues.apache.org/jira/browse/SPARK-36360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36360: Assignee: (was: Apache Spark) > StreamingSource duplicates appName > -- > > Key: SPARK-36360 > URL: https://issues.apache.org/jira/browse/SPARK-36360 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Marcel Neumann >Priority: Minor > > The StreamingSource includes the appName in its sourceName. This is not > desired for people using a custom namespace for metrics reporting using > {{spark.metrics.namespace}} configuration property as the {{spark.app.name}} > will still be included in the name of the metric. Using a metrics namespace > results in a duplicated indicator for {{spark.app.name}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36360) StreamingSource duplicates appName
[ https://issues.apache.org/jira/browse/SPARK-36360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390585#comment-17390585 ] Apache Spark commented on SPARK-36360: -- User 'mrclneumann' has created a pull request for this issue: https://github.com/apache/spark/pull/33592 > StreamingSource duplicates appName > -- > > Key: SPARK-36360 > URL: https://issues.apache.org/jira/browse/SPARK-36360 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Marcel Neumann >Priority: Minor > > The StreamingSource includes the appName in its sourceName. This is not > desired for people using a custom namespace for metrics reporting using > {{spark.metrics.namespace}} configuration property as the {{spark.app.name}} > will still be included in the name of the metric. Using a metrics namespace > results in a duplicated indicator for {{spark.app.name}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36338) Move distributed-sequence implementation to Scala side
[ https://issues.apache.org/jira/browse/SPARK-36338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36338. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33570 [https://github.com/apache/spark/pull/33570] > Move distributed-sequence implementation to Scala side > -- > > Key: SPARK-36338 > URL: https://issues.apache.org/jira/browse/SPARK-36338 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.2.0 > > > https://github.com/apache/spark/blob/c22f7a4834e6fb7b69c4cc4af87c61c2fbbe0786/python/pyspark/pandas/internal.py#L925-L945 > This can be implemented on the JVM side to make it more performant without extra > serialization, and to work around the nullability. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
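The linked Python implementation builds the distributed-sequence index by offsetting each partition's rows by the row counts of the partitions before it. A stdlib sketch of that idea (illustrative only, not the Scala implementation being added):

```python
from itertools import accumulate

def distributed_sequence(partitions):
    """Assign globally consecutive ids across partitions by shifting each
    partition by the total row count of the partitions before it -- the
    idea behind the distributed-sequence default index."""
    counts = [len(p) for p in partitions]
    # Exclusive prefix sums: offset of partition i is sum(counts[:i]).
    offsets = [0] + list(accumulate(counts))[:-1]
    return [
        [(off + i, row) for i, row in enumerate(part)]
        for off, part in zip(offsets, partitions)
    ]
```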
[jira] [Assigned] (SPARK-36338) Move distributed-sequence implementation to Scala side
[ https://issues.apache.org/jira/browse/SPARK-36338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36338: Assignee: Hyukjin Kwon > Move distributed-sequence implementation to Scala side > -- > > Key: SPARK-36338 > URL: https://issues.apache.org/jira/browse/SPARK-36338 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > https://github.com/apache/spark/blob/c22f7a4834e6fb7b69c4cc4af87c61c2fbbe0786/python/pyspark/pandas/internal.py#L925-L945 > This can be implemented in JVM side to make it more performance without extra > serializations, and working around the nullability. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36092) Migrate to GitHub Actions Codecov from Jenkins
[ https://issues.apache.org/jira/browse/SPARK-36092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36092: Assignee: Apache Spark > Migrate to GitHub Actions Codecov from Jenkins > -- > > Key: SPARK-36092 > URL: https://issues.apache.org/jira/browse/SPARK-36092 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Major > > We currently use the manual Codecov site to work around our Jenkins CI security > issue. Now that we use GitHub Actions, we can leverage Codecov to report the > coverage for PySpark. > See also https://github.com/codecov/codecov-action -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36092) Migrate to GitHub Actions Codecov from Jenkins
[ https://issues.apache.org/jira/browse/SPARK-36092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390580#comment-17390580 ] Apache Spark commented on SPARK-36092: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/33591 > Migrate to GitHub Actions Codecov from Jenkins > -- > > Key: SPARK-36092 > URL: https://issues.apache.org/jira/browse/SPARK-36092 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > We currently use the manual Codecov site to work around our Jenkins CI security > issue. Now that we use GitHub Actions, we can leverage Codecov to report the > coverage for PySpark. > See also https://github.com/codecov/codecov-action -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36092) Migrate to GitHub Actions Codecov from Jenkins
[ https://issues.apache.org/jira/browse/SPARK-36092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36092: Assignee: (was: Apache Spark) > Migrate to GitHub Actions Codecov from Jenkins > -- > > Key: SPARK-36092 > URL: https://issues.apache.org/jira/browse/SPARK-36092 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 3.2.0 >Reporter: Hyukjin Kwon >Priority: Major > > We currently use the manual Codecov site to work around our Jenkins CI security > issue. Now that we use GitHub Actions, we can leverage Codecov to report the > coverage for PySpark. > See also https://github.com/codecov/codecov-action -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36361) Install coverage in Python 3.9 and PyPy 3 in GitHub Actions image
Hyukjin Kwon created SPARK-36361: Summary: Install coverage in Python 3.9 and PyPy 3 in GitHub Actions image Key: SPARK-36361 URL: https://issues.apache.org/jira/browse/SPARK-36361 Project: Spark Issue Type: Improvement Components: Build, Project Infra Affects Versions: 3.3.0 Reporter: Hyukjin Kwon SPARK-36092 requires the coverage package to be installed in both Python 3.9 and PyPy. Currently it is installed manually. To save installation time, it would be great to have it pre-installed in the image we use. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36360) StreamingSource duplicates appName
[ https://issues.apache.org/jira/browse/SPARK-36360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcel Neumann updated SPARK-36360: --- Description: The StreamingSource includes the appName in its sourceName. This is not desired for people using a custom namespace for metrics reporting using {{spark.metrics.namespace}} configuration property as the {{spark.app.name}} will still be included in the name of the metric. Using a metrics namespace results in a duplicated indicator for {{spark.app.name}}. (was: The StreamingSource includes the appName in its sourceName. This is not desired for people using a custom namespace for metrics reporting using {{spark.metrics.namespace}} configuration property as the {{spark.app.name}} will still be included in the name of the metric. Not using a metrics namespace results in a duplicated indicator for {{spark.app.name}}.) > StreamingSource duplicates appName > -- > > Key: SPARK-36360 > URL: https://issues.apache.org/jira/browse/SPARK-36360 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Marcel Neumann >Priority: Minor > > The StreamingSource includes the appName in its sourceName. This is not > desired for people using a custom namespace for metrics reporting using > {{spark.metrics.namespace}} configuration property as the {{spark.app.name}} > will still be included in the name of the metric. Using a metrics namespace > results in a duplicated indicator for {{spark.app.name}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36360) StreamingSource duplicates appName
Marcel Neumann created SPARK-36360: -- Summary: StreamingSource duplicates appName Key: SPARK-36360 URL: https://issues.apache.org/jira/browse/SPARK-36360 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.1.2 Reporter: Marcel Neumann The StreamingSource includes the appName in its sourceName. This is not desired for people using a custom namespace for metrics reporting using {{spark.metrics.namespace}} configuration property as the {{spark.app.name}} will still be included in the name of the metric. Not using a metrics namespace results in a duplicated indicator for {{spark.app.name}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36359) Coalesce returns the first expression if it is non nullable
[ https://issues.apache.org/jira/browse/SPARK-36359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36359: Assignee: Apache Spark > Coalesce returns the first expression if it is non nullable > --- > > Key: SPARK-36359 > URL: https://issues.apache.org/jira/browse/SPARK-36359 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36254) Install mlflow/sklearn in Github Actions CI
[ https://issues.apache.org/jira/browse/SPARK-36254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390538#comment-17390538 ] Apache Spark commented on SPARK-36254: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/33589 > Install mlflow/sklearn in Github Actions CI > --- > > Key: SPARK-36254 > URL: https://issues.apache.org/jira/browse/SPARK-36254 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.2.0 > > > Since pandas-on-Spark includes the mlflow features and related tests, we > should install mlflow and its dependencies in our GitHub Actions CI so that > the tests won't be skipped from Spark 3.2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36359) Coalesce returns the first expression if it is non nullable
[ https://issues.apache.org/jira/browse/SPARK-36359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390537#comment-17390537 ] Apache Spark commented on SPARK-36359: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/33590 > Coalesce returns the first expression if it is non nullable > --- > > Key: SPARK-36359 > URL: https://issues.apache.org/jira/browse/SPARK-36359 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36359) Coalesce returns the first expression if it is non nullable
[ https://issues.apache.org/jira/browse/SPARK-36359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36359: Assignee: (was: Apache Spark) > Coalesce returns the first expression if it is non nullable > --- > > Key: SPARK-36359 > URL: https://issues.apache.org/jira/browse/SPARK-36359 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major >
[jira] [Created] (SPARK-36359) Coalesce returns the first expression if it is non nullable
Yuming Wang created SPARK-36359: --- Summary: Coalesce returns the first expression if it is non nullable Key: SPARK-36359 URL: https://issues.apache.org/jira/browse/SPARK-36359 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Yuming Wang
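The improvement proposed in SPARK-36359 can be illustrated with a small sketch (this is not Spark's actual Catalyst code; the names are hypothetical): when the first child of `coalesce` can never be NULL, the whole expression reduces to that child and the remaining arguments are dead.

```python
from dataclasses import dataclass

@dataclass
class Expr:
    """A toy expression: just a name and a nullability flag."""
    name: str
    nullable: bool

def simplify_coalesce(children):
    # coalesce(a, b, ...) returns the first non-NULL argument, so if
    # `a` is non-nullable the later arguments can never be reached.
    if children and not children[0].nullable:
        return children[0]
    return ("coalesce", children)

a = Expr("a", nullable=False)
b = Expr("b", nullable=True)
assert simplify_coalesce([a, b]) is a            # collapses to `a`
assert simplify_coalesce([b, a]) != b            # nullable head: kept as coalesce
```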
[jira] [Updated] (SPARK-36358) Upgrade Kubernetes Client Version to 5.6.0
[ https://issues.apache.org/jira/browse/SPARK-36358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Attila Zsolt Piros updated SPARK-36358: --- Description: This way [Retry HTTP operation in case IOException too (exponential backoff)|https://github.com/fabric8io/kubernetes-client/pull/3293] will be included (was: This way [https://github.com/fabric8io/kubernetes-client/pull/3293|Retry HTTP operation in case IOException too (exponential backoff)] will be included) > Upgrade Kubernetes Client Version to 5.6.0 > -- > > Key: SPARK-36358 > URL: https://issues.apache.org/jira/browse/SPARK-36358 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.3.0 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > > This way [Retry HTTP operation in case IOException too (exponential > backoff)|https://github.com/fabric8io/kubernetes-client/pull/3293] will be > included
[jira] [Created] (SPARK-36358) Upgrade Kubernetes Client Version to 5.6.0
Attila Zsolt Piros created SPARK-36358: -- Summary: Upgrade Kubernetes Client Version to 5.6.0 Key: SPARK-36358 URL: https://issues.apache.org/jira/browse/SPARK-36358 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.3.0 Reporter: Attila Zsolt Piros Assignee: Attila Zsolt Piros This way [https://github.com/fabric8io/kubernetes-client/pull/3293|Retry HTTP operation in case IOException too (exponential backoff)] will be included
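The kubernetes-client change pulled in by this upgrade retries HTTP operations on IOException with exponential backoff. A minimal sketch of that pattern (hypothetical names; not the fabric8 implementation):

```python
import time

def retry_with_backoff(op, max_retries=3, base_delay=0.01):
    """Retry `op` on IOError, doubling the sleep after each failure
    (exponential backoff); re-raise once retries are exhausted."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return op()
        except IOError:
            if attempt == max_retries:
                raise
            time.sleep(delay)
            delay *= 2

# A flaky operation that fails twice before succeeding.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("connection reset")
    return "ok"

assert retry_with_backoff(flaky) == "ok"
assert calls["n"] == 3
```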
[jira] [Commented] (SPARK-28330) ANSI SQL: Top-level in
[ https://issues.apache.org/jira/browse/SPARK-28330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390508#comment-17390508 ] Alexander Bij commented on SPARK-28330: --- I'm looking forward to this feature! I noticed it is absent when using the DBeaver sql-client (simba-spark driver) to look at table data: it downloads full datasets when viewing tables. By comparison, in Hive SQL offset is implemented and working in DBeaver, which scrolls through pages when browsing tables. All the PRs are closed (not merged) and mention that the work was suspended (as of 27-april-2021). _At least you can upvote the feature to raise its importance_ > ANSI SQL: Top-level in > > > Key: SPARK-28330 > URL: https://issues.apache.org/jira/browse/SPARK-28330 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > h2. {{LIMIT}} and {{OFFSET}} > LIMIT and OFFSET allow you to retrieve just a portion of the rows that are > generated by the rest of the query: > {noformat} > SELECT select_list > FROM table_expression > [ ORDER BY ... ] > [ LIMIT { number | ALL } ] [ OFFSET number ] > {noformat} > If a limit count is given, no more than that many rows will be returned (but > possibly fewer, if the query itself yields fewer rows). LIMIT ALL is the same > as omitting the LIMIT clause, as is LIMIT with a NULL argument. > OFFSET says to skip that many rows before beginning to return rows. OFFSET 0 > is the same as omitting the OFFSET clause, as is OFFSET with a NULL argument. > If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting > to count the LIMIT rows that are returned. > https://www.postgresql.org/docs/11/queries-limit.html > *Feature ID*: F861
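The LIMIT/OFFSET semantics quoted in the issue description can be modeled as list slicing; a sketch of the semantics only, not Spark's implementation:

```python
def limit_offset(rows, limit=None, offset=None):
    """OFFSET skips rows first; LIMIT then caps how many are returned.
    None stands in for an omitted clause (or a NULL argument, which the
    spec treats the same as omitting the clause)."""
    start = offset or 0           # OFFSET 0 / NULL == no offset
    out = rows[start:]
    if limit is not None:         # LIMIT ALL / NULL == no limit
        out = out[:limit]
    return out

rows = list(range(10))
assert limit_offset(rows, limit=3, offset=4) == [4, 5, 6]   # skip 4, take 3
assert limit_offset(rows, limit=3) == [0, 1, 2]             # no offset
assert limit_offset(rows) == rows                           # both omitted
```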
[jira] [Assigned] (SPARK-36346) Support TimestampNTZ type in Orc file source
[ https://issues.apache.org/jira/browse/SPARK-36346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36346: Assignee: (was: Apache Spark) > Support TimestampNTZ type in Orc file source > > > Key: SPARK-36346 > URL: https://issues.apache.org/jira/browse/SPARK-36346 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > As per https://orc.apache.org/docs/types.html, Orc supports both > TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type): > * A TIMESTAMP => TIMESTAMP_LTZ > * Timestamp with local time zone => TIMESTAMP_NTZ > In Spark 3.1 or prior, Spark only considered TIMESTAMP. > Since 3.2, with the support of timestamp without time zone type: > * Orc writer follows the definition and uses "Timestamp with local time zone" > on writing TIMESTAMP_NTZ. > * Orc reader converts the "Timestamp with local time zone" to TIMESTAMP_NTZ.
[jira] [Assigned] (SPARK-36346) Support TimestampNTZ type in Orc file source
[ https://issues.apache.org/jira/browse/SPARK-36346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36346: Assignee: Apache Spark > Support TimestampNTZ type in Orc file source > > > Key: SPARK-36346 > URL: https://issues.apache.org/jira/browse/SPARK-36346 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > As per https://orc.apache.org/docs/types.html, Orc supports both > TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type): > * A TIMESTAMP => TIMESTAMP_LTZ > * Timestamp with local time zone => TIMESTAMP_NTZ > In Spark 3.1 or prior, Spark only considered TIMESTAMP. > Since 3.2, with the support of timestamp without time zone type: > * Orc writer follows the definition and uses "Timestamp with local time zone" > on writing TIMESTAMP_NTZ. > * Orc reader converts the "Timestamp with local time zone" to TIMESTAMP_NTZ.
[jira] [Commented] (SPARK-36346) Support TimestampNTZ type in Orc file source
[ https://issues.apache.org/jira/browse/SPARK-36346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390506#comment-17390506 ] Apache Spark commented on SPARK-36346: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/33588 > Support TimestampNTZ type in Orc file source > > > Key: SPARK-36346 > URL: https://issues.apache.org/jira/browse/SPARK-36346 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > As per https://orc.apache.org/docs/types.html, Orc supports both > TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type): > * A TIMESTAMP => TIMESTAMP_LTZ > * Timestamp with local time zone => TIMESTAMP_NTZ > In Spark 3.1 or prior, Spark only considered TIMESTAMP. > Since 3.2, with the support of timestamp without time zone type: > * Orc writer follows the definition and uses "Timestamp with local time zone" > on writing TIMESTAMP_NTZ. > * Orc reader converts the "Timestamp with local time zone" to TIMESTAMP_NTZ.
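The TIMESTAMP_NTZ vs. TIMESTAMP_LTZ distinction discussed in SPARK-36346 maps onto Python's naive vs. aware datetimes, which can serve as a sketch of the two semantics (an illustration only, unrelated to Spark's Orc reader/writer code):

```python
from datetime import datetime, timezone

# TIMESTAMP_NTZ: a wall-clock value with no time zone attached (naive).
# The same wall clock denotes different instants in different zones.
ntz = datetime(2021, 7, 30, 12, 0, 0)
assert ntz.tzinfo is None

# TIMESTAMP_LTZ: a concrete instant, displayed in the session's local
# zone (aware). The instant itself is unambiguous.
ltz = datetime(2021, 7, 30, 12, 0, 0, tzinfo=timezone.utc)
assert ltz.utcoffset() is not None
assert ltz.timestamp() == 1627646400.0   # seconds since the epoch
```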
[jira] [Commented] (SPARK-36357) Support pushdown Timestamp with local time zone for orc
[ https://issues.apache.org/jira/browse/SPARK-36357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390504#comment-17390504 ] jiaan.geng commented on SPARK-36357: I'm working on it. > Support pushdown Timestamp with local time zone for orc > --- > > Key: SPARK-36357 > URL: https://issues.apache.org/jira/browse/SPARK-36357 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major >
[jira] [Created] (SPARK-36357) Support pushdown Timestamp with local time zone for orc
jiaan.geng created SPARK-36357: -- Summary: Support pushdown Timestamp with local time zone for orc Key: SPARK-36357 URL: https://issues.apache.org/jira/browse/SPARK-36357 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0 Reporter: jiaan.geng
[jira] [Updated] (SPARK-36346) Support TimestampNTZ type in Orc file source
[ https://issues.apache.org/jira/browse/SPARK-36346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36346: --- Description: As per https://orc.apache.org/docs/types.html, Orc supports both TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type): * A TIMESTAMP => TIMESTAMP_LTZ * Timestamp with local time zone => TIMESTAMP_NTZ In Spark 3.1 or prior, Spark only considered TIMESTAMP. Since 3.2, with the support of timestamp without time zone type: * Orc writer follows the definition and uses "Timestamp with local time zone" on writing TIMESTAMP_NTZ. * Orc reader converts the "Timestamp with local time zone" to TIMESTAMP_NTZ. was: As per https://orc.apache.org/docs/types.html, Orc supports both TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type): A TIMESTAMP => TIMESTAMP_LTZ Timestamp with local time zone => TIMESTAMP_NTZ In Spark 3.1 or prior, Spark only considered TIMESTAMP. Since 3.2, with the support of timestamp without time zone type: * Orc writer follows the definition and uses "Timestamp with local time zone" on writing TIMESTAMP_NTZ. * Orc reader converts the "Timestamp with local time zone" to TIMESTAMP_NTZ. > Support TimestampNTZ type in Orc file source > > > Key: SPARK-36346 > URL: https://issues.apache.org/jira/browse/SPARK-36346 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > As per https://orc.apache.org/docs/types.html, Orc supports both > TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type): > * A TIMESTAMP => TIMESTAMP_LTZ > * Timestamp with local time zone => TIMESTAMP_NTZ > In Spark 3.1 or prior, Spark only considered TIMESTAMP. > Since 3.2, with the support of timestamp without time zone type: > * Orc writer follows the definition and uses "Timestamp with local time zone" > on writing TIMESTAMP_NTZ. 
> * Orc reader converts the "Timestamp with local time zone" to TIMESTAMP_NTZ.
[jira] [Updated] (SPARK-36346) Support TimestampNTZ type in Orc file source
[ https://issues.apache.org/jira/browse/SPARK-36346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36346: --- Description: As per https://orc.apache.org/docs/types.html, Orc supports both TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type): A TIMESTAMP => TIMESTAMP_LTZ Timestamp with local time zone => TIMESTAMP_NTZ In Spark 3.1 or prior, Spark only considered TIMESTAMP. Since 3.2, with the support of timestamp without time zone type: * Orc writer follows the definition and uses "Timestamp with local time zone" on writing TIMESTAMP_NTZ. * Orc reader converts the "Timestamp with local time zone" to TIMESTAMP_NTZ. > Support TimestampNTZ type in Orc file source > > > Key: SPARK-36346 > URL: https://issues.apache.org/jira/browse/SPARK-36346 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > As per https://orc.apache.org/docs/types.html, Orc supports both > TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current default timestamp type): > A TIMESTAMP => TIMESTAMP_LTZ > Timestamp with local time zone => TIMESTAMP_NTZ > In Spark 3.1 or prior, Spark only considered TIMESTAMP. > Since 3.2, with the support of timestamp without time zone type: > * Orc writer follows the definition and uses "Timestamp with local time zone" > on writing TIMESTAMP_NTZ. > * Orc reader converts the "Timestamp with local time zone" to TIMESTAMP_NTZ.
[jira] [Created] (SPARK-36356) RemoveRedundantAlias should keep output schema
angerszhu created SPARK-36356: - Summary: RemoveRedundantAlias should keep output schema Key: SPARK-36356 URL: https://issues.apache.org/jira/browse/SPARK-36356 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu
[jira] [Updated] (SPARK-36353) RemoveNoopOperators should keep output schema
[ https://issues.apache.org/jira/browse/SPARK-36353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-36353: -- Description: !image-2021-07-30-17-46-59-196.png|width=539,height=220! [https://github.com/apache/spark/pull/33587] Only first level? > RemoveNoopOperators should keep output schema > - > > Key: SPARK-36353 > URL: https://issues.apache.org/jira/browse/SPARK-36353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > Attachments: image-2021-07-30-17-46-59-196.png > > > !image-2021-07-30-17-46-59-196.png|width=539,height=220! > [https://github.com/apache/spark/pull/33587] > > Only first level?
[jira] [Commented] (SPARK-36353) RemoveNoopOperators should keep output schema
[ https://issues.apache.org/jira/browse/SPARK-36353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390458#comment-17390458 ] angerszhu commented on SPARK-36353: --- Will raise a PR soon > RemoveNoopOperators should keep output schema > - > > Key: SPARK-36353 > URL: https://issues.apache.org/jira/browse/SPARK-36353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > Attachments: image-2021-07-30-17-46-59-196.png > > > !image-2021-07-30-17-46-59-196.png|width=539,height=220! > [https://github.com/apache/spark/pull/33587] > > Only first level?
[jira] [Updated] (SPARK-36353) RemoveNoopOperators should keep output schema
[ https://issues.apache.org/jira/browse/SPARK-36353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-36353: -- Attachment: image-2021-07-30-17-46-59-196.png > RemoveNoopOperators should keep output schema > - > > Key: SPARK-36353 > URL: https://issues.apache.org/jira/browse/SPARK-36353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > Attachments: image-2021-07-30-17-46-59-196.png > >
[jira] [Commented] (SPARK-36355) NamedExpression add method `withName(newName: String)`
[ https://issues.apache.org/jira/browse/SPARK-36355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390456#comment-17390456 ] Apache Spark commented on SPARK-36355: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33587 > NamedExpression add method `withName(newName: String)` > -- > > Key: SPARK-36355 > URL: https://issues.apache.org/jira/browse/SPARK-36355 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major >
[jira] [Assigned] (SPARK-36355) NamedExpression add method `withName(newName: String)`
[ https://issues.apache.org/jira/browse/SPARK-36355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36355: Assignee: Apache Spark > NamedExpression add method `withName(newName: String)` > -- > > Key: SPARK-36355 > URL: https://issues.apache.org/jira/browse/SPARK-36355 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-36355) NamedExpression add method `withName(newName: String)`
[ https://issues.apache.org/jira/browse/SPARK-36355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36355: Assignee: (was: Apache Spark) > NamedExpression add method `withName(newName: String)` > -- > > Key: SPARK-36355 > URL: https://issues.apache.org/jira/browse/SPARK-36355 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major >
[jira] [Created] (SPARK-36355) NamedExpression add method `withName(newName: String)`
angerszhu created SPARK-36355: - Summary: NamedExpression add method `withName(newName: String)` Key: SPARK-36355 URL: https://issues.apache.org/jira/browse/SPARK-36355 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu
[jira] [Commented] (SPARK-36065) date_trunc returns incorrect output
[ https://issues.apache.org/jira/browse/SPARK-36065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390421#comment-17390421 ] Peter Toth commented on SPARK-36065: I think the output is correct as there was a time zone change (+00:02:16) at 1891-10-01 00:00:00 in Bratislava and that means that 1891-10-01 00:00:00 = 1891-10-01 00:02:16. I found this site that shows the TZ changes: https://www.timeanddate.com/time/zone/slovakia/bratislava > date_trunc returns incorrect output > --- > > Key: SPARK-36065 > URL: https://issues.apache.org/jira/browse/SPARK-36065 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Sumeet >Priority: Major > Labels: date_trunc, sql, timestamp > > Hi, > Running date_trunc on any hour of "1891-10-01" returns incorrect output for > "Europe/Bratislava" timezone. > Use the following steps in order to reproduce the issue: > * Run spark-shell using: > {code:java} > TZ="Europe/Bratislava" ./bin/spark-shell --conf > spark.driver.extraJavaOptions='-Duser.timezone=Europe/Bratislava' --conf > spark.executor.extraJavaOptions='-Duser.timezone=Europe/Bratislava' --conf > spark.sql.session.timeZone="Europe/Bratislava"{code} > * Generate test data: > {code:java} > ((0 until 9).map(hour => s"1891-10-01 00:0$hour:00") ++ (10 until > 24).map(hour => s"1891-10-01 > 00:$hour:00")).toDF("ts_string").createOrReplaceTempView("temp_ts") > {code} > * Run query: > {code:java} > sql("select ts_string, cast(ts_string as TIMESTAMP) as ts, date_trunc('day', > ts_string) from temp_ts").show(false) > {code} > * Output: > {code:java} > +---+---+--+ > |ts_string |ts |date_trunc(day, ts_string)| > +---+---+--+ > |1891-10-01 00:00:00|1891-10-01 00:02:16|1891-10-01 00:02:16 | > |1891-10-01 00:01:00|1891-10-01 00:03:16|1891-10-01 00:02:16 | > |1891-10-01 00:02:00|1891-10-01 00:04:16|1891-10-01 00:02:16 | > |1891-10-01 00:03:00|1891-10-01 00:03:00|1891-10-01 00:02:16 | > |1891-10-01 00:04:00|1891-10-01 
00:04:00|1891-10-01 00:02:16 | > |1891-10-01 00:05:00|1891-10-01 00:05:00|1891-10-01 00:02:16 | > |1891-10-01 00:06:00|1891-10-01 00:06:00|1891-10-01 00:02:16 | > |1891-10-01 00:07:00|1891-10-01 00:07:00|1891-10-01 00:02:16 | > |1891-10-01 00:08:00|1891-10-01 00:08:00|1891-10-01 00:02:16 | > |1891-10-01 00:10:00|1891-10-01 00:10:00|1891-10-01 00:02:16 | > |1891-10-01 00:11:00|1891-10-01 00:11:00|1891-10-01 00:02:16 | > |1891-10-01 00:12:00|1891-10-01 00:12:00|1891-10-01 00:02:16 | > |1891-10-01 00:13:00|1891-10-01 00:13:00|1891-10-01 00:02:16 | > |1891-10-01 00:14:00|1891-10-01 00:14:00|1891-10-01 00:02:16 | > |1891-10-01 00:15:00|1891-10-01 00:15:00|1891-10-01 00:02:16 | > |1891-10-01 00:16:00|1891-10-01 00:16:00|1891-10-01 00:02:16 | > |1891-10-01 00:17:00|1891-10-01 00:17:00|1891-10-01 00:02:16 | > |1891-10-01 00:18:00|1891-10-01 00:18:00|1891-10-01 00:02:16 | > |1891-10-01 00:19:00|1891-10-01 00:19:00|1891-10-01 00:02:16 | > |1891-10-01 00:20:00|1891-10-01 00:20:00|1891-10-01 00:02:16 | > +---+---+--+ > only showing top 20 rows > {code}
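Peter Toth's explanation can be checked outside Spark with Python's zoneinfo, assuming the system tz database is present: Europe/Bratislava is a link to Europe/Prague in tzdata, which switched from local mean time (UTC+00:57:44) to CET (UTC+01:00) around 1891-10-01, a jump of exactly 2 minutes 16 seconds.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("Europe/Bratislava")
off_before = datetime(1891, 9, 30, 12, 0, tzinfo=tz).utcoffset()
off_after = datetime(1891, 10, 2, 12, 0, tzinfo=tz).utcoffset()

# The UTC offset grows by 2m16s at the transition, which is why
# local midnight 1891-10-01 resolves to 00:02:16 in the Jira output.
assert off_after - off_before == timedelta(minutes=2, seconds=16)
```

So date_trunc is arguably correct here: the truncated instant simply has no representable 00:00:00 wall clock on that date.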
[jira] [Commented] (SPARK-36354) EventLogFileReaders should not complain in case of no event log files
[ https://issues.apache.org/jira/browse/SPARK-36354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390390#comment-17390390 ] Apache Spark commented on SPARK-36354: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33586 > EventLogFileReaders should not complain in case of no event log files > - > > Key: SPARK-36354 > URL: https://issues.apache.org/jira/browse/SPARK-36354 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > 21/07/30 07:38:26 WARN FsHistoryProvider: Error while reading new log > s3a://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771 > java.lang.IllegalArgumentException: requirement failed: Log directory must > contain at least one event log file! > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.history.RollingEventLogFilesFileReader.files$lzycompute(EventLogFileReaders.scala:216) > {code} > {code} > $ aws s3 ls s3://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771/ > 2021-06-26 22:31:40 0 > appstatus_spark-95b5c736c8e44037afcf152534d08771.inprogress > {code}
[jira] [Assigned] (SPARK-36354) EventLogFileReaders should not complain in case of no event log files
[ https://issues.apache.org/jira/browse/SPARK-36354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36354: Assignee: Apache Spark > EventLogFileReaders should not complain in case of no event log files > - > > Key: SPARK-36354 > URL: https://issues.apache.org/jira/browse/SPARK-36354 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > {code} > 21/07/30 07:38:26 WARN FsHistoryProvider: Error while reading new log > s3a://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771 > java.lang.IllegalArgumentException: requirement failed: Log directory must > contain at least one event log file! > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.history.RollingEventLogFilesFileReader.files$lzycompute(EventLogFileReaders.scala:216) > {code} > {code} > $ aws s3 ls s3://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771/ > 2021-06-26 22:31:40 0 > appstatus_spark-95b5c736c8e44037afcf152534d08771.inprogress > {code}
[jira] [Assigned] (SPARK-36354) EventLogFileReaders should not complain in case of no event log files
[ https://issues.apache.org/jira/browse/SPARK-36354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36354: Assignee: (was: Apache Spark) > EventLogFileReaders should not complain in case of no event log files > - > > Key: SPARK-36354 > URL: https://issues.apache.org/jira/browse/SPARK-36354 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > 21/07/30 07:38:26 WARN FsHistoryProvider: Error while reading new log > s3a://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771 > java.lang.IllegalArgumentException: requirement failed: Log directory must > contain at least one event log file! > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.history.RollingEventLogFilesFileReader.files$lzycompute(EventLogFileReaders.scala:216) > {code} > {code} > $ aws s3 ls s3://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771/ > 2021-06-26 22:31:40 0 > appstatus_spark-95b5c736c8e44037afcf152534d08771.inprogress > {code}
[jira] [Commented] (SPARK-36354) EventLogFileReaders should not complain in case of no event log files
[ https://issues.apache.org/jira/browse/SPARK-36354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390392#comment-17390392 ] Apache Spark commented on SPARK-36354: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/33586 > EventLogFileReaders should not complain in case of no event log files > - > > Key: SPARK-36354 > URL: https://issues.apache.org/jira/browse/SPARK-36354 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2, 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > > {code} > 21/07/30 07:38:26 WARN FsHistoryProvider: Error while reading new log > s3a://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771 > java.lang.IllegalArgumentException: requirement failed: Log directory must > contain at least one event log file! > at scala.Predef$.require(Predef.scala:281) > at > org.apache.spark.deploy.history.RollingEventLogFilesFileReader.files$lzycompute(EventLogFileReaders.scala:216) > {code} > {code} > $ aws s3 ls s3://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771/ > 2021-06-26 22:31:40 0 > appstatus_spark-95b5c736c8e44037afcf152534d08771.inprogress > {code}
[jira] [Created] (SPARK-36354) EventLogFileReaders should not complain in case of no event log files
Dongjoon Hyun created SPARK-36354: - Summary: EventLogFileReaders should not complain in case of no event log files Key: SPARK-36354 URL: https://issues.apache.org/jira/browse/SPARK-36354 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.2, 3.2.0 Reporter: Dongjoon Hyun {code} 21/07/30 07:38:26 WARN FsHistoryProvider: Error while reading new log s3a://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771 java.lang.IllegalArgumentException: requirement failed: Log directory must contain at least one event log file! at scala.Predef$.require(Predef.scala:281) at org.apache.spark.deploy.history.RollingEventLogFilesFileReader.files$lzycompute(EventLogFileReaders.scala:216) {code} {code} $ aws s3 ls s3://.../eventlog_v2_spark-95b5c736c8e44037afcf152534d08771/ 2021-06-26 22:31:40 0 appstatus_spark-95b5c736c8e44037afcf152534d08771.inprogress {code}
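The failure mode above is a rolling event log directory that contains only the appstatus marker but no event log files yet, tripping a hard `require`. The fix can be sketched as replacing that assertion with a guard that yields an empty listing (hypothetical names; not Spark's actual `RollingEventLogFilesFileReader`):

```python
def list_event_log_files(entries):
    """Return the sorted event log files from a rolling log directory,
    or an empty list (instead of raising) when only the appstatus
    marker exists, e.g. right after the application starts."""
    logs = [e for e in entries if e.startswith("events_")]
    if not logs:
        # Previously this was effectively:
        #   require(logs, "Log directory must contain at least one
        #   event log file!")  -> IllegalArgumentException + WARN spam
        return []
    return sorted(logs)

entries = ["appstatus_spark-95b5c736c8e44037afcf152534d08771.inprogress"]
assert list_event_log_files(entries) == []                 # no complaint
assert list_event_log_files(entries + ["events_1_spark-x"]) == ["events_1_spark-x"]
```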
[jira] [Commented] (SPARK-35976) Adjust `astype` method for ExtensionDtype in pandas API on Spark
[ https://issues.apache.org/jira/browse/SPARK-35976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390374#comment-17390374 ] Apache Spark commented on SPARK-35976: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/33585 > Adjust `astype` method for ExtensionDtype in pandas API on Spark > > > Key: SPARK-35976 > URL: https://issues.apache.org/jira/browse/SPARK-35976 > Project: Spark > Issue Type: Story > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Currently, `astype` method for ExtensionDtype in pandas API on Spark is not > consistent with pandas. For example, > [https://github.com/apache/spark/pull/33095#discussion_r661704734.] > [https://github.com/apache/spark/pull/33095#discussion_r662623005.] > > We ought to fill in the gap.
[jira] [Commented] (SPARK-35976) Adjust `astype` method for ExtensionDtype in pandas API on Spark
[ https://issues.apache.org/jira/browse/SPARK-35976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390372#comment-17390372 ] Apache Spark commented on SPARK-35976: -- User 'Yikun' has created a pull request for this issue: https://github.com/apache/spark/pull/33585 > Adjust `astype` method for ExtensionDtype in pandas API on Spark > > > Key: SPARK-35976 > URL: https://issues.apache.org/jira/browse/SPARK-35976 > Project: Spark > Issue Type: Story > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Currently, `astype` method for ExtensionDtype in pandas API on Spark is not > consistent with pandas. For example, > [https://github.com/apache/spark/pull/33095#discussion_r661704734.] > [https://github.com/apache/spark/pull/33095#discussion_r662623005.] > > We ought to fill in the gap.
[jira] [Assigned] (SPARK-35976) Adjust `astype` method for ExtensionDtype in pandas API on Spark
[ https://issues.apache.org/jira/browse/SPARK-35976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35976: Assignee: Apache Spark > Adjust `astype` method for ExtensionDtype in pandas API on Spark > > > Key: SPARK-35976 > URL: https://issues.apache.org/jira/browse/SPARK-35976 > Project: Spark > Issue Type: Story > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Currently, `astype` method for ExtensionDtype in pandas API on Spark is not > consistent with pandas. For example, > [https://github.com/apache/spark/pull/33095#discussion_r661704734.] > [https://github.com/apache/spark/pull/33095#discussion_r662623005.] > > We ought to fill in the gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35976) Adjust `astype` method for ExtensionDtype in pandas API on Spark
[ https://issues.apache.org/jira/browse/SPARK-35976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35976: Assignee: (was: Apache Spark) > Adjust `astype` method for ExtensionDtype in pandas API on Spark > > > Key: SPARK-35976 > URL: https://issues.apache.org/jira/browse/SPARK-35976 > Project: Spark > Issue Type: Story > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > > Currently, `astype` method for ExtensionDtype in pandas API on Spark is not > consistent with pandas. For example, > [https://github.com/apache/spark/pull/33095#discussion_r661704734.] > [https://github.com/apache/spark/pull/33095#discussion_r662623005.] > > We ought to fill in the gap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
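[Editor's note] The SPARK-35976 entries above concern `astype` on ExtensionDtype columns in the pandas API on Spark diverging from pandas itself. As a hedged, pandas-only sketch of the baseline behaviour the issue wants to match (the pandas-on-Spark side is not shown here; assumes pandas >= 1.2 for the nullable `Float64` dtype):

```python
import pandas as pd

# pandas' nullable integer ExtensionDtype round-trips through astype while
# preserving missing values as pd.NA; this is the reference behaviour that
# ps.Series.astype is being aligned with in SPARK-35976.
s = pd.Series([1, 2, None], dtype="Int64")

as_float = s.astype("Float64")   # nullable float ExtensionDtype
back = as_float.astype("Int64")  # missing value survives the round trip

print(as_float.dtype)        # Float64
print(int(back.isna().sum()))  # 1
```

The linked PR review threads discuss cases where the Spark-backed implementation produced different dtypes or lost nullability in exactly these conversions.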
[jira] [Commented] (SPARK-36351) Separate partition filters and data filters in PushDownUtils
[ https://issues.apache.org/jira/browse/SPARK-36351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390364#comment-17390364 ] Apache Spark commented on SPARK-36351: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/33584 > Separate partition filters and data filters in PushDownUtils > > > Key: SPARK-36351 > URL: https://issues.apache.org/jira/browse/SPARK-36351 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Minor > > Currently, DSv2 partition filters and data filters are separated in > PruneFileSourcePartitions. It's better to separate these in PushDownUtils, > where we do filter/aggregate push down and column pruning, so we can still > push down aggregate for FileScan if the filters are only partition filters.
[jira] [Assigned] (SPARK-36351) Separate partition filters and data filters in PushDownUtils
[ https://issues.apache.org/jira/browse/SPARK-36351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36351: Assignee: (was: Apache Spark) > Separate partition filters and data filters in PushDownUtils > > > Key: SPARK-36351 > URL: https://issues.apache.org/jira/browse/SPARK-36351 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Priority: Minor > > Currently, DSv2 partition filters and data filters are separated in > PruneFileSourcePartitions. It's better to separate these in PushDownUtils, > where we do filter/aggregate push down and column pruning, so we can still > push down aggregate for FileScan if the filters are only partition filters.
[jira] [Assigned] (SPARK-36351) Separate partition filters and data filters in PushDownUtils
[ https://issues.apache.org/jira/browse/SPARK-36351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36351: Assignee: Apache Spark > Separate partition filters and data filters in PushDownUtils > > > Key: SPARK-36351 > URL: https://issues.apache.org/jira/browse/SPARK-36351 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Minor > > Currently, DSv2 partition filters and data filters are separated in > PruneFileSourcePartitions. It's better to separate these in PushDownUtils, > where we do filter/aggregate push down and column pruning, so we can still > push down aggregate for FileScan if the filters are only partition filters.
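[Editor's note] The separation described in SPARK-36351 boils down to classifying each pushed filter by the columns it references. This is a hypothetical sketch of that classification (not Spark's actual PushDownUtils code; the pair-based filter representation is invented for illustration):

```python
# A filter qualifies as a partition filter only if every column it references
# is a partition column; all other filters remain data filters. Keeping the
# two lists separate lets aggregate push-down proceed for a FileScan whose
# filters are partition-only.

def split_filters(filters, partition_cols):
    """filters: list of (referenced_columns, predicate_repr) pairs."""
    partition_filters, data_filters = [], []
    for cols, pred in filters:
        if cols and set(cols) <= set(partition_cols):
            partition_filters.append(pred)
        else:
            data_filters.append(pred)
    return partition_filters, data_filters

parts, data = split_filters(
    [({"j"}, "j = 1"), ({"i"}, "i > 5"), ({"i", "j"}, "i < j")],
    partition_cols={"j"},
)
print(parts)  # ['j = 1']
print(data)   # ['i > 5', 'i < j']
```

Note that a mixed predicate such as `i < j` must stay a data filter even though it touches a partition column, since it cannot be evaluated from partition values alone.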
[jira] [Comment Edited] (SPARK-36327) Spark sql creates staging dir inside database directory rather than creating inside table directory
[ https://issues.apache.org/jira/browse/SPARK-36327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390358#comment-17390358 ] Dongjoon Hyun edited comment on SPARK-36327 at 7/30/21, 7:22 AM: - I commented on the PR and looped in other reviewers, too, [~senthh]. was (Author: dongjoon): I commented on the PR and looped in other reviewers, too.
> Spark sql creates staging dir inside database directory rather than creating
> inside table directory
> ---
>
> Key: SPARK-36327
> URL: https://issues.apache.org/jira/browse/SPARK-36327
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Spark sql creates staging dir inside database directory rather than creating
> inside table directory.
>
> This arises only when viewfs:// is configured. When the location is hdfs://,
> it doesn't occur.
>
> Based on further investigation in file *SaveAsHiveFile.scala*, I could see
> that the directory hierarchy has not been properly handled for the viewFS
> condition.
> Parent path (db path) is passed rather than passing the actual directory (table
> location).
> {{
> // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
> private def newVersionExternalTempPath(
>     path: Path,
>     hadoopConf: Configuration,
>     stagingDir: String): Path = {
>   val extURI: URI = path.toUri
>   if (extURI.getScheme == "viewfs") {
>     getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
>   } else {
>     new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), "-ext-1")
>   }
> }
> }}
> Please refer to the lines below:
> ===
> if (extURI.getScheme == "viewfs") {
>   getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
> ===
[jira] [Commented] (SPARK-36327) Spark sql creates staging dir inside database directory rather than creating inside table directory
[ https://issues.apache.org/jira/browse/SPARK-36327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390358#comment-17390358 ] Dongjoon Hyun commented on SPARK-36327: --- I commented on the PR and looped in other reviewers, too.
> Spark sql creates staging dir inside database directory rather than creating
> inside table directory
> ---
>
> Key: SPARK-36327
> URL: https://issues.apache.org/jira/browse/SPARK-36327
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SQL
>Affects Versions: 3.1.2
>Reporter: Senthil Kumar
>Priority: Minor
>
> Spark sql creates staging dir inside database directory rather than creating
> inside table directory.
>
> This arises only when viewfs:// is configured. When the location is hdfs://,
> it doesn't occur.
>
> Based on further investigation in file *SaveAsHiveFile.scala*, I could see
> that the directory hierarchy has not been properly handled for the viewFS
> condition.
> Parent path (db path) is passed rather than passing the actual directory (table
> location).
> {{
> // Mostly copied from Context.java#getExternalTmpPath of Hive 1.2
> private def newVersionExternalTempPath(
>     path: Path,
>     hadoopConf: Configuration,
>     stagingDir: String): Path = {
>   val extURI: URI = path.toUri
>   if (extURI.getScheme == "viewfs") {
>     getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
>   } else {
>     new Path(getExternalScratchDir(extURI, hadoopConf, stagingDir), "-ext-1")
>   }
> }
> }}
> Please refer to the lines below:
> ===
> if (extURI.getScheme == "viewfs") {
>   getExtTmpPathRelTo(path.getParent, hadoopConf, stagingDir)
> ===
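[Editor's note] The SPARK-36327 entries above hinge on the `path.getParent` call in the viewfs branch. A hedged illustration of the reported behaviour (this is not Spark's actual code; plain path arithmetic stands in for Hadoop's `Path`, and the staging-dir name is shortened):

```python
from pathlib import PurePosixPath

# When the viewfs branch resolves the staging dir relative to
# path.getParent with `path` already being the table location, the
# .hive-staging directory lands in the database directory; resolving
# relative to the table path itself matches Hive's placement.

table_path = PurePosixPath("/user/daisuke/dicedb/part_test")
staging = ".hive-staging_hive_2021-07-19"  # name shortened for the example

buggy = table_path.parent / staging  # staging created under the db dir
fixed = table_path / staging         # staging created under the table dir

print(buggy)  # /user/daisuke/dicedb/.hive-staging_hive_2021-07-19
print(fixed)  # /user/daisuke/dicedb/part_test/.hive-staging_hive_2021-07-19
```

This also explains the security concern in the comment: with the buggy placement, every user submitting Spark jobs needs write permission on the database directory rather than only on the table directories they own.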
[jira] [Resolved] (SPARK-36254) Install mlflow in Github Actions CI
[ https://issues.apache.org/jira/browse/SPARK-36254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-36254. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33567 [https://github.com/apache/spark/pull/33567] > Install mlflow in Github Actions CI > --- > > Key: SPARK-36254 > URL: https://issues.apache.org/jira/browse/SPARK-36254 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.2.0 > > > Since pandas-on-Spark includes the mlflow features and related tests, we > should install mlflow and its dependencies in our GitHub Actions CI so that > the tests won't be skipped from Spark 3.2 on.
[jira] [Updated] (SPARK-36254) Install mlflow/sklearn in Github Actions CI
[ https://issues.apache.org/jira/browse/SPARK-36254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36254: -- Summary: Install mlflow/sklearn in Github Actions CI (was: Install mlflow in Github Actions CI) > Install mlflow/sklearn in Github Actions CI > --- > > Key: SPARK-36254 > URL: https://issues.apache.org/jira/browse/SPARK-36254 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 3.2.0 > > > Since the pandas-on-Spark includes the mlflow features and related tests, we > should install the mlflow and its dependencies our Github Actions CI so that > the test won't be skipped from Spark 3.2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390341#comment-17390341 ] Dongjoon Hyun commented on SPARK-36345: --- Thank you for reporting. I revised the title and will take care of this, [~itholic] and [~hyukjin.kwon]. > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > In GitHub Actions CI, we install `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of the "pyspark" job. > > We can reduce the cost of CI by creating an image that has both packages > pre-installed.
[jira] [Assigned] (SPARK-36254) Install mlflow in Github Actions CI
[ https://issues.apache.org/jira/browse/SPARK-36254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-36254: - Assignee: Haejoon Lee > Install mlflow in Github Actions CI > --- > > Key: SPARK-36254 > URL: https://issues.apache.org/jira/browse/SPARK-36254 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Since the pandas-on-Spark includes the mlflow features and related tests, we > should install the mlflow and its dependencies our Github Actions CI so that > the test won't be skipped from Spark 3.2. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36345) Add mlflow/sklearn to GHA docker image
[ https://issues.apache.org/jira/browse/SPARK-36345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36345: -- Summary: Add mlflow/sklearn to GHA docker image (was: Create docker image that has mlflow and sklearn.) > Add mlflow/sklearn to GHA docker image > -- > > Key: SPARK-36345 > URL: https://issues.apache.org/jira/browse/SPARK-36345 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark, Tests >Affects Versions: 3.2.0 >Reporter: Haejoon Lee >Priority: Major > > In Github Actions CI, we install the `mlflow>=1.0` and `sklearn` in the step > "List Python packages (Python 3.9)" of "pyspark" job. > > We can reduce the cost of CI by creating the image that has pre-installed > both package. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36093) (RemoveRedundantAliases should keep output schema name) The result incorrect if the partition path case is inconsistent
[ https://issues.apache.org/jira/browse/SPARK-36093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu updated SPARK-36093: -- Summary: (RemoveRedundantAliases should keep output schema name) The result incorrect if the partition path case is inconsistent (was: The result incorrect if the partition path case is inconsistent) > (RemoveRedundantAliases should keep output schema name) The result incorrect > if the partition path case is inconsistent > --- > > Key: SPARK-36093 > URL: https://issues.apache.org/jira/browse/SPARK-36093 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: angerszhu >Priority: Major > Labels: correctness > Fix For: 3.2.0, 3.1.3, 3.0.4 > > > Please reproduce this issue using HDFS. Local HDFS can not reproduce this > issue. > {code:scala} > sql("create table t1(cal_dt date) using parquet") > sql("insert into t1 values > (date'2021-06-27'),(date'2021-06-28'),(date'2021-06-29'),(date'2021-06-30')") > sql("create view t1_v as select * from t1") > sql("CREATE TABLE t2 USING PARQUET PARTITIONED BY (CAL_DT) AS SELECT 1 AS > FLAG,CAL_DT FROM t1_v WHERE CAL_DT BETWEEN '2021-06-27' AND '2021-06-28'") > sql("INSERT INTO t2 SELECT 2 AS FLAG,CAL_DT FROM t1_v WHERE CAL_DT BETWEEN > '2021-06-29' AND '2021-06-30'") > sql("SELECT * FROM t2 WHERE CAL_DT BETWEEN '2021-06-29' AND > '2021-06-30'").show > sql("SELECT * FROM t2 ").show > {code} > {noformat} > // It should not empty. 
> scala> sql("SELECT * FROM t2 WHERE CAL_DT BETWEEN '2021-06-29' AND '2021-06-30'").show
> +----+------+
> |FLAG|CAL_DT|
> +----+------+
> +----+------+
>
> scala> sql("SELECT * FROM t2 ").show
> +----+----------+
> |FLAG|    CAL_DT|
> +----+----------+
> |   1|2021-06-27|
> |   1|2021-06-28|
> +----+----------+
>
> scala> sql("SELECT 2 AS FLAG,CAL_DT FROM t1_v WHERE CAL_DT BETWEEN '2021-06-29' AND '2021-06-30'").show
> +----+----------+
> |FLAG|    CAL_DT|
> +----+----------+
> |   2|2021-06-29|
> |   2|2021-06-30|
> +----+----------+
> {noformat}
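[Editor's note] The SPARK-36093 repro above shows partition-predicate lookups returning nothing when the partition column's case is inconsistent between writes. A hypothetical sketch of the failure mode (not Spark's code; directory names and the matching helper are invented for illustration):

```python
# On a case-sensitive store such as HDFS, partition directories written with
# one casing of the column name are invisible to a lookup that uses another
# casing, so a partition-pruned query can silently return zero rows.

written_dirs = ["CAL_DT=2021-06-29", "CAL_DT=2021-06-30"]

def find_partitions(dirs, col, value):
    prefix = f"{col}={value}"
    return [d for d in dirs if d == prefix]

# Lookup with the lower-cased column name finds nothing:
print(find_partitions(written_dirs, "cal_dt", "2021-06-29"))  # []
# Only the exact original casing matches:
print(find_partitions(written_dirs, "CAL_DT", "2021-06-29"))  # ['CAL_DT=2021-06-29']
```

This is consistent with the note in the issue that a local filesystem does not reproduce it: the bug needs a store where the two casings are genuinely distinct paths.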
[jira] [Created] (SPARK-36353) RemoveNoopOperators should keep output schema
angerszhu created SPARK-36353: - Summary: RemoveNoopOperators should keep output schema Key: SPARK-36353 URL: https://issues.apache.org/jira/browse/SPARK-36353 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: angerszhu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36352) Spark should check result plan's output schema name
[ https://issues.apache.org/jira/browse/SPARK-36352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390338#comment-17390338 ] angerszhu commented on SPARK-36352: --- RemoveNoopOperators, CollapseProject > Spark should check result plan's output schema name > --- > > Key: SPARK-36352 > URL: https://issues.apache.org/jira/browse/SPARK-36352 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major >
[jira] [Commented] (SPARK-36352) Spark should check result plan's output schema name
[ https://issues.apache.org/jira/browse/SPARK-36352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17390329#comment-17390329 ] Apache Spark commented on SPARK-36352: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33583 > Spark should check result plan's output schema name > --- > > Key: SPARK-36352 > URL: https://issues.apache.org/jira/browse/SPARK-36352 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36352) Spark should check result plan's output schema name
[ https://issues.apache.org/jira/browse/SPARK-36352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36352: Assignee: Apache Spark > Spark should check result plan's output schema name > --- > > Key: SPARK-36352 > URL: https://issues.apache.org/jira/browse/SPARK-36352 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
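[Editor's note] SPARK-36352/36353 (and the related SPARK-36093 fix) all enforce the same invariant: an optimizer rewrite may drop a node only if the plan's output schema names are unchanged. A hypothetical sketch of that check, with plans reduced to lists of output names (this is not Spark's actual optimizer code):

```python
# A no-op Project may be removed only if it is a pure pass-through, i.e. it
# exposes exactly the child's attribute names in the child's order. A Project
# that renames a column (e.g. `a AS x`) changes the result schema and must be
# kept, which is the bug class these tickets guard against.

def remove_noop_project(project_names, child_names):
    """Return the plan's output names after optionally removing the Project."""
    if project_names == child_names:
        return child_names    # safe to remove: schema is unchanged
    return project_names      # keep the Project: it renames columns

print(remove_noop_project(["a", "b"], ["a", "b"]))  # ['a', 'b']
print(remove_noop_project(["x", "b"], ["a", "b"]))  # ['x', 'b']
```

A post-optimization assertion comparing the optimized plan's output names against the analyzed plan's output names is the "check result plan's output schema name" safeguard the ticket title describes.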