[jira] [Commented] (SPARK-33600) Group exception messages in execution/datasources/v2
[ https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1720#comment-1720 ] Apache Spark commented on SPARK-33600: -- User 'karenfeng' has created a pull request for this issue: https://github.com/apache/spark/pull/31619 > Group exception messages in execution/datasources/v2 > > > Key: SPARK-33600 > URL: https://issues.apache.org/jira/browse/SPARK-33600 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename || Count || > | AlterTableExec.scala | 1 | > | CreateNamespaceExec.scala| 1 | > | CreateTableExec.scala| 1 | > | DataSourceRDD.scala | 2 | > | DataSourceV2Strategy.scala | 9 | > | DropNamespaceExec.scala | 2 | > | DropTableExec.scala | 1 | > | EmptyPartitionReader.scala | 1 | > | FileDataSourceV2.scala | 1 | > | FilePartitionReader.scala| 2 | > | FilePartitionReaderFactory.scala | 1 | > | ReplaceTableExec.scala | 3 | > | TableCapabilityCheck.scala | 2 | > | V1FallbackWriters.scala | 1 | > | V2SessionCatalog.scala | 14 | > | WriteToDataSourceV2Exec.scala| 10 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc' > || Filename || Count || > | JDBCTableCatalog.scala | 3 | > '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename|| Count || > | DataSourceV2Implicits.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33600) Group exception messages in execution/datasources/v2
[ https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33600: Assignee: Apache Spark > Group exception messages in execution/datasources/v2 > > > Key: SPARK-33600 > URL: https://issues.apache.org/jira/browse/SPARK-33600 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Assignee: Apache Spark >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename || Count || > | AlterTableExec.scala | 1 | > | CreateNamespaceExec.scala| 1 | > | CreateTableExec.scala| 1 | > | DataSourceRDD.scala | 2 | > | DataSourceV2Strategy.scala | 9 | > | DropNamespaceExec.scala | 2 | > | DropTableExec.scala | 1 | > | EmptyPartitionReader.scala | 1 | > | FileDataSourceV2.scala | 1 | > | FilePartitionReader.scala| 2 | > | FilePartitionReaderFactory.scala | 1 | > | ReplaceTableExec.scala | 3 | > | TableCapabilityCheck.scala | 2 | > | V1FallbackWriters.scala | 1 | > | V2SessionCatalog.scala | 14 | > | WriteToDataSourceV2Exec.scala| 10 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc' > || Filename || Count || > | JDBCTableCatalog.scala | 3 | > '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename|| Count || > | DataSourceV2Implicits.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33600) Group exception messages in execution/datasources/v2
[ https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-33600: Assignee: (was: Apache Spark) > Group exception messages in execution/datasources/v2 > > > Key: SPARK-33600 > URL: https://issues.apache.org/jira/browse/SPARK-33600 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename || Count || > | AlterTableExec.scala | 1 | > | CreateNamespaceExec.scala| 1 | > | CreateTableExec.scala| 1 | > | DataSourceRDD.scala | 2 | > | DataSourceV2Strategy.scala | 9 | > | DropNamespaceExec.scala | 2 | > | DropTableExec.scala | 1 | > | EmptyPartitionReader.scala | 1 | > | FileDataSourceV2.scala | 1 | > | FilePartitionReader.scala| 2 | > | FilePartitionReaderFactory.scala | 1 | > | ReplaceTableExec.scala | 3 | > | TableCapabilityCheck.scala | 2 | > | V1FallbackWriters.scala | 1 | > | V2SessionCatalog.scala | 14 | > | WriteToDataSourceV2Exec.scala| 10 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc' > || Filename || Count || > | JDBCTableCatalog.scala | 3 | > '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename|| Count || > | DataSourceV2Implicits.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
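The refactoring behind this sub-task (part of the error-message umbrella SPARK-33539) moves inline exception strings into centralized error helpers. The actual Spark change is Scala, but the grouping pattern can be sketched in plain Python; all names below (the error class and helper functions) are hypothetical illustrations, not Spark APIs:

```python
# Sketch of the "group exception messages" pattern: every user-facing error
# message is constructed by a helper in one module, so wording, error types,
# and parameters stay consistent across many call sites.

class QueryExecutionError(RuntimeError):
    """Hypothetical stand-in for a grouped error class."""

def unsupported_table_change_error(change):
    # Helper owns the message wording; call sites never inline strings.
    return QueryExecutionError(f"Unsupported table change: {change!r}")

def table_already_exists_error(ident):
    return QueryExecutionError(f"Table {ident} already exists")

# A call site (e.g. a CreateTableExec-like operation) raises via the helpers:
def create_table(catalog, ident):
    if ident in catalog:
        raise table_already_exists_error(ident)
    catalog[ident] = {}
```

The payoff is that auditing or rewording all errors in `execution/datasources/v2` becomes a change to one module rather than a sweep over 16 files.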
[jira] [Created] (SPARK-34504) avoid unnecessary view resolving and remove the `performCheck` flag
Linhong Liu created SPARK-34504: --- Summary: avoid unnecessary view resolving and remove the `performCheck` flag Key: SPARK-34504 URL: https://issues.apache.org/jira/browse/SPARK-34504 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.1 Reporter: Linhong Liu In SPARK-34490, I added a `performCheck` flag to skip the analysis check when resolving views, because some view resolutions are unnecessary. We can avoid those unnecessary view resolutions and then remove the `performCheck` flag. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34503: Assignee: (was: Apache Spark) > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288850#comment-17288850 ] Apache Spark commented on SPARK-34503: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31618 > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288851#comment-17288851 ] Apache Spark commented on SPARK-34503: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31618 > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34503: Assignee: Apache Spark > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
[ https://issues.apache.org/jira/browse/SPARK-34503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-34503: -- Labels: releasenotes (was: ) > Use zstd for spark.eventLog.compression.codec by default > > > Key: SPARK-34503 > URL: https://issues.apache.org/jira/browse/SPARK-34503 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: releasenotes > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34503) Use zstd for spark.eventLog.compression.codec by default
Dongjoon Hyun created SPARK-34503: - Summary: Use zstd for spark.eventLog.compression.codec by default Key: SPARK-34503 URL: https://issues.apache.org/jira/browse/SPARK-34503 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
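The behavior this ticket makes the default can already be enabled explicitly (and pinned once the default flips) through existing event-log settings in `spark-defaults.conf`; `zstd` is one of the codec short names Spark accepts:

```properties
# Enable event logging and compress logs with zstd
# instead of relying on the previous lz4 default.
spark.eventLog.enabled              true
spark.eventLog.compress             true
spark.eventLog.compression.codec    zstd
```

Setting the codec explicitly also keeps history-server behavior stable across Spark upgrades regardless of which default ships.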
[jira] [Commented] (SPARK-34502) Remove unused parameters in join methods
[ https://issues.apache.org/jira/browse/SPARK-34502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288845#comment-17288845 ] Apache Spark commented on SPARK-34502: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/31617 > Remove unused parameters in join methods > > > Key: SPARK-34502 > URL: https://issues.apache.org/jira/browse/SPARK-34502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Huaxin Gao >Priority: Trivial > > Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34502) Remove unused parameters in join methods
[ https://issues.apache.org/jira/browse/SPARK-34502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288844#comment-17288844 ] Apache Spark commented on SPARK-34502: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/31617 > Remove unused parameters in join methods > > > Key: SPARK-34502 > URL: https://issues.apache.org/jira/browse/SPARK-34502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Huaxin Gao >Priority: Trivial > > Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34502) Remove unused parameters in join methods
[ https://issues.apache.org/jira/browse/SPARK-34502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34502: Assignee: (was: Apache Spark) > Remove unused parameters in join methods > > > Key: SPARK-34502 > URL: https://issues.apache.org/jira/browse/SPARK-34502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Huaxin Gao >Priority: Trivial > > Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34502) Remove unused parameters in join methods
[ https://issues.apache.org/jira/browse/SPARK-34502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34502: Assignee: Apache Spark > Remove unused parameters in join methods > > > Key: SPARK-34502 > URL: https://issues.apache.org/jira/browse/SPARK-34502 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Trivial > > Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34502) Remove unused parameters in join methods
Huaxin Gao created SPARK-34502: -- Summary: Remove unused parameters in join methods Key: SPARK-34502 URL: https://issues.apache.org/jira/browse/SPARK-34502 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Huaxin Gao Remove unused parameters in some join methods -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25075) Build and test Spark against Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-25075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288814#comment-17288814 ] Seth Tisue commented on SPARK-25075: Scala 2.13.5 is now available. > Build and test Spark against Scala 2.13 > --- > > Key: SPARK-25075 > URL: https://issues.apache.org/jira/browse/SPARK-25075 > Project: Spark > Issue Type: Umbrella > Components: Build, MLlib, Project Infra, Spark Core, SQL >Affects Versions: 3.0.0 >Reporter: Guillaume Massé >Priority: Major > > This umbrella JIRA tracks the requirements for building and testing Spark > against the current Scala 2.13 milestone. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34501) Support of DELETE/UPDATE operation for Database (MySQL, Postgres etc.) level
Nishant Ranjan created SPARK-34501: -- Summary: Support of DELETE/UPDATE operation for Database (MySQL, Postgres etc.) level Key: SPARK-34501 URL: https://issues.apache.org/jira/browse/SPARK-34501 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.0.2 Reporter: Nishant Ranjan Spark 3.0+ SQL started supporting the DELETE operation, but there is still no way to push a DELETE down to the underlying database (MySQL, Postgres, etc.). The documentation mentions "deleteWhere(..)", but it is unclear whether it can be used with the JDBC data source. If it can, please help with an example. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Commented] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288806#comment-17288806 ] Sean R. Owen commented on SPARK-34448: -- So one coarse response is - I'm surprised if the initialization should matter _that_ much? Starting the intercept at this value is kind of like starting it at the mean of the response in linear regression - probably the best a priori guess. That's why I am wondering about convergence (but sounds like it converges?) or what scikit does. Given the small data set, is the answer under-determined in this case? I'd have to actually look at your test case to answer those questions, but that's what I'm thinking here. Maybe you have already thought it through. What's a better initial value of the intercept? > Binary logistic regression incorrectly computes the intercept and > coefficients when data is not centered > > > Key: SPARK-34448 > URL: https://issues.apache.org/jira/browse/SPARK-34448 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 2.4.5, 3.0.0 >Reporter: Yakov Kerzhner >Priority: Major > Labels: correctness > > I have written up a fairly detailed gist that includes code to reproduce the > bug, as well as the output of the code and some commentary: > [https://gist.github.com/ykerzhner/51358780a6a4cc33266515f17bf98a96] > To summarize: under certain conditions, the minimization that fits a binary > logistic regression contains a bug that pulls the intercept value towards the > log(odds) of the target data. This is mathematically only correct when the > data comes from distributions with zero means. In general, this gives > incorrect intercept values, and consequently incorrect coefficients as well. > As I am not so familiar with the spark code base, I have not been able to > find this bug within the spark code itself. 
A hint to this bug is here: > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L894-L904] > based on the code, I don't believe that the features have zero means at this > point, and so this heuristic is incorrect. But an incorrect starting point > does not explain this bug. The minimizer should drift to the correct place. > I was not able to find the code of the actual objective function that is > being minimized. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
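Sean's analogy above can be checked numerically: initializing the intercept at the log-odds of the labels is exactly the value whose sigmoid reproduces the mean of the response, the way the response mean is the natural zero-coefficient start in linear regression. A self-contained sketch in plain Python (not the Spark code path, which uses a label histogram):

```python
import math

# Toy binary-label sample.
labels = [0, 0, 0, 1, 1]
p = sum(labels) / len(labels)   # empirical P(y = 1), here 0.4

# Log-odds initialization of the intercept, analogous to Spark's
# histogram-based starting point for binary logistic regression.
b0 = math.log(p / (1 - p))

# With all coefficients at zero, the model predicts sigmoid(b0),
# which recovers the label mean exactly.
pred = 1.0 / (1.0 + math.exp(-b0))
print(abs(pred - p) < 1e-12)  # True
```

This shows why it is a sensible a priori guess; the ticket's claim is that the *objective*, not just the start, behaves as if features were zero-mean, which this sketch does not address.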
[jira] [Updated] (SPARK-34448) Binary logistic regression incorrectly computes the intercept and coefficients when data is not centered
[ https://issues.apache.org/jira/browse/SPARK-34448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-34448: - Priority: Major (was: Critical) > Binary logistic regression incorrectly computes the intercept and > coefficients when data is not centered > > > Key: SPARK-34448 > URL: https://issues.apache.org/jira/browse/SPARK-34448 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 2.4.5, 3.0.0 >Reporter: Yakov Kerzhner >Priority: Major > Labels: correctness > > I have written up a fairly detailed gist that includes code to reproduce the > bug, as well as the output of the code and some commentary: > [https://gist.github.com/ykerzhner/51358780a6a4cc33266515f17bf98a96] > To summarize: under certain conditions, the minimization that fits a binary > logistic regression contains a bug that pulls the intercept value towards the > log(odds) of the target data. This is mathematically only correct when the > data comes from distributions with zero means. In general, this gives > incorrect intercept values, and consequently incorrect coefficients as well. > As I am not so familiar with the spark code base, I have not been able to > find this bug within the spark code itself. A hint to this bug is here: > [https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala#L894-L904] > based on the code, I don't believe that the features have zero means at this > point, and so this heuristic is incorrect. But an incorrect starting point > does not explain this bug. The minimizer should drift to the correct place. > I was not able to find the code of the actual objective function that is > being minimized. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34499) improve the catalyst expression dsl internal APIs
[ https://issues.apache.org/jira/browse/SPARK-34499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34499. - Resolution: Won't Fix > improve the catalyst expression dsl internal APIs > - > > Key: SPARK-34499 > URL: https://issues.apache.org/jira/browse/SPARK-34499 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33863) Pyspark UDF wrongly changes timestamps to UTC
[ https://issues.apache.org/jira/browse/SPARK-33863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288802#comment-17288802 ] Nasir Ali commented on SPARK-33863: --- [~hyukjin.kwon] and [~viirya] any update on this issue? > Pyspark UDF wrongly changes timestamps to UTC > - > > Key: SPARK-33863 > URL: https://issues.apache.org/jira/browse/SPARK-33863 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.0.1 > Environment: MAC/Linux > Standalone cluster / local machine >Reporter: Nasir Ali >Priority: Major > > *Problem*: > I have a dataframe with a ts (timestamp) column in UTC. If I create a new > column using udf, pyspark udf wrongly changes timestamps into UTC time. ts > (timestamp) column is already in UTC time. Therefore, pyspark udf should not > convert ts (timestamp) column into UTC timestamp. > I have used following configs to let spark know the timestamps are in UTC: > > {code:java} > --conf spark.driver.extraJavaOptions=-Duser.timezone=UTC > --conf spark.executor.extraJavaOptions=-Duser.timezone=UTC > --conf spark.sql.session.timeZone=UTC > {code} > Below is a code snippet to reproduce the error: > > {code:java} > from pyspark.sql import SparkSession > from pyspark.sql import functions as F > from pyspark.sql.types import StringType, TimestampType > import datetime > spark = SparkSession.builder.config("spark.sql.session.timeZone", > "UTC").getOrCreate() > df = spark.createDataFrame([("usr1",17.00, "2018-02-10T15:27:18+00:00"), > ("usr1",13.00, "2018-02-11T12:27:18+00:00"), > ("usr1",25.00, "2018-02-12T11:27:18+00:00"), > ("usr1",20.00, "2018-02-13T15:27:18+00:00"), > ("usr1",17.00, "2018-02-14T12:27:18+00:00"), > ("usr2",99.00, "2018-02-15T11:27:18+00:00"), > ("usr2",156.00, "2018-02-22T11:27:18+00:00") > ], >["user","id", "ts"]) > df = df.withColumn('ts', df.ts.cast('timestamp')) > df.show(truncate=False) > def some_time_udf(i): > if datetime.time(5, 0)<=i.time() < datetime.time(12, 0): > 
tmp= "Morning: " + str(i) > elif datetime.time(12, 0)<=i.time() < datetime.time(17, 0): > tmp= "Afternoon: " + str(i) > elif datetime.time(17, 0)<=i.time() < datetime.time(21, 0): > tmp= "Evening" > elif datetime.time(21, 0)<=i.time() < datetime.time(0, 0): > tmp= "Night" > elif datetime.time(0, 0)<=i.time() < datetime.time(5, 0): > tmp= "Night" > return tmp > udf = F.udf(some_time_udf,StringType()) > df.withColumn("day_part", udf(df.ts)).show(truncate=False) > {code} > > Below is the output of the above code: > {code:java} > ++-+---++ > |user|id |ts |day_part| > ++-+---++ > |usr1|17.0 |2018-02-10 15:27:18|Morning: 2018-02-10 09:27:18| > |usr1|13.0 |2018-02-11 12:27:18|Morning: 2018-02-11 06:27:18| > |usr1|25.0 |2018-02-12 11:27:18|Morning: 2018-02-12 05:27:18| > |usr1|20.0 |2018-02-13 15:27:18|Morning: 2018-02-13 09:27:18| > |usr1|17.0 |2018-02-14 12:27:18|Morning: 2018-02-14 06:27:18| > |usr2|99.0 |2018-02-15 11:27:18|Morning: 2018-02-15 05:27:18| > |usr2|156.0|2018-02-22 11:27:18|Morning: 2018-02-22 05:27:18| > ++-+---++ > {code} > Above output is incorrect. You can see ts and day_part columns don't have > same timestamps. Below is the output I would expect: > > {code:java} > ++-+---++ > |user|id |ts |day_part| > ++-+---++ > |usr1|17.0 |2018-02-10 15:27:18|Afternoon: 2018-02-10 15:27:18| > |usr1|13.0 |2018-02-11 12:27:18|Afternoon: 2018-02-11 12:27:18| > |usr1|25.0 |2018-02-12 11:27:18|Morning: 2018-02-12 11:27:18| > |usr1|20.0 |2018-02-13 15:27:18|Afternoon: 2018-02-13 15:27:18| > |usr1|17.0 |2018-02-14 12:27:18|Afternoon: 2018-02-14 12:27:18| > |usr2|99.0 |2018-02-15 11:27:18|Morning: 2018-02-15 11:27:18| > |usr2|156.0|2018-02-22 11:27:18|Morning: 2018-02-22 11:27:18| > ++-+---++{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34500. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31615 [https://github.com/apache/spark/pull/31615] > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > Fix For: 3.2.0 > > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11844) can not read class org.apache.parquet.format.PageHeader: don't know what type: 13
[ https://issues.apache.org/jira/browse/SPARK-11844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288693#comment-17288693 ] Sungwon commented on SPARK-11844: - I'm seeing the same issue on spark-2.4 I'm able to read the parquet files just fine and load them into DataFrames, but using any order related operations leads them to crash (min, max, orderBy) {code:java} org.apache.iceberg.exceptions.RuntimeIOException: java.io.IOException: can not read class org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader: don't know what type: 13 at org.apache.iceberg.parquet.ParquetReader$FileIterator.advance(ParquetReader.java:133) at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:110) at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:69) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:49) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41) at org.apache.spark.RangePartitioner$$anonfun$13.apply(Partitioner.scala:306) at org.apache.spark.RangePartitioner$$anonfun$13.apply(Partitioner.scala:304) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853) at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90) at org.apache.spark.scheduler.Task.run(Task.scala:123) at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) {code} And as mentioned above, it is accompanied by 'uncompressed_page_size' was not found in serialized data! error: {code:java} org.apache.iceberg.exceptions.RuntimeIOException: java.io.IOException: can not read class org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader: Required field 'uncompressed_page_size' was not found in serialized data! 
Struct: org.apache.iceberg.shaded.org.apache.parquet.format.PageHeader$PageHeaderStandardScheme@bb8d7d1 at org.apache.iceberg.parquet.ParquetReader$FileIterator.advance(ParquetReader.java:133) at org.apache.iceberg.parquet.ParquetReader$FileIterator.next(ParquetReader.java:110) at org.apache.iceberg.spark.source.BaseDataReader.next(BaseDataReader.java:69) at org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:49) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409) at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41) at org.apache.spark.RangePartitioner$$anonfun$13.apply(Partitioner.scala:306) at org.apache.spark.RangePartitioner$$anonfun$13.apply(Partitioner.scala:304) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:853) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitions
[jira] [Assigned] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34500: Assignee: Kousuke Saruta (was: Apache Spark) > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288615#comment-17288615 ] Apache Spark commented on SPARK-34500: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31615 > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288616#comment-17288616 ] Apache Spark commented on SPARK-34500: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/31615 > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Minor > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34500) Replace symbol literals with $"" in examples and documents
[ https://issues.apache.org/jira/browse/SPARK-34500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34500: Assignee: Apache Spark (was: Kousuke Saruta) > Replace symbol literals with $"" in examples and documents > -- > > Key: SPARK-34500 > URL: https://issues.apache.org/jira/browse/SPARK-34500 > Project: Spark > Issue Type: Improvement > Components: Documentation, Examples >Affects Versions: 3.2.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Minor > > The Scala community seems to deprecate Symbol in the future so let's replace > symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34500) Replace symbol literals with $"" in examples and documents
Kousuke Saruta created SPARK-34500: -- Summary: Replace symbol literals with $"" in examples and documents Key: SPARK-34500 URL: https://issues.apache.org/jira/browse/SPARK-34500 Project: Spark Issue Type: Improvement Components: Documentation, Examples Affects Versions: 3.2.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta The Scala community seems to deprecate Symbol in the future so let's replace symbol literals in user facing examples and documents. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
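For context on the migration SPARK-34500 describes: Scala symbol literals (`'age`) are deprecated syntax in Scala 2.13, which is why user-facing Spark examples are moving to the `$""` string interpolator. Below is a minimal pure-Scala sketch of the literal-free `Symbol` form; the Spark-side equivalents appear only in comments, since they need a live `SparkSession` and `spark.implicits._` (the column name `age` is hypothetical):

```scala
// Symbol literals ('age) are deprecated in Scala 2.13; Symbol("age") is the
// literal-free form that this JIRA's migration avoids in user-facing docs
// by switching Spark examples to the $"" interpolator instead.
object SymbolMigration {
  val oldStyle: Symbol = Symbol("age") // what the literal 'age desugars to

  def main(args: Array[String]): Unit = {
    // In Spark code the analogous change is (requires spark.implicits._):
    //   df.select('age)      // symbol literal, deprecated syntax
    //   df.select($"age")    // preferred string interpolator
    println(oldStyle.name)
  }
}
```

This sketch compiles on both Scala 2.13 and Scala 3, where the literal syntax is removed entirely.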
[jira] [Commented] (SPARK-33602) Group exception messages in execution/datasources
[ https://issues.apache.org/jira/browse/SPARK-33602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288591#comment-17288591 ] Allison Wang commented on SPARK-33602: -- [~beliefer] Let's put AnalysisException in QueryCompilationErrors for now. > Group exception messages in execution/datasources > - > > Key: SPARK-33602 > URL: https://issues.apache.org/jira/browse/SPARK-33602 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources' > || Filename|| Count || > | DataSource.scala| 9 | > | DataSourceStrategy.scala| 1 | > | DataSourceUtils.scala | 2 | > | FileFormat.scala| 1 | > | FileFormatWriter.scala | 3 | > | FileScanRDD.scala | 2 | > | InsertIntoHadoopFsRelationCommand.scala | 2 | > | PartitioningAwareFileIndex.scala| 1 | > | PartitioningUtils.scala | 3 | > | RecordReaderIterator.scala | 1 | > | rules.scala | 4 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/binaryfile' > || Filename || Count || > | BinaryFileFormat.scala | 2 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc' > || Filename || Count || > | JDBCOptions.scala | 2 | > | JdbcUtils.scala | 6 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc' > || Filename || Count || > | OrcDeserializer.scala | 1 | > | OrcFilters.scala | 1 | > | OrcSerializer.scala | 1 | > | OrcUtils.scala| 2 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet' > || Filename || Count || > | ParquetFileFormat.scala | 2 | > | ParquetReadSupport.scala | 1 | > | ParquetSchemaConverter.scala | 6 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
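The grouping pattern these sub-tasks describe centralizes exception construction in one errors object so message text lives in a single place. A sketch of the shape, not Spark's actual code: the exception class below is a simplified stand-in for Spark's `AnalysisException`, and the method name is illustrative (the real `QueryCompilationErrors` object defines its own set of methods):

```scala
// Sketch of the error-grouping pattern: instead of scattering
// `throw new AnalysisException("...")` across datasource files, each call
// site delegates to a named method on a central errors object.
class AnalysisException(message: String) extends Exception(message)

object QueryCompilationErrors {
  // One method per error condition; the message text lives only here.
  // (Hypothetical method name, for illustration.)
  def unsupportedTableOperationError(table: String, op: String): AnalysisException =
    new AnalysisException(s"Table $table does not support $op.")
}

object GroupedErrorsDemo {
  def main(args: Array[String]): Unit = {
    // A call site raises the error through the grouped API:
    val e = QueryCompilationErrors.unsupportedTableOperationError("t1", "truncate")
    println(e.getMessage)
  }
}
```

This keeps wording consistent across the files counted in the tables above and makes the messages auditable in one pass.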
[jira] [Commented] (SPARK-33600) Group exception messages in execution/datasources/v2
[ https://issues.apache.org/jira/browse/SPARK-33600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288584#comment-17288584 ] Karen Feng commented on SPARK-33600: I'm working on this. > Group exception messages in execution/datasources/v2 > > > Key: SPARK-33600 > URL: https://issues.apache.org/jira/browse/SPARK-33600 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename || Count || > | AlterTableExec.scala | 1 | > | CreateNamespaceExec.scala| 1 | > | CreateTableExec.scala| 1 | > | DataSourceRDD.scala | 2 | > | DataSourceV2Strategy.scala | 9 | > | DropNamespaceExec.scala | 2 | > | DropTableExec.scala | 1 | > | EmptyPartitionReader.scala | 1 | > | FileDataSourceV2.scala | 1 | > | FilePartitionReader.scala| 2 | > | FilePartitionReaderFactory.scala | 1 | > | ReplaceTableExec.scala | 3 | > | TableCapabilityCheck.scala | 2 | > | V1FallbackWriters.scala | 1 | > | V2SessionCatalog.scala | 14 | > | WriteToDataSourceV2Exec.scala| 10 | > '/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc' > || Filename || Count || > | JDBCTableCatalog.scala | 3 | > '/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2' > || Filename|| Count || > | DataSourceV2Implicits.scala | 3 | -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288527#comment-17288527 ] Apache Spark commented on SPARK-27790: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31614 > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27790: Assignee: Apache Spark > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288526#comment-17288526 ] Apache Spark commented on SPARK-27790: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31614 > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27790: Assignee: (was: Apache Spark) > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27790) Support SQL INTERVAL types
[ https://issues.apache.org/jira/browse/SPARK-27790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-27790: --- Affects Version/s: (was: 3.1.0) 3.2.0 > Support SQL INTERVAL types > -- > > Key: SPARK-27790 > URL: https://issues.apache.org/jira/browse/SPARK-27790 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > SQL standard defines 2 interval types: > # year-month interval contains a YEAR field or a MONTH field or both > # day-time interval contains DAY, HOUR, MINUTE, and SECOND (possibly fraction > of seconds) > Need to add 2 new internal types YearMonthIntervalType and > DayTimeIntervalType, support operations defined by SQL standard as well as > INTERVAL literals. > The java.time.Period and java.time.Duration can be supported as external type > for YearMonthIntervalType and DayTimeIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
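The ticket proposes `java.time.Period` as the external type for year-month intervals and `java.time.Duration` for day-time intervals. The sketch below exercises only those JVM types, not the (then-unreleased) `YearMonthIntervalType`/`DayTimeIntervalType` catalyst types; the sample interval values are hypothetical:

```scala
import java.time.{Duration, Period}

// External-type mapping proposed by SPARK-27790:
//   year-month interval  -> java.time.Period  (YEAR and/or MONTH fields)
//   day-time interval    -> java.time.Duration (DAY..SECOND fields)
object IntervalExternalTypes {
  // INTERVAL '1-2' YEAR TO MONTH would surface as a Period of 1 year, 2 months.
  val yearMonth: Period = Period.of(1, 2, 0)

  // INTERVAL '1 10:30:00' DAY TO SECOND would surface as a Duration.
  val dayTime: Duration = Duration.ofDays(1).plusHours(10).plusMinutes(30)

  def main(args: Array[String]): Unit = {
    println(yearMonth.toTotalMonths) // 14 months
    println(dayTime.getSeconds)      // 124200 seconds
  }
}
```

Note that `Period` also carries a days field, which has no slot in a SQL year-month interval, so a real conversion would need to reject or normalize nonzero days.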
[jira] [Commented] (SPARK-34499) improve the catalyst expression dsl internal APIs
[ https://issues.apache.org/jira/browse/SPARK-34499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288483#comment-17288483 ] Apache Spark commented on SPARK-34499: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/31612 > improve the catalyst expression dsl internal APIs > - > > Key: SPARK-34499 > URL: https://issues.apache.org/jira/browse/SPARK-34499 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34499) improve the catalyst expression dsl internal APIs
[ https://issues.apache.org/jira/browse/SPARK-34499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34499: Assignee: Apache Spark > improve the catalyst expression dsl internal APIs > - > > Key: SPARK-34499 > URL: https://issues.apache.org/jira/browse/SPARK-34499 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34499) improve the catalyst expression dsl internal APIs
[ https://issues.apache.org/jira/browse/SPARK-34499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34499: Assignee: (was: Apache Spark) > improve the catalyst expression dsl internal APIs > - > > Key: SPARK-34499 > URL: https://issues.apache.org/jira/browse/SPARK-34499 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34476: Assignee: (was: Apache Spark) > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. > Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34476: Assignee: Apache Spark > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Assignee: Apache Spark >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. > Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34476) Duplicate referenceNames are given for ambiguousReferences
[ https://issues.apache.org/jira/browse/SPARK-34476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288482#comment-17288482 ] Apache Spark commented on SPARK-34476: -- User 'tedyu' has created a pull request for this issue: https://github.com/apache/spark/pull/31613 > Duplicate referenceNames are given for ambiguousReferences > -- > > Key: SPARK-34476 > URL: https://issues.apache.org/jira/browse/SPARK-34476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Ted Yu >Priority: Major > > When running test with Spark extension that converts custom function to json > path expression, I saw the following in test output: > {code} > 2021-02-19 21:57:24,550 (Time-limited test) [INFO - > org.yb.loadtest.TestSpark3Jsonb.testJsonb(TestSpark3Jsonb.java:102)] plan is > == Physical Plan == > org.apache.spark.sql.AnalysisException: Reference > 'phone->'key'->1->'m'->2->>'b'' is ambiguous, could be: > mycatalog.test.person.phone->'key'->1->'m'->2->>'b', > mycatalog.test.person.phone->'key'->1->'m'->2->>'b'.; line 1 pos 8 > {code} > Please note the candidates following 'could be' are the same. > Here is the physical plan for a working query where phone is a jsonb column: > {code} > TakeOrderedAndProject(limit=2, orderBy=[id#6 ASC NULLS FIRST], > output=[id#6,address#7,key#0]) > +- *(1) Project [id#6, address#7, phone->'key'->1->'m'->2->'b'#12 AS key#0] >+- BatchScan[id#6, address#7, phone->'key'->1->'m'->2->'b'#12] Cassandra > Scan: test.person > - Cassandra Filters: [[phone->'key'->1->'m'->2->>'b' >= ?, 100]] > - Requested Columns: [id,address,phone->'key'->1->'m'->2->'b'] > {code} > The difference for the failed query is that it tries to use > {code}phone->'key'->1->'m'->2->>'b'{code} in the projection (which works as > part of filter).
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
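The symptom reported above is that both "could be" candidates in the ambiguous-reference error render as the same string. One formatting-side mitigation, a sketch under that assumption and not Spark's actual patch, is to deduplicate the candidate names before building the message:

```scala
// Sketch: collapse duplicate qualified names before rendering an
// "ambiguous reference" message, so identical candidates are not
// printed twice. Not Spark's actual fix for SPARK-34476.
object AmbiguousRefMessage {
  def format(ref: String, candidates: Seq[String]): String = {
    val distinctNames = candidates.distinct // drop repeated qualified names
    s"Reference '$ref' is ambiguous, could be: ${distinctNames.mkString(", ")}."
  }

  def main(args: Array[String]): Unit = {
    // Two candidates that stringify identically, as in the report:
    val dup = Seq(
      "mycatalog.test.person.phone->'key'->1->'m'->2->>'b'",
      "mycatalog.test.person.phone->'key'->1->'m'->2->>'b'")
    println(format("phone->'key'->1->'m'->2->>'b'", dup))
  }
}
```

Deduplicating only hides the deeper question the bug raises, namely why two genuinely distinct attributes stringify identically, so the message-side change is a mitigation rather than a root-cause fix.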
[jira] [Created] (SPARK-34499) improve the catalyst expression dsl internal APIs
Wenchen Fan created SPARK-34499: --- Summary: improve the catalyst expression dsl internal APIs Key: SPARK-34499 URL: https://issues.apache.org/jira/browse/SPARK-34499 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288423#comment-17288423 ] Sean R. Owen commented on SPARK-7768: - To be clear I just opened it. [~viirya] did much more actual work to improve the API - and it is possible it still changes later. Use at your own risk. But this keeps the status quo for Java 9+ > Make user-defined type (UDT) API public > --- > > Key: SPARK-7768 > URL: https://issues.apache.org/jira/browse/SPARK-7768 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Xiangrui Meng >Assignee: Sean R. Owen >Priority: Critical > Fix For: 3.2.0 > > > As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it > would be nice to make the UDT API public in 1.5. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288388#comment-17288388 ] Simeon H.K. Fitch commented on SPARK-7768: -- [~srowen] Thanks so much for all your excellent work on this! > Make user-defined type (UDT) API public > --- > > Key: SPARK-7768 > URL: https://issues.apache.org/jira/browse/SPARK-7768 > Project: Spark > Issue Type: New Feature > Components: SQL >Reporter: Xiangrui Meng >Assignee: Sean R. Owen >Priority: Critical > Fix For: 3.2.0 > > > As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it > would be nice to make the UDT API public in 1.5. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34497) JDBC connection provider is not removing kerberos credentials from JVM security context
[ https://issues.apache.org/jira/browse/SPARK-34497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288387#comment-17288387 ] Gabor Somogyi commented on SPARK-34497: --- Working on this. > JDBC connection provider is not removing kerberos credentials from JVM > security context > --- > > Key: SPARK-34497 > URL: https://issues.apache.org/jira/browse/SPARK-34497 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2, 3.1.0 >Reporter: Gabor Somogyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34498) fix the remaining problems in SPARK-34432
[ https://issues.apache.org/jira/browse/SPARK-34498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288386#comment-17288386 ] Kevin Pis commented on SPARK-34498: --- I will fix the remaining problems. > fix the remaining problems in SPARK-34432 > - > > Key: SPARK-34498 > URL: https://issues.apache.org/jira/browse/SPARK-34498 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.1 >Reporter: Kevin Pis >Priority: Minor > > the remaining problems : > 1. we don't need to implement SessionConfigSupport in simple writable table > data source tests. remove it from both the Scala and Java versions. > 2. change the schema of `SimpleWritableDataSource`, to match `TestingV2Source` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34498) fix the remaining problems in SPARK-34432
[ https://issues.apache.org/jira/browse/SPARK-34498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Pis updated SPARK-34498: -- Description: the remaining problems : 1. we don't need to implement SessionConfigSupport in simple writable table data source tests. remove it from both the Scala and Java versions. 2. change the schema of `SimpleWritableDataSource`, to match `TestingV2Source` was: the remaining problems : 1. don't > fix the remaining problems in SPARK-34432 > - > > Key: SPARK-34498 > URL: https://issues.apache.org/jira/browse/SPARK-34498 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.1 >Reporter: Kevin Pis >Priority: Minor > > the remaining problems : > 1. we don't need to implement SessionConfigSupport in simple writable table > data source tests. remove it from both the Scala and Java versions. > 2. change the schema of `SimpleWritableDataSource`, to match `TestingV2Source` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34498) fix the remaining problems in SPARK-34432
[ https://issues.apache.org/jira/browse/SPARK-34498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Pis updated SPARK-34498: -- Summary: fix the remaining problems in SPARK-34432 (was: don't implement SessionConfigSupport in simple writable table data source) > fix the remaining problems in SPARK-34432 > - > > Key: SPARK-34498 > URL: https://issues.apache.org/jira/browse/SPARK-34498 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.0.1 >Reporter: Kevin Pis >Priority: Minor > > the remaining problems : > 1. don't -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34498) don't implement SessionConfigSupport in simple writable table data source
Kevin Pis created SPARK-34498: - Summary: don't implement SessionConfigSupport in simple writable table data source Key: SPARK-34498 URL: https://issues.apache.org/jira/browse/SPARK-34498 Project: Spark Issue Type: Sub-task Components: SQL, Tests Affects Versions: 3.0.1 Reporter: Kevin Pis the remaining problems : 1. don't -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34497) JDBC connection provider is not removing kerberos credentials from JVM security context
Gabor Somogyi created SPARK-34497: - Summary: JDBC connection provider is not removing kerberos credentials from JVM security context Key: SPARK-34497 URL: https://issues.apache.org/jira/browse/SPARK-34497 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.2, 3.1.0 Reporter: Gabor Somogyi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34496) Upgrade ZSTD-JNI to 1.4.8-5 for API compatibility
[ https://issues.apache.org/jira/browse/SPARK-34496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34496: Assignee: Dongjoon Hyun > Upgrade ZSTD-JNI to 1.4.8-5 for API compatibility > > > Key: SPARK-34496 > URL: https://issues.apache.org/jira/browse/SPARK-34496 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34496) Upgrade ZSTD-JNI to 1.4.8-5 for API compatibility
[ https://issues.apache.org/jira/browse/SPARK-34496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-34496. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31609 [https://github.com/apache/spark/pull/31609] > Upgrade ZSTD-JNI to 1.4.8-5 for API compatibility > > > Key: SPARK-34496 > URL: https://issues.apache.org/jira/browse/SPARK-34496 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-34473:
-----------------------------------

    Assignee: Wenchen Fan

> avoid NPE in DataFrameReader.schema(StructType)
> -----------------------------------------------
>
>                 Key: SPARK-34473
>                 URL: https://issues.apache.org/jira/browse/SPARK-34473
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
[jira] [Resolved] (SPARK-34473) avoid NPE in DataFrameReader.schema(StructType)
[ https://issues.apache.org/jira/browse/SPARK-34473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34473.
---------------------------------
    Fix Version/s: 3.1.2
                   3.2.0
       Resolution: Fixed

Issue resolved by pull request 31593
[https://github.com/apache/spark/pull/31593]

> avoid NPE in DataFrameReader.schema(StructType)
> -----------------------------------------------
>
>                 Key: SPARK-34473
>                 URL: https://issues.apache.org/jira/browse/SPARK-34473
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>          Fix For: 3.2.0, 3.1.2
[jira] [Assigned] (SPARK-34488) Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
[ https://issues.apache.org/jira/browse/SPARK-34488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34488:
------------------------------------

    Assignee: (was: Apache Spark)

> Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34488
>                 URL: https://issues.apache.org/jira/browse/SPARK-34488
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.0.2
>            Reporter: Ron Hu
>            Priority: Major
>         Attachments: executorMetricsDistributions.json, taskMetricsDistributions.json
>
> For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among the tasks in a given stage. We list an example in [^taskMetricsDistributions.json]
> Similarly, it is useful to show the executor metrics in percentile distribution for a specific stage. This information can show whether or not there is a skewed load on some executors. We list an example in [^executorMetricsDistributions.json]
>
> We define the withSummaries query parameter in the REST API for a specific stage as:
> applications///?withSummaries=[true|false]
> When withSummaries=true, both the task metrics and the executor metrics in percentile distribution are included in the REST API output. The default value of withSummaries is false, i.e. no metrics percentile distribution will be included in the REST API output.
[jira] [Commented] (SPARK-34488) Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
[ https://issues.apache.org/jira/browse/SPARK-34488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288312#comment-17288312 ]

Apache Spark commented on SPARK-34488:
--------------------------------------

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/31611

> Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34488
>                 URL: https://issues.apache.org/jira/browse/SPARK-34488
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.0.2
>            Reporter: Ron Hu
>            Priority: Major
>         Attachments: executorMetricsDistributions.json, taskMetricsDistributions.json
>
> For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among the tasks in a given stage. We list an example in [^taskMetricsDistributions.json]
> Similarly, it is useful to show the executor metrics in percentile distribution for a specific stage. This information can show whether or not there is a skewed load on some executors. We list an example in [^executorMetricsDistributions.json]
>
> We define the withSummaries query parameter in the REST API for a specific stage as:
> applications///?withSummaries=[true|false]
> When withSummaries=true, both the task metrics and the executor metrics in percentile distribution are included in the REST API output. The default value of withSummaries is false, i.e. no metrics percentile distribution will be included in the REST API output.
[jira] [Assigned] (SPARK-34488) Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
[ https://issues.apache.org/jira/browse/SPARK-34488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34488:
------------------------------------

    Assignee: Apache Spark

> Support task Metrics Distributions and executor Metrics Distributions in the REST API call for a specified stage
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-34488
>                 URL: https://issues.apache.org/jira/browse/SPARK-34488
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 3.0.2
>            Reporter: Ron Hu
>            Assignee: Apache Spark
>            Priority: Major
>         Attachments: executorMetricsDistributions.json, taskMetricsDistributions.json
>
> For a specific stage, it is useful to show the task metrics in percentile distribution. This information can help users know whether or not there is a skew/bottleneck among the tasks in a given stage. We list an example in [^taskMetricsDistributions.json]
> Similarly, it is useful to show the executor metrics in percentile distribution for a specific stage. This information can show whether or not there is a skewed load on some executors. We list an example in [^executorMetricsDistributions.json]
>
> We define the withSummaries query parameter in the REST API for a specific stage as:
> applications///?withSummaries=[true|false]
> When withSummaries=true, both the task metrics and the executor metrics in percentile distribution are included in the REST API output. The default value of withSummaries is false, i.e. no metrics percentile distribution will be included in the REST API output.
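The withSummaries parameter described above can be sketched against Spark's documented monitoring REST API, whose stage endpoint takes the form /api/v1/applications/[app-id]/stages/[stage-id]. The host, port, application ID, and stage ID below are hypothetical placeholders, not values from this issue:

```python
# Sketch: build (and optionally fetch) a stage-metrics URL with the proposed
# withSummaries query parameter. All IDs below are hypothetical placeholders.
import urllib.parse

base = "http://localhost:18080/api/v1"        # history server, assumed default port
app_id = "app-20210222000000-0001"            # hypothetical application ID
stage_id = 3                                  # hypothetical stage ID

# withSummaries=true requests the task/executor metrics percentile
# distributions; the default (false) omits them, per the issue description.
query = urllib.parse.urlencode({"withSummaries": "true"})
url = f"{base}/applications/{app_id}/stages/{stage_id}?{query}"
print(url)

# To actually fetch the JSON (requires a running history server):
# import json, urllib.request
# stage = json.load(urllib.request.urlopen(url))
```

Only the query-string handling here is taken from the issue; the response shape would follow the attached taskMetricsDistributions.json / executorMetricsDistributions.json examples.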
[jira] [Resolved] (SPARK-27500) Add tests for built-in Hive 2.3
[ https://issues.apache.org/jira/browse/SPARK-27500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang resolved SPARK-27500.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

> Add tests for built-in Hive 2.3
> -------------------------------
>
>                 Key: SPARK-27500
>                 URL: https://issues.apache.org/jira/browse/SPARK-27500
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>          Fix For: 3.0.0
>
> Spark will use some of the new features and bug fixes of Hive 2.3, and we should add tests for these. This is an umbrella JIRA for tracking this.
[jira] [Assigned] (SPARK-27500) Add tests for built-in Hive 2.3
[ https://issues.apache.org/jira/browse/SPARK-27500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang reassigned SPARK-27500:
-----------------------------------

    Assignee: Yuming Wang

> Add tests for built-in Hive 2.3
> -------------------------------
>
>                 Key: SPARK-27500
>                 URL: https://issues.apache.org/jira/browse/SPARK-27500
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>
> Spark will use some of the new features and bug fixes of Hive 2.3, and we should add tests for these. This is an umbrella JIRA for tracking this.
[jira] [Resolved] (SPARK-34432) add a java implementation for the simple writable data source
[ https://issues.apache.org/jira/browse/SPARK-34432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34432.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 31560
[https://github.com/apache/spark/pull/31560]

> add a java implementation for the simple writable data source
> -------------------------------------------------------------
>
>                 Key: SPARK-34432
>                 URL: https://issues.apache.org/jira/browse/SPARK-34432
>             Project: Spark
>          Issue Type: Test
>          Components: SQL, Tests
>    Affects Versions: 3.0.1
>            Reporter: Kevin Pis
>            Priority: Minor
>          Fix For: 3.2.0
>
> This is a followup of https://github.com/apache/spark/pull/19269
> In #19269, there is only a Scala implementation of the simple writable data source in `DataSourceV2Suite`.
> This PR adds a Java implementation of it.
[jira] [Assigned] (SPARK-34450) Unify v1 and v2 ALTER TABLE .. RENAME tests
[ https://issues.apache.org/jira/browse/SPARK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-34450:
-----------------------------------

    Assignee: Maxim Gekk

> Unify v1 and v2 ALTER TABLE .. RENAME tests
> -------------------------------------------
>
>                 Key: SPARK-34450
>                 URL: https://issues.apache.org/jira/browse/SPARK-34450
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>
> Extract the ALTER TABLE .. RENAME tests to a common place so they run against both the v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites.
[jira] [Resolved] (SPARK-34450) Unify v1 and v2 ALTER TABLE .. RENAME tests
[ https://issues.apache.org/jira/browse/SPARK-34450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-34450.
---------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 31575
[https://github.com/apache/spark/pull/31575]

> Unify v1 and v2 ALTER TABLE .. RENAME tests
> -------------------------------------------
>
>                 Key: SPARK-34450
>                 URL: https://issues.apache.org/jira/browse/SPARK-34450
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Maxim Gekk
>            Assignee: Maxim Gekk
>            Priority: Major
>          Fix For: 3.2.0
>
> Extract the ALTER TABLE .. RENAME tests to a common place so they run against both the v1 and v2 datasources. Some tests can be placed in v1- and v2-specific test suites.