[jira] [Comment Edited] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267059#comment-17267059 ] Peter Toth edited comment on SPARK-29890 at 1/18/21, 7:57 AM: -- [~imback82], yes, that's a good example. `fill` didn't throw any exception before this ticket. was (Author: petertoth): [~imback82], yes, that's a good example. `fill` didn't throw any exception before this PR. > Unable to fill na with 0 with duplicate columns > --- > > Key: SPARK-29890 > URL: https://issues.apache.org/jira/browse/SPARK-29890 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.3 >Reporter: sandeshyapuram >Assignee: Terry Kim >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > Trying to fill out na values with 0. > {noformat} > scala> :paste > // Entering paste mode (ctrl-D to finish) > val parent = > spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc") > val c1 = parent.filter(lit(true)) > val c2 = parent.filter(lit(true)) > c1.join(c2, Seq("nums"), "left") > .na.fill(0).show{noformat} > {noformat} > 9/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: > error looking up the name of group 820818257: No such file or directory > org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could > be: abc, abc.; > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:117) > at org.apache.spark.sql.Dataset.resolve(Dataset.scala:220) > at org.apache.spark.sql.Dataset.col(Dataset.scala:1246) > at > org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:500) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:171) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:155) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134) > ... 54 elided{noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267059#comment-17267059 ] Peter Toth commented on SPARK-29890: [~imback82], yes, that's a good example. `fill` didn't throw any exception before this PR. > Unable to fill na with 0 with duplicate columns > --- > > Key: SPARK-29890 > URL: https://issues.apache.org/jira/browse/SPARK-29890 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.3 >Reporter: sandeshyapuram >Assignee: Terry Kim >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > Trying to fill out na values with 0. > {noformat} > scala> :paste > // Entering paste mode (ctrl-D to finish) > val parent = > spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc") > val c1 = parent.filter(lit(true)) > val c2 = parent.filter(lit(true)) > c1.join(c2, Seq("nums"), "left") > .na.fill(0).show{noformat} > {noformat} > 9/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: > error looking up the name of group 820818257: No such file or directory > org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could > be: abc, abc.; > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:117) > at org.apache.spark.sql.Dataset.resolve(Dataset.scala:220) > at org.apache.spark.sql.Dataset.col(Dataset.scala:1246) > at > org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:500) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:171) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:155) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134) > ... 54 elided{noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
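For Spark versions without the fix, a hedged workaround is to disambiguate the duplicate column names before calling na.fill. The sketch below reuses the names from the repro above; the positional rename via toDF is one possible way to avoid the ambiguity, not the fix that shipped.
{code:scala}
// Workaround sketch (pre-fix versions): rename the join output positionally
// so no column name is ambiguous, then na.fill can resolve every column.
val joined = c1.join(c2, Seq("nums"), "left")        // columns: nums, abc, abc
val deduped = joined.toDF("nums", "abc_l", "abc_r")  // positional rename removes duplicates
deduped.na.fill(0).show()
{code}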
[jira] [Created] (SPARK-34150) Strip Null literal.sql in resolve alias
ulysses you created SPARK-34150: --- Summary: Strip Null literal.sql in resolve alias Key: SPARK-34150 URL: https://issues.apache.org/jira/browse/SPARK-34150 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: ulysses you We will convert `Literal(null)` to the target data type during analysis. The generated alias name will then include something like `CAST(NULL AS INT)` instead of `NULL`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
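A minimal sketch of the alias behavior described above, assuming a spark-shell session; the exact alias text can vary across Spark versions:
{code:scala}
// A null literal cast to a concrete type during analysis gets an
// auto-generated alias that exposes the cast instead of plain NULL.
import org.apache.spark.sql.functions.lit
val df = spark.range(1).select(lit(null).cast("int"))
println(df.columns.mkString(", "))  // e.g. "CAST(NULL AS INT)" rather than "NULL"
{code}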
[jira] [Updated] (SPARK-34149) DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache
[ https://issues.apache.org/jira/browse/SPARK-34149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-34149: --- Description: For example, the test below: {code:scala} test("SPARK-X: refresh cache in partition adding") { withNamespaceAndTable("ns", "tbl") { t => sql(s"CREATE TABLE $t (part int) $defaultUsing PARTITIONED BY (part)") sql(s"ALTER TABLE $t ADD PARTITION (part=0)") assert(!spark.catalog.isCached(t)) sql(s"CACHE TABLE $t") assert(spark.catalog.isCached(t)) checkAnswer(sql(s"SELECT * FROM $t"), Row(0)) sql(s"ALTER TABLE $t ADD PARTITION (part=1)") assert(spark.catalog.isCached(t)) checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0), Row(1))) } } {code} fails with: {code} !== Correct Answer - 2 == == Spark Answer - 1 == !struct<> struct [0][0] ![1] ScalaTestFailureLocation: org.apache.spark.sql.QueryTest$ at (QueryTest.scala:243) {code} because the command doesn't refresh the cache. > DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache > - > > Key: SPARK-34149 > URL: https://issues.apache.org/jira/browse/SPARK-34149 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > For example, the test below: > {code:scala} > test("SPARK-X: refresh cache in partition adding") { > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (part int) $defaultUsing PARTITIONED BY (part)") > sql(s"ALTER TABLE $t ADD PARTITION (part=0)") > assert(!spark.catalog.isCached(t)) > sql(s"CACHE TABLE $t") > assert(spark.catalog.isCached(t)) > checkAnswer(sql(s"SELECT * FROM $t"), Row(0)) > sql(s"ALTER TABLE $t ADD PARTITION (part=1)") > assert(spark.catalog.isCached(t)) > checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0), Row(1))) > } > } > {code} > fails with: > {code} > !== Correct Answer - 2 == == Spark Answer - 1 == > !struct<> struct > [0][0] > ![1] > > > ScalaTestFailureLocation: org.apache.spark.sql.QueryTest$ at > (QueryTest.scala:243) > {code} > because the command doesn't refresh the cache. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
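Until the DSv2 command refreshes the cache itself, a hedged workaround is an explicit REFRESH TABLE after the partition change; this sketch reuses the helpers from the test above:
{code:scala}
// Workaround sketch: refresh the cached table explicitly after ADD PARTITION
// so the newly added partition becomes visible in the cached data.
sql(s"ALTER TABLE $t ADD PARTITION (part=1)")
sql(s"REFRESH TABLE $t")  // re-reads the table and repopulates the cache
checkAnswer(sql(s"SELECT * FROM $t"), Seq(Row(0), Row(1)))
{code}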
[jira] [Created] (SPARK-34149) DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache
Maxim Gekk created SPARK-34149: -- Summary: DSv2: `ALTER TABLE .. ADD PARTITION` does not refresh table cache Key: SPARK-34149 URL: https://issues.apache.org/jira/browse/SPARK-34149 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34064) Broadcast job is not aborted even the SQL statement canceled
[ https://issues.apache.org/jira/browse/SPARK-34064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267055#comment-17267055 ] Apache Spark commented on SPARK-34064: -- User 'LantaoJin' has created a pull request for this issue: https://github.com/apache/spark/pull/31227 > Broadcast job is not aborted even the SQL statement canceled > > > Key: SPARK-34064 > URL: https://issues.apache.org/jira/browse/SPARK-34064 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.2.0, 3.1.1 >Reporter: Lantao Jin >Priority: Minor > Attachments: Screen Shot 2021-01-11 at 12.03.13 PM.png > > > SPARK-27036 introduced a runId for BroadcastExchangeExec to resolve the > problem that a broadcast job is not aborted when a broadcast timeout happens. > Since the runId is a random UUID, when a SQL statement is cancelled, these > broadcast sub-jobs are still not canceled as a whole. > !Screen Shot 2021-01-11 at 12.03.13 PM.png|width=100%! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
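For context, Spark's usual mechanism for cancelling a statement's jobs as a whole is the job group; the broadcast sub-jobs described above escape it because they run under their own random runId. A minimal sketch of the job-group mechanism (the group name is illustrative):
{code:scala}
// Jobs submitted while a job group is set can be cancelled together.
sc.setJobGroup("stmt-42", "my SQL statement", interruptOnCancel = true)
// ... actions that submit jobs for the statement ...
sc.cancelJobGroup("stmt-42")  // cancels all jobs submitted under this group
{code}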
[jira] [Updated] (SPARK-33354) New explicit cast syntax rules in ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33354: - Fix Version/s: (was: 3.1.1) 3.1.0 > New explicit cast syntax rules in ANSI mode > --- > > Key: SPARK-33354 > URL: https://issues.apache.org/jira/browse/SPARK-33354 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.1.0 > > > In section 6.13 of the ANSI SQL standard, there are syntax rules for valid > combinations of the source and target data types. > To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the > following castings in ANSI mode: > {code:java} > TimeStamp <=> Boolean > Date <=> Boolean > Numeric <=> Timestamp > Numeric <=> Date > Numeric <=> Binary > String <=> Array > String <=> Map > String <=> Struct > {code} > The following castings are considered invalid in the ANSI SQL standard, but they > are quite straightforward. Let's allow them for now: > {code:java} > Numeric <=> Boolean > String <=> Boolean > String <=> Binary > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33354) New explicit cast syntax rules in ANSI mode
[ https://issues.apache.org/jira/browse/SPARK-33354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33354: - Fix Version/s: (was: 3.1.0) 3.1.1 > New explicit cast syntax rules in ANSI mode > --- > > Key: SPARK-33354 > URL: https://issues.apache.org/jira/browse/SPARK-33354 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.1.1 > > > In section 6.13 of the ANSI SQL standard, there are syntax rules for valid > combinations of the source and target data types. > To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the > following castings in ANSI mode: > {code:java} > TimeStamp <=> Boolean > Date <=> Boolean > Numeric <=> Timestamp > Numeric <=> Date > Numeric <=> Binary > String <=> Array > String <=> Map > String <=> Struct > {code} > The following castings are considered invalid in the ANSI SQL standard, but they > are quite straightforward. Let's allow them for now: > {code:java} > Numeric <=> Boolean > String <=> Boolean > String <=> Binary > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
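A hedged example of the proposed behavior with ANSI mode enabled (the exact error is not specified here, only that the cast should be rejected):
{code:scala}
// With spark.sql.ansi.enabled=true, a Numeric <=> Timestamp explicit cast
// would be disallowed under the proposed rules, while Numeric <=> Boolean
// stays allowed.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT CAST(1 AS TIMESTAMP)")  // expected to fail analysis
spark.sql("SELECT CAST(1 AS BOOLEAN)")    // still allowed
{code}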
[jira] [Updated] (SPARK-33819) SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
[ https://issues.apache.org/jira/browse/SPARK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33819: - Fix Version/s: (was: 3.1.0) 3.1.1 > SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be > `package private` > --- > > Key: SPARK-33819 > URL: https://issues.apache.org/jira/browse/SPARK-33819 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: releasenotes > Fix For: 3.0.2, 3.2.0, 3.1.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34069) Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL
[ https://issues.apache.org/jira/browse/SPARK-34069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34069: - Fix Version/s: (was: 3.1.0) 3.1.1 > Kill barrier tasks should respect SPARK_JOB_INTERRUPT_ON_CANCEL > --- > > Key: SPARK-34069 > URL: https://issues.apache.org/jira/browse/SPARK-34069 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: ulysses you >Assignee: ulysses you >Priority: Major > Fix For: 3.1.1 > > > We should interrupt the task thread if the user sets the local property > `SPARK_JOB_INTERRUPT_ON_CANCEL` to true. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
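The constant SPARK_JOB_INTERRUPT_ON_CANCEL maps to the local property shown below; a minimal sketch of opting in from user code:
{code:scala}
// Ask Spark to interrupt task threads when the job is cancelled.
sc.setLocalProperty("spark.job.interruptOnCancel", "true")
// Equivalently, via a job group:
sc.setJobGroup("barrier-demo", "barrier stage", interruptOnCancel = true)
{code}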
[jira] [Updated] (SPARK-34103) Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x
[ https://issues.apache.org/jira/browse/SPARK-34103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34103: - Fix Version/s: (was: 3.1.0) 3.1.1 > Fix MiMaExcludes by moving SPARK-23429 from 2.4.x to 3.0.x > -- > > Key: SPARK-34103 > URL: https://issues.apache.org/jira/browse/SPARK-34103 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.0.0, 3.0.1, 3.1.0, 3.2.0, 3.1.1 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.2, 3.1.1 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34041) Miscellaneous cleanup for new PySpark documentation
[ https://issues.apache.org/jira/browse/SPARK-34041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34041: - Fix Version/s: (was: 3.1.0) 3.1.1 > Miscellaneous cleanup for new PySpark documentation > --- > > Key: SPARK-34041 > URL: https://issues.apache.org/jira/browse/SPARK-34041 > Project: Spark > Issue Type: Sub-task > Components: docs >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.1.1 > > > 1. Add a link to the quick start in PySpark docs under "Programming Guides" in > the Spark main docs > 2. ML MLlib -> MLlib (DataFrame-based)" and "MLlib (RDD-based)" > 3. Mention the MLlib user guide > (https://dist.apache.org/repos/dist/dev/spark/v3.1.0-rc1-docs/_site/ml-guide.html) > 4. Mention the other migration guides as well because PySpark can be affected by > them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33100) Support parse the sql statements with c-style comments
[ https://issues.apache.org/jira/browse/SPARK-33100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33100: - Fix Version/s: (was: 3.1.0) 3.1.1 > Support parse the sql statements with c-style comments > -- > > Key: SPARK-33100 > URL: https://issues.apache.org/jira/browse/SPARK-33100 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 >Reporter: feiwang >Assignee: feiwang >Priority: Minor > Fix For: 3.0.2, 3.2.0, 3.1.1 > > > Currently, spark-sql does not support parsing SQL statements with C-style > comments. > The SQL statements: > {code:java} > /* SELECT 'test'; */ > SELECT 'test'; > {code} > would be split into two statements: > The first: "/* SELECT 'test'" > The second: "*/ SELECT 'test'" > It would then throw an exception because the first one is illegal. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33796) Show hidden text from the left menu of Spark Doc
[ https://issues.apache.org/jira/browse/SPARK-33796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-33796: - Fix Version/s: (was: 3.1.0) 3.1.1 > Show hidden text from the left menu of Spark Doc > > > Key: SPARK-33796 > URL: https://issues.apache.org/jira/browse/SPARK-33796 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.0.0, 3.0.1, 3.1.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.1.1 > > > If the text in the left menu of Spark is too long, it will be hidden. We > should fix it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30681) Add higher order functions API to PySpark
[ https://issues.apache.org/jira/browse/SPARK-30681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-30681: - Fix Version/s: (was: 3.1.1) 3.1.0 > Add higher order functions API to PySpark > - > > Key: SPARK-30681 > URL: https://issues.apache.org/jira/browse/SPARK-30681 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.1.0 > > > As of 3.0.0 higher order functions are available in SQL and Scala, but not in > PySpark, forcing Python users to invoke these through {{expr}}, > {{selectExpr}} or {{sql}}. > This is error-prone and not well documented. Spark should provide > {{pyspark.sql}} wrappers that accept plain Python functions (of course within > limits of {{(*Column) -> Column}}) as arguments. > {code:python} > df.select(transform("values", lambda c: trim(upper(c)))) > def increment_values(k: Column, v: Column) -> Column: > return v + 1 > df.select(transform_values("data", increment_values)) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30681) Add higher order functions API to PySpark
[ https://issues.apache.org/jira/browse/SPARK-30681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-30681: - Fix Version/s: (was: 3.1.0) 3.1.1 > Add higher order functions API to PySpark > - > > Key: SPARK-30681 > URL: https://issues.apache.org/jira/browse/SPARK-30681 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.0.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.1.1 > > > As of 3.0.0 higher order functions are available in SQL and Scala, but not in > PySpark, forcing Python users to invoke these through {{expr}}, > {{selectExpr}} or {{sql}}. > This is error-prone and not well documented. Spark should provide > {{pyspark.sql}} wrappers that accept plain Python functions (of course within > limits of {{(*Column) -> Column}}) as arguments. > {code:python} > df.select(transform("values", lambda c: trim(upper(c)))) > def increment_values(k: Column, v: Column) -> Column: > return v + 1 > df.select(transform_values("data", increment_values)) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
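For reference, the Scala side already accepts plain functions; a hedged sketch mirroring the Python example above, assuming a DataFrame df with an array column "values" and a map column "data":
{code:scala}
// Scala counterpart: transform and transform_values take Column functions
// directly (available in org.apache.spark.sql.functions since 3.0).
import org.apache.spark.sql.functions.{col, transform, transform_values, trim, upper}
df.select(transform(col("values"), c => trim(upper(c))))
df.select(transform_values(col("data"), (k, v) => v + 1))
{code}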
[jira] [Updated] (SPARK-34021) Fix hyper links in SparkR documentation for CRAN submission
[ https://issues.apache.org/jira/browse/SPARK-34021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34021: - Fix Version/s: (was: 3.1.0) 3.1.1 > Fix hyper links in SparkR documentation for CRAN submission > --- > > Key: SPARK-34021 > URL: https://issues.apache.org/jira/browse/SPARK-34021 > Project: Spark > Issue Type: Task > Components: SparkR >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Blocker > Fix For: 3.1.1 > > > CRAN submission fails due to: > {code} >Found the following (possibly) invalid URLs: > URL: http://jsonlines.org/ (moved to https://jsonlines.org/) >From: man/read.json.Rd > man/write.json.Rd >Status: 200 >Message: OK > URL: https://dl.acm.org/citation.cfm?id=1608614 (moved to > https://dl.acm.org/doi/10.1109/MC.2009.263) >From: inst/doc/sparkr-vignettes.html >Status: 200 >Message: OK > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34139) UnresolvedRelation should retain SQL text position
[ https://issues.apache.org/jira/browse/SPARK-34139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-34139: --- Assignee: Terry Kim > UnresolvedRelation should retain SQL text position > -- > > Key: SPARK-34139 > URL: https://issues.apache.org/jira/browse/SPARK-34139 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > > UnresolvedRelation should retain SQL text position. The following commands > will be handled: > {code:java} > CACHE TABLE unknown > UNCACHE TABLE unknown > DELETE FROM unknown > UPDATE unknown SET name='abc' > MERGE INTO unknown1 AS target USING unknown2 AS source ON target.col = > source.col WHEN MATCHED THEN DELETE > INSERT INTO TABLE unknown SELECT 1 > INSERT OVERWRITE TABLE unknown VALUES (1, 'a') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34139) UnresolvedRelation should retain SQL text position
[ https://issues.apache.org/jira/browse/SPARK-34139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-34139. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31209 [https://github.com/apache/spark/pull/31209] > UnresolvedRelation should retain SQL text position > -- > > Key: SPARK-34139 > URL: https://issues.apache.org/jira/browse/SPARK-34139 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.2.0 > > > UnresolvedRelation should retain SQL text position. The following commands > will be handled: > {code:java} > CACHE TABLE unknown > UNCACHE TABLE unknown > DELETE FROM unknown > UPDATE unknown SET name='abc' > MERGE INTO unknown1 AS target USING unknown2 AS source ON target.col = > source.col WHEN MATCHED THEN DELETE > INSERT INTO TABLE unknown SELECT 1 > INSERT OVERWRITE TABLE unknown VALUES (1, 'a') > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-33696) Upgrade built-in Hive to 2.3.8
[ https://issues.apache.org/jira/browse/SPARK-33696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-33696: - Assignee: Yuming Wang > Upgrade built-in Hive to 2.3.8 > -- > > Key: SPARK-33696 > URL: https://issues.apache.org/jira/browse/SPARK-33696 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > Hive 2.3.8 changes: > HIVE-19662: Upgrade Avro to 1.8.2 > HIVE-24324: Remove deprecated API usage from Avro > HIVE-23980: Shade Guava from hive-exec in Hive 2.3 > HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue > HIVE-24512: Exclude calcite in packaging. > HIVE-22708: Fix for HttpTransport to replace String.equals -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-33696) Upgrade built-in Hive to 2.3.8
[ https://issues.apache.org/jira/browse/SPARK-33696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-33696. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30657 [https://github.com/apache/spark/pull/30657] > Upgrade built-in Hive to 2.3.8 > -- > > Key: SPARK-33696 > URL: https://issues.apache.org/jira/browse/SPARK-33696 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.2.0 > > > Hive 2.3.8 changes: > HIVE-19662: Upgrade Avro to 1.8.2 > HIVE-24324: Remove deprecated API usage from Avro > HIVE-23980: Shade Guava from hive-exec in Hive 2.3 > HIVE-24436: Fix Avro NULL_DEFAULT_VALUE compatibility issue > HIVE-24512: Exclude calcite in packaging. > HIVE-22708: Fix for HttpTransport to replace String.equals -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33913) Upgrade Kafka to 2.7.0
[ https://issues.apache.org/jira/browse/SPARK-33913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267028#comment-17267028 ] Apache Spark commented on SPARK-33913: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31223 > Upgrade Kafka to 2.7.0 > -- > > Key: SPARK-33913 > URL: https://issues.apache.org/jira/browse/SPARK-33913 > Project: Spark > Issue Type: Improvement > Components: Build, DStreams >Affects Versions: 3.2.0 >Reporter: dengziming >Priority: Major > > > The Apache Kafka community has released Apache Kafka 2.7.0. Some features > are useful, for example KAFKA-9893: > configurable TCP connection timeout. More details: > https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33913) Upgrade Kafka to 2.7.0
[ https://issues.apache.org/jira/browse/SPARK-33913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17267027#comment-17267027 ] Apache Spark commented on SPARK-33913: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31223 > Upgrade Kafka to 2.7.0 > -- > > Key: SPARK-33913 > URL: https://issues.apache.org/jira/browse/SPARK-33913 > Project: Spark > Issue Type: Improvement > Components: Build, DStreams >Affects Versions: 3.2.0 >Reporter: dengziming >Priority: Major > > > The Apache Kafka community has released Apache Kafka 2.7.0. Some features > are useful, for example KAFKA-9893: > configurable TCP connection timeout. More details: > https://downloads.apache.org/kafka/2.7.0/RELEASE_NOTES.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
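KAFKA-9893 (KIP-601) adds client-side connection setup timeouts; a hedged sketch of passing them through Spark's Kafka source via the kafka. option prefix. The config names below are assumptions taken from KIP-601 and should be verified against the Kafka 2.7.0 release notes; servers and topic are illustrative:
{code:scala}
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")                // illustrative
  .option("subscribe", "events")                                    // illustrative
  .option("kafka.socket.connection.setup.timeout.ms", "10000")      // assumed KIP-601 name
  .option("kafka.socket.connection.setup.timeout.max.ms", "30000")  // assumed KIP-601 name
  .load()
{code}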
[jira] [Commented] (SPARK-30682) Add higher order functions API to SparkR
[ https://issues.apache.org/jira/browse/SPARK-30682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266998#comment-17266998 ] Apache Spark commented on SPARK-30682: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/31226 > Add higher order functions API to SparkR > > > Key: SPARK-30682 > URL: https://issues.apache.org/jira/browse/SPARK-30682 > Project: Spark > Issue Type: Improvement > Components: SparkR, SQL >Affects Versions: 3.0.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.1.0 > > > As of 3.0.0 higher order functions are available in SQL and Scala, but not in > SparkR, forcing R users to invoke these through {{expr}}, {{selectExpr}} or > {{sql}}. > It would be great if Spark provided high-level wrappers that accept plain R > functions operating on SQL expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30682) Add higher order functions API to SparkR
[ https://issues.apache.org/jira/browse/SPARK-30682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266997#comment-17266997 ] Apache Spark commented on SPARK-30682: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/31226 > Add higher order functions API to SparkR > > > Key: SPARK-30682 > URL: https://issues.apache.org/jira/browse/SPARK-30682 > Project: Spark > Issue Type: Improvement > Components: SparkR, SQL >Affects Versions: 3.0.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.1.0 > > > As of 3.0.0 higher order functions are available in SQL and Scala, but not in > SparkR, forcing R users to invoke these through {{expr}}, {{selectExpr}} or > {{sql}}. > It would be great if Spark provided high-level wrappers that accept plain R > functions operating on SQL expressions. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33819) SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
[ https://issues.apache.org/jira/browse/SPARK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266989#comment-17266989 ] Apache Spark commented on SPARK-33819: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/31225 > SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be > `package private` > --- > > Key: SPARK-33819 > URL: https://issues.apache.org/jira/browse/SPARK-33819 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: releasenotes > Fix For: 3.0.2, 3.1.0, 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33819) SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
[ https://issues.apache.org/jira/browse/SPARK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266988#comment-17266988 ] Apache Spark commented on SPARK-33819: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/31224 > SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be > `package private` > --- > > Key: SPARK-33819 > URL: https://issues.apache.org/jira/browse/SPARK-33819 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: releasenotes > Fix For: 3.0.2, 3.1.0, 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-33819) SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be `package private`
[ https://issues.apache.org/jira/browse/SPARK-33819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266987#comment-17266987 ] Apache Spark commented on SPARK-33819: -- User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/31224 > SingleFileEventLogFileReader/RollingEventLogFilesFileReader should be > `package private` > --- > > Key: SPARK-33819 > URL: https://issues.apache.org/jira/browse/SPARK-33819 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.2, 3.1.0, 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: releasenotes > Fix For: 3.0.2, 3.1.0, 3.2.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31168) Upgrade Scala to 2.12.13
[ https://issues.apache.org/jira/browse/SPARK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31168: -- Parent: SPARK-33772 Issue Type: Sub-task (was: Improvement) > Upgrade Scala to 2.12.13 > > > Key: SPARK-31168 > URL: https://issues.apache.org/jira/browse/SPARK-31168 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > h2. Highlights > * Performance improvements in the collections library: algorithmic > improvements and changes to avoid unnecessary allocations ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+label%3Alibrary%3Acollections+label%3Aperformance]) > * Performance improvements in the compiler ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+-label%3Alibrary%3Acollections+label%3Aperformance+], > minor [effects in our > benchmarks|https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?orgId=1&from=1567985515850&to=1584355915694&var-branch=2.12.x&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@]) > * Improvements to {{-Yrepl-class-based}}, an alternative internal REPL > encoding that avoids deadlocks (details on > [#8712|https://github.com/scala/scala/pull/8712]) > * A new {{-Yrepl-use-magic-imports}} flag that avoids deep class nesting in > the REPL, which can lead to deteriorating performance in long sessions > ([#8576|https://github.com/scala/scala/pull/8576]) > * Fix some {{toX}} methods that could expose the underlying mutability of a > {{ListBuffer}}-generated collection > ([#8674|https://github.com/scala/scala/pull/8674]) > h3. JDK 9+ support > * ASM was upgraded to 7.3.1, allowing the optimizer to run on JDK 13+ > ([#8676|https://github.com/scala/scala/pull/8676]) > * {{:javap}} in the REPL now works on JDK 9+ > ([#8400|https://github.com/scala/scala/pull/8400]) > h3. Other changes > * Support new labels for creating durations for consistency: > {{Duration("1m")}}, {{Duration("3 hrs")}} > ([#8325|https://github.com/scala/scala/pull/8325], > [#8450|https://github.com/scala/scala/pull/8450]) > * Fix memory leak in runtime reflection's {{TypeTag}} caches > ([#8470|https://github.com/scala/scala/pull/8470]) and some thread safety > issues in runtime reflection > ([#8433|https://github.com/scala/scala/pull/8433]) > * When using compiler plugins, the ordering of compiler phases may change > due to [#8427|https://github.com/scala/scala/pull/8427] > For more details, see [https://github.com/scala/scala/releases/tag/v2.12.11]. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
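A quick illustration of the duration labels mentioned in the highlights above (scala/scala#8325, #8450):
{code:scala}
import scala.concurrent.duration.Duration
val a = Duration("1m")     // parses as 1 minute
val b = Duration("3 hrs")  // parses as 3 hours
println(s"$a $b")
{code}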
[jira] [Commented] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266970#comment-17266970 ] Terry Kim commented on SPARK-29890: --- [~petertoth] Could you share the example and the behavior change? Are you referring to something like the following: {code:java} scala> Seq(1).toDF("i").na.fill(0, Seq("j")) org.apache.spark.sql.AnalysisException: Cannot resolve column name "j" among (i) {code} , which seems fine to me. > Unable to fill na with 0 with duplicate columns > --- > > Key: SPARK-29890 > URL: https://issues.apache.org/jira/browse/SPARK-29890 > Project: Spark > Issue Type: Bug > Components: Spark Shell >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.3, 2.4.3 >Reporter: sandeshyapuram >Assignee: Terry Kim >Priority: Major > Fix For: 2.4.5, 3.0.0 > > > Trying to fill out na values with 0. > {noformat} > scala> :paste > // Entering paste mode (ctrl-D to finish) > val parent = > spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc") > val c1 = parent.filter(lit(true)) > val c2 = parent.filter(lit(true)) > c1.join(c2, Seq("nums"), "left") > .na.fill(0).show{noformat} > {noformat} > 9/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: > error looking up the name of group 820818257: No such file or directory > org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could > be: abc, abc.; > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:117) > at org.apache.spark.sql.Dataset.resolve(Dataset.scala:220) > at org.apache.spark.sql.Dataset.col(Dataset.scala:1246) > at > org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:500) > at > org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186) > at > org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:171) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:155) > at > org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134) > ... 54 elided{noformat} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31168) Upgrade Scala to 2.12.13
[ https://issues.apache.org/jira/browse/SPARK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266969#comment-17266969 ] Apache Spark commented on SPARK-31168: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31223 > Upgrade Scala to 2.12.13 > > > Key: SPARK-31168 > URL: https://issues.apache.org/jira/browse/SPARK-31168 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > h2. Highlights > * Performance improvements in the collections library: algorithmic > improvements and changes to avoid unnecessary allocations ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+label%3Alibrary%3Acollections+label%3Aperformance]) > * Performance improvements in the compiler ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+-label%3Alibrary%3Acollections+label%3Aperformance+], > minor [effects in our > benchmarks|https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?orgId=1&from=1567985515850&to=1584355915694&var-branch=2.12.x&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@]) > * Improvements to {{-Yrepl-class-based}}, an alternative internal REPL > encoding that avoids deadlocks (details on > [#8712|https://github.com/scala/scala/pull/8712]) > * A new {{-Yrepl-use-magic-imports}} flag that avoids deep class nesting in > the REPL, which can lead to deteriorating performance in long sessions > ([#8576|https://github.com/scala/scala/pull/8576]) > * Fix some {{toX}} methods that could expose the underlying mutability of a > {{ListBuffer}}-generated collection > ([#8674|https://github.com/scala/scala/pull/8674]) > h3. JDK 9+ support > * ASM was upgraded to 7.3.1, allowing the optimizer to run on JDK 13+ > ([#8676|https://github.com/scala/scala/pull/8676]) > * {{:javap}} in the REPL now works on JDK 9+ > ([#8400|https://github.com/scala/scala/pull/8400]) > h3. Other changes > * Support new labels for creating durations for consistency: > {{Duration("1m")}}, {{Duration("3 hrs")}} > ([#8325|https://github.com/scala/scala/pull/8325], > [#8450|https://github.com/scala/scala/pull/8450]) > * Fix memory leak in runtime reflection's {{TypeTag}} caches > ([#8470|https://github.com/scala/scala/pull/8470]) and some thread safety > issues in runtime reflection > ([#8433|https://github.com/scala/scala/pull/8433]) > * When using compiler plugins, the ordering of compiler phases may change > due to [#8427|https://github.com/scala/scala/pull/8427] > For more details, see [https://github.com/scala/scala/releases/tag/v2.12.11]. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31168) Upgrade Scala to 2.12.13
[ https://issues.apache.org/jira/browse/SPARK-31168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266968#comment-17266968 ] Apache Spark commented on SPARK-31168: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/31223 > Upgrade Scala to 2.12.13 > > > Key: SPARK-31168 > URL: https://issues.apache.org/jira/browse/SPARK-31168 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > > h2. Highlights > * Performance improvements in the collections library: algorithmic > improvements and changes to avoid unnecessary allocations ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+label%3Alibrary%3Acollections+label%3Aperformance]) > * Performance improvements in the compiler ([list of > PRs|https://github.com/scala/scala/pulls?q=is%3Apr+milestone%3A2.12.11+is%3Aclosed+sort%3Acreated-desc+-label%3Alibrary%3Acollections+label%3Aperformance+], > minor [effects in our > benchmarks|https://scala-ci.typesafe.com/grafana/dashboard/db/scala-benchmark?orgId=1&from=1567985515850&to=1584355915694&var-branch=2.12.x&var-source=All&var-bench=HotScalacBenchmark.compile&var-host=scalabench@scalabench@]) > * Improvements to {{-Yrepl-class-based}}, an alternative internal REPL > encoding that avoids deadlocks (details on > [#8712|https://github.com/scala/scala/pull/8712]) > * A new {{-Yrepl-use-magic-imports}} flag that avoids deep class nesting in > the REPL, which can lead to deteriorating performance in long sessions > ([#8576|https://github.com/scala/scala/pull/8576]) > * Fix some {{toX}} methods that could expose the underlying mutability of a > {{ListBuffer}}-generated collection > ([#8674|https://github.com/scala/scala/pull/8674]) > h3. JDK 9+ support > * ASM was upgraded to 7.3.1, allowing the optimizer to run on JDK 13+ > ([#8676|https://github.com/scala/scala/pull/8676]) > * {{:javap}} in the REPL now works on JDK 9+ > ([#8400|https://github.com/scala/scala/pull/8400]) > h3. Other changes > * Support new labels for creating durations for consistency: > {{Duration("1m")}}, {{Duration("3 hrs")}} > ([#8325|https://github.com/scala/scala/pull/8325], > [#8450|https://github.com/scala/scala/pull/8450]) > * Fix memory leak in runtime reflection's {{TypeTag}} caches > ([#8470|https://github.com/scala/scala/pull/8470]) and some thread safety > issues in runtime reflection > ([#8433|https://github.com/scala/scala/pull/8433]) > * When using compiler plugins, the ordering of compiler phases may change > due to [#8427|https://github.com/scala/scala/pull/8427] > For more details, see [https://github.com/scala/scala/releases/tag/v2.12.11]. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34080) Add UnivariateFeatureSelector to deprecate existing selectors
[ https://issues.apache.org/jira/browse/SPARK-34080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266966#comment-17266966 ] Apache Spark commented on SPARK-34080: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/31222 > Add UnivariateFeatureSelector to deprecate existing selectors > - > > Key: SPARK-34080 > URL: https://issues.apache.org/jira/browse/SPARK-34080 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 3.2.0, 3.1.1 >Reporter: Xiangrui Meng >Assignee: Huaxin Gao >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > > In SPARK-26111, we introduced a few univariate feature selectors, which share > a common set of params and are named after the underlying test, which > requires users to understand the test to find the matching scenarios. It would > be nice to introduce a single class called UnivariateFeatureSelector that > accepts a selection criterion and a score method (string names). Then we can > deprecate all other univariate selectors. > For the params, instead of asking users to specify which score function to use, > it is friendlier to ask users to specify the feature and label types > (continuous or categorical), and we set a default score function for each > combo. We can also detect the types from feature metadata if given. Advanced > users can override it (if there are multiple score functions that are > compatible with the feature type and label type combo). Example (param names > are not finalized): > {code} > selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"], > labelCol=["target"], featureType="categorical", labelType="continuous", > select="bestK", k=100) > {code} > cc: [~huaxingao] [~ruifengz] [~weichenxu123] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34080) Add UnivariateFeatureSelector to deprecate existing selectors
[ https://issues.apache.org/jira/browse/SPARK-34080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266964#comment-17266964 ] Apache Spark commented on SPARK-34080: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/31222 > Add UnivariateFeatureSelector to deprecate existing selectors > - > > Key: SPARK-34080 > URL: https://issues.apache.org/jira/browse/SPARK-34080 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 3.2.0, 3.1.1 >Reporter: Xiangrui Meng >Assignee: Huaxin Gao >Priority: Critical > Fix For: 3.2.0, 3.1.1 > > > In SPARK-26111, we introduced a few univariate feature selectors, which share > a common set of params and are named after the underlying test, which > requires users to understand the test to find the matching scenarios. It would > be nice to introduce a single class called UnivariateFeatureSelector that > accepts a selection criterion and a score method (string names). Then we can > deprecate all other univariate selectors. > For the params, instead of asking users to specify which score function to use, > it is friendlier to ask users to specify the feature and label types > (continuous or categorical), and we set a default score function for each > combo. We can also detect the types from feature metadata if given. Advanced > users can override it (if there are multiple score functions that are > compatible with the feature type and label type combo). Example (param names > are not finalized): > {code} > selector = UnivariateFeatureSelector(featureCols=["x", "y", "z"], > labelCol=["target"], featureType="categorical", labelType="continuous", > select="bestK", k=100) > {code} > cc: [~huaxingao] [~ruifengz] [~weichenxu123] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34142) Support Fallback Storage Cleanup during stopping SparkContext
[ https://issues.apache.org/jira/browse/SPARK-34142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-34142: - Assignee: Dongjoon Hyun > Support Fallback Storage Cleanup during stopping SparkContext > - > > Key: SPARK-34142 > URL: https://issues.apache.org/jira/browse/SPARK-34142 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > > SPARK-33545 added `Support Fallback Storage during worker decommission` for > managed cloud storage with TTL support. This issue aims to add an additional > clean-up step when stopping SparkContext, to save some money before the TTL expires > and to support other HDFS-compatible storages which don't have TTL support. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34142) Support Fallback Storage Cleanup during stopping SparkContext
[ https://issues.apache.org/jira/browse/SPARK-34142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-34142. --- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31215 [https://github.com/apache/spark/pull/31215] > Support Fallback Storage Cleanup during stopping SparkContext > - > > Key: SPARK-34142 > URL: https://issues.apache.org/jira/browse/SPARK-34142 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.2.0 > > > SPARK-33545 added `Support Fallback Storage during worker decommission` for > managed cloud storage with TTL support. This issue aims to add an additional > clean-up step when stopping SparkContext, to save some money before the TTL expires > and to support other HDFS-compatible storages which don't have TTL support. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
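A hedged configuration sketch of the feature: the decommission and fallback-path configs below come from SPARK-33545, while the clean-up flag name is an assumption based on this ticket and should be verified against the merged PR; the bucket path is illustrative:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("fallback-storage-demo")
  .config("spark.storage.decommission.enabled", "true")
  .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
  .config("spark.storage.decommission.fallbackStorage.path", "s3a://my-bucket/spark-fallback/")  // illustrative path
  .config("spark.storage.decommission.fallbackStorage.cleanUp", "true")  // assumed flag added by this ticket
  .getOrCreate()

spark.stop()  // with the flag on, the fallback path contents are cleaned up here
{code}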
[jira] [Resolved] (SPARK-33730) Standardize warning types
[ https://issues.apache.org/jira/browse/SPARK-33730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33730. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 30985 [https://github.com/apache/spark/pull/30985] > Standardize warning types > - > > Key: SPARK-33730 > URL: https://issues.apache.org/jira/browse/SPARK-33730 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Maciej Bryński >Priority: Major > Fix For: 3.2.0 > > > We should use warnings properly per > [https://docs.python.org/3/library/warnings.html#warning-categories] > In particular, > - we should use {{FutureWarning}} instead of {{DeprecationWarning}} for the > places we should show the warnings to end-users by default. > - we should __maybe__ think about customizing stacklevel > ([https://docs.python.org/3/library/warnings.html#warnings.warn]) like pandas > does. > - ... > Current warnings are a bit messy and somewhat arbitrary. > To be more explicit, we'll have to fix: > {code:java} > pyspark/context.py:warnings.warn( > pyspark/context.py:warnings.warn( > pyspark/ml/classification.py:warnings.warn("weightCol is > ignored, " > pyspark/ml/clustering.py:warnings.warn("Deprecated in 3.0.0. It will > be removed in future versions. Use " > pyspark/mllib/classification.py:warnings.warn( > pyspark/mllib/feature.py:warnings.warn("Both withMean and withStd > are false. The model does nothing.") > pyspark/mllib/regression.py:warnings.warn( > pyspark/mllib/regression.py:warnings.warn( > pyspark/mllib/regression.py:warnings.warn( > pyspark/rdd.py:warnings.warn("mapPartitionsWithSplit is deprecated; " > pyspark/rdd.py:warnings.warn( > pyspark/shell.py:warnings.warn("Failed to initialize Spark session.") > pyspark/shuffle.py:warnings.warn("Please install psutil to have > better " > pyspark/sql/catalog.py:warnings.warn( > pyspark/sql/catalog.py:warnings.warn( > pyspark/sql/column.py:warnings.warn( > pyspark/sql/column.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/context.py:warnings.warn( > pyspark/sql/dataframe.py:warnings.warn( > pyspark/sql/dataframe.py:warnings.warn("to_replace is a dict > and value is not None. value will be ignored.") > pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use degrees > instead.", DeprecationWarning) > pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use radians > instead.", DeprecationWarning) > pyspark/sql/functions.py:warnings.warn("Deprecated in 2.1, use > approx_count_distinct instead.", DeprecationWarning) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/conversion.py:warnings.warn(msg) > pyspark/sql/pandas/functions.py:warnings.warn( > pyspark/sql/pandas/group_ops.py:warnings.warn( > pyspark/sql/session.py:warnings.warn("Fall back to non-hive > support because failing to access HiveConf, " > {code} > PySpark prints warnings using {{print}} in some places as well. We should > also see whether we should switch those to {{warnings.warn}}.
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe
[ https://issues.apache.org/jira/browse/SPARK-34125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-34125. -- Fix Version/s: 2.4.8 Resolution: Fixed Issue resolved by pull request 31194 [https://github.com/apache/spark/pull/31194] > Make EventLoggingListener.codecMap thread-safe > -- > > Key: SPARK-34125 > URL: https://issues.apache.org/jira/browse/SPARK-34125 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Trivial > Fix For: 2.4.8 > > Attachments: jstack.png, top.png > > > In the 2.x version of the history server, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe. > This can cause the history server to suddenly get stuck and stop working. > In the 3.x version, EventLogFileReader.codecMap was changed to the ConcurrentHashMap type, so there is no such problem (-SPARK-28869-). > PID 117049 0x1c939 > !top.png! > > !jstack.png! > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
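The shape of the fix is a standard one; here is a hedged sketch (simplified names, not Spark's actual code) of caching codecs in a thread-safe map instead of an unsynchronized {{mutable.HashMap}}:
{code:scala}
import java.util.concurrent.ConcurrentHashMap

// `Codec` stands in for Spark's real compression codec type; it is an
// assumption made for this sketch only.
final case class Codec(shortName: String)

object CodecCache {
  private val codecMap = new ConcurrentHashMap[String, Codec]()

  def codecFor(shortName: String): Codec =
    // computeIfAbsent is atomic: concurrent readers never observe the map in
    // a partially updated state, which is how an unsynchronized HashMap can
    // leave reader threads stuck
    codecMap.computeIfAbsent(shortName, k => Codec(k))
}

// usage: repeated lookups for the same short name build the codec only once
val lz4 = CodecCache.codecFor("lz4")
{code}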
[jira] [Assigned] (SPARK-34125) Make EventLoggingListener.codecMap thread-safe
[ https://issues.apache.org/jira/browse/SPARK-34125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim reassigned SPARK-34125: Assignee: dzcxzl > Make EventLoggingListener.codecMap thread-safe > -- > > Key: SPARK-34125 > URL: https://issues.apache.org/jira/browse/SPARK-34125 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.7 >Reporter: dzcxzl >Assignee: dzcxzl >Priority: Trivial > Attachments: jstack.png, top.png > > > In the 2.x version of the history server, EventLoggingListener.codecMap is of type mutable.HashMap, which is not thread-safe. > This can cause the history server to suddenly get stuck and stop working. > In the 3.x version, EventLogFileReader.codecMap was changed to the ConcurrentHashMap type, so there is no such problem (-SPARK-28869-). > PID 117049 0x1c939 > !top.png! > > !jstack.png! > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase
[ https://issues.apache.org/jira/browse/SPARK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266938#comment-17266938 ] Apache Spark commented on SPARK-34148: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/31219 > Move general StateStore tests to StateStoreSuiteBase > > > Key: SPARK-34148 > URL: https://issues.apache.org/jira/browse/SPARK-34148 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > There are some general StateStore tests in StateStoreSuite, which is an HDFSBackedStateStoreProvider-specific test suite. We should move the general tests into StateStoreSuiteBase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase
[ https://issues.apache.org/jira/browse/SPARK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34148: Assignee: Apache Spark (was: L. C. Hsieh) > Move general StateStore tests to StateStoreSuiteBase > > > Key: SPARK-34148 > URL: https://issues.apache.org/jira/browse/SPARK-34148 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: Apache Spark >Priority: Major > > There are some general StateStore tests in StateStoreSuite, which is an HDFSBackedStateStoreProvider-specific test suite. We should move the general tests into StateStoreSuiteBase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase
[ https://issues.apache.org/jira/browse/SPARK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34148: Assignee: L. C. Hsieh (was: Apache Spark) > Move general StateStore tests to StateStoreSuiteBase > > > Key: SPARK-34148 > URL: https://issues.apache.org/jira/browse/SPARK-34148 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > There are some general StateStore tests in StateStoreSuite, which is an HDFSBackedStateStoreProvider-specific test suite. We should move the general tests into StateStoreSuiteBase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase
[ https://issues.apache.org/jira/browse/SPARK-34148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266937#comment-17266937 ] Apache Spark commented on SPARK-34148: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/31219 > Move general StateStore tests to StateStoreSuiteBase > > > Key: SPARK-34148 > URL: https://issues.apache.org/jira/browse/SPARK-34148 > Project: Spark > Issue Type: Test > Components: Structured Streaming >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > > There are some general StateStore tests in StateStoreSuite, which is an HDFSBackedStateStoreProvider-specific test suite. We should move the general tests into StateStoreSuiteBase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34148) Move general StateStore tests to StateStoreSuiteBase
L. C. Hsieh created SPARK-34148: --- Summary: Move general StateStore tests to StateStoreSuiteBase Key: SPARK-34148 URL: https://issues.apache.org/jira/browse/SPARK-34148 Project: Spark Issue Type: Test Components: Structured Streaming Affects Versions: 3.2.0 Reporter: L. C. Hsieh Assignee: L. C. Hsieh There are some general StateStore tests in StateStoreSuite, which is an HDFSBackedStateStoreProvider-specific test suite. We should move the general tests into StateStoreSuiteBase. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
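The refactoring described here follows the usual ScalaTest layering: provider-agnostic tests live in an abstract base suite, and each provider suite inherits them. A minimal sketch under assumed, simplified names (Spark's real suites extend SparkFunSuite and construct real StateStoreProviders):
{code:scala}
import org.scalatest.funsuite.AnyFunSuite

// General tests go in the base; every concrete suite runs them for free.
abstract class StateStoreSuiteBaseSketch extends AnyFunSuite {
  // each concrete suite names/builds the provider under test
  def providerName: String

  test("general behavior shared by all providers") {
    assert(providerName.nonEmpty)
  }
}

// The HDFS-backed suite keeps only provider-specific tests and inherits the rest.
class HDFSBackedStateStoreSuiteSketch extends StateStoreSuiteBaseSketch {
  override def providerName: String = "HDFSBackedStateStoreProvider"

  test("provider-specific behavior stays in the concrete suite") {
    assert(providerName.startsWith("HDFS"))
  }
}
{code}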
[jira] [Resolved] (SPARK-34123) Faster way to display/render entries in HistoryPage (Spark history server summary page)
[ https://issues.apache.org/jira/browse/SPARK-34123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-34123. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 31191 [https://github.com/apache/spark/pull/31191] > Faster way to display/render entries in HistoryPage (Spark history server summary page) > --- > > Key: SPARK-34123 > URL: https://issues.apache.org/jira/browse/SPARK-34123 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.2.0 >Reporter: Mohanad Elsafty >Assignee: Mohanad Elsafty >Priority: Major > Fix For: 3.2.0 > > Attachments: Screenshot 2021-01-15 at 1.21.40 PM.png > > > For a long time my team/company has suffered from the history server being very slow to display/search entries, especially when they grow beyond 50k entries; the page already has pagination, but it is still very slow to display the entries. > > Currently *Mustache Js* is used to render the entries and *datatables* is used to manipulate them (sort by column and search). > > Getting rid of *Mustache* (no longer rendering the entries with *Mustache*) and using *datatables* to display them proved to be faster. > > Displaying > 100k entries (my case): > The existing page takes at least 30 to 40 seconds to display the entries, searching takes at least 20 seconds, and the page stops responding until it finishes. > The improved page takes ~3 seconds to display the entries, searching is very fast, and the page stays responsive. > *(These numbers will be different for others since the JS is executed in your browser)* > > I am not sure why *Mustache* is used to display the data since *datatables* can do the job. > [~ajbozarth] [~sowen] could you elaborate on this: what is the reason to use *Mustache*, and what are the drawbacks if it is no longer used to display the entries (only this part)? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34123) Faster way to display/render entries in HistoryPage (Spark history server summary page)
[ https://issues.apache.org/jira/browse/SPARK-34123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-34123: Assignee: Mohanad Elsafty > Faster way to display/render entries in HistoryPage (Spark history server summary page) > --- > > Key: SPARK-34123 > URL: https://issues.apache.org/jira/browse/SPARK-34123 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.2.0 >Reporter: Mohanad Elsafty >Assignee: Mohanad Elsafty >Priority: Major > Attachments: Screenshot 2021-01-15 at 1.21.40 PM.png > > > For a long time my team/company has suffered from the history server being very slow to display/search entries, especially when they grow beyond 50k entries; the page already has pagination, but it is still very slow to display the entries. > > Currently *Mustache Js* is used to render the entries and *datatables* is used to manipulate them (sort by column and search). > > Getting rid of *Mustache* (no longer rendering the entries with *Mustache*) and using *datatables* to display them proved to be faster. > > Displaying > 100k entries (my case): > The existing page takes at least 30 to 40 seconds to display the entries, searching takes at least 20 seconds, and the page stops responding until it finishes. > The improved page takes ~3 seconds to display the entries, searching is very fast, and the page stays responsive. > *(These numbers will be different for others since the JS is executed in your browser)* > > I am not sure why *Mustache* is used to display the data since *datatables* can do the job. > [~ajbozarth] [~sowen] could you elaborate on this: what is the reason to use *Mustache*, and what are the drawbacks if it is no longer used to display the entries (only this part)? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled
[ https://issues.apache.org/jira/browse/SPARK-34147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266889#comment-17266889 ] Apache Spark commented on SPARK-34147: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/31218 > Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled > -- > > Key: SPARK-34147 > URL: https://issues.apache.org/jira/browse/SPARK-34147 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Major > > {{--cbo}} should not change partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled
[ https://issues.apache.org/jira/browse/SPARK-34147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34147: Assignee: (was: Apache Spark) > Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled > -- > > Key: SPARK-34147 > URL: https://issues.apache.org/jira/browse/SPARK-34147 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Major > > {{--cbo}} should not change partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled
[ https://issues.apache.org/jira/browse/SPARK-34147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34147: Assignee: Apache Spark > Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled > -- > > Key: SPARK-34147 > URL: https://issues.apache.org/jira/browse/SPARK-34147 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Peter Toth >Assignee: Apache Spark >Priority: Major > > {{--cbo}} should not change partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled
[ https://issues.apache.org/jira/browse/SPARK-34147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266888#comment-17266888 ] Apache Spark commented on SPARK-34147: -- User 'peter-toth' has created a pull request for this issue: https://github.com/apache/spark/pull/31218 > Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled > -- > > Key: SPARK-34147 > URL: https://issues.apache.org/jira/browse/SPARK-34147 > Project: Spark > Issue Type: Improvement > Components: SQL, Tests >Affects Versions: 3.2.0 >Reporter: Peter Toth >Priority: Major > > {{--cbo}} should not change partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34147) Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled
Peter Toth created SPARK-34147: -- Summary: Keep data partitioning in TPCDSQueryBenchmark when CBO is enabled Key: SPARK-34147 URL: https://issues.apache.org/jira/browse/SPARK-34147 Project: Spark Issue Type: Improvement Components: SQL, Tests Affects Versions: 3.2.0 Reporter: Peter Toth {{--cbo}} should not change partitioning. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation
[ https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266848#comment-17266848 ] Apache Spark commented on SPARK-34146: -- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/31217 > "*" should not throw Exception in SparkGetSchemasOperation > -- > > Key: SPARK-34146 > URL: https://issues.apache.org/jira/browse/SPARK-34146 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.1 >Reporter: Cheng Pan >Priority: Minor > > HiveServer2 treats "*" as "list all databases", but Spark throws an `Exception` when handling the global temp view, since "*" is not a valid regex. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation
[ https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34146: Assignee: (was: Apache Spark) > "*" should not throw Exception in SparkGetSchemasOperation > -- > > Key: SPARK-34146 > URL: https://issues.apache.org/jira/browse/SPARK-34146 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.1 >Reporter: Cheng Pan >Priority: Minor > > HiveServer2 treats "*" as "list all databases", but Spark throws an `Exception` when handling the global temp view, since "*" is not a valid regex. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation
[ https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266847#comment-17266847 ] Apache Spark commented on SPARK-34146: -- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/31217 > "*" should not throw Exception in SparkGetSchemasOperation > -- > > Key: SPARK-34146 > URL: https://issues.apache.org/jira/browse/SPARK-34146 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.1 >Reporter: Cheng Pan >Priority: Minor > > HiveServer2 treats "*" as "list all databases", but Spark throws an `Exception` when handling the global temp view, since "*" is not a valid regex. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation
[ https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34146: Assignee: Apache Spark > "*" should not throw Exception in SparkGetSchemasOperation > -- > > Key: SPARK-34146 > URL: https://issues.apache.org/jira/browse/SPARK-34146 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.1 >Reporter: Cheng Pan >Assignee: Apache Spark >Priority: Minor > > HiveServer2 treats "*" as "list all databases", but Spark throws an `Exception` when handling the global temp view, since "*" is not a valid regex. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34146) "*" should not crash in SparkGetSchemasOperation
[ https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Pan updated SPARK-34146: -- Description: HiveServer2 treat "*" as list all databases, but spark will throw `Exception` when handle global temp view since "" is not a valid regex. (was: HiveServer2 treat "*" as list all databases, but spark will crashed when handle global temp view since "" is not a valid regex.) > "*" should not crash in SparkGetSchemasOperation > > > Key: SPARK-34146 > URL: https://issues.apache.org/jira/browse/SPARK-34146 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.1 >Reporter: Cheng Pan >Priority: Minor > > HiveServer2 treats "*" as "list all databases", but Spark throws an `Exception` when handling the global temp view, since "*" is not a valid regex. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34146) "*" should not throw Exception in SparkGetSchemasOperation
[ https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Pan updated SPARK-34146: -- Summary: "*" should not throw Exception in SparkGetSchemasOperation (was: "*" should not crash in SparkGetSchemasOperation) > "*" should not throw Exception in SparkGetSchemasOperation > -- > > Key: SPARK-34146 > URL: https://issues.apache.org/jira/browse/SPARK-34146 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.1 >Reporter: Cheng Pan >Priority: Minor > > HiveServer2 treats "*" as "list all databases", but Spark throws an `Exception` when handling the global temp view, since "*" is not a valid regex. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34146) "*" should not crash in SparkGetSchemasOperation
[ https://issues.apache.org/jira/browse/SPARK-34146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Pan updated SPARK-34146: -- Description: HiveServer2 treat "*" as list all databases, but spark will crashed when handle global temp view since "" is not a valid regex. (was: HiveServer2 treat "*" as list all databases, but spark will crashed when handle global temp view since "*" is not a valid regex.) > "*" should not crash in SparkGetSchemasOperation > > > Key: SPARK-34146 > URL: https://issues.apache.org/jira/browse/SPARK-34146 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.1 >Reporter: Cheng Pan >Priority: Minor > > HiveServer2 treats "*" as "list all databases", but Spark crashes when handling the global temp view, since "*" is not a valid regex. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34146) "*" should not crash in SparkGetSchemasOperation
Cheng Pan created SPARK-34146: - Summary: "*" should not crash in SparkGetSchemasOperation Key: SPARK-34146 URL: https://issues.apache.org/jira/browse/SPARK-34146 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1, 3.1.1 Reporter: Cheng Pan HiveServer2 treats "*" as "list all databases", but Spark crashes when handling the global temp view, since "*" is not a valid regex. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
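The underlying mismatch is between SQL-style metadata patterns and Java regexes. A hedged sketch of the idea behind a fix (not Spark's actual code): translate the schema pattern into a valid regex before matching it against the global temp database, quoting everything else literally (the SQL {{_}} single-character wildcard is ignored in this sketch).
{code:scala}
import java.util.regex.Pattern

// "*" and "%" both mean "match anything" in Hive/JDBC metadata patterns;
// every other character is matched literally (quoting char by char is crude
// but safe for a sketch).
def schemaPatternToRegex(sqlPattern: String): String =
  sqlPattern.flatMap {
    case '*' | '%' => ".*"
    case c         => Pattern.quote(c.toString)
  }

// a bare "*" is an invalid regex, but its translation ".*" is valid:
assert("global_temp".matches(schemaPatternToRegex("*")))
assert("dev_db".matches(schemaPatternToRegex("dev_db")))
{code}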
[jira] [Created] (SPARK-34145) Combine scalar subqueries
Yuming Wang created SPARK-34145: --- Summary: Combine scalar subqueries Key: SPARK-34145 URL: https://issues.apache.org/jira/browse/SPARK-34145 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.2.0 Reporter: Yuming Wang We can add a rule to combine scalar subqueries if they come from the same table, to improve query performance. For example:
{code:sql}
-- TPC-DS q9.sql
SELECT CASE
         WHEN (SELECT count(*) FROM store_sales WHERE ss_quantity BETWEEN 1 AND 20) > 62316685
           THEN (SELECT avg(ss_ext_discount_amt) FROM store_sales WHERE ss_quantity BETWEEN 1 AND 20)
         ELSE (SELECT avg(ss_net_paid) FROM store_sales WHERE ss_quantity BETWEEN 1 AND 20)
       END bucket1,
       CASE
         WHEN (SELECT count(*) FROM store_sales WHERE ss_quantity BETWEEN 21 AND 40) > 19045798
           THEN (SELECT avg(ss_ext_discount_amt) FROM store_sales WHERE ss_quantity BETWEEN 21 AND 40)
         ELSE (SELECT avg(ss_net_paid) FROM store_sales WHERE ss_quantity BETWEEN 21 AND 40)
       END bucket2,
       CASE
         WHEN (SELECT count(*) FROM store_sales WHERE ss_quantity BETWEEN 41 AND 60) > 365541424
           THEN (SELECT avg(ss_ext_discount_amt) FROM store_sales WHERE ss_quantity BETWEEN 41 AND 60)
         ELSE (SELECT avg(ss_net_paid) FROM store_sales WHERE ss_quantity BETWEEN 41 AND 60)
       END bucket3,
       CASE
         WHEN (SELECT count(*) FROM store_sales WHERE ss_quantity BETWEEN 61 AND 80) > 216357808
           THEN (SELECT avg(ss_ext_discount_amt) FROM store_sales WHERE ss_quantity BETWEEN 61 AND 80)
         ELSE (SELECT avg(ss_net_paid) FROM store_sales WHERE ss_quantity BETWEEN 61 AND 80)
       END bucket4,
       CASE
         WHEN (SELECT count(*) FROM store_sales WHERE ss_quantity BETWEEN 81 AND 100) > 184483884
           THEN (SELECT avg(ss_ext_discount_amt) FROM store_sales WHERE ss_quantity BETWEEN 81 AND 100)
         ELSE (SELECT avg(ss_net_paid) FROM store_sales WHERE ss_quantity BETWEEN 81 AND 100)
       END bucket5
FROM reason
WHERE r_reason_sk = 1
{code}
We can rewrite it to:
{code:sql}
WITH bucket_result AS (
  SELECT CASE
           WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 1 AND 20)) > 62316685
             THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 1 AND 20))
           ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 1 AND 20))
         END bucket1,
         CASE
           WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 21 AND 40)) > 62316685
             THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 21 AND 40))
           ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 21 AND 40))
         END bucket2,
         CASE
           WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 41 AND 60)) > 62316685
             THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 41 AND 60))
           ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 41 AND 60))
         END bucket3,
         CASE
           WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 61 AND 80)) > 62316685
             THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 61 AND 80))
           ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 61 AND 80))
         END bucket4,
         CASE
           WHEN (count(ss_quantity) FILTER (WHERE ss_quantity BETWEEN 81 AND 100)) > 62316685
             THEN (avg(ss_ext_discount_amt) FILTER (WHERE ss_quantity BETWEEN 81 AND 100))
           ELSE (avg(ss_net_paid) FILTER (WHERE ss_quantity BETWEEN 81 AND 100))
         END bucket5
  FROM store_sales
)
SELECT (SELECT bucket1 FROM bucket_result) as bucket1,
       (SELECT bucket2 FROM bucket_result) as bucket2,
       (SELECT bucket3 FROM bucket_result) as bucket3,
       (SELECT bucket4 FROM bucket_result) as bucket4,
       (SELECT bucket5 FROM bucket_result) as bucket5
FROM reason
WHERE r_reason_sk = 1;
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34144) java.time.Instant and java.time.LocalDate not handled when writing to tables
[ https://issues.apache.org/jira/browse/SPARK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristi updated SPARK-34144: --- Summary: java.time.Instant and java.time.LocalDate not handled when writing to tables (was: java.time.Instant and java.time.LocalDate not handled not handled when writing to tables) > java.time.Instant and java.time.LocalDate not handled when writing to tables > > > Key: SPARK-34144 > URL: https://issues.apache.org/jira/browse/SPARK-34144 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1, 3.1.0, 3.1.1 >Reporter: Cristi >Priority: Major > > When using the new java time API (spark.sql.datetime.java8API.enabled=true) > LocalDate and Instant aren't handled in > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#makeSetter so > Instant and LocalDate are cast to Timestamp and Date when attempting to write > values to a table. > Driver stacktrace:Driver stacktrace: at > org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007) > at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) > at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at > org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973) > at > org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973) > at scala.Option.foreach(Option.scala:407) at > org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2099) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2120) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2139) at > org.apache.spark.SparkContext.runJob(SparkContext.scala:2164) at > org.apache.spark.rdd.RDD.$anonfun$foreachPartition$1(RDD.scala:994) at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) > at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) at > org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:992) at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:856) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:68) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121) > at > org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at > org.apache.spark.sq
[jira] [Updated] (SPARK-34144) java.time.Instant and java.time.LocalDate not handled not handled when writing to tables
[ https://issues.apache.org/jira/browse/SPARK-34144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cristi updated SPARK-34144: --- Description: When using the new java time API (spark.sql.datetime.java8API.enabled=true) LocalDate and Instant aren't handled in org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils#makeSetter so Instant and LocalDate are cast to Timestamp and Date when attempting to write values to a table. Driver stacktrace:Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2059) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2008) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2007) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2007) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:973) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:973) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:973) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2239) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2188) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2177) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:775) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2099) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2120) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2139) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2164) at org.apache.spark.rdd.RDD.$anonfun$foreachPartition$1(RDD.scala:994) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:388) at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:992) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.saveTable(JdbcUtils.scala:856) at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:68) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90) at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:175) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:213) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:210) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:171) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:122) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:121) at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:963) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:963) at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:415) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:399) Caused by: java.lang.ClassCastException: class java.time.LocalDate cannot be cast to class java.sql.Date (java.time.LocalDate is in module java.base of loader 'bootstrap'; java.sql.Date is in module java.sql of loader 'platform') at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeSetter$11(JdbcUtils.scala:573) at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeSetter$11$adapted(JdbcUtils.scala:572) at org.ap
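For reference, the missing piece is small: with {{spark.sql.datetime.java8API.enabled=true}}, external row values arrive as {{java.time.Instant}}/{{java.time.LocalDate}}, so a JDBC setter must convert them rather than cast. A hedged sketch of what a fix around {{JdbcUtils#makeSetter}} would look like (simplified, not the actual Spark code):
{code:scala}
import java.sql.{Date, PreparedStatement, Timestamp}
import java.time.{Instant, LocalDate}

// Convert whichever external datetime representation shows up. The failing
// path effectively did value.asInstanceOf[java.sql.Timestamp] /
// value.asInstanceOf[java.sql.Date], which throws ClassCastException for the
// java.time types, matching the stack trace above.
def setDatetime(stmt: PreparedStatement, pos: Int, value: Any): Unit = value match {
  case i: Instant   => stmt.setTimestamp(pos, Timestamp.from(i))
  case d: LocalDate => stmt.setDate(pos, Date.valueOf(d))
  case t: Timestamp => stmt.setTimestamp(pos, t)
  case d: Date      => stmt.setDate(pos, d)
  case other        => throw new IllegalArgumentException(s"Unsupported datetime value: $other")
}
{code}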
[jira] [Commented] (SPARK-34115) Long runtime on many environment variables
[ https://issues.apache.org/jira/browse/SPARK-34115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266764#comment-17266764 ] Hyukjin Kwon commented on SPARK-34115: -- I think you can try and see if it works when we switch isTesting to a lazy val as you proposed. But I have to say that theoretically both should have constant lookup time, which should not affect performance heavily. > Long runtime on many environment variables > -- > > Key: SPARK-34115 > URL: https://issues.apache.org/jira/browse/SPARK-34115 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.4.0, 2.4.7, 3.0.1 > Environment: Spark 2.4.0 local[2] on a Kubernetes Pod >Reporter: Norbert Schultz >Priority: Major > Attachments: spark-bug-34115.tar.gz > > > I am not sure if this is a bug report or a feature request. The code is the same in current versions of Spark, and maybe this ticket saves someone some time for debugging. > We migrated some older code to Spark 2.4.0, and suddenly the integration tests on our build machine were much slower than expected. > On local machines it was running perfectly. > In the end it turned out that Spark was wasting CPU cycles during DataFrame analysis in the following functions: > * AnalysisHelper.assertNotAnalysisRule calling > * Utils.isTesting > Utils.isTesting traverses all environment variables. > The offending build machine was a Kubernetes Pod which automatically exposed all services as environment variables, so it had more than 3000 environment variables. > Utils.isTesting is called very often through AnalysisHelper.assertNotAnalysisRule (via AnalysisHelper.transformDown, transformUp). > > Of course we will restrict the number of environment variables; on the other hand, Utils.isTesting could also use a lazy val for > > {code:java} > sys.env.contains("SPARK_TESTING") {code} > > so it is not that expensive. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
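The proposed change is essentially a one-liner; a minimal sketch (the {{sys.props}} check mirrors what Spark's {{Utils.isTesting}} also consults, but treat the exact shape as an assumption):
{code:scala}
object UtilsSketch {
  // sys.env rebuilds an immutable Map of the whole process environment on
  // every call, which is what hurts with thousands of variables; a lazy val
  // performs the lookup once and caches the Boolean for the JVM's lifetime.
  lazy val isTesting: Boolean =
    sys.env.contains("SPARK_TESTING") || sys.props.contains("spark.testing")
}
{code}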
[jira] [Created] (SPARK-34144) java.time.Instant and java.time.LocalDate not handled not handled when writing to tables
Cristi created SPARK-34144: -- Summary: java.time.Instant and java.time.LocalDate not handled not handled when writing to tables Key: SPARK-34144 URL: https://issues.apache.org/jira/browse/SPARK-34144 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.1, 3.1.0, 3.1.1 Reporter: Cristi -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34121) Intersect operator missing rowCount when CBO enabled
[ https://issues.apache.org/jira/browse/SPARK-34121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266763#comment-17266763 ] Hyukjin Kwon commented on SPARK-34121: -- [~yumwang] mind filling the PR description? > Intersect operator missing rowCount when CBO enabled > > > Key: SPARK-34121 > URL: https://issues.apache.org/jira/browse/SPARK-34121 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34126) SQL running error, spark does not exit, resulting in data quality problems
[ https://issues.apache.org/jira/browse/SPARK-34126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266762#comment-17266762 ] Hyukjin Kwon commented on SPARK-34126: -- [~shikui] can you provide self-contained reproducible steps? > SQL running error, spark does not exit, resulting in data quality problems > -- > > Key: SPARK-34126 > URL: https://issues.apache.org/jira/browse/SPARK-34126 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.1 > Environment: spark3.0.1 on yarn >Reporter: shikui ye >Priority: Major > > Spark SQL executes a SQL file containing multiple SQL segments. When one of the SQL segments fails to run but the Spark driver/SparkContext does not exit, the table written by that segment is left empty or holding old data. Subsequent SQL that depends on this problematic table will then have data quality problems even if it runs successfully. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
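Until the root cause is pinned down, the usual defensive pattern on the application side is to fail fast; a hedged workaround sketch, not a Spark fix (the SQL segments here are hypothetical stand-ins for the user's script):
{code:scala}
import org.apache.spark.sql.SparkSession

object RunSqlScript {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("sql-script").getOrCreate()

    // hypothetical segments standing in for the multi-segment SQL file
    val segments = Seq(
      "CREATE TABLE t1 USING parquet AS SELECT 1 AS id",
      "CREATE TABLE t2 USING parquet AS SELECT * FROM t1"
    )

    try {
      segments.foreach(spark.sql) // run each segment in order
    } catch {
      case e: Exception =>
        // abort immediately so later segments never read an empty/stale table
        System.err.println(s"SQL segment failed, aborting: ${e.getMessage}")
        spark.stop()
        sys.exit(1) // non-zero exit makes the failure visible to YARN
    }
    spark.stop()
  }
}
{code}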
[jira] [Commented] (SPARK-34131) NPE when driver.podTemplateFile defines no containers
[ https://issues.apache.org/jira/browse/SPARK-34131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266760#comment-17266760 ] Hyukjin Kwon commented on SPARK-34131: -- cc [~holdenkarau] and [~dongjoon] FYI > NPE when driver.podTemplateFile defines no containers > - > > Key: SPARK-34131 > URL: https://issues.apache.org/jira/browse/SPARK-34131 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.0.1 >Reporter: Jacek Laskowski >Priority: Minor > > An empty pod template leads to the following NPE: > {code} > 21/01/15 18:44:32 ERROR KubernetesUtils: Encountered exception while > attempting to load initial pod spec from file > java.lang.NullPointerException > at > org.apache.spark.deploy.k8s.KubernetesUtils$.selectSparkContainer(KubernetesUtils.scala:108) > at > org.apache.spark.deploy.k8s.KubernetesUtils$.loadPodFromTemplate(KubernetesUtils.scala:88) > at > org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.$anonfun$buildFromFeatures$1(KubernetesDriverBuilder.scala:36) > at scala.Option.map(Option.scala:230) > at > org.apache.spark.deploy.k8s.submit.KubernetesDriverBuilder.buildFromFeatures(KubernetesDriverBuilder.scala:32) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:98) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4(KubernetesClientApplication.scala:221) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$4$adapted(KubernetesClientApplication.scala:215) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2539) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:215) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:188) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} > {code:java} > $> cat empty-template.yml > spec: > {code} > {code} > $> ./bin/run-example \ > --master k8s://$K8S_SERVER \ > --deploy-mode cluster \ > --conf spark.kubernetes.driver.podTemplateFile=empty-template.yml \ > --name $POD_NAME \ > --jars local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar \ > --conf spark.kubernetes.container.image=spark:v3.0.1 \ > --conf spark.kubernetes.driver.pod.name=$POD_NAME \ > --conf spark.kubernetes.namespace=spark-demo \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \ > --verbose \ >SparkPi 10 > {code} > It appears that the implicit requirement is that there's at least one > well-defined container of any name (not necessarily > {{spark.kubernetes.driver.podTemplateContainerName}}). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
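A template that avoids the NPE just needs one well-defined container; a minimal sketch in the same style as the {{empty-template.yml}} reproduction above (the container name is arbitrary per the ticket, and the image tag simply matches the reproduction):
{code}
spec:
  containers:
    - name: spark-kubernetes-driver   # any name works
      image: spark:v3.0.1
{code}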
[jira] [Resolved] (SPARK-33245) Add built-in UDF - GETBIT
[ https://issues.apache.org/jira/browse/SPARK-33245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-33245. -- Resolution: Won't Fix > Add built-in UDF - GETBIT > -- > > Key: SPARK-33245 > URL: https://issues.apache.org/jira/browse/SPARK-33245 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > Teradata, Impala, Snowflake and Yellowbrick support this function: > https://docs.teradata.com/reader/kmuOwjp1zEYg98JsB8fu_A/PK1oV1b2jqvG~ohRnOro9w > https://docs.cloudera.com/runtime/7.2.0/impala-sql-reference/topics/impala-bit-functions.html#bit_functions__getbit > https://docs.snowflake.com/en/sql-reference/functions/getbit.html > https://www.yellowbrick.com/docs/2.2/ybd_sqlref/getbit.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
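Although the built-in was declined, the same result is already expressible with existing operators; a sketch assuming LSB-0 bit numbering as in Teradata's GETBIT:
{code:scala}
import org.apache.spark.sql.SparkSession

object GetBitSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("getbit-sketch").getOrCreate()

    // GETBIT(x, n) == (x >> n) & 1; 11 is binary 1011, so bit 1 is 1 and bit 2 is 0
    spark.sql("SELECT shiftright(11, 1) & 1 AS bit1, shiftright(11, 2) & 1 AS bit2").show()

    spark.stop()
  }
}
{code}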
[jira] [Commented] (SPARK-34143) Adding partitions to fully partitioned v2 table
[ https://issues.apache.org/jira/browse/SPARK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266742#comment-17266742 ] Apache Spark commented on SPARK-34143: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31216 > Adding partitions to fully partitioned v2 table > --- > > Key: SPARK-34143 > URL: https://issues.apache.org/jira/browse/SPARK-34143 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The test below fails: > {code:scala} > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY > (p0, p1)") > sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')") > checkPartitions(t, Map("p0" -> "0", "p1" -> "abc")) > checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc")) > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-34143) Adding partitions to fully partitioned v2 table
[ https://issues.apache.org/jira/browse/SPARK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17266741#comment-17266741 ] Apache Spark commented on SPARK-34143: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/31216 > Adding partitions to fully partitioned v2 table > --- > > Key: SPARK-34143 > URL: https://issues.apache.org/jira/browse/SPARK-34143 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The test below fails: > {code:scala} > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY > (p0, p1)") > sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')") > checkPartitions(t, Map("p0" -> "0", "p1" -> "abc")) > checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc")) > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34143) Adding partitions to fully partitioned v2 table
[ https://issues.apache.org/jira/browse/SPARK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34143: Assignee: (was: Apache Spark) > Adding partitions to fully partitioned v2 table > --- > > Key: SPARK-34143 > URL: https://issues.apache.org/jira/browse/SPARK-34143 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Priority: Major > > The test below fails: > {code:scala} > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY > (p0, p1)") > sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')") > checkPartitions(t, Map("p0" -> "0", "p1" -> "abc")) > checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc")) > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34143) Adding partitions to fully partitioned v2 table
[ https://issues.apache.org/jira/browse/SPARK-34143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-34143: Assignee: Apache Spark > Adding partitions to fully partitioned v2 table > --- > > Key: SPARK-34143 > URL: https://issues.apache.org/jira/browse/SPARK-34143 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > The test below fails: > {code:scala} > withNamespaceAndTable("ns", "tbl") { t => > sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY > (p0, p1)") > sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')") > checkPartitions(t, Map("p0" -> "0", "p1" -> "abc")) > checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc")) > } > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34143) Adding partitions to fully partitioned v2 table
Maxim Gekk created SPARK-34143: -- Summary: Adding partitions to fully partitioned v2 table Key: SPARK-34143 URL: https://issues.apache.org/jira/browse/SPARK-34143 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Maxim Gekk The test below fails: {code:scala} withNamespaceAndTable("ns", "tbl") { t => sql(s"CREATE TABLE $t (p0 INT, p1 STRING) $defaultUsing PARTITIONED BY (p0, p1)") sql(s"ALTER TABLE $t ADD PARTITION (p0 = 0, p1 = 'abc')") checkPartitions(t, Map("p0" -> "0", "p1" -> "abc")) checkAnswer(sql(s"SELECT * FROM $t"), Row(0, "abc")) } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar
[ https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-34111: Assignee: Kent Yao > Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar > - > > Key: SPARK-34111 > URL: https://issues.apache.org/jira/browse/SPARK-34111 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Assignee: Kent Yao >Priority: Critical > > After SPARK-33705, we happen to have two jars in the release artifact with Hadoop 3: > {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}: > {code} > ... > jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar > ... > javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar > ... > {code} > It can potentially cause an issue, and we had better remove {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34111) Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar
[ https://issues.apache.org/jira/browse/SPARK-34111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-34111: - Priority: Critical (was: Blocker) > Deconflict the jars jakarta.servlet-api-4.0.3.jar and javax.servlet-api-3.1.0.jar > - > > Key: SPARK-34111 > URL: https://issues.apache.org/jira/browse/SPARK-34111 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.1.0 >Reporter: Hyukjin Kwon >Priority: Critical > > After SPARK-33705, we happen to have two jars in the release artifact with Hadoop 3: > {{dev/deps/spark-deps-hadoop-3.2-hive-2.3}}: > {code} > ... > jakarta.servlet-api/4.0.3//jakarta.servlet-api-4.0.3.jar > ... > javax.servlet-api/3.1.0//javax.servlet-api-3.1.0.jar > ... > {code} > It can potentially cause an issue, and we had better remove {{javax.servlet-api-3.1.0.jar}}, which is apparently only required for YARN tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org