[jira] [Assigned] (SPARK-40157) Make pyspark.files examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40157: Assignee: Apache Spark (was: Ruifeng Zheng) > Make pyspark.files examples self-contained > -- > > Key: SPARK-40157 > URL: https://issues.apache.org/jira/browse/SPARK-40157 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40157) Make pyspark.files examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40157: Assignee: Ruifeng Zheng (was: Apache Spark) > Make pyspark.files examples self-contained > -- > > Key: SPARK-40157 > URL: https://issues.apache.org/jira/browse/SPARK-40157 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major >
[jira] [Commented] (SPARK-40157) Make pyspark.files examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582738#comment-17582738 ] Apache Spark commented on SPARK-40157: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/37607 > Make pyspark.files examples self-contained > -- > > Key: SPARK-40157 > URL: https://issues.apache.org/jira/browse/SPARK-40157 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major >
[jira] [Commented] (SPARK-40173) Make pyspark.taskcontext examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582735#comment-17582735 ] Hyukjin Kwon commented on SPARK-40173: -- I'm working on this. > Make pyspark.taskcontext examples self-contained > > > Key: SPARK-40173 > URL: https://issues.apache.org/jira/browse/SPARK-40173 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark, Spark Core >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major >
[jira] [Created] (SPARK-40173) Make pyspark.taskcontext examples self-contained
Hyukjin Kwon created SPARK-40173: Summary: Make pyspark.taskcontext examples self-contained Key: SPARK-40173 URL: https://issues.apache.org/jira/browse/SPARK-40173 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark, Spark Core Affects Versions: 3.4.0 Reporter: Hyukjin Kwon
[jira] [Assigned] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite
[ https://issues.apache.org/jira/browse/SPARK-40172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40172: Assignee: Gengliang Wang (was: Apache Spark) > Temporarily disable flaky test cases in ImageFileFormatSuite > > > Key: SPARK-40172 > URL: https://issues.apache.org/jira/browse/SPARK-40172 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > > 3 test cases in ImageFileFormatSuite became flaky in the GitHub action tests: > [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] > Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I > suggest disabling them in OSS.
[jira] [Commented] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite
[ https://issues.apache.org/jira/browse/SPARK-40172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582722#comment-17582722 ] Apache Spark commented on SPARK-40172: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/37605 > Temporarily disable flaky test cases in ImageFileFormatSuite > > > Key: SPARK-40172 > URL: https://issues.apache.org/jira/browse/SPARK-40172 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > > 3 test cases in ImageFileFormatSuite became flaky in the GitHub action tests: > [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] > Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I > suggest disabling them in OSS.
[jira] [Assigned] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite
[ https://issues.apache.org/jira/browse/SPARK-40172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40172: Assignee: Apache Spark (was: Gengliang Wang) > Temporarily disable flaky test cases in ImageFileFormatSuite > > > Key: SPARK-40172 > URL: https://issues.apache.org/jira/browse/SPARK-40172 > Project: Spark > Issue Type: Test > Components: ML, Tests >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Minor > > 3 test cases in ImageFileFormatSuite became flaky in the GitHub action tests: > [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] > Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I > suggest disabling them in OSS.
[jira] [Commented] (SPARK-40171) Fix flaky tests in ImageFileFormatSuite
[ https://issues.apache.org/jira/browse/SPARK-40171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582721#comment-17582721 ] Gengliang Wang commented on SPARK-40171: cc [~weichenxu123] > Fix flaky tests in ImageFileFormatSuite > --- > > Key: SPARK-40171 > URL: https://issues.apache.org/jira/browse/SPARK-40171 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.4.0 >Reporter: Gengliang Wang >Priority: Major > > There are 3 test cases that became flaky in the GitHub action tests: > [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] > We should fix them.
[jira] [Created] (SPARK-40172) Temporarily disable flaky test cases in ImageFileFormatSuite
Gengliang Wang created SPARK-40172: -- Summary: Temporarily disable flaky test cases in ImageFileFormatSuite Key: SPARK-40172 URL: https://issues.apache.org/jira/browse/SPARK-40172 Project: Spark Issue Type: Test Components: ML, Tests Affects Versions: 3.4.0 Reporter: Gengliang Wang Assignee: Gengliang Wang 3 test cases in ImageFileFormatSuite became flaky in the GitHub action tests: [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] Before they are fixed (https://issues.apache.org/jira/browse/SPARK-40171), I suggest disabling them in OSS.
[jira] [Created] (SPARK-40171) Fix flaky tests in ImageFileFormatSuite
Gengliang Wang created SPARK-40171: -- Summary: Fix flaky tests in ImageFileFormatSuite Key: SPARK-40171 URL: https://issues.apache.org/jira/browse/SPARK-40171 Project: Spark Issue Type: Bug Components: ML Affects Versions: 3.4.0 Reporter: Gengliang Wang There are 3 test cases that became flaky in the GitHub action tests: [https://github.com/apache/spark/runs/7941765326?check_suite_focus=true] We should fix them.
[jira] [Commented] (SPARK-40149) Star expansion after outer join asymmetrically includes joining key
[ https://issues.apache.org/jira/browse/SPARK-40149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582709#comment-17582709 ] Hyukjin Kwon commented on SPARK-40149: -- [~karenfeng] FYI
> Star expansion after outer join asymmetrically includes joining key
> ---
>
> Key: SPARK-40149
> URL: https://issues.apache.org/jira/browse/SPARK-40149
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
> Reporter: Otakar Truněček
> Priority: Major
>
> When star expansion is used on the left side of a join, the result includes the
> joining key, while on the right side of the join it doesn't. I would expect the
> behaviour to be symmetric (the key either included on both sides or on neither).
> Example:
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as f
>
> spark = SparkSession.builder.getOrCreate()
> df_left = spark.range(5).withColumn('val', f.lit('left'))
> df_right = spark.range(3, 7).withColumn('val', f.lit('right'))
> df_merged = (
>     df_left
>     .alias('left')
>     .join(df_right.alias('right'), on='id', how='full_outer')
>     .withColumn('left_all', f.struct('left.*'))
>     .withColumn('right_all', f.struct('right.*'))
> )
> df_merged.show()
> {code}
> result:
> {code:java}
> +---+----+-----+------------+---------+
> | id| val|  val|    left_all|right_all|
> +---+----+-----+------------+---------+
> |  0|left| null|   {0, left}|   {null}|
> |  1|left| null|   {1, left}|   {null}|
> |  2|left| null|   {2, left}|   {null}|
> |  3|left|right|   {3, left}|  {right}|
> |  4|left|right|   {4, left}|  {right}|
> |  5|null|right|{null, null}|  {right}|
> |  6|null|right|{null, null}|  {right}|
> +---+----+-----+------------+---------+
> {code}
> This behaviour started with release 3.2.0. Previously the key was not included on either side.
> Result from Spark 3.1.3:
> {code:java}
> +---+----+-----+--------+---------+
> | id| val|  val|left_all|right_all|
> +---+----+-----+--------+---------+
> |  0|left| null|  {left}|   {null}|
> |  6|null|right|  {null}|  {right}|
> |  5|null|right|  {null}|  {right}|
> |  1|left| null|  {left}|   {null}|
> |  3|left|right|  {left}|  {right}|
> |  2|left| null|  {left}|   {null}|
> |  4|left|right|  {left}|  {right}|
> +---+----+-----+--------+---------+
> {code}
> I have a gut feeling this is related to these issues:
> https://issues.apache.org/jira/browse/SPARK-39376
> https://issues.apache.org/jira/browse/SPARK-34527
> https://issues.apache.org/jira/browse/SPARK-38603
[jira] [Resolved] (SPARK-40140) REST API for SQL level information does not show information on running queries
[ https://issues.apache.org/jira/browse/SPARK-40140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40140. -- Resolution: Cannot Reproduce > REST API for SQL level information does not show information on running > queries > --- > > Key: SPARK-40140 > URL: https://issues.apache.org/jira/browse/SPARK-40140 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Yeachan Park >Priority: Minor > Attachments: running.png > > > Hi All, > We noticed that the SQL information REST API implemented in > https://issues.apache.org/jira/browse/SPARK-27142 does not return back SQL > queries which are currently running. We can only see queries which are > completed/failed. > As far as I can see, this should be supported since one of the fields in the > returned JSON is "runningJobIds".
[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions
[ https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582707#comment-17582707 ] XiDuo You commented on SPARK-39915: --- We may need a stricter mechanism to ensure the output partition number of repartition. > Dataset.repartition(N) may not create N partitions > -- > > Key: SPARK-39915 > URL: https://issues.apache.org/jira/browse/SPARK-39915 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Shixiong Zhu >Priority: Major > > Looks like there is a behavior change in Dataset.repartition in 3.3.0. For > example, `spark.range(10, 0).repartition(5).rdd.getNumPartitions` returns 5 > in Spark 3.2.0, but 0 in Spark 3.3.0.
[jira] [Commented] (SPARK-40170) StringCoding UTF8 decode slowly
[ https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582708#comment-17582708 ] caican commented on SPARK-40170: gently ping [~sowen] [~r...@databricks.com] > StringCoding UTF8 decode slowly > --- > > Key: SPARK-40170 > URL: https://issues.apache.org/jira/browse/SPARK-40170 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-10-56-54-768.png, > image-2022-08-22-10-57-11-744.png > > > When `UnsafeRow` is converted to `Row` at > `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow > `, UTF8String decoding and copyMemory process are very slow. > Does anyone have any ideas for optimization? > !image-2022-08-22-10-56-54-768.png! > > !image-2022-08-22-10-57-11-744.png!
[jira] [Commented] (SPARK-39915) Dataset.repartition(N) may not create N partitions
[ https://issues.apache.org/jira/browse/SPARK-39915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582703#comment-17582703 ] XiDuo You commented on SPARK-39915: --- Thank you [~yumwang] for pinging me. I see this issue. This is not only about the empty-relation optimization but also about any other unary node that sits on top of a repartition, e.g.: {code:java} val df1 = spark.range(1).selectExpr("id as c1") val df2 = spark.range(1).selectExpr("id as c2") df1.join(df2, col("c1") === col("c2")).repartition(200, col("c1")).rdd.getNumPartitions -- output 1{code} The `.rdd` of a Dataset injects a unary node `DeserializeToObject`, so the protection AQE currently has for repartition does not work; see `AQEUtils`. And that protection does not retain the `RoundRobinPartitioning`, which makes this issue more complex. > Dataset.repartition(N) may not create N partitions > -- > > Key: SPARK-39915 > URL: https://issues.apache.org/jira/browse/SPARK-39915 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Shixiong Zhu >Priority: Major > > Looks like there is a behavior change in Dataset.repartition in 3.3.0. For > example, `spark.range(10, 0).repartition(5).rdd.getNumPartitions` returns 5 > in Spark 3.2.0, but 0 in Spark 3.3.0.
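[Editor's note] The contract under discussion can be illustrated outside Spark. The sketch below is plain Python, not Spark code: a naive round-robin partitioner (the semantics `RoundRobinPartitioning` is expected to provide for `repartition(N)`) always yields exactly N partitions, even for empty input, which is why AQE collapsing the plan to fewer partitions is surprising to users.

```python
# Illustration only (not Spark's implementation): distribute rows into
# exactly num_partitions buckets in round-robin order.
def round_robin_repartition(rows, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        # Row i goes to partition i mod N, so rows are spread evenly.
        partitions[i % num_partitions].append(row)
    return partitions

# 10 rows into 5 partitions: always 5 partitions, 2 rows each.
parts = round_robin_repartition(list(range(10)), 5)
# Empty input: still 5 partitions, all empty -- the partition count
# itself is honored, unlike the collapsed-plan behavior reported above.
empty_parts = round_robin_repartition([], 5)
```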
[jira] [Updated] (SPARK-40170) StringCoding UTF8 decode slowly
[ https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caican updated SPARK-40170: --- Description: When `UnsafeRow` is converted to `Row` at `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow `, UTF8String decoding and copyMemory process are very slow. Does anyone have any ideas for optimization? !image-2022-08-22-10-56-54-768.png! !image-2022-08-22-10-57-11-744.png! was: When `UnsafeRow` is converted to `Row` at `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow `, UTF8String decoding and copyMemory process are very slow. !image-2022-08-22-10-56-54-768.png! !image-2022-08-22-10-57-11-744.png! > StringCoding UTF8 decode slowly > --- > > Key: SPARK-40170 > URL: https://issues.apache.org/jira/browse/SPARK-40170 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-10-56-54-768.png, > image-2022-08-22-10-57-11-744.png > > > When `UnsafeRow` is converted to `Row` at > `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow > `, UTF8String decoding and copyMemory process are very slow. > Does anyone have any ideas for optimization? > !image-2022-08-22-10-56-54-768.png! > > !image-2022-08-22-10-57-11-744.png!
[jira] [Updated] (SPARK-40170) StringCoding UTF8 decode slowly
[ https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caican updated SPARK-40170: --- Attachment: image-2022-08-22-10-57-11-744.png > StringCoding UTF8 decode slowly > --- > > Key: SPARK-40170 > URL: https://issues.apache.org/jira/browse/SPARK-40170 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-10-56-54-768.png, > image-2022-08-22-10-57-11-744.png > > > When `UnsafeRow` is converted to `Row` at > `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow > `, UTF8String decoding and copyMemory process are very slow. > !image-2022-08-22-10-51-07-542.png! > > !image-2022-08-22-10-56-04-574.png!
[jira] [Updated] (SPARK-40170) StringCoding UTF8 decode slowly
[ https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caican updated SPARK-40170: --- Description: When `UnsafeRow` is converted to `Row` at `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow `, UTF8String decoding and copyMemory process are very slow. !image-2022-08-22-10-56-54-768.png! !image-2022-08-22-10-57-11-744.png! was: When `UnsafeRow` is converted to `Row` at `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow `, UTF8String decoding and copyMemory process are very slow. !image-2022-08-22-10-51-07-542.png! !image-2022-08-22-10-56-04-574.png! > StringCoding UTF8 decode slowly > --- > > Key: SPARK-40170 > URL: https://issues.apache.org/jira/browse/SPARK-40170 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-10-56-54-768.png, > image-2022-08-22-10-57-11-744.png > > > When `UnsafeRow` is converted to `Row` at > `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow > `, UTF8String decoding and copyMemory process are very slow. > !image-2022-08-22-10-56-54-768.png! > > !image-2022-08-22-10-57-11-744.png!
[jira] [Updated] (SPARK-40170) StringCoding UTF8 decode slowly
[ https://issues.apache.org/jira/browse/SPARK-40170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caican updated SPARK-40170: --- Attachment: image-2022-08-22-10-56-54-768.png > StringCoding UTF8 decode slowly > --- > > Key: SPARK-40170 > URL: https://issues.apache.org/jira/browse/SPARK-40170 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: caican >Priority: Major > Attachments: image-2022-08-22-10-56-54-768.png > > > When `UnsafeRow` is converted to `Row` at > `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow > `, UTF8String decoding and copyMemory process are very slow. > !image-2022-08-22-10-51-07-542.png! > > !image-2022-08-22-10-56-04-574.png!
[jira] [Created] (SPARK-40170) StringCoding UTF8 decode slowly
caican created SPARK-40170: -- Summary: StringCoding UTF8 decode slowly Key: SPARK-40170 URL: https://issues.apache.org/jira/browse/SPARK-40170 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: caican Attachments: image-2022-08-22-10-56-54-768.png When `UnsafeRow` is converted to `Row` at `org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.createExternalRow `, UTF8String decoding and copyMemory process are very slow. !image-2022-08-22-10-51-07-542.png! !image-2022-08-22-10-56-04-574.png!
[jira] [Updated] (SPARK-40074) Error while creating dataset in Java spark-3.x using Encoders bean with Dense Vector. (Issue arises when updating spark from 2.4 to 3.x)
[ https://issues.apache.org/jira/browse/SPARK-40074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anuj Gargava updated SPARK-40074: - Affects Version/s: 3.3.0 > Error while creating dataset in Java spark-3.x using Encoders bean with Dense > Vector. (Issue arises when updating spark from 2.4 to 3.x) > > > Key: SPARK-40074 > URL: https://issues.apache.org/jira/browse/SPARK-40074 > Project: Spark > Issue Type: Bug > Components: Java API, ML, SQL >Affects Versions: 3.1.2, 3.3.0, 3.2.2 > Environment: Scala 2.12 > Spark 3.x >Reporter: Anuj Gargava >Priority: Major > > Encountered a compatibility issue while upgrading spark from 2.4 to 3.x (also > scala is upgraded from 2.11 to 2.12). > The Java code below used to work with Spark 2.4, but when migrated to 3.x it > gives the error mentioned below. I have done my own research but couldn't > find a solution or any related information. > > > {code:java|title=Code.java|borderStyle=solid} > public void test() { > final SparkSession spark = SparkSession.builder() > .appName("Test") > .getOrCreate(); > DenseClass denseFactor1 = new DenseClass( new DenseVector( new double[]{0.13, > 0.24})); > DenseClass denseFactor2 = new DenseClass( new DenseVector( new double[]{0.24, > 0.32})); > final List<DenseClass> inputsNew = Arrays.asList(denseFactor1, denseFactor2); > final Dataset<DenseClass> denseVectorDf = spark.createDataset(inputsNew, > Encoders.bean(DenseClass.class)); > denseVectorDf.printSchema(); > } > public static class DenseClass implements Serializable > { private org.apache.spark.ml.linalg.DenseVector denseVector; }{code} > The error occurs while creating the dataset *denseVectorDf*. > Error: > {noformat} > org.apache.spark.sql.AnalysisException: Cannot up cast `denseVector` from > struct<> to > struct<type:tinyint,size:int,indices:array<int>,values:array<double>>. > The type path of the target object is: > - field (class: "org.apache.spark.ml.linalg.DenseVector", name: > "denseVector") > You can either add an explicit cast to the input data or choose a higher > precision type of the field in the target object > {noformat} > I have tried to use _double_ instead of the dense vector and it works just fine, > but it fails on using the dense vector with Encoders.bean. > > StackOverflow link for the issue: > [https://stackoverflow.com/questions/73313660/error-while-creating-dataset-in-java-spark-3-x-using-encoders-bean-with-dense-ve] >
[jira] [Updated] (SPARK-40074) Error while creating dataset in Java spark-3.x using Encoders bean with Dense Vector. (Issue arises when updating spark from 2.4 to 3.x)
[ https://issues.apache.org/jira/browse/SPARK-40074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anuj Gargava updated SPARK-40074: - Affects Version/s: 3.2.2 > Error while creating dataset in Java spark-3.x using Encoders bean with Dense > Vector. (Issue arises when updating spark from 2.4 to 3.x) > > > Key: SPARK-40074 > URL: https://issues.apache.org/jira/browse/SPARK-40074 > Project: Spark > Issue Type: Bug > Components: Java API, ML, SQL >Affects Versions: 3.1.2, 3.2.2 > Environment: Scala 2.12 > Spark 3.x >Reporter: Anuj Gargava >Priority: Major > > Encountered a compatibility issue while upgrading spark from 2.4 to 3.x (also > scala is upgraded from 2.11 to 2.12). > The Java code below used to work with Spark 2.4, but when migrated to 3.x it > gives the error mentioned below. I have done my own research but couldn't > find a solution or any related information. > > > {code:java|title=Code.java|borderStyle=solid} > public void test() { > final SparkSession spark = SparkSession.builder() > .appName("Test") > .getOrCreate(); > DenseClass denseFactor1 = new DenseClass( new DenseVector( new double[]{0.13, > 0.24})); > DenseClass denseFactor2 = new DenseClass( new DenseVector( new double[]{0.24, > 0.32})); > final List<DenseClass> inputsNew = Arrays.asList(denseFactor1, denseFactor2); > final Dataset<DenseClass> denseVectorDf = spark.createDataset(inputsNew, > Encoders.bean(DenseClass.class)); > denseVectorDf.printSchema(); > } > public static class DenseClass implements Serializable > { private org.apache.spark.ml.linalg.DenseVector denseVector; }{code} > The error occurs while creating the dataset *denseVectorDf*. > Error: > {noformat} > org.apache.spark.sql.AnalysisException: Cannot up cast `denseVector` from > struct<> to > struct<type:tinyint,size:int,indices:array<int>,values:array<double>>. > The type path of the target object is: > - field (class: "org.apache.spark.ml.linalg.DenseVector", name: > "denseVector") > You can either add an explicit cast to the input data or choose a higher > precision type of the field in the target object > {noformat} > I have tried to use _double_ instead of the dense vector and it works just fine, > but it fails on using the dense vector with Encoders.bean. > > StackOverflow link for the issue: > [https://stackoverflow.com/questions/73313660/error-while-creating-dataset-in-java-spark-3-x-using-encoders-bean-with-dense-ve] >
[jira] [Commented] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582664#comment-17582664 ] Ivan Sadikov commented on SPARK-40169: -- I would like to work on it, as it was my responsibility to come up with a proper fix for the original issue :). I will sync with [~chaosun] offline and we will come up with a strategy to address the problem properly. > Fix the issue with Parquet column index and predicate pushdown in Data source > V1 > > > Key: SPARK-40169 > URL: https://issues.apache.org/jira/browse/SPARK-40169 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.3.1, 3.2.3 >Reporter: Ivan Sadikov >Priority: Major > > This is a follow-up to SPARK-39833. In > [https://github.com/apache/spark/pull/37419], we disabled the column index for > Parquet due to correctness issues that we found when filtering data on a > partition column overlapping with the data schema. > > This ticket is for a permanent and thorough fix for the issue and re-enablement > of the column index. See more details in the PR linked above.
[jira] [Updated] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1
[ https://issues.apache.org/jira/browse/SPARK-40169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ivan Sadikov updated SPARK-40169: - Description: This is a follow-up to SPARK-39833. In [https://github.com/apache/spark/pull/37419], we disabled the column index for Parquet due to correctness issues that we found when filtering data on a partition column overlapping with the data schema. This ticket is for a permanent and thorough fix for the issue and re-enablement of the column index. See more details in the PR linked above. was: This is a follow for SPARK-39833. We disabled > Fix the issue with Parquet column index and predicate pushdown in Data source > V1 > > > Key: SPARK-40169 > URL: https://issues.apache.org/jira/browse/SPARK-40169 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.3.1, 3.2.3 >Reporter: Ivan Sadikov >Priority: Major > > This is a follow-up to SPARK-39833. In > [https://github.com/apache/spark/pull/37419], we disabled the column index for > Parquet due to correctness issues that we found when filtering data on a > partition column overlapping with the data schema. > > This ticket is for a permanent and thorough fix for the issue and re-enablement > of the column index. See more details in the PR linked above.
[jira] [Created] (SPARK-40169) Fix the issue with Parquet column index and predicate pushdown in Data source V1
Ivan Sadikov created SPARK-40169: Summary: Fix the issue with Parquet column index and predicate pushdown in Data source V1 Key: SPARK-40169 URL: https://issues.apache.org/jira/browse/SPARK-40169 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0, 3.3.1, 3.2.3 Reporter: Ivan Sadikov This is a follow-up to SPARK-39833. We disabled
[jira] [Commented] (SPARK-40156) url_decode() exposes a Java error
[ https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582643#comment-17582643 ] Serge Rielau commented on SPARK-40156: -- + [~maxgekk] For new functions, we should be using the new error framework: [https://github.com/apache/spark/blob/master/core/src/main/resources/error/error-classes.json] > url_decode() exposes a Java error > - > > Key: SPARK-40156 > URL: https://issues.apache.org/jira/browse/SPARK-40156 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > Given a badly encoded string, Spark returns a raw Java error. > It should instead return an ERROR_CLASS. > spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org'); > 22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT > url_decode('http%3A%2F%2spark.apache.org')] > java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in > escape (%) pattern - Error at index 1 in: "2s" > at java.base/java.net.URLDecoder.decode(URLDecoder.java:232) > at java.base/java.net.URLDecoder.decode(URLDecoder.java:142) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113) > at > org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
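The failure is reproducible with plain `java.net.URLDecoder`, independent of Spark, since the escape `%2s` lacks a second hex digit. A minimal sketch (the class name is illustrative, not from the ticket):

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UrlDecodeDemo {
    public static void main(String[] args) {
        // "%2s" is an illegal escape: '%' must be followed by two hex digits.
        String badlyEncoded = "http%3A%2F%2spark.apache.org";
        try {
            URLDecoder.decode(badlyEncoded, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            // This raw JVM message is what currently leaks to the SQL user.
            System.out.println(e.getMessage());
        }
    }
}
```

Catching this IllegalArgumentException in the expression and mapping it to an entry in error-classes.json would surface a Spark error class instead of the JVM message.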
[jira] [Assigned] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner
[ https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40168: Assignee: Apache Spark > Handle FileNotFoundException when shuffle file deleted in decommissioner > > > Key: SPARK-40168 > URL: https://issues.apache.org/jira/browse/SPARK-40168 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Zhongwei Zhu >Assignee: Apache Spark >Priority: Major > > When shuffle files are not found, the decommissioner handles IOException, but > the actual exception is as follows: > {code:java} > 22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during > migrating migrate_shuffle_1_356 > org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) > at > org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122) > at > org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120) > at > org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111) > at scala.collection.immutable.List.foreach(List.scala:431) > at > org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to > /10.240.2.65:43481: java.io.FileNotFoundException: >
/tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index > (No such file or directory) > at > org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392) > at > org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369) > at > io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) > at > io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) > at > io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) > at > io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) > at > io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) > at > io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) > at > io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723) > at > io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308) > at > io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933) > at > io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742) > at > io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728) > at > io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765) > at > io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071) > at >
[jira] [Assigned] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner
[ https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40168: Assignee: (was: Apache Spark) > Handle FileNotFoundException when shuffle file deleted in decommissioner > > > Key: SPARK-40168 > URL: https://issues.apache.org/jira/browse/SPARK-40168 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Zhongwei Zhu >Priority: Major > > When shuffle files are not found, the decommissioner handles IOException, but > the actual exception is as follows: > {code:java} > 22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during > migrating migrate_shuffle_1_356 > org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) > at > org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122) > at > org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120) > at > org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111) > at scala.collection.immutable.List.foreach(List.scala:431) > at > org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to > /10.240.2.65:43481: java.io.FileNotFoundException: >
/tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index > (No such file or directory) > at > org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392) > at > org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369) > at > io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) > at > io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) > at > io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) > at > io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) > at > io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) > at > io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) > at > io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723) > at > io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308) > at > io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933) > at > io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742) > at > io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728) > at > io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765) > at > io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071) > at >
[jira] [Commented] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner
[ https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582634#comment-17582634 ] Apache Spark commented on SPARK-40168: -- User 'warrenzhu25' has created a pull request for this issue: https://github.com/apache/spark/pull/37603 > Handle FileNotFoundException when shuffle file deleted in decommissioner > > > Key: SPARK-40168 > URL: https://issues.apache.org/jira/browse/SPARK-40168 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Zhongwei Zhu >Priority: Major > > When shuffle files are not found, the decommissioner handles IOException, but > the actual exception is as follows: > {code:java} > 22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during > migrating migrate_shuffle_1_356 > org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) > at > org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122) > at > org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120) > at > org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111) > at scala.collection.immutable.List.foreach(List.scala:431) > at > org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111) > at > java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.io.IOException: Failed to send RPC RPC
5697756267528635203 to > /10.240.2.65:43481: java.io.FileNotFoundException: > /tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index > (No such file or directory) > at > org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392) > at > org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369) > at > io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) > at > io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) > at > io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) > at > io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) > at > io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) > at > io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) > at > io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) > at > io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723) > at > io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308) > at > io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933) > at > io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742) > at > io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728) > at > io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) > at > io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765) > at >
[jira] [Updated] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner
[ https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongwei Zhu updated SPARK-40168: - Description: When shuffle files are not found, the decommissioner handles IOException, but the actual exception is as follows: {code:java} 22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during migrating migrate_shuffle_1_356 org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) at org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122) at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120) at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111) at scala.collection.immutable.List.foreach(List.scala:431) at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to /10.240.2.65:43481: java.io.FileNotFoundException: /tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index (No such file or directory) at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392) at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369) at
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) at io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723) at io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308) at io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660) at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735) at io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950) at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742) at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728) at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127) at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) at 
io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765) at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ... 1 more Caused by: java.io.FileNotFoundException:
[jira] [Updated] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner
[ https://issues.apache.org/jira/browse/SPARK-40168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhongwei Zhu updated SPARK-40168: - Description: When shuffle files are not found, the decommissioner handles IOException, but the actual exception is as follows: ``` 22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during migrating migrate_shuffle_1_356 org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) at org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122) at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120) at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111) at scala.collection.immutable.List.foreach(List.scala:431) at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to /10.240.2.65:43481: java.io.FileNotFoundException: /tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index (No such file or directory) at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392) at
org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) at io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723) at io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308) at io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660) at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735) at io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950) at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742) at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728) at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127) at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765) at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ...
[jira] [Created] (SPARK-40168) Handle FileNotFoundException when shuffle file deleted in decommissioner
Zhongwei Zhu created SPARK-40168: Summary: Handle FileNotFoundException when shuffle file deleted in decommissioner Key: SPARK-40168 URL: https://issues.apache.org/jira/browse/SPARK-40168 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.3.0 Reporter: Zhongwei Zhu When shuffle files are not found, the decommissioner handles IOException, but the actual exception is as follows: ``` 22/08/10 18:05:34 ERROR BlockManagerDecommissioner: Error occurred during migrating migrate_shuffle_1_356 org.apache.spark.SparkException: Exception thrown in awaitResult: at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301) at org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122) at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4(BlockManagerDecommissioner.scala:120) at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$4$adapted(BlockManagerDecommissioner.scala:111) at scala.collection.immutable.List.foreach(List.scala:431) at org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:111) at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at java.base/java.lang.Thread.run(Thread.java:829) Caused by: java.io.IOException: Failed to send RPC RPC 5697756267528635203 to /10.240.2.65:43481: java.io.FileNotFoundException: /tmp/blockmgr-98a2a29a-5231-4fed-a82e-6bc0531ad407/15/shuffle_1_356_0.index (No such file or directory) at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:392) at
org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:369) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:578) at io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:571) at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:550) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:491) at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:616) at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:609) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:117) at io.netty.util.internal.PromiseNotificationUtil.tryFailure(PromiseNotificationUtil.java:64) at io.netty.channel.ChannelOutboundBuffer.safeFail(ChannelOutboundBuffer.java:723) at io.netty.channel.ChannelOutboundBuffer.remove0(ChannelOutboundBuffer.java:308) at io.netty.channel.ChannelOutboundBuffer.failFlushed(ChannelOutboundBuffer.java:660) at io.netty.channel.AbstractChannel$AbstractUnsafe.close(AbstractChannel.java:735) at io.netty.channel.AbstractChannel$AbstractUnsafe.handleWriteError(AbstractChannel.java:950) at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:933) at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:354) at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:895) at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1372) at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:742) at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:728) at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:127) at 
io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:750) at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:765) at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1071) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:469) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at
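The fix direction the ticket suggests amounts to recognizing the missing-file cause buried inside the wrapped RPC exception rather than treating it as a generic IOException. A minimal sketch of that cause-chain walk (this is not the actual Spark patch; class and method names are illustrative):

```java
import java.io.FileNotFoundException;
import java.io.IOException;

public class CauseChainDemo {
    /** Returns true if any exception in the cause chain is a FileNotFoundException. */
    static boolean causedByMissingFile(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof FileNotFoundException) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Mimic the nesting from the log: wrapper -> IOException (RPC failure) -> FileNotFoundException.
        Throwable nested = new RuntimeException("Exception thrown in awaitResult:",
                new IOException("Failed to send RPC",
                        new FileNotFoundException("shuffle_1_356_0.index (No such file or directory)")));
        System.out.println(causedByMissingFile(nested)); // prints "true"
    }
}
```

With such a check, a decommissioner could skip migrating a shuffle block whose files were already deleted instead of retrying the doomed upload.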
[jira] [Resolved] (SPARK-40152) Codegen compilation error when using split_part
[ https://issues.apache.org/jira/browse/SPARK-40152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40152. -- Fix Version/s: 3.4.0 3.3.1 Assignee: Yuming Wang Resolution: Fixed Resolved by https://github.com/apache/spark/pull/37589 > Codegen compilation error when using split_part > --- > > Key: SPARK-40152 > URL: https://issues.apache.org/jira/browse/SPARK-40152 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bruce Robbins >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0, 3.3.1 > > > The following query throws an error: > {noformat} > create or replace temp view v1 as > select * from values > ('11.12.13', '.', 3) > as v1(col1, col2, col3); > cache table v1; > SELECT split_part(col1, col2, col3) > from v1; > {noformat} > The error is: > {noformat} > 22/08/19 14:25:14 ERROR CodeGenerator: failed to compile: > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 42, Column 1: Expression "project_isNull_0 = false" is not a type > org.codehaus.commons.compiler.CompileException: File 'generated.java', Line > 42, Column 1: Expression "project_isNull_0 = false" is not a type > at > org.codehaus.janino.Java$Atom.toTypeOrCompileException(Java.java:3934) > at org.codehaus.janino.Parser.parseBlockStatement(Parser.java:1887) > at org.codehaus.janino.Parser.parseBlockStatements(Parser.java:1811) > at org.codehaus.janino.Parser.parseBlock(Parser.java:1792) > at > {noformat} > In the end, {{split_part}} does successfully execute, although in interpreted > mode.
[jira] [Resolved] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)
[ https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40163. -- Fix Version/s: 3.4.0 Assignee: seunggabi Resolution: Fixed Resolved by https://github.com/apache/spark/pull/37478 > [SPARK][SQL] feat: SparkSession.confing(Map) > > > Key: SPARK-40163 > URL: https://issues.apache.org/jira/browse/SPARK-40163 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: seunggabi >Assignee: seunggabi >Priority: Trivial > Fix For: 3.4.0 > > > [https://github.com/apache/spark/pull/37478] > - as-is > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > var b = builder > map.keys.forEach { > val k = it > val v = map[k] > b = when (v) { > is Long -> b.config(k, v) > is String -> b.config(k, v) > is Double -> b.config(k, v) > is Boolean -> b.config(k, v) > else -> b > } > } > return b > } > } {code} > - to-be > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > return builder.config(map) > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
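The change in the snippets above collapses a per-type fold over the dict into a single map-accepting `config` overload. A toy Python sketch of the shape of that change (the `Builder` class here is a stand-in for illustration, not the real `SparkSession.Builder` API):

```python
class Builder:
    """Toy stand-in for SparkSession.Builder; illustrates the shape
    of the change, not the real API."""
    def __init__(self):
        self._options = {}

    def config(self, key=None, value=None, map=None):
        # New overload: accept a whole dict of options at once.
        if map is not None:
            self._options.update(map)
        else:
            self._options[key] = value
        return self

opts = {"spark.app.name": "demo", "spark.ui.enabled": False}

# as-is: fold the dict one entry at a time
b = Builder()
for k, v in opts.items():
    b = b.config(k, v)

# to-be: a single call with the whole map
b2 = Builder().config(map=opts)
```

Both forms end up with the same options; the map form just removes the per-type dispatch boilerplate.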
[jira] [Assigned] (SPARK-40167) Add array_sort(column, comparator) to SparkR
[ https://issues.apache.org/jira/browse/SPARK-40167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40167: Assignee: (was: Apache Spark) > Add array_sort(column, comparator) to SparkR > > > Key: SPARK-40167 > URL: https://issues.apache.org/jira/browse/SPARK-40167 > Project: Spark > Issue Type: Improvement > Components: R, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in R as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40167) Add array_sort(column, comparator) to SparkR
[ https://issues.apache.org/jira/browse/SPARK-40167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40167: Assignee: Apache Spark > Add array_sort(column, comparator) to SparkR > > > Key: SPARK-40167 > URL: https://issues.apache.org/jira/browse/SPARK-40167 > Project: Spark > Issue Type: Improvement > Components: R, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Minor > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in R as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40167) Add array_sort(column, comparator) to SparkR
[ https://issues.apache.org/jira/browse/SPARK-40167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582570#comment-17582570 ] Apache Spark commented on SPARK-40167: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/37600 > Add array_sort(column, comparator) to SparkR > > > Key: SPARK-40167 > URL: https://issues.apache.org/jira/browse/SPARK-40167 > Project: Spark > Issue Type: Improvement > Components: R, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in R as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40167) Add array_sort(column, comparator) to SparkR
Maciej Szymkiewicz created SPARK-40167: -- Summary: Add array_sort(column, comparator) to SparkR Key: SPARK-40167 URL: https://issues.apache.org/jira/browse/SPARK-40167 Project: Spark Issue Type: Improvement Components: R, SQL Affects Versions: 3.4.0 Reporter: Maciej Szymkiewicz SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be available in R as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40164) The partitionSpec should be distinct keys after filter one row of row_number
[ https://issues.apache.org/jira/browse/SPARK-40164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582568#comment-17582568 ] Apache Spark commented on SPARK-40164: -- User 'wankunde' has created a pull request for this issue: https://github.com/apache/spark/pull/37602 > The partitionSpec should be distinct keys after filter one row of row_number > > > Key: SPARK-40164 > URL: https://issues.apache.org/jira/browse/SPARK-40164 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wan Kun >Priority: Minor > > For query > {code:sql} > SELECT * > FROM ( > SELECT *, row_number() over(partition by key order by value) rn > FROM testData t > ) t1 > WHERE rn=1 > {code} > column *key* will be distinct -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
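The premise of the proposed optimization can be checked with a small pure-Python model: after filtering the window to rn = 1, at most one row survives per partition key, so the *key* column is necessarily distinct.

```python
from itertools import groupby

rows = [("a", 3), ("a", 1), ("b", 2), ("b", 5), ("a", 2)]

# Model of: row_number() over (partition by key order by value), keep rn = 1
kept = []
for key, group in groupby(sorted(rows), key=lambda r: r[0]):
    ordered = sorted(group, key=lambda r: r[1])
    kept.append(ordered[0])  # rn = 1 is the row with the smallest value per key

keys = [k for k, _ in kept]
assert len(keys) == len(set(keys))  # 'key' is distinct after the filter
```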
[jira] [Assigned] (SPARK-40164) The partitionSpec should be distinct keys after filter one row of row_number
[ https://issues.apache.org/jira/browse/SPARK-40164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40164: Assignee: (was: Apache Spark) > The partitionSpec should be distinct keys after filter one row of row_number > > > Key: SPARK-40164 > URL: https://issues.apache.org/jira/browse/SPARK-40164 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wan Kun >Priority: Minor > > For query > {code:sql} > SELECT * > FROM ( > SELECT *, row_number() over(partition by key order by value) rn > FROM testData t > ) t1 > WHERE rn=1 > {code} > column *key* will be distinct -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40164) The partitionSpec should be distinct keys after filter one row of row_number
[ https://issues.apache.org/jira/browse/SPARK-40164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40164: Assignee: Apache Spark > The partitionSpec should be distinct keys after filter one row of row_number > > > Key: SPARK-40164 > URL: https://issues.apache.org/jira/browse/SPARK-40164 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Wan Kun >Assignee: Apache Spark >Priority: Minor > > For query > {code:sql} > SELECT * > FROM ( > SELECT *, row_number() over(partition by key order by value) rn > FROM testData t > ) t1 > WHERE rn=1 > {code} > column *key* will be distinct -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40150) Dynamically merge File Splits
[ https://issues.apache.org/jira/browse/SPARK-40150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582564#comment-17582564 ] Apache Spark commented on SPARK-40150: -- User 'jackylee-ch' has created a pull request for this issue: https://github.com/apache/spark/pull/37601 > Dynamically merge File Splits > - > > Key: SPARK-40150 > URL: https://issues.apache.org/jira/browse/SPARK-40150 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jackey Lee >Priority: Major > > We currently use maxPartitionBytes and minPartitionNum to split files and use > openCostInBytes to merge file splits. But these are static configurations, > and the same configuration does not work in all scenarios. > This PR attempts to dynamically merge file splits, taking into account the > concurrency while processing more data in one task. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
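For context on what the issue wants to make dynamic: Spark's current static packing pads each file by openCostInBytes and greedily groups files into partitions up to maxPartitionBytes. A simplified pure-Python sketch of that idea (defaults mirror Spark's documented defaults of 128 MB and 4 MB; the real FilePartition algorithm differs in detail):

```python
def pack_splits(file_sizes, max_partition_bytes=128 << 20, open_cost=4 << 20):
    """Greedy packing in the spirit of Spark's file-split merging:
    each file is padded by open_cost, and a partition is closed once
    adding the next file would exceed max_partition_bytes. Simplified."""
    partitions, current, current_size = [], [], 0
    for size in sorted(file_sizes, reverse=True):  # largest first
        padded = size + open_cost
        if current and current_size + padded > max_partition_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += padded
    if current:
        partitions.append(current)
    return partitions

parts = pack_splits([100 << 20, 30 << 20, 10 << 20])
```

With static thresholds, the same sizes always pack the same way regardless of cluster concurrency, which is the limitation the issue describes.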
[jira] [Assigned] (SPARK-40150) Dynamically merge File Splits
[ https://issues.apache.org/jira/browse/SPARK-40150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40150: Assignee: (was: Apache Spark) > Dynamically merge File Splits > - > > Key: SPARK-40150 > URL: https://issues.apache.org/jira/browse/SPARK-40150 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jackey Lee >Priority: Major > > We currently use maxPartitionBytes and minPartitionNum to split files and use > openCostInBytes to merge file splits. But these are static configurations, > and the same configuration does not work in all scenarios. > This PR attempts to dynamically merge file splits, taking into account the > concurrency while processing more data in one task. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40150) Dynamically merge File Splits
[ https://issues.apache.org/jira/browse/SPARK-40150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40150: Assignee: Apache Spark > Dynamically merge File Splits > - > > Key: SPARK-40150 > URL: https://issues.apache.org/jira/browse/SPARK-40150 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jackey Lee >Assignee: Apache Spark >Priority: Major > > We currently use maxPartitionBytes and minPartitionNum to split files and use > openCostInBytes to merge file splits. But these are static configurations, > and the same configuration does not work in all scenarios. > This PR attempts to dynamically merge file splits, taking into account the > concurrency while processing more data in one task. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31
[ https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40162. -- Assignee: BingKun Pan Resolution: Fixed Resolved by https://github.com/apache/spark/pull/37597 > Upgrade RoaringBitmap from 0.9.30 to 0.9.31 > --- > > Key: SPARK-40162 > URL: https://issues.apache.org/jira/browse/SPARK-40162 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31 > simplify BatchIterators, fix bug in advanceIfNeeded ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573], [commit|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c]) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40165. -- Assignee: BingKun Pan Resolution: Fixed Resolved by https://github.com/apache/spark/pull/37598 > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: BingKun Pan >Priority: Trivial > Fix For: 3.4.0 > > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-40165: - Priority: Trivial (was: Minor) > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Trivial > Fix For: 3.4.0 > > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)
[ https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40163: Assignee: (was: Apache Spark) > [SPARK][SQL] feat: SparkSession.confing(Map) > > > Key: SPARK-40163 > URL: https://issues.apache.org/jira/browse/SPARK-40163 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: seunggabi >Priority: Trivial > > [https://github.com/apache/spark/pull/37478] > - as-is > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > var b = builder > map.keys.forEach { > val k = it > val v = map[k] > b = when (v) { > is Long -> b.config(k, v) > is String -> b.config(k, v) > is Double -> b.config(k, v) > is Boolean -> b.config(k, v) > else -> b > } > } > return b > } > } {code} > - to-be > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > return b.config(map) > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)
[ https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40163: Assignee: Apache Spark > [SPARK][SQL] feat: SparkSession.confing(Map) > > > Key: SPARK-40163 > URL: https://issues.apache.org/jira/browse/SPARK-40163 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: seunggabi >Assignee: Apache Spark >Priority: Trivial > > [https://github.com/apache/spark/pull/37478] > - as-is > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > var b = builder > map.keys.forEach { > val k = it > val v = map[k] > b = when (v) { > is Long -> b.config(k, v) > is String -> b.config(k, v) > is Double -> b.config(k, v) > is Boolean -> b.config(k, v) > else -> b > } > } > return b > } > } {code} > - to-be > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > return b.config(map) > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)
[ https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582549#comment-17582549 ] Apache Spark commented on SPARK-40163: -- User 'seunggabi' has created a pull request for this issue: https://github.com/apache/spark/pull/37478 > [SPARK][SQL] feat: SparkSession.confing(Map) > > > Key: SPARK-40163 > URL: https://issues.apache.org/jira/browse/SPARK-40163 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: seunggabi >Priority: Trivial > > [https://github.com/apache/spark/pull/37478] > - as-is > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > var b = builder > map.keys.forEach { > val k = it > val v = map[k] > b = when (v) { > is Long -> b.config(k, v) > is String -> b.config(k, v) > is Double -> b.config(k, v) > is Boolean -> b.config(k, v) > else -> b > } > } > return b > } > } {code} > - to-be > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > return b.config(map) > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40166) Add array_sort(column, comparator) to PySpark
[ https://issues.apache.org/jira/browse/SPARK-40166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40166: Assignee: (was: Apache Spark) > Add array_sort(column, comparator) to PySpark > - > > Key: SPARK-40166 > URL: https://issues.apache.org/jira/browse/SPARK-40166 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in Python as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40166) Add array_sort(column, comparator) to PySpark
[ https://issues.apache.org/jira/browse/SPARK-40166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582544#comment-17582544 ] Apache Spark commented on SPARK-40166: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/37600 > Add array_sort(column, comparator) to PySpark > - > > Key: SPARK-40166 > URL: https://issues.apache.org/jira/browse/SPARK-40166 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Priority: Minor > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in Python as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40166) Add array_sort(column, comparator) to PySpark
[ https://issues.apache.org/jira/browse/SPARK-40166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40166: Assignee: Apache Spark > Add array_sort(column, comparator) to PySpark > - > > Key: SPARK-40166 > URL: https://issues.apache.org/jira/browse/SPARK-40166 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Minor > > SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be > available in Python as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40166) Add array_sort(column, comparator) to PySpark
Maciej Szymkiewicz created SPARK-40166: -- Summary: Add array_sort(column, comparator) to PySpark Key: SPARK-40166 URL: https://issues.apache.org/jira/browse/SPARK-40166 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 3.4.0 Reporter: Maciej Szymkiewicz SPARK-39925 exposed array_sort(column, comparator) on JVM. It should be available in Python as well. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
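The comparator form of `array_sort` takes a two-argument function returning a negative, zero, or positive number, like a classic comparator. A pure-Python analog of those semantics (this is a sketch of the intended behavior, not the PySpark API, which wraps the comparator in a Column-level lambda):

```python
from functools import cmp_to_key

def array_sort(arr, comparator):
    """Analog of array_sort(column, comparator): sort with a
    two-argument comparator returning a negative/zero/positive int."""
    return sorted(arr, key=cmp_to_key(comparator))

# Example: sort strings in descending order via a comparator
result = array_sort(["bc", "ab", "dc"], lambda a, b: (a < b) - (a > b))
```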
[jira] [Commented] (SPARK-40148) Make pyspark.sql.window examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582535#comment-17582535 ] Qian Sun commented on SPARK-40148: -- [~hyukjin.kwon] OK, I'll create a follow-up PR to do these :) > Make pyspark.sql.window examples self-contained > --- > > Key: SPARK-40148 > URL: https://issues.apache.org/jira/browse/SPARK-40148 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582531#comment-17582531 ] Apache Spark commented on SPARK-40165: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/37598 > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-40165: Description: Include: * 1.scalacheck (from 1.15.4 to 1.16.0) * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > > Include: > * 1.scalacheck (from 1.15.4 to 1.16.0) > * 2.maven-surefire-plugin (from 3.0.0-M5 to 3.0.0-M7) > * 3.maven-dependency-plugin (from 3.1.1 to 3.3.0) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40165: Assignee: (was: Apache Spark) > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40165) Update test plugins to latest versions
[ https://issues.apache.org/jira/browse/SPARK-40165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40165: Assignee: Apache Spark > Update test plugins to latest versions > -- > > Key: SPARK-40165 > URL: https://issues.apache.org/jira/browse/SPARK-40165 > Project: Spark > Issue Type: Improvement > Components: Build, Tests >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40165) Update test plugins to latest versions
BingKun Pan created SPARK-40165: --- Summary: Update test plugins to latest versions Key: SPARK-40165 URL: https://issues.apache.org/jira/browse/SPARK-40165 Project: Spark Issue Type: Improvement Components: Build, Tests Affects Versions: 3.4.0 Reporter: BingKun Pan Fix For: 3.4.0 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39833. -- Fix Version/s: 3.3.1 3.2.3 3.4.0 Resolution: Fixed Issue resolved by pull request 37419 [https://github.com/apache/spark/pull/37419] > Filtered parquet data frame count() and show() produce inconsistent results > when spark.sql.parquet.filterPushdown is true > - > > Key: SPARK-39833 > URL: https://issues.apache.org/jira/browse/SPARK-39833 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0 >Reporter: Michael Allman >Assignee: Ivan Sadikov >Priority: Major > Labels: correctness > Fix For: 3.3.1, 3.2.3, 3.4.0 > > > One of our data scientists discovered a problem wherein a data frame > `.show()` call printed non-empty results, but `.count()` printed 0. I've > narrowed the issue to a small, reproducible test case which exhibits this > aberrant behavior. In pyspark, run the following code: > {code:python} > from pyspark.sql.types import * > parquet_pushdown_bug_df = spark.createDataFrame([{"COL0": int(0)}], > schema=StructType(fields=[StructField("COL0",IntegerType(),True)])) > parquet_pushdown_bug_df.repartition(1).write.mode("overwrite").parquet("parquet_pushdown_bug/col0=0/parquet_pushdown_bug.parquet") > reread_parquet_pushdown_bug_df = spark.read.parquet("parquet_pushdown_bug") > reread_parquet_pushdown_bug_df.filter("col0 = 0").show() > print(reread_parquet_pushdown_bug_df.filter("col0 = 0").count()) > {code} > In my usage, this prints a data frame with 1 row and a count of 0. However, > disabling `spark.sql.parquet.filterPushdown` produces consistent results: > {code:python} > spark.conf.set("spark.sql.parquet.filterPushdown", False) > reread_parquet_pushdown_bug_df.filter("col0 = 0").show() > reread_parquet_pushdown_bug_df.filter("col0 = 0").count() > {code} > This will print the same data frame, however it will print a count of 1. 
The > key to triggering this bug is not just enabling > `spark.sql.parquet.filterPushdown` (which is enabled by default). The case of > the column in the data frame (before writing) must differ from the case of > the partition column in the file path, i.e. COL0 versus col0 or col0 versus > COL0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39833) Filtered parquet data frame count() and show() produce inconsistent results when spark.sql.parquet.filterPushdown is true
[ https://issues.apache.org/jira/browse/SPARK-39833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-39833: Assignee: Ivan Sadikov > Filtered parquet data frame count() and show() produce inconsistent results > when spark.sql.parquet.filterPushdown is true > - > > Key: SPARK-39833 > URL: https://issues.apache.org/jira/browse/SPARK-39833 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0 >Reporter: Michael Allman >Assignee: Ivan Sadikov >Priority: Major > Labels: correctness > > One of our data scientists discovered a problem wherein a data frame > `.show()` call printed non-empty results, but `.count()` printed 0. I've > narrowed the issue to a small, reproducible test case which exhibits this > aberrant behavior. In pyspark, run the following code: > {code:python} > from pyspark.sql.types import * > parquet_pushdown_bug_df = spark.createDataFrame([{"COL0": int(0)}], > schema=StructType(fields=[StructField("COL0",IntegerType(),True)])) > parquet_pushdown_bug_df.repartition(1).write.mode("overwrite").parquet("parquet_pushdown_bug/col0=0/parquet_pushdown_bug.parquet") > reread_parquet_pushdown_bug_df = spark.read.parquet("parquet_pushdown_bug") > reread_parquet_pushdown_bug_df.filter("col0 = 0").show() > print(reread_parquet_pushdown_bug_df.filter("col0 = 0").count()) > {code} > In my usage, this prints a data frame with 1 row and a count of 0. However, > disabling `spark.sql.parquet.filterPushdown` produces consistent results: > {code:python} > spark.conf.set("spark.sql.parquet.filterPushdown", False) > reread_parquet_pushdown_bug_df.filter("col0 = 0").show() > reread_parquet_pushdown_bug_df.filter("col0 = 0").count() > {code} > This will print the same data frame, however it will print a count of 1. The > key to triggering this bug is not just enabling > `spark.sql.parquet.filterPushdown` (which is enabled by default). 
The case of > the column in the data frame (before writing) must differ from the case of > the partition column in the file path, i.e. COL0 versus col0 or col0 versus > COL0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40164) The partitionSpec should be distinct keys after filter one row of row_number
Wan Kun created SPARK-40164: --- Summary: The partitionSpec should be distinct keys after filter one row of row_number Key: SPARK-40164 URL: https://issues.apache.org/jira/browse/SPARK-40164 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Wan Kun For query {code:sql} SELECT * FROM ( SELECT *, row_number() over(partition by key order by value) rn FROM testData t ) t1 WHERE rn=1 {code} column *key* will be distinct -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40148) Make pyspark.sql.window examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582517#comment-17582517 ] Hyukjin Kwon commented on SPARK-40148: -- Oops, my bad. Resolving. I just noticed that we don't have examples for several APIs such as rowsBetween. It would be good to have them; feel free to create a follow-up. > Make pyspark.sql.window examples self-contained > --- > > Key: SPARK-40148 > URL: https://issues.apache.org/jira/browse/SPARK-40148 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Resolved] (SPARK-40148) Make pyspark.sql.window examples self-contained
[ https://issues.apache.org/jira/browse/SPARK-40148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40148. -- Resolution: Duplicate > Make pyspark.sql.window examples self-contained > --- > > Key: SPARK-40148 > URL: https://issues.apache.org/jira/browse/SPARK-40148 > Project: Spark > Issue Type: Sub-task > Components: PySpark, SQL >Affects Versions: 3.4.0 >Reporter: Hyukjin Kwon >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40161) Make Series.mode apply PandasMode
[ https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40161: Assignee: Ruifeng Zheng > Make Series.mode apply PandasMode > - > > Key: SPARK-40161 > URL: https://issues.apache.org/jira/browse/SPARK-40161 > Project: Spark > Issue Type: Improvement > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40161) Make Series.mode apply PandasMode
[ https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40161. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37596 [https://github.com/apache/spark/pull/37596] > Make Series.mode apply PandasMode > - > > Key: SPARK-40161 > URL: https://issues.apache.org/jira/browse/SPARK-40161 > Project: Spark > Issue Type: Improvement > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39925) Add array_sort(column, comparator) overload to DataFrame operations
[ https://issues.apache.org/jira/browse/SPARK-39925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39925. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 37361 [https://github.com/apache/spark/pull/37361] > Add array_sort(column, comparator) overload to DataFrame operations > --- > > Key: SPARK-39925 > URL: https://issues.apache.org/jira/browse/SPARK-39925 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Brandon Dahler >Assignee: Brandon Dahler >Priority: Minor > Fix For: 3.4.0 > > > The ability to use {{array_sort}} with a comparator was added in SPARK-29020; > however, the new signature wasn't made available to the DataFrame operations > API. > > Proposed new signature: > {code:java} > package org.apache.spark.sql > object functions { > ... > def array_sort(e: Column, comparator: (Column, Column) => Column): Column > ... > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39925) Add array_sort(column, comparator) overload to DataFrame operations
[ https://issues.apache.org/jira/browse/SPARK-39925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-39925: Assignee: Brandon Dahler > Add array_sort(column, comparator) overload to DataFrame operations > --- > > Key: SPARK-39925 > URL: https://issues.apache.org/jira/browse/SPARK-39925 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Brandon Dahler >Assignee: Brandon Dahler >Priority: Minor > > The ability to use {{array_sort}} with a comparator was added in SPARK-29020; > however, the new signature wasn't made available to the DataFrame operations > API. > > Proposed new signature: > {code:java} > package org.apache.spark.sql > object functions { > ... > def array_sort(e: Column, comparator: (Column, Column) => Column): Column > ... > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
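The proposed overload takes a comparator that returns a negative, zero, or positive value, mirroring the SQL `array_sort(expr, func)` lambda added in SPARK-29020. A rough Python analogue of those comparator semantics (an illustration, not Spark's `Column`-based API) uses `functools.cmp_to_key`:

```python
# Comparator-driven sort: negative means "left first", positive "right first".
from functools import cmp_to_key

def comparator(left, right):
    # sort descending; in the Spark proposal this would be a
    # (Column, Column) => Column function
    return right - left

result = sorted([2, 9, 4], key=cmp_to_key(comparator))
print(result)  # [9, 4, 2]
```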
[jira] [Updated] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)
[ https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] seunggabi updated SPARK-40163: -- Affects Version/s: 3.3.0 (was: 3.2.2) > [SPARK][SQL] feat: SparkSession.confing(Map) > > > Key: SPARK-40163 > URL: https://issues.apache.org/jira/browse/SPARK-40163 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: seunggabi >Priority: Trivial > > [https://github.com/apache/spark/pull/37478] > - as-is > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > var b = builder > map.keys.forEach { > val k = it > val v = map[k] > b = when (v) { > is Long -> b.config(k, v) > is String -> b.config(k, v) > is Double -> b.config(k, v) > is Boolean -> b.config(k, v) > else -> b > } > } > return b > } > } {code} > - to-be > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > return builder.config(map) > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)
[ https://issues.apache.org/jira/browse/SPARK-40163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] seunggabi updated SPARK-40163: -- Description: [https://github.com/apache/spark/pull/37478] - as-is {code:java} private fun config(builder: SparkSession.Builder): SparkSession.Builder { val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) var b = builder map.keys.forEach { val k = it val v = map[k] b = when (v) { is Long -> b.config(k, v) is String -> b.config(k, v) is Double -> b.config(k, v) is Boolean -> b.config(k, v) else -> b } } return b } } {code} - to-be {code:java} private fun config(builder: SparkSession.Builder): SparkSession.Builder { val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) return builder.config(map) } } {code} was:https://github.com/apache/spark/pull/37478 !image-2022-08-21-17-45-36-461.png! > [SPARK][SQL] feat: SparkSession.confing(Map) > > > Key: SPARK-40163 > URL: https://issues.apache.org/jira/browse/SPARK-40163 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: seunggabi >Priority: Trivial > > [https://github.com/apache/spark/pull/37478] > - as-is > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > var b = builder > map.keys.forEach { > val k = it > val v = map[k] > b = when (v) { > is Long -> b.config(k, v) > is String -> b.config(k, v) > is Double -> b.config(k, v) > is Boolean -> b.config(k, v) > else -> b > } > } > return b > } > } {code} > - to-be > {code:java} > private fun config(builder: SparkSession.Builder): SparkSession.Builder { > val map = YamlUtils.read(this::class.java, "spark", Extension.YAML) > return builder.config(map) > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40163) [SPARK][SQL] feat: SparkSession.confing(Map)
seunggabi created SPARK-40163: - Summary: [SPARK][SQL] feat: SparkSession.confing(Map) Key: SPARK-40163 URL: https://issues.apache.org/jira/browse/SPARK-40163 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.2.2 Reporter: seunggabi https://github.com/apache/spark/pull/37478 !image-2022-08-21-17-45-36-461.png! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
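The simplification proposed in SPARK-40163 can be sketched in plain Python (with a stand-in `Builder` class, not Spark's real one): instead of folding each (key, value) pair into the builder one call at a time, accept the whole map in a single `config(map)` call.

```python
# Stand-in builder illustrating the proposed bulk config(map) overload.
class Builder:
    def __init__(self):
        self.settings = {}

    def config(self, key=None, value=None, map=None):
        # proposed overload: passing `map` applies every entry at once,
        # replacing the per-type when/forEach loop in the as-is code
        if map is not None:
            self.settings.update(map)
        else:
            self.settings[key] = value
        return self

b = Builder().config(map={"spark.app.name": "demo", "spark.executor.cores": 2})
print(b.settings)
```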
[jira] [Commented] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31
[ https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582489#comment-17582489 ] Apache Spark commented on SPARK-40162: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/37597 > Upgrade RoaringBitmap from 0.9.30 to 0.9.31 > --- > > Key: SPARK-40162 > URL: https://issues.apache.org/jira/browse/SPARK-40162 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31 > [simplify BatchIterators, fix bug in advanceIfNeeded|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c] ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573]) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31
[ https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40162: Assignee: Apache Spark > Upgrade RoaringBitmap from 0.9.30 to 0.9.31 > --- > > Key: SPARK-40162 > URL: https://issues.apache.org/jira/browse/SPARK-40162 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Fix For: 3.4.0 > > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31 > [simplify BatchIterators, fix bug in advanceIfNeeded|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c] ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573]) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31
[ https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40162: Assignee: (was: Apache Spark) > Upgrade RoaringBitmap from 0.9.30 to 0.9.31 > --- > > Key: SPARK-40162 > URL: https://issues.apache.org/jira/browse/SPARK-40162 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31 > [simplify BatchIterators, fix bug in advanceIfNeeded|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c] ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573]) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31
[ https://issues.apache.org/jira/browse/SPARK-40162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582488#comment-17582488 ] Apache Spark commented on SPARK-40162: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/37597 > Upgrade RoaringBitmap from 0.9.30 to 0.9.31 > --- > > Key: SPARK-40162 > URL: https://issues.apache.org/jira/browse/SPARK-40162 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > Fix For: 3.4.0 > > > https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31 > [simplify BatchIterators, fix bug in advanceIfNeeded|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c] ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573]) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40162) Upgrade RoaringBitmap from 0.9.30 to 0.9.31
BingKun Pan created SPARK-40162: --- Summary: Upgrade RoaringBitmap from 0.9.30 to 0.9.31 Key: SPARK-40162 URL: https://issues.apache.org/jira/browse/SPARK-40162 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.4.0 Reporter: BingKun Pan Fix For: 3.4.0 https://github.com/RoaringBitmap/RoaringBitmap/compare/0.9.30...0.9.31 [simplify BatchIterators, fix bug in advanceIfNeeded|https://github.com/RoaringBitmap/RoaringBitmap/commit/56b1bba400e9f91c682648fa90b890f3a0bb561c] ([#573|https://github.com/RoaringBitmap/RoaringBitmap/pull/573]) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40161) Make Series.mode apply PandasMode
[ https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40161: Assignee: Apache Spark > Make Series.mode apply PandasMode > - > > Key: SPARK-40161 > URL: https://issues.apache.org/jira/browse/SPARK-40161 > Project: Spark > Issue Type: Improvement > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40151) Fix return type for new median(interval) function
[ https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582482#comment-17582482 ] Apache Spark commented on SPARK-40151: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/37595 > Fix return type for new median(interval) function > -- > > Key: SPARK-40151 > URL: https://issues.apache.org/jira/browse/SPARK-40151 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Critical > > median() right now returns an interval of the same type as the input. > We should instead match mean and avg(): > The result type is computed as for the arguments: > - year-month interval: The result is an `INTERVAL YEAR TO MONTH`. > - day-time interval: The result is an `INTERVAL DAY TO SECOND`. > - In all other cases the result is a DOUBLE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40161) Make Series.mode apply PandasMode
[ https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40161: Assignee: (was: Apache Spark) > Make Series.mode apply PandasMode > - > > Key: SPARK-40161 > URL: https://issues.apache.org/jira/browse/SPARK-40161 > Project: Spark > Issue Type: Improvement > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40161) Make Series.mode apply PandasMode
[ https://issues.apache.org/jira/browse/SPARK-40161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582483#comment-17582483 ] Apache Spark commented on SPARK-40161: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/37596 > Make Series.mode apply PandasMode > - > > Key: SPARK-40161 > URL: https://issues.apache.org/jira/browse/SPARK-40161 > Project: Spark > Issue Type: Improvement > Components: ps >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40151) Fix return type for new median(interval) function
[ https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582481#comment-17582481 ] Apache Spark commented on SPARK-40151: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/37595 > Fix return type for new median(interval) function > -- > > Key: SPARK-40151 > URL: https://issues.apache.org/jira/browse/SPARK-40151 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Critical > > median() right now returns an interval of the same type as the input. > We should instead match mean and avg(): > The result type is computed as for the arguments: > - year-month interval: The result is an `INTERVAL YEAR TO MONTH`. > - day-time interval: The result is an `INTERVAL DAY TO SECOND`. > - In all other cases the result is a DOUBLE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40151) Fix return type for new median(interval) function
[ https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40151: Assignee: (was: Apache Spark) > Fix return type for new median(interval) function > -- > > Key: SPARK-40151 > URL: https://issues.apache.org/jira/browse/SPARK-40151 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Critical > > median() right now returns an interval of the same type as the input. > We should instead match mean and avg(): > The result type is computed as for the arguments: > - year-month interval: The result is an `INTERVAL YEAR TO MONTH`. > - day-time interval: The result is an `INTERVAL DAY TO SECOND`. > - In all other cases the result is a DOUBLE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40151) Fix return type for new median(interval) function
[ https://issues.apache.org/jira/browse/SPARK-40151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40151: Assignee: Apache Spark > Fix return type for new median(interval) function > -- > > Key: SPARK-40151 > URL: https://issues.apache.org/jira/browse/SPARK-40151 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Apache Spark >Priority: Critical > > median() right now returns an interval of the same type as the input. > We should instead match mean and avg(): > The result type is computed as for the arguments: > - year-month interval: The result is an `INTERVAL YEAR TO MONTH`. > - day-time interval: The result is an `INTERVAL DAY TO SECOND`. > - In all other cases the result is a DOUBLE. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
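The result-type rule this issue asks median() to share with mean()/avg() can be written as a small lookup (a hypothetical helper for illustration, not Spark code):

```python
# Proposed result type of median() by input type, per the issue description:
# year-month interval stays year-month, day-time interval stays day-time,
# everything else widens to double.
def median_result_type(input_type: str) -> str:
    if input_type == "interval year to month":
        return "interval year to month"
    if input_type == "interval day to second":
        return "interval day to second"
    return "double"

print(median_result_type("int"))  # double
print(median_result_type("interval day to second"))  # interval day to second
```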
[jira] [Created] (SPARK-40161) Make Series.mode apply PandasMode
Ruifeng Zheng created SPARK-40161: - Summary: Make Series.mode apply PandasMode Key: SPARK-40161 URL: https://issues.apache.org/jira/browse/SPARK-40161 Project: Spark Issue Type: Improvement Components: ps Affects Versions: 3.4.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org