[jira] [Commented] (SPARK-36440) Spark3 fails to read hive table with mixed format
[ https://issues.apache.org/jira/browse/SPARK-36440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394529#comment-17394529 ] Chao Sun commented on SPARK-36440: -- Hmm, really? Does Spark 2.x support this? I'm not sure why Spark is still expected to work in this case, since the serde was changed to Parquet but the underlying data file is in ORC. It seems like an error that users should avoid. > Spark3 fails to read hive table with mixed format > - > > Key: SPARK-36440 > URL: https://issues.apache.org/jira/browse/SPARK-36440 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.1.1, 3.1.2 >Reporter: Jason Xu >Priority: Major > > Spark 3 fails to read a Hive table with mixed formats through the Hive SerDe; this is a > regression compared to Spark 2.4. > Reproduction steps: > 1. In a Spark 3 (3.0 or 3.1) spark-shell: > {code:java} > scala> spark.sql("create table tmp.test_table (id int, name string) > partitioned by (pt int) stored as rcfile") > scala> spark.sql("insert into tmp.test_table partition (pt = 1) values (1, 'Alice'), > (2, 'Bob')") > {code} > 2. Run a Hive command to change the table file format (from RCFile to Parquet). > {code:java} > hive (default)> alter table tmp.test_table set fileformat Parquet; > {code} > 3. Try to read the partition (still in RCFile format) with the Hive serde using the Spark shell: > {code:java} > scala> spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false") > scala> spark.sql("select * from tmp.test_table where pt=1").show{code} > Exception (file path anonymized): > {code:java} > Caused by: java.lang.RuntimeException: > s3a:///data/part-0-22112178-5dd7-4065-89d7-2ee550296909-c000 is not > a Parquet file. 
expected magic number at tail [80, 65, 82, 49] but found [5, > 96, 1, -33] > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:524) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:433) > at > org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:79) > at > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:75) > at > org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:60) > at > org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75) > at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:286) > at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:285) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:243) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > {code} > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
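The footer check that raised the exception above can be reproduced outside Spark. A valid Parquet file ends with the 4-byte magic number `b"PAR1"` (the `[80, 65, 82, 49]` in the trace); the RCFile partition ends with different bytes, hence the error. A minimal sketch of that tail check (illustrative Python, not Spark or parquet-mr code; the function name is hypothetical):

```python
import os

PARQUET_MAGIC = b"PAR1"  # bytes [80, 65, 82, 49], as reported in the stack trace

def looks_like_parquet(path: str) -> bool:
    """Return True if the file ends with the Parquet footer magic number."""
    size = os.path.getsize(path)
    if size < len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        # The magic appears at the very end of the file; seek to the tail.
        f.seek(size - len(PARQUET_MAGIC))
        return f.read(len(PARQUET_MAGIC)) == PARQUET_MAGIC
```

Running such a check against the partition's data files would show they are RCFile, not Parquet, which is why the mixed-format read fails once the table serde says Parquet.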
[jira] [Commented] (SPARK-36162) extractJoinKeysWithColStats support EqualNullSafe
[ https://issues.apache.org/jira/browse/SPARK-36162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394495#comment-17394495 ] Apache Spark commented on SPARK-36162: -- User 'changvvb' has created a pull request for this issue: https://github.com/apache/spark/pull/33662 > extractJoinKeysWithColStats support EqualNullSafe > - > > Key: SPARK-36162 > URL: https://issues.apache.org/jira/browse/SPARK-36162 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Priority: Major > > sql("select * from date_dim join item on d_date_sk = > i_item_sk").explain("cost") > {noformat} > == Optimized Logical Plan == > Join Inner, (d_date_sk#0 = i_item_sk#28), Statistics(sizeInBytes=1.0 B, > rowCount=0) > :- Relation > default.date_dim[d_date_sk#0,d_date_id#1,d_date#2,d_month_seq#3,d_week_seq#4,d_quarter_seq#5,d_year#6,d_dow#7,d_moy#8,d_dom#9,d_qoy#10,d_fy_year#11,d_fy_quarter_seq#12,d_fy_week_seq#13,d_day_name#14,d_quarter_name#15,d_holiday#16,d_weekend#17,d_following_holiday#18,d_first_dom#19,d_last_dom#20,d_same_day_ly#21,d_same_day_lq#22,d_current_day#23,... 
> 4 more fields] parquet, Statistics(sizeInBytes=17.6 MiB, rowCount=7.30E+4) > +- Relation > default.item[i_item_sk#28,i_item_id#29,i_rec_start_date#30,i_rec_end_date#31,i_item_desc#32,i_current_price#33,i_wholesale_cost#34,i_brand_id#35,i_brand#36,i_class_id#37,i_class#38,i_category_id#39,i_category#40,i_manufact_id#41,i_manufact#42,i_size#43,i_formulation#44,i_color#45,i_units#46,i_container#47,i_manager_id#48,i_product_name#49] > parquet, Statistics(sizeInBytes=85.2 MiB, rowCount=2.04E+5) > {noformat} > sql("select * from date_dim join item on d_date_sk <=> > i_item_sk").explain("cost") > {noformat} > == Optimized Logical Plan == > Join Inner, (d_date_sk#0 <=> i_item_sk#28), Statistics(sizeInBytes=9.2 TiB, > rowCount=1.49E+10) > :- Relation > default.date_dim[d_date_sk#0,d_date_id#1,d_date#2,d_month_seq#3,d_week_seq#4,d_quarter_seq#5,d_year#6,d_dow#7,d_moy#8,d_dom#9,d_qoy#10,d_fy_year#11,d_fy_quarter_seq#12,d_fy_week_seq#13,d_day_name#14,d_quarter_name#15,d_holiday#16,d_weekend#17,d_following_holiday#18,d_first_dom#19,d_last_dom#20,d_same_day_ly#21,d_same_day_lq#22,d_current_day#23,... > 4 more fields] parquet, Statistics(sizeInBytes=17.6 MiB, rowCount=7.30E+4) > +- Relation > default.item[i_item_sk#28,i_item_id#29,i_rec_start_date#30,i_rec_end_date#31,i_item_desc#32,i_current_price#33,i_wholesale_cost#34,i_brand_id#35,i_brand#36,i_class_id#37,i_class#38,i_category_id#39,i_category#40,i_manufact_id#41,i_manufact#42,i_size#43,i_formulation#44,i_color#45,i_units#46,i_container#47,i_manager_id#48,i_product_name#49] > parquet, Statistics(sizeInBytes=85.2 MiB, rowCount=2.04E+5) > {noformat} > https://github.com/apache/spark/blob/d6a68e0b67ff7de58073c176dd097070e88ac831/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala#L329-L339 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
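The two plans above differ only in the join operator: `=` (EqualTo) gets a row-count estimate from column statistics, while `<=>` (EqualNullSafe) falls back to a worst-case estimate. Semantically the operators differ only in NULL handling, which the following sketch models with Python's `None` (a model of the SQL three-valued-logic semantics, not Spark code):

```python
def eq(a, b):
    """SQL '=' : comparing anything with NULL yields NULL (None here)."""
    if a is None or b is None:
        return None
    return a == b

def eq_null_safe(a, b):
    """SQL '<=>' : never yields NULL; NULL <=> NULL is True."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b
```

For non-NULL join keys the two operators select the same rows, which is why estimating `<=>` joins from the same column statistics (plus the null counts) is a reasonable improvement.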
[jira] [Assigned] (SPARK-36162) extractJoinKeysWithColStats support EqualNullSafe
[ https://issues.apache.org/jira/browse/SPARK-36162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36162: Assignee: Apache Spark
[jira] [Assigned] (SPARK-36162) extractJoinKeysWithColStats support EqualNullSafe
[ https://issues.apache.org/jira/browse/SPARK-36162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36162: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-36162) extractJoinKeysWithColStats support EqualNullSafe
[ https://issues.apache.org/jira/browse/SPARK-36162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394494#comment-17394494 ] Apache Spark commented on SPARK-36162: -- User 'changvvb' has created a pull request for this issue: https://github.com/apache/spark/pull/33662
[jira] [Resolved] (SPARK-36415) Add docs for try_cast/try_add/try_divide
[ https://issues.apache.org/jira/browse/SPARK-36415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36415. -- Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33638 [https://github.com/apache/spark/pull/33638] > Add docs for try_cast/try_add/try_divide > > > Key: SPARK-36415 > URL: https://issues.apache.org/jira/browse/SPARK-36415 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.2.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.2.0 > > > Add docs for try_cast/try_add/try_divide -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36443) Demote BroadcastJoin causes performance regression and increases OOM risks
[ https://issues.apache.org/jira/browse/SPARK-36443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-36443: - Description: !image-2021-08-06-11-24-34-122.png! was: !image-2021-08-06-11-24-34-122.png! > Demote BroadcastJoin causes performance regression and increases OOM risks > -- > > Key: SPARK-36443 > URL: https://issues.apache.org/jira/browse/SPARK-36443 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kent Yao >Priority: Major > Attachments: image-2021-08-06-11-24-34-122.png > > > > > !image-2021-08-06-11-24-34-122.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36443) Demote BroadcastJoin causes performance regression and increases OOM risks
[ https://issues.apache.org/jira/browse/SPARK-36443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-36443: - Attachment: image-2021-08-06-11-24-34-122.png > Demote BroadcastJoin causes performance regression and increases OOM risks > -- > > Key: SPARK-36443 > URL: https://issues.apache.org/jira/browse/SPARK-36443 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kent Yao >Priority: Major > Attachments: image-2021-08-06-11-24-34-122.png > > > !image-2021-08-06-11-19-00-105.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36443) Demote BroadcastJoin causes performance regression and increases OOM risks
[ https://issues.apache.org/jira/browse/SPARK-36443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-36443: - Description: !image-2021-08-06-11-24-34-122.png! (was: !image-2021-08-06-11-19-00-105.png!) > Demote BroadcastJoin causes performance regression and increases OOM risks > -- > > Key: SPARK-36443 > URL: https://issues.apache.org/jira/browse/SPARK-36443 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kent Yao >Priority: Major > Attachments: image-2021-08-06-11-24-34-122.png > > > !image-2021-08-06-11-24-34-122.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36443) Demote BroadcastJoin causes performance regression and increases OOM risks
Kent Yao created SPARK-36443: Summary: Demote BroadcastJoin causes performance regression and increases OOM risks Key: SPARK-36443 URL: https://issues.apache.org/jira/browse/SPARK-36443 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.1.2 Reporter: Kent Yao !image-2021-08-06-11-19-00-105.png! -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36420) Use `isEmpty` to improve performance in Pregel's superstep
[ https://issues.apache.org/jira/browse/SPARK-36420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36420. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33648 [https://github.com/apache/spark/pull/33648] > Use `isEmpty` to improve performance in Pregel's superstep > -- > > Key: SPARK-36420 > URL: https://issues.apache.org/jira/browse/SPARK-36420 > Project: Spark > Issue Type: Improvement > Components: GraphX >Affects Versions: 2.4.7 >Reporter: xiepengjie >Assignee: xiepengjie >Priority: Minor > Fix For: 3.3.0 > > > When I was running `Graphx.connectedComponents` with 20+ billion vertices and > edges, I found that count is very slow. > {code:java} > object Pregel extends Logging { > ... > def apply[VD: ClassTag, ED: ClassTag, A: ClassTag] (...): Graph[VD, ED] = { > ... > // Maybe messages.isEmpty() is better than messages.count() > var activeMessages = messages.count() > // Loop > var prevG: Graph[VD, ED] = null > var i = 0 > while (activeMessages > 0 && i < maxIterations) { > ... > activeMessages = messages.count() > ... > } > ... > g > } // end of apply > } // end of class Pregel > {code} > We only need an action operator here to check whether any active messages remain, > so a full count is unnecessary; isEmpty is better. I verified this and it works very well. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
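The reasoning behind the fix can be sketched outside Spark: counting must touch every partition, while an emptiness check can stop at the first non-empty one. In the sketch below, plain Python lists stand in for RDD partitions (an analogy, not Spark's implementation):

```python
def count(partitions):
    """Analogue of RDD.count(): every partition must be fully scanned."""
    return sum(len(p) for p in partitions)

def is_empty(partitions):
    """Analogue of RDD.isEmpty(): all() short-circuits at the first
    non-empty partition, so most data is never touched."""
    return all(len(p) == 0 for p in partitions)

# In the Pregel loop, activeMessages = messages.count() is only ever
# compared against 0, so !messages.isEmpty() yields the same loop
# decision without a full scan of the messages RDD.
```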
[jira] [Assigned] (SPARK-36420) Use `isEmpty` to improve performance in Pregel's superstep
[ https://issues.apache.org/jira/browse/SPARK-36420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36420: Assignee: xiepengjie
[jira] [Created] (SPARK-36442) Do not reference the deprecated UserDefinedAggregateFunction
Wenchen Fan created SPARK-36442: --- Summary: Do not reference the deprecated UserDefinedAggregateFunction Key: SPARK-36442 URL: https://issues.apache.org/jira/browse/SPARK-36442 Project: Spark Issue Type: Documentation Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan https://spark.apache.org/docs/3.1.2/sql-ref-syntax-ddl-create-function.html#parameters -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-36429) JacksonParser should throw exception when data type unsupported.
[ https://issues.apache.org/jira/browse/SPARK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36429: --- Comment: was deleted (was: I'm working on.) > JacksonParser should throw exception when data type unsupported. > > > Key: SPARK-36429 > URL: https://issues.apache.org/jira/browse/SPARK-36429 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Currently, when spark.sql.timestampType=TIMESTAMP_NTZ is set, the behavior > differs between from_json and from_csv. > {code:java} > -- !query > select from_json('{"t":"26/October/2015"}', 't Timestamp', > map('timestampFormat', 'dd/M/')) > -- !query schema > struct> > -- !query output > {"t":null} > {code} > {code:java} > -- !query > select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', > 'dd/M/')) > -- !query schema > struct<> > -- !query output > java.lang.Exception > Unsupported type: timestamp_ntz > {code} > We should make from_json throw an exception too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36431) Support comparison of ANSI intervals with different fields
[ https://issues.apache.org/jira/browse/SPARK-36431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36431: Assignee: (was: Apache Spark) > Support comparison of ANSI intervals with different fields > -- > > Key: SPARK-36431 > URL: https://issues.apache.org/jira/browse/SPARK-36431 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Support comparison of > - a day-time interval with another day-time interval which has different > fields > - a year-month interval with another year-month interval where fields are > different. > The example below shows the issue: > {code:sql} > spark-sql> select interval '1' day > interval '1' hour; > Error in query: cannot resolve '(INTERVAL '1' DAY > INTERVAL '01' HOUR)' due > to data type mismatch: differing types in '(INTERVAL '1' DAY > INTERVAL '01' > HOUR)' (interval day and interval hour).; line 1 pos 7; > 'Project [unresolvedalias((INTERVAL '1' DAY > INTERVAL '01' HOUR), None)] > +- OneRowRelation > spark-sql> select interval '2' year > interval '11' month; > Error in query: cannot resolve '(INTERVAL '2' YEAR > INTERVAL '11' MONTH)' > due to data type mismatch: differing types in '(INTERVAL '2' YEAR > INTERVAL > '11' MONTH)' (interval year and interval month).; line 1 pos 7; > 'Project [unresolvedalias((INTERVAL '2' YEAR > INTERVAL '11' MONTH), None)] > +- OneRowRelation > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36431) Support comparison of ANSI intervals with different fields
[ https://issues.apache.org/jira/browse/SPARK-36431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36431: Assignee: Apache Spark
[jira] [Commented] (SPARK-36431) Support comparison of ANSI intervals with different fields
[ https://issues.apache.org/jira/browse/SPARK-36431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394434#comment-17394434 ] Apache Spark commented on SPARK-36431: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/33661
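One way to make such comparisons well-defined (an illustrative sketch of the idea, not Spark's implementation) is to normalize both sides of a comparison to the interval class's smallest unit: months for year-month intervals, microseconds for day-time intervals.

```python
def ym_to_months(years=0, months=0):
    """Normalize a year-month interval to a month count."""
    return 12 * years + months

def dt_to_micros(days=0, hours=0, minutes=0, seconds=0):
    """Normalize a day-time interval to microseconds."""
    total_seconds = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return total_seconds * 1_000_000

# After normalization, the comparisons rejected in the report become
# ordinary integer comparisons:
#   INTERVAL '1' DAY  > INTERVAL '1' HOUR
#   INTERVAL '2' YEAR > INTERVAL '11' MONTH
```

Day-time and year-month intervals remain mutually incomparable under this scheme, since a month has no fixed length in microseconds.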
[jira] [Updated] (SPARK-36420) Use `isEmpty` to improve performance in Pregel's superstep
[ https://issues.apache.org/jira/browse/SPARK-36420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-36420: - Fix Version/s: (was: 3.3.0)
[jira] [Commented] (SPARK-36386) Fix DataFrame groupby-expanding to follow pandas 1.3
[ https://issues.apache.org/jira/browse/SPARK-36386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394430#comment-17394430 ] Apache Spark commented on SPARK-36386: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/33646 > Fix DataFrame groupby-expanding to follow pandas 1.3 > > > Key: SPARK-36386 > URL: https://issues.apache.org/jira/browse/SPARK-36386 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36386) Fix DataFrame groupby-expanding to follow pandas 1.3
[ https://issues.apache.org/jira/browse/SPARK-36386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394429#comment-17394429 ] Apache Spark commented on SPARK-36386: -- User 'itholic' has created a pull request for this issue: https://github.com/apache/spark/pull/33646 > Fix DataFrame groupby-expanding to follow pandas 1.3 > > > Key: SPARK-36386 > URL: https://issues.apache.org/jira/browse/SPARK-36386 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36386) Fix DataFrame groupby-expanding to follow pandas 1.3
[ https://issues.apache.org/jira/browse/SPARK-36386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36386: Assignee: (was: Apache Spark) > Fix DataFrame groupby-expanding to follow pandas 1.3 > > > Key: SPARK-36386 > URL: https://issues.apache.org/jira/browse/SPARK-36386 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36386) Fix DataFrame groupby-expanding to follow pandas 1.3
[ https://issues.apache.org/jira/browse/SPARK-36386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36386: Assignee: Apache Spark > Fix DataFrame groupby-expanding to follow pandas 1.3 > > > Key: SPARK-36386 > URL: https://issues.apache.org/jira/browse/SPARK-36386 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36421) Validate all SQL configs to prevent from wrong use for ConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-36421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36421: Assignee: Kent Yao > Validate all SQL configs to prevent from wrong use for ConfigEntry > -- > > Key: SPARK-36421 > URL: https://issues.apache.org/jira/browse/SPARK-36421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Minor > Fix For: 3.2.0, 3.3.0 > > > ConfigEntry(key=spark.sql.hive.metastore.version, defaultValue=2.3.7, > doc=Version) > should not go to the doc and set -v command -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
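The report is about a `ConfigEntry` object's `toString` leaking into user-facing documentation and `SET -v` output where only its value belongs. A hypothetical, simplified sketch of the bug and fix patterns (the class below is illustrative, not Spark's actual internal API):

```java
public class ConfigDocSketch {
    // Hypothetical stand-in for Spark's internal config-entry class.
    static final class ConfigEntry {
        final String key, defaultValue, doc;
        ConfigEntry(String key, String defaultValue, String doc) {
            this.key = key; this.defaultValue = defaultValue; this.doc = doc;
        }
        @Override public String toString() {
            return "ConfigEntry(key=" + key + ", defaultValue=" + defaultValue + ", doc=" + doc + ")";
        }
    }

    static final ConfigEntry HIVE_VERSION =
        new ConfigEntry("spark.sql.hive.metastore.version", "2.3.7", "Version");

    // Bug pattern: concatenating the entry object itself leaks its toString
    // into generated docs and `SET -v` output.
    static String badDoc() { return "Defaults to " + HIVE_VERSION + "."; }

    // Fix pattern: reference the value the reader actually needs.
    static String goodDoc() { return "Defaults to " + HIVE_VERSION.defaultValue + "."; }

    public static void main(String[] args) {
        System.out.println(badDoc());
        System.out.println(goodDoc()); // Defaults to 2.3.7.
    }
}
```

Validating all entries, as the title suggests, would catch any doc string whose rendered form still contains a raw `ConfigEntry(...)`.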
[jira] [Updated] (SPARK-36421) Validate all SQL configs to prevent from wrong use for ConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-36421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-36421: - Fix Version/s: 3.3.0 3.2.0 > Validate all SQL configs to prevent from wrong use for ConfigEntry > -- > > Key: SPARK-36421 > URL: https://issues.apache.org/jira/browse/SPARK-36421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Kent Yao >Priority: Minor > Fix For: 3.2.0, 3.3.0 > > > ConfigEntry(key=spark.sql.hive.metastore.version, defaultValue=2.3.7, > doc=Version) > should not go to the doc and set -v command -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36421) Validate all SQL configs to prevent from wrong use for ConfigEntry
[ https://issues.apache.org/jira/browse/SPARK-36421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36421. -- Resolution: Fixed Fixed in https://github.com/apache/spark/pull/33647 > Validate all SQL configs to prevent from wrong use for ConfigEntry > -- > > Key: SPARK-36421 > URL: https://issues.apache.org/jira/browse/SPARK-36421 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2, 3.2.0 >Reporter: Kent Yao >Priority: Minor > Fix For: 3.2.0, 3.3.0 > > > ConfigEntry(key=spark.sql.hive.metastore.version, defaultValue=2.3.7, > doc=Version) > should not go to the doc and set -v command -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36441) Downloading lintr dependencies fail on GA
[ https://issues.apache.org/jira/browse/SPARK-36441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36441. -- Fix Version/s: 3.1.3 3.2.0 3.0.4 Resolution: Fixed Issue resolved by pull request 33660 [https://github.com/apache/spark/pull/33660] > Downloading lintr dependencies fail on GA > - > > Key: SPARK-36441 > URL: https://issues.apache.org/jira/browse/SPARK-36441 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > Fix For: 3.0.4, 3.2.0, 3.1.3 > > > Downloading lintr dependencies on GA fails. > I re-triggered the GA job but it still fails with the same error. > {code} > * installing *source* package ‘devtools’ ... > ** package ‘devtools’ successfully unpacked and MD5 sums checked > ** using staged installation > ** R > ** inst > ** byte-compile and prepare package for lazy loading > ** help > *** installing help indices > *** copying figures > ** building package indices > ** installing vignettes > ** testing if installed package can be loaded from temporary location > ** testing if installed package can be loaded from final location > ** testing if installed package keeps a record of temporary installation path > * DONE (devtools) > The downloaded source packages are in > ‘/tmp/Rtmpv53Ix4/downloaded_packages’ > Using bundled GitHub PAT. Please add your own PAT to the env var `GITHUB_PAT` > Error: Failed to install 'unknown package' from GitHub: > HTTP error 401. > Bad credentials > Rate limit remaining: 59/60 > Rate limit reset at: 2021-08-06 01:37:46 UTC > > Execution halted > Error: Process completed with exit code 1. > {code} > https://github.com/apache/spark/runs/3257853825 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36441) Downloading lintr dependencies fail on GA
[ https://issues.apache.org/jira/browse/SPARK-36441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36441: Assignee: Kousuke Saruta > Downloading lintr dependencies fail on GA > - > > Key: SPARK-36441 > URL: https://issues.apache.org/jira/browse/SPARK-36441 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Downloading lintr dependencies on GA fails. > I re-triggered the GA job but it still fails with the same error. > {code} > * installing *source* package ‘devtools’ ... > ** package ‘devtools’ successfully unpacked and MD5 sums checked > ** using staged installation > ** R > ** inst > ** byte-compile and prepare package for lazy loading > ** help > *** installing help indices > *** copying figures > ** building package indices > ** installing vignettes > ** testing if installed package can be loaded from temporary location > ** testing if installed package can be loaded from final location > ** testing if installed package keeps a record of temporary installation path > * DONE (devtools) > The downloaded source packages are in > ‘/tmp/Rtmpv53Ix4/downloaded_packages’ > Using bundled GitHub PAT. Please add your own PAT to the env var `GITHUB_PAT` > Error: Failed to install 'unknown package' from GitHub: > HTTP error 401. > Bad credentials > Rate limit remaining: 59/60 > Rate limit reset at: 2021-08-06 01:37:46 UTC > > Execution halted > Error: Process completed with exit code 1. > {code} > https://github.com/apache/spark/runs/3257853825 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36441) Downloading lintr dependencies fail on GA
[ https://issues.apache.org/jira/browse/SPARK-36441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36441: Assignee: Apache Spark > Downloading lintr dependencies fail on GA > - > > Key: SPARK-36441 > URL: https://issues.apache.org/jira/browse/SPARK-36441 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > Downloading lintr dependencies on GA fails. > I re-triggered the GA job but it still fails with the same error. > {code} > * installing *source* package ‘devtools’ ... > ** package ‘devtools’ successfully unpacked and MD5 sums checked > ** using staged installation > ** R > ** inst > ** byte-compile and prepare package for lazy loading > ** help > *** installing help indices > *** copying figures > ** building package indices > ** installing vignettes > ** testing if installed package can be loaded from temporary location > ** testing if installed package can be loaded from final location > ** testing if installed package keeps a record of temporary installation path > * DONE (devtools) > The downloaded source packages are in > ‘/tmp/Rtmpv53Ix4/downloaded_packages’ > Using bundled GitHub PAT. Please add your own PAT to the env var `GITHUB_PAT` > Error: Failed to install 'unknown package' from GitHub: > HTTP error 401. > Bad credentials > Rate limit remaining: 59/60 > Rate limit reset at: 2021-08-06 01:37:46 UTC > > Execution halted > Error: Process completed with exit code 1. > {code} > https://github.com/apache/spark/runs/3257853825 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36441) Downloading lintr dependencies fail on GA
[ https://issues.apache.org/jira/browse/SPARK-36441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394413#comment-17394413 ] Apache Spark commented on SPARK-36441: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33660 > Downloading lintr dependencies fail on GA > - > > Key: SPARK-36441 > URL: https://issues.apache.org/jira/browse/SPARK-36441 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Priority: Major > > Downloading lintr dependencies on GA fails. > I re-triggered the GA job but it still fails with the same error. > {code} > * installing *source* package ‘devtools’ ... > ** package ‘devtools’ successfully unpacked and MD5 sums checked > ** using staged installation > ** R > ** inst > ** byte-compile and prepare package for lazy loading > ** help > *** installing help indices > *** copying figures > ** building package indices > ** installing vignettes > ** testing if installed package can be loaded from temporary location > ** testing if installed package can be loaded from final location > ** testing if installed package keeps a record of temporary installation path > * DONE (devtools) > The downloaded source packages are in > ‘/tmp/Rtmpv53Ix4/downloaded_packages’ > Using bundled GitHub PAT. Please add your own PAT to the env var `GITHUB_PAT` > Error: Failed to install 'unknown package' from GitHub: > HTTP error 401. > Bad credentials > Rate limit remaining: 59/60 > Rate limit reset at: 2021-08-06 01:37:46 UTC > > Execution halted > Error: Process completed with exit code 1. > {code} > https://github.com/apache/spark/runs/3257853825 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36441) Downloading lintr dependencies fail on GA
[ https://issues.apache.org/jira/browse/SPARK-36441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36441: Assignee: (was: Apache Spark) > Downloading lintr dependencies fail on GA > - > > Key: SPARK-36441 > URL: https://issues.apache.org/jira/browse/SPARK-36441 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Priority: Major > > Downloading lintr dependencies on GA fails. > I re-triggered the GA job but it still fails with the same error. > {code} > * installing *source* package ‘devtools’ ... > ** package ‘devtools’ successfully unpacked and MD5 sums checked > ** using staged installation > ** R > ** inst > ** byte-compile and prepare package for lazy loading > ** help > *** installing help indices > *** copying figures > ** building package indices > ** installing vignettes > ** testing if installed package can be loaded from temporary location > ** testing if installed package can be loaded from final location > ** testing if installed package keeps a record of temporary installation path > * DONE (devtools) > The downloaded source packages are in > ‘/tmp/Rtmpv53Ix4/downloaded_packages’ > Using bundled GitHub PAT. Please add your own PAT to the env var `GITHUB_PAT` > Error: Failed to install 'unknown package' from GitHub: > HTTP error 401. > Bad credentials > Rate limit remaining: 59/60 > Rate limit reset at: 2021-08-06 01:37:46 UTC > > Execution halted > Error: Process completed with exit code 1. > {code} > https://github.com/apache/spark/runs/3257853825 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36441) Downloading lintr dependencies fail on GA
Kousuke Saruta created SPARK-36441: -- Summary: Downloading lintr dependencies fail on GA Key: SPARK-36441 URL: https://issues.apache.org/jira/browse/SPARK-36441 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.3.0 Reporter: Kousuke Saruta Downloading lintr dependencies on GA fails. I re-triggered the GA job but it still fails with the same error. {code} * installing *source* package ‘devtools’ ... ** package ‘devtools’ successfully unpacked and MD5 sums checked ** using staged installation ** R ** inst ** byte-compile and prepare package for lazy loading ** help *** installing help indices *** copying figures ** building package indices ** installing vignettes ** testing if installed package can be loaded from temporary location ** testing if installed package can be loaded from final location ** testing if installed package keeps a record of temporary installation path * DONE (devtools) The downloaded source packages are in ‘/tmp/Rtmpv53Ix4/downloaded_packages’ Using bundled GitHub PAT. Please add your own PAT to the env var `GITHUB_PAT` Error: Failed to install 'unknown package' from GitHub: HTTP error 401. Bad credentials Rate limit remaining: 59/60 Rate limit reset at: 2021-08-06 01:37:46 UTC Execution halted Error: Process completed with exit code 1. {code} https://github.com/apache/spark/runs/3257853825 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36440) Spark3 fails to read hive table with mixed format
[ https://issues.apache.org/jira/browse/SPARK-36440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Xu updated SPARK-36440: - Description: Spark 3 fails to read a Hive table with mixed formats with the Hive SerDe; this is a regression compared to Spark 2.4. Replication steps: 1. In Spark 3 (3.0 or 3.1) spark-shell: {code:java} scala> spark.sql("create table tmp.test_table (id int, name string) partitioned by (pt int) stored as rcfile") scala> spark.sql("insert into tmp.test_table partition (pt = 1) values (1, 'Alice'), (2, 'Bob')") {code} 2. Run hive command to change table file format (from RCFile to Parquet). {code:java} hive (default)> alter table tmp.test_table set fileformat Parquet; {code} 3. Try to read partition (in RCFile format) with hive serde using Spark shell: {code:java} scala> spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false") scala> spark.sql("select * from tmp.test_table where pt=1").show{code} Exception: (anonymized file path with ) {code:java} Caused by: java.lang.RuntimeException: s3a:///data/part-0-22112178-5dd7-4065-89d7-2ee550296909-c000 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [5, 96, 1, -33] at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:524) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:433) at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75) at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:286) at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:285) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:243) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) {code} was: Spark 3 fails to read a Hive table with mixed formats with the Hive SerDe; this is a regression compared to Spark 2.4. Replication steps: 1. In Spark 3 (3.0 or 3.1) spark-shell: {code:java} scala> spark.sql("create table tmp.test_table (id int, name string) partitioned by (pt int) stored as rcfile") scala> spark.sql("insert into tmp.test_table partition (pt = 1) values (1, 'Alice'), (2, 'Bob')") {code} 2. Run hive command to change table format (from RCFile to Parquet). {code:java} hive (default)> alter table tmp.test_table set fileformat Parquet; {code} 3. Try to read partition (in RCFile format) with hive serde using Spark shell: {code:java} scala> spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false") scala> spark.sql("select * from tmp.test_table where pt=1").show{code} Exception: (anonymized file path with ) {code:java} Caused by: java.lang.RuntimeException: s3a:///data/part-0-22112178-5dd7-4065-89d7-2ee550296909-c000 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [5, 96, 1, -33] at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:524) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:433) at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75) at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:286) at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:285) at
[jira] [Updated] (SPARK-36440) Spark3 fails to read hive table with mixed format
[ https://issues.apache.org/jira/browse/SPARK-36440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Xu updated SPARK-36440: - Description: Spark 3 fails to read a Hive table with mixed formats with the Hive SerDe; this is a regression compared to Spark 2.4. Replication steps: 1. In Spark 3 (3.0 or 3.1) spark-shell: {code:java} scala> spark.sql("create table tmp.test_table (id int, name string) partitioned by (pt int) stored as rcfile") scala> spark.sql("insert into tmp.test_table partition (pt = 1) values (1, 'Alice'), (2, 'Bob')") {code} 2. Run hive command to change table format (from RCFile to Parquet). {code:java} hive (default)> alter table tmp.test_table set fileformat Parquet; {code} 3. Try to read partition (in RCFile format) with hive serde using Spark shell: {code:java} scala> spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false") scala> spark.sql("select * from tmp.test_table where pt=1").show{code} Exception: (anonymized file path with ) {code:java} Caused by: java.lang.RuntimeException: s3a:///data/part-0-22112178-5dd7-4065-89d7-2ee550296909-c000 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [5, 96, 1, -33] at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:524) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:433) at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75) at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:286) at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:285) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:243) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) {code} was: Spark 3 fails to read a Hive table with mixed formats with the Hive SerDe; this is a regression compared to Spark 2.4. Replication steps: 1. In Spark 3 (3.0 or 3.1) spark-shell: {code:java} scala> spark.sql("create table tmp.test_table (id int, name string) partitioned by (pt int) stored as rcfile") scala> spark.sql("insert into tmp.test_table partition (pt = 1) values (1, 'Alice'), (2, 'Bob')" {code} 2. Run hive command to change table format (from RCFile to Parquet). {code:java} hive (default)> alter table tmp.test_table set fileformat Parquet; {code} 3. Try to read partition (in RCFile format) with hive serde using Spark shell: {code:java} scala> spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false") scala> spark.sql("select * from tmp.test_table where pt=1").show{code} Exception: (anonymized file path with ) {code:java} Caused by: java.lang.RuntimeException: s3a:///data/part-0-22112178-5dd7-4065-89d7-2ee550296909-c000 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [5, 96, 1, -33] at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:524) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:433) at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75) at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:286) at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:285) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:243) at
[jira] [Created] (SPARK-36440) Spark3 fails to read hive table with mixed format
Jason Xu created SPARK-36440: Summary: Spark3 fails to read hive table with mixed format Key: SPARK-36440 URL: https://issues.apache.org/jira/browse/SPARK-36440 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.1.2, 3.1.1, 3.0.0 Reporter: Jason Xu Spark 3 fails to read a Hive table with mixed formats with the Hive SerDe; this is a regression compared to Spark 2.4. Replication steps: 1. In Spark 3 (3.0 or 3.1) spark-shell: {code:java} scala> spark.sql("create table tmp.test_table (id int, name string) partitioned by (pt int) stored as rcfile") scala> spark.sql("insert into tmp.test_table partition (pt = 1) values (1, 'Alice'), (2, 'Bob')" {code} 2. Run hive command to change table format (from RCFile to Parquet). {code:java} hive (default)> alter table tmp.test_table set fileformat Parquet; {code} 3. Try to read partition (in RCFile format) with hive serde using Spark shell: {code:java} scala> spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false") scala> spark.sql("select * from tmp.test_table where pt=1").show{code} Exception: (anonymized file path with ) {code:java} Caused by: java.lang.RuntimeException: s3a:///data/part-0-22112178-5dd7-4065-89d7-2ee550296909-c000 is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [5, 96, 1, -33] at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:524) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448) at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:433) at org.apache.hadoop.hive.ql.io.parquet.ParquetRecordReaderBase.getSplit(ParquetRecordReaderBase.java:79) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:75) at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:60) at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:75) at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:286) at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:285) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:243) at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:96) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373) at org.apache.spark.rdd.RDD.iterator(RDD.scala:337) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
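The `RuntimeException` above comes from Parquet's footer check: a valid Parquet file must end with the four magic bytes "PAR1" ([80, 65, 82, 49]), and the RCFile partition's tail is something else ([5, 96, 1, -33]). A small sketch of that tail check (illustrative only; the real reader seeks to the footer rather than reading the whole file):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class MagicTailSketch {
    // "PAR1" == bytes [80, 65, 82, 49], the footer magic the exception expects.
    static final byte[] PARQUET_MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // True if the last four bytes match Parquet's footer magic, mirroring
    // the check that throws in ParquetFileReader.readFooter.
    static boolean tailIsMagic(byte[] bytes) {
        if (bytes.length < 4) return false;
        byte[] tail = Arrays.copyOfRange(bytes, bytes.length - 4, bytes.length);
        return Arrays.equals(tail, PARQUET_MAGIC);
    }

    static boolean looksLikeParquet(Path path) throws IOException {
        return tailIsMagic(Files.readAllBytes(path));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("not-parquet", ".bin");
        Files.write(tmp, new byte[] {5, 96, 1, -33}); // tail bytes from the report
        System.out.println(looksLikeParquet(tmp));    // false
        Files.delete(tmp);
    }
}
```

This is why the failure only appears once the table-level serde says Parquet while the old partition's files are still RCFile: the Parquet record reader is handed a file whose tail can never match.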
[jira] [Created] (SPARK-36439) Implement DataFrame.join on key column
Xinrong Meng created SPARK-36439: Summary: Implement DataFrame.join on key column Key: SPARK-36439 URL: https://issues.apache.org/jira/browse/SPARK-36439 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.2.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36438) Support list-like Python objects for Series comparison
Xinrong Meng created SPARK-36438: Summary: Support list-like Python objects for Series comparison Key: SPARK-36438 URL: https://issues.apache.org/jira/browse/SPARK-36438 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36437) Enable binary operations with list-like Python objects
Xinrong Meng created SPARK-36437: Summary: Enable binary operations with list-like Python objects Key: SPARK-36437 URL: https://issues.apache.org/jira/browse/SPARK-36437 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36436) Implement 'weights' and 'axis' in sample at DataFrame and Series
Xinrong Meng created SPARK-36436: Summary: Implement 'weights' and 'axis' in sample at DataFrame and Series Key: SPARK-36436 URL: https://issues.apache.org/jira/browse/SPARK-36436 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36435) Implement MultiIndex.equal_levels
Xinrong Meng created SPARK-36435: Summary: Implement MultiIndex.equal_levels Key: SPARK-36435 URL: https://issues.apache.org/jira/browse/SPARK-36435 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng
[jira] [Created] (SPARK-36434) Implement DataFrame.lookup
Xinrong Meng created SPARK-36434: Summary: Implement DataFrame.lookup Key: SPARK-36434 URL: https://issues.apache.org/jira/browse/SPARK-36434 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.3.0 Reporter: Xinrong Meng
[jira] [Assigned] (SPARK-36433) Logs should show correct URL of where HistoryServer is started
[ https://issues.apache.org/jira/browse/SPARK-36433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36433: Assignee: (was: Apache Spark) > Logs should show correct URL of where HistoryServer is started > -- > > Key: SPARK-36433 > URL: https://issues.apache.org/jira/browse/SPARK-36433 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.0 >Reporter: Thejdeep Gudivada >Priority: Major > > Due to a recent refactoring in the WebUI bind() code, the log message that prints the bound host and port was moved, so the printed info is incorrect. > > Example log - 21/08/05 10:47:38 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at :-1 > > Notice above that the port is incorrect.
[jira] [Assigned] (SPARK-36433) Logs should show correct URL of where HistoryServer is started
[ https://issues.apache.org/jira/browse/SPARK-36433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36433: Assignee: Apache Spark > Logs should show correct URL of where HistoryServer is started > -- > > Key: SPARK-36433 > URL: https://issues.apache.org/jira/browse/SPARK-36433 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.0 >Reporter: Thejdeep Gudivada >Assignee: Apache Spark >Priority: Major > > Due to a recent refactoring in the WebUI bind() code, the log message that prints the bound host and port was moved, so the printed info is incorrect. > > Example log - 21/08/05 10:47:38 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at :-1 > > Notice above that the port is incorrect.
[jira] [Commented] (SPARK-36433) Logs should show correct URL of where HistoryServer is started
[ https://issues.apache.org/jira/browse/SPARK-36433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394316#comment-17394316 ] Apache Spark commented on SPARK-36433: -- User 'thejdeep' has created a pull request for this issue: https://github.com/apache/spark/pull/33659 > Logs should show correct URL of where HistoryServer is started > -- > > Key: SPARK-36433 > URL: https://issues.apache.org/jira/browse/SPARK-36433 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 3.2.0 >Reporter: Thejdeep Gudivada >Priority: Major > > Due to a recent refactoring in the WebUI bind() code, the log message that prints the bound host and port was moved, so the printed info is incorrect. > > Example log - 21/08/05 10:47:38 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at :-1 > > Notice above that the port is incorrect.
[jira] [Created] (SPARK-36433) Logs should show correct URL of where HistoryServer is started
Thejdeep Gudivada created SPARK-36433: - Summary: Logs should show correct URL of where HistoryServer is started Key: SPARK-36433 URL: https://issues.apache.org/jira/browse/SPARK-36433 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 3.2.0 Reporter: Thejdeep Gudivada Due to a recent refactoring in the WebUI bind() code, the log message that prints the bound host and port was moved, so the printed info is incorrect. Example log - 21/08/05 10:47:38 INFO HistoryServer: Bound HistoryServer to 0.0.0.0, and started at :-1 Notice above that the port is incorrect.
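[Editor's note] The bug class in SPARK-36433 is generic: when a server binds to a wildcard or ephemeral port, the real port is only known after bind() completes, so any log line emitted before (or decoupled from) the bind reports a placeholder such as -1. A minimal stand-alone Python sketch of the correct ordering follows; it is illustrative only, not Spark's WebUI code.

```python
import socket

def start_server(requested_port=0):
    """Bind a listening socket and log the address it actually bound to.

    Logging the requested port before bind() completes is the bug class
    described above: with port 0 ("any free port"), the real port is only
    known after bind().
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", requested_port))
    sock.listen(1)
    host, port = sock.getsockname()  # query AFTER bind: this is the real port
    print(f"Bound server to {host}, and started at http://{host}:{port}")
    return sock, port

sock, port = start_server()
sock.close()
```

Querying `getsockname()` after `bind()` is the standard way to recover an OS-assigned ephemeral port; the fix for the ticket is essentially to emit the log line at that point rather than earlier.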
[jira] [Commented] (SPARK-36393) Try to raise memory and parallelism again for GA
[ https://issues.apache.org/jira/browse/SPARK-36393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394311#comment-17394311 ] Apache Spark commented on SPARK-36393: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/33658 > Try to raise memory and parallelism again for GA > > > Key: SPARK-36393 > URL: https://issues.apache.org/jira/browse/SPARK-36393 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > According to the feedback from GitHub, the change causing the memory issue has been rolled back. We can try to raise memory and parallelism again for GA.
[jira] [Commented] (SPARK-36393) Try to raise memory and parallelism again for GA
[ https://issues.apache.org/jira/browse/SPARK-36393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394308#comment-17394308 ] Apache Spark commented on SPARK-36393: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/33658 > Try to raise memory and parallelism again for GA > > > Key: SPARK-36393 > URL: https://issues.apache.org/jira/browse/SPARK-36393 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > According to the feedback from GitHub, the change causing the memory issue has been rolled back. We can try to raise memory and parallelism again for GA.
[jira] [Commented] (SPARK-36393) Try to raise memory and parallelism again for GA
[ https://issues.apache.org/jira/browse/SPARK-36393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394306#comment-17394306 ] Apache Spark commented on SPARK-36393: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/33657 > Try to raise memory and parallelism again for GA > > > Key: SPARK-36393 > URL: https://issues.apache.org/jira/browse/SPARK-36393 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > According to the feedback from GitHub, the change causing the memory issue has been rolled back. We can try to raise memory and parallelism again for GA.
[jira] [Commented] (SPARK-36393) Try to raise memory and parallelism again for GA
[ https://issues.apache.org/jira/browse/SPARK-36393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394305#comment-17394305 ] Apache Spark commented on SPARK-36393: -- User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/33657 > Try to raise memory and parallelism again for GA > > > Key: SPARK-36393 > URL: https://issues.apache.org/jira/browse/SPARK-36393 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.2.0 >Reporter: L. C. Hsieh >Assignee: L. C. Hsieh >Priority: Major > Fix For: 3.2.0 > > > According to the feedback from GitHub, the change causing the memory issue has been rolled back. We can try to raise memory and parallelism again for GA.
[jira] [Commented] (SPARK-36432) Upgrade Jetty version to 9.4.43
[ https://issues.apache.org/jira/browse/SPARK-36432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394290#comment-17394290 ] Apache Spark commented on SPARK-36432: -- User 'this' has created a pull request for this issue: https://github.com/apache/spark/pull/33656 > Upgrade Jetty version to 9.4.43 > --- > > Key: SPARK-36432 > URL: https://issues.apache.org/jira/browse/SPARK-36432 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Sajith A >Priority: Minor > > Upgrade Jetty version to 9.4.43.v20210629 in current Spark master in order to fix vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429.
[jira] [Assigned] (SPARK-36432) Upgrade Jetty version to 9.4.43
[ https://issues.apache.org/jira/browse/SPARK-36432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36432: Assignee: Apache Spark > Upgrade Jetty version to 9.4.43 > --- > > Key: SPARK-36432 > URL: https://issues.apache.org/jira/browse/SPARK-36432 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Sajith A >Assignee: Apache Spark >Priority: Minor > > Upgrade Jetty version to 9.4.43.v20210629 in current Spark master in order to fix vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429.
[jira] [Commented] (SPARK-36432) Upgrade Jetty version to 9.4.43
[ https://issues.apache.org/jira/browse/SPARK-36432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394289#comment-17394289 ] Apache Spark commented on SPARK-36432: -- User 'this' has created a pull request for this issue: https://github.com/apache/spark/pull/33656 > Upgrade Jetty version to 9.4.43 > --- > > Key: SPARK-36432 > URL: https://issues.apache.org/jira/browse/SPARK-36432 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Sajith A >Priority: Minor > > Upgrade Jetty version to 9.4.43.v20210629 in current Spark master in order to fix vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429.
[jira] [Assigned] (SPARK-36432) Upgrade Jetty version to 9.4.43
[ https://issues.apache.org/jira/browse/SPARK-36432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36432: Assignee: (was: Apache Spark) > Upgrade Jetty version to 9.4.43 > --- > > Key: SPARK-36432 > URL: https://issues.apache.org/jira/browse/SPARK-36432 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.3.0 >Reporter: Sajith A >Priority: Minor > > Upgrade Jetty version to 9.4.43.v20210629 in current Spark master in order to fix vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429.
[jira] [Created] (SPARK-36432) Upgrade Jetty version to 9.4.43
Sajith A created SPARK-36432: Summary: Upgrade Jetty version to 9.4.43 Key: SPARK-36432 URL: https://issues.apache.org/jira/browse/SPARK-36432 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.3.0 Reporter: Sajith A Upgrade Jetty version to 9.4.43.v20210629 in current Spark master in order to fix vulnerability https://nvd.nist.gov/vuln/detail/CVE-2021-34429.
[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394265#comment-17394265 ] Holden Karau commented on SPARK-24815: -- cc [~tdas] for thoughts? > Structured Streaming should support dynamic allocation > -- > > Key: SPARK-24815 > URL: https://issues.apache.org/jira/browse/SPARK-24815 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core, Structured Streaming >Affects Versions: 2.3.1 >Reporter: Karthik Palaniappan >Priority: Minor > > For batch jobs, dynamic allocation is very useful for adding and removing containers to match the actual workload. On multi-tenant clusters, it ensures that a Spark job is taking no more resources than necessary. In cloud environments, it enables autoscaling. > However, if you set spark.dynamicAllocation.enabled=true and run a structured streaming job, the batch dynamic allocation algorithm kicks in. It requests more executors if the task backlog reaches a certain size, and removes executors if they are idle for a certain period of time. > Quick thoughts: > 1) Dynamic allocation should be pluggable, rather than hardcoded to a particular implementation in SparkContext.scala (this should be a separate JIRA). > 2) We should make a structured streaming algorithm that's separate from the batch algorithm. Eventually, continuous processing might need its own algorithm. > 3) Spark should print a warning if you run a structured streaming job when Core's dynamic allocation is enabled
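[Editor's note] The batch policy the ticket describes (scale up on task backlog, scale down on executor idleness) can be sketched as a pure function to show why it misbehaves for streaming. All names below are hypothetical illustrations, not Spark's ExecutorAllocationManager API.

```python
# Hypothetical sketch of a batch-style dynamic-allocation policy:
# request executors to drain the backlog, release executors idle past
# a timeout. Parameter names and defaults are illustrative only.

def desired_executors(backlog_tasks, idle_secs_per_executor,
                      current, tasks_per_executor=4, idle_timeout=60):
    # Scale up: enough executors to drain the pending-task backlog.
    needed = -(-backlog_tasks // tasks_per_executor)  # ceiling division
    # Scale down: drop any executor idle longer than the timeout.
    idle = sum(1 for s in idle_secs_per_executor if s >= idle_timeout)
    return max(needed, current - idle)

# A micro-batch stream has an empty backlog between batches, so this
# policy keeps shrinking the cluster even though the query is running:
print(desired_executors(backlog_tasks=0,
                        idle_secs_per_executor=[120, 120, 5],
                        current=3))
```

This is exactly the mismatch point 2) in the ticket is about: a streaming-aware policy would reason about batch durations and arrival rates rather than instantaneous backlog and idleness.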
[jira] [Commented] (SPARK-36431) Support comparison of ANSI intervals with different fields
[ https://issues.apache.org/jira/browse/SPARK-36431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394235#comment-17394235 ] angerszhu commented on SPARK-36431: --- Working on this > Support comparison of ANSI intervals with different fields > -- > > Key: SPARK-36431 > URL: https://issues.apache.org/jira/browse/SPARK-36431 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Support comparison of > - a day-time interval with another day-time interval which has different fields > - a year-month interval with another year-month interval where fields are different. > The example below shows the issue: > {code:sql} > spark-sql> select interval '1' day > interval '1' hour; > Error in query: cannot resolve '(INTERVAL '1' DAY > INTERVAL '01' HOUR)' due to data type mismatch: differing types in '(INTERVAL '1' DAY > INTERVAL '01' HOUR)' (interval day and interval hour).; line 1 pos 7; > 'Project [unresolvedalias((INTERVAL '1' DAY > INTERVAL '01' HOUR), None)] > +- OneRowRelation > spark-sql> select interval '2' year > interval '11' month; > Error in query: cannot resolve '(INTERVAL '2' YEAR > INTERVAL '11' MONTH)' due to data type mismatch: differing types in '(INTERVAL '2' YEAR > INTERVAL '11' MONTH)' (interval year and interval month).; line 1 pos 7; > 'Project [unresolvedalias((INTERVAL '2' YEAR > INTERVAL '11' MONTH), None)] > +- OneRowRelation > {code}
[jira] [Updated] (SPARK-36431) Support comparison of ANSI intervals with different fields
[ https://issues.apache.org/jira/browse/SPARK-36431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-36431: - Summary: Support comparison of ANSI intervals with different fields (was: Support comparison of ANSI interval with different fields) > Support comparison of ANSI intervals with different fields > -- > > Key: SPARK-36431 > URL: https://issues.apache.org/jira/browse/SPARK-36431 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Support comparison of > - a day-time interval with another day-time interval which has different fields > - a year-month interval with another year-month interval where fields are different. > The example below shows the issue: > {code:sql} > spark-sql> select interval '1' day > interval '1' hour; > Error in query: cannot resolve '(INTERVAL '1' DAY > INTERVAL '01' HOUR)' due to data type mismatch: differing types in '(INTERVAL '1' DAY > INTERVAL '01' HOUR)' (interval day and interval hour).; line 1 pos 7; > 'Project [unresolvedalias((INTERVAL '1' DAY > INTERVAL '01' HOUR), None)] > +- OneRowRelation > spark-sql> select interval '2' year > interval '11' month; > Error in query: cannot resolve '(INTERVAL '2' YEAR > INTERVAL '11' MONTH)' due to data type mismatch: differing types in '(INTERVAL '2' YEAR > INTERVAL '11' MONTH)' (interval year and interval month).; line 1 pos 7; > 'Project [unresolvedalias((INTERVAL '2' YEAR > INTERVAL '11' MONTH), None)] > +- OneRowRelation > {code}
[jira] [Commented] (SPARK-36431) Support comparison of ANSI interval with different fields
[ https://issues.apache.org/jira/browse/SPARK-36431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394230#comment-17394230 ] Max Gekk commented on SPARK-36431: -- FYI [~cloud_fan] and [~sarutak] [~angerszhuuu] [~beliefer] Please, leave a comment here if you would like to work on this. > Support comparison of ANSI interval with different fields > - > > Key: SPARK-36431 > URL: https://issues.apache.org/jira/browse/SPARK-36431 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Max Gekk >Priority: Major > > Support comparison of > - a day-time interval with another day-time interval which has different fields > - a year-month interval with another year-month interval where fields are different. > The example below shows the issue: > {code:sql} > spark-sql> select interval '1' day > interval '1' hour; > Error in query: cannot resolve '(INTERVAL '1' DAY > INTERVAL '01' HOUR)' due to data type mismatch: differing types in '(INTERVAL '1' DAY > INTERVAL '01' HOUR)' (interval day and interval hour).; line 1 pos 7; > 'Project [unresolvedalias((INTERVAL '1' DAY > INTERVAL '01' HOUR), None)] > +- OneRowRelation > spark-sql> select interval '2' year > interval '11' month; > Error in query: cannot resolve '(INTERVAL '2' YEAR > INTERVAL '11' MONTH)' due to data type mismatch: differing types in '(INTERVAL '2' YEAR > INTERVAL '11' MONTH)' (interval year and interval month).; line 1 pos 7; > 'Project [unresolvedalias((INTERVAL '2' YEAR > INTERVAL '11' MONTH), None)] > +- OneRowRelation > {code}
[jira] [Created] (SPARK-36431) Support comparison of ANSI interval with different fields
Max Gekk created SPARK-36431: Summary: Support comparison of ANSI interval with different fields Key: SPARK-36431 URL: https://issues.apache.org/jira/browse/SPARK-36431 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.2.0 Reporter: Max Gekk Support comparison of - a day-time interval with another day-time interval which has different fields - a year-month interval with another year-month interval where fields are different. The example below shows the issue: {code:sql} spark-sql> select interval '1' day > interval '1' hour; Error in query: cannot resolve '(INTERVAL '1' DAY > INTERVAL '01' HOUR)' due to data type mismatch: differing types in '(INTERVAL '1' DAY > INTERVAL '01' HOUR)' (interval day and interval hour).; line 1 pos 7; 'Project [unresolvedalias((INTERVAL '1' DAY > INTERVAL '01' HOUR), None)] +- OneRowRelation spark-sql> select interval '2' year > interval '11' month; Error in query: cannot resolve '(INTERVAL '2' YEAR > INTERVAL '11' MONTH)' due to data type mismatch: differing types in '(INTERVAL '2' YEAR > INTERVAL '11' MONTH)' (interval year and interval month).; line 1 pos 7; 'Project [unresolvedalias((INTERVAL '2' YEAR > INTERVAL '11' MONTH), None)] +- OneRowRelation {code}
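[Editor's note] The comparison SPARK-36431 asks for amounts to coercing both operands of an interval comparison to a common unit within the same interval family (seconds for day-time intervals, months for year-month intervals). A minimal pure-Python sketch of that idea follows; it is not Spark's implementation, and all names are illustrative.

```python
# Illustrative sketch: compare ANSI-style intervals with different fields
# by normalizing to a smallest common unit per interval family.

# Day-time intervals normalize to seconds.
DAY_TIME_SECONDS = {"day": 86400, "hour": 3600, "minute": 60, "second": 1}
# Year-month intervals normalize to months.
YEAR_MONTH_MONTHS = {"year": 12, "month": 1}

def day_time_gt(a, a_unit, b, b_unit):
    # e.g. INTERVAL '1' DAY > INTERVAL '1' HOUR -> 86400 s > 3600 s
    return a * DAY_TIME_SECONDS[a_unit] > b * DAY_TIME_SECONDS[b_unit]

def year_month_gt(a, a_unit, b, b_unit):
    # e.g. INTERVAL '2' YEAR > INTERVAL '11' MONTH -> 24 mo > 11 mo
    return a * YEAR_MONTH_MONTHS[a_unit] > b * YEAR_MONTH_MONTHS[b_unit]

# The two failing queries from the ticket, expressed in this model:
assert day_time_gt(1, "day", 1, "hour")
assert year_month_gt(2, "year", 11, "month")
```

Note the two families stay separate: a day-time interval is never comparable to a year-month interval, since months have no fixed length in seconds.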
[jira] [Commented] (SPARK-36414) Disable timeout for BroadcastQueryStageExec in AQE
[ https://issues.apache.org/jira/browse/SPARK-36414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394140#comment-17394140 ] Dongjoon Hyun commented on SPARK-36414: --- Thank you, [~Qin Yao] and [~cloud_fan]. I collect this to SPARK-33828 to give it more visibility. > Disable timeout for BroadcastQueryStageExec in AQE > -- > > Key: SPARK-36414 > URL: https://issues.apache.org/jira/browse/SPARK-36414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.2.0 > > Attachments: image-2021-08-04-18-53-44-879.png > > > This reverts SPARK-31475, as there are always more concurrent jobs running in AQE mode, especially when running multiple queries at the same time. > Currently, the broadcast timeout does not measure only the BroadcastQueryStageExec itself but also includes the time spent waiting to be scheduled. If all the resources are occupied materializing other stages, it times out without ever getting a chance to run. > > !image-2021-08-04-18-53-44-879.png! > > The default value is 300s, and it's hard to adjust for AQE mode. Usually, you need an extremely large number for real-world cases. As you can see in the example above, the timeout we used was 1800s, and obviously it would need 3x more or so.
[jira] [Updated] (SPARK-36414) Disable timeout for BroadcastQueryStageExec in AQE
[ https://issues.apache.org/jira/browse/SPARK-36414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-36414: -- Parent: SPARK-33828 Issue Type: Sub-task (was: Improvement) > Disable timeout for BroadcastQueryStageExec in AQE > -- > > Key: SPARK-36414 > URL: https://issues.apache.org/jira/browse/SPARK-36414 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.2.0 > > Attachments: image-2021-08-04-18-53-44-879.png > > > This reverts SPARK-31475, as there are always more concurrent jobs running in AQE mode, especially when running multiple queries at the same time. > Currently, the broadcast timeout does not measure only the BroadcastQueryStageExec itself but also includes the time spent waiting to be scheduled. If all the resources are occupied materializing other stages, it times out without ever getting a chance to run. > > !image-2021-08-04-18-53-44-879.png! > > The default value is 300s, and it's hard to adjust for AQE mode. Usually, you need an extremely large number for real-world cases. As you can see in the example above, the timeout we used was 1800s, and obviously it would need 3x more or so.
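[Editor's note] The failure mode SPARK-36414 describes, a fixed await-timeout charging scheduling delay against the work itself, can be reproduced with nothing more than a thread pool: the timer starts when we begin waiting on the future, not when the task starts running, so queueing behind other work eats the whole budget. A small Python sketch (illustrative only, not Spark's code; the durations are arbitrary):

```python
# Sketch: a fixed result-timeout over-counts under contention, because
# time spent queued behind other tasks counts against the timeout.
from concurrent.futures import ThreadPoolExecutor, TimeoutError
import time

def broadcast():              # stands in for materializing a broadcast stage
    time.sleep(0.05)
    return "relation"

with ThreadPoolExecutor(max_workers=1) as pool:
    blocker = pool.submit(time.sleep, 0.2)   # "other stages" hold the only slot
    fut = pool.submit(broadcast)             # queued behind them
    try:
        fut.result(timeout=0.1)              # the "broadcastTimeout" fires
        timed_out = False                    # while the task is still queued
    except TimeoutError:
        timed_out = True

print("timed out:", timed_out)
```

The broadcast task needed only 0.05s of actual work, yet the 0.1s timeout fired: the budget was consumed by scheduling delay. That is why the ticket argues a fixed timeout is the wrong tool once AQE runs many stages concurrently.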
[jira] [Assigned] (SPARK-36430) Adaptively calculate the target size when coalescing shuffle partitions in AQE
[ https://issues.apache.org/jira/browse/SPARK-36430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36430: Assignee: (was: Apache Spark) > Adaptively calculate the target size when coalescing shuffle partitions in AQE > -- > > Key: SPARK-36430 > URL: https://issues.apache.org/jira/browse/SPARK-36430 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Commented] (SPARK-36430) Adaptively calculate the target size when coalescing shuffle partitions in AQE
[ https://issues.apache.org/jira/browse/SPARK-36430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394136#comment-17394136 ] Apache Spark commented on SPARK-36430: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33655 > Adaptively calculate the target size when coalescing shuffle partitions in AQE > -- > > Key: SPARK-36430 > URL: https://issues.apache.org/jira/browse/SPARK-36430 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Assigned] (SPARK-36430) Adaptively calculate the target size when coalescing shuffle partitions in AQE
[ https://issues.apache.org/jira/browse/SPARK-36430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36430: Assignee: Apache Spark > Adaptively calculate the target size when coalescing shuffle partitions in AQE > -- > > Key: SPARK-36430 > URL: https://issues.apache.org/jira/browse/SPARK-36430 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major >
[jira] [Commented] (SPARK-36430) Adaptively calculate the target size when coalescing shuffle partitions in AQE
[ https://issues.apache.org/jira/browse/SPARK-36430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17394134#comment-17394134 ] Apache Spark commented on SPARK-36430: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/33655 > Adaptively calculate the target size when coalescing shuffle partitions in AQE > -- > > Key: SPARK-36430 > URL: https://issues.apache.org/jira/browse/SPARK-36430 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Created] (SPARK-36430) Adaptively calculate the target size when coalescing shuffle partitions in AQE
Wenchen Fan created SPARK-36430: --- Summary: Adaptively calculate the target size when coalescing shuffle partitions in AQE Key: SPARK-36430 URL: https://issues.apache.org/jira/browse/SPARK-36430 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan
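[Editor's note] The ticket has no description, but "coalescing shuffle partitions" in AQE means merging many small post-shuffle partitions into fewer reduce tasks of roughly a target byte size. A generic greedy sketch of that idea follows, in pure Python; Spark's actual AQE rule (and the adaptive target this ticket introduces) is more involved, and this is only an illustration.

```python
# Greedy coalescing of adjacent partition sizes toward a byte target.
# Illustrative only; not Spark's implementation.

def coalesce_partitions(sizes, target):
    """Group adjacent partition indices so each group is ~target bytes.

    A partition larger than the target simply forms its own group.
    """
    groups, current, current_size = [], [], 0
    for i, size in enumerate(sizes):
        if current and current_size + size > target:
            groups.append(current)          # close the current group
            current, current_size = [], 0
        current.append(i)
        current_size += size
    if current:
        groups.append(current)
    return groups

# Eight tiny 10 MB shuffle partitions with a 64 MB target collapse
# into two coalesced reduce tasks:
MB = 1024 * 1024
print(coalesce_partitions([10 * MB] * 8, 64 * MB))
```

Making the target itself adaptive (rather than a fixed configured size) changes how many groups come out for a given input distribution, which is presumably the subject of the linked pull request.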
[jira] [Resolved] (SPARK-36426) Pin pyzmq to 2.22.0
[ https://issues.apache.org/jira/browse/SPARK-36426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta resolved SPARK-36426. Resolution: Invalid I re-triggered the failed GA job and it finally succeeded, so I'll close this issue. > Pin pyzmq to 2.22.0 > --- > > Key: SPARK-36426 > URL: https://issues.apache.org/jira/browse/SPARK-36426 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > GA is failing, and the root cause seems to be the latest release of pyzmq, which was released a few hours ago. > https://github.com/apache/spark/runs/3250261989#step:11:414 > https://github.com/apache/spark/runs/3250252645?check_suite_focus=true#step:11:417 > https://pypi.org/project/pyzmq/
[jira] [Resolved] (SPARK-36414) Disable timeout for BroadcastQueryStageExec in AQE
[ https://issues.apache.org/jira/browse/SPARK-36414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36414. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33636 [https://github.com/apache/spark/pull/33636] > Disable timeout for BroadcastQueryStageExec in AQE > -- > > Key: SPARK-36414 > URL: https://issues.apache.org/jira/browse/SPARK-36414 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.2.0 > > Attachments: image-2021-08-04-18-53-44-879.png > > > This reverts SPARK-31475, as there are always more concurrent jobs running in AQE mode, especially when running multiple queries at the same time. > Currently, the broadcast timeout does not measure only the BroadcastQueryStageExec itself but also includes the time spent waiting to be scheduled. If all the resources are occupied materializing other stages, it times out without ever getting a chance to run. > > !image-2021-08-04-18-53-44-879.png! > > The default value is 300s, and it's hard to adjust for AQE mode. Usually, you need an extremely large number for real-world cases. As you can see in the example above, the timeout we used was 1800s, and obviously it would need 3x more or so.
[jira] [Assigned] (SPARK-36414) Disable timeout for BroadcastQueryStageExec in AQE
[ https://issues.apache.org/jira/browse/SPARK-36414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36414: --- Assignee: Kent Yao > Disable timeout for BroadcastQueryStageExec in AQE > -- > > Key: SPARK-36414 > URL: https://issues.apache.org/jira/browse/SPARK-36414 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.3, 3.1.2, 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Attachments: image-2021-08-04-18-53-44-879.png > > > This reverts SPARK-31475, as there are always more concurrent jobs running in > AQE mode, especially when running multiple queries at the same time. > Currently, the broadcast timeout does not measure only the > BroadcastQueryStageExec's own materialization; it also includes the time spent > waiting to be scheduled. If all resources are occupied materializing other > stages, the stage times out without ever getting a chance to run. > > !image-2021-08-04-18-53-44-879.png! > > The default value is 300s, and it's hard to adjust the timeout for AQE mode. > Usually, you need an extremely large number for real-world cases. As you can > see in the example above, the timeout we used was 1800s, and it obviously > needs roughly 3x more.
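The quoted description can be made concrete with a small sketch. This is a hypothetical Python illustration, not Spark's implementation: `exceeds_timeout` and the timings are invented to show why a timeout measured from submission (which includes scheduler wait) can fire even when the stage's own materialization is fast.

```python
# Hypothetical sketch, not Spark's implementation: why a broadcast timeout
# measured from submission (including time queued behind other stages)
# can fire even though the stage's own work finishes quickly.

def exceeds_timeout(submitted_at, started_at, finished_at, timeout):
    """Compare two ways of charging time against the timeout:
    total   -- wall clock since submission (includes waiting to be scheduled)
    running -- time actually spent materializing (the stage's own work)
    Returns (total_exceeds, running_exceeds)."""
    total = finished_at - submitted_at
    running = finished_at - started_at
    return total > timeout, running > timeout

# A stage submitted at t=0 waits 1700s for resources, then runs for 200s.
# With the default 300s timeout, the submission-based measure fires even
# though only 200s of real work was needed.
total_fired, running_fired = exceeds_timeout(0, 1700, 1900, timeout=300)
```

Under heavy AQE concurrency the queuing term dominates, which is why bumping the timeout (1800s in the example above) only postpones the problem rather than fixing it.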
[jira] [Resolved] (SPARK-36409) Splitting test cases from datetime.sql
[ https://issues.apache.org/jira/browse/SPARK-36409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-36409. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 33640 [https://github.com/apache/spark/pull/33640] > Splitting test cases from datetime.sql > -- > > Key: SPARK-36409 > URL: https://issues.apache.org/jira/browse/SPARK-36409 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > Fix For: 3.3.0 > > > Split the test cases related to timestamp_ntz or timestamp_ltz functions from > datetime.sql. This is to reduce the size of datetime.sql, which has around > 300 cases and will increase in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36409) Splitting test cases from datetime.sql
[ https://issues.apache.org/jira/browse/SPARK-36409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reassigned SPARK-36409: -- Assignee: Wenchen Fan (was: Gengliang Wang) > Splitting test cases from datetime.sql > -- > > Key: SPARK-36409 > URL: https://issues.apache.org/jira/browse/SPARK-36409 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Wenchen Fan >Priority: Minor > Fix For: 3.3.0 > > > Split the test cases related to timestamp_ntz or timestamp_ltz functions from > datetime.sql. This is to reduce the size of datetime.sql, which has around > 300 cases and will increase in the future. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36353) RemoveNoopOperators should keep output schema
[ https://issues.apache.org/jira/browse/SPARK-36353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-36353: --- Assignee: angerszhu > RemoveNoopOperators should keep output schema > - > > Key: SPARK-36353 > URL: https://issues.apache.org/jira/browse/SPARK-36353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Attachments: image-2021-07-30-17-46-59-196.png > > > !image-2021-07-30-17-46-59-196.png|width=539,height=220! > [https://github.com/apache/spark/pull/33587] > > Only first level? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36353) RemoveNoopOperators should keep output schema
[ https://issues.apache.org/jira/browse/SPARK-36353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-36353. - Fix Version/s: 3.2.0 Resolution: Fixed Issue resolved by pull request 33587 [https://github.com/apache/spark/pull/33587] > RemoveNoopOperators should keep output schema > - > > Key: SPARK-36353 > URL: https://issues.apache.org/jira/browse/SPARK-36353 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.0 > > Attachments: image-2021-07-30-17-46-59-196.png > > > !image-2021-07-30-17-46-59-196.png|width=539,height=220! > [https://github.com/apache/spark/pull/33587] > > Only first level? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36429) JacksonParser should throw exception when data type unsupported.
[ https://issues.apache.org/jira/browse/SPARK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36429: Assignee: (was: Apache Spark) > JacksonParser should throw exception when data type unsupported. > > > Key: SPARK-36429 > URL: https://issues.apache.org/jira/browse/SPARK-36429 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Currently, when set spark.sql.timestampType=TIMESTAMP_NTZ, the behavior is > different between from_json and from_csv. > {code:java} > -- !query > select from_json('{"t":"26/October/2015"}', 't Timestamp', > map('timestampFormat', 'dd/M/')) > -- !query schema > struct> > -- !query output > {"t":null} > {code} > {code:java} > -- !query > select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', > 'dd/M/')) > -- !query schema > struct<> > -- !query output > java.lang.Exception > Unsupported type: timestamp_ntz > {code} > We should make from_json throws exception too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36429) JacksonParser should throw exception when data type unsupported.
[ https://issues.apache.org/jira/browse/SPARK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36429: Assignee: Apache Spark > JacksonParser should throw exception when data type unsupported. > > > Key: SPARK-36429 > URL: https://issues.apache.org/jira/browse/SPARK-36429 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > Currently, when set spark.sql.timestampType=TIMESTAMP_NTZ, the behavior is > different between from_json and from_csv. > {code:java} > -- !query > select from_json('{"t":"26/October/2015"}', 't Timestamp', > map('timestampFormat', 'dd/M/')) > -- !query schema > struct> > -- !query output > {"t":null} > {code} > {code:java} > -- !query > select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', > 'dd/M/')) > -- !query schema > struct<> > -- !query output > java.lang.Exception > Unsupported type: timestamp_ntz > {code} > We should make from_json throws exception too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36429) JacksonParser should throw exception when data type unsupported.
[ https://issues.apache.org/jira/browse/SPARK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393891#comment-17393891 ] Apache Spark commented on SPARK-36429: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/33654 > JacksonParser should throw exception when data type unsupported. > > > Key: SPARK-36429 > URL: https://issues.apache.org/jira/browse/SPARK-36429 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Currently, when set spark.sql.timestampType=TIMESTAMP_NTZ, the behavior is > different between from_json and from_csv. > {code:java} > -- !query > select from_json('{"t":"26/October/2015"}', 't Timestamp', > map('timestampFormat', 'dd/M/')) > -- !query schema > struct> > -- !query output > {"t":null} > {code} > {code:java} > -- !query > select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', > 'dd/M/')) > -- !query schema > struct<> > -- !query output > java.lang.Exception > Unsupported type: timestamp_ntz > {code} > We should make from_json throws exception too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36429) JacksonParser should throw exception when data type unsupported.
[ https://issues.apache.org/jira/browse/SPARK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393828#comment-17393828 ] jiaan.geng commented on SPARK-36429: I'm working on it. > JacksonParser should throw exception when data type unsupported. > > > Key: SPARK-36429 > URL: https://issues.apache.org/jira/browse/SPARK-36429 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Currently, when spark.sql.timestampType=TIMESTAMP_NTZ is set, the behavior > differs between from_json and from_csv. > {code:java} > -- !query > select from_json('{"t":"26/October/2015"}', 't Timestamp', > map('timestampFormat', 'dd/M/')) > -- !query schema > struct> > -- !query output > {"t":null} > {code} > {code:java} > -- !query > select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', > 'dd/M/')) > -- !query schema > struct<> > -- !query output > java.lang.Exception > Unsupported type: timestamp_ntz > {code} > We should make from_json throw an exception too.
[jira] [Updated] (SPARK-36429) JacksonParser should throw exception when data type unsupported.
[ https://issues.apache.org/jira/browse/SPARK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36429: --- Description: Currently, when set spark.sql.timestampType=TIMESTAMP_NTZ, the behavior is different between from_json and from_csv. {code:java} -- !query select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct> -- !query output {"t":null} {code} -- !query select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct<> -- !query output java.lang.Exception Unsupported type: timestamp_ntz We should make from_json throws exception too. was: Currently, when set spark.sql.timestampType=TIMESTAMP_NTZ, the behavior is different between from_json and from_csv. -- !query select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct> -- !query output {"t":null} -- !query select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct<> -- !query output java.lang.Exception Unsupported type: timestamp_ntz We should make from_json throws exception too. > JacksonParser should throw exception when data type unsupported. > > > Key: SPARK-36429 > URL: https://issues.apache.org/jira/browse/SPARK-36429 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Currently, when set spark.sql.timestampType=TIMESTAMP_NTZ, the behavior is > different between from_json and from_csv. 
> {code:java} > -- !query > select from_json('{"t":"26/October/2015"}', 't Timestamp', > map('timestampFormat', 'dd/M/')) > -- !query schema > struct> > -- !query output > {"t":null} > {code} > -- !query > select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', > 'dd/M/')) > -- !query schema > struct<> > -- !query output > java.lang.Exception > Unsupported type: timestamp_ntz > We should make from_json throws exception too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36429) JacksonParser should throw exception when data type unsupported.
[ https://issues.apache.org/jira/browse/SPARK-36429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiaan.geng updated SPARK-36429: --- Description: Currently, when set spark.sql.timestampType=TIMESTAMP_NTZ, the behavior is different between from_json and from_csv. {code:java} -- !query select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct> -- !query output {"t":null} {code} {code:java} -- !query select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct<> -- !query output java.lang.Exception Unsupported type: timestamp_ntz {code} We should make from_json throws exception too. was: Currently, when set spark.sql.timestampType=TIMESTAMP_NTZ, the behavior is different between from_json and from_csv. {code:java} -- !query select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct> -- !query output {"t":null} {code} -- !query select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct<> -- !query output java.lang.Exception Unsupported type: timestamp_ntz We should make from_json throws exception too. > JacksonParser should throw exception when data type unsupported. > > > Key: SPARK-36429 > URL: https://issues.apache.org/jira/browse/SPARK-36429 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Priority: Major > > Currently, when set spark.sql.timestampType=TIMESTAMP_NTZ, the behavior is > different between from_json and from_csv. 
> {code:java} > -- !query > select from_json('{"t":"26/October/2015"}', 't Timestamp', > map('timestampFormat', 'dd/M/')) > -- !query schema > struct> > -- !query output > {"t":null} > {code} > {code:java} > -- !query > select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', > 'dd/M/')) > -- !query schema > struct<> > -- !query output > java.lang.Exception > Unsupported type: timestamp_ntz > {code} > We should make from_json throws exception too. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36429) JacksonParser should throw exception when data type unsupported.
jiaan.geng created SPARK-36429: -- Summary: JacksonParser should throw exception when data type unsupported. Key: SPARK-36429 URL: https://issues.apache.org/jira/browse/SPARK-36429 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: jiaan.geng Currently, when spark.sql.timestampType=TIMESTAMP_NTZ is set, the behavior differs between from_json and from_csv. -- !query select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct> -- !query output {"t":null} -- !query select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/M/')) -- !query schema struct<> -- !query output java.lang.Exception Unsupported type: timestamp_ntz We should make from_json throw an exception too.
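The inconsistency described above can be sketched in a few lines. This is a hypothetical Python stand-in, not Spark's JacksonParser; `make_converter` is an invented name illustrating the proposed behavior of failing at converter-lookup time on an unsupported type, as from_csv already does, instead of silently producing null.

```python
# Hypothetical stand-in for the proposed behavior (not Spark's JacksonParser):
# looking up a converter for an unsupported data type raises immediately,
# mirroring from_csv, instead of silently parsing the value to null.

def make_converter(data_type):
    # Toy converter table standing in for the parser's supported types.
    converters = {
        "int": int,
        "string": str,
    }
    if data_type not in converters:
        # Fail loudly, as from_csv already does for timestamp_ntz.
        raise ValueError(f"Unsupported type: {data_type}")
    return converters[data_type]
```

With a check like this, `from_json` under `spark.sql.timestampType=TIMESTAMP_NTZ` would surface "Unsupported type: timestamp_ntz" rather than returning `{"t":null}`.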
[jira] [Resolved] (SPARK-36355) NamedExpression add method `withName(newName: String)`
[ https://issues.apache.org/jira/browse/SPARK-36355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] angerszhu resolved SPARK-36355. --- Resolution: Not A Problem > NamedExpression add method `withName(newName: String)` > -- > > Key: SPARK-36355 > URL: https://issues.apache.org/jira/browse/SPARK-36355 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-36428) the 'seconds' parameter of 'make_timestamp' should accept integer type
[ https://issues.apache.org/jira/browse/SPARK-36428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393818#comment-17393818 ] jiaan.geng commented on SPARK-36428: I will take a look. > the 'seconds' parameter of 'make_timestamp' should accept integer type > -- > > Key: SPARK-36428 > URL: https://issues.apache.org/jira/browse/SPARK-36428 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > > With ANSI mode, {{SELECT make_timestamp(1, 1, 1, 1, 1, 1)}} fails, because > the 'seconds' parameter needs to be of type DECIMAL(8,6), and INT can't be > implicitly casted to DECIMAL(8,6) under ANSI mode. > We should update the function {{make_timestamp}} to allow integer type > 'seconds' parameter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36428) the 'second' parameter of 'make_timestamp' should accept integer type
[ https://issues.apache.org/jira/browse/SPARK-36428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-36428: Description: With ANSI mode, {{SELECT make_timestamp(1, 1, 1, 1, 1, 1)}} fails, because the 'seconds' parameter needs to be of type DECIMAL(8,6), and INT can't be implicitly casted to DECIMAL(8,6) under ANSI mode. We should update the function {{make_timestamp}} to allow integer type 'seconds' > the 'second' parameter of 'make_timestamp' should accept integer type > - > > Key: SPARK-36428 > URL: https://issues.apache.org/jira/browse/SPARK-36428 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > > With ANSI mode, {{SELECT make_timestamp(1, 1, 1, 1, 1, 1)}} fails, because > the 'seconds' parameter needs to be of type DECIMAL(8,6), and INT can't be > implicitly casted to DECIMAL(8,6) under ANSI mode. > We should update the function {{make_timestamp}} to allow integer type > 'seconds' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36428) the 'seconds' parameter of 'make_timestamp' should accept integer type
[ https://issues.apache.org/jira/browse/SPARK-36428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-36428: Summary: the 'seconds' parameter of 'make_timestamp' should accept integer type (was: the 'second' parameter of 'make_timestamp' should accept integer type) > the 'seconds' parameter of 'make_timestamp' should accept integer type > -- > > Key: SPARK-36428 > URL: https://issues.apache.org/jira/browse/SPARK-36428 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > > With ANSI mode, {{SELECT make_timestamp(1, 1, 1, 1, 1, 1)}} fails, because > the 'seconds' parameter needs to be of type DECIMAL(8,6), and INT can't be > implicitly casted to DECIMAL(8,6) under ANSI mode. > We should update the function {{make_timestamp}} to allow integer type > 'seconds' -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36428) the 'seconds' parameter of 'make_timestamp' should accept integer type
[ https://issues.apache.org/jira/browse/SPARK-36428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-36428: Description: With ANSI mode, {{SELECT make_timestamp(1, 1, 1, 1, 1, 1)}} fails, because the 'seconds' parameter needs to be of type DECIMAL(8,6), and INT can't be implicitly cast to DECIMAL(8,6) under ANSI mode. We should update the function {{make_timestamp}} to allow integer type 'seconds' parameter. was: With ANSI mode, {{SELECT make_timestamp(1, 1, 1, 1, 1, 1)}} fails, because the 'seconds' parameter needs to be of type DECIMAL(8,6), and INT can't be implicitly cast to DECIMAL(8,6) under ANSI mode. We should update the function {{make_timestamp}} to allow integer type 'seconds' > the 'seconds' parameter of 'make_timestamp' should accept integer type > -- > > Key: SPARK-36428 > URL: https://issues.apache.org/jira/browse/SPARK-36428 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0 >Reporter: Wenchen Fan >Priority: Major > > With ANSI mode, {{SELECT make_timestamp(1, 1, 1, 1, 1, 1)}} fails, because > the 'seconds' parameter needs to be of type DECIMAL(8,6), and INT can't be > implicitly cast to DECIMAL(8,6) under ANSI mode. > We should update the function {{make_timestamp}} to allow integer type > 'seconds' parameter.
[jira] [Created] (SPARK-36428) the 'second' parameter of 'make_timestamp' should accept integer type
Wenchen Fan created SPARK-36428: --- Summary: the 'second' parameter of 'make_timestamp' should accept integer type Key: SPARK-36428 URL: https://issues.apache.org/jira/browse/SPARK-36428 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Wenchen Fan
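The fix requested for SPARK-36428 amounts to widening an integer 'seconds' argument to DECIMAL(8,6) instead of rejecting the implicit cast under ANSI mode. Below is a minimal Python sketch of that widening rule; `to_seconds_decimal` is an invented helper, not Spark's implementation.

```python
# Hypothetical sketch (not Spark's implementation) of the widening rule:
# an integer 'seconds' value is losslessly representable as DECIMAL(8,6),
# so it can be accepted and converted rather than rejected by the
# ANSI-mode implicit-cast check.
from decimal import Decimal

def to_seconds_decimal(seconds):
    """Coerce an int or Decimal to a DECIMAL(8,6)-shaped value,
    rejecting anything that cannot be represented exactly."""
    d = Decimal(seconds)
    q = d.quantize(Decimal("0.000001"))  # 6 fractional digits
    if q != d:
        raise ValueError(f"precision loss for seconds value {seconds}")
    if abs(q) >= Decimal("100"):         # DECIMAL(8,6) caps at 99.999999
        raise ValueError(f"out of range for DECIMAL(8,6): {seconds}")
    return q

# make_timestamp(1, 1, 1, 1, 1, 1): the integer seconds value 1 widens cleanly.
widened = to_seconds_decimal(1)
```

The point is that the conversion is exact for any in-range integer, so allowing it loses none of ANSI mode's strictness.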
[jira] [Updated] (SPARK-36426) Pin pyzmq to 2.22.0
[ https://issues.apache.org/jira/browse/SPARK-36426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36426: --- Description: GA is failing and the root cause seems the latest release of pyzmq, which is released a few hours ago. https://github.com/apache/spark/runs/3250261989#step:11:414 https://github.com/apache/spark/runs/3250252645?check_suite_focus=true#step:11:417 https://pypi.org/project/pyzmq/ was: GA is failing and the root cause seems the latest release of pyzmq, which is released a few hours ago. https://pypi.org/project/pyzmq/ > Pin pyzmq to 2.22.0 > --- > > Key: SPARK-36426 > URL: https://issues.apache.org/jira/browse/SPARK-36426 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > GA is failing and the root cause seems the latest release of pyzmq, which is > released a few hours ago. > https://github.com/apache/spark/runs/3250261989#step:11:414 > https://github.com/apache/spark/runs/3250252645?check_suite_focus=true#step:11:417 > https://pypi.org/project/pyzmq/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36426) Pin pyzmq to 2.22.0
[ https://issues.apache.org/jira/browse/SPARK-36426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36426: --- Description: GA is failing and the root cause seems the latest release of pyzmq, which is released a few hours ago. https://pypi.org/project/pyzmq/ was: GA is failing and the root cause seems the latest release of pyzmq. https://pypi.org/project/pyzmq/ > Pin pyzmq to 2.22.0 > --- > > Key: SPARK-36426 > URL: https://issues.apache.org/jira/browse/SPARK-36426 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > GA is failing and the root cause seems the latest release of pyzmq, which is > released a few hours ago. > https://pypi.org/project/pyzmq/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36426) Pin pyzmq to 2.22.0
[ https://issues.apache.org/jira/browse/SPARK-36426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36426: --- Description: GA is failing and the root cause seems the latest release of pyzmq. https://pypi.org/project/pyzmq/ was: This is a hotfix PR to recover GA. See https://github.com/apache/spark/runs/3250261989 The root cause seems the latest release of pyzmq. https://pypi.org/project/pyzmq/ > Pin pyzmq to 2.22.0 > --- > > Key: SPARK-36426 > URL: https://issues.apache.org/jira/browse/SPARK-36426 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > GA is failing and the root cause seems the latest release of pyzmq. > https://pypi.org/project/pyzmq/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36426) Pin pyzmq to 2.22.0
[ https://issues.apache.org/jira/browse/SPARK-36426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-36426: --- Affects Version/s: 3.2.0 > Pin pyzmq to 2.22.0 > --- > > Key: SPARK-36426 > URL: https://issues.apache.org/jira/browse/SPARK-36426 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > This is a hotfix PR to recover GA. > See https://github.com/apache/spark/runs/3250261989 > The root cause seems the latest release of pyzmq. > https://pypi.org/project/pyzmq/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36425) PySpark: support CrossValidatorModel get standard deviation of metrics for each paramMap
[ https://issues.apache.org/jira/browse/SPARK-36425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu reassigned SPARK-36425: -- Assignee: Weichen Xu > PySpark: support CrossValidatorModel get standard deviation of metrics for > each paramMap > - > > Key: SPARK-36425 > URL: https://issues.apache.org/jira/browse/SPARK-36425 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Affects Versions: 3.2.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > > PySpark: support CrossValidatorModel get standard deviation of metrics for > each paramMap. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
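The statistic this issue asks CrossValidatorModel to expose can be illustrated with plain Python. This sketch is not the real pyspark.ml API; `summarize_metrics` and its input shape are invented to show the idea: per paramMap, report the standard deviation of the per-fold metrics alongside the existing average.

```python
# Illustrative sketch (not the real pyspark.ml API): for each paramMap,
# compute both the mean of the per-fold metrics (what avgMetrics already
# gives) and their standard deviation (what this issue asks to add).
from statistics import mean, stdev

def summarize_metrics(fold_metrics_per_param):
    """fold_metrics_per_param[i] holds one metric per fold for paramMap i.
    Returns (avg_metrics, std_metrics), one entry per paramMap."""
    avg_metrics = [mean(metrics) for metrics in fold_metrics_per_param]
    std_metrics = [stdev(metrics) for metrics in fold_metrics_per_param]
    return avg_metrics, std_metrics

# Two candidate paramMaps over three folds: the second is slightly worse on
# average but perfectly stable, a distinction the average alone cannot show.
avg, std = summarize_metrics([[0.8, 0.9, 1.0], [0.85, 0.85, 0.85]])
```

Exposing the deviation lets users prefer a stable paramMap over one whose average is propped up by a single lucky fold.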
[jira] [Commented] (SPARK-36426) Pin pyzmq to 2.22.0
[ https://issues.apache.org/jira/browse/SPARK-36426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393743#comment-17393743 ] Apache Spark commented on SPARK-36426: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/33653 > Pin pyzmq to 2.22.0 > --- > > Key: SPARK-36426 > URL: https://issues.apache.org/jira/browse/SPARK-36426 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > This is a hotfix PR to recover GA. > See https://github.com/apache/spark/runs/3250261989 > The root cause seems the latest release of pyzmq. > https://pypi.org/project/pyzmq/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36427) Scala API: support CrossValidatorModel get standard deviation of metrics for each paramMap
Weichen Xu created SPARK-36427: -- Summary: Scala API: support CrossValidatorModel get standard deviation of metrics for each paramMap Key: SPARK-36427 URL: https://issues.apache.org/jira/browse/SPARK-36427 Project: Spark Issue Type: New Feature Components: ML Affects Versions: 3.2.0 Reporter: Weichen Xu This is the parity feature of https://issues.apache.org/jira/browse/SPARK-36425 Note: We also need to update the PySpark CrossValidatorModel.to_java/from_java methods in this task.
[jira] [Assigned] (SPARK-36426) Pin pyzmq to 2.22.0
[ https://issues.apache.org/jira/browse/SPARK-36426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36426: Assignee: Apache Spark (was: Kousuke Saruta) > Pin pyzmq to 2.22.0 > --- > > Key: SPARK-36426 > URL: https://issues.apache.org/jira/browse/SPARK-36426 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > This is a hotfix PR to recover GA. > See https://github.com/apache/spark/runs/3250261989 > The root cause seems the latest release of pyzmq. > https://pypi.org/project/pyzmq/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36426) Pin pyzmq to 2.22.0
[ https://issues.apache.org/jira/browse/SPARK-36426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-36426: Assignee: Kousuke Saruta (was: Apache Spark) > Pin pyzmq to 2.22.0 > --- > > Key: SPARK-36426 > URL: https://issues.apache.org/jira/browse/SPARK-36426 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > This is a hotfix PR to recover GA. > See https://github.com/apache/spark/runs/3250261989 > The root cause seems the latest release of pyzmq. > https://pypi.org/project/pyzmq/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36426) Pin pyzmq to 2.22.0
Kousuke Saruta created SPARK-36426: -- Summary: Pin pyzmq to 2.22.0 Key: SPARK-36426 URL: https://issues.apache.org/jira/browse/SPARK-36426 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta This is a hotfix PR to recover GA. See https://github.com/apache/spark/runs/3250261989 The root cause seems to be the latest release of pyzmq. https://pypi.org/project/pyzmq/
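For reference, pinning a dependency in CI usually means a constraint line like the one below in the pip invocation or requirements file. This is an illustrative sketch only: the exact file Spark's workflow edits is in the linked PR, not in this thread, and the version string here is copied verbatim from the issue title.

```text
# Hypothetical requirements/constraints entry showing the pinning pattern;
# version string taken from the issue title, actual change per the linked PR.
pyzmq==2.22.0
```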
[jira] [Commented] (SPARK-36425) PySpark: support CrossValidatorModel get standard deviation of metrics for each paramMap
[ https://issues.apache.org/jira/browse/SPARK-36425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393740#comment-17393740 ]

Apache Spark commented on SPARK-36425:
--------------------------------------

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/33652

> PySpark: support CrossValidatorModel get standard deviation of metrics for
> each paramMap
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-36425
>                 URL: https://issues.apache.org/jira/browse/SPARK-36425
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, PySpark
>    Affects Versions: 3.2.0
>            Reporter: Weichen Xu
>            Priority: Major
>
> PySpark: support CrossValidatorModel get standard deviation of metrics for
> each paramMap.
[jira] [Assigned] (SPARK-36425) PySpark: support CrossValidatorModel get standard deviation of metrics for each paramMap
[ https://issues.apache.org/jira/browse/SPARK-36425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36425:
------------------------------------

    Assignee:     (was: Apache Spark)

> PySpark: support CrossValidatorModel get standard deviation of metrics for
> each paramMap
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-36425
>                 URL: https://issues.apache.org/jira/browse/SPARK-36425
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, PySpark
>    Affects Versions: 3.2.0
>            Reporter: Weichen Xu
>            Priority: Major
>
> PySpark: support CrossValidatorModel get standard deviation of metrics for
> each paramMap.
[jira] [Assigned] (SPARK-36425) PySpark: support CrossValidatorModel get standard deviation of metrics for each paramMap
[ https://issues.apache.org/jira/browse/SPARK-36425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-36425:
------------------------------------

    Assignee: Apache Spark

> PySpark: support CrossValidatorModel get standard deviation of metrics for
> each paramMap
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-36425
>                 URL: https://issues.apache.org/jira/browse/SPARK-36425
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, PySpark
>    Affects Versions: 3.2.0
>            Reporter: Weichen Xu
>            Assignee: Apache Spark
>            Priority: Major
>
> PySpark: support CrossValidatorModel get standard deviation of metrics for
> each paramMap.
[jira] [Commented] (SPARK-36425) PySpark: support CrossValidatorModel get standard deviation of metrics for each paramMap
[ https://issues.apache.org/jira/browse/SPARK-36425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17393739#comment-17393739 ]

Apache Spark commented on SPARK-36425:
--------------------------------------

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/33652

> PySpark: support CrossValidatorModel get standard deviation of metrics for
> each paramMap
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-36425
>                 URL: https://issues.apache.org/jira/browse/SPARK-36425
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML, PySpark
>    Affects Versions: 3.2.0
>            Reporter: Weichen Xu
>            Priority: Major
>
> PySpark: support CrossValidatorModel get standard deviation of metrics for
> each paramMap.
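The feature requested in SPARK-36425 boils down to aggregating per-fold metrics: `CrossValidator` already averages fold scores into `avgMetrics`, and the issue asks for the matching standard deviation per paramMap. The sketch below shows that aggregation with the standard library only; the fold scores and the `regParam=...` labels are made-up illustrations, not real PySpark output, and the dict-based shape is an assumption rather than the API the linked PR adds.

```python
import statistics

# Hypothetical fold-level metrics: one entry per paramMap, one score per fold.
fold_metrics = {
    "regParam=0.01": [0.81, 0.79, 0.83],
    "regParam=0.10": [0.75, 0.77, 0.76],
}

# CrossValidator collapses folds into avgMetrics; the requested stdMetrics
# would expose the spread of the same per-fold scores alongside the mean.
avg_metrics = {p: statistics.mean(scores) for p, scores in fold_metrics.items()}
std_metrics = {p: statistics.stdev(scores) for p, scores in fold_metrics.items()}
```

A low standard deviation alongside a high mean suggests a paramMap that performs consistently across folds, which is exactly the model-selection signal the issue wants surfaced.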