[jira] [Assigned] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37028: Assignee: Apache Spark > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Assignee: Apache Spark >Priority: Major > > An executor running on a bad node (e.g. the system is overloaded or disks are > busy) or with high GC overhead can hurt job execution efficiency. Speculative > execution can mitigate this, but the speculated task may also run on a bad executor. > We should have a 'kill' link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
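Until such a link exists in the UI, a closely related capability is already available programmatically: SparkContext.killExecutors asks the cluster manager to kill specific executors. A minimal sketch, assuming the executor ID is read off the Executors page; everything beyond the killExecutors call itself is illustrative:

{code:scala}
import org.apache.spark.SparkContext

// Illustrative sketch only: request that a misbehaving executor be killed.
// The executor ID would come from the Executors page of the Web UI.
def killBadExecutor(sc: SparkContext, executorId: String): Unit = {
  // Asks the cluster manager to kill the given executors; returns false if
  // the request could not be submitted (e.g. unsupported cluster manager).
  val submitted = sc.killExecutors(Seq(executorId))
  println(s"kill request for executor $executorId submitted: $submitted")
}
{code}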
[jira] [Assigned] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37028: Assignee: (was: Apache Spark) > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > An executor running on a bad node (e.g. the system is overloaded or disks are > busy) or with high GC overhead can hurt job execution efficiency. Speculative > execution can mitigate this, but the speculated task may also run on a bad executor. > We should have a 'kill' link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429652#comment-17429652 ] Apache Spark commented on SPARK-37028: -- User 'weixiuli' has created a pull request for this issue: https://github.com/apache/spark/pull/34302 > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > An executor running on a bad node (e.g. the system is overloaded or disks are > busy) or with high GC overhead can hurt job execution efficiency. Speculative > execution can mitigate this, but the speculated task may also run on a bad executor. > We should have a 'kill' link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem, but sometimes the speculated task may also run in a bad executor. We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this problem, > but sometimes the speculated task may also run in a bad executor. > We should have a 'kill' link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a 'kill' link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a 'kill' link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in the Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Summary: Add a 'kill' executor link in the Web UI. (was: Add a 'kill' executor link in Web UI.) > Add a 'kill' executor link in the Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a "kill" link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or it has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. > Add a 'kill' executor link in Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a "kill" link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: The executor which is running in a bad node(eg. The system is overloaded or disks are busy) or it has big GC overheads may affect the efficiency of job execution, although there are speculative mechanisms to resolve this problem,but sometimes the speculated task may also run in a bad executor. We should have a "kill" link for each executor, similar to what we have for each stage, so it's easier for users to kill executors in the UI. was:Add a 'kill' executors link in Web UI. > Add a 'kill' executor link in Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > The executor which is running in a bad node(eg. The system is overloaded or > disks are busy) or it has big GC overheads may affect the efficiency of job > execution, although there are speculative mechanisms to resolve this > problem,but sometimes the speculated task may also run in a bad executor. > We should have a "kill" link for each executor, similar to what we have for > each stage, so it's easier for users to kill executors in the UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executor link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Summary: Add a 'kill' executor link in Web UI. (was: Add a 'kill' executors link in Web UI.) > Add a 'kill' executor link in Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > Add a 'kill' executors link in Web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add a 'kill' executors link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Summary: Add a 'kill' executors link in Web UI. (was: Add 'kill' executors link in Web UI.) > Add a 'kill' executors link in Web UI. > --- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > Add a 'kill' executors link in Web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37028) Add 'kill' executors link in Web UI.
weixiuli created SPARK-37028: Summary: Add 'kill' executors link in Web UI. Key: SPARK-37028 URL: https://issues.apache.org/jira/browse/SPARK-37028 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 3.3.0 Reporter: weixiuli Add 'kill' executors link in Web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37028) Add 'kill' executors link in Web UI.
[ https://issues.apache.org/jira/browse/SPARK-37028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] weixiuli updated SPARK-37028: - Description: Add a 'kill' executors link in Web UI. (was: Add 'kill' executors link in Web UI.) > Add 'kill' executors link in Web UI. > -- > > Key: SPARK-37028 > URL: https://issues.apache.org/jira/browse/SPARK-37028 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: weixiuli >Priority: Major > > Add a 'kill' executors link in Web UI. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Component/s: Build > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: Build, ML >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37027) Fix behavior inconsistent in Hive table when ‘path’ is provided in SERDEPROPERTIES
[ https://issues.apache.org/jira/browse/SPARK-37027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuzhou Sun updated SPARK-37027: --- Attachment: SPARK-37027-test-example.patch > Fix behavior inconsistent in Hive table when ‘path’ is provided in > SERDEPROPERTIES > -- > > Key: SPARK-37027 > URL: https://issues.apache.org/jira/browse/SPARK-37027 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.1.2 >Reporter: Yuzhou Sun >Priority: Trivial > Attachments: SPARK-37027-test-example.patch > > > If a Hive table is created with both {{WITH SERDEPROPERTIES ('path'='<tableLocation>')}} and {{LOCATION '<tableLocation>'}}, Spark can return doubled rows when reading the table. This issue seems to be an extension of SPARK-30507. Reproduce steps:
> # Create table and insert records via Hive (Spark doesn't allow inserting into a table like this)
> {code:sql}
> CREATE TABLE `test_table`(
>   `c1` LONG,
>   `c2` STRING)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
> WITH SERDEPROPERTIES ('path'='<tableLocation>')
> STORED AS
>   INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
>   OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION '<tableLocation>';
> INSERT INTO TABLE `test_table`
> VALUES (0, '0');
> SELECT * FROM `test_table`;
> -- will return
> -- 0 0
> {code}
> # Read above table from Spark
> {code:sql}
> SELECT * FROM `test_table`;
> -- will return
> -- 0 0
> -- 0 0
> {code}
> But if we set {{spark.sql.hive.convertMetastoreParquet=false}}, Spark will return the same result as Hive (i.e. a single row). A similar case: if a Hive table is created with both {{WITH SERDEPROPERTIES ('path'='<anotherPath>')}} and {{LOCATION '<tableLocation>'}}, Spark will read both the rows under {{anotherPath}} and the rows under {{tableLocation}}, regardless of {{spark.sql.hive.convertMetastoreParquet}}'s value, whereas Hive seems to return only the rows under {{tableLocation}}. Another similar case: if {{path}} is provided in {{TBLPROPERTIES}}, Spark won't double the rows when {{'path'='<tableLocation>'}}; if {{'path'='<anotherPath>'}}, Spark will read both the rows under {{anotherPath}} and the rows under {{tableLocation}}, while Hive seems to keep ignoring the {{path}} in {{TBLPROPERTIES}}. Code examples for the above cases (a diff patch written in {{HiveParquetMetastoreSuite.scala}}) can be found in Attachments -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37027) Fix behavior inconsistent in Hive table when ‘path’ is provided in SERDEPROPERTIES
Yuzhou Sun created SPARK-37027: -- Summary: Fix behavior inconsistent in Hive table when ‘path’ is provided in SERDEPROPERTIES Key: SPARK-37027 URL: https://issues.apache.org/jira/browse/SPARK-37027 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.2, 2.4.5 Reporter: Yuzhou Sun If a Hive table is created with both {{WITH SERDEPROPERTIES ('path'='<tableLocation>')}} and {{LOCATION '<tableLocation>'}}, Spark can return doubled rows when reading the table. This issue seems to be an extension of SPARK-30507. Reproduce steps:
# Create table and insert records via Hive (Spark doesn't allow inserting into a table like this)
{code:sql}
CREATE TABLE `test_table`(
  `c1` LONG,
  `c2` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES ('path'='<tableLocation>')
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION '<tableLocation>';
INSERT INTO TABLE `test_table`
VALUES (0, '0');
SELECT * FROM `test_table`;
-- will return
-- 0 0
{code}
# Read above table from Spark
{code:sql}
SELECT * FROM `test_table`;
-- will return
-- 0 0
-- 0 0
{code}
But if we set {{spark.sql.hive.convertMetastoreParquet=false}}, Spark will return the same result as Hive (i.e. a single row). A similar case: if a Hive table is created with both {{WITH SERDEPROPERTIES ('path'='<anotherPath>')}} and {{LOCATION '<tableLocation>'}}, Spark will read both the rows under {{anotherPath}} and the rows under {{tableLocation}}, regardless of {{spark.sql.hive.convertMetastoreParquet}}'s value, whereas Hive seems to return only the rows under {{tableLocation}}. Another similar case: if {{path}} is provided in {{TBLPROPERTIES}}, Spark won't double the rows when {{'path'='<tableLocation>'}}; if {{'path'='<anotherPath>'}}, Spark will read both the rows under {{anotherPath}} and the rows under {{tableLocation}}, while Hive seems to keep ignoring the {{path}} in {{TBLPROPERTIES}}. Code examples for the above cases (a diff patch written in {{HiveParquetMetastoreSuite.scala}}) can be found in Attachments -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
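A minimal sketch of the comparison described above, assuming the test_table from the reproduction steps already exists; spark.sql.hive.convertMetastoreParquet is the real configuration key being toggled, the application name is illustrative:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("serdeproperties-path-check")  // illustrative name
  .enableHiveSupport()
  .getOrCreate()

// Default (true): Spark converts the Hive Parquet table to its native reader,
// which also honours the duplicated 'path' SERDE property and doubles the rows.
spark.sql("SELECT * FROM test_table").show()

// Fall back to the Hive SerDe reader: returns the same single row as Hive.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")
spark.sql("SELECT * FROM test_table").show()
{code}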
[jira] [Commented] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429644#comment-17429644 ] Apache Spark commented on SPARK-37026: -- User 'sarutak' has created a pull request for this issue: https://github.com/apache/spark/pull/34301 > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37026: Assignee: Apache Spark (was: Kousuke Saruta) > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37026: Assignee: Kousuke Saruta (was: Apache Spark) > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Summary: Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 (was: Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13) > Ensure the element type of ResolvedRFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Summary: Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13 (was: ResolvedRFormula.toString throws ClassCastException with Scala 2.13) > Ensure the element type of RFormula.terms is scala.Seq for Scala 2.13 > - > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) ResolvedRFormula.toString throws ClassCastException with Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Description: ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. (was: ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.collection.immutable.Seq[scala.collection.imutable.Seq[String]] but scala.collection.immutable.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed.) > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 > --- > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is scala.Seq[scala.Seq[String]] but > scala.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
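The Scala 2.13 change behind this can be illustrated without Spark's ML internals; a minimal sketch of the failure mode and the conversion-based fix (variable names are illustrative):

{code:scala}
// Since Scala 2.13, scala.Seq is an alias for scala.collection.immutable.Seq,
// so a value backed by mutable.ArraySeq$ofRef can no longer be cast to it.
val wrapped: scala.collection.Seq[String] = Array("a", "b") // mutable ArraySeq.ofRef on 2.13

// Unsafe on 2.13 (this is the reported ClassCastException):
// val terms = wrapped.asInstanceOf[Seq[String]]

// Safe on both 2.12 and 2.13: convert instead of casting.
val terms: Seq[String] = wrapped.toSeq
println(terms)
{code}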
[jira] [Created] (SPARK-37026) ResolvedRFormula.toString throws ClassCastException with Scala 2.13
Kousuke Saruta created SPARK-37026: -- Summary: ResolvedRFormula.toString throws ClassCastException with Scala 2.13 Key: SPARK-37026 URL: https://issues.apache.org/jira/browse/SPARK-37026 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 3.3.0 Reporter: Kousuke Saruta Assignee: Kousuke Saruta ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because the type of ResolvedRFormula.terms is scala.collection.immutable.Seq[scala.collection.immutable.Seq[String]] but scala.collection.immutable.Seq[scala.collection.mutable.ArraySeq$ofRef] will be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37026) ResolvedRFormula.toString throws ClassCastException with Scala 2.13
[ https://issues.apache.org/jira/browse/SPARK-37026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated SPARK-37026: --- Issue Type: Bug (was: Improvement) > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 > --- > > Key: SPARK-37026 > URL: https://issues.apache.org/jira/browse/SPARK-37026 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 3.3.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > ResolvedRFormula.toString throws ClassCastException with Scala 2.13 because > the type of ResolvedRFormula.terms is > scala.collection.immutable.Seq[scala.collection.immutable.Seq[String]] but > scala.collection.immutable.Seq[scala.collection.mutable.ArraySeq$ofRef] will > be passed. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM
[ https://issues.apache.org/jira/browse/SPARK-36966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-36966: Affects Version/s: 1.6.0 > Spark evicts RDD partitions instead of allowing OOM > --- > > Key: SPARK-36966 > URL: https://issues.apache.org/jira/browse/SPARK-36966 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0, 2.4.4 >Reporter: sam >Priority: Major > > In the past Spark (pre 1.6) jobs would give OOM if an RDD could not fit into > memory (when trying to cache with MEMORY_ONLY). These days it seems Spark > jobs will evict partitions from the cache and recompute them from scratch. > We have some jobs that cache an RDD then traverse it 300 times. In order to > know when we need to increase the memory on our cluster, we need to know when > it has run out of memory. > The new behaviour of Spark makes this difficult ... rather than the job > OOMing (like in the past), the job instead just takes forever (our > surrounding logic eventually times out the job). Diagnosing why the job > failed becomes difficult because it's not immediately obvious from the logs > that the job has run out of memory (since no OOM is thrown). One can find > "evicted" log lines. > As a bit of a hack, we are using accumulators with mapPartitionsWithIndex and > updating a count of the number of times each partition is traversed, then we > forcibly blow up the job when this count is 2 or more (indicating an > eviction). This hack doesn't work very well as it seems to give false > positives (RCA not yet understood, it doesn't seem to be speculative > execution, nor are tasks failing (`egrep "task .* in stage [0-9]+\.1" | wc > -l` gives 0). > *Question 1*: Is there a way to disable this new behaviour of Spark and make > it behave like it used to (i.e. just blow up with OOM) - I've looked in the > Spark configuration and cannot find anything like "disable-eviction". > *Question 2*: Is there any method on the SparkSession or SparkContext that we > can call to easily detect when eviction is happening? > If not, then for our use case this is an effective regression - we need a way > to make Spark behave predictably, or at least a way to determine > automatically when Spark is running slowly due to lack of memory. > FOR CONTEXT > > Borrowed storage memory may be evicted when memory pressure arises > From > https://issues.apache.org/jira/secure/attachment/12765646/unified-memory-management-spark-1.pdf, > which is attached to https://issues.apache.org/jira/browse/SPARK-1 which > was implemented in 1.6 > https://spark.apache.org/releases/spark-release-1-6-0.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
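The accumulator-based probe described above can be sketched roughly as follows; it illustrates the reporter's workaround rather than a supported API, and the RDD, partition count and traversal count are made up for the example:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("eviction-probe").getOrCreate()
val sc = spark.sparkContext

// Counts every time any partition of the cached RDD is (re)computed.
val computations = sc.longAccumulator("partition computations")

val numPartitions = 100
val data = sc.parallelize(1 to 1000000, numPartitions)
  .mapPartitionsWithIndex { (_, iter) =>
    computations.add(1) // runs once per partition computation (and per retry)
    iter
  }
  .cache()

// Traverse the cached RDD several times.
(1 to 3).foreach(_ => data.count())

// Ideally every partition is computed exactly once (on the first traversal);
// a much larger value suggests eviction forced recomputation. Task retries and
// speculation can also inflate the count, which is the false-positive problem
// described above.
println(s"computations = ${computations.value}, ideal = $numPartitions")
{code}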
[jira] [Updated] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM
[ https://issues.apache.org/jira/browse/SPARK-36966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-36966: Description: In the past Spark (pre 1.6) jobs would give OOM if an RDD could not fit into memory (when trying to cache with MEMORY_ONLY). These days it seems Spark jobs will evict partitions from the cache and recompute them from scratch. We have some jobs that cache an RDD then traverse it 300 times. In order to know when we need to increase the memory on our cluster, we need to know when it has run out of memory. The new behaviour of Spark makes this difficult ... rather than the job OOMing (like in the past), the job instead just takes forever (our surrounding logic eventually times out the job). Diagnosing why the job failed becomes difficult because it's not immediately obvious from the logs that the job has run out of memory (since no OOM is thrown). One can find "evicted" log lines. As a bit of a hack, we are using accumulators with mapPartitionsWithIndex and updating a count of the number of times each partition is traversed, then we forcibly blow up the job when this count is 2 or more (indicating an eviction). This hack doesn't work very well as it seems to give false positives (RCA not yet understood, it doesn't seem to be speculative execution, nor are tasks failing (`egrep "task .* in stage [0-9]+\.1" | wc -l` gives 0). *Question 1*: Is there a way to disable this new behaviour of Spark and make it behave like it used to (i.e. just blow up with OOM) - I've looked in the Spark configuration and cannot find anything like "disable-eviction". *Question 2*: Is there any method on the SparkSession or SparkContext that we can call to easily detect when eviction is happening? If not, then for our use case this is an effective regression - we need a way to make Spark behave predictably, or at least a way to determine automatically when Spark is running slowly due to lack of memory. FOR CONTEXT > Borrowed storage memory may be evicted when memory pressure arises >From >https://issues.apache.org/jira/secure/attachment/12765646/unified-memory-management-spark-1.pdf, > which is attached to https://issues.apache.org/jira/browse/SPARK-1 which >was implemented in 1.6 >https://spark.apache.org/releases/spark-release-1-6-0.html was: In the past Spark jobs would give OOM if an RDD could not fit into memory (when trying to cache with MEMORY_ONLY). These days it seems Spark jobs will evict partitions from the cache and recompute them from scratch. We have some jobs that cache an RDD then traverse it 300 times. In order to know when we need to increase the memory on our cluster, we need to know when it has run out of memory. The new behaviour of Spark makes this difficult ... rather than the job OOMing (like in the past), the job instead just takes forever (our surrounding logic eventually times out the job). Diagnosing why the job failed becomes difficult because it's not immediately obvious from the logs that the job has run out of memory (since no OOM is thrown). One can find "evicted" log lines. As a bit of a hack, we are using accumulators with mapPartitionsWithIndex and updating a count of the number of times each partition is traversed, then we forcably blow up the job when this count is 2 or more (indicating an evicition). This hack doesn't work very well as it seems to give false positives (RCA not yet understood, it doesn't seem to be speculative execution, nor are tasks failing (`egrep "task .* in stage [0-9]+\.1" | wc -l` gives 0). 
*Question 1*: Is there a way to disable this new behaviour of Spark and make it behave like it used to (i.e. just blow up with OOM) - I've looked in the Spark configuration and cannot find anything like "disable-eviction". *Question 2*: Is there any method on the SparkSession or SparkContext that we can call to easily detect when eviction is happening? If not, then for our use case this is an effective regression - we need a way to make Spark behave predictably, or at least a way to determine automatically when Spark is running slowly due to lack of memory. FOR CONTEXT > Borrowed storage memory may be evicted when memory pressure arises >From >https://issues.apache.org/jira/secure/attachment/12765646/unified-memory-management-spark-1.pdf, > which is attached to https://issues.apache.org/jira/browse/SPARK-1 which >was implemented in 1.6 >https://spark.apache.org/releases/spark-release-1-6-0.html > Spark evicts RDD partitions instead of allowing OOM > --- > > Key: SPARK-36966 > URL: https://issues.apache.org/jira/browse/SPARK-36966 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: sam >Priority: Major > > In the past Spark (
[jira] [Updated] (SPARK-36966) Spark evicts RDD partitions instead of allowing OOM
[ https://issues.apache.org/jira/browse/SPARK-36966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-36966: Description: In the past Spark jobs would give OOM if an RDD could not fit into memory (when trying to cache with MEMORY_ONLY). These days it seems Spark jobs will evict partitions from the cache and recompute them from scratch. We have some jobs that cache an RDD then traverse it 300 times. In order to know when we need to increase the memory on our cluster, we need to know when it has run out of memory. The new behaviour of Spark makes this difficult ... rather than the job OOMing (like in the past), the job instead just takes forever (our surrounding logic eventually times out the job). Diagnosing why the job failed becomes difficult because it's not immediately obvious from the logs that the job has run out of memory (since no OOM is thrown). One can find "evicted" log lines. As a bit of a hack, we are using accumulators with mapPartitionsWithIndex and updating a count of the number of times each partition is traversed, then we forcably blow up the job when this count is 2 or more (indicating an evicition). This hack doesn't work very well as it seems to give false positives (RCA not yet understood, it doesn't seem to be speculative execution, nor are tasks failing (`egrep "task .* in stage [0-9]+\.1" | wc -l` gives 0). *Question 1*: Is there a way to disable this new behaviour of Spark and make it behave like it used to (i.e. just blow up with OOM) - I've looked in the Spark configuration and cannot find anything like "disable-eviction". *Question 2*: Is there any method on the SparkSession or SparkContext that we can call to easily detect when eviction is happening? If not, then for our use case this is an effective regression - we need a way to make Spark behave predictably, or at least a way to determine automatically when Spark is running slowly due to lack of memory. FOR CONTEXT > Borrowed storage memory may be evicted when memory pressure arises >From >https://issues.apache.org/jira/secure/attachment/12765646/unified-memory-management-spark-1.pdf, > which is attached to https://issues.apache.org/jira/browse/SPARK-1 which >was implemented in 1.6 >https://spark.apache.org/releases/spark-release-1-6-0.html was: In the past Spark jobs would give OOM if an RDD could not fit into memory (when trying to cache with MEMORY_ONLY). These days it seems Spark jobs will evict partitions from the cache and recompute them from scratch. We have some jobs that cache an RDD then traverse it 300 times. In order to know when we need to increase the memory on our cluster, we need to know when it has run out of memory. The new behaviour of Spark makes this difficult ... rather than the job OOMing (like in the past), the job instead just takes forever (our surrounding logic eventually times out the job). Diagnosing why the job failed becomes difficult because it's not immediately obvious from the logs that the job has run out of memory (since no OOM is thrown). One can find "evicted" log lines. As a bit of a hack, we are using accumulators with mapPartitionsWithIndex and updating a count of the number of times each partition is traversed, then we forcably blow up the job when this count is 2 or more (indicating an evicition). This hack doesn't work very well as it seems to give false positives (RCA not yet understood, it doesn't seem to be speculative execution, nor are tasks failing (`egrep "task .* in stage [0-9]+\.1" | wc -l` gives 0). 
*Question 1*: Is there a way to disable this new behaviour of Spark and make it behave like it used to (i.e. just blow up with OOM) - I've looked in the Spark configuration and cannot find anything like "disable-eviction". *Question 2*: Is there any method on the SparkSession or SparkContext that we can call to easily detect when eviction is happening? If not, then for our use case this is an effective regression - we need a way to make Spark behave predictably, or at least a way to determine automatically when Spark is running slowly due to lack of memory. > Spark evicts RDD partitions instead of allowing OOM > --- > > Key: SPARK-36966 > URL: https://issues.apache.org/jira/browse/SPARK-36966 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.4 >Reporter: sam >Priority: Major > > In the past Spark jobs would give OOM if an RDD could not fit into memory > (when trying to cache with MEMORY_ONLY). These days it seems Spark jobs will > evict partitions from the cache and recompute them from scratch. > We have some jobs that cache an RDD then traverse it 300 times. In order to > know when we need to increase the memory on our cluster, we need to know when > i
[jira] [Assigned] (SPARK-37025) Upgrade RoaringBitmap to 0.9.22
[ https://issues.apache.org/jira/browse/SPARK-37025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37025: Assignee: Apache Spark (was: Lantao Jin) > Upgrade RoaringBitmap to 0.9.22 > --- > > Key: SPARK-37025 > URL: https://issues.apache.org/jira/browse/SPARK-37025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Igor Dvorzhak >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37025) Upgrade RoaringBitmap to 0.9.22
[ https://issues.apache.org/jira/browse/SPARK-37025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37025: Assignee: Lantao Jin (was: Apache Spark) > Upgrade RoaringBitmap to 0.9.22 > --- > > Key: SPARK-37025 > URL: https://issues.apache.org/jira/browse/SPARK-37025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Igor Dvorzhak >Assignee: Lantao Jin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37025) Upgrade RoaringBitmap to 0.9.22
[ https://issues.apache.org/jira/browse/SPARK-37025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429583#comment-17429583 ] Apache Spark commented on SPARK-37025: -- User 'medb' has created a pull request for this issue: https://github.com/apache/spark/pull/34300 > Upgrade RoaringBitmap to 0.9.22 > --- > > Key: SPARK-37025 > URL: https://issues.apache.org/jira/browse/SPARK-37025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Igor Dvorzhak >Assignee: Lantao Jin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37025) Upgrade RoaringBitmap to 0.9.22
[ https://issues.apache.org/jira/browse/SPARK-37025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Dvorzhak updated SPARK-37025: -- Labels: (was: correctness) > Upgrade RoaringBitmap to 0.9.22 > --- > > Key: SPARK-37025 > URL: https://issues.apache.org/jira/browse/SPARK-37025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Igor Dvorzhak >Assignee: Lantao Jin >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37025) Upgrade RoaringBitmap to 0.9.22
[ https://issues.apache.org/jira/browse/SPARK-37025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Igor Dvorzhak updated SPARK-37025: -- Fix Version/s: (was: 2.4.2) (was: 2.3.4) (was: 3.0.0) Affects Version/s: (was: 2.3.3) (was: 2.4.0) (was: 3.0.0) (was: 2.2.0) (was: 2.1.0) (was: 2.0.0) 3.2.0 Description: (was: HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap-0.5.11 couldn't be ser/deser with unsafe KryoSerializer. We can use below UT to reproduce: {code} test("kryo serialization with RoaringBitmap") { val bitmap = new RoaringBitmap bitmap.add(1787) val safeSer = new KryoSerializer(conf).newInstance() val bitmap2 : RoaringBitmap = safeSer.deserialize(safeSer.serialize(bitmap)) assert(bitmap2.equals(bitmap)) conf.set("spark.kryo.unsafe", "true") val unsafeSer = new KryoSerializer(conf).newInstance() val bitmap3 : RoaringBitmap = unsafeSer.deserialize(unsafeSer.serialize(bitmap)) assert(bitmap3.equals(bitmap)) // this will fail } {code} Upgrade to latest version 0.7.45 to fix it) > Upgrade RoaringBitmap to 0.9.22 > --- > > Key: SPARK-37025 > URL: https://issues.apache.org/jira/browse/SPARK-37025 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 >Reporter: Igor Dvorzhak >Assignee: Lantao Jin >Priority: Major > Labels: correctness > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37025) Upgrade RoaringBitmap to 0.9.22
Igor Dvorzhak created SPARK-37025: - Summary: Upgrade RoaringBitmap to 0.9.22 Key: SPARK-37025 URL: https://issues.apache.org/jira/browse/SPARK-37025 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.3, 2.4.0, 3.0.0 Reporter: Igor Dvorzhak Assignee: Lantao Jin Fix For: 2.3.4, 2.4.2, 3.0.0 HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks. But RoaringBitmap-0.5.11 couldn't be ser/deser with unsafe KryoSerializer. We can use below UT to reproduce:
{code}
test("kryo serialization with RoaringBitmap") {
  val bitmap = new RoaringBitmap
  bitmap.add(1787)

  val safeSer = new KryoSerializer(conf).newInstance()
  val bitmap2: RoaringBitmap = safeSer.deserialize(safeSer.serialize(bitmap))
  assert(bitmap2.equals(bitmap))

  conf.set("spark.kryo.unsafe", "true")
  val unsafeSer = new KryoSerializer(conf).newInstance()
  val bitmap3: RoaringBitmap = unsafeSer.deserialize(unsafeSer.serialize(bitmap))
  assert(bitmap3.equals(bitmap)) // this will fail
}
{code}
Upgrade to latest version 0.7.45 to fix it -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
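For context, a minimal sketch of the application-level settings that exercise the same unsafe Kryo path as the quoted test; both keys are real Spark configuration options, the app name is illustrative:

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

// Enable Kryo with the unsafe IO path, the combination that exposed the
// RoaringBitmap ser/deser mismatch with the old library version.
val conf = new SparkConf()
  .setAppName("kryo-unsafe-example")
  .set("spark.serializer", classOf[KryoSerializer].getName)
  .set("spark.kryo.unsafe", "true")
{code}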
[jira] [Commented] (SPARK-27312) PropertyGraph <-> GraphX conversions
[ https://issues.apache.org/jira/browse/SPARK-27312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429577#comment-17429577 ] Munish Sharma commented on SPARK-27312: --- [~mengxr] can I start to work on this ? If so, please assign it to me. > PropertyGraph <-> GraphX conversions > > > Key: SPARK-27312 > URL: https://issues.apache.org/jira/browse/SPARK-27312 > Project: Spark > Issue Type: Story > Components: Graph, GraphX >Affects Versions: 3.0.0 >Reporter: Xiangrui Meng >Assignee: Weichen Xu >Priority: Major > > As a user, I can convert a GraphX graph into a PropertyGraph and a > PropertyGraph into a GraphX graph if they are compatible. > * Scala only > * Whether this is an internal API is pending design discussion. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36992) Improve byte array sort perf by unify getPrefix function of UTF8String and ByteArray
[ https://issues.apache.org/jira/browse/SPARK-36992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-36992. -- Fix Version/s: 3.3.0 Assignee: XiDuo You Resolution: Fixed Resolved by https://github.com/apache/spark/pull/34267 > Improve byte array sort perf by unify getPrefix function of UTF8String and > ByteArray > > > Key: SPARK-36992 > URL: https://issues.apache.org/jira/browse/SPARK-36992 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Major > Fix For: 3.3.0 > > > When executing the sort operator, we first compare prefixes. However, the > getPrefix function for byte arrays is slow. We use the first 8 bytes as the > prefix, so at worst we call `Platform.getByte` 8 times, which is slower than a > single call to `Platform.getInt` or `Platform.getLong`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
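A rough sketch of the idea follows; it is illustrative only, since the actual change lives in Spark's ByteArray/UTF8String prefix utilities and uses Platform memory access rather than java.nio:

{code:scala}
import java.nio.{ByteBuffer, ByteOrder}

// Build an 8-byte, big-endian sort prefix for a byte array. Reading one long
// in a single call is cheaper than assembling the prefix from eight
// byte-at-a-time reads, which is the gist of the optimization.
def getPrefix(bytes: Array[Byte]): Long = {
  if (bytes.length >= 8) {
    ByteBuffer.wrap(bytes, 0, 8).order(ByteOrder.BIG_ENDIAN).getLong
  } else {
    // Left-align shorter arrays so they share a prefix with longer arrays
    // that start with the same bytes (a real comparator would then compare
    // the two prefixes as unsigned values).
    var prefix = 0L
    var i = 0
    while (i < bytes.length) {
      prefix = (prefix << 8) | (bytes(i) & 0xffL)
      i += 1
    }
    prefix << (8 * (8 - bytes.length))
  }
}

// Example: identical leading 8 bytes give identical prefixes.
println(getPrefix("spark-sql-bytes".getBytes("UTF-8")).toHexString)
{code}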
[jira] [Updated] (SPARK-36992) Improve byte array sort perf by unify getPrefix function of UTF8String and ByteArray
[ https://issues.apache.org/jira/browse/SPARK-36992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-36992: - Priority: Minor (was: Major) > Improve byte array sort perf by unify getPrefix function of UTF8String and > ByteArray > > > Key: SPARK-36992 > URL: https://issues.apache.org/jira/browse/SPARK-36992 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Minor > Fix For: 3.3.0 > > > When executing the sort operator, we first compare prefixes. However, the > getPrefix function for byte arrays is slow. We use the first 8 bytes as the > prefix, so at worst we call `Platform.getByte` 8 times, which is slower than a > single call to `Platform.getInt` or `Platform.getLong`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-37008: Assignee: Yang Jie > WholeStageCodegenSparkSubmitSuite Failed with Java 17 > -- > > Key: SPARK-37008 > URL: https://issues.apache.org/jira/browse/SPARK-37008 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > > WholeStageCodegenSparkSubmitSuite test failed when use Java 17 > {code:java} > 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" > org.scalatest.exceptions.TestFailedException: 16 was not greater than 16 > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2021-10-14 04:32:38.038 - stderr> at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-37008: - Priority: Minor (was: Major) > WholeStageCodegenSparkSubmitSuite Failed with Java 17 > -- > > Key: SPARK-37008 > URL: https://issues.apache.org/jira/browse/SPARK-37008 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.3.0 > > > WholeStageCodegenSparkSubmitSuite test failed when use Java 17 > {code:java} > 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" > org.scalatest.exceptions.TestFailedException: 16 was not greater than 16 > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2021-10-14 04:32:38.038 - stderr> at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37008) WholeStageCodegenSparkSubmitSuite Failed with Java 17
[ https://issues.apache.org/jira/browse/SPARK-37008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-37008. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34286 [https://github.com/apache/spark/pull/34286] > WholeStageCodegenSparkSubmitSuite Failed with Java 17 > -- > > Key: SPARK-37008 > URL: https://issues.apache.org/jira/browse/SPARK-37008 > Project: Spark > Issue Type: Sub-task > Components: SQL, Tests >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.3.0 > > > WholeStageCodegenSparkSubmitSuite test failed when use Java 17 > {code:java} > 2021-10-14 04:32:38.038 - stderr> Exception in thread "main" > org.scalatest.exceptions.TestFailedException: 16 was not greater than 16 > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:472) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:471) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1231) > 2021-10-14 04:32:38.038 - stderr> at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:1295) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite$.main(WholeStageCodegenSparkSubmitSuite.scala:82) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.sql.execution.WholeStageCodegenSparkSubmitSuite.main(WholeStageCodegenSparkSubmitSuite.scala) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > 2021-10-14 04:32:38.038 - stderr> at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > 2021-10-14 04:32:38.038 - stderr> at > java.base/java.lang.reflect.Method.invoke(Method.java:568) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052) > 2021-10-14 04:32:38.038 - stderr> at > org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17
[ https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-36900: Assignee: (was: Yang Jie) > "SPARK-36464: size returns correct positive number even with over 2GB data" > will oom with JDK17 > > > Key: SPARK-36900 > URL: https://issues.apache.org/jira/browse/SPARK-36900 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > Fix For: 3.3.0 > > > Execute > > {code:java} > build/mvn clean install -pl core -am -Dtest=none > -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite > {code} > with JDK 17, > {code:java} > ChunkedByteBufferOutputStreamSuite: > - empty output > - write a single byte > - write a single near boundary > - write a single at boundary > - single chunk output > - single chunk output at boundary size > - multiple chunk output > - multiple chunk output at boundary size > *** RUN ABORTED *** > java.lang.OutOfMemoryError: Java heap space > at java.base/java.lang.Integer.valueOf(Integer.java:1081) > at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75) > at java.base/java.io.OutputStream.write(OutputStream.java:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown > Source) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
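The test named in the title has to push more than Integer.MAX_VALUE bytes through a chunked output stream. Below is a standalone sketch of that scale, hedged: it deliberately avoids Spark internals (ChunkedByteBufferOutputStream is not public API) and only demonstrates the "over 2GB, size still a correct positive number" contract with a plain counting stream.
{code:scala}
// Illustrative sketch, not the actual ChunkedByteBufferOutputStreamSuite test: it streams
// just over Integer.MAX_VALUE bytes and checks the total is still a correct positive Long,
// which is the behaviour the quoted test name describes.
import java.io.OutputStream

class CountingOutputStream extends OutputStream {
  var total: Long = 0L
  override def write(b: Int): Unit = total += 1
  override def write(b: Array[Byte], off: Int, len: Int): Unit = total += len
}

object Over2GBWriteSketch {
  def main(args: Array[String]): Unit = {
    val out = new CountingOutputStream
    val chunk = new Array[Byte](1 << 20)            // 1 MiB per write
    val target = Int.MaxValue.toLong + chunk.length // just past 2 GiB
    var written = 0L
    while (written < target) {
      out.write(chunk, 0, chunk.length)
      written += chunk.length
    }
    // "size returns correct positive number even with over 2GB data"
    assert(out.total > Int.MaxValue.toLong)
  }
}
{code}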
[jira] [Resolved] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17
[ https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-36900. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34284 [https://github.com/apache/spark/pull/34284] > "SPARK-36464: size returns correct positive number even with over 2GB data" > will oom with JDK17 > > > Key: SPARK-36900 > URL: https://issues.apache.org/jira/browse/SPARK-36900 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.3.0 > > > Execute > > {code:java} > build/mvn clean install -pl core -am -Dtest=none > -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite > {code} > with JDK 17, > {code:java} > ChunkedByteBufferOutputStreamSuite: > - empty output > - write a single byte > - write a single near boundary > - write a single at boundary > - single chunk output > - single chunk output at boundary size > - multiple chunk output > - multiple chunk output at boundary size > *** RUN ABORTED *** > java.lang.OutOfMemoryError: Java heap space > at java.base/java.lang.Integer.valueOf(Integer.java:1081) > at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75) > at java.base/java.io.OutputStream.write(OutputStream.java:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown > Source) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17
[ https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-36900: Assignee: Yang Jie > "SPARK-36464: size returns correct positive number even with over 2GB data" > will oom with JDK17 > > > Key: SPARK-36900 > URL: https://issues.apache.org/jira/browse/SPARK-36900 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > > Execute > > {code:java} > build/mvn clean install -pl core -am -Dtest=none > -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite > {code} > with JDK 17, > {code:java} > ChunkedByteBufferOutputStreamSuite: > - empty output > - write a single byte > - write a single near boundary > - write a single at boundary > - single chunk output > - single chunk output at boundary size > - multiple chunk output > - multiple chunk output at boundary size > *** RUN ABORTED *** > java.lang.OutOfMemoryError: Java heap space > at java.base/java.lang.Integer.valueOf(Integer.java:1081) > at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75) > at java.base/java.io.OutputStream.write(OutputStream.java:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127) > at > org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown > Source) > at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36915) Pin actions to a full length commit SHA
[ https://issues.apache.org/jira/browse/SPARK-36915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-36915: - Issue Type: Improvement (was: Bug) Priority: Minor (was: Major) > Pin actions to a full length commit SHA > --- > > Key: SPARK-36915 > URL: https://issues.apache.org/jira/browse/SPARK-36915 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.1.2 >Reporter: Naveen S >Assignee: Naveen S >Priority: Minor > Fix For: 3.3.0 > > > Pinning an action to a full-length commit SHA is currently the only way to > use an action as > an immutable release. Pinning to a particular SHA helps mitigate the risk of > a bad actor adding a > backdoor to the action's repository, as they would need to generate a SHA-1 > collision for > a valid Git object payload. > [https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions] > [https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36915) Pin actions to a full length commit SHA
[ https://issues.apache.org/jira/browse/SPARK-36915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-36915. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34163 [https://github.com/apache/spark/pull/34163] > Pin actions to a full length commit SHA > --- > > Key: SPARK-36915 > URL: https://issues.apache.org/jira/browse/SPARK-36915 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.1.2 >Reporter: Naveen S >Assignee: Naveen S >Priority: Major > Fix For: 3.3.0 > > > Pinning an action to a full-length commit SHA is currently the only way to > use an action as > an immutable release. Pinning to a particular SHA helps mitigate the risk of > a bad actor adding a > backdoor to the action's repository, as they would need to generate a SHA-1 > collision for > a valid Git object payload. > [https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions] > [https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36915) Pin actions to a full length commit SHA
[ https://issues.apache.org/jira/browse/SPARK-36915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-36915: Assignee: Naveen S > Pin actions to a full length commit SHA > --- > > Key: SPARK-36915 > URL: https://issues.apache.org/jira/browse/SPARK-36915 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 3.1.2 >Reporter: Naveen S >Assignee: Naveen S >Priority: Major > > Pinning an action to a full-length commit SHA is currently the only way to > use an action as > an immutable release. Pinning to a particular SHA helps mitigate the risk of > a bad actor adding a > backdoor to the action's repository, as they would need to generate a SHA-1 > collision for > a valid Git object payload. > [https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-third-party-actions] > [https://github.com/ossf/scorecard/blob/main/docs/checks.md#pinned-dependencies] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37024) Even more decimal overflow issues in average
Robert Joseph Evans created SPARK-37024: --- Summary: Even more decimal overflow issues in average Key: SPARK-37024 URL: https://issues.apache.org/jira/browse/SPARK-37024 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0 Reporter: Robert Joseph Evans As a part of trying to accelerate the {{Decimal}} average aggregation on a [GPU|https://nvidia.github.io/spark-rapids/] I noticed a few issues around overflow. I think all of these can be fixed by replacing {{Average}} with explicit {{Sum}}, {{Count}}, and {{Divide}} operations for decimal instead of implicitly doing them. But the extra checks would come with a performance cost. This is related to SPARK-35955, but goes quite a bit beyond it. # There are no ANSI overflow checks on the summation portion of average. # Nulls are inserted/overflow is detected on summation differently depending on code generation and parallelism. # If the input decimal precision is 11 or below all overflow checks are disabled, and the answer is wrong instead of null on overflow. *Details:* *there are no ANSI overflow checks on the summation portion.* {code:scala} scala> spark.conf.set("spark.sql.ansi.enabled", "true") scala> spark.time(spark.range(201) .repartition(2) .selectExpr("id", "CAST('' AS DECIMAL(32, 0)) as v") .selectExpr("AVG(v)") .show(truncate = false)) +--+ |avg(v)| +--+ |null | +--+ Time taken: 622 ms scala> spark.time(spark.range(201) .repartition(2) .selectExpr("id", "CAST('' AS DECIMAL(32, 0)) as v") .selectExpr("SUM(v)") .show(truncate = false)) 21/10/16 06:08:00 ERROR Executor: Exception in task 0.0 in stage 15.0 (TID 19) java.lang.ArithmeticException: Overflow in sum of decimals. ... {code} *nulls are inserted on summation overflow differently depending on code generation and parallelism.* Because there are no explicit overflow checks when doing the sum a user can get very inconsistent results for when a null is inserted on overflow. The checks really only take place when the {{Decimal}} value is converted and stored into an {{UnsafeRow}}. This happens when the values are shuffled, or after each operation if code gen is disabled. For a {{DECIMAL(32, 0)}} you can add 1,000,000 max values before the summation overflows. {code:scala} scala> spark.time(spark.range(100) .coalesce(2) .selectExpr("id", "CAST('' AS DECIMAL(32, 0)) as v") .selectExpr("SUM(v) as s", "COUNT(v) as c", "AVG(v) as a") .selectExpr("s", "c", "s/c as sum_div_count", "a") .show(truncate = false)) +--+---+---+-+ |s |c |sum_div_count |a| +--+---+---+-+ |00|100|.00|.| +--+---+---+-+ Time taken: 241 ms scala> spark.time(spark.range(200) .coalesce(2) .selectExpr("id", "CAST('' AS DECIMAL(32, 0)) as v") .selectExpr("SUM(v) as s", "COUNT(v) as c", "AVG(v) as a") .selectExpr("s", "c", "s/c as sum_div_count", "a") .show(truncate = false)) ++---+-+-+ |s |c |sum_div_count|a| ++---+-+-+ |null|200|null |.| ++---+-+-+ Time taken: 228 ms scala> spark.time(spark.range(300) .coalesce(2) .selectExpr("id", "CAST('' AS DECIMAL(32, 0)) as v") .selectExpr("SUM(v) as s", "COUNT(v) as c", "AVG(v) as a") .selectExpr("s", "c", "s/c as sum_div_count", "a") .show(truncate = false)) ++---+-++ |s |c |sum_div_count|a | ++---+-++ |null|300|null |null| ++---+-++ Time taken: 347 ms scala> spark.conf.set("spark.sql.codegen.wholeStage", "false") scala> spark.time(spark.range(101) .coalesce(2) .selectExpr("id", "CAST('' AS DECIMAL(32, 0)) as v") .selectExpr("SUM(v) as s", "COUNT(v) as c", "AVG(v) as a") .selectExpr("s", "c", "s/c as sum_div_count", "a") .show(truncate = false)) ++---+-++ |s |c
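A condensed, hedged reproduction of point 1 above (no ANSI overflow check on the summation inside AVG). The decimal literal and row count below are chosen for illustration and assume a local SparkSession named spark; the observable contrast is the one the report describes: SUM fails loudly under ANSI mode while AVG silently returns null.
{code:scala}
// Condensed sketch of point 1; the literal and row count are illustrative, not copied
// from the report. Assumes a local SparkSession named `spark`.
spark.conf.set("spark.sql.ansi.enabled", "true")

val maxDec32 = "9" * 32 // largest DECIMAL(32, 0) value, to reach overflow quickly
val rows = 2000000L     // comfortably past the ~1,000,000-value threshold noted above

val df = spark.range(rows)
  .repartition(2)
  .selectExpr("id", s"CAST('$maxDec32' AS DECIMAL(32, 0)) AS v")

// Under ANSI mode, SUM fails with java.lang.ArithmeticException: Overflow in sum of decimals...
// df.selectExpr("SUM(v)").show(truncate = false)

// ...while AVG quietly returns null, because its internal summation skips the ANSI check.
df.selectExpr("AVG(v)").show(truncate = false)
{code}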
[jira] [Updated] (SPARK-37021) JDBC option "sessionInitStatement" is ignored in resolveTable
[ https://issues.apache.org/jira/browse/SPARK-37021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valery Meleshkin updated SPARK-37021: - Summary: JDBC option "sessionInitStatement" is ignored in resolveTable (was: JDBC option "sessionInitStatement" does not execute set sql statement when resolving a table) > JDBC option "sessionInitStatement" is ignored in resolveTable > - > > Key: SPARK-37021 > URL: https://issues.apache.org/jira/browse/SPARK-37021 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.2 >Reporter: Valery Meleshkin >Priority: Major > > If {{sessionInitStatement}} is required to grant permissions or resolve an > ambiguity, schema resolution will fail when reading a JDBC table. > Consider the following example running against Oracle database: > {code:scala} > reader.format("jdbc").options( > Map( > "url" -> jdbcUrl, > "dbtable" -> "FOO", > "user" -> "BOB", > "sessionInitStatement" -> """ALTER SESSION SET CURRENT_SCHEMA = "BAR, > "password" -> password > )).load > {code} > Table {{FOO}} is in schema {{BAR}}, but default value for {{CURRENT_SCHEMA}} > for the JDBC connection will be {{BOB}}. Therefore, the code above will fail > with an error ({{ORA-00942: table or view does not exist}} if it's Oracle). > It happens because [resolveTable > |https://github.com/apache/spark/blob/9d061e3939a021c602c070fc13cef951a8f94c82/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L67] > that is called during planning phase ignores {{sessionInitStatement}}. > This might sound like an artificial example, in a simple case like the one > above it's easy enough to specify {{"BOB.FOO"}} as {{dbtable}}. But when > {{sessionInitStatement}} contains a more complicated setup it might not be as > straightforward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37021) JDBC option "sessionInitStatement" does not execute set sql statement when resolving a table
[ https://issues.apache.org/jira/browse/SPARK-37021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Valery Meleshkin updated SPARK-37021: - Description: If {{sessionInitStatement}} is required to grant permissions or resolve an ambiguity, schema resolution will fail when reading a JDBC table. Consider the following example running against Oracle database: {code:scala} reader.format("jdbc").options( Map( "url" -> jdbcUrl, "dbtable" -> "FOO", "user" -> "BOB", "sessionInitStatement" -> """ALTER SESSION SET CURRENT_SCHEMA = "BAR, "password" -> password )).load {code} Table {{FOO}} is in schema {{BAR}}, but default value for {{CURRENT_SCHEMA}} for the JDBC connection will be {{BOB}}. Therefore, the code above will fail with an error ({{ORA-00942: table or view does not exist}} if it's Oracle). It happens because [resolveTable |https://github.com/apache/spark/blob/9d061e3939a021c602c070fc13cef951a8f94c82/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L67] that is called during planning phase ignores {{sessionInitStatement}}. This might sound like an artificial example, in a simple case like the one above it's easy enough to specify {{"BOB.FOO"}} as {{dbtable}}. But when {{sessionInitStatement}} contains a more complicated setup it might not be as straightforward. was: If {{sessionInitStatement}} is required to grant permissions or resolve an ambiguity, schema resolution will fail when reading a JDBC table. Consider the following example running against Oracle database: {code:scala} reader.format("jdbc").options( Map( "url" -> jdbcUrl, "dbtable" -> "SELECT * FROM FOO", "user" -> "BOB", "sessionInitStatement" -> """ALTER SESSION SET CURRENT_SCHEMA = "BAR, "password" -> password )).load {code} Table {{FOO}} is in schema {{BAR}}, but default value for {{CURRENT_SCHEMA}} for the JDBC connection will be {{BOB}}. Therefore, the code above will fail with an error ({{ORA-00942: table or view does not exist}} if it's Oracle). It happens because [resolveTable |https://github.com/apache/spark/blob/9d061e3939a021c602c070fc13cef951a8f94c82/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L67] that is called during planning phase ignores {{sessionInitStatement}}. > JDBC option "sessionInitStatement" does not execute set sql statement when > resolving a table > > > Key: SPARK-37021 > URL: https://issues.apache.org/jira/browse/SPARK-37021 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.2 >Reporter: Valery Meleshkin >Priority: Major > > If {{sessionInitStatement}} is required to grant permissions or resolve an > ambiguity, schema resolution will fail when reading a JDBC table. > Consider the following example running against Oracle database: > {code:scala} > reader.format("jdbc").options( > Map( > "url" -> jdbcUrl, > "dbtable" -> "FOO", > "user" -> "BOB", > "sessionInitStatement" -> """ALTER SESSION SET CURRENT_SCHEMA = "BAR, > "password" -> password > )).load > {code} > Table {{FOO}} is in schema {{BAR}}, but default value for {{CURRENT_SCHEMA}} > for the JDBC connection will be {{BOB}}. Therefore, the code above will fail > with an error ({{ORA-00942: table or view does not exist}} if it's Oracle). > It happens because [resolveTable > |https://github.com/apache/spark/blob/9d061e3939a021c602c070fc13cef951a8f94c82/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRDD.scala#L67] > that is called during planning phase ignores {{sessionInitStatement}}. 
> This might sound like an artificial example, in a simple case like the one > above it's easy enough to specify {{"BOB.FOO"}} as {{dbtable}}. But when > {{sessionInitStatement}} contains a more complicated setup it might not be as > straightforward. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
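For the simple case discussed above, the workaround is to make the relation resolvable without relying on sessionInitStatement, either by schema-qualifying dbtable or by pushing the qualification into a query. A hedged sketch follows; the connection details are placeholders and the schema/table names follow the report's example (table FOO living in schema BAR, connecting as user BOB).
{code:scala}
// Workaround sketch only; jdbcUrl and password are placeholders, and BAR.FOO follows the
// example in the report. Assumes a local SparkSession named `spark`.
val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCL" // placeholder
val password = sys.env.getOrElse("DB_PASSWORD", "")  // placeholder

// Option 1: schema-qualify the relation so resolveTable finds it without any session setup.
val qualified = spark.read
  .format("jdbc")
  .options(Map(
    "url"      -> jdbcUrl,
    "dbtable"  -> "BAR.FOO",
    "user"     -> "BOB",
    "password" -> password))
  .load()

// Option 2: let the database resolve the qualification inside a query instead of a table name.
val viaQuery = spark.read
  .format("jdbc")
  .options(Map(
    "url"      -> jdbcUrl,
    "query"    -> "SELECT * FROM BAR.FOO",
    "user"     -> "BOB",
    "password" -> password))
  .load()
{code}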
[jira] [Updated] (SPARK-37022) Use black as a formatter for the whole PySpark codebase.
[ https://issues.apache.org/jira/browse/SPARK-37022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz updated SPARK-37022: --- Description: [{{black}}|https://github.com/psf/black] is a popular Python code formatter. It is used by a number of projects, both small and large, including prominent ones, like pandas, scikit-learn, Django or SQLAlchemy. Black is already used to format a {{pyspark.pandas}} and (though not enforced) stubs files. We should consider using black to enforce formatting of all PySpark files. There are multiple reasons to do that: - Consistency: black is already used across existing codebase and black formatted chunks of code are already added to modules other than pyspark.pandas as a result of type hints inlining (SPARK-36845). - Lower cost of contributing and reviewing: Formatting can be automatically enforced and applied. - Simplify reviews: In general, black formatted code, produces small and highly readable diffs. - Reduce effort required to maintain patched forks: smaller diffs + predictable formatting. Risks: - Initial reformatting requires quite significant changes. - Applying black will break blame in GitHub UI (for git in general see [Avoiding ruining git blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]). Additional steps: - To simplify backporting, black will have to be applied to all active branches. was: [{{black}}|https://github.com/psf/black] is a popular Python code formatter. It is used by a number of projects, both small and large, including prominent ones, like pandas, scikit-learn, Django or SQLAlchemy. Black is already used to format a {{pyspark.pandas}} and (though not enforced) stubs files. We should consider using black to enforce formatting of all PySpark files. There are multiple reasons to do that: - Consistency: black is already used across existing codebase and black formatted chunks of code are already added to modules other than pyspark.pandas as a result of type hints inlining (SPARK-36845). - Lower cost of contributing and reviewing: Formatting can be automatically enforced and applied. - Simplify reviews: In general, black formatted code, produces small and highly readable diffs. Risks: - Initial reformatting requires quite significant changes. - Applying black will break blame in GitHub UI (for git in general see [Avoiding ruining git blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]). Additional steps: - To simplify backporting, black will have to be applied to all active branches. > Use black as a formatter for the whole PySpark codebase. > > > Key: SPARK-37022 > URL: https://issues.apache.org/jira/browse/SPARK-37022 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > Attachments: black-diff-stats.txt, pyproject.toml > > > [{{black}}|https://github.com/psf/black] is a popular Python code formatter. > It is used by a number of projects, both small and large, including prominent > ones, like pandas, scikit-learn, Django or SQLAlchemy. Black is already used > to format a {{pyspark.pandas}} and (though not enforced) stubs files. > We should consider using black to enforce formatting of all PySpark files. 
> There are multiple reasons to do that: > - Consistency: black is already used across existing codebase and black > formatted chunks of code are already added to modules other than > pyspark.pandas as a result of type hints inlining (SPARK-36845). > - Lower cost of contributing and reviewing: Formatting can be automatically > enforced and applied. > - Simplify reviews: In general, black formatted code, produces small and > highly readable diffs. > - Reduce effort required to maintain patched forks: smaller diffs + > predictable formatting. > Risks: > - Initial reformatting requires quite significant changes. > - Applying black will break blame in GitHub UI (for git in general see > [Avoiding ruining git > blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]). > Additional steps: > - To simplify backporting, black will have to be applied to all active > branches. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37022) Use black as a formatter for the whole PySpark codebase.
[ https://issues.apache.org/jira/browse/SPARK-37022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17429532#comment-17429532 ] Maciej Szymkiewicz commented on SPARK-37022: Looking at the diffs, the majority of changes will have minimal impact (magic trailing commas, switching to double quotes). h3. h3. > Use black as a formatter for the whole PySpark codebase. > > > Key: SPARK-37022 > URL: https://issues.apache.org/jira/browse/SPARK-37022 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > Attachments: black-diff-stats.txt, pyproject.toml > > > [{{black}}|https://github.com/psf/black] is a popular Python code formatter. > It is used by a number of projects, both small and large, including prominent > ones, like pandas, scikit-learn, Django or SQLAlchemy. Black is already used > to format a {{pyspark.pandas}} and (though not enforced) stubs files. > We should consider using black to enforce formatting of all PySpark files. > There are multiple reasons to do that: > - Consistency: black is already used across existing codebase and black > formatted chunks of code are already added to modules other than > pyspark.pandas as a result of type hints inlining (SPARK-36845). > - Lower cost of contributing and reviewing: Formatting can be automatically > enforced and applied. > - Simplify reviews: In general, black formatted code, produces small and > highly readable diffs. > Risks: > - Initial reformatting requires quite significant changes. > - Applying black will break blame in GitHub UI (for git in general see > [Avoiding ruining git > blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]). > Additional steps: > - To simplify backporting, black will have to be applied to all active > branches. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36232) Support creating a ps.Series/Index with `Decimal('NaN')` with Arrow disabled
[ https://issues.apache.org/jira/browse/SPARK-36232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36232. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34299 [https://github.com/apache/spark/pull/34299] > Support creating a ps.Series/Index with `Decimal('NaN')` with Arrow disabled > > > Key: SPARK-36232 > URL: https://issues.apache.org/jira/browse/SPARK-36232 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Priority: Major > Fix For: 3.3.0 > > > > {code:java} > >>> import decimal as d > >>> import pyspark.pandas as ps > >>> import numpy as np > >>> ps.utils.default_session().conf.set('spark.sql.execution.arrow.pyspark.enabled', > >>> True) > >>> ps.Series([d.Decimal(1.0), d.Decimal(2.0), d.Decimal(np.nan)]) > 0 1 > 1 2 > 2None > dtype: object > >>> ps.utils.default_session().conf.set('spark.sql.execution.arrow.pyspark.enabled', > >>> False) > >>> ps.Series([d.Decimal(1.0), d.Decimal(2.0), d.Decimal(np.nan)]) > 21/07/02 15:01:07 ERROR Executor: Exception in task 6.0 in stage 13.0 (TID 51) > net.razorvine.pickle.PickleException: problem construction object: > java.lang.reflect.InvocationTargetException > ... > {code} > As the code is shown above, we cannot create a Series with `Decimal('NaN')` > when Arrow disabled. We ought to fix that. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36232) Support creating a ps.Series/Index with `Decimal('NaN')` with Arrow disabled
[ https://issues.apache.org/jira/browse/SPARK-36232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36232: Assignee: Yikun Jiang > Support creating a ps.Series/Index with `Decimal('NaN')` with Arrow disabled > > > Key: SPARK-36232 > URL: https://issues.apache.org/jira/browse/SPARK-36232 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0 > > > > {code:java} > >>> import decimal as d > >>> import pyspark.pandas as ps > >>> import numpy as np > >>> ps.utils.default_session().conf.set('spark.sql.execution.arrow.pyspark.enabled', > >>> True) > >>> ps.Series([d.Decimal(1.0), d.Decimal(2.0), d.Decimal(np.nan)]) > 0 1 > 1 2 > 2None > dtype: object > >>> ps.utils.default_session().conf.set('spark.sql.execution.arrow.pyspark.enabled', > >>> False) > >>> ps.Series([d.Decimal(1.0), d.Decimal(2.0), d.Decimal(np.nan)]) > 21/07/02 15:01:07 ERROR Executor: Exception in task 6.0 in stage 13.0 (TID 51) > net.razorvine.pickle.PickleException: problem construction object: > java.lang.reflect.InvocationTargetException > ... > {code} > As the code is shown above, we cannot create a Series with `Decimal('NaN')` > when Arrow disabled. We ought to fix that. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36230) hasnans for Series of Decimal(`NaN`)
[ https://issues.apache.org/jira/browse/SPARK-36230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-36230. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34299 [https://github.com/apache/spark/pull/34299] > hasnans for Series of Decimal(`NaN`) > > > Key: SPARK-36230 > URL: https://issues.apache.org/jira/browse/SPARK-36230 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Yikun Jiang >Priority: Major > Fix For: 3.3.0 > > > {code:java} > >>> import pandas as pd > >>> pser = pd.Series([Decimal('0.1'), Decimal('NaN')]) > >>> pser > 00.1 > 1NaN > dtype: object > >>> psser = ps.from_pandas(pser) > >>> psser > 0 0.1 > 1None > dtype: object > >>> psser.hasnans > False > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-36230) hasnans for Series of Decimal(`NaN`)
[ https://issues.apache.org/jira/browse/SPARK-36230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-36230: Assignee: Yikun Jiang > hasnans for Series of Decimal(`NaN`) > > > Key: SPARK-36230 > URL: https://issues.apache.org/jira/browse/SPARK-36230 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.2.0 >Reporter: Xinrong Meng >Assignee: Yikun Jiang >Priority: Major > > {code:java} > >>> import pandas as pd > >>> pser = pd.Series([Decimal('0.1'), Decimal('NaN')]) > >>> pser > 00.1 > 1NaN > dtype: object > >>> psser = ps.from_pandas(pser) > >>> psser > 0 0.1 > 1None > dtype: object > >>> psser.hasnans > False > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org