[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30082:
----------------------------------
    Affects Version/s: 2.0.2
                       2.1.3
                       2.2.3
                       2.3.4

> Zeros are being treated as NaNs
> -------------------------------
>
>                 Key: SPARK-30082
>                 URL: https://issues.apache.org/jira/browse/SPARK-30082
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4
>            Reporter: John Ayad
>            Assignee: John Ayad
>            Priority: Major
>              Labels: correctness
>             Fix For: 2.4.5, 3.0.0
>
>
> If you attempt to run
> {code:python}
> df = df.replace(float('nan'), somethingToReplaceWith)
> {code}
> it will replace all {{0}}s in columns of type {{Integer}}.
> Example code snippet to reproduce this:
> {code:python}
> from pyspark.sql import SQLContext
> spark = SQLContext(sc).sparkSession
> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> df.show()
> df = df.replace(float('nan'), 5)
> df.show()
> {code}
> Here's the output I get when I run this code:
> {code}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
>       /_/
>
> Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> spark = SQLContext(sc).sparkSession
> >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> >>> df.show()
> +-----+-----+
> |index|value|
> +-----+-----+
> |    1|    0|
> |    2|    3|
> |    3|    0|
> +-----+-----+
> >>> df = df.replace(float('nan'), 5)
> >>> df.show()
> +-----+-----+
> |index|value|
> +-----+-----+
> |    1|    5|
> |    2|    3|
> |    3|    5|
> +-----+-----+
> >>>
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
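The reported behaviour is consistent with the NaN search key being cast to the target column's type before comparison: under JVM narrowing-cast semantics, `(int) Double.NaN` evaluates to `0`, so a NaN key cast to an Integer column matches every `0` in it. The snippet below is a hypothetical illustration of that mechanism in plain Python, not Spark source code; the function name is invented for the sketch.

```python
import math

def jvm_cast_double_to_int(x):
    """Mimic the JVM's (int) cast of a double: NaN maps to 0,
    finite values truncate toward zero."""
    if math.isnan(x):
        return 0
    return int(x)

# A NaN key, once cast to the Integer column's type, becomes 0 and
# therefore matches the zeros in the repro's "value" column.
print(jvm_cast_double_to_int(float("nan")))  # 0
print(jvm_cast_double_to_int(3.7))           # 3
```

If this is indeed the path taken, the fix would be to skip NaN replacement keys entirely for non-floating-point columns rather than casting them.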
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li updated SPARK-30082:
----------------------------
    Labels: correctness  (was: )
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan updated SPARK-30082:
--------------------------------
    Fix Version/s: 2.4.5
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-30082:
----------------------------------
    Component/s: SQL  (was: PySpark)
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-30082:
---------------------------------
    Priority: Major  (was: Critical)
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-30082:
---------------------------------
    Flags:   (was: Important)
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Ayad updated SPARK-30082:
------------------------------
    Priority: Critical  (was: Major)
[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs
[ https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Ayad updated SPARK-30082:
------------------------------
    Description: (updated the plain {code} fences to {code:java}; description text otherwise unchanged)
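On affected versions, one defensive workaround suggested by the PySpark API is to pass the real `subset` parameter of `DataFrame.replace` so a NaN replacement only ever sees floating-point columns. The helper below is hypothetical and operates on plain `(name, typeName)` pairs, so it can be shown and tested without a running SparkSession; in PySpark those pairs would come from `df.schema.fields`.

```python
def nan_replace_subset(schema_fields):
    """Return only the float/double column names from a list of
    (name, typeName) pairs, so a NaN replacement never touches
    integer columns."""
    return [name for name, type_name in schema_fields
            if type_name in ("float", "double")]

# For the repro's all-integer schema the subset is empty, so the
# replace call can simply be skipped:
fields = [("index", "integer"), ("value", "integer")]
print(nan_replace_subset(fields))  # []
```

With a live DataFrame this would be used roughly as `cols = nan_replace_subset([(f.name, f.dataType.typeName()) for f in df.schema.fields])`, followed by `df.replace(float('nan'), 5, subset=cols)` only when `cols` is non-empty.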