[jira] [Updated] (SPARK-18753) Inconsistent behavior after writing to parquet files

2016-12-14 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-18753:
---
Fix Version/s: 2.2.0

> Inconsistent behavior after writing to parquet files
> 
>
> Key: SPARK-18753
> URL: https://issues.apache.org/jira/browse/SPARK-18753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Shixiong Zhu
> Fix For: 2.1.0, 2.2.0
>
>
> Found inconsistent behavior when using Parquet.
> {code}
> scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
> ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]
> scala> ds.filter('value === "true").show
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> In the above example, `ds.filter('value === "true")` returns nothing: "true" is converted to null, so the filter expression is always null and drops all rows.
> However, if I store `ds` in a Parquet file and read it back, `filter('value === "true")` returns the non-null values.
> {code}
> scala> ds.write.parquet("testfile")
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> scala> val ds2 = spark.read.parquet("testfile")
> ds2: org.apache.spark.sql.DataFrame = [value: boolean]
> scala> ds2.filter('value === "true").show
> +-----+
> |value|
> +-----+
> | true|
> |false|
> +-----+
> {code}
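
A quick way to see why the in-memory query returns nothing is to inspect the query plans. The snippet below is a minimal diagnostic sketch, assuming the same spark-shell session and the `ds` / `testfile` names from the quoted example.

{code}
// Print the parsed/analyzed/optimized/physical plans for the in-memory Dataset
// and for the Parquet round trip; comparing the two shows how the string
// literal "true" is coerced inside the comparison in each case.
ds.filter('value === "true").explain(true)
spark.read.parquet("testfile").filter('value === "true").explain(true)
{code}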






[jira] [Updated] (SPARK-18753) Inconsistent behavior after writing to parquet files

2016-12-14 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian updated SPARK-18753:
---
Assignee: Hyukjin Kwon

> Inconsistent behavior after writing to parquet files
> 
>
> Key: SPARK-18753
> URL: https://issues.apache.org/jira/browse/SPARK-18753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Shixiong Zhu
>Assignee: Hyukjin Kwon
> Fix For: 2.1.0, 2.2.0
>
>
> Found inconsistent behavior when using Parquet.
> {code}
> scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
> ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]
> scala> ds.filter('value === "true").show
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> In the above example, `ds.filter('value === "true")` returns nothing: "true" is converted to null, so the filter expression is always null and drops all rows.
> However, if I store `ds` in a Parquet file and read it back, `filter('value === "true")` returns the non-null values.
> {code}
> scala> ds.write.parquet("testfile")
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> scala> val ds2 = spark.read.parquet("testfile")
> ds2: org.apache.spark.sql.DataFrame = [value: boolean]
> scala> ds2.filter('value === "true").show
> +-----+
> |value|
> +-----+
> | true|
> |false|
> +-----+
> {code}
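
A possible workaround, sketched here under the assumption that the intent of the filter is simply "keep the rows whose value is true": comparing the column against a boolean instead of a string avoids the string coercion entirely, so the in-memory Dataset and the Parquet round trip should agree.

{code}
// Compare against a boolean literal; the null row is still dropped,
// because (null === true) evaluates to null.
ds.filter('value === true).show()
spark.read.parquet("testfile").filter('value === true).show()

// Equivalent, since the column is already boolean:
ds.filter('value).show()
{code}

Each of these should print only the single `true` row.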






[jira] [Updated] (SPARK-18753) Inconsistent behavior after writing to parquet files

2016-12-06 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-18753:
-
Description: 
Found inconsistent behavior when using Parquet.

{code}
scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]

scala> ds.filter('value === "true").show
+-----+
|value|
+-----+
+-----+

{code}

In the above example, `ds.filter('value === "true")` returns nothing: "true" is converted to null, so the filter expression is always null and drops all rows.

However, if I store `ds` in a Parquet file and read it back, `filter('value === "true")` returns the non-null values.

{code}
scala> ds.write.parquet("testfile")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

scala> val ds2 = spark.read.parquet("testfile")
ds2: org.apache.spark.sql.DataFrame = [value: boolean]

scala> ds2.filter('value === "true").show
+-----+
|value|
+-----+
| true|
|false|
+-----+

{code}

  was:
Found an inconsistent behavior when using parquet.

{code}
scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]

scala> ds.filter('value === "true").show
+-----+
|value|
+-----+
+-----+

{code}

In the above example, `ds.filter('value === "true")` returns nothing as "true" will be converted to null and the filter expression will be always null.

However, if I store `ds` to a parquet file and read it back, `filter('value === "true")` will return non null values.

{code}
scala> ds.write.parquet("testfile")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

scala> val ds2 = spark.read.parquet("testfile")
ds2: org.apache.spark.sql.DataFrame = [value: boolean]

scala> ds2.filter('value === "true").show
+-----+
|value|
+-----+
| true|
|false|
+-----+

{code}


> Inconsistent behavior after writing to parquet files
> 
>
> Key: SPARK-18753
> URL: https://issues.apache.org/jira/browse/SPARK-18753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Shixiong Zhu
>
> Found inconsistent behavior when using Parquet.
> {code}
> scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
> ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]
> scala> ds.filter('value === "true").show
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> In the above example, `ds.filter('value === "true")` returns nothing: "true" is converted to null, so the filter expression is always null and drops all rows.
> However, if I store `ds` in a Parquet file and read it back, `filter('value === "true")` returns the non-null values.
> {code}
> scala> ds.write.parquet("testfile")
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> scala> val ds2 = spark.read.parquet("testfile")
> ds2: org.apache.spark.sql.DataFrame = [value: boolean]
> scala> ds2.filter('value === "true").show
> +-----+
> |value|
> +-----+
> | true|
> |false|
> +-----+
> {code}
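
One way to narrow the cause down, sketched below under the assumption (not confirmed in this thread) that the post-round-trip difference is related to Parquet filter pushdown, is to toggle the standard `spark.sql.parquet.filterPushdown` flag and rerun the query on the round-tripped data.

{code}
// If pushdown is involved, the Parquet-backed query should change behavior
// when the flag is off; if the result stays the same, the cause lies elsewhere.
spark.conf.set("spark.sql.parquet.filterPushdown", false)
spark.read.parquet("testfile").filter('value === "true").show()

spark.conf.set("spark.sql.parquet.filterPushdown", true)
spark.read.parquet("testfile").filter('value === "true").show()
{code}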






[jira] [Updated] (SPARK-18753) Inconsistent behavior after writing to parquet files

2016-12-06 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-18753:
-
Description: 
Found an inconsistent behavior when using parquet.

{code}
scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]

scala> ds.filter('value === "true").show
+-----+
|value|
+-----+
+-----+

{code}

In the above example, `ds.filter('value === "true")` returns nothing as "true" will be converted to null and the filter expression will be always null.

However, if I store `ds` to a parquet file and read it back, `filter('value === "true")` will return non null values.

{code}
scala> ds.write.parquet("testfile")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

scala> val ds2 = spark.read.parquet("testfile")
ds2: org.apache.spark.sql.DataFrame = [value: boolean]

scala> ds2.filter('value === "true").show
+-----+
|value|
+-----+
| true|
|false|
+-----+

{code}

  was:
Found an inconsistent behavior when using parquet.

{code}
scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]

scala> ds.filter('value === "true").show
+-----+
|value|
+-----+
+-----+

{code}

In the above example, `ds.filter('value === "true")` returns nothing as "true" will be converted to null and the filter expression will always null.

However, if I store `ds` to a parquet file and read it back, `filter('value === "true")` will return non null values.

{code}
scala> ds.write.parquet("testfile")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

scala> val ds2 = spark.read.parquet("testfile")
ds2: org.apache.spark.sql.DataFrame = [value: boolean]

scala> ds2.filter('value === "true").show
+-----+
|value|
+-----+
| true|
|false|
+-----+

{code}


> Inconsistent behavior after writing to parquet files
> 
>
> Key: SPARK-18753
> URL: https://issues.apache.org/jira/browse/SPARK-18753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Shixiong Zhu
>
> Found an inconsistent behavior when using parquet.
> {code}
> scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
> ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]
> scala> ds.filter('value === "true").show
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> In the above example, `ds.filter('value === "true")` returns nothing as "true" will be converted to null and the filter expression will be always null.
> However, if I store `ds` to a parquet file and read it back, `filter('value === "true")` will return non null values.
> {code}
> scala> ds.write.parquet("testfile")
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> scala> val ds2 = spark.read.parquet("testfile")
> ds2: org.apache.spark.sql.DataFrame = [value: boolean]
> scala> ds2.filter('value === "true").show
> +-----+
> |value|
> +-----+
> | true|
> |false|
> +-----+
> {code}
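
To illustrate the null semantics described above (a minimal sketch, reusing the `ds` from the report): a filter whose condition evaluates to null keeps no rows, which is why the in-memory query returns an empty result.

{code}
import org.apache.spark.sql.functions.lit

// A condition that is null for every row: filter drops all rows, just like the
// reported 'value === "true" case before the Parquet round trip.
ds.filter(lit(null).cast("boolean")).show()
// +-----+
// |value|
// +-----+
// +-----+
{code}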






[jira] [Updated] (SPARK-18753) Inconsistent behavior after writing to parquet files

2016-12-06 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-18753:
-
Description: 
Found an inconsistent behavior when using parquet.

{code}
scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]

scala> ds.filter('value === "true").show
+-----+
|value|
+-----+
+-----+

{code}

In the above example, `ds.filter('value === "true")` returns nothing as "true" will be converted to null and the filter expression will always null.

However, if I store `ds` to a parquet file and read it back, `filter('value === "true")` will return non null values.

{code}
scala> ds.write.parquet("testfile")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

scala> val ds2 = spark.read.parquet("testfile")
ds2: org.apache.spark.sql.DataFrame = [value: boolean]

scala> ds2.filter('value === "true").show
+-----+
|value|
+-----+
| true|
|false|
+-----+

{code}

  was:
Found an inconsistent behavior when using parquet.

{code}
scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]

scala> ds.filter('value === "true").show
+-----+
|value|
+-----+
+-----+

{code}

In the avoid example, `ds.filter('value === "true")` returns nothing as "true" will be converted to null and the filter expression will always null.

However, if I store `ds` to a parquet file and read it back, `filter('value === "true")` will return non null values.

{code}
scala> ds.write.parquet("testfile")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

scala> val ds2 = spark.read.parquet("testfile")
ds2: org.apache.spark.sql.DataFrame = [value: boolean]

scala> ds2.filter('value === "true").show
+-----+
|value|
+-----+
| true|
|false|
+-----+

{code}


> Inconsistent behavior after writing to parquet files
> 
>
> Key: SPARK-18753
> URL: https://issues.apache.org/jira/browse/SPARK-18753
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Shixiong Zhu
>
> Found an inconsistent behavior when using parquet.
> {code}
> scala> val ds = Seq[java.lang.Boolean](new java.lang.Boolean(true), null: java.lang.Boolean, new java.lang.Boolean(false)).toDS
> ds: org.apache.spark.sql.Dataset[Boolean] = [value: boolean]
> scala> ds.filter('value === "true").show
> +-----+
> |value|
> +-----+
> +-----+
> {code}
> In the above example, `ds.filter('value === "true")` returns nothing as "true" will be converted to null and the filter expression will always null.
> However, if I store `ds` to a parquet file and read it back, `filter('value === "true")` will return non null values.
> {code}
> scala> ds.write.parquet("testfile")
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> scala> val ds2 = spark.read.parquet("testfile")
> ds2: org.apache.spark.sql.DataFrame = [value: boolean]
> scala> ds2.filter('value === "true").show
> +-----+
> |value|
> +-----+
> | true|
> |false|
> +-----+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org