[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string

2019-02-11 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764769#comment-16764769
 ] 

Gabor Somogyi commented on SPARK-26845:
---

[~Gengliang.Wang] Thanks for the confirmation! Hope you're refreshed :) I've 
asked things in mail (yeah, mail, because it's neither a bug nor a feature).

> Avro to_avro from_avro roundtrip fails if data type is string
> -
>
> Key: SPARK-26845
> URL: https://issues.apache.org/jira/browse/SPARK-26845
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 3.0.0
> Reporter: Gabor Somogyi
> Priority: Critical
> Labels: correctness
>
> I was playing with AvroFunctionsSuite and created a situation where a test 
> fails, which I believe it shouldn't:
> {code:java}
>   test("roundtrip in to_avro and from_avro - string") {
>     val df = spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str"))
>     val avroDF = df.select(to_avro('str).as("b"))
>     val avroTypeStr = s"""
>       |{
>       |  "type": "string",
>       |  "name": "str"
>       |}
>     """.stripMargin
>     checkAnswer(avroDF.select(from_avro('b, avroTypeStr)), df)
>   }
> {code}
> {code:java}
> == Results ==
> !== Correct Answer - 3 ==   == Spark Answer - 3 ==
> !struct                     struct
> ![1]                        []
> ![2]                        []
> ![3]                        []
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string

2019-02-11 Thread Gengliang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764747#comment-16764747
 ] 

Gengliang Wang commented on SPARK-26845:


[~attilapiros] Thanks for the help!
[~gsomogyi] Sorry for the late reply. I was on vacation. You can see the Avro 
schema with:

{code:java}
SchemaConverters.toAvroType(df.schema).toString(true)
{code}
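
For context, a minimal sketch of how that derived schema could be fed back into from_avro. It assumes a local SparkSession, the spark-avro module on the classpath, and the Spark 2.4 import location (in 3.0 the functions also live in org.apache.spark.sql.avro.functions). The object name DerivedSchemaRoundtrip is hypothetical. Because toAvroType(df.schema) describes the whole row, the result is a record with the same shape as the handwritten topLevelRecord schema posted elsewhere in this thread, so the decoded column is a struct and the field has to be selected out of it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.avro._ // Spark 2.4 location of to_avro, from_avro, SchemaConverters

// Hypothetical helper object, not part of Spark or of AvroFunctionsSuite.
object DerivedSchemaRoundtrip {
  def roundtrip(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    // A String column built from a Seq is nullable, like the one in the issue.
    val df = Seq("1", "2", "3").toDF("str")

    // Derive the Avro schema from the DataFrame schema instead of writing it by hand.
    val avroTypeStr = SchemaConverters.toAvroType(df.schema).toString(true)
    println(avroTypeStr)

    df.select(to_avro($"str").as("b"))
      .select(from_avro($"b", avroTypeStr).as("rec")) // decodes into a struct
      .select($"rec.str")                             // pull the field back out
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("derived-schema-roundtrip")
      .getOrCreate()
    try println(roundtrip(spark)) finally spark.stop()
  }
}
```

Deriving the schema this way keeps the nullability of the column and the Avro type in sync, which is the part the handwritten "string" schema in the issue description got wrong.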





[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string

2019-02-08 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763868#comment-16763868
 ] 

Gabor Somogyi commented on SPARK-26845:
---

I think I've found the reason for the second question: 
https://github.com/apache/spark/pull/23735
Closing this jira...




[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string

2019-02-08 Thread Gabor Somogyi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763848#comment-16763848
 ] 

Gabor Somogyi commented on SPARK-26845:
---

[~attilapiros] Thanks for the help, this explains why the mentioned test was 
failing. I think the original issue is not valid; on the other hand, it's a good 
question why it doesn't work without topLevelRecord. [~Gengliang.Wang]?




[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string

2019-02-08 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763391#comment-16763391
 ] 

Dongjoon Hyun commented on SPARK-26845:
---

cc [~Gengliang.Wang]




[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string

2019-02-07 Thread Attila Zsolt Piros (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763094#comment-16763094
 ] 

Attila Zsolt Piros commented on SPARK-26845:


This also works:
{code}
test("roundtrip in to_avro and from_avro - string") {
  val df = spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str"))

  val avroDF = df.select(to_avro('str).as("b"))
  val avroTypeStr = s"""
    |{
    |  "type": "record",
    |  "name": "topLevelRecord",
    |  "fields": [
    |    {
    |      "name": "str",
    |      "type": ["string", "null"]
    |    }
    |  ]
    |}""".stripMargin
  checkAnswer(
    avroDF.select(from_avro('b, avroTypeStr).as("rec")).select($"rec.str"),
    df)
}
{code}
I have introduced a topLevelRecord because a union type at the top level is not 
allowed / not working (good question why). I mean this:
{code:javascript}
{
  "name": "str",
  "type": ["string", "null"]
}
{code}
This throws an exception:
{noformat}
org.apache.avro.SchemaParseException: No type: {"name":"str","type":["string","null"]}
{noformat}




[jira] [Commented] (SPARK-26845) Avro to_avro from_avro roundtrip fails if data type is string

2019-02-07 Thread Attila Zsolt Piros (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763066#comment-16763066
 ] 

Attila Zsolt Piros commented on SPARK-26845:


The test would work if you replace the line
{code:java}
val df = spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str"))
{code}
with
{code:java}
val df = spark.range(3).select('id.cast("string").as("str"))
{code}

*And the difference is caused by the nullable flag of the _StructField_.*

For the _Seq_ you used the schema is:
{code:java}
scala> spark.createDataset(Seq("1", "2", "3")).select('value.cast("string").as("str")).schema
res0: org.apache.spark.sql.types.StructType = StructType(StructField(str,StringType,true))
{code}
And for the range:
{code:java}
scala> spark.range(3).select('id.cast("string").as("str")).schema
res1: org.apache.spark.sql.types.StructType = StructType(StructField(str,StringType,false))
{code}
So in your case the _avroTypeStr_ does not match the data.
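
One possible reading of the empty `[]` rows in the issue description, sketched by hand rather than with Avro's own classes: the Avro binary encoding writes a union value as the zero-based branch index followed by the value, and a string as its UTF-8 length followed by the bytes. Assuming to_avro writes a nullable string column as the union ["string", "null"] (the type used in the schema that works in this thread), the first byte is the branch index 0, which a reader expecting a plain "string" would interpret as a length of zero. The object and method names below are hypothetical.

```scala
// Hand-rolled sketch of the relevant parts of the Avro binary encoding.
// This is an illustration only, not Avro's or Spark's implementation.
object AvroEncodingSketch {
  // Avro writes int/long values zig-zag encoded, then as a variable-length integer.
  def encodeLong(n: Long): Array[Byte] = {
    var v = (n << 1) ^ (n >> 63)
    val out = scala.collection.mutable.ArrayBuffer.empty[Byte]
    while ((v & ~0x7FL) != 0) {
      out += ((v & 0x7F) | 0x80).toByte
      v = v >>> 7
    }
    out += v.toByte
    out.toArray
  }

  // Returns the decoded value and the number of bytes consumed.
  def decodeLong(bytes: Array[Byte], offset: Int): (Long, Int) = {
    var result = 0L
    var shift = 0
    var i = offset
    var more = true
    while (more) {
      val b = bytes(i) & 0xFF
      result |= (b & 0x7FL) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    ((result >>> 1) ^ -(result & 1), i - offset)
  }

  // A string is its UTF-8 byte length (as a long) followed by the bytes.
  def encodeString(s: String): Array[Byte] = {
    val utf8 = s.getBytes("UTF-8")
    encodeLong(utf8.length) ++ utf8
  }

  def decodeString(bytes: Array[Byte], offset: Int = 0): String = {
    val (len, used) = decodeLong(bytes, offset)
    new String(bytes, offset + used, len.toInt, "UTF-8")
  }

  // A union value is the zero-based branch index followed by the encoded value.
  def encodeUnionString(branchIndex: Int, s: String): Array[Byte] =
    encodeLong(branchIndex) ++ encodeString(s)

  def main(args: Array[String]): Unit = {
    // Branch 0 of ["string", "null"] is "string".
    val written = encodeUnionString(0, "1")
    // Reader that expects a plain string: treats the branch index as the length.
    val asPlainString = decodeString(written)
    // Reader that expects the union: skips the branch index first.
    val asUnion = decodeString(written, decodeLong(written, 0)._2)
    println(s"as plain string: '$asPlainString', as union: '$asUnion'")
  }
}
```

Under these assumptions the mismatch does not raise an error; it silently yields an empty string per row, which would be consistent with the Spark answer shown in the issue.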
