[ 
https://issues.apache.org/jira/browse/SPARK-21450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hossein Falaki updated SPARK-21450:
-----------------------------------
    Description: 
Consider the following two cases copied from {{test_sparkSQL.R}}:

{code}
df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
schema <- structType(structField("date", "date"))
s1 <- collect(select(df, from_json(df$col, schema)))
s2 <- collect(select(df, from_json(df$col, schema, dateFormat = "dd/MM/yyyy")))
{code}

If you inspect s1 using {{str(s1)}} you will find:
{code}
'data.frame':   2 obs. of  1 variable:
 $ jsontostructs(col):List of 2
  ..$ : logi NA
{code}

But for s2, running {{str(s2)}} results in:
{code}
'data.frame':   2 obs. of  1 variable:
 $ jsontostructs(col):List of 2
  ..$ :List of 1
  .. ..$ date: Date, format: "2014-10-21"
  .. ..- attr(*, "class")= chr "struct"
{code}

I assume this is unintentional and just a subtle bug. Do you think 
otherwise, [~shivaram] and [~felixcheung]?


  was:
Consider the following two cases copied from {{test_sparkSQL.R}}:

{code}
df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
schema <- structType(structField("date", "date"))
s1 <- collect(select(df, from_json(df$col, schema)))
s2 <- collect(select(df, from_json(df$col, schema2, dateFormat = "dd/MM/yyyy")))
{code}

If you inspect s1 using {{str(s1)}} you will find:
{code}
'data.frame':   2 obs. of  1 variable:
 $ jsontostructs(col):List of 2
  ..$ : logi NA
{code}

But for s2, running {{str(s2)}} results in:
{code}
'data.frame':   2 obs. of  1 variable:
 $ jsontostructs(col):List of 2
  ..$ :List of 1
  .. ..$ date: Date, format: "2014-10-21"
  .. ..- attr(*, "class")= chr "struct"
{code}

I assume this is not intentional and is just a subtle bug. Do you think 
otherwise? [~shivaram] and [~felixcheung]



> List of NA is flattened inside a SparkR struct type
> ---------------------------------------------------
>
>                 Key: SPARK-21450
>                 URL: https://issues.apache.org/jira/browse/SPARK-21450
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.2.0
>            Reporter: Hossein Falaki
>
> Consider the following two cases copied from {{test_sparkSQL.R}}:
> {code}
> df <- as.DataFrame(list(list("col" = "{\"date\":\"21/10/2014\"}")))
> schema <- structType(structField("date", "date"))
> s1 <- collect(select(df, from_json(df$col, schema)))
> s2 <- collect(select(df, from_json(df$col, schema, dateFormat = "dd/MM/yyyy")))
> {code}
> If you inspect s1 using {{str(s1)}} you will find:
> {code}
> 'data.frame': 2 obs. of  1 variable:
>  $ jsontostructs(col):List of 2
>   ..$ : logi NA
> {code}
> But for s2, running {{str(s2)}} results in:
> {code}
> 'data.frame': 2 obs. of  1 variable:
>  $ jsontostructs(col):List of 2
>   ..$ :List of 1
>   .. ..$ date: Date, format: "2014-10-21"
>   .. ..- attr(*, "class")= chr "struct"
> {code}
> I assume this is unintentional and just a subtle bug. Do you think 
> otherwise, [~shivaram] and [~felixcheung]?


