[
https://issues.apache.org/jira/browse/SPARK-16700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400658#comment-15400658
]
Jay Teguh Wijaya Purwanto edited comment on SPARK-16700 at 7/30/16 12:34 PM:
-
Using a `Row` object with a schema that declares multiple struct fields
also returns a similar error:
{code}
from pyspark.sql import Row
from pyspark.sql import types as SparkTypes

# sc and sqlContext are assumed to come from the PySpark shell.
_struct = [
    SparkTypes.StructField('string_field', SparkTypes.StringType(), True),
    SparkTypes.StructField('long_field', SparkTypes.LongType(), True),
    SparkTypes.StructField('double_field', SparkTypes.DoubleType(), True)
]
_rdd = sc.parallelize([Row(string_field='1', long_field=1, double_field=1.1)])
## Both methods fail:
# _schema = SparkTypes.StructType()
# for _s in _struct:
#     _schema.add(_s)
_schema = SparkTypes.StructType(_struct)
_df = sqlContext.createDataFrame(_rdd, schema=_schema)
_df.take(1)
{code}
Returned error:
{code}
DoubleType can not accept object '1' in type <type 'str'>
{code}
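A likely cause, though this is my assumption rather than anything confirmed in this ticket: in Spark 2.0, `Row(**kwargs)` sorts its keyword fields alphabetically, while the schema above lists `string_field, long_field, double_field`, and `createDataFrame` verifies the Row's values positionally against the schema. A minimal pure-Python sketch of the mismatch (no Spark required):

```python
# Sketch (assumption, not verified against Spark source): Row(**kwargs)
# orders fields alphabetically, so the values come out as
# (double_field, long_field, string_field) regardless of call order.
kwargs = {'string_field': '1', 'long_field': 1, 'double_field': 1.1}

row_fields = sorted(kwargs)                   # alphabetical Row order
row_values = [kwargs[k] for k in row_fields]  # [1.1, 1, '1']

# The schema declares the fields in a different order, and values are
# checked positionally against it:
schema_fields = ['string_field', 'long_field', 'double_field']
paired = list(zip(schema_fields, row_values))
# paired[2] hands the string '1' to double_field -- the mismatch behind
# "DoubleType can not accept object '1'".
```

If that is the cause, declaring the schema's fields in the same alphabetical order as the Row's would sidestep the error.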
> StructType doesn't accept Python dicts anymore
> --
>
> Key: SPARK-16700
> URL: https://issues.apache.org/jira/browse/SPARK-16700
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.0.0
> Reporter: Sylvain Zimmer
>
> Hello,
> I found this issue while testing my codebase with 2.0.0-rc5
> StructType in Spark 1.6.2 accepts the Python dict type, which is very
> handy. 2.0.0-rc5 does not and throws an error instead.
> I don't know if this was intended but I'd advocate for this behaviour to
> remain the same. MapType is probably wasteful when your key names never
> change and switching to Python tuples would be cumbersome.
> Here is a minimal script to reproduce the issue:
> {code}
> from pyspark import SparkContext
> from pyspark.sql import types as SparkTypes
> from pyspark.sql import SQLContext
> sc = SparkContext()
> sqlc = SQLContext(sc)
> struct_schema = SparkTypes.StructType([
>     SparkTypes.StructField("id", SparkTypes.LongType())
> ])
> rdd = sc.parallelize([{"id": 0}, {"id": 1}])
> df = sqlc.createDataFrame(rdd, struct_schema)
> print df.collect()
> # 1.6.2 prints [Row(id=0), Row(id=1)]
> # 2.0.0-rc5 raises TypeError: StructType can not accept object {'id': 0} in type <type 'dict'>
> {code}
> Thanks!
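One workaround sketch for the script above (my suggestion, not a fix from this thread): convert each dict to a tuple ordered by the schema's field names before calling `createDataFrame`, since StructType still accepts tuples positionally. The ordering helper is plain Python; the Spark calls in the comment are illustrative and not executed here:

```python
# Hypothetical helper: order each dict's values by the schema's field
# names so StructType can verify them positionally.
def dict_to_tuple(d, field_names):
    return tuple(d.get(name) for name in field_names)

field_names = ["id"]  # matches struct_schema in the report above
rows = [dict_to_tuple(d, field_names) for d in [{"id": 0}, {"id": 1}]]
# rows is now [(0,), (1,)], suitable for something like:
#   df = sqlc.createDataFrame(sc.parallelize(rows), struct_schema)
```

Missing keys simply become `None`, which a nullable StructField tolerates.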
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org