[ https://issues.apache.org/jira/browse/SPARK-22250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266642#comment-16266642 ]

Fernando Pereira commented on SPARK-22250:
------------------------------------------

[~bryanc] It could help, but it doesn't solve the problem. If we have a SQL 
field that is an Array, the best equivalent representation on the Python side 
would be a plain NumPy array, given that lists are not efficient. 
When building dataframes in our projects, we have use cases that would 
benefit immensely from such support.
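As a hypothetical illustration of the cost (the schema and names are made up; 
this assumes an active SparkSession named spark and NumPy installed), this is 
the downgrade we currently have to perform:
{code}
import numpy as np
from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, FloatType, StructField, StructType

spark = SparkSession.builder.getOrCreate()
schema = StructType([StructField("values", ArrayType(FloatType()))])

data = np.random.rand(1000)  # illustrative numeric payload

# The type checker only accepts plain lists of Python floats, so the
# whole NumPy array has to be copied element by element:
df = spark.createDataFrame([([float(x) for x in data],)], schema)
{code}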
From dataframe to Python, returning Array fields as NumPy arrays would IMHO be 
better, but it also changes behavior, so it might be trickier to support. We 
could eventually control that by detecting whether NumPy is available on the 
system, and otherwise raise a warning and fall back to plain lists.
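A hypothetical sketch of that fallback (the function and variable names are 
illustrative, not an actual Spark API):
{code}
import warnings

try:
    import numpy as np
    _have_numpy = True
except ImportError:
    _have_numpy = False

def _convert_array_field(values):
    # Prefer a NumPy array when available; otherwise warn and fall back
    # to the current plain-list behaviour.
    if _have_numpy:
        return np.asarray(values)
    warnings.warn("NumPy not available; returning Array fields as plain lists")
    return list(values)
{code}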
What do the developers think?

> Be less restrictive on type checking
> ------------------------------------
>
>                 Key: SPARK-22250
>                 URL: https://issues.apache.org/jira/browse/SPARK-22250
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Fernando Pereira
>            Priority: Minor
>
> I find types.py._verify_type() often too restrictive. E.g. 
> {code}
> TypeError: FloatType can not accept object 0 in type <type 'int'>
> {code}
> I believe it would be generally acceptable to fill a float field with an 
> int, especially since some formats (e.g. JSON) give you no way of inferring 
> the type correctly.
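> A minimal reproduction (assuming an active SparkSession named spark):
> {code}
> from pyspark.sql.types import FloatType, StructField, StructType
> # 0 is an int, but the field is FloatType, so the type check rejects it:
> spark.createDataFrame([(0,)], StructType([StructField("x", FloatType())]))
> # TypeError: FloatType can not accept object 0 in type <type 'int'>
> {code}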
> Another situation relates to other equivalent numerical types, like 
> array.array or numpy. A NumPy scalar int is not accepted as an int, and such 
> arrays always have to be converted down to plain lists, which can be 
> prohibitively large and computationally expensive.
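> For instance (hypothetical sketch; assumes NumPy is installed):
> {code}
> import numpy as np
> from pyspark.sql.types import IntegerType, StructField, StructType
> # A NumPy scalar is rejected even though it is numerically an int:
> spark.createDataFrame([(np.int32(1),)],
>                       StructType([StructField("x", IntegerType())]))
> # TypeError: IntegerType can not accept object 1 in type <type 'numpy.int32'>
> {code}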
> Any thoughts?


