[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2018-09-11 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610798#comment-16610798 ] Wes McKinney commented on SPARK-19489: -- Since there's a native Rust library for Arrow in

[jira] [Commented] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2018-08-22 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589288#comment-16589288 ] Wes McKinney commented on SPARK-21375: -- Seems there might be some requirements that need to be

[jira] [Commented] (SPARK-24760) Pandas UDF does not handle NaN correctly

2018-07-23 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16553315#comment-16553315 ] Wes McKinney commented on SPARK-24760: -- If data comes to Spark from pandas, any "NaN" values should

[jira] [Commented] (SPARK-19552) Upgrade Netty version to 4.1.x final

2017-11-28 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16269357#comment-16269357 ] Wes McKinney commented on SPARK-19552: -- Is there anyone who might be able to help with upgrading

[jira] [Commented] (SPARK-22221) Add User Documentation for Working with Arrow in Spark

2017-10-07 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16195874#comment-16195874 ] Wes McKinney commented on SPARK-22221: -- Hm, I'm not sure that using a MultiIndex is the right

[jira] [Commented] (SPARK-21552) Add decimal type support to ArrowWriter.

2017-07-30 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106532#comment-16106532 ] Wes McKinney commented on SPARK-21552: -- cc @cpcloud -- I linked some associated issues in Arrow

[jira] [Comment Edited] (SPARK-21552) Add decimal type support to ArrowWriter.

2017-07-30 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106532#comment-16106532 ] Wes McKinney edited comment on SPARK-21552 at 7/30/17 3:40 PM: --- cc

[jira] [Comment Edited] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100371#comment-16100371 ] Wes McKinney edited comment on SPARK-21375 at 7/25/17 5:10 PM: --- What is the

[jira] [Commented] (SPARK-21375) Add date and timestamp support to ArrowConverters for toPandas() collection

2017-07-25 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16100371#comment-16100371 ] Wes McKinney commented on SPARK-21375: -- What is the summary of how you're handling the time zone

[jira] [Commented] (SPARK-20960) make ColumnVector public

2017-06-06 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16039648#comment-16039648 ] Wes McKinney commented on SPARK-20960: -- [~cloud_fan] this will be very exciting to have as a

[jira] [Commented] (SPARK-18924) Improve collect/createDataFrame performance in SparkR

2017-05-14 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009818#comment-16009818 ] Wes McKinney commented on SPARK-18924: -- [~deanchen] once R bindings for Arrow are in ship shape,

[jira] [Comment Edited] (SPARK-18924) Improve collect/createDataFrame performance in SparkR

2017-05-14 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009783#comment-16009783 ] Wes McKinney edited comment on SPARK-18924 at 5/14/17 4:00 PM: --- I can also

[jira] [Commented] (SPARK-18924) Improve collect/createDataFrame performance in SparkR

2017-05-14 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16009783#comment-16009783 ] Wes McKinney commented on SPARK-18924: -- I can also with this, particularly on the C++ side. I was

[jira] [Commented] (SPARK-19489) Stable serialization format for external & native code integration

2017-02-08 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15858577#comment-15858577 ] Wes McKinney commented on SPARK-19489: -- I'm really glad to see this is becoming a priority in 2017.

[jira] (SPARK-10057) Faill to load class org.slf4j.impl.StaticLoggerBinder

2017-01-31 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15847242#comment-15847242 ] Wes McKinney commented on SPARK-10057: -- I have been having a really horrible time trying to get

[jira] [Commented] (SPARK-13946) PySpark DataFrames allows you to silently use aggregate expressions derived from different table expressions

2016-05-08 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15275807#comment-15275807 ] Wes McKinney commented on SPARK-13946: -- The expression {{F.count(sdf2.foo)}} derives from a

[jira] [Commented] (SPARK-13946) PySpark DataFrames allows you to silently use aggregate expressions derived from different table expressions

2016-05-04 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270847#comment-15270847 ] Wes McKinney commented on SPARK-13946: -- {{import pyspark.sql.functions as F}} > PySpark DataFrames

[jira] [Commented] (SPARK-13943) The behavior of sum(booleantype) in Spark DataFrames is not intuitive

2016-03-24 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211208#comment-15211208 ] Wes McKinney commented on SPARK-13943: -- As I tried to explain on the mailing list, the general

[jira] [Created] (SPARK-13943) The behavior of sum(booleantype) in Spark DataFrames is not intuitive

2016-03-19 Thread Wes McKinney (JIRA)
Wes McKinney created SPARK-13943: Summary: The behavior of sum(booleantype) in Spark DataFrames is not intuitive Key: SPARK-13943 URL: https://issues.apache.org/jira/browse/SPARK-13943 Project: Spark

[jira] [Created] (SPARK-13946) PySpark DataFrames allows you to silently use aggregate expressions derived from different table expressions

2016-03-19 Thread Wes McKinney (JIRA)
Wes McKinney created SPARK-13946: Summary: PySpark DataFrames allows you to silently use aggregate expressions derived from different table expressions Key: SPARK-13946 URL:

[jira] [Created] (SPARK-13947) PySpark DataFrames: The error message from using an invalid table reference is not clear

2016-03-18 Thread Wes McKinney (JIRA)
Wes McKinney created SPARK-13947: Summary: PySpark DataFrames: The error message from using an invalid table reference is not clear Key: SPARK-13947 URL: https://issues.apache.org/jira/browse/SPARK-13947

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-02-28 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171273#comment-15171273 ] Wes McKinney commented on SPARK-13534: -- SPARK-13391 would need to have its scope more narrowly

[jira] [Updated] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-02-27 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wes McKinney updated SPARK-13534: - Description: The current code path for accessing Spark DataFrame data in Python using PySpark

[jira] [Created] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-02-27 Thread Wes McKinney (JIRA)
Wes McKinney created SPARK-13534: Summary: Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas Key: SPARK-13534 URL: https://issues.apache.org/jira/browse/SPARK-13534

[jira] [Commented] (SPARK-13391) Use Apache Arrow as In-memory columnar store implementation

2016-02-22 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157309#comment-15157309 ] Wes McKinney commented on SPARK-13391: -- I published some analysis on this subject:

[jira] [Commented] (SPARK-13391) Use Apache Arrow as In-memory columnar store implementation

2016-02-19 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155268#comment-15155268 ] Wes McKinney commented on SPARK-13391: -- Indeed, one of the major motivations of Arrow (for Python

[jira] [Commented] (SPARK-7035) Drop __getattr__ on pyspark.sql.DataFrame

2015-04-30 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522165#comment-14522165 ] Wes McKinney commented on SPARK-7035: - [~rxin] asked me to comment on this issue. I

[jira] [Commented] (SPARK-7035) Drop __getattr__ on pyspark.sql.DataFrame

2015-04-30 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522188#comment-14522188 ] Wes McKinney commented on SPARK-7035: - It's not a bad idea, but I really don't see the