[GitHub] spark issue #14783: SPARK-16785 R dapply doesn't return array or raw columns

2017-06-13 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 This patch only handled the raw columns, not the vector / array value columns. So maybe the original JIRA should stay open, or we could create another one specific to this. --- If your project is

2017-06-13 Thread catlain
Github user catlain commented on the issue: https://github.com/apache/spark/pull/14783 done [jira](https://issues.apache.org/jira/browse/SPARK-16785)

2017-06-12 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14783 @catlain could you please open a JIRA like this one, with the component set to SparkR: https://issues.apache.org/jira/browse/SPARK-21068?filter=12333531

2017-06-02 Thread catlain
Github user catlain commented on the issue: https://github.com/apache/spark/pull/14783 I still have this issue when the input data is an array column whose vectors have different lengths, like:
```
test1
  key value
1 4dda7d68a202e9e3
…
```

2016-09-07 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Thanks!

2016-09-07 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Thanks for the update. LGTM. Merging this to master and branch-2.0

2016-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65027/

2016-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Merged build finished. Test PASSed.

2016-09-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #65027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65027/consoleFull)** for PR 14783 at commit

2016-09-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #65027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65027/consoleFull)** for PR 14783 at commit

2016-09-06 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Sorry for the delay, @clarkfitzg. The code change looks pretty good to me. I just had one question about mixed-type columns.

2016-09-06 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 I'm presenting something related to this on Thursday; it would be nice to tell the audience this patch made it in. Can I do anything to help this along?

2016-09-01 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 @sun-rui Any other comments?

2016-09-01 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14783 LGTM

2016-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Merged build finished. Test PASSed.

2016-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64756/

2016-08-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64756/consoleFull)** for PR 14783 at commit

2016-08-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64756/consoleFull)** for PR 14783 at commit

2016-08-31 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Jenkins, retest this please

2016-08-31 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Sorry, I think this was a break that I just fixed in #14904. Jenkins, retest this please.

2016-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Merged build finished. Test FAILed.

2016-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64737/

2016-08-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64737 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64737/consoleFull)** for PR 14783 at commit

2016-08-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64737/consoleFull)** for PR 14783 at commit

2016-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Merged build finished. Test FAILed.

2016-08-31 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64712/

2016-08-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64712 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64712/consoleFull)** for PR 14783 at commit

2016-08-31 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64712 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64712/consoleFull)** for PR 14783 at commit

2016-08-31 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/14783 Should we have a test against a DataFrame with a binary column?

2016-08-31 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Yes, this is only a bug fix. @shivaram mentioned in a previous email exchange that it would be good to see some performance benchmarks as well.

2016-08-30 Thread sun-rui
Github user sun-rui commented on the issue: https://github.com/apache/spark/pull/14783 @clarkfitzg, your patch is a bug fix rather than a performance improvement, right? If so, since your benchmark shows no performance regression, let's focus on the functionality. We …

2016-08-30 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 @shivaram what do you think?

2016-08-29 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Tried some more benchmarks today and didn't see any difference in speed before / after the patch. Observing the processes as they run, I see the vast majority of time spent in the local R process, while …

2016-08-25 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 Not sure why these timings are so bad. Found out today that by using bytes and calling directly into Java's `org.apache.spark.api.r.RRDD`, these can be improved by two orders of magnitude.

2016-08-24 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 This change doesn't appear to make any difference in speed.
```
# Wed Aug 24 14:12:12 KST 2016
# Benchmarking performance before and after dapplyCollect patch
#
…
```

2016-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64337/

2016-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Merged build finished. Test PASSed.

2016-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64337/consoleFull)** for PR 14783 at commit

2016-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64337/consoleFull)** for PR 14783 at commit

2016-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64335/

2016-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Merged build finished. Test PASSed.

2016-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64335/consoleFull)** for PR 14783 at commit

2016-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14783 **[Test build #64335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64335/consoleFull)** for PR 14783 at commit

2016-08-24 Thread clarkfitzg
Github user clarkfitzg commented on the issue: https://github.com/apache/spark/pull/14783 My pleasure. Let me know if / when I should squash these commits or rebase. Working on some before and after benchmarks now.

2016-08-24 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Jenkins, ok to test

2016-08-24 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/14783 Thanks @clarkfitzg -- I'll take a look at this tomorrow

2016-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14783 Can one of the admins verify this patch?