Github user clarkfitzg commented on the issue:
https://github.com/apache/spark/pull/14783
This patch only handled the raw columns, not the vector / array value
columns. So maybe the original JIRA should stay open, or we should create
another one specific to this.
---
If your project
Github user clarkfitzg commented on the issue:
https://github.com/apache/spark/pull/14783
Thanks!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user clarkfitzg commented on a diff in the pull request:
https://github.com/apache/spark/pull/14783#discussion_r77764184
--- Diff: R/pkg/inst/tests/testthat/test_utils.R ---
@@ -183,4 +183,28 @@ test_that("overrideEnvs", {
expect_equal(config[["conf
Github user clarkfitzg commented on a diff in the pull request:
https://github.com/apache/spark/pull/14783#discussion_r77763275
--- Diff: R/pkg/R/utils.R ---
@@ -697,3 +697,18 @@ is_master_local <- function(master) {
is_sparkR_shell <- function() {
grepl("
Github user clarkfitzg commented on a diff in the pull request:
https://github.com/apache/spark/pull/14783#discussion_r77760776
--- Diff: R/pkg/R/utils.R ---
@@ -697,3 +697,18 @@ is_master_local <- function(master) {
is_sparkR_shell <- function() {
grepl("
Github user clarkfitzg commented on the issue:
https://github.com/apache/spark/pull/14783
I'm presenting something related to this on Thursday, and it would be nice to
tell the audience this patch made it in. Can I do anything to help this along?
Github user clarkfitzg commented on the issue:
https://github.com/apache/spark/pull/14783
Yes, this is only for a bug fix. @shivaram mentioned in a previous email
exchange it would be good to see some performance benchmarks as well.
Github user clarkfitzg commented on the issue:
https://github.com/apache/spark/pull/14783
@shivaram what do you think?
Github user clarkfitzg commented on the issue:
https://github.com/apache/spark/pull/14783
Tried some more benchmarks today. Didn't see any difference in speed before
/ after the patch. Observing the processes as they run, I see the vast majority of
time spent in the local R process, while
Github user clarkfitzg commented on the issue:
https://github.com/apache/spark/pull/14783
Not sure why these timings are so bad. Found out today that by using bytes
and calling directly into Java's `org.apache.spark.api.r.RRDD` these can be
improved by 2 orders of magnitude
Github user clarkfitzg commented on the issue:
https://github.com/apache/spark/pull/14783
This change doesn't appear to make any difference in speed.
```
# Wed Aug 24 14:12:12 KST 2016
# Benchmarking performance before and after dapplyCollect patch
```
Github user clarkfitzg commented on a diff in the pull request:
https://github.com/apache/spark/pull/14783#discussion_r76007525
--- Diff: R/pkg/inst/worker/worker.R ---
@@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer,
deserializer, key,
# availa
Github user clarkfitzg commented on a diff in the pull request:
https://github.com/apache/spark/pull/14783#discussion_r76007311
--- Diff: R/pkg/inst/worker/worker.R ---
@@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer,
deserializer, key,
# availa
Github user clarkfitzg commented on a diff in the pull request:
https://github.com/apache/spark/pull/14783#discussion_r76004770
--- Diff: R/pkg/inst/worker/worker.R ---
@@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer,
deserializer, key,
# availa
Github user clarkfitzg commented on a diff in the pull request:
https://github.com/apache/spark/pull/14783#discussion_r76004521
--- Diff: R/pkg/inst/worker/worker.R ---
@@ -36,7 +36,14 @@ compute <- function(mode, partition, serializer,
deserializer, key,
# availa
Github user clarkfitzg commented on the issue:
https://github.com/apache/spark/pull/14783
My pleasure. Let me know if / when I should squash these commits or rebase.
Working on some before and after benchmarks now.
GitHub user clarkfitzg opened a pull request:
https://github.com/apache/spark/pull/14783
SPARK-16785 R dapply doesn't return array or raw columns
## What changes were proposed in this pull request?
Fixed bug in `dapplyCollect` by changing the `compute` function