[GitHub] spark pull request #13765: [SPARK-16052][SQL] Add CollapseRepartitionBy opti...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r67610142 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/dsl/package.scala --- @@ -374,6 +374,9 @@ package object dsl { case

[GitHub] spark pull request #13768: Add `spark_partition_id` in SparkR

2016-06-19 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13768 Add `spark_partition_id` in SparkR ## What changes were proposed in this pull request? This PR adds the `spark_partition_id` virtual column function to SparkR for API parity
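
For reference, a minimal Scala sketch of the column function that the new SparkR wrapper mirrors; the local session and toy DataFrame are illustrative assumptions, not part of the PR.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.spark_partition_id

// Count rows per physical partition using the existing Scala column function.
val spark = SparkSession.builder().appName("demo").master("local[2]").getOrCreate()
val df = spark.range(0L, 100L, 1L, numPartitions = 4).toDF("id")
df.withColumn("pid", spark_partition_id())
  .groupBy("pid").count()
  .show()
```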

[GitHub] spark issue #13765: [SPARK-16052][SQL] Add CollapseRepartitionBy optimizer

2016-06-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13765 Hi, @cloud-fan . Could you review this optimizer?

[GitHub] spark issue #13768: [SPARK-16053][R] Add `spark_partition_id` in SparkR

2016-06-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13768 Hi, @davies . Could you review this PR?

[GitHub] spark pull request #13765: [SPARK-16052][SQL] Add CollapseRepartitionBy opti...

2016-06-18 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13765 [SPARK-16052][SQL] Add CollapseRepartitionBy optimizer ## What changes were proposed in this pull request? This issue adds a new optimizer, `CollapseRepartitionBy`, which is similar
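
A hedged sketch of the pattern such a rule would presumably collapse, analogous to the existing `CollapseRepartition` rule; the session and data are illustrative only.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
val df = spark.range(100).toDF("id")

// Two adjacent repartition-by-expression calls; only the outer one should matter,
// so an optimizer rule could collapse them into a single exchange.
val repartitioned = df.repartition(10, df("id")).repartition(5, df("id"))
repartitioned.explain(true)  // inspect how many Exchange operators remain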

[GitHub] spark pull request #13765: [SPARK-16052][SQL] Add CollapseRepartitionBy opti...

2016-06-19 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r67611572 --- Diff: python/pyspark/sql/dataframe.py --- @@ -451,10 +451,10 @@ def repartition(self, numPartitions, *cols

[GitHub] spark pull request #13765: [SPARK-16052][SQL] Add CollapseRepartitionBy opti...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13765#discussion_r67610137 --- Diff: python/pyspark/sql/dataframe.py --- @@ -451,10 +451,10 @@ def repartition(self, numPartitions, *cols

[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

2016-06-18 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13763 [SPARK-16051][R] Add `read.orc/write.orc` to SparkR ## What changes were proposed in this pull request? This issue adds `read.orc/write.orc` to SparkR for API parity. ## How
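
For context, a round-trip sketch with the existing Scala ORC reader/writer that the new SparkR functions wrap; the output path is a placeholder.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
val df = spark.range(10).toDF("id")

// Write and read back an ORC file, as `write.orc`/`read.orc` do from R.
df.write.mode("overwrite").orc("/tmp/orc_demo")
spark.read.orc("/tmp/orc_demo").show()
```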

[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13403#discussion_r67597228 --- Diff: core/src/main/scala/org/apache/spark/util/StatCounter.scala --- @@ -125,9 +128,12 @@ class StatCounter(values: TraversableOnce[Double

[GitHub] spark issue #13730: [SPARK-16006][SQL] Attempting to write empty DataFrame wi...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13730 Yep. The case still exists for `parquet/csv`, and I updated the cases. The previous `text` case now changes as follows, which looks legitimate. ``` scala

[GitHub] spark issue #13734: [SPARK-14995][R] Add `since` tag in Roxygen documentatio...

2016-06-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13734 Hi, @shivaram and @felixcheung . Now the document is updated with master and shows merged notes correctly. I manually checked all the merged notes and used function signatures

[GitHub] spark issue #13730: [SPARK-16006][SQL] Attempting to write empty DataFrame wi...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13730 Hi, @rxin . Could you review this PR?

[GitHub] spark issue #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` descript...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13403 Thank you so much for your review, @srowen ! I updated the PR according to your comments.

[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13403#discussion_r67597273 --- Diff: core/src/test/scala/org/apache/spark/PartitioningSuite.scala --- @@ -244,6 +244,10 @@ class PartitioningSuite extends SparkFunSuite

[GitHub] spark issue #13635: [SPARK-15159][SPARKR] SparkR SparkSession API

2016-06-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13635 Oh, great!!

[GitHub] spark issue #13730: [SPARK-16006][SQL] Attempting to write empty DataFrame wi...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13730 Oh, sorry. The master branch was changed.

[GitHub] spark issue #13730: [SPARK-16006][SQL] Attempting to write empty DataFrame wi...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13730 I will recheck this PR.

[GitHub] spark pull request #13774: [SPARK-16059][R] Add `monotonically_increasing_id...

2016-06-19 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13774 [SPARK-16059][R] Add `monotonically_increasing_id` function in SparkR ## What changes were proposed in this pull request? This PR adds the `monotonically_increasing_id` column function
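
A minimal Scala sketch of the column function being exposed to R; the session and data are illustrative only.
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.monotonically_increasing_id

val spark = SparkSession.builder().master("local[2]").getOrCreate()

// The generated ids are increasing and unique, but not consecutive across partitions.
spark.range(5).toDF("value")
  .withColumn("row_id", monotonically_increasing_id())
  .show()
```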

[GitHub] spark pull request #13870: [SPARK-16165][SQL] Fix the update logic for InMem...

2016-06-23 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13870 [SPARK-16165][SQL] Fix the update logic for InMemoryTableScanExec.readBatches ## What changes were proposed in this pull request? Currently, `readBatches` accumulator

[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13403#discussion_r68188989 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -74,6 +74,22 @@ class DoubleRDDFunctions(self: RDD[Double]) extends

[GitHub] spark issue #13872: [SPARK-16164][SQL] Filter pushdown should keep the order...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13872 Sure, I fully agree with your view; that's the nature of a declarative language. However, we can provide a more *natural* order as the default order, as in this PR. As you see, without considering

[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13403#discussion_r68184739 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -74,6 +74,22 @@ class DoubleRDDFunctions(self: RDD[Double]) extends

[GitHub] spark pull request #13872: [SPARK-16164][SQL] Filter pushdown should keep th...

2016-06-23 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13872 [SPARK-16164][SQL] Filter pushdown should keep the ordering in the logical plan ## What changes were proposed in this pull request? Chris McCubbin reported a bug when he used
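
As a hedged illustration of why the order of combined filter predicates can matter (the UDF and data here are made up, not taken from the PR):
```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

// The second filter is only safe on rows that already passed the first one.
val idiv = udf((d: Int) => 100 / d)  // throws ArithmeticException when d == 0
val df = Seq(0, 1, 2).toDF("d")
df.filter($"d" > 0).filter(idiv($"d") > 10).show()
// If the two predicates are combined in the wrong order, the UDF may be
// evaluated on the row the first filter was meant to exclude.
```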

[GitHub] spark pull request #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` d...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13403#discussion_r68185536 --- Diff: core/src/main/scala/org/apache/spark/rdd/DoubleRDDFunctions.scala --- @@ -74,6 +74,22 @@ class DoubleRDDFunctions(self: RDD[Double]) extends

[GitHub] spark issue #13872: [SPARK-16164][SQL] Update `CombineFilters` to try to con...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13872 Whatever the conclusion, thank you for the review, @mengxr and @liancheng !

[GitHub] spark issue #13730: [SPARK-16006][SQL] Attempting to write empty DataFrame wi...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13730 Hi, @tdas . Could you give me some advice on the direction for changing this PR?

[GitHub] spark issue #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` descript...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13403 Thank you for everything, @srowen , @mengxr , @rxin .

[GitHub] spark issue #13872: [SPARK-16164][SQL] Filter pushdown should keep the order...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13872 I think I had better change the title of this PR (I just copied it from the JIRA). Would that reduce your concern a little?

[GitHub] spark issue #13870: [SPARK-16165][SQL] Fix the update logic for InMemoryTabl...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13870 Hi, @liancheng . Could you review this PR, too? This was initially introduced in your https://github.com/apache/spark/commit/74049249abb952ad061c0e221c22ff894a9e9c8d#diff

[GitHub] spark issue #13854: [SPARK-15956] [SQL] When unwrapping ORC avoid pattern ma...

2016-06-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13854 +1 @hvanhovell . Since Jenkins does not run the Scala 2.10 compilation, I ran the build locally on Ubuntu/JDK7/Scala 2.10 (just to double-check this PR).

[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13763 Thank you, @felixcheung ! By the way, unfortunately, `DataFrameReader.scala` provides the ORC and Parquet features differently. For ORC, we can accept only one path now
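
For context, a sketch of the asymmetry mentioned above in the Spark 2.0-era `DataFrameReader` API: `parquet` accepts varargs paths while `orc` took a single path. The paths below are placeholders.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()

// parquet() takes multiple paths...
val a = spark.read.parquet("/data/part1", "/data/part2")
// ...while orc() accepted only a single path at the time of this discussion.
val b = spark.read.orc("/data/part1")
```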

[GitHub] spark pull request #13768: [SPARK-16053][R] Add `spark_partition_id` in Spar...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13768#discussion_r67671989 --- Diff: R/pkg/R/generics.R --- @@ -1126,6 +1126,10 @@ setGeneric("sort_array", function(x, asc = TRUE) { standardGeneric(&

[GitHub] spark issue #13734: [SPARK-14995][R] Add `since` tag in Roxygen documentatio...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13734 By the way, before discussing further, should I retarget this PR for Spark 2.1.0? I think this PR missed the deadline by a little.

[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13763 Thank you so much, @felixcheung !

[GitHub] spark issue #13734: [SPARK-14995][R] Add `since` tag in Roxygen documentatio...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13734 It seems you're concerned about multiple issues. I'll focus on the same-versions issue first. The principle of this PR is simply to add a `since` tag to all exposed functions. IMHO

[GitHub] spark issue #13763: [SPARK-16051][R] Add `read.orc/write.orc` to SparkR

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13763 Actually, for ORC, the reason I didn't try to accept multiple files is API consistency. Scala/Python also support only a single ORC path, so R should too. I didn't dig further, but I

[GitHub] spark issue #13768: [SPARK-16053][R] Add `spark_partition_id` in SparkR

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13768 Thank you for merging, @shivaram !

[GitHub] spark issue #13403: [SPARK-15660][CORE] Update RDD `variance/stdev` descript...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13403 Ping~

[GitHub] spark pull request #13763: [SPARK-16051][R] Add `read.orc/write.orc` to Spar...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13763#discussion_r67725053 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1667,6 +1668,25 @@ test_that("mutate(), transform(), rename() and

[GitHub] spark issue #13734: [SPARK-14995][R] Add `since` tag in Roxygen documentatio...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13734 For the other important issue about `see also`, all the previous docs look like that. http://spark.apache.org/docs/1.6.0/api/R/approxCountDistinct.html http://spark.apache.org/docs

[GitHub] spark issue #13786: [SPARK-15294][R] Add `pivot` to SparkR

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13786 Hi, @shivaram , @felixcheung . This is the up-to-date `pivot` PR.

[GitHub] spark pull request #13786: [SPARK-15294][R] Add `pivot` to SparkR

2016-06-20 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13786 [SPARK-15294][R] Add `pivot` to SparkR ## What changes were proposed in this pull request? This PR adds the `pivot` function to SparkR for API parity. Since this PR is based on https
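
An illustrative Scala `pivot` that the new SparkR function mirrors; the data is made up.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
import spark.implicits._

val sales = Seq(
  (2015, "java", 20000), (2015, "scala", 15000),
  (2016, "java", 22000), (2016, "scala", 21000)
).toDF("year", "course", "earnings")

// One output column per distinct course, holding the summed earnings.
sales.groupBy("year").pivot("course").sum("earnings").show()
```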

[GitHub] spark pull request #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13109#discussion_r67811795 --- Diff: R/pkg/R/stats.R --- @@ -134,9 +129,7 @@ setMethod("freqItems", signature(x = "SparkDataFrame", cols = "character&qu

[GitHub] spark pull request #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout...

2016-06-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13109#discussion_r67812243 --- Diff: R/pkg/R/stats.R --- @@ -134,9 +129,7 @@ setMethod("freqItems", signature(x = "SparkDataFrame", cols = "character&qu

[GitHub] spark pull request #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout...

2016-06-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13109#discussion_r67812187 --- Diff: R/pkg/R/stats.R --- @@ -134,9 +129,7 @@ setMethod("freqItems", signature(x = "SparkDataFrame", cols = "character&qu

[GitHub] spark pull request #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout...

2016-06-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13109#discussion_r67812909 --- Diff: R/pkg/R/stats.R --- @@ -134,9 +129,7 @@ setMethod("freqItems", signature(x = "SparkDataFrame", cols = "character&qu

[GitHub] spark pull request #13798: [SPARKR][DOCS] R code doc cleanup

2016-06-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13798#discussion_r67813477 --- Diff: R/pkg/R/DataFrame.R --- @@ -606,10 +607,10 @@ setMethod("unpersist", #' #' The following options for repartition ar

[GitHub] spark pull request #13768: [SPARK-16053][R] Add `spark_partition_id` in Spar...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13768#discussion_r67750057 --- Diff: R/pkg/R/functions.R --- @@ -1179,6 +1179,27 @@ setMethod("soundex", column(jc) })

[GitHub] spark issue #13782: [SPARKR] fix R roxygen2 doc for count on GroupedData

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13782 LGTM. Is that all?

[GitHub] spark issue #13734: [SPARK-14995][R] Add `since` tag in Roxygen documentatio...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13734 Thank you, @shivaram !

[GitHub] spark issue #13774: [SPARK-16059][R] Add `monotonically_increasing_id` funct...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13774 Thank you, @shivaram and @felixcheung !

[GitHub] spark issue #13295: [SPARK-15294][SPARKR][MINOR] Add pivot functionality to ...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13295 This will be really useful. I'll bring this PR up to date and add a credit description for @mhnatiuk .

[GitHub] spark pull request #13768: [SPARK-16053][R] Add `spark_partition_id` in Spar...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13768#discussion_r67744103 --- Diff: R/pkg/R/functions.R --- @@ -1179,6 +1179,27 @@ setMethod("soundex",

[GitHub] spark issue #13768: [SPARK-16053][R] Add `spark_partition_id` in SparkR

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13768 Thank you, @shivaram . Following your advice and #13394 , I fixed the title convention. That's all for this PR. (I will fix my other PRs like that, too.)

[GitHub] spark issue #13734: [SPARK-14995][R] Add `since` tag in Roxygen documentatio...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13734 Thank you for narrowing the scope. Indeed, this seems to be an experimental attempt. I rebased to resolve the conflict. As @felixcheung mentioned, the use of function signatures

[GitHub] spark issue #13786: [SPARK-15294][R] Add `pivot` to SparkR

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13786 Thank you always, @felixcheung !

[GitHub] spark pull request #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13109#discussion_r67803750 --- Diff: R/pkg/R/stats.R --- @@ -19,7 +19,8 @@ setOldClass("jobj") -#' crosstab +#' @title SparkDataFrame statistic

[GitHub] spark issue #13786: [SPARK-15294][R] Add `pivot` to SparkR

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13786 Thank you, @shivaram !

[GitHub] spark pull request #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13109#discussion_r67806331 --- Diff: R/pkg/R/stats.R --- @@ -19,7 +19,8 @@ setOldClass("jobj") -#' crosstab +#' @title SparkDataFrame statistic

[GitHub] spark issue #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for co...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13109 In line 333 of `functions.R`, `@rdname covar_pop` -> `@rdname cov`?
```
#' covar_pop
#'
#' Compute the population covariance between two expressions.
#'
#' @rdn
```

[GitHub] spark pull request #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13109#discussion_r67809806 --- Diff: R/pkg/R/generics.R --- @@ -430,19 +430,19 @@ setGeneric("coltypes<-", function(x, value) { standardGeneric("coltypes<-

[GitHub] spark pull request #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13109#discussion_r67806157 --- Diff: R/pkg/R/stats.R --- @@ -19,7 +19,8 @@ setOldClass("jobj") -#' crosstab +#' @title SparkDataFrame statistic

[GitHub] spark issue #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for co...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13109 Oh, the root cause exists in `generics.R`. Nice catch!

[GitHub] spark issue #13109: [SPARK-15319][SPARKR][DOCS] Fix SparkR doc layout for co...

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13109 Yes. Indeed, we had better generally keep each function in its own Rd file.

[GitHub] spark pull request #13786: [SPARK-15294][R] Add `pivot` to SparkR

2016-06-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13786#discussion_r67801232 --- Diff: R/pkg/R/group.R --- @@ -129,6 +129,48 @@ methods <- c("avg", "max", "mean", "min", "s

[GitHub] spark issue #13684: [SPARK-15908][R] Add varargs-type dropDuplicates() funct...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13684 Thank you, @shivaram and @sun-rui . Now, it's ready for review again.

[GitHub] spark issue #13684: [SPARK-15908][R] Add varargs-type dropDuplicates() funct...

2016-06-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13684 Yes, @sun-rui . I realigned the parameter comment.

[GitHub] spark pull request #13714: [SPARK-15996][R] Fix R dataframe example by remov...

2016-06-16 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13714 [SPARK-15996][R] Fix R dataframe example by removing deprecated functions ## What changes were proposed in this pull request? Currently, the R dataframe example fails like the following

[GitHub] spark issue #13643: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should conside...

2016-06-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13643 Hi, @srowen . Could you review and merge this PR please?

[GitHub] spark issue #13714: [SPARK-15996][R] Fix R examples by removing deprecated f...

2016-06-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13714 Hi, @shivaram , @felixcheung , @sun-rui . Could you review this PR?

[GitHub] spark issue #13870: [SPARK-16165][SQL] Fix the update logic for InMemoryTabl...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13870 No problem! And thank you for the attention! :)

[GitHub] spark issue #13870: [SPARK-16165][SQL] Fix the update logic for InMemoryTabl...

2016-06-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13870 Oh, thank you for the review, @davies . But, sorry, I'm not sure what you mean. Do you mean there is a reason that `readBatches` should be `0` when the option

[GitHub] spark pull request #13887: [SPARK-16186][SQL] Support partition batch prunin...

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13887#discussion_r68380170 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -79,6 +79,11 @@ private[sql] case class

[GitHub] spark pull request #13887: [SPARK-16186][SQL] Support partition batch prunin...

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13887#discussion_r68381115 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -79,6 +79,11 @@ private[sql] case class

[GitHub] spark pull request #13876: [SPARK-16174][SQL] Add RemoveLiteralRepetitionFro...

2016-06-23 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13876 [SPARK-16174][SQL] Add RemoveLiteralRepetitionFromIn optimizer ## What changes were proposed in this pull request? This PR adds an optimizer to remove the duplicated literals from
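
A hedged sketch of the kind of predicate such a rule would simplify, namely deduplicating literals inside an `IN` list; the session and data are illustrative only.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
val df = spark.range(10).toDF("id")

// With the proposed rule, the duplicated literals should reduce to IN (1, 2).
df.filter("id IN (1, 1, 2, 2, 2)").explain(true)
```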

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13887 Hi, @cloud-fan . I updated the PR. IMO: InSet is used for large `IN` lists, while this PR targets small `IN` lists.
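
A rough sketch of the distinction drawn above (my paraphrase, not from the thread): large `IN` lists are already rewritten by the optimizer into an `InSet` hash-set lookup, while small lists stay as `In`, which is the case this PR targets.
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
val df = spark.range(1000).toDF("id")

// Small list: expected to stay as an In predicate.
df.filter("id IN (1, 2, 3)").explain(true)

// Large list: expected to be converted to an InSet lookup above a configurable threshold.
df.filter("id IN (" + (1 to 50).mkString(", ") + ")").explain(true)
```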

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13887 Thank you, @cloud-fan !

[GitHub] spark pull request #13887: [SPARK-16186][SQL] Support partition batch prunin...

2016-06-24 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13887 [SPARK-16186][SQL] Support partition batch pruning with `IN` predicate in InMemoryTableScanExec ## What changes were proposed in this pull request? One of the most frequent usage
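
A hedged sketch of the intended scenario: an `IN` predicate against a cached, in-memory columnar table, where per-batch min/max statistics could be used to skip whole batches (the details below are assumptions, not taken from the PR).
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[2]").getOrCreate()
spark.range(0, 100000).toDF("id").createOrReplaceTempView("t")
spark.catalog.cacheTable("t")

// After caching, the scan goes through InMemoryTableScanExec; with IN support in
// batch pruning, batches whose id range excludes 1, 2 and 3 could be skipped.
spark.sql("SELECT * FROM t WHERE id IN (1, 2, 3)").count()
```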

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13887 cc @rxin , @davies , @cloud-fan .

[GitHub] spark issue #13786: [SPARK-15294][R] Add `pivot` to SparkR

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13786 Hi, @Div333 . @mhnatiuk is right. For the binaries and documentation, Spark 2.0 is very close to release. You had better wait. :)

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13887 Although I decided to make this PR after observing TPC-DS queries, I will definitely update it if there are other useful scenarios.

[GitHub] spark issue #13730: [SPARK-16006][SQL] Attempting to write empty DataFrame wi...

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13730 Rebased.

[GitHub] spark issue #13876: [SPARK-16174][SQL] Improve OptimizeIn optimizer to remov...

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13876 Hi, @rxin . Could you review this PR again when you have some time?

[GitHub] spark issue #13765: [SPARK-16052][SQL] Add CollapseRepartitionBy optimizer

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13765 Rebased.

[GitHub] spark issue #13887: [SPARK-16186][SQL] Support partition batch pruning with ...

2016-06-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13887 Thank you for your review and valuable improvement ideas, @davies . Let me rephrase your ideas: 1. For `IN` with a single expression, we definitely had better improve

[GitHub] spark issue #13734: [SPARK-14995][R] Add `since` tag in Roxygen documentatio...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13734 Thank you, @felixcheung . I removed the note on `dataFrame`.

[GitHub] spark issue #13734: [SPARK-14995][R] Add `since` tag in Roxygen documentatio...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13734 Hi, @shivaram and @felixcheung . Now, the PR and the generated HTML site are up to date again. (For the HTML site, you may need to refresh.)

[GitHub] spark pull request #13751: [SPARK-15159][SPARKR] SparkSession roxygen2 doc, ...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13751#discussion_r67604389 --- Diff: docs/sparkr.md --- @@ -14,29 +14,24 @@ supports operations like selection, filtering, aggregation etc. (similar to R da [dplyr](https

[GitHub] spark pull request #13751: [SPARK-15159][SPARKR] SparkSession roxygen2 doc, ...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13751#discussion_r67604494 --- Diff: docs/sparkr.md --- @@ -14,29 +14,24 @@ supports operations like selection, filtering, aggregation etc. (similar to R da [dplyr](https

[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r67578803 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/test/DataFrameReaderWriterSuite.scala --- @@ -572,4 +572,16 @@ class

[GitHub] spark issue #13730: [SPARK-16006][SQL] Attempting to write empty DataFrame wi...

2016-06-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13730 Hi, @tdas . Could you review this PR again when you have some time?

[GitHub] spark pull request #13751: [SPARK-15159][SPARKR] SparkSession roxygen2 doc, ...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13751#discussion_r67604590 --- Diff: docs/sparkr.md --- @@ -158,20 +152,19 @@ write.df(people, path="people.parquet", source="parquet", mode="overwr

[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486 Oh, I see. I will fix it tonight. Thank you, @tdas !

[GitHub] spark pull request #13721: [SPARK-16005][R] Add `randomSplit` to SparkR

2016-06-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13721#discussion_r67457577 --- Diff: R/pkg/R/DataFrame.R --- @@ -2884,3 +2884,39 @@ setMethod("write.jdbc", write <- callJMethod(write,

[GitHub] spark pull request #13721: [SPARK-16005][R] Add `randomSplit` to SparkR

2016-06-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13721#discussion_r67457739 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -2264,6 +2264,14 @@ test_that("createDataFrame sqlContext parameter backward compatib

[GitHub] spark issue #13684: [SPARK-15908][R] Add varargs-type dropDuplicates() funct...

2016-06-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13684 Thank you, @shivaram !

[GitHub] spark pull request #13721: [SPARK-16005][R] Add `randomSplit` to SparkR

2016-06-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13721#discussion_r67457591 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -2264,6 +2264,14 @@ test_that("createDataFrame sqlContext parameter backward compatib

[GitHub] spark pull request #13751: [SPARK-15159][SPARKR] SparkSession roxygen2 doc, ...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13751#discussion_r67604919 --- Diff: docs/sparkr.md --- @@ -91,17 +86,17 @@ The following options can be set in `sparkEnvir` with `sparkR.init` from RStudio

[GitHub] spark issue #13751: [SPARK-15159][SPARKR] SparkSession roxygen2 doc, program...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13751 @felixcheung LGTM except for a few comments.

[GitHub] spark pull request #13751: [SPARK-15159][SPARKR] SparkSession roxygen2 doc, ...

2016-06-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13751#discussion_r67604853 --- Diff: docs/sparkr.md --- @@ -113,16 +108,15 @@ head(df) ### From Data Sources -SparkR supports operating on a variety of data
