[GitHub] spark pull request: [SPARK-6852][SPARKR] Accept numeric as numPart...

2015-04-21 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/5613 [SPARK-6852][SPARKR] Accept numeric as numPartitions in SparkR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sun-rui/spark SPARK-6852

[GitHub] spark pull request: [SPARK-6818][SPARKR] Support column deletion i...

2015-04-23 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/5655 [SPARK-6818][SPARKR] Support column deletion in SparkR DataFrame API. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sun-rui/spark SPARK-6818

[GitHub] spark pull request: [SPARK-7033][SPARKR] Clean usage of split. Use...

2015-04-22 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/5628 [SPARK-7033][SPARKR] Clean usage of split. Use partition instead where applicable. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sun-rui

[GitHub] spark pull request: [SPARK-7033][SPARKR] Clean usage of split. Use...

2015-04-23 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/5628#issuecomment-95774826 @shivaram, yeah, I forgot to do cleanup in tests. Now the tests are cleaned. Please check it. --- If your project is set up for it, you can reply to this email

[GitHub] spark pull request: [SPARK-6812][SparkR] filter() on DataFrame doe...

2015-05-06 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/5938 [SPARK-6812][SparkR] filter() on DataFrame does not work as expected. According to the R manual: https://stat.ethz.ch/R-manual/R-devel/library/base/html/Startup.html, if a function .First

[GitHub] spark pull request: Spark-7435[R]: Make DataFrame.show() consisten...

2015-05-08 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/5989#issuecomment-100116112 @rekhajoshm, thank you. As discussed in the JIRA issue, we can keep these names as is. But we can improve the implementation of showDF() to print C-style strings

[GitHub] spark pull request: [SPARK-7482][SparkR] Rename some DataFrame API...

2015-05-08 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/6007 [SPARK-7482][SparkR] Rename some DataFrame API methods in SparkR to match their counterparts in Scala. You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] spark pull request: [SPARK-7482][SparkR] Rename some DataFrame API...

2015-05-08 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6007#issuecomment-100420754 @shivaram, I think it is OK to rename APIs to ones that R users are accustomed to, for example, read.df and write.df. It would be better if there were more feedback from

[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/6183 [SPARK-7227][SPARKR] Support fillna / dropna in R DataFrame. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sun-rui/spark SPARK-7227

[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6183#discussion_r30455256 --- Diff: R/pkg/R/DataFrame.R --- @@ -1431,3 +1431,119 @@ setMethod(describe, sdf <- callJMethod(x@sdf, describe, listToSeq(colList

[GitHub] spark pull request: [SPARK-7227][SPARKR] Support fillna / dropna i...

2015-05-15 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6183#discussion_r30455447 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -166,6 +166,24 @@ private[spark] object SerDe { } } + def

[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...

2015-05-15 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-102560770 @hqzizania, is it possible to simplify the conversion to one place, something like: jrdd <- getJRDD(lapply(rdd, naToNull), row)? Another thought is, can we do

[GitHub] spark pull request: [SPARK-7482][SparkR] Rename some DataFrame API...

2015-05-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6007#discussion_r30202774 --- Diff: R/pkg/R/generics.R --- @@ -453,7 +453,11 @@ setGeneric(saveAsTable, function(df, tableName, source, mode, ...) { standardGeneric(saveAsTable

[GitHub] spark pull request: [SPARK-7482][SparkR] Rename some DataFrame API...

2015-05-12 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6007#discussion_r30202781 --- Diff: R/pkg/R/SQLContext.R --- @@ -446,14 +446,15 @@ dropTempTable <- function(sqlCtx, tableName) { #' @param source the name of external data source

[GitHub] spark pull request: [SPARK-7482][SparkR] Rename some DataFrame API...

2015-05-12 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6007#issuecomment-101514500 rebased to master.

[GitHub] spark pull request: [SPARK-6820][SPARKR]Convert NAs to null type i...

2015-06-02 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6190#issuecomment-108145372 @davies, do you have any comment on this issue?

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-10 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/6743 [SPARK-6797][SPARKR] Add support for YARN cluster mode. This PR enables SparkR to dynamically ship the SparkR binary package to the AM node in YARN cluster mode, thus it is no longer required

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-13 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-111738919 rebased

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-11 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-111315827 Yes, this assigns a symbolic link name, so we can refer to the shipped package via the logical name instead of the specific archive file name.

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-15 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-111943437 I originally planned to have a separate JIRA issue for shipping the SparkR package needed by the RDD APIs. But if this is still required by the DataFrame API, I can do

[GitHub] spark pull request: [SPARK-8063][SPARKR] Spark master URL conflict...

2015-06-02 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/6605 [SPARK-8063][SPARKR] Spark master URL conflict between MASTER env variable and --master command line option. You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [SPARK-7714][SPARKR] SparkR tests should use m...

2015-07-01 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/7152 [SPARK-7714][SPARKR] SparkR tests should use more specific expectations than expect_true 1. Update the pattern 'expect_true(a == b)' to 'expect_equal(a, b)'. 2. Update the pattern 'expect_true
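
The rationale behind this pattern change applies outside R as well: a specific assertion reports the mismatching values on failure, while a generic truth check does not. A minimal Python unittest analogue, purely to illustrate the failure-message difference (hypothetical sketch, not SparkR code):

```python
import unittest

class _T(unittest.TestCase):
    def runTest(self):  # placeholder so a bare TestCase can be instantiated
        pass

tc = _T()

# Generic check, like expect_true(a == b): the failure message only says
# the condition was false, with no hint of which values disagreed.
try:
    tc.assertTrue([1, 2] == [1, 3])
except AssertionError as e:
    generic_msg = str(e)

# Specific check, like expect_equal(a, b): the failure message includes
# both values and where they differ.
try:
    tc.assertEqual([1, 2], [1, 3])
except AssertionError as e:
    specific_msg = str(e)

print(generic_msg)   # "False is not true"
print(specific_msg)  # starts with "Lists differ: [1, 2] != [1, 3]"
```

The same motivation drives testthat's expect_equal over expect_true(a == b): when a test fails on Jenkins, the log alone shows what diverged.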

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116698303 @tgravescs, I think the problem with shipping R itself is that the R executable is platform-specific. Also, it may require OS-specific installation before running R

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116734776 rebased

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116710169 Add support for shipping SparkR package for R workers required by RDD APIs. Tested createDataFrame() by creating a DataFrame from an R list. Remove sparkRLibDir

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-07-01 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-117859704 @andrewor14, I have tested this patch with a real YARN cluster. For an R program, it can source other R files and call functions within them, or it can

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-07-01 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33738526 --- Diff: core/src/main/scala/org/apache/spark/api/r/RUtils.scala --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-07-01 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33737682 --- Diff: core/src/main/scala/org/apache/spark/api/r/RUtils.scala --- @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-07-01 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33739229 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -339,6 +341,23 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-8951][SparkR] support Unicode character...

2015-08-13 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7494#issuecomment-130662126 @CHOIJAEHONG1, @shivaram: 1. The R worker can be in any locale, because R can recognize UTF-8 and preserves UTF-8 encoding when manipulating strings. The root cause

[GitHub] spark pull request: [SPARK-10053] Export lapply and lapplyPartitio...

2015-08-17 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8240#issuecomment-132065626 The RDD API is deliberately hidden in Spark 1.4. For discussion of exposing a subset of the RDD API and a high-level parallel computation API, please refer to https

[GitHub] spark pull request: [SPARK-8844][SPARKR] head/collect is broken in...

2015-08-15 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7419#issuecomment-131476840 @shivaram, fixed.

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-18 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/8276 [SPARK-10048][SPARKR] Support arbitrary nested Java array in serde. This PR: 1. supports transferring arbitrary nested arrays from the JVM to the R side in SerDe; 2. based on 1, collect
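
The core idea behind supporting arbitrarily nested arrays in a SerDe is to recurse on container values, tagging each element with its type so the receiving side can rebuild the nesting to any depth. A toy Python illustration of that approach (not the actual SparkR wire format; the tag names and encoding here are invented for the sketch):

```python
def serialize(obj):
    # Tag each value with a type marker; lists recurse, so any nesting
    # depth round-trips without special-casing each level.
    if isinstance(obj, list):
        return ("array", [serialize(x) for x in obj])
    if isinstance(obj, int):
        return ("int", obj)
    if isinstance(obj, str):
        return ("string", obj)
    raise TypeError(f"unsupported type: {type(obj).__name__}")

def deserialize(tagged):
    # Inverse: rebuild nested lists from the tags, leave scalars as-is.
    tag, value = tagged
    if tag == "array":
        return [deserialize(x) for x in value]
    return value

nested = [[1, [2, "a"]], 3]
assert deserialize(serialize(nested)) == nested
```

The real SerDe writes type bytes and length prefixes to a binary stream between the JVM and R, but the recursive tag-then-descend structure is the same.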

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-19 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8276#issuecomment-132571243 @shivaram, I tried to support ArrayType. By adding code like: // Convert Seq[Any] to Array[Any] val value = if (obj.isInstanceOf[Seq[Any

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-19 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/8276#discussion_r37490241 --- Diff: R/pkg/R/DataFrame.R --- @@ -628,18 +628,49 @@ setMethod(dim, setMethod(collect, signature(x = DataFrame

[GitHub] spark pull request: [SPARK-10053] Export lapply and lapplyPartitio...

2015-08-19 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8240#issuecomment-132843607 I think UDF() is important; I hope it can be available in the 1.6 release.

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-19 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8276#issuecomment-132852075 Will add test cases.

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-19 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/8276#discussion_r37490835 --- Diff: R/pkg/R/DataFrame.R --- @@ -628,18 +628,49 @@ setMethod(dim, setMethod(collect, signature(x = DataFrame

[GitHub] spark pull request: [SPARK-10053] Export lapply and lapplyPartitio...

2015-08-18 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8240#issuecomment-132131289 Yes, the DataFrame API has lapply and lapplyPartition, but they are actually RDD-like APIs, and their implementation is based on the corresponding RDD API after converting

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-24 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8276#issuecomment-134217045 rebased to master

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-18 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/8276#discussion_r37371028 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -210,22 +213,31 @@ private[spark] object SerDe { writeType(dos, void

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-18 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8276#issuecomment-132406893 @davies, that will be done in another PR.

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-19 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8276#issuecomment-132478924 @shivaram, now ArrayType in a DataFrame is still not supported, as ArrayType's class is something like scala.collection.mutable.WrappedArray$ofRef; it will be passed

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/8276#discussion_r37723615 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala --- @@ -98,27 +98,17 @@ private[r] object SQLUtils { val bos = new

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-24 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8276#issuecomment-134066063 @shivaram, test cases for SerDe added. The SerDe does not yet support transferring a list of different element types from the R side to the JVM side. Let's leave

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-19 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/8276#discussion_r37490860 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala --- @@ -98,27 +98,20 @@ private[r] object SQLUtils { val bos = new

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-116892414 @shivaram, yes, I saw that function, but found it confusing that it does not consider the YARN mode case. @davies, it seems unit tests for pySpark in YARN modes were

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33539511 --- Diff: core/src/main/scala/org/apache/spark/api/r/RUtils.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33539610 --- Diff: core/src/main/scala/org/apache/spark/api/r/RRDD.scala --- @@ -390,9 +386,10 @@ private[r] object RRDD { thread

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33539384 --- Diff: core/src/main/scala/org/apache/spark/api/r/RUtils.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33539628 --- Diff: core/src/main/scala/org/apache/spark/api/r/RUtils.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33539591 --- Diff: core/src/main/scala/org/apache/spark/deploy/RRunner.scala --- @@ -71,9 +71,10 @@ object RRunner { val builder = new ProcessBuilder(Seq

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-29 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33539310 --- Diff: core/src/main/scala/org/apache/spark/api/r/RUtils.scala --- @@ -0,0 +1,44 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-30 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/6743#discussion_r33555215 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -339,6 +340,24 @@ object SparkSubmit

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-30 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-117086228 rebased

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-30 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-117372487 @davies, I will add test case for YARN modes in a separate PR which is intended to update SparkR related logic to align with SPARK-5479.

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-06-30 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-117369661 @shivaram, the rLibDir parameter of sparkR.init() was intended for locating the SparkR package on worker nodes at the time when SparkR was a separate project. Since now

[GitHub] spark pull request: [SPARK-8844][SPARKR] head/collect is broken in...

2015-08-15 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7419#issuecomment-131324275 @shivaram, as you pointed out, I came up with a simpler fix. I realized that simply creating an empty vector using vector() without a mode is OK when there is no row

[GitHub] spark pull request: [SPARK-9053][SparkR] Fix spaces around parens,...

2015-07-23 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/7584#discussion_r35396836 --- Diff: R/pkg/R/DataFrame.R --- @@ -1384,7 +1384,7 @@ setMethod(saveAsTable, org.apache.spark.sql.parquet

[GitHub] spark pull request: [SPARK-9053][SparkR] Fix spaces around parens,...

2015-07-23 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/7584#discussion_r35392454 --- Diff: R/pkg/R/DataFrame.R --- @@ -1384,7 +1384,7 @@ setMethod(saveAsTable, org.apache.spark.sql.parquet

[GitHub] spark pull request: [SPARK-8951][SparkR] support Unicode character...

2015-07-24 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/7494#discussion_r35407059 --- Diff: R/pkg/R/deserialize.R --- @@ -56,8 +56,10 @@ readTypedObject <- function(con, type) { readString <- function(con) { stringLen

[GitHub] spark pull request: [SPARK-9053][SparkR] Fix spaces around parens,...

2015-07-22 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/7584#discussion_r35289630 --- Diff: R/pkg/R/DataFrame.R --- @@ -1384,7 +1384,7 @@ setMethod(saveAsTable, org.apache.spark.sql.parquet

[GitHub] spark pull request: [SPARK-9053][SparkR] Fix spaces around parens,...

2015-07-22 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/7584#discussion_r35288709 --- Diff: R/pkg/R/context.R --- @@ -121,7 +121,7 @@ parallelize <- function(sc, coll, numSlices = 1) { numSlices <- length(coll) sliceLen

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-07-13 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-120893058 @shivaram, tests done. Also tested with YARN cluster, yarn-client, standalone, createDataFrame() in YARN client mode.

[GitHub] spark pull request: [SPARK-8808][SPARKR] Fix assignments in SparkR...

2015-07-14 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/7395 [SPARK-8808][SPARKR] Fix assignments in SparkR. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sun-rui/spark SPARK-8808 Alternatively you can

[GitHub] spark pull request: [SPARK-8807][SparkR] Add between operator in S...

2015-07-16 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7356#issuecomment-121848726 @viirya, 1. Why pass the lower bound and upper bound in a vector instead of passing them separately? They are passed separately in the Scala API, and a vector can not hold

[GitHub] spark pull request: [SPARK-8313] R Spark packages support

2015-07-16 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-121884026 @brkyvz, more comments :) 3. Is it possible to run R-specific tests only when -Psparkr (the sparkr profile) is specified? If the sparkr profile is specified when building

[GitHub] spark pull request: [SPARK-8313] R Spark packages support

2015-07-16 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-121880406 @brkyvz, could you give more explanation of the usage scenario that this PR is expected to support? 1. This PR introduces a manifest keyword, a hybrid JAR

[GitHub] spark pull request: [SPARK-8808][SPARKR] Fix assignments in SparkR...

2015-07-14 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7395#issuecomment-121436723 OK, I will clean them together.

[GitHub] spark pull request: [SPARK-8313] R Spark packages support

2015-07-16 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-122144505 @brkyvz, thanks for your explanation of this requirement from devs. 1. Is there any existing JIRA or discussion you can point me to so that I can learn

[GitHub] spark pull request: [SPARK-8951][SparkR] support Unicode character...

2015-07-20 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7494#issuecomment-122837016 I think readString() in deserialize.R should be updated accordingly. Could you try: string <- readBin(...) Encoding(string) <- "UTF-8" string <- enc2native

[GitHub] spark pull request: [SPARK-8951][SparkR] support Unicode character...

2015-07-20 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7494#issuecomment-122867464 Yeah, rawToChar() is needed. Then does it work now?

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-07-13 Thread sun-rui
Github user sun-rui closed the pull request at: https://github.com/apache/spark/pull/6743

[GitHub] spark pull request: [SPARK-6797][SPARKR] Add support for YARN clus...

2015-07-13 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/6743#issuecomment-121103339 @shivaram, closing the PR.

[GitHub] spark pull request: [SPARK-8951][SparkR] support Unicode character...

2015-07-20 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7494#issuecomment-123162043 Could you try adding a zero as done previously in writeString(): val utf8 = value.getBytes("UTF-8") val len = utf8.length out.writeInt(len + 1
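
The framing under discussion is a length prefix that counts one extra byte, the UTF-8 payload, then a terminating zero byte, so the reader on the other side sees a NUL-terminated string. A hypothetical Python model of that layout (big-endian length to mirror Java's DataOutputStream.writeInt; this is an illustration, not the actual SparkR SerDe code):

```python
import struct

def write_string(value: str) -> bytes:
    # Length prefix counts the UTF-8 bytes plus the trailing NUL.
    utf8 = value.encode("utf-8")
    return struct.pack(">i", len(utf8) + 1) + utf8 + b"\x00"

def read_string(buf: bytes) -> str:
    # Inverse: read the 4-byte length, decode len - 1 payload bytes, skip the NUL.
    (n,) = struct.unpack_from(">i", buf, 0)
    return buf[4:4 + n - 1].decode("utf-8")

framed = write_string("héllo")
assert framed.endswith(b"\x00")
assert read_string(framed) == "héllo"  # non-ASCII survives the round trip
```

Counting the payload in bytes rather than characters is what makes multi-byte UTF-8 characters round-trip correctly, which is the crux of the Unicode bug this thread is about.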

[GitHub] spark pull request: [SPARK-8844][SPARKR] head/collect is broken in...

2015-07-15 Thread sun-rui
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/7419 [SPARK-8844][SPARKR] head/collect is broken in SparkR. This is a WIP patch for SPARK-8844 for collecting reviews. This bug is about reading an empty DataFrame. In readCol

[GitHub] spark pull request: [SPARK-8808][SPARKR] Fix assignments in SparkR...

2015-07-14 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/7395#issuecomment-121478749 All reported issues of this kind are cleaned up.

[GitHub] spark pull request: [SPARK-10048][SPARKR] Support arbitrary nested...

2015-08-24 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/8276#issuecomment-134443211 The test passed on my machine; I don't know the reason. Anyway, I added Spark context initialization to test_Serde to see if it can pass on Jenkins.

[GitHub] spark pull request: [SPARK-9319][SPARKR] Add support for setting c...

2015-10-24 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9218#issuecomment-150809533 I am not clear, for both coltypes() and coltypes<-(), how to represent complex types as R types. Do you have an idea?

[GitHub] spark pull request: [SPARK-11244][SPARKR] sparkR.stop() should rem...

2015-10-22 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9205#issuecomment-150119806 Is it possible for lintr to skip a # comment with code outside a function body?

[GitHub] spark pull request: [SPARK-10971][SPARKR] RRunner should allow set...

2015-10-21 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9179#issuecomment-150106716 @shivaram, documentation updated.

[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2015-10-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9192#discussion_r42711959 --- Diff: R/pkg/R/SQLContext.R --- @@ -17,6 +17,34 @@ # SQLcontext.R: SQLContext-driven functions +#' Temporary function to reroute old

[GitHub] spark pull request: [SPARK-11209][SPARKR] Add window functions int...

2015-10-21 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9193#issuecomment-150105010 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2015-10-21 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9192#discussion_r42711903 --- Diff: R/pkg/R/SQLContext.R --- @@ -17,6 +17,34 @@ # SQLcontext.R: SQLContext-driven functions +#' Temporary function to reroute old

[GitHub] spark pull request: SPARK-11258 Remove quadratic runtime complexit...

2015-10-22 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9222#discussion_r42733892 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala --- @@ -130,16 +130,17 @@ private[r] object SQLUtils { } def

[GitHub] spark pull request: SPARK-11258 Remove quadratic runtime complexit...

2015-10-22 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9222#issuecomment-150377869 dfToCols is meant to be called from the R side, so it makes sense to test it in R. Having a test case in R for it can test not only the logic in Scala, but also helps to test

[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2015-10-22 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9192#discussion_r42726488 --- Diff: R/pkg/R/SQLContext.R --- @@ -17,6 +17,34 @@ # SQLcontext.R: SQLContext-driven functions +#' Temporary function to reroute old

[GitHub] spark pull request: SPARK-11258 Remove quadratic runtime complexit...

2015-10-22 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9222#issuecomment-150166755 Instead of creating a new testsuite in Scala, you can add a new test case in R, using callJStatic to invoke "dfToCols" on the Scala side.
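The suggestion above could look roughly like the following testthat case (a sketch only: it assumes SparkR's internal `callJStatic(class, method, ...)` helper, an existing `sqlContext`, and that `df@sdf` holds the Java-side DataFrame reference; the expected column count matches the two-column input):

```r
library(testthat)

test_that("dfToCols flattens a DataFrame into a list of columns", {
  # Two-column local data.frame converted to a Spark DataFrame.
  df <- createDataFrame(sqlContext, data.frame(a = 1:3, b = c("x", "y", "z")))

  # Invoke the Scala-side helper directly through SparkR's JVM bridge,
  # passing the reference to the underlying Java DataFrame object.
  cols <- SparkR:::callJStatic("org.apache.spark.sql.api.r.SQLUtils",
                               "dfToCols", df@sdf)

  # One entry per DataFrame column.
  expect_equal(length(cols), 2)
})
```

Because the helper is exercised end-to-end through the serialization layer, a test like this covers both the Scala logic and the R-to-JVM round trip, which is the argument made in the follow-up comment.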

[GitHub] spark pull request: [SPARKR] [SPARK-11199] Improve R context manag...

2015-10-22 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9185#issuecomment-150162723 On the R side, there is a cache of created SQLContext/HiveContext, so R won't call createSQLContext() a second time. See https://github.com/apache/spark/blob/master/R/pkg
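The caching described above amounts to memoizing the JVM-side context in a package environment. A minimal sketch of the pattern (names like `getOrCreateSQLContext` and `.sparkREnv` are illustrative, not the actual SparkR identifiers):

```r
# Package-level environment used as the cache.
.sparkREnv <- new.env()

getOrCreateSQLContext <- function(jsc) {
  if (exists(".sparkRSQLsc", envir = .sparkREnv)) {
    # Return the cached context instead of creating a second one.
    get(".sparkRSQLsc", envir = .sparkREnv)
  } else {
    # First call: create the context on the JVM side and cache the ref.
    sqlContext <- SparkR:::callJStatic("org.apache.spark.sql.api.r.SQLUtils",
                                       "createSQLContext", jsc)
    assign(".sparkRSQLsc", sqlContext, envir = .sparkREnv)
    sqlContext
  }
}
```

With this shape, repeated initialization calls from user code all resolve to the same JVM object, which is why the R side never triggers createSQLContext() twice.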

[GitHub] spark pull request: [SPARK-9319][SPARKR] Add support for setting c...

2015-10-22 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9218#issuecomment-150164199 Could we support both names() and colnames()?
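Supporting both accessors could mean having each delegate to the same underlying implementation so they can never disagree. A sketch under the assumption that `columns()` is the existing SparkR API and that a generic for `colnames` (not generic in base R) has to be declared first:

```r
# names() is already an S4-dispatchable generic; delegate to columns().
setMethod("names", signature(x = "DataFrame"), function(x) columns(x))

# colnames() is a plain base-R function, so declare a generic before
# adding a DataFrame method that delegates the same way.
setGeneric("colnames", function(x, ...) standardGeneric("colnames"))
setMethod("colnames", signature(x = "DataFrame"), function(x) columns(x))
```

Keeping one source of truth (`columns()`) means a later change to column handling automatically propagates to both user-facing spellings.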

[GitHub] spark pull request: [SPARK-10903] [SPARKR] R - Simplify SQLContext...

2015-10-22 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9192#discussion_r42726208 --- Diff: R/pkg/R/jobj.R --- @@ -77,6 +77,11 @@ print.jobj <- function(x, ...) { cat("Java ref type", name, "id

[GitHub] spark pull request: SPARK-11258 Converting a Spark DataFrame into ...

2015-10-26 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9222#issuecomment-151060081 @FRosner, I am not denying a unit test :). My point is that it seems unnecessary to add a Scala unit test case for dfToCols, which is dedicated to SparkR now and its logic

[GitHub] spark pull request: [SPARK-11209][SPARKR] Add window functions int...

2015-10-26 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9193#issuecomment-151055969 @shivaram, [R lag function](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/lag.html) is for time series objects, which is irrelevant to the lead/lag here
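The distinction drawn above can be seen directly in base R: `stats::lag()` shifts the time base of a time-series object rather than returning previous-row values the way the SQL window function does. A small illustration:

```r
# Base R: lag() on a ts object shifts the time axis; the values are
# unchanged, they are just observed k time units earlier.
x <- ts(1:5)
stats::lag(x, k = 1)

# The SQL window function lag(), by contrast, returns the value of the
# previous row within a partition/ordering -- a per-row lookup, not a
# time-base shift. In SparkR terms (sketch, assuming the window-function
# API under discussion in this PR):
#   over(lag(df$value, 1), windowSpec)
```

Since the two operate on different kinds of objects (ts vs. DataFrame columns), S4 dispatch keeps them from colliding in practice even when the name is masked.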

[GitHub] spark pull request: [SPARK-9319][SPARKR] Add support for setting c...

2015-10-27 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9218#issuecomment-151399054 @felixcheung, type inferring works for complex types in createDataFrame(). You can refer to the test case for "create DataFrame with complex types" in test_

[GitHub] spark pull request: [SPARK-11340][SPARKR] Support setting driver p...

2015-10-27 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9290#discussion_r43089490 --- Diff: R/pkg/R/sparkR.R --- @@ -123,16 +123,30 @@ sparkR.init <- function( uriSep <- "" }

[GitHub] spark pull request: [SPARK-11209][SPARKR] Add window functions int...

2015-10-26 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9193#issuecomment-151350204 Yes, the lag will be masked. Just as discussed before, this is sometimes allowed, as I assume lag is not so commonly and frequently used ("dplyr" masks

[GitHub] spark pull request: [SPARK-11209][SPARKR] Add window functions int...

2015-10-26 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9193#issuecomment-151375443 @shivaram, for your suggestion that "we should make a list of functions that we mask and are incompatible", I submitted https://issues.apache.org/jira/br

[GitHub] spark pull request: [SPARK-11210][SPARKR][WIP] Add window function...

2015-10-26 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9196#issuecomment-151376631 Rebased to master

[GitHub] spark pull request: [SPARK-11210][SPARKR] Add window functions int...

2015-10-28 Thread sun-rui
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/9196#discussion_r43243744 --- Diff: R/pkg/R/functions.R --- @@ -2111,3 +2133,66 @@ setMethod("ntile", jc <- callJStatic("org.apache.spark.sql.

[GitHub] spark pull request: SPARK-11258 Converting a Spark DataFrame into ...

2015-10-24 Thread sun-rui
Github user sun-rui commented on the pull request: https://github.com/apache/spark/pull/9222#issuecomment-150803696 @FRosner, dfToCols is not a public API, and is now only a helper function for SparkR. Could you add a private modifier for it?
