[GitHub] spark pull request: [FLAKY-TEST-FIX][STREAMING][TEST] Make sure St...
GitHub user tdas opened a pull request: https://github.com/apache/spark/pull/10124 [FLAKY-TEST-FIX][STREAMING][TEST] Make sure StreamingContexts are shutdown after test You can merge this pull request into a Git repository by running: $ git pull https://github.com/tdas/spark InputStreamSuite-flaky-test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10124.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10124 commit a66723e5451b9e001e578fb1bbc56aeeea9ba439 Author: Tathagata Das Date: 2015-12-03T07:58:11Z flaky test fix --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10263] [ML] Add @Since annotation to ml...
Github user taishi-oss commented on the pull request: https://github.com/apache/spark/pull/8935#issuecomment-161544658

I have some questions about Pipeline.scala.

1. For example, Pipeline.fit was introduced in v1.2.0, but its signature changed in v1.4.0:
```
v1.2.0: def fit(dataset: SchemaRDD, paramMap: ParamMap): PipelineModel
v1.4.0: fit(dataset: DataFrame): PipelineModel
```
Which version should I use? There are several methods whose signatures changed. For now, I chose the older version.

2. The public variable "stages" of "class PipelineModel" was private until v1.4.0 and has been public since v1.4.0.
```
hiro [spark] (master) > git show v1.2.0:mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala | grep "val stages: Array\[Transformer\]"
private[ml] val stages: Array[Transformer])
hiro [spark] (master) > git show v1.3.0:mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala | grep "val stages: Array\[Transformer\]"
private[ml] val stages: Array[Transformer])
hiro [spark] (master) > git show v1.4.0:mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala | grep "val stages: Array\[Transformer\]"
val stages: Array[Transformer])
```
I chose v1.4.0, but I think v1.2.0 could also be correct. Which should I choose?
[GitHub] spark pull request: [SPARK-12115] [SparkR] Change numPartitions() ...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10123#issuecomment-161544537 @srowen I think we agree: provide both ```getNumPartitions``` and ```numPartitions``` on the SparkR side, and mark ```numPartitions``` as deprecated.
[GitHub] spark pull request: [SPARK-10263] [ML] Add @Since annotation to ml...
Github user taishi-oss commented on the pull request: https://github.com/apache/spark/pull/8935#issuecomment-161544578 @yu-iskw Sorry for the delay. I fixed my mistakes and added annotations to all public classes, objects, methods, and variables.
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Deprecate the JAVA-spe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161543721 **[Test build #47125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47125/consoleFull)** for PR 10092 at commit [`014a3a8`](https://github.com/apache/spark/commit/014a3a8f31958bf1337a0c8df293fe15ac54cd9f).
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Deprecate the JAVA-spe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161543749 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Deprecate the JAVA-spe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161543752 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47124/ Test FAILed.
[GitHub] spark pull request: [SPARK-12115] [SparkR] Change numPartitions() ...
Github user yanboliang commented on the pull request: https://github.com/apache/spark/pull/10123#issuecomment-161543557 @felixcheung Thanks for your comments. I understand that it's not an exposed API, but I think a consistent function name is necessary, especially if we want to expose the RDD API someday. Another solution is to add ```getNumPartitions``` as an alias of ```numPartitions```, which will not cause a breaking change, and we can expose ```getNumPartitions``` when we want to expose the RDD API. Looking forward to other members' comments.
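The alias-plus-deprecation idea discussed in this thread can be illustrated with a small sketch. It is written in Python rather than R, and it is not SparkR code: the `PartitionedData` class and both method names are hypothetical stand-ins, chosen only to show how an old name can keep working while steering callers toward the new one.

```python
import warnings


class PartitionedData:
    """Hypothetical stand-in for an RDD-like object (not a Spark class)."""

    def __init__(self, partitions):
        self._partitions = list(partitions)

    def get_num_partitions(self):
        """Preferred name, mirroring getNumPartitions in the other languages."""
        return len(self._partitions)

    def num_partitions(self):
        """Old name, kept as a deprecated alias so existing callers still work."""
        warnings.warn(
            "num_partitions is deprecated; use get_num_partitions instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return self.get_num_partitions()
```

Existing callers of the old name keep getting the same result and only see a warning, which is why this route avoids a breaking change.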
[GitHub] spark pull request: [SPARK-12115] [SparkR] Change numPartitions() ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10123#issuecomment-161543452 Yeah, it's unfortunate: until a change 3 days ago, we had 3 different method names in 4 languages for this simple function. Now everything but R uses `getNumPartitions`. It's not worth breaking something, but it may be worth adding a method and deprecating the old one.
[GitHub] spark pull request: [SPARK-12084][Core]Fix codes that uses ByteBuf...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10083#issuecomment-161543004 **[Test build #47126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47126/consoleFull)** for PR 10083 at commit [`81d1812`](https://github.com/apache/spark/commit/81d18120bff0a772a566ddfe19e439f309b5d5df).
[GitHub] spark pull request: [SPARK-12044] [SparkR] Fix usage of isnan, isn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10037#issuecomment-161542798 **[Test build #47127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47127/consoleFull)** for PR 10037 at commit [`3ee7d5c`](https://github.com/apache/spark/commit/3ee7d5c37a0b3815c2ff139964775d23e593837b).
[GitHub] spark pull request: [SPARK-10263] [ML] Add @Since annotation to ml...
Github user taishi-oss commented on a diff in the pull request: https://github.com/apache/spark/pull/8935#discussion_r46520868

--- Diff: mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala ---
@@ -82,8 +82,11 @@ abstract class PipelineStage extends Params with Logging {
* an identity transformer.
*/
@Experimental
-class Pipeline(override val uid: String) extends Estimator[PipelineModel] {
+@Since("1.5.0")
+class Pipeline(
+@Since("1.5.0") override val uid: String) extends Estimator[PipelineModel] {
+ @Since("1.5.0")
--- End diff --

"git show" says the auxiliary constructor of Pipeline ("def this") was introduced in v1.4.0: the grep finds nothing in v1.2.0 or v1.3.0.
```
hiro [spark] (master) > git show v1.2.0:mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala | grep "def this"
hiro [spark] (master) > git show v1.3.0:mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala | grep "def this"
hiro [spark] (master) > git show v1.4.0:mllib/src/main/scala/org/apache/spark/ml/Pipeline.scala | grep "def this"
def this() = this(Identifiable.randomUID("pipeline"))
```
[GitHub] spark pull request: [SPARK-12084][Core]Fix codes that uses ByteBuf...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10083#issuecomment-161542010 retest this please
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10092#discussion_r46520645

--- Diff: python/pyspark/storagelevel.py ---
@@ -49,12 +51,8 @@ def __str__(self):
StorageLevel.DISK_ONLY = StorageLevel(True, False, False, False)
StorageLevel.DISK_ONLY_2 = StorageLevel(True, False, False, False, 2)
-StorageLevel.MEMORY_ONLY = StorageLevel(False, True, False, True)
-StorageLevel.MEMORY_ONLY_2 = StorageLevel(False, True, False, True, 2)
-StorageLevel.MEMORY_ONLY_SER = StorageLevel(False, True, False, False)
--- End diff --

Agreed! I just updated the code with deprecation notes, trying to follow the existing PySpark style. Please check whether they look good. :) I'm not sure if this will be merged into 1.6; the note still says 1.6. Thank you!
[GitHub] spark pull request: [SPARK-12044] [SparkR] Fix usage of isnan, isn...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10037#issuecomment-161541160 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47123/ Test FAILed.
[GitHub] spark pull request: [SPARK-12044] [SparkR] Fix usage of isnan, isn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10037#issuecomment-161541156 **[Test build #47123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47123/consoleFull)** for PR 10037 at commit [`95fdd2c`](https://github.com/apache/spark/commit/95fdd2c5cfe1cf1d5e44c6de677fc52cc361ce19). * This patch **fails R style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12044] [SparkR] Fix usage of isnan, isn...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10037#issuecomment-161541158 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12044] [SparkR] Fix usage of isnan, isn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10037#issuecomment-161540871 **[Test build #47123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47123/consoleFull)** for PR 10037 at commit [`95fdd2c`](https://github.com/apache/spark/commit/95fdd2c5cfe1cf1d5e44c6de677fc52cc361ce19).
[GitHub] spark pull request: [SPARK-12016][MLlib][PySpark] Wrap Word2VecMod...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10100#issuecomment-161539791 **[Test build #47122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47122/consoleFull)** for PR 10100 at commit [`56c250e`](https://github.com/apache/spark/commit/56c250e630a2fdc16809101a34a7eaa8b94e1a9e).
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10120#issuecomment-161537145 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10120#issuecomment-161537147 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47117/ Test PASSed.
[GitHub] spark pull request: [SPARK-12115] [SparkR] Change numPartitions() ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10123#issuecomment-161537094 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12115] [SparkR] Change numPartitions() ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10123#issuecomment-161537095 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47121/ Test PASSed.
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10120#issuecomment-161537070 **[Test build #47117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47117/consoleFull)** for PR 10120 at commit [`5d23a6a`](https://github.com/apache/spark/commit/5d23a6a77d0b5c1259190c1d27f4138ba1a4938d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12115] [SparkR] Change numPartitions() ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10123#issuecomment-161537019 **[Test build #47121 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47121/consoleFull)** for PR 10123 at commit [`6870073`](https://github.com/apache/spark/commit/6870073c860dae6806531c949bec040e61ae67d6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11691][SQL] Allow to specify compressio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9657#issuecomment-161536165 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47118/ Test PASSed.
[GitHub] spark pull request: [SPARK-11691][SQL] Allow to specify compressio...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9657#issuecomment-161536163 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12115] [SparkR] Change numPartitions() ...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10123#issuecomment-161536076 This is actually not exported from SparkR. Since it was first integrated in Spark 1.4, SparkR has exported a smaller/different set of APIs. You can see this in https://github.com/apache/spark/blob/master/R/pkg/NAMESPACE While it is possible to access this with SparkR:::numPartitions(), it has been available since Spark 1.4, so this rename would actually be a breaking change (of an internal API). So I'd vote for no change in SparkR.
[GitHub] spark pull request: [SPARK-11691][SQL] Allow to specify compressio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9657#issuecomment-161536062 **[Test build #47118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47118/consoleFull)** for PR 9657 at commit [`657ba5a`](https://github.com/apache/spark/commit/657ba5a537a8880abd90480ee30cc28fe3b521bd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10120#discussion_r46518529

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -42,7 +47,18 @@ public void grow(int neededSize, UnsafeRow row) {
final int length = totalSize() + neededSize;
if (buffer.length < length) {
// This will not happen frequently, because the buffer is re-used.
- final byte[] tmp = new byte[length * 2];
+ final byte[] tmp;
+ try {
+tmp = new byte[length * 2];
+ } catch (NegativeArraySizeException e) {
+String errorMessage =
+ "NegativeArraySizeException is triggered. The current length is " + buffer.length +
+ ". The new length is " + length + " * 2. totalSize is " + totalSize() +
--- End diff --

It might be helpful to catch the exception at, or move the current logging to, https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java#L196 and log more details, including the entire row. Is the whole thing corrupt?
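As background on how `new byte[length * 2]` can raise NegativeArraySizeException: Java `int` arithmetic wraps around silently on overflow, so doubling a large enough `length` produces a negative number. The sketch below emulates that in Python with `ctypes.c_int32`. The function names are hypothetical, and the capped variant is only one possible guard, not what the pull request does (it only adds logging).

```python
import ctypes

INT_MAX = 2**31 - 1  # same value as Java's Integer.MAX_VALUE


def java_int(value):
    """Wrap an integer to 32-bit two's complement, like Java int arithmetic."""
    return ctypes.c_int32(value).value


def doubled_size(length):
    """Emulate the Java expression `length * 2` evaluated on an int."""
    return java_int(length * 2)


def safe_doubled_size(length):
    """Hypothetical guard: double the size but cap it instead of wrapping."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return min(length * 2, INT_MAX)
```

Under this emulation, any `length` of 2**30 or more doubles to a negative value, which is the kind of size that would make the array allocation fail. A `length` that is already negative (for example, if `totalSize() + neededSize` itself overflowed) would fail the same way.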
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10120#discussion_r46518443

--- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
@@ -42,7 +47,18 @@ public void grow(int neededSize, UnsafeRow row) {
final int length = totalSize() + neededSize;
if (buffer.length < length) {
// This will not happen frequently, because the buffer is re-used.
- final byte[] tmp = new byte[length * 2];
+ final byte[] tmp;
+ try {
+tmp = new byte[length * 2];
+ } catch (NegativeArraySizeException e) {
+String errorMessage =
+ "NegativeArraySizeException is triggered. The current length is " + buffer.length +
+ ". The new length is " + length + " * 2. totalSize is " + totalSize() +
--- End diff --

Sorry, I read that too fast.
[GitHub] spark pull request: [SPARK-12016][MLlib][PySpark] Wrap Word2VecMod...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10100#issuecomment-161535500 It makes sense. Thanks. I will update this later.
[GitHub] spark pull request: [SPARK-12115] [SparkR] Change numPartitions() ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10123#issuecomment-161534084 **[Test build #47121 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47121/consoleFull)** for PR 10123 at commit [`6870073`](https://github.com/apache/spark/commit/6870073c860dae6806531c949bec040e61ae67d6).
[GitHub] spark pull request: Join nondeterministic
Github user zhonghaihua commented on the pull request: https://github.com/apache/spark/pull/10122#issuecomment-161533958 I am sorry for creating this pull request; this PR is not on the right branch. I will close it right now. This was my mistake, and I am very sorry for the trouble.
[GitHub] spark pull request: Join nondeterministic
Github user zhonghaihua closed the pull request at: https://github.com/apache/spark/pull/10122
[GitHub] spark pull request: [SPARK-12115] [SparkR] Change numPartitions() ...
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/10123 [SPARK-12115] [SparkR] Change numPartitions() to getNumPartitions() to be consistent with Scala/Python Change ```numPartitions()``` to ```getNumPartitions()``` to be consistent with Scala/Python. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yanboliang/spark spark-12115 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10123.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10123 commit 6870073c860dae6806531c949bec040e61ae67d6 Author: Yanbo Liang Date: 2015-12-03T06:41:23Z Change numPartitions() to getNumPartitions() to be consistent with Scala/Python
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10120#discussion_r46515884

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java ---
    @@ -42,7 +47,18 @@ public void grow(int neededSize, UnsafeRow row) {
         final int length = totalSize() + neededSize;
         if (buffer.length < length) {
           // This will not happen frequently, because the buffer is re-used.
    -      final byte[] tmp = new byte[length * 2];
    +      final byte[] tmp;
    +      try {
    +        tmp = new byte[length * 2];
    +      } catch (NegativeArraySizeException e) {
    +        String errorMessage =
    +          "NegativeArraySizeException is triggered. The current length is " + buffer.length +
    +          ". The new length is " + length + " * 2. totalSize is " + totalSize() +
    --- End diff --

    * 2 is inside quotes. Probably not what you meant.
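The failure being logged in this diff is consistent with plain `int` overflow: `length * 2` wraps to a negative value once `length` exceeds about 1.07 billion, and `new byte[negative]` then throws `NegativeArraySizeException`. The following is only a minimal sketch of that arithmetic and of one way to guard it, not the actual `BufferHolder` code. The names `GrowSketch`, `newCapacity`, and `MAX_ARRAY_LENGTH` are hypothetical, and the `Integer.MAX_VALUE - 8` cap is an assumed safety margin rather than anything stated in the PR.

```java
// Hypothetical illustration; not the actual BufferHolder implementation.
final class GrowSketch {
    // Assumed upper bound for array lengths; some JVMs reject sizes very close to Integer.MAX_VALUE.
    static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

    // Returns a capacity of at least totalSize + neededSize, doubling when that fits.
    static int newCapacity(int totalSize, int neededSize) {
        if (totalSize < 0 || neededSize < 0) {
            throw new IllegalArgumentException(
                "sizes must be non-negative: " + totalSize + ", " + neededSize);
        }
        // Widen to long before adding so the sum itself cannot wrap around.
        long length = (long) totalSize + neededSize;
        if (length > MAX_ARRAY_LENGTH) {
            throw new UnsupportedOperationException(
                "cannot grow buffer to " + length + " bytes");
        }
        // Double in long arithmetic, then clamp to the assumed maximum array length.
        return (int) Math.min(length * 2, (long) MAX_ARRAY_LENGTH);
    }

    public static void main(String[] args) {
        int length = 1_200_000_000;
        // int multiplication wraps around, so this prints a negative number.
        System.out.println(length * 2);
        // The guarded version clamps instead of going negative.
        System.out.println(newCapacity(1_000_000_000, 200_000_000));
    }
}
```

Doing the arithmetic in `long` and clamping keeps the doubling strategy for ordinary sizes while turning the overflow case into either a capped allocation or an explicit error.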
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
Github user nongli commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10120#discussion_r46515840

    --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java ---
    @@ -445,7 +450,13 @@ public UTF8String getUTF8String(int ordinal) {
         final long offsetAndSize = getLong(ordinal);
         final int offset = (int) (offsetAndSize >> 32);
         final int size = (int) offsetAndSize;
    -    return UTF8String.fromAddress(baseObject, baseOffset + offset, size);
    +    final UTF8String str = UTF8String.fromAddress(baseObject, baseOffset + offset, size);
    --- End diff --

    Also check size >= 0.
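The diff above decodes a 64-bit field into an offset (upper 32 bits) and a size (lower 32 bits). A rough sketch of that decoding, together with the kind of sanity check the review asks for, might look like the following. `OffsetAndSizeSketch`, `pack`, `validate`, and the `regionLength` parameter are hypothetical names introduced for illustration; only the two cast expressions mirror the quoted code.

```java
// Hypothetical illustration of the offset/size packing seen in the diff; not Spark code.
final class OffsetAndSizeSketch {
    // Packs offset into the upper 32 bits and size into the lower 32 bits.
    static long pack(int offset, int size) {
        return ((long) offset << 32) | (size & 0xFFFFFFFFL);
    }

    // Same expressions as in the quoted diff.
    static int offsetOf(long offsetAndSize) {
        return (int) (offsetAndSize >> 32);
    }

    static int sizeOf(long offsetAndSize) {
        return (int) offsetAndSize;
    }

    // Rejects fields whose decoded values cannot describe a slice of a region
    // that is regionLength bytes long (regionLength is an assumed input).
    static void validate(long offsetAndSize, int regionLength) {
        int offset = offsetOf(offsetAndSize);
        int size = sizeOf(offsetAndSize);
        if (offset < 0 || size < 0 || (long) offset + size > regionLength) {
            throw new IllegalStateException(
                "corrupt field: offset=" + offset + ", size=" + size
                    + ", regionLength=" + regionLength);
        }
    }

    public static void main(String[] args) {
        long ok = pack(16, 5);
        validate(ok, 64); // passes silently
        System.out.println(offsetOf(ok) + " " + sizeOf(ok));
        try {
            validate(pack(16, -1), 64); // a negative size is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking both decoded values before they are used means a corrupted row surfaces as a descriptive error at the point of decoding instead of a later out-of-bounds read.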
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10120#issuecomment-161532033 **[Test build #47120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47120/consoleFull)** for PR 10120 at commit [`c8fc2ec`](https://github.com/apache/spark/commit/c8fc2ec5aea720c6f3b3553f9efa99bc5b545d70).
[GitHub] spark pull request: [SPARK-12019][SPARKR] Support character vector...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10034#issuecomment-161526172 @shivaram we might want this PR in Spark 1.6 ...
[GitHub] spark pull request: Join nondeterministic
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10122#issuecomment-161531329 Can one of the admins verify this patch?
[GitHub] spark pull request: Join nondeterministic
GitHub user zhonghaihua opened a pull request: https://github.com/apache/spark/pull/10122 Join nondeterministic You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhonghaihua/spark join_nondeterministic Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10122.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10122 commit 6d8ebc801799714d297c83be6935b37e26dc2df7 Author: Xiangrui Meng Date: 2015-08-26T05:35:49Z [SPARK-10243] [MLLIB] update since versions in mllib.tree Same as #8421 but for `mllib.tree`. cc jkbradley Author: Xiangrui Meng Closes #8442 from mengxr/SPARK-10236. (cherry picked from commit fb7e12fe2e14af8de4c206ca8096b2e8113bfddc) Signed-off-by: Xiangrui Meng commit 08d390f457f80ffdc2dfce61ea579d9026047f12 Author: Xiangrui Meng Date: 2015-08-26T05:49:33Z [SPARK-10235] [MLLIB] update since versions in mllib.regression Same as #8421 but for `mllib.regression`. cc freeman-lab dbtsai Author: Xiangrui Meng Closes #8426 from mengxr/SPARK-10235 and squashes the following commits: 6cd28e4 [Xiangrui Meng] update since versions in mllib.regression (cherry picked from commit 4657fa1f37d41dd4c7240a960342b68c7c591f48) Signed-off-by: DB Tsai commit 21a10a86d20ec1a6fea42286b4d2aae9ce7e848d Author: Xiangrui Meng Date: 2015-08-26T06:45:41Z [SPARK-10236] [MLLIB] update since versions in mllib.feature Same as #8421 but for `mllib.feature`. 
cc dbtsai Author: Xiangrui Meng Closes #8449 from mengxr/SPARK-10236.feature and squashes the following commits: 0e8d658 [Xiangrui Meng] remove unnecessary comment ad70b03 [Xiangrui Meng] update since versions in mllib.feature (cherry picked from commit 321d7759691bed9867b1f0470f12eab2faa50aff) Signed-off-by: DB Tsai commit 5220db9e352b5d5eae59cead9478ca0a9f73f16b Author: felixcheung Date: 2015-08-26T06:48:16Z [SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select) Add support for ``` df[df$name == "Smith", c(1,2)] df[df$age %in% c(19, 30), 1:2] ``` shivaram Author: felixcheung Closes #8394 from felixcheung/rsubset. (cherry picked from commit 75d4773aa50e24972c533e8b48697fde586429eb) Signed-off-by: Shivaram Venkataraman commit b0dde36009ce371824ce3e47e60fa0711d7733bb Author: Xiangrui Meng Date: 2015-08-26T18:47:05Z [SPARK-9665] [MLLIB] audit MLlib API annotations I only found `ml.NaiveBayes` missing `Experimental` annotation. This PR doesn't cover Python APIs. cc jkbradley Author: Xiangrui Meng Closes #8452 from mengxr/SPARK-9665. (cherry picked from commit 6519fd06cc8175c9182ef16cf8a37d7f255eb846) Signed-off-by: Joseph K. Bradley commit efbd7af44e855efcbb1fa224e80db24947e2b153 Author: Xiangrui Meng Date: 2015-08-26T21:02:19Z [SPARK-10241] [MLLIB] update since versions in mllib.recommendation Same as #8421 but for `mllib.recommendation`. cc srowen coderxiang Author: Xiangrui Meng Closes #8432 from mengxr/SPARK-10241. (cherry picked from commit 086d4681df3ebfccfc04188262c10482f44553b0) Signed-off-by: Xiangrui Meng commit 0bdb800575ae2872e2655983a1be94dcf2e8c36b Author: Davies Liu Date: 2015-08-26T23:04:44Z [SPARK-10305] [SQL] fix create DataFrame from Python class cc jkbradley Author: Davies Liu Closes #8470 from davies/fix_create_df. 
(cherry picked from commit d41d6c48207159490c1e1d9cc54015725cfa41b2) Signed-off-by: Davies Liu commit cef707d2185ca7e0c5635fabe709d5e26915b5bb Author: Shivaram Venkataraman Date: 2015-08-27T01:13:07Z [SPARK-10308] [SPARKR] Add %in% to the exported namespace I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine. cc yu-iskw Author: Shivaram Venkataraman Closes #8473 from shivaram/in-namespace. (cherry picked from commit ad7f0f160be096c0fdae6e6cf7e3b6ba4a606de7) Signed-off-by: Shivaram Venkataraman commit 04c85a8ecbb8a27628a7d1260c19531d56d764d3 Author: Cheng Lian Date: 2015-08-27T01:14:54Z [SPARK-9424] [SQL] Parquet programming guide updates for 1.5 Author: Cheng Lian Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5. commit 165be9ad176dcd1c431a6338ff86b339d23b6d0e Author: Shivaram Venkataraman Date: 2015-08-27T05:27:31Z [SPARK-10219] [SPARKR] Fix varargsToEnv and add test case cc sun-rui davies Author: Shivaram Venkataraman Closes #8475 from
[GitHub] spark pull request: [SPARK-12032] [SQL] Re-order inner joins to do...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10073#issuecomment-161529960 @marmbrus @nongli @cloud-fan Is this ready to go?
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10092#discussion_r46516167

    --- Diff: python/pyspark/storagelevel.py ---
    @@ -49,12 +51,8 @@ def __str__(self):
     StorageLevel.DISK_ONLY = StorageLevel(True, False, False, False)
     StorageLevel.DISK_ONLY_2 = StorageLevel(True, False, False, False, 2)
    -StorageLevel.MEMORY_ONLY = StorageLevel(False, True, False, True)
    -StorageLevel.MEMORY_ONLY_2 = StorageLevel(False, True, False, True, 2)
    -StorageLevel.MEMORY_ONLY_SER = StorageLevel(False, True, False, False)
    --- End diff --

    Removing these will break backward compatibility. I'd rather deprecate them and explain the difference between Python and Java (say, that records are always serialized in Python).
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161529592 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47119/ Test PASSed.
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161529523 **[Test build #47119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47119/consoleFull)** for PR 10092 at commit [`0e074b6`](https://github.com/apache/spark/commit/0e074b6b7bde1705f04a360273b2915cdc1f383c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161529591 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9026] [SPARK-4514] Modifications to Job...
Github user reggert commented on the pull request: https://github.com/apache/spark/pull/9264#issuecomment-161528329 @zsxwing @JoshRosen I just want to make sure that you guys haven't forgotten about this. I haven't heard anything in a week and a half.
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161524007 **[Test build #47119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47119/consoleFull)** for PR 10092 at commit [`0e074b6`](https://github.com/apache/spark/commit/0e074b6b7bde1705f04a360273b2915cdc1f383c).
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10118#discussion_r46514380

    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -822,21 +822,21 @@ setMethod("collect",
         # Get a column of complex type returns a list.
         # Get a cell from a column of complex type returns a list instead of a vector.
         col <- listCols[[colIndex]]
    -    colName <- dtypes[[colIndex]][[1]]
         if (length(col) <= 0) {
    -      df[[colName]] <- col
    +      df[[colIndex]] <- col
         } else {
           colType <- dtypes[[colIndex]][[2]]
           # Note that "binary" columns behave like complex types.
           if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
             vec <- do.call(c, col)
             stopifnot(class(vec) != "list")
    -        df[[colName]] <- vec
    +        df[[colIndex]] <- vec
           } else {
    -        df[[colName]] <- col
    +        df[[colIndex]] <- col
           }
         }
       }
    +  names(df) <- names(x)
    --- End diff --

    I think the current behavior in 1.6 is actually an unintentional side effect of a recent change in the `collect()` code. Matching the 1.5.x behavior again seems to make sense.
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10118#discussion_r46514455

    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -822,21 +822,21 @@ setMethod("collect",
         # Get a column of complex type returns a list.
         # Get a cell from a column of complex type returns a list instead of a vector.
         col <- listCols[[colIndex]]
    -    colName <- dtypes[[colIndex]][[1]]
         if (length(col) <= 0) {
    -      df[[colName]] <- col
    +      df[[colIndex]] <- col
         } else {
           colType <- dtypes[[colIndex]][[2]]
           # Note that "binary" columns behave like complex types.
           if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
             vec <- do.call(c, col)
             stopifnot(class(vec) != "list")
    -        df[[colName]] <- vec
    +        df[[colIndex]] <- vec
           } else {
    -        df[[colName]] <- col
    +        df[[colIndex]] <- col
           }
         }
       }
    +  names(df) <- names(x)
    --- End diff --

    I tested with Spark 1.4.1 and 1.5.1; both keep the same names instead of making the duplicated names unique, so this PR's behavior is backward-compatible. Actually, it is very easy to make the column names unique, like:
    ```
    names(df) <- make.names(names(x), unique = TRUE)
    ```
    But we need to discuss whether this is the preferred behavior.
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161522366 Based on the comments from @mateiz, the following extra changes were made:
- Renaming MEMORY_ONLY_SER to MEMORY_ONLY
- Renaming MEMORY_ONLY_SER_2 to MEMORY_ONLY_2
- Renaming MEMORY_AND_DISK_SER to MEMORY_AND_DISK
- Renaming MEMORY_AND_DISK_SER_2 to MEMORY_AND_DISK_2

Thanks!
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user falaki commented on the pull request: https://github.com/apache/spark/pull/10118#issuecomment-161522008 Thanks!
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user falaki commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10118#discussion_r46514167

    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -822,21 +822,21 @@ setMethod("collect",
         # Get a column of complex type returns a list.
         # Get a cell from a column of complex type returns a list instead of a vector.
         col <- listCols[[colIndex]]
    -    colName <- dtypes[[colIndex]][[1]]
         if (length(col) <= 0) {
    -      df[[colName]] <- col
    +      df[[colIndex]] <- col
         } else {
           colType <- dtypes[[colIndex]][[2]]
           # Note that "binary" columns behave like complex types.
           if (!is.null(PRIMITIVE_TYPES[[colType]]) && colType != "binary") {
             vec <- do.call(c, col)
             stopifnot(class(vec) != "list")
    -        df[[colName]] <- vec
    +        df[[colIndex]] <- vec
           } else {
    -        df[[colName]] <- col
    +        df[[colIndex]] <- col
           }
         }
       }
    +  names(df) <- names(x)
    --- End diff --

    This is slightly different from 1.5. With this change we will get exactly the same column names in the local data.frame, whereas in Spark 1.5 subsequent instances of the same name have numbers appended. I am not sure which one is better. In fact, I slightly prefer your suggested behavior. But just in case others want to chime in: cc @shivaram
[GitHub] spark pull request: [DOCUMENTATION]fix typo
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10121#issuecomment-161521749 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-11691][SQL] Allow to specify compressio...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9657#issuecomment-161521736 **[Test build #47118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47118/consoleFull)** for PR 9657 at commit [`657ba5a`](https://github.com/apache/spark/commit/657ba5a537a8880abd90480ee30cc28fe3b521bd).
[GitHub] spark pull request: [DOC]fix typo
GitHub user microwishing opened a pull request: https://github.com/apache/spark/pull/10121 [DOC]fix typo This fixes some typos in external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala You can merge this pull request into a Git repository by running: $ git pull https://github.com/microwishing/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10121.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10121 commit 46a049fcdfbba4be3ba499b6c68f8faf2d12a989 Author: microwishing Date: 2015-12-03T02:09:05Z fix typo
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161521581 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161521582 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47116/ Test PASSed.
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161521520 **[Test build #47116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47116/consoleFull)** for PR 10092 at commit [`a6b7dd9`](https://github.com/apache/spark/commit/a6b7dd95722d80fd68a009fbcf331a586a84fb1a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10120#issuecomment-161521319 **[Test build #47117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47117/consoleFull)** for PR 10120 at commit [`5d23a6a`](https://github.com/apache/spark/commit/5d23a6a77d0b5c1259190c1d27f4138ba1a4938d).
[GitHub] spark pull request: [SPARK-12116][SPARKR][DOCS] document how to wo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10119#issuecomment-161520169 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12116][SPARKR][DOCS] document how to wo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10119#issuecomment-161520171 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47115/ Test PASSed.
[GitHub] spark pull request: [SPARK-12116][SPARKR][DOCS] document how to wo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10119#issuecomment-161519905 **[Test build #47115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47115/consoleFull)** for PR 10119 at commit [`2ce01f3`](https://github.com/apache/spark/commit/2ce01f3e4bf7f67d68d09af7a695e259f12178aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11691][SQL] Allow to specify compressio...
Github user zjffdu commented on the pull request: https://github.com/apache/spark/pull/9657#issuecomment-161519077 Never mind, I changed it back to 1.7.0 since 1.6 is already at RC1.
[GitHub] spark pull request: [SQL] [DO NOT MERGE] Try to log some useful th...
GitHub user yhuai opened a pull request: https://github.com/apache/spark/pull/10120 [SQL] [DO NOT MERGE] Try to log some useful things to catch the cause of SPARK-12089. This PR adds logging in several places so that, hopefully, we can catch the cause of https://issues.apache.org/jira/browse/SPARK-12089. You can merge this pull request into a Git repository by running: $ git pull https://github.com/yhuai/spark SPARK-12089-log Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10120.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10120 commit 5d23a6a77d0b5c1259190c1d27f4138ba1a4938d Author: Yin Huai Date: 2015-12-03T05:00:09Z Try to log some useful things.
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161515703 Just saw the comments and will change the names soon. Thanks!
[GitHub] spark pull request: [SPARK-12116][SPARKR][DOCS] document how to wo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10119#issuecomment-161515603 **[Test build #47115 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47115/consoleFull)** for PR 10119 at commit [`2ce01f3`](https://github.com/apache/spark/commit/2ce01f3e4bf7f67d68d09af7a695e259f12178aa).
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161514977
- Removed all the constants whose `deserialized` values are true.
- Updated the comments of StorageLevel.
- Changed the default storage level of Kinesis from `MEMORY_AND_DISK_2` to `MEMORY_AND_DISK_SER_2`.
Please verify that my changes are OK. @mateiz @davies Thank you very much!
[GitHub] spark pull request: [SPARK-12091] [PYSPARK] Removal of the JAVA-sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10092#issuecomment-161514887 **[Test build #47116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47116/consoleFull)** for PR 10092 at commit [`a6b7dd9`](https://github.com/apache/spark/commit/a6b7dd95722d80fd68a009fbcf331a586a84fb1a).
[GitHub] spark pull request: [SPARK-12116][SPARKR][DOCS] document how to wo...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/10119 [SPARK-12116][SPARKR][DOCS] document how to workaround function name conflicts with dplyr @shivaram You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixcheung/spark rdocdplyrmasked Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10119.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10119 commit 2ce01f3e4bf7f67d68d09af7a695e259f12178aa Author: felixcheung Date: 2015-12-03T04:52:02Z add doc
[GitHub] spark pull request: [SPARK-11439][ML] Optimization of creating spa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9756#issuecomment-161513697 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11439][ML] Optimization of creating spa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9756#issuecomment-161513700 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47113/ Test PASSed.
[GitHub] spark pull request: [SPARK-11439][ML] Optimization of creating spa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9756#issuecomment-161513618 **[Test build #47113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47113/consoleFull)** for PR 9756 at commit [`89d84b8`](https://github.com/apache/spark/commit/89d84b845aa41d133b4ff64e02b97e3a5da965bc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10118#issuecomment-161513324 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10118#issuecomment-161513325 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47114/ Test PASSed.
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10118#issuecomment-161513330 looks good
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10118#issuecomment-161513272 **[Test build #47114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47114/consoleFull)** for PR 10118 at commit [`b3c654f`](https://github.com/apache/spark/commit/b3c654f78b650b6b5feb1a4ffe52d015320786cd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10118#issuecomment-161511330 **[Test build #47114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47114/consoleFull)** for PR 10118 at commit [`b3c654f`](https://github.com/apache/spark/commit/b3c654f78b650b6b5feb1a4ffe52d015320786cd).
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/10118#issuecomment-161511220 cc @falaki
[GitHub] spark pull request: [SPARK-12019][SPARKR] Support character vector...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10034#issuecomment-161511097 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47112/ Test PASSed.
[GitHub] spark pull request: [SPARK-12019][SPARKR] Support character vector...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10034#issuecomment-161511094 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12019][SPARKR] Support character vector...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10034#issuecomment-161511041 **[Test build #47112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47112/consoleFull)** for PR 10034 at commit [`bc98d5b`](https://github.com/apache/spark/commit/bc98d5b1c23b5f8e253d51612fc335cff8e8a519). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12104][SPARKR] collect() does not handl...
GitHub user sun-rui opened a pull request: https://github.com/apache/spark/pull/10118 [SPARK-12104][SPARKR] collect() does not handle multiple columns with same name. You can merge this pull request into a Git repository by running: $ git pull https://github.com/sun-rui/spark SPARK-12104 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10118.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10118 commit b3c654f78b650b6b5feb1a4ffe52d015320786cd Author: Sun Rui Date: 2015-12-03T04:12:55Z [SPARK-12104][SPARKR] collect() does not handle multiple columns with same name.
[GitHub] spark pull request: [SPARK-12114][SQL]Bug fix for Column Pruning r...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10117#discussion_r46510956
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -246,6 +247,16 @@ object ColumnPruning extends Rule[LogicalPlan] {
       Project(projectList, Join(pruneJoinChild(left), pruneJoinChild(right), joinType, condition))

+    // Eliminate unneeded attributes from either side of a Join.
+    case Project(projectList, Filter(predicates, Join(left, right, joinType, condition))) =>
--- End diff --
Is this a problem for every operator under `Filter`, e.g. `Project<-Filter<-Join`, `Project<-Filter<-Aggregate`, `Project<-Filter<-Sort`, etc.?
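To make the `Project<-Filter<-Join` case in the comment above concrete, here is a minimal, self-contained sketch of pruning join inputs through an intervening filter. It is not the code from the pull request: `Plan`, `Relation`, `Project`, `Filter`, `Join`, `pruneChild` and `prune` are hypothetical stand-ins that model attributes as plain strings rather than Catalyst expressions.

```scala
// Hypothetical miniature plan model; NOT Spark's Catalyst classes.
object PruneThroughFilterSketch {
  sealed trait Plan { def output: Set[String] }
  final case class Relation(name: String, output: Set[String]) extends Plan
  final case class Project(columns: Set[String], child: Plan) extends Plan {
    def output: Set[String] = columns
  }
  final case class Filter(references: Set[String], child: Plan) extends Plan {
    def output: Set[String] = child.output
  }
  final case class Join(left: Plan, right: Plan, references: Set[String]) extends Plan {
    def output: Set[String] = left.output ++ right.output
  }

  // Wrap a child in a Project only when it produces columns nobody above needs.
  def pruneChild(child: Plan, required: Set[String]): Plan =
    if ((child.output -- required).nonEmpty) Project(child.output.intersect(required), child)
    else child

  // A column must survive the join if the projection, the filter,
  // or the join condition itself refers to it.
  def prune(plan: Plan): Plan = plan match {
    case Project(cols, Filter(filterRefs, Join(l, r, joinRefs))) =>
      val required = cols ++ filterRefs ++ joinRefs
      Project(cols, Filter(filterRefs, Join(pruneChild(l, required), pruneChild(r, required), joinRefs)))
    case other => other
  }
}
```

The sketch only handles a `Join` under the `Filter`; the question raised in the comment is whether the same required-columns reasoning should be applied generically to any operator sitting under a `Filter`.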
[GitHub] spark pull request: [SPARK-11439][ML] Optimization of creating spa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9756#issuecomment-161509507 **[Test build #47113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47113/consoleFull)** for PR 9756 at commit [`89d84b8`](https://github.com/apache/spark/commit/89d84b845aa41d133b4ff64e02b97e3a5da965bc).
[GitHub] spark pull request: [SPARK-12019][SPARKR] Support character vector...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10034#issuecomment-161509399 **[Test build #47112 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47112/consoleFull)** for PR 10034 at commit [`bc98d5b`](https://github.com/apache/spark/commit/bc98d5b1c23b5f8e253d51612fc335cff8e8a519).
[GitHub] spark pull request: [SPARK-12019][SPARKR] Support character vector...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10034#issuecomment-161509279 Updated to support both "abc,def" and c("abc", "def"). We need to remove empty strings ("") anyway, so splitting by ',' is not much more work.
[GitHub] spark pull request: [SPARK-11439][ML] Optimization of creating spa...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/9756#issuecomment-161508966 Also, +1 on having more of the R code in the test comments so it's easier to regenerate the next time we need to.
[GitHub] spark pull request: [SPARK-12044] [SparkR] Fix usage of isnan, isn...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/10037#discussion_r46510599
--- Diff: R/pkg/R/generics.R ---
@@ -796,9 +800,17 @@ setGeneric("initcap", function(x) { standardGeneric("initcap") })
 #' @export
 setGeneric("instr", function(y, x) { standardGeneric("instr") })

-#' @rdname isNaN
+#' @rdname is.nan
 #' @export
-setGeneric("isNaN", function(x) { standardGeneric("isNaN") })
+setGeneric("is.nan")
--- End diff --
Please also add a test for base::is.nan if we are masking it. See https://github.com/yanboliang/spark/blob/spark-12044/R/pkg/inst/tests/test_sparkSQL.R#L931
[GitHub] spark pull request: [SPARK-12044] [SparkR] Fix usage of isnan, isn...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/10037#discussion_r46510538
--- Diff: R/pkg/R/generics.R ---
@@ -623,6 +623,10 @@ setGeneric("getItem", function(x, ...) { standardGeneric("getItem") })

 #' @rdname column
 #' @export
+setGeneric("isNaN", function(x) { standardGeneric("isNaN") })
--- End diff --
on rel note: see https://issues.apache.org/jira/browse/SPARK-11238
[GitHub] spark pull request: [SPARK-11439][ML] Optimization of creating spa...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/9756#issuecomment-161508795 LGTM pending tests. Maybe @mengxr or @srowen, who are two of the more recent committers working in this file, could take a look.
[GitHub] spark pull request: [DOCUMENTATION][MLLIB] typo in mllib doc
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10093#issuecomment-161508543 Merged build finished. Test PASSed.
[GitHub] spark pull request: [DOCUMENTATION][MLLIB] typo in mllib doc
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10093#issuecomment-161508545 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47111/ Test PASSed.
[GitHub] spark pull request: [DOCUMENTATION][MLLIB] typo in mllib doc
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10093#issuecomment-161508473 **[Test build #47111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47111/consoleFull)** for PR 10093 at commit [`024dd5e`](https://github.com/apache/spark/commit/024dd5e98128e0dd23f9317585f6e397794a474a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12032] [SQL] Re-order inner joins to do...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10073#discussion_r46510157
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -133,6 +132,45 @@ object ExtractEquiJoinKeys extends Logging with PredicateHelper {
 }

 /**
+ * A pattern that collects the filter and inner joins.
+ *
+ *          Filter
+ *            |
+ *        inner Join
+ *          /    \            ---->      (Seq(plan0, plan1, plan2), conditions)
+ *      Filter   plan2
+ *        |
+ *  inner join
+ *      /    \
+ *   plan0    plan1
+ *
+ * Note: This pattern currently only works for left-deep trees.
+ */
+object ExtractFiltersAndInnerJoins extends PredicateHelper {
+
+  // flatten all inner joins, which are next to each other
+  def flattenJoin(plan: LogicalPlan): (Seq[LogicalPlan], Seq[Expression]) = plan match {
+    case Join(left, right, Inner, cond) =>
+      val (plans, conditions) = flattenJoin(left)
+      (plans ++ Seq(right), conditions ++ cond.toSeq)
+
+    case Filter(filterCondition, j @ Join(left, right, Inner, joinCondition)) =>
--- End diff --
Maybe just `j @ Join(_, _, Inner, _))`; the `left`, `right` and `joinCondition` bindings are not used.
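As a rough illustration of the flattening discussed above, the following self-contained sketch mirrors the shape of `flattenJoin` on a toy plan model, using the wildcard pattern suggested in the review. The quoted diff is cut off after the `Filter` case, so the body of that case and the fallback case are assumptions, as are the stand-in types (`Plan`, `Relation`, a `Boolean` flag in place of `Inner`, and string conditions).

```scala
// Hypothetical miniature plan model; NOT Spark's Catalyst classes.
object FlattenJoinSketch {
  sealed trait Plan
  final case class Relation(name: String) extends Plan
  final case class Filter(condition: String, child: Plan) extends Plan
  final case class Join(left: Plan, right: Plan, inner: Boolean, condition: Option[String]) extends Plan

  // Flatten a left-deep chain of adjacent inner joins (and the filters between them)
  // into the list of joined plans plus every condition collected on the way.
  def flattenJoin(plan: Plan): (Seq[Plan], Seq[String]) = plan match {
    case Join(left, right, true, cond) =>
      val (plans, conditions) = flattenJoin(left)
      (plans :+ right, conditions ++ cond.toSeq)

    // Only `j` is needed here, so the join's fields are matched with wildcards.
    // Assumed body: recurse into the join and keep the filter condition.
    case Filter(filterCondition, j @ Join(_, _, true, _)) =>
      val (plans, conditions) = flattenJoin(j)
      (plans, conditions :+ filterCondition)

    // Assumed fallback: anything else is a leaf of the flattened join.
    case other => (Seq(other), Seq.empty)
  }

  def main(args: Array[String]): Unit = {
    val inner = Join(Relation("plan0"), Relation("plan1"), inner = true, Some("p0.k = p1.k"))
    val plan = Filter("c > 0", Join(Filter("a = 1", inner), Relation("plan2"), inner = true, None))
    // Yields the plans plan0, plan1, plan2 and the three collected conditions.
    println(flattenJoin(plan))
  }
}
```

Because the recursion only descends into the left child of each join, this version has the same left-deep limitation that the doc comment in the diff calls out.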
[GitHub] spark pull request: [SPARK-11958] [SPARK-11957] [ML] [Doc] SQLTran...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10006#issuecomment-161507440 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47110/ Test PASSed.
[GitHub] spark pull request: [SPARK-11569] [ML] Fix StringIndexer to handle...
Github user holdenk commented on the pull request: https://github.com/apache/spark/pull/9920#issuecomment-161507452 Sorry for my slow reply. Looking at this, it seems you've changed the meaning of handleInvalid: it no longer serves its original purpose (unless I've missed something). This is probably not the best path forward; maybe add something like handleNulls and keep the old handleInvalid? I really like the thoroughness of the tests and I think the logic is pretty solid (it's just that changing the meaning of things in the API is to be avoided).
[GitHub] spark pull request: [SPARK-11958] [SPARK-11957] [ML] [Doc] SQLTran...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10006#issuecomment-161507438 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-11958] [SPARK-11957] [ML] [Doc] SQLTran...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10006#issuecomment-161507351 **[Test build #47110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47110/consoleFull)** for PR 10006 at commit [`ab44e9a`](https://github.com/apache/spark/commit/ab44e9aa6dc3663757b2cbc0f39508b739dac4e8). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: `public class JavaSQLTransformerExample`