[GitHub] spark pull request: [SPARK-5595][SPARK-5603][SQL] Add a rule to do...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4373 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-5595][SPARK-5603][SQL] Add a rule to do...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4373#issuecomment-73308129 Thanks! Merged to master and 1.3
[GitHub] spark pull request: [SPARK-5648][SQL] support "alter ... unset tbl...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4424#issuecomment-73308025 [Test build #26938 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26938/consoleFull) for PR 4424 at commit [`6dd8bee`](https://github.com/apache/spark/commit/6dd8bee76f9dd1d2257fcd8994c2c6554495d478). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5657][Examples][PySpark] Add PySpark Av...
GitHub user staslos opened a pull request: https://github.com/apache/spark/pull/4434 [SPARK-5657][Examples][PySpark] Add PySpark Avro Output Format example There is an Avro Input Format example that shows how to read Avro data in PySpark, but nothing shows how to write from PySpark to Avro. The main challenge is that a Converter needs an Avro schema to build a record, but the current Spark API doesn't provide a way to supply extra parameters to custom converters. A workaround is possible, and this example provides one. https://issues.apache.org/jira/browse/SPARK-5657 You can merge this pull request into a Git repository by running: $ git pull https://github.com/staslos/spark PySpark_Avro_Output_Format_example_Spark_1.3.0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4434.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4434 commit ef026be7981c6d892e2d2e35e8b100c9def2dd6a Author: Stanislav Los Date: 2015-02-06T20:33:59Z SPARK-5657 Add PySpark Avro Output Format example
[GitHub] spark pull request: [SPARK-5324][SQL] Results of describe can't be...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4249#issuecomment-73307633 Thanks! Merged to master and 1.3
[GitHub] spark pull request: [SPARK-5324][SQL] Results of describe can't be...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4249
[GitHub] spark pull request: [SPARK-5648][SQL] support "alter ... unset tbl...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4424#issuecomment-73307280 ok to test
[GitHub] spark pull request: [SPARK-5628] Add version option to spark-ec2
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4414#issuecomment-73307326 Btw, did you mean 1.2.1?
[GitHub] spark pull request: [SPARK-5651][SQL] Support 'create db.table' in...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4427#issuecomment-73307227 I don't think this is valid. You use backticks to escape cases where you have invalid characters like `.` in your identifiers.
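The point about backticks can be illustrated with a small, self-contained Python sketch. It is not the Spark SQL or Hive parser; `split_identifier` is a hypothetical helper written only to show the convention described in the comment: an unquoted `.` separates name parts, while a `.` inside backticks is part of a single identifier.

```python
import re
from typing import List

# One name part: either a backtick-quoted run (may contain '.') or an
# unquoted run that contains neither '.' nor a backtick.
_PART = re.compile(r"`([^`]+)`|([^.`]+)")


def split_identifier(name: str) -> List[str]:
    """Split a possibly qualified name into its parts (illustrative only)."""
    parts = []
    pos = 0
    while True:
        match = _PART.match(name, pos)
        if match is None:
            raise ValueError("malformed identifier: %r" % name)
        quoted, plain = match.group(1), match.group(2)
        parts.append(quoted if quoted is not None else plain)
        pos = match.end()
        if pos == len(name):
            return parts
        if name[pos] != ".":
            raise ValueError("malformed identifier: %r" % name)
        pos += 1  # skip the separating dot and parse the next part


if __name__ == "__main__":
    print(split_identifier("db.table"))       # two parts: database and table
    print(split_identifier("`db.table`"))     # one part whose name contains a dot
    print(split_identifier("db.`my.table`"))  # database plus a dotted table name
```

Under this convention, `db.table` in backticks names a single identifier that contains a dot rather than a table qualified by a database, which appears to be the objection raised in the comment.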
[GitHub] spark pull request: [SPARK-5628] Add version option to spark-ec2
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4414#issuecomment-73307181 Thanks for the review @JoshRosen. I'll tag the JIRA issue for backport.
[GitHub] spark pull request: [SPARK-5619][SQL] Support 'show roles' in Hive...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4397
[GitHub] spark pull request: [SPARK-5619][SQL] Support 'show roles' in Hive...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4397#issuecomment-73306881 Thanks! Merged to master.
[GitHub] spark pull request: [SPARK-5619][SQL] Support 'show roles' in Hive...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4397#issuecomment-73306120 [Test build #26928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26928/consoleFull) for PR 4397 at commit [`f819b6c`](https://github.com/apache/spark/commit/f819b6c5a5b21ae19529f674a8f2ce960f43c2b1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5619][SQL] Support 'show roles' in Hive...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4397#issuecomment-73306129 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26928/ Test PASSed.
[GitHub] spark pull request: [SPARK-4983]insert waiting time before tagging...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3986#issuecomment-73305908 [Test build #26927 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26927/consoleFull) for PR 3986 at commit [`13e257d`](https://github.com/apache/spark/commit/13e257d94a05c5e48bb1b2f5f6c8e2da195731a2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4905][STREAMING] FlumeStreamSuite fix.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4371#issuecomment-73305923 [Test build #26937 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26937/consoleFull) for PR 4371 at commit [`af3ba14`](https://github.com/apache/spark/commit/af3ba14ffd8bb506c3ffbcc34d709bc395e8b61b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4983]insert waiting time before tagging...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3986#issuecomment-73305913 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26927/ Test PASSed.
[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4188#issuecomment-73305737 [Test build #26931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26931/consoleFull) for PR 4188 at commit [`8e91cc3`](https://github.com/apache/spark/commit/8e91cc387548b0f59b4ce9e1ff7b108110b190ba). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3233#issuecomment-73305769 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26932/ Test FAILed.
[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3233#issuecomment-73305760 [Test build #26932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26932/consoleFull) for PR 3233 at commit [`3f768e3`](https://github.com/apache/spark/commit/3f768e31e9d454522c6bb71be90259fadf4a7071). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4188#issuecomment-73305744 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26931/ Test FAILed.
[GitHub] spark pull request: [SPARK-4905][STREAMING] FlumeStreamSuite fix.
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/4371#issuecomment-73305408 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-5601][MLLIB] make streaming linear algo...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4432#issuecomment-73305358 Looks good, but not too familiar with this class
[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/4405#issuecomment-73305341 In this case, you are only running SparkSubmit as the proxy user. Should we not have the executor code also run as the proxy user, so that any writes from the app to HDFS show the proxy user? Or is that not the intent?
[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4405#issuecomment-73305254 [Test build #26930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26930/consoleFull) for PR 4405 at commit [`b6c947d`](https://github.com/apache/spark/commit/b6c947df7131b88455380115088ef7bf336a17f3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` "public class " + className + extendsText + " implements java.io.Serializable ` * ` case class RegisterExecutor(`
[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4405#issuecomment-73305261 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26930/ Test FAILed.
[GitHub] spark pull request: [SPARK-5601][MLLIB] make streaming linear algo...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4432#discussion_r24268769
--- Diff: mllib/src/test/java/org/apache/spark/mllib/regression/JavaStreamingLinearRegressionSuite.java ---
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.regression;
+
+import java.io.Serializable;
+import java.util.List;
+
+import scala.Tuple2;
+
+import com.google.common.collect.Lists;
+import static org.apache.spark.streaming.JavaTestUtils.*;
--- End diff --
order
[GitHub] spark pull request: SPARK-5633 pyspark saveAsTextFile support for ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4403#issuecomment-73305089 LGTM pending Jenkins; thanks!
[GitHub] spark pull request: [SPARK-5640] Synchronize ScalaReflection where...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4431
[GitHub] spark pull request: [SPARK-5640] Synchronize ScalaReflection where...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4431#issuecomment-73304864 Thanks! Merged to master and 1.3
[GitHub] spark pull request: [SPARK-5650][SQL] Support optional 'FROM' clau...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4426
[GitHub] spark pull request: [SPARK-5650][SQL] Support optional 'FROM' clau...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4426#issuecomment-73304622 Thanks! Merged to master and 1.3
[GitHub] spark pull request: [SPARK-5366][EC2] Check the mode of private ke...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4162#issuecomment-73304383 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26924/ Test PASSed.
[GitHub] spark pull request: [SPARK-5628] Add version option to spark-ec2
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4414#issuecomment-73304352 Actually, I'm going to hold off on the `branch-1.2` (1.2.2) commit for now, since there's a bit of divergence in that branch and I don't want to break anything.
[GitHub] spark pull request: [SPARK-5366][EC2] Check the mode of private ke...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4162#issuecomment-73304375 [Test build #26924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26924/consoleFull) for PR 4162 at commit [`01ed464`](https://github.com/apache/spark/commit/01ed46488f04f463b45f483bdd3517d135d23e52). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5628] Add version option to spark-ec2
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4414
[GitHub] spark pull request: [SPARK-5628] Add version option to spark-ec2
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4414#issuecomment-73303739 LGTM. I'm going to pull this into `master` (1.4.0) and `branch-1.3` (1.3.0). I'll also commit it to `branch-1.2` (1.2.2), but for that I'll update the version number to match the existing number used in those branches.
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4067#issuecomment-73303658 [Test build #26936 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26936/consoleFull) for PR 4067 at commit [`bd919be`](https://github.com/apache/spark/commit/bd919be5817e29dad476213a0b3b407d28ee0f24). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4808] Configurable spillable memory thr...
Github user mingyukim commented on the pull request: https://github.com/apache/spark/pull/4420#issuecomment-73303667 Can you elaborate on the "memory size as an additional heuristic" idea? This is consistently causing OOMs in one of our workflows, which is exactly what spilling to disk is supposed to handle. I'm happy to work on it on my end if you have suggestions. A few ideas off the top of my head:
- Put a threshold on the {currentMemory - myMemoryThreshold} value, so that it tries to spill once the difference gets big enough.
- In fact, why not remove the entire threshold check, as originally suggested in #3656? You can tweak how often the spill is done by setting a minimum on the amount of memory you request from ShuffleMemoryManager. Then you're guaranteed that the spill files are not too small. You still get too many files? Well, that's unavoidable. Your shuffle is really big, so you'd have to spill a lot. Otherwise, your JVM will OOM.
Basically, I don't think trackMemoryThreshold and trackMemoryFrequency are the right way to control your spill frequency or spill file size, since they can lead to OOMs when each element is large.
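The two suggestions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Spark's Scala spilling code: `should_spill_on_gap`, `request_or_spill`, `max_gap`, `free_memory`, and `min_request` are hypothetical names, and the plain integer `free_memory` stands in for whatever the shuffle memory manager would be able to grant.

```python
from typing import Tuple


def should_spill_on_gap(current_memory: int, my_memory_threshold: int,
                        max_gap: int) -> bool:
    """First idea: spill once usage exceeds the granted threshold by more
    than max_gap bytes, regardless of how many elements were inserted."""
    return current_memory - my_memory_threshold > max_gap


def request_or_spill(current_memory: int, my_memory_threshold: int,
                     free_memory: int, min_request: int) -> Tuple[bool, int]:
    """Second idea: no element-count check at all.

    When usage reaches the threshold, ask for at least min_request bytes.
    If the (hypothetical) pool cannot satisfy the request, spill; otherwise
    raise the threshold by the granted amount.
    Returns (spill, new_threshold).
    """
    if current_memory < my_memory_threshold:
        return False, my_memory_threshold
    wanted = max(min_request, current_memory - my_memory_threshold)
    if free_memory >= wanted:
        return False, my_memory_threshold + wanted
    return True, my_memory_threshold


if __name__ == "__main__":
    print(should_spill_on_gap(150, 100, max_gap=40))
    print(request_or_spill(120, 100, free_memory=1000, min_request=64))
    print(request_or_spill(120, 100, free_memory=10, min_request=64))
```

In this sketch the minimum request size is what bounds how often a spill can happen, so neither function depends on element counts, which is the property the comment argues for when individual elements are large.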
[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24268035 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs => throw new SparkException(errs.mkString("\n")), ok => ok ) -new KafkaRDD[K, V, U, T, (K, V)](sc, kafkaParams, offsetRanges, leaders, messageHandler) +new KafkaRDD[K, V, KD, VD, (K, V)](sc, kafkaParams, offsetRanges, leaders, messageHandler) } - /** A batch-oriented interface for consuming from Kafka. - * Starting and ending offsets are specified in advance, - * so that you can control exactly-once semantics. + /** + * :: Experimental :: + * Create a RDD from Kafka using offset ranges for each topic and partition. This allows you + * specify the Kafka leader to connect to (to optimize fetching) and access the message as well + * as the metadata. + * * @param sc SparkContext object * @param kafkaParams Kafka http://kafka.apache.org/documentation.html#configuration";> - * configuration parameters. - * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), - * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + *configuration parameters. Requires "metadata.broker.list" or "bootstrap.servers" + *to be set with Kafka broker(s) (NOT zookeeper servers) specified in + *host1:port1,host2:port2 form. 
* @param offsetRanges Each OffsetRange in the batch corresponds to a * range of offsets for a given Kafka topic/partition * @param leaders Kafka leaders for each offset range in batch - * @param messageHandler function for translating each message into the desired type + * @param messageHandler function for translating each message and metadata into the desired type */ @Experimental def createRDD[ K: ClassTag, V: ClassTag, -U <: Decoder[_]: ClassTag, -T <: Decoder[_]: ClassTag, -R: ClassTag] ( +KD <: Decoder[K]: ClassTag, +VD <: Decoder[V]: ClassTag, +R: ClassTag]( sc: SparkContext, kafkaParams: Map[String, String], offsetRanges: Array[OffsetRange], leaders: Array[Leader], messageHandler: MessageAndMetadata[K, V] => R - ): RDD[R] = { - +): RDD[R] = { val leaderMap = leaders .map(l => TopicAndPartition(l.topic, l.partition) -> (l.host, l.port)) .toMap -new KafkaRDD[K, V, U, T, R](sc, kafkaParams, offsetRanges, leaderMap, messageHandler) +new KafkaRDD[K, V, KD, VD, R](sc, kafkaParams, offsetRanges, leaderMap, messageHandler) } + /** - * This stream can guarantee that each message from Kafka is included in transformations - * (as opposed to output actions) exactly once, even in most failure situations. + * Create a RDD from Kafka using offset ranges for each topic and partition. * - * Points to note: - * - * Failure Recovery - You must checkpoint this stream, or save offsets yourself and provide them - * as the fromOffsets parameter on restart. - * Kafka must have sufficient log retention to obtain messages after failure. - * - * Getting offsets from the stream - see programming guide + * @param jsc JavaSparkContext object + * @param kafkaParams Kafka http://kafka.apache.org/documentation.html#configuration";> + *configuration parameters. Requires "metadata.broker.list" or "bootstrap.servers" + *to be set with Kafka broker(s) (NOT zookeeper servers) specified in + *host1:port1,host2:port2 form. 
+ * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + */ + @Experimental + def createRDD[K, V, KD <: Decoder[K], VD <: Decoder[V]]( + jsc: JavaSparkContext, + keyClass: Class[K], + valueClass: Class[V], + keyDecoderClass: Class[KD], + valueDecoderClass: Class[VD], + kafkaParams: JMap[String, String], + offsetRanges: Array[OffsetRange] +): JavaPairRDD[K, V] = { +implicit val keyCmt: ClassTag[K] = ClassTag(keyClass) +implicit val valueCmt: ClassTag[V] = ClassTag(valueClass) +implicit val keyDecoderCmt: ClassTag[KD] = ClassTag(keyDecoderClass) +implicit val valueDecoderCmt: ClassTag[VD] = ClassTag(valueDecoderClass) +new JavaPairRDD(createRDD[K, V, KD, VD]( + jsc.sc, Map(kafkaParams.toSeq: _*), offsetRanges)) + } + + /** + * :: Experimental :: + * Create a RDD from Kafka using offset ranges for
[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24268021 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala --- @@ -19,16 +19,35 @@ package org.apache.spark.streaming.kafka import kafka.common.TopicAndPartition -/** Something that has a collection of OffsetRanges */ +import org.apache.spark.annotation.Experimental + +/** + * :: Experimental :: + * Represents any object that has a collection of [[OffsetRange]]s. This can be used access the + * offset ranges in RDDs generated by the direct Kafka DStream (see + * [[KafkaUtils.createDirectStream()]]). --- End diff -- Good call. Let me add the references.
[GitHub] spark pull request: SPARK-4705:Creating different log directories ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4311#issuecomment-73303308 In this particular case we might actually need separate PRs for 1.2 and master because the event logs are produced differently there. I wonder if this also applies to standalone mode.
[GitHub] spark pull request: [SPARK-4874] [CORE] Collect record count metri...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/4067#issuecomment-73303194 Jenkins, test this please. This LGTM pending tests.
[GitHub] spark pull request: SPARK-4705:Creating different log directories ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4311#issuecomment-73303249 [Test build #26935 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26935/consoleFull) for PR 4311 at commit [`5d9eedf`](https://github.com/apache/spark/commit/5d9eedf1731f8e91fdb3ac16e40a6523c453375e). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4705:Creating different log directories ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4311#issuecomment-73303253 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26935/ Test FAILed.
[GitHub] spark pull request: SPARK-4705:Creating different log directories ...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4311#discussion_r24267784 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -88,6 +88,10 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments, // Propagate the application ID so that YarnClusterSchedulerBackend can pick it up. System.setProperty("spark.yarn.app.id", appAttemptId.getApplicationId().toString()) + + //Propagate the attempt if, so that in case of event logging, different attempt's logs gets created in different directory --- End diff -- this line is too long and will fail tests
[GitHub] spark pull request: SPARK-5633 pyspark saveAsTextFile support for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4403#issuecomment-73303027 [Test build #26934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26934/consoleFull) for PR 4403 at commit [`94c014e`](https://github.com/apache/spark/commit/94c014e63652c075aa1b2db799429b9eee38cc92). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4705:Creating different log directories ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4311#issuecomment-73302969 [Test build #26935 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26935/consoleFull) for PR 4311 at commit [`5d9eedf`](https://github.com/apache/spark/commit/5d9eedf1731f8e91fdb3ac16e40a6523c453375e). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24267609 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs => throw new SparkException(errs.mkString("\n")), ok => ok ) -new KafkaRDD[K, V, U, T, (K, V)](sc, kafkaParams, offsetRanges, leaders, messageHandler) +new KafkaRDD[K, V, KD, VD, (K, V)](sc, kafkaParams, offsetRanges, leaders, messageHandler) } - /** A batch-oriented interface for consuming from Kafka. - * Starting and ending offsets are specified in advance, - * so that you can control exactly-once semantics. + /** + * :: Experimental :: + * Create a RDD from Kafka using offset ranges for each topic and partition. This allows you + * specify the Kafka leader to connect to (to optimize fetching) and access the message as well + * as the metadata. + * * @param sc SparkContext object * @param kafkaParams Kafka http://kafka.apache.org/documentation.html#configuration";> - * configuration parameters. - * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), - * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + *configuration parameters. Requires "metadata.broker.list" or "bootstrap.servers" + *to be set with Kafka broker(s) (NOT zookeeper servers) specified in + *host1:port1,host2:port2 form. 
* @param offsetRanges Each OffsetRange in the batch corresponds to a * range of offsets for a given Kafka topic/partition * @param leaders Kafka leaders for each offset range in batch - * @param messageHandler function for translating each message into the desired type + * @param messageHandler function for translating each message and metadata into the desired type */ @Experimental def createRDD[ K: ClassTag, V: ClassTag, -U <: Decoder[_]: ClassTag, -T <: Decoder[_]: ClassTag, -R: ClassTag] ( +KD <: Decoder[K]: ClassTag, +VD <: Decoder[V]: ClassTag, +R: ClassTag]( sc: SparkContext, kafkaParams: Map[String, String], offsetRanges: Array[OffsetRange], leaders: Array[Leader], messageHandler: MessageAndMetadata[K, V] => R - ): RDD[R] = { - +): RDD[R] = { val leaderMap = leaders .map(l => TopicAndPartition(l.topic, l.partition) -> (l.host, l.port)) .toMap -new KafkaRDD[K, V, U, T, R](sc, kafkaParams, offsetRanges, leaderMap, messageHandler) +new KafkaRDD[K, V, KD, VD, R](sc, kafkaParams, offsetRanges, leaderMap, messageHandler) } + /** - * This stream can guarantee that each message from Kafka is included in transformations - * (as opposed to output actions) exactly once, even in most failure situations. + * Create a RDD from Kafka using offset ranges for each topic and partition. * - * Points to note: - * - * Failure Recovery - You must checkpoint this stream, or save offsets yourself and provide them - * as the fromOffsets parameter on restart. - * Kafka must have sufficient log retention to obtain messages after failure. - * - * Getting offsets from the stream - see programming guide + * @param jsc JavaSparkContext object + * @param kafkaParams Kafka http://kafka.apache.org/documentation.html#configuration";> + *configuration parameters. Requires "metadata.broker.list" or "bootstrap.servers" + *to be set with Kafka broker(s) (NOT zookeeper servers) specified in + *host1:port1,host2:port2 form. 
+ * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + */ + @Experimental + def createRDD[K, V, KD <: Decoder[K], VD <: Decoder[V]]( + jsc: JavaSparkContext, + keyClass: Class[K], + valueClass: Class[V], + keyDecoderClass: Class[KD], + valueDecoderClass: Class[VD], + kafkaParams: JMap[String, String], + offsetRanges: Array[OffsetRange] +): JavaPairRDD[K, V] = { +implicit val keyCmt: ClassTag[K] = ClassTag(keyClass) +implicit val valueCmt: ClassTag[V] = ClassTag(valueClass) +implicit val keyDecoderCmt: ClassTag[KD] = ClassTag(keyDecoderClass) +implicit val valueDecoderCmt: ClassTag[VD] = ClassTag(valueDecoderClass) +new JavaPairRDD(createRDD[K, V, KD, VD]( + jsc.sc, Map(kafkaParams.toSeq: _*), offsetRanges)) + } + + /** + * :: Experimental :: + * Create a RDD from Kafka using offset ranges for each
[GitHub] spark pull request: [SPARK-5656] Fail gracefully for large values ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4433#issuecomment-73302682 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26929/ Test FAILed.
[GitHub] spark pull request: [SPARK-5656] Fail gracefully for large values ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4433#issuecomment-73302672 [Test build #26929 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26929/consoleFull) for PR 4433 at commit [`a604816`](https://github.com/apache/spark/commit/a604816b25988f1200758b65a3ae15efbb684de7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24267569 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs => throw new SparkException(errs.mkString("\n")), ok => ok ) -new KafkaRDD[K, V, U, T, (K, V)](sc, kafkaParams, offsetRanges, leaders, messageHandler) +new KafkaRDD[K, V, KD, VD, (K, V)](sc, kafkaParams, offsetRanges, leaders, messageHandler) } - /** A batch-oriented interface for consuming from Kafka. - * Starting and ending offsets are specified in advance, - * so that you can control exactly-once semantics. + /** + * :: Experimental :: + * Create a RDD from Kafka using offset ranges for each topic and partition. This allows you + * specify the Kafka leader to connect to (to optimize fetching) and access the message as well + * as the metadata. + * * @param sc SparkContext object * @param kafkaParams Kafka http://kafka.apache.org/documentation.html#configuration";> - * configuration parameters. - * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s), - * NOT zookeeper servers, specified in host1:port1,host2:port2 form. + *configuration parameters. Requires "metadata.broker.list" or "bootstrap.servers" + *to be set with Kafka broker(s) (NOT zookeeper servers) specified in + *host1:port1,host2:port2 form. 
* @param offsetRanges Each OffsetRange in the batch corresponds to a * range of offsets for a given Kafka topic/partition * @param leaders Kafka leaders for each offset range in batch - * @param messageHandler function for translating each message into the desired type + * @param messageHandler function for translating each message and metadata into the desired type */ @Experimental def createRDD[ K: ClassTag, V: ClassTag, -U <: Decoder[_]: ClassTag, -T <: Decoder[_]: ClassTag, -R: ClassTag] ( +KD <: Decoder[K]: ClassTag, +VD <: Decoder[V]: ClassTag, +R: ClassTag]( sc: SparkContext, kafkaParams: Map[String, String], offsetRanges: Array[OffsetRange], leaders: Array[Leader], messageHandler: MessageAndMetadata[K, V] => R - ): RDD[R] = { - +): RDD[R] = { val leaderMap = leaders .map(l => TopicAndPartition(l.topic, l.partition) -> (l.host, l.port)) .toMap -new KafkaRDD[K, V, U, T, R](sc, kafkaParams, offsetRanges, leaderMap, messageHandler) +new KafkaRDD[K, V, KD, VD, R](sc, kafkaParams, offsetRanges, leaderMap, messageHandler) } + /** - * This stream can guarantee that each message from Kafka is included in transformations - * (as opposed to output actions) exactly once, even in most failure situations. + * Create a RDD from Kafka using offset ranges for each topic and partition. * - * Points to note: - * - * Failure Recovery - You must checkpoint this stream, or save offsets yourself and provide them - * as the fromOffsets parameter on restart. - * Kafka must have sufficient log retention to obtain messages after failure. - * - * Getting offsets from the stream - see programming guide + * @param jsc JavaSparkContext object + * @param kafkaParams Kafka http://kafka.apache.org/documentation.html#configuration";> + *configuration parameters. Requires "metadata.broker.list" or "bootstrap.servers" + *to be set with Kafka broker(s) (NOT zookeeper servers) specified in + *host1:port1,host2:port2 form. 
+ * @param offsetRanges Each OffsetRange in the batch corresponds to a + * range of offsets for a given Kafka topic/partition + */ + @Experimental + def createRDD[K, V, KD <: Decoder[K], VD <: Decoder[V]]( + jsc: JavaSparkContext, + keyClass: Class[K], + valueClass: Class[V], + keyDecoderClass: Class[KD], + valueDecoderClass: Class[VD], + kafkaParams: JMap[String, String], + offsetRanges: Array[OffsetRange] +): JavaPairRDD[K, V] = { +implicit val keyCmt: ClassTag[K] = ClassTag(keyClass) +implicit val valueCmt: ClassTag[V] = ClassTag(valueClass) +implicit val keyDecoderCmt: ClassTag[KD] = ClassTag(keyDecoderClass) +implicit val valueDecoderCmt: ClassTag[VD] = ClassTag(valueDecoderClass) +new JavaPairRDD(createRDD[K, V, KD, VD]( + jsc.sc, Map(kafkaParams.toSeq: _*), offsetRanges)) + } + + /** + * :: Experimental :: + * Create a RDD from Kafka using offset ranges for each
[GitHub] spark pull request: [SPARK-5640] Synchronize ScalaReflection where...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4431#issuecomment-73302546 [Test build #26923 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26923/consoleFull) for PR 4431 at commit [`c5da21e`](https://github.com/apache/spark/commit/c5da21ee5a650dcb47117c85651254e4e6c0a5c5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5640] Synchronize ScalaReflection where...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4431#issuecomment-73302558 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26923/ Test PASSed.
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4350
[GitHub] spark pull request: SPARK-4705:Creating different log directories ...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4311#issuecomment-73302414 ok to test
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-73302432 I'll do my best to look at it today---I hope!
[GitHub] spark pull request: SPARK-5633 pyspark saveAsTextFile support for ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4403#issuecomment-73302426 Jenkins, this is ok to test.
[GitHub] spark pull request: [SPARK-5598][MLLIB] model save/load for ALS
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/4422#discussion_r24267412 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -136,3 +147,69 @@ class MatrixFactorizationModel( scored.top(num)(Ordering.by(_._2)) } } + +private object MatrixFactorizationModel extends Loader[MatrixFactorizationModel] { + + import org.apache.spark.mllib.util.Loader._ + + override def load(sc: SparkContext, path: String): MatrixFactorizationModel = { +val (loadedClassName, formatVersion, metadata) = loadMetadata(sc, path) +val classNameV1_0 = SaveLoadV1_0.thisClassName +(loadedClassName, formatVersion) match { + case (className, "1.0") if className == classNameV1_0 => +SaveLoadV1_0.load(sc, path) + case _ => +throw new IOException("" + --- End diff -- I assume it's to make the lines below line up
[GitHub] spark pull request: [SPARK-5644] [Core]Delete tmp dir when sc is s...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/4412#discussion_r24267365 --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala --- @@ -93,6 +93,14 @@ class SparkEnv ( // actorSystem.awaitTermination() // Note that blockTransferService is stopped by BlockManager since it is started by it. + +// If we only stop sc, but the driver process still run as a services then we need to delete +// the tmp dir, if not, it will create too many tmp dirs +try { + Utils.deleteRecursively(new File(sparkFilesDir)) --- End diff -- I agree; this seems unsafe. It would be a disaster if we accidentally deleted directories that we didn't create, so we can't delete any path that could point to the CWD. Instead, we might be able to either ensure that the CWD is a subfolder of a spark local directory (so it will be cleaned up as part of our baseDir cleanup) or just change `sparkFilesDir` to not download files to the CWD (e.g. create a temporary directory in both the driver and executors). Old versions of the `addFile` API contract said that files would be downloaded to the CWD, but we haven't made that promise since Spark 0.7-ish, I think; we only technically guarantee that `SparkFiles.get` will return the file paths.
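The second option in this comment (download into a directory the process created itself, and refuse to delete anything that could be the CWD) might look roughly like the sketch below. It is an assumption-laden illustration, not the change that was merged: `TempDirSketch`, `createOwnedTempDir` and `deleteIfSafe` are hypothetical names, and only standard `java.io` / `java.nio.file` calls are used.

```scala
import java.io.{File, IOException}
import java.nio.file.{Files, Path, Paths}

// Hypothetical sketch; these helpers are not part of Spark's Utils.
object TempDirSketch {
  /** Create a fresh directory that this process owns, instead of reusing the CWD. */
  def createOwnedTempDir(prefix: String): Path = Files.createTempDirectory(prefix)

  // Depth-first delete: children first, then the directory itself.
  private def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) {
      Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    }
    if (!f.delete()) throw new IOException(s"Failed to delete $f")
  }

  /** Delete `dir` unless it is the CWD or one of its ancestors; returns whether it was deleted. */
  def deleteIfSafe(dir: Path): Boolean = {
    val target = dir.toAbsolutePath.normalize
    val cwd = Paths.get("").toAbsolutePath.normalize
    if (cwd.startsWith(target)) {
      false // deleting this path would remove the working directory
    } else {
      deleteRecursively(target.toFile)
      true
    }
  }
}
```

The guard only covers the CWD case raised in the comment; a real cleanup would also need to track which directories the process actually created.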
[GitHub] spark pull request: [SPARK-4502] [SQL] Fix reads unnecessary neste...
Github user cenyuhai commented on a diff in the pull request: https://github.com/apache/spark/pull/4398#discussion_r24267316 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/BoundAttribute.scala --- @@ -53,7 +53,7 @@ object BindReferences extends Logging { sys.error(s"Couldn't find $a in ${input.mkString("[", ",", "]")}") } } else { - BoundReference(ordinal, a.dataType, a.nullable) + BoundReference(ordinal, input(ordinal).dataType, a.nullable) --- End diff -- Before this line, we use `val ordinal = input.indexWhere(_.exprId == a.exprId)` to find the AttributeReference that is equal to `a`. However, the dataType in `a` is the complete one, while the dataType in `input(ordinal)` has already been pruned in the file `ParquetTableOperations`; see the `output` property of the case class `ParquetTableScan`.
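A stripped-down model of the binding step described above may make the difference clearer. The types below (`Attr`, `BoundRef`, `StructType`, `IntType`) are simplified stand-ins invented for this sketch, not Catalyst's real classes; only the `indexWhere` lookup and the choice of `input(ordinal).dataType` mirror the quoted diff.

```scala
// Hypothetical, simplified stand-ins for Catalyst types; for illustration only.
object BindSketch {
  sealed trait DataType
  case object IntType extends DataType
  final case class StructType(fields: Map[String, DataType]) extends DataType

  final case class Attr(exprId: Long, name: String, dataType: DataType, nullable: Boolean)
  final case class BoundRef(ordinal: Int, dataType: DataType, nullable: Boolean)

  /** Bind `a` to its position in `input`, taking the (possibly pruned) type from the input. */
  def bind(a: Attr, input: Seq[Attr]): BoundRef = {
    val ordinal = input.indexWhere(_.exprId == a.exprId)
    if (ordinal == -1) {
      sys.error(s"Couldn't find $a in ${input.mkString("[", ",", "]")}")
    }
    BoundRef(ordinal, input(ordinal).dataType, a.nullable)
  }
}
```

With this shape, an attribute whose struct type lists fields `x` and `y` binds to a reference carrying only `x` when the scan's output has been pruned to `x`, so the bound type matches the rows actually produced.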
[GitHub] spark pull request: [Spark-3490] Disable SparkUI for tests (backpo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3959#issuecomment-73302243 [Test build #26933 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26933/consoleFull) for PR 3959 at commit [`5425314`](https://github.com/apache/spark/commit/542531483312b77ed941c277f3e05c4ef1867534). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4350#discussion_r24267256

--- Diff: docs/running-on-yarn.md ---
@@ -105,6 +105,13 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
+ spark.executor.instances
+ 2
+
+The number of executors. Don't set this when dynamic allocation is enabled as they are not compatible.
--- End diff --

oh wait this is the YARN page. LGTM
[GitHub] spark pull request: [SPARK-2945][YARN][Doc]add doc for spark.execu...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4350#discussion_r24267242

--- Diff: docs/running-on-yarn.md ---
@@ -105,6 +105,13 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
+ spark.executor.instances
+ 2
+
+The number of executors. Don't set this when dynamic allocation is enabled as they are not compatible.
--- End diff --

This is only used in YARN. I will add this when I merge
[GitHub] spark pull request: function hasShutdownDeleteTachyonDir should us...
Github user haoyuan commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73301858 Thanks @viper-kun Agree w/ @JoshRosen
[GitHub] spark pull request: [SPARK-5396] Syntax error in spark scripts on ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4428#issuecomment-73301647 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26925/ Test FAILed.
[GitHub] spark pull request: [SPARK-5396] Syntax error in spark scripts on ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4428#issuecomment-73301632 [Test build #26925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26925/consoleFull) for PR 4428 at commit [`ec18465`](https://github.com/apache/spark/commit/ec1846579bb0881615d442329101ff80ce61c13d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1825] Fixes cross-platform submit probl...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/899#issuecomment-73301694 Hey @zeodtr I believe this is fixed in #3924. Would you mind closing this PR?
[GitHub] spark pull request: function hasShutdownDeleteTachyonDir should us...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4418#issuecomment-73301472 I think we should create a JIRA for this, if only to help us keep track of where this change is committed.
[GitHub] spark pull request: SPARK-5613: Catch the ApplicationNotFoundExcep...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4392#issuecomment-73301321 Hey @kasjain can you open this against the master branch next time? It will be easier for us to back port stuff from there
[GitHub] spark pull request: [Spark-3490] Disable SparkUI for tests (backpo...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3959#issuecomment-73301377 yes, the tests are still not passing I believe, test this please
[GitHub] spark pull request: [SPARK-4361][Doc] Add more docs for Hadoop Con...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3225
[GitHub] spark pull request: [SPARK-4502] [SQL] Fix reads unnecessary neste...
Github user cenyuhai commented on a diff in the pull request: https://github.com/apache/spark/pull/4398#discussion_r24266591

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/dataTypes.scala ---
@@ -36,6 +36,8 @@ import org.apache.spark.util.Utils

 object DataType {
+  private val curId = new java.util.concurrent.atomic.AtomicLong()
+  def newTypeId = curId.getAndIncrement()
--- End diff --

Now we use the AttributeReference to mark a column, but that is not suitable for nested columns. We need to remove some fields in the DataType, and it is hard to reconstruct the DataType afterwards, so I added a new id to uniquely identify the fields.
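The id counter described in this comment can be sketched in isolation as below. This is only an illustration under stated assumptions: `TypeIds` and `Field` are hypothetical names, not Spark classes, and the sketch shows just the `AtomicLong` pattern from the diff, not how the pull request wires the id into `DataType`.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for the counter added in the diff (not Spark's DataType object).
object TypeIds {
  private val curId = new AtomicLong()

  // getAndIncrement is atomic, so concurrent callers never receive the same id.
  def newTypeId(): Long = curId.getAndIncrement()
}

// Hypothetical field type: the default argument is evaluated on every construction,
// so each field gets a fresh id even when the names are identical.
final case class Field(name: String, id: Long = TypeIds.newTypeId())

object TypeIdDemo {
  def main(args: Array[String]): Unit = {
    val outer = Field("address")
    val inner = Field("address") // same name, different id
    println(outer.id != inner.id)
  }
}
```

With an id on each field, a pruned nested type could in principle be matched back to the original by id rather than by name or position.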
[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4188#issuecomment-73300317 To fix the YARN issue maybe we should do something specific there, like escaping the double quotes before passing them to YARN?
[GitHub] spark pull request: [SPARK-3381] [MLlib] Eliminate bins for unorde...
Github user MechCoder commented on the pull request: https://github.com/apache/spark/pull/4231#issuecomment-73300352 ping @jkbradley ?
[GitHub] spark pull request: [SPARK-5598][MLLIB] model save/load for ALS
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4422#discussion_r24266304

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala ---
@@ -136,3 +147,69 @@ class MatrixFactorizationModel(
     scored.top(num)(Ordering.by(_._2))
   }
 }
+
+private object MatrixFactorizationModel extends Loader[MatrixFactorizationModel] {
+
+  import org.apache.spark.mllib.util.Loader._
+
+  override def load(sc: SparkContext, path: String): MatrixFactorizationModel = {
+    val (loadedClassName, formatVersion, metadata) = loadMetadata(sc, path)
+    val classNameV1_0 = SaveLoadV1_0.thisClassName
+    (loadedClassName, formatVersion) match {
+      case (className, "1.0") if className == classNameV1_0 =>
+        SaveLoadV1_0.load(sc, path)
+      case _ =>
+        throw new IOException("" +
--- End diff --

Here can't you just omit the `"" +`? Maybe it was just auto-inserted by the IDE on hitting return there.
[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4188#issuecomment-73299916

Hey @srowen it seems like this will break existing behavior though. What if I want to run an application with the following arguments
```
a "b c" d
```
and I want them to be parsed as `a`, `b c`, and `d` without the quotes? I don't see a way to do that now but maybe I'm missing something
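The parsing behavior asked about here can be illustrated with a small splitter. This is a simplified sketch, not the code in `Utils.scala` or in the pull request: `SplitDemo.splitArgs` is a hypothetical helper that treats whitespace as a separator, lets single or double quotes group characters, drops the quotes from the result, and deliberately ignores backslash escapes.

```scala
import scala.collection.mutable.ArrayBuffer

object SplitDemo {
  // Hypothetical, simplified splitter: quotes group words and are removed from the output.
  def splitArgs(s: String): Seq[String] = {
    val out = ArrayBuffer.empty[String]
    val cur = new StringBuilder
    var quote: Option[Char] = None // the quote character we are currently inside, if any
    var inWord = false

    for (c <- s) {
      quote match {
        case Some(q) =>
          // Inside quotes: everything is literal until the matching quote closes the group.
          if (c == q) { quote = None } else { cur.append(c) }
        case None =>
          if (c == '"' || c == '\'') {
            quote = Some(c)
            inWord = true // an empty quoted string still counts as a word
          } else if (c.isWhitespace) {
            if (inWord) { out += cur.toString; cur.clear(); inWord = false }
          } else {
            cur.append(c)
            inWord = true
          }
      }
    }
    if (inWord) out += cur.toString
    out.toSeq
  }

  def main(args: Array[String]): Unit = {
    // The example from the comment: three arguments, quotes removed.
    println(splitArgs("a \"b c\" d").mkString("[", ", ", "]"))
  }
}
```

Under these rules the input `a "b c" d` yields the three arguments `a`, `b c`, and `d`, which is the behavior the comment wants to preserve.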
[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4188#issuecomment-73299811 [Test build #26931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26931/consoleFull) for PR 4188 at commit [`8e91cc3`](https://github.com/apache/spark/commit/8e91cc387548b0f59b4ce9e1ff7b108110b190ba). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4405#issuecomment-73299786 [Test build #26930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26930/consoleFull) for PR 4405 at commit [`b6c947d`](https://github.com/apache/spark/commit/b6c947df7131b88455380115088ef7bf336a17f3). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3233#issuecomment-73299761 [Test build #26932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26932/consoleFull) for PR 3233 at commit [`3f768e3`](https://github.com/apache/spark/commit/3f768e31e9d454522c6bb71be90259fadf4a7071). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4188#discussion_r24266009

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1283,6 +1291,7 @@ private[spark] object Utils extends Logging {
     if (inWord || inDoubleQuote || inSingleQuote) {
       endWord()
     }
+    println("+++ split command to " + buf)
--- End diff --

Oof, sorry, can't believe I left that in!
[GitHub] spark pull request: [SPARK-5601][MLLIB] make streaming linear algo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4432#issuecomment-73298114 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26922/ Test PASSed.
[GitHub] spark pull request: [SPARK-5601][MLLIB] make streaming linear algo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4432#issuecomment-73298105 [Test build #26922 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26922/consoleFull) for PR 4432 at commit [`1f662b3`](https://github.com/apache/spark/commit/1f662b376a87f2f226759a5e97f8b9afe27c55d7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4267 [CORE] Failing to launch jobs on Sp...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/4188#discussion_r24265176

--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1283,6 +1291,7 @@ private[spark] object Utils extends Logging {
     if (inWord || inDoubleQuote || inSingleQuote) {
       endWord()
     }
+    println("+++ split command to " + buf)
--- End diff --

probably shouldn't print this
[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24265028

--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala ---
@@ -36,14 +36,12 @@ import kafka.utils.VerifiableProperties
  * Starting and ending offsets are specified in advance,
  * so that you can control exactly-once semantics.
  * @param kafkaParams Kafka http://kafka.apache.org/documentation.html#configuration
- * configuration parameters.
- * Requires "metadata.broker.list" or "bootstrap.servers" to be set with Kafka broker(s),
- * NOT zookeeper servers, specified in host1:port1,host2:port2 form.
- * @param batch Each KafkaRDDPartition in the batch corresponds to a
- * range of offsets for a given Kafka topic/partition
+ * configuration parameters. Requires "metadata.broker.list" or "bootstrap.servers" to be set
+ * with Kafka broker(s) specified in host1:port1,host2:port2 form.
--- End diff --

The only reason we were writing "not zookeepers" is to make the difference with the earlier stream clear, for people who want to switch from the old one to the new one. That applies to the public API. This is internal private API. I can add it back, no issues.
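The parameter requirement in this scaladoc can be made concrete with a plain map and a small check. This is a sketch under assumptions: the host names and port are placeholders, and `hasBrokerList` is a hypothetical helper, not part of the Spark Kafka API.

```scala
object KafkaParamsDemo {
  // Placeholder broker addresses in host1:port1,host2:port2 form.
  // These must point at Kafka brokers, not at ZooKeeper servers.
  val kafkaParams: Map[String, String] = Map(
    "metadata.broker.list" -> "host1:9092,host2:9092"
  )

  // Hypothetical check mirroring the scaladoc: one of the two keys has to be set.
  def hasBrokerList(params: Map[String, String]): Boolean =
    params.contains("metadata.broker.list") || params.contains("bootstrap.servers")

  def main(args: Array[String]): Unit = {
    println(hasBrokerList(kafkaParams))
  }
}
```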
[GitHub] spark pull request: [SPARK-5656] Fail gracefully for large values ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4433#issuecomment-73297460 [Test build #26929 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26929/consoleFull) for PR 4433 at commit [`a604816`](https://github.com/apache/spark/commit/a604816b25988f1200758b65a3ae15efbb684de7). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4361][Doc] Add more docs for Hadoop Con...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3225#discussion_r24264861

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -630,7 +634,10 @@ class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging {
    * necessary info (e.g. file name for a filesystem-based dataset, table name for HyperTable),
    * using the older MapReduce API (`org.apache.hadoop.mapred`).
    *
-   * @param conf JobConf for setting up the dataset
+   * @param conf JobConf for setting up the dataset. Note: This will be put into a Broadcast.
--- End diff --

Nice find. It seems perfectly reasonable from the user's perspective to just save `sc.hadoopConfiguration` into a val and use it for many things. That's probably what I would have done if I didn't know about the nuances here.
[GitHub] spark pull request: [SPARK-4361][Doc] Add more docs for Hadoop Con...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3225#issuecomment-73296838 I think it's safe to say that we won't implement the alternative behavior that @JoshRosen suggested by the release. For this reason I think we should at least document this unexpected behavior for 1.3 in addition to delaying the fix till later. I'm going to merge this into master and 1.3.
[GitHub] spark pull request: [SPARK-5652][Mllib] Use broadcasted weights in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4429
[GitHub] spark pull request: [SPARK-5656] Fail gracefully for large values ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4433#issuecomment-73296645 ok to test
[GitHub] spark pull request: [SPARK-5652][Mllib] Use broadcasted weights in...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4429#issuecomment-73296580 Merged into master and branch-1.3. Thanks!
[GitHub] spark pull request: [SPARK-4361][Doc] Add more docs for Hadoop Con...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3225#discussion_r24264456

--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -242,7 +242,11 @@ class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging {
   // the bound port to the cluster manager properly
   ui.foreach(_.bind())

-  /** A default Hadoop Configuration for the Hadoop code (e.g. file systems) that we reuse. */
+  /** A default Hadoop Configuration for the Hadoop code (e.g. file systems) that we reuse.
--- End diff --

really small nit but this should be scaladocs instead of javadocs
[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/4385#issuecomment-73295741 @florianverhein This is looking good. Have you tested this against a fork named `spark-ec2` as well as a fork named something else?
[GitHub] spark pull request: [SPARK-5656] Fail gracefully for large values ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4433#issuecomment-73295691 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5619][SQL] Support 'show roles' in Hive...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4397#issuecomment-73295760 [Test build #26928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/26928/consoleFull) for PR 4397 at commit [`f819b6c`](https://github.com/apache/spark/commit/f819b6c5a5b21ae19529f674a8f2ce960f43c2b1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-5611] [EC2] Allow spark-ec2 repo and br...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/4385#discussion_r24264051

--- Diff: ec2/spark_ec2.py ---
@@ -1007,6 +1023,11 @@ def real_main():
         print >> stderr, "ebs-vol-num cannot be greater than 8"
         sys.exit(1)

+    # Prevent breaking ami_prefix
--- End diff --

If we're validating this input, perhaps we should also check that the repo string starts with "https://github.com". Sounds good?
[GitHub] spark pull request: [SPARK-5555] Enable UISeleniumSuite tests
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4334
[GitHub] spark pull request: [SPARK-5619][SQL] Support 'show roles' in Hive...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4397#issuecomment-73295369 ok to test @liancheng
[GitHub] spark pull request: [SPARK-5656] Fail gracefully for large values ...
GitHub user mbittmann opened a pull request: https://github.com/apache/spark/pull/4433 [SPARK-5656] Fail gracefully for large values of k and/or n that will ex... ...ceed max int. Large values of k and/or n in EigenValueDecomposition.symmetricEigs will result in array initialization to a value larger than Integer.MAX_VALUE in the following: var v = new Array[Double](n * ncv) You can merge this pull request into a Git repository by running: $ git pull https://github.com/mbittmann/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/4433.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #4433 commit a604816b25988f1200758b65a3ae15efbb684de7 Author: bittmannm Date: 2015-02-06T19:12:51Z [SPARK-5656] Fail gracefully for large values of k and/or n that will exceed max int. Large values of k and/or n in EigenValueDecomposition.symmetricEigs will result in array initialization to a value larger than Integer.MAX_VALUE in the following: var v = new Array[Double](n * ncv)
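To illustrate the failure mode described in the pull request above, here is a small Python sketch of the arithmetic. It is not the Scala patch itself: the helper names `to_int32` and `checked_array_length` are hypothetical, and Python is used only to show what a 32-bit signed product of `n * ncv` wraps to and what an up-front guard might check.

```python
INT_MAX = 2**31 - 1  # value of Integer.MAX_VALUE on the JVM


def to_int32(value):
    """Wrap an integer the way 32-bit signed (JVM Int) arithmetic would."""
    return (value + 2**31) % 2**32 - 2**31


def checked_array_length(n, ncv):
    """Return n * ncv, or raise ValueError if the product exceeds INT_MAX.

    Hypothetical guard for illustration; Python ints do not overflow,
    so the product computed here is exact.
    """
    length = n * ncv
    if length > INT_MAX:
        raise ValueError(
            "n * ncv = %d exceeds the maximum array length %d" % (length, INT_MAX))
    return length
```

For example, with `n = 100000` and `ncv = 30000` the exact product is 3000000000, which `to_int32` wraps to -1294967296, so an unchecked array allocation would be asked for a negative length. Checking the product before allocating lets the caller report a clear error instead.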
[GitHub] spark pull request: [SPARK-5555] Enable UISeleniumSuite tests
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4334#issuecomment-73295273 I'm merging this into `master` (1.4.0) and `branch-1.3` (1.3.0).