[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-63021620 @mengxr I got the same result as you (using your test code). I will update the results in the description. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3262#issuecomment-63021308 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23354/ Test FAILed.
[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3262#issuecomment-63021306 [Test build #23354 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23354/consoleFull) for PR 3262 at commit [`1eda9e4`](https://github.com/apache/spark/commit/1eda9e4921617bc71acf2bb502cf3a22ee43c41f). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3239#issuecomment-63021309 [Test build #23355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23355/consoleFull) for PR 3239 at commit [`dfdb3d9`](https://github.com/apache/spark/commit/dfdb3d957a17e10911eb144ca992077db7837ec2). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3262#issuecomment-63021287 [Test build #23354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23354/consoleFull) for PR 3262 at commit [`1eda9e4`](https://github.com/apache/spark/commit/1eda9e4921617bc71acf2bb502cf3a22ee43c41f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3262#issuecomment-63021026 /cc @rxin
[GitHub] spark pull request: Several progress API improvements / refactorin...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/3197#issuecomment-63021064 +1 on statusTracker
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-63020961 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23351/ Test FAILed.
[GitHub] spark pull request: [SPARK-4397][Core] Reorganize 'implicit's to i...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/3262 [SPARK-4397][Core] Reorganize 'implicit's to improve the API convenience This PR moves `implicit`s into the `package object` and `companion object`s so that the Scala compiler can find them automatically, without explicit imports. It should not break any API. I compiled the following code with Spark 1.1.0:
```scala
import org.apache.spark.{SparkContext, SparkConf}
import org.apache.spark.SparkContext._

object ImplicitBackforwardCompatibilityApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ImplicitBackforwardCompatibilityApp")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 100).map(i => (i, i))
    val rdd2 = rdd.groupByKey() // rddToPairRDDFunctions
    val rdd3 = rdd2.sortByKey() // rddToOrderedRDDFunctions
    val s1 = rdd3.map(_._1).stats() // numericRDDToDoubleRDDFunctions
    println(s1)
    val s2 = rdd3.map(_._1.toDouble).stats() // doubleRDDToDoubleRDDFunctions
    println(s2)
    val f = rdd2.countAsync() // rddToAsyncRDDActions
    println(f.get())
    rdd2.map { case (k, v) => (k, v.size) }.saveAsSequenceFile("/tmp/test_path") // rddToSequenceFileRDDFunctions

    val a1 = sc.accumulator(123.4) // DoubleAccumulatorParam
    a1.add(1.0)
    println(a1.value)
    val a2 = sc.accumulator(123) // IntAccumulatorParam
    a2.add(3)
    println(a2.value)
    val a3 = sc.accumulator(123L) // LongAccumulatorParam
    a3.add(11L)
    println(a3.value)
    val a4 = sc.accumulator(123F) // FloatAccumulatorParam
    a4.add(1.1F)
    println(a4.value)

    sc.stop()
  }
}
```
Then I ran it against this PR, and it ran correctly. However, for `WritableConverter`, I cannot make it work without an `import`. Thoughts?
You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-4397 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3262.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3262 commit 1eda9e4921617bc71acf2bb502cf3a22ee43c41f Author: zsxwing Date: 2014-11-14T07:35:02Z Reorganize 'implicit's to improve the API convenience
[GitHub] spark pull request: Several progress API improvements / refactorin...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3197#discussion_r20346415 --- Diff: core/src/main/scala/org/apache/spark/SparkStatusAPI.scala --- @@ -140,3 +103,10 @@ private[spark] trait SparkStatusAPI { this: SparkContext => } } } + +private[spark] object SparkStatusAPI { --- End diff -- Can't we just make the constructor package private? It is really awkward to me that you have to create a factory for this. If you really want a factory, I'd use something other than apply.
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-63020954 [Test build #23351 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23351/consoleFull) for PR 3193 at commit [`51649f5`](https://github.com/apache/spark/commit/51649f5e5b29ab8db1c6c3fd91c6f625124ab327). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class RDDRangeSampler(RDDSamplerBase):`
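The new `RDDRangeSampler` class suggests range-based sampling. Here is a minimal pure-Python sketch of that idea. It assumes each split replays the same seeded random stream, and `range_sample` and `split_by_weights` are hypothetical helpers, not PySpark's API. Every element gets one uniform draw per pass, and split *i* keeps the elements whose draw falls in [lb_i, ub_i). Because every pass replays the same draws, the splits come out disjoint and together cover the input.

```python
import random


def range_sample(items, lb, ub, seed):
    """Keep items whose uniform draw u satisfies lb <= u < ub (hypothetical helper).

    Re-seeding the RNG on every call means the same seed assigns the same
    draw to the same position in `items`, whichever range is requested.
    """
    rng = random.Random(seed)
    # The chained comparison evaluates rng.random() exactly once per item,
    # so draws line up across calls that share a seed.
    return [x for x in items if lb <= rng.random() < ub]


def split_by_weights(items, weights, seed):
    """Split `items` into len(weights) disjoint lists (hypothetical helper)."""
    items = list(items)
    total = float(sum(weights))
    bounds = [0.0]
    for w in weights:
        bounds.append(bounds[-1] + w / total)
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound
    return [range_sample(items, lb, ub, seed)
            for lb, ub in zip(bounds, bounds[1:])]
```

For example, `train, test = split_by_weights(range(100), [0.8, 0.2], seed=1)` yields two lists with no element in common. The trade-off is that the input is scanned once per split in exchange for needing no shared state between splits.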
[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3239#issuecomment-63020796 Updated the doc. It seems like there's actually not a ton more to say, but let me know if I missed anything.
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-63019228 @davies Did you only measure `rdd.sample(...).count()`? Sampling 1 million items took about 0.6s without replacement and 2.5s with replacement on my computer. I think we use the same MacBook model, or yours is better :) Maybe part of the time in your case was spent on broadcasting the RDD. Could you try the following:
~~~
from pyspark.mllib.random import RandomRDDs
rdd = RandomRDDs.uniformRDD(sc, 1 << 20, 1).cache()  # ~1M uniform doubles in 1 partition, cached
rdd.count()  # materialize the cache so the timed step excludes data generation
rdd.sample(True, 0.9).count()  # time this: sampling with replacement, fraction 0.9
~~~
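A pure-Python sketch can help when comparing the two timings. It shows the two sampling modes as they are commonly implemented: without replacement, each element gets one independent Bernoulli(fraction) trial; with replacement, each element is copied Poisson(fraction) times. This sketches the general technique, not PySpark's `RDDSampler` code. The helper names are hypothetical, and the Poisson draw uses Knuth's multiplication method, which is only reasonable for small means such as a sampling fraction.

```python
import math
import random


def sample_without_replacement(items, fraction, seed):
    """Bernoulli sampling: keep each item independently with probability `fraction`."""
    rng = random.Random(seed)
    return [x for x in items if rng.random() < fraction]


def poisson(rng, lam):
    """Knuth's method: count uniforms multiplied until the product drops to exp(-lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def sample_with_replacement(items, fraction, seed):
    """Poisson sampling: emit each item Poisson(fraction) times."""
    rng = random.Random(seed)
    out = []
    for x in items:
        out.extend([x] * poisson(rng, fraction))
    return out
```

With replacement, each element consumes at least one random draw plus one more per emitted copy, whereas without replacement costs exactly one draw per element. That is one plausible contributor to the gap between the two timings, but it is not a measured explanation.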
[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3255#discussion_r20345609 --- Diff: python/pyspark/context.py --- @@ -191,7 +192,13 @@ def _do_init(self, master, appName, sparkHome, pyFiles, environment, batchSize, self._temp_dir = \ self._jvm.org.apache.spark.util.Utils.createTempDir(local_dir).getAbsolutePath() + # profiling stats collected for each PythonRDD +if self._conf.get("spark.python.profile", "false") == "true": +self.profiler = profiler if profiler else BasicProfiler +else: +self.profiler = None + self._profile_stats = [] --- End diff -- Maybe we could also move `_profile_stats` into Profiler; then the interface of Profiler would be simpler.
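To illustrate the suggestion, here is a minimal sketch of a profiler that owns its accumulated stats, so callers never touch a separate `_profile_stats` list. `StatsOwningProfiler` and its methods are hypothetical and are not PySpark's `BasicProfiler`, which accumulates stats through an accumulator. Only the standard-library `cProfile` and `pstats` modules are assumed.

```python
import cProfile
import io
import pstats


class StatsOwningProfiler(object):
    """Hypothetical profiler that keeps its own merged pstats.Stats."""

    def __init__(self):
        self._stats = None  # no profile collected yet

    def profile(self, func, *args, **kwargs):
        """Run func under cProfile and merge the result into the owned stats."""
        prof = cProfile.Profile()
        result = prof.runcall(func, *args, **kwargs)
        stats = pstats.Stats(prof)
        if self._stats is None:
            self._stats = stats
        else:
            self._stats.add(stats)  # pstats.Stats.add merges another Stats in place
        return result

    def stats(self):
        """Return the merged stats, or None if nothing was profiled."""
        return self._stats

    def report(self, limit=5):
        """Return the top `limit` entries by cumulative time as text."""
        if self._stats is None:
            return ""
        buf = io.StringIO()
        self._stats.stream = buf  # print_stats writes to the Stats object's stream
        self._stats.sort_stats("cumulative").print_stats(limit)
        return buf.getvalue()
```

A context would then only hold the profiler instance and call `profile(...)` and `report()`, which keeps the profiler's public surface to a couple of methods.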
[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3255#discussion_r20345514 --- Diff: python/pyspark/profiler.py --- @@ -0,0 +1,108 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +>>> from pyspark.context import SparkContext +>>> from pyspark.conf import SparkConf +>>> from pyspark.profiler import BasicProfiler +>>> class MyCustomProfiler(BasicProfiler): --- End diff -- For this to appear as an example in the API docs, it needs to be moved into BasicProfiler. Also, import BasicProfiler into pyspark/__init__.py. You can build the API docs with:
```
$ cd python/docs/
$ make html
$ open _build/html/index.html
```
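Following the suggestion above, here is a sketch of a usage example carried in a class docstring, where both documentation tooling and `doctest` can pick it up. `ProfilerSketch` is a hypothetical stand-in, not PySpark's `BasicProfiler`. It uses 4-space indentation and Python 3 `print()`. The `run_docstring_examples` helper is also hypothetical and simply runs the class docstring's examples with the standard `doctest` parser and runner.

```python
import doctest


class ProfilerSketch(object):
    """Hypothetical base profiler whose docstring carries the usage example.

    >>> class MyCustomProfiler(ProfilerSketch):
    ...     @staticmethod
    ...     def show_profiles(profilers):
    ...         print("My custom profiles")
    ...
    >>> MyCustomProfiler.show_profiles([])
    My custom profiles
    """

    @staticmethod
    def show_profiles(profilers):
        """Default behaviour: print each collected profile."""
        for p in profilers:
            print(p)


def run_docstring_examples(obj):
    """Run the examples in obj.__doc__; return doctest's (failed, attempted) result."""
    parser = doctest.DocTestParser()
    # Expose the class under its own name so the example can subclass it.
    test = parser.get_doctest(obj.__doc__, {obj.__name__: obj}, obj.__name__, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    runner.run(test)
    return runner.summarize(verbose=False)
```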
[GitHub] spark pull request: [SQL] Minor cleanup of comments, errors and ov...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3257#issuecomment-63017979 [Test build #23353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23353/consoleFull) for PR 3257 at commit [`d8b5abc`](https://github.com/apache/spark/commit/d8b5abcd61b6f96be23f21c89baaf926fb0cf185). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-63017550 [Test build #23352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23352/consoleFull) for PR 2991 at commit [`5461f1c`](https://github.com/apache/spark/commit/5461f1c43b0e98aa7b583f14569eefd833b19df0). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-63017519 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3255#discussion_r20345324 --- Diff: python/pyspark/profiler.py --- @@ -0,0 +1,108 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +>>> from pyspark.context import SparkContext +>>> from pyspark.conf import SparkConf +>>> from pyspark.profiler import BasicProfiler +>>> class MyCustomProfiler(BasicProfiler): +... @staticmethod +... def show_profiles(profilers): +... print "My custom profiles" +... +>>> conf = SparkConf().set("spark.python.profile", "true") +>>> sc = SparkContext('local', 'test', conf=conf, profiler=MyCustomProfiler) +>>> sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10) +[0, 2, 4, 6, 8, 10, 12, 14, 16, 18] +>>> sc.show_profiles() +My custom profiles +>>> sc.stop() +""" + + +import cProfile +import pstats +import os +from pyspark.accumulators import PStatsParam + + +class BasicProfiler(object): +""" + +:: DeveloperApi :: --- End diff -- Yes :)
[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3255#discussion_r20345305 --- Diff: python/pyspark/profiler.py --- @@ -0,0 +1,108 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +>>> from pyspark.context import SparkContext +>>> from pyspark.conf import SparkConf +>>> from pyspark.profiler import BasicProfiler +>>> class MyCustomProfiler(BasicProfiler): +... @staticmethod +... def show_profiles(profilers): +... print "My custom profiles" --- End diff -- indent with 4 spaces here?
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-63017353 @mengxr I have simplified RDDSampler by removing numpy; the reason is explained in the updated description of this PR. Please re-review it.
[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10 and...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3239#issuecomment-63017250 Hey Sandy, I think this looks good. I wasn't able to get it to succeed locally, but I think something is messed up with my local environment, since even the master build isn't working. Could you add the relevant documentation?
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-63016854 [Test build #23351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23351/consoleFull) for PR 3193 at commit [`51649f5`](https://github.com/apache/spark/commit/51649f5e5b29ab8db1c6c3fd91c6f625124ab327). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1977][MLLIB] use immutable BitSet in AL...
Github user aaronlin commented on the pull request: https://github.com/apache/spark/pull/925#issuecomment-63015981 twitter/chill#185 was fixed in chill v0.4.0, but Spark still depends on chill v0.3.6 in Maven: http://mvnrepository.com/artifact/org.apache.spark/spark-core_2.10/1.1.0 Can anyone help fix it?
[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3261#discussion_r20344792 --- Diff: python/pyspark/mllib/recommendation.py --- @@ -23,6 +23,16 @@ class Rating(object): --- End diff -- I also found some performance numbers: http://stackoverflow.com/questions/2646157/what-is-the-fastest-to-access-struct-like-object-in-python
~~~
namedtuple.a : 0.473686933517
namedtuple[0] : 0.180409193039
struct.a : 0.180846214294
struct[0] : 1.32191514969
~~~
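The Stack Overflow numbers come from an older CPython, so it may be worth re-measuring on the interpreter actually in use. Below is a minimal `timeit` harness for that. It assumes the "struct" in the comparison is a plain class with `__slots__`, which is a guess about the linked benchmark, and `Pair`, `SlotStruct`, and `time_field_access` are hypothetical names. Every measurement includes the same lambda-call overhead, so only the relative differences are meaningful.

```python
import timeit
from collections import namedtuple

Pair = namedtuple("Pair", ["a", "b"])


class SlotStruct(object):
    """Plain class with __slots__, assumed to stand in for the 'struct' above."""
    __slots__ = ("a", "b")

    def __init__(self, a, b):
        self.a = a
        self.b = b


def time_field_access(number=100000):
    """Return seconds spent on `number` field reads for each access style."""
    nt = Pair(1, 2)
    st = SlotStruct(1, 2)
    return {
        "namedtuple.a": timeit.timeit(lambda: nt.a, number=number),
        "namedtuple[0]": timeit.timeit(lambda: nt[0], number=number),
        "struct.a": timeit.timeit(lambda: st.a, number=number),
    }
```

Running `for k, v in sorted(time_field_access().items()): print(k, v)` prints one timing per access style. The results vary by CPython version, so they should not be read as confirming or refuting the numbers quoted above.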
[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/3261#discussion_r20344712 --- Diff: python/pyspark/mllib/recommendation.py --- @@ -23,6 +23,16 @@ class Rating(object): --- End diff -- Where to put `int(user)`, `int(product)`, and `float(rating)` then?
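One possible answer, sketched under the assumption that `Rating` becomes a `namedtuple`: subclass the namedtuple and coerce the fields in `__new__`. Instances then keep tuple-style index lookup and still normalize their types once, at construction. This is an illustration only, not the code that was merged for SPARK-4396.

```python
from collections import namedtuple


class Rating(namedtuple("Rating", ["user", "product", "rating"])):
    """Sketch: a namedtuple-based Rating that normalizes its field types."""

    __slots__ = ()  # no per-instance __dict__, so it stays as light as a tuple

    def __new__(cls, user, product, rating):
        # Coerce once at construction time instead of on every access.
        return super(Rating, cls).__new__(cls, int(user), int(product), float(rating))
```

Both `Rating("1", 2.0, 3)[0]` and `Rating("1", 2.0, 3).user` give `1`. Note that the namedtuple helpers `_make` and `_replace` build instances through `tuple.__new__` and bypass this coercion, so callers should construct ratings through `Rating(...)`.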
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-63015296 The implementation looks good to me. @JoshRosen Do you want to take another pass?
[GitHub] spark pull request: Several progress API improvements / refactorin...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3197#discussion_r20344507 --- Diff: core/src/main/scala/org/apache/spark/SparkStatusAPI.scala --- @@ -140,3 +103,10 @@ private[spark] trait SparkStatusAPI { this: SparkContext => } } } + +private[spark] object SparkStatusAPI { --- End diff -- The goal here was to hide this class's constructor from users so that we're free to change it later. I think that making constructors part of public APIs is a bad idea.
[GitHub] spark pull request: Several progress API improvements / refactorin...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3197#discussion_r20344484 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkStatusAPI.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.api.java + +import org.apache.spark.{SparkStageInfo, SparkJobInfo, SparkContext} + +/** + * Low-level status reporting APIs for monitoring job and stage progress. + * + * These APIs intentionally provide very weak consistency semantics; consumers of these APIs should + * be prepared to handle empty / missing information. For example, a job's stage ids may be known + * but the status API may not have any information about the details of those stages, so + * `getStageInfo` could potentially return `null` for a valid stage id. + * + * To limit memory usage, these APIs only provide information on recent jobs / stages. These APIs + * will provide information for the last `spark.ui.retainedStages` stages and + * `spark.ui.retainedJobs` jobs. + */ +class JavaSparkStatusAPI private (sc: SparkContext) { --- End diff -- There's one subtle difference: some of the Java methods return nullable values instead of Options. 
[GitHub] spark pull request: Several progress API improvements / refactorin...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3197#discussion_r20344165 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -228,6 +229,8 @@ class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging { private[spark] val jobProgressListener = new JobProgressListener(conf) listenerBus.addListener(jobProgressListener) + val statusAPI = SparkStatusAPI(this) --- End diff -- +1, just `status` would be better, I think.
[GitHub] spark pull request: Several progress API improvements / refactorin...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3197#discussion_r20344140 --- Diff: core/src/main/scala/org/apache/spark/SparkStatusAPI.scala --- @@ -140,3 +103,10 @@ private[spark] trait SparkStatusAPI { this: SparkContext => } } } + +private[spark] object SparkStatusAPI { --- End diff -- Why bother having this? We can just use `new`.
[GitHub] spark pull request: Several progress API improvements / refactorin...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/3197#discussion_r20344154 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaSparkStatusAPI.scala --- @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.api.java + +import org.apache.spark.{SparkStageInfo, SparkJobInfo, SparkContext} + +/** + * Low-level status reporting APIs for monitoring job and stage progress. + * + * These APIs intentionally provide very weak consistency semantics; consumers of these APIs should + * be prepared to handle empty / missing information. For example, a job's stage ids may be known + * but the status API may not have any information about the details of those stages, so + * `getStageInfo` could potentially return `null` for a valid stage id. + * + * To limit memory usage, these APIs only provide information on recent jobs / stages. These APIs + * will provide information for the last `spark.ui.retainedStages` stages and + * `spark.ui.retainedJobs` jobs. + */ +class JavaSparkStatusAPI private (sc: SparkContext) { --- End diff -- Can we consolidate the Java and Scala classes? It seems to me you are only using arrays, so it should be fine.
[GitHub] spark pull request: Several progress API improvements / refactorin...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3197#issuecomment-63013790 Okay, I talked offline with @kayousterhout and the best name we could come up with was the following:
```
class SparkStatusTracker ...
val statusTracker = new SparkStatusTracker(this)
```
IMO this is nicer than the current name, since `API` is sort of implicit in the fact that this is an exposed class (i.e. in some sense everything is an API). The name "Tracker" implies that this object actively tracks changes. So this is my favorite option. I also think `SparkStatus` with `val status` is alright. I prefer both of these to the current naming.
[GitHub] spark pull request: [WIP] Scala 2.11
Github user ScrapCodes closed the pull request at: https://github.com/apache/spark/pull/3181
[GitHub] spark pull request: SPARK-2624 add datanucleus jars to the contain...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/3238#discussion_r20343870 --- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala --- @@ -223,6 +224,29 @@ private[spark] trait ClientBase extends Logging { } } } + +/** + * Do the same for datanucleus jars, if they exist in spark home. Find all datanucleus-* jars, + * copy them to the remote fs, and add them to the class path. + */ +val sparkHomeOpt = sparkConf.getOption("spark.home").orElse(sys.env.get("SPARK_HOME")) +for (sparkHome <- sparkHomeOpt) { + val libs = sparkHome + Path.SEPARATOR + "lib" + val jars = new File(libs).listFiles(new FilenameFilter() { +override def accept(dir: File, name: String) = name.startsWith("datanucleus-") + }) + // copy to remote and add to classpath + jars.foreach { jar => --- End diff -- Isn't it because of licensing? Datanucleus is LGPL or similar?
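The quoted Scala excerpt resolves the Spark home from `spark.home` or `SPARK_HOME`. It then picks every file in that home's `lib` directory whose name starts with `datanucleus-`. The following Python sketch mirrors only that lookup step and omits the copy-to-remote-filesystem part. The function name and the jar file names are hypothetical. One design point it makes explicit: `java.io.File.listFiles` returns `null` when the directory does not exist, so the Scala version as quoted may need a guard before `jars.foreach`.

```python
import os
import tempfile
from typing import List, Mapping, Optional


def find_datanucleus_jars(spark_home_conf: Optional[str],
                          env: Mapping[str, str]) -> List[str]:
    """Return paths of datanucleus-* files under <spark home>/lib."""
    # Prefer the spark.home setting, falling back to the SPARK_HOME variable.
    spark_home = spark_home_conf if spark_home_conf is not None else env.get("SPARK_HOME")
    if spark_home is None:
        return []
    lib_dir = os.path.join(spark_home, "lib")
    # java.io.File.listFiles returns null for a missing directory; treat that
    # case explicitly as "no jars" instead of failing.
    if not os.path.isdir(lib_dir):
        return []
    return sorted(
        os.path.join(lib_dir, name)
        for name in os.listdir(lib_dir)
        if name.startswith("datanucleus-")
    )


# Usage against a throwaway directory with hypothetical jar names.
with tempfile.TemporaryDirectory() as home:
    os.makedirs(os.path.join(home, "lib"))
    for name in ["datanucleus-core.jar", "datanucleus-rdbms.jar", "spark-assembly.jar"]:
        open(os.path.join(home, "lib", name), "w").close()
    found = [os.path.basename(p)
             for p in find_datanucleus_jars(None, {"SPARK_HOME": home})]
    missing = find_datanucleus_jars(os.path.join(home, "no-such-dir"), {})

print(found)  # ['datanucleus-core.jar', 'datanucleus-rdbms.jar']
```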
[GitHub] spark pull request: SPARK-4214. With dynamic allocation, avoid out...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3204#issuecomment-63010646 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23350/
[GitHub] spark pull request: SPARK-4214. With dynamic allocation, avoid out...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3204#issuecomment-63010643 [Test build #23350 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23350/consoleFull) for PR 3204 at commit [`c4ed549`](https://github.com/apache/spark/commit/c4ed549f8ef6cc22dce50be2ad418ee9a9211b19). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/3261#discussion_r20342470 --- Diff: python/pyspark/mllib/recommendation.py --- @@ -23,6 +23,16 @@ class Rating(object): --- End diff -- I think this can be simplified as
```
class Rating(namedtuple("Rating", ["user", "product", "rating"])):
```
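Below is a minimal sketch of the namedtuple approach, assuming Python 3. It is not necessarily the code that was merged. `namedtuple` takes the type name first and the list of field names second. The resulting subclass supports both `r[0]` and `r.user`, which is the lookup-by-index behavior SPARK-4396 asks for.

```python
from collections import namedtuple


class Rating(namedtuple("Rating", ["user", "product", "rating"])):
    """(user, product, rating) record readable by index or by name."""

    # Keep instances as light as plain tuples (no per-instance __dict__).
    __slots__ = ()


r = Rating(1, 42, 4.5)
user, product, rating = r  # unpacks like the (user, product, rating) input tuples
print(r[0] == r.user, r[1] == r.product, r[2] == r.rating)  # True True True
```

Because `Rating` is a tuple subclass, it also compares equal to the plain `(user, product, rating)` tuples that ALS accepts as input.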
[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3261#issuecomment-63007855 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23349/
[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3261#issuecomment-63007848 [Test build #23349 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23349/consoleFull) for PR 3261 at commit [`d3bd7d4`](https://github.com/apache/spark/commit/d3bd7d41fa1623e5eb368bc6af3711769d1a27e7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1567#issuecomment-63007410 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23348/
[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1567#issuecomment-63007406 [Test build #23348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23348/consoleFull) for PR 1567 at commit [`89e37d8`](https://github.com/apache/spark/commit/89e37d82d72ac614af0efcd810ac4b9b034d4253). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class GroupExpression(children: Seq[Expression]) extends Expression ` * `case class Explosive(` * `trait GroupingSets extends UnaryNode ` * `case class GroupingSet(` * `case class Cube(` * `case class Rollup(` * `case class Explosive(`
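The public classes listed above include `Cube`, `Rollup`, and `GroupingSet`. The following sketch illustrates the standard SQL semantics of these constructs, independent of how this patch implements them. Each construct expands to a list of grouping sets, and the query aggregates once per set. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def rollup_sets(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    # ROLLUP(a, b, c) aggregates over every prefix: (a, b, c), (a, b), (a), ().
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols: Sequence[str]) -> List[Tuple[str, ...]]:
    # CUBE(a, b) aggregates over every subset of the columns: 2**n grouping sets.
    return [combo
            for k in range(len(cols), -1, -1)
            for combo in combinations(cols, k)]


print(rollup_sets(["a", "b", "c"]))  # [('a', 'b', 'c'), ('a', 'b'), ('a',), ()]
print(cube_sets(["a", "b"]))         # [('a', 'b'), ('a',), ('b',), ()]
```

ROLLUP grows linearly with the number of columns (n + 1 sets), while CUBE grows exponentially (2**n sets). This difference is one reason both forms exist alongside explicit GROUPING SETS.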
[GitHub] spark pull request: SPARK-4214. With dynamic allocation, avoid out...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3204#issuecomment-63005533 [Test build #23350 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23350/consoleFull) for PR 3204 at commit [`c4ed549`](https://github.com/apache/spark/commit/c4ed549f8ef6cc22dce50be2ad418ee9a9211b19). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...
Github user udnay commented on the pull request: https://github.com/apache/spark/pull/3255#issuecomment-63005535 I believe I took care of your comments/concerns. Could you have another look when you get a chance?
[GitHub] spark pull request: [SPARK-4387][PySpark] Refactoring python profi...
Github user udnay commented on a diff in the pull request: https://github.com/apache/spark/pull/3255#discussion_r20341026 --- Diff: python/pyspark/profiler.py --- @@ -0,0 +1,108 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +""" +>>> from pyspark.context import SparkContext +>>> from pyspark.conf import SparkConf +>>> from pyspark.profiler import BasicProfiler +>>> class MyCustomProfiler(BasicProfiler): +... @staticmethod +... def show_profiles(profilers): +... print "My custom profiles" +... +>>> conf = SparkConf().set("spark.python.profile", "true") +>>> sc = SparkContext('local', 'test', conf=conf, profiler=MyCustomProfiler) +>>> sc.parallelize(list(range(1000))).map(lambda x: 2 * x).take(10) +[0, 2, 4, 6, 8, 10, 12, 14, 16, 18] +>>> sc.show_profiles() +My custom profiles +>>> sc.stop() +""" + + +import cProfile +import pstats +import os +from pyspark.accumulators import PStatsParam + + +class BasicProfiler(object): +""" + +:: DeveloperApi :: --- End diff -- @davies Is this how to mark it as a developer API?
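The doctest in the quoted diff subclasses `BasicProfiler` and overrides `show_profiles`. The diff's imports (`cProfile`, `pstats`, `PStatsParam`) suggest that the real class collects `cProfile` output as `pstats` data and ships it through an accumulator. That accumulator path is not reproduced here. Below is a minimal standalone Python sketch of just the cProfile/pstats part. `SimpleProfiler` is a hypothetical class name.

```python
import cProfile
import io
import pstats


class SimpleProfiler:
    """Hypothetical stand-in: profiles callables with cProfile and merges the
    results into one pstats.Stats object."""

    def __init__(self) -> None:
        self._stats = None

    def profile(self, func, *args, **kwargs):
        prof = cProfile.Profile()
        result = prof.runcall(func, *args, **kwargs)
        if self._stats is None:
            self._stats = pstats.Stats(prof)
        else:
            self._stats.add(prof)  # accumulate across calls
        return result

    def show(self, limit: int = 5) -> str:
        # Subclasses could override this, much like show_profiles in the doctest.
        out = io.StringIO()
        if self._stats is not None:
            self._stats.stream = out
            self._stats.sort_stats("cumulative").print_stats(limit)
        return out.getvalue()


profiler = SimpleProfiler()
total = profiler.profile(sum, range(1000))
profiler.profile(sorted, [3, 1, 2])
print(total)  # 499500
print(profiler.show())
```

Merging with `Stats.add` keeps a single aggregate report across many profiled calls. The per-executor results in PySpark need an accumulator for the same reason.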
[GitHub] spark pull request: SPARK-4214. With dynamic allocation, avoid out...
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3204#issuecomment-63005126 Updated patch addresses review comments
[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3112#issuecomment-63004489 [Test build #23347 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23347/consoleFull) for PR 3112 at commit [`ed1a25c`](https://github.com/apache/spark/commit/ed1a25c85b2c80802f29700f363b9ef05721b395). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3112#issuecomment-63004493 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23347/
[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3261#issuecomment-63004032 [Test build #23349 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23349/consoleFull) for PR 3261 at commit [`d3bd7d4`](https://github.com/apache/spark/commit/d3bd7d41fa1623e5eb368bc6af3711769d1a27e7). * This patch merges cleanly.
[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3243#issuecomment-63003925 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23346/
[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3243#issuecomment-63003920 [Test build #23346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23346/consoleFull) for PR 3243 at commit [`4653378`](https://github.com/apache/spark/commit/4653378fb6addfdf4fb21e4e75570c163d601bfb). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4396] allow lookup by index in Python's...
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/3261 [SPARK-4396] allow lookup by index in Python's Rating In PySpark, ALS can take an RDD of (user, product, rating) tuples as input. However, model.predict outputs an RDD of Rating. So on the input side, users can use r[0], r[1], r[2], while on the output side, users have to use r.user, r.product, r.rating. We should allow lookup by index in Rating. @davies You can merge this pull request into a Git repository by running: $ git pull https://github.com/mengxr/spark SPARK-4396 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3261.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3261 commit d3bd7d41fa1623e5eb368bc6af3711769d1a27e7 Author: Xiangrui Meng Date: 2014-11-14T02:51:20Z allow lookup by index in Python's Rating
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-63002982 [Test build #23345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23345/consoleFull) for PR 2991 at commit [`5461f1c`](https://github.com/apache/spark/commit/5461f1c43b0e98aa7b583f14569eefd833b19df0). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-63002988 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23345/
[GitHub] spark pull request: [SPARK-2663] [SQL] Support the Grouping Set
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1567#issuecomment-63002602 [Test build #23348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23348/consoleFull) for PR 1567 at commit [`89e37d8`](https://github.com/apache/spark/commit/89e37d82d72ac614af0efcd810ac4b9b034d4253). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4313][WebUI][Yarn] Fix link issue of th...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3183#issuecomment-63001688 @JoshRosen Not only the executor ID: any string that appears in a URL needs care with `%`. However, I don't know a proper place to add such general docs. Are there other suggestions for this PR, or is it fine to merge?
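Any identifier interpolated into a URL should be percent-encoded. Otherwise, a literal `%` in the identifier is later decoded as the start of an escape sequence. Below is a minimal Python sketch of that concern using the standard library's `urllib.parse.quote`. The executor ID is hypothetical, and the sketch does not claim to reflect the PR's own fix.

```python
from urllib.parse import quote, unquote


def url_path_segment(value: str) -> str:
    # safe="" makes quote() escape '/' as well, so the value stays one segment;
    # '%' is always escaped, to '%25'.
    return quote(value, safe="")


executor_id = "exec%20-1/a"  # hypothetical id containing '%' and '/'
encoded = url_path_segment(executor_id)
print(encoded)                          # exec%2520-1%2Fa
print(unquote(encoded) == executor_id)  # True
# Without escaping, a server decoding the raw id would see a different value:
print(unquote(executor_id))             # exec -1/a
```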
[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3259#issuecomment-63001443 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23342/
[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3259#issuecomment-63001434 [Test build #23342 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23342/consoleFull) for PR 3259 at commit [`3200c33`](https://github.com/apache/spark/commit/3200c33363d8daed187ecd10f7b5fc370d44f349). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4394][SQL] Data Sources API Improvement...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3260#issuecomment-63000880 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23343/
[GitHub] spark pull request: [SPARK-4394][SQL] Data Sources API Improvement...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3260#issuecomment-63000877 [Test build #23343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23343/consoleFull) for PR 3260 at commit [`9a5e171`](https://github.com/apache/spark/commit/9a5e17166c5f8c75f067846ec5f515db0857f1ea). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class InSet(value: Expression, hset: Set[Any])` * `case class In(attribute: String, values: Array[Any]) extends Filter`
[GitHub] spark pull request: [SPARK-4379][Core] Change Exception to SparkEx...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3241#issuecomment-63000329 I'm sorry; I should have been clearer when I mentioned the breaking change. I'm worried about the following case:
```
try {
  rdd.checkpoint()
} catch {
  case e: SparkException => // do work A
  case e: Exception => // do work B
}
```
This change breaks code like that. However, I think few people write such code. So, does Spark consider this kind of change a breaking change?
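To make the concern concrete, here is a minimal Java sketch (Java rather than Scala so it runs without Spark on the classpath). `SparkLikeException`, `checkpointOld`, and `checkpointNew` are hypothetical stand-ins, not Spark APIs: a caller whose first `catch` targets the subclass takes branch B before the change and branch A after it, even though the caller's source is unchanged.

```java
class CatchOrderDemo {
    // Before the change: the method throws a plain Exception.
    static void checkpointOld() throws Exception {
        throw new Exception("checkpoint failed");
    }

    // After the change: the method throws the (hypothetical) subclass instead.
    static void checkpointNew() throws Exception {
        throw new SparkLikeException("checkpoint failed");
    }

    // A caller written against the old behavior; reports which branch ran.
    static String callAndClassify(boolean afterChange) {
        try {
            if (afterChange) {
                checkpointNew();
            } else {
                checkpointOld();
            }
            return "no exception";
        } catch (SparkLikeException e) {
            return "A"; // "do work A"
        } catch (Exception e) {
            return "B"; // "do work B"
        }
    }

    public static void main(String[] args) {
        System.out.println(callAndClassify(false)); // B
        System.out.println(callAndClassify(true));  // A
    }
}

// Hypothetical stand-in for org.apache.spark.SparkException.
class SparkLikeException extends Exception {
    SparkLikeException(String message) {
        super(message);
    }
}
```

The change is source- and binary-compatible for this caller; only its runtime behavior shifts, which is why it is easy to miss.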
[GitHub] spark pull request: [SPARK-4379][Core] Change Exception to SparkEx...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3241#issuecomment-6373 The exception won't be part of the method signature that callers reference.
```scala
scala> import java.io.IOException
import java.io.IOException

scala> class A {
     | @throws[IOException]
     | def foo() {
     | throw new IOException("error!")
     | }
     |
     | def bar(): Unit = {
     | foo()
     | }
     | }
defined class A

scala> :javap -private -c A
Compiled from "<console>"
public class A extends java.lang.Object{
public void foo() throws java.io.IOException;
  Code:
   0: new #9; //class java/io/IOException
   3: dup
   4: ldc #11; //String error!
   6: invokespecial #15; //Method java/io/IOException."<init>":(Ljava/lang/String;)V
   9: athrow

public void bar();
  Code:
   0: aload_0
   1: invokevirtual #20; //Method foo:()V
   4: return

public A();
  Code:
   0: aload_0
   1: invokespecial #22; //Method java/lang/Object."<init>":()V
   4: return

}
```
In `bar()`, the instruction is `1: invokevirtual #20; //Method foo:()V`. The method is still referenced as `foo:()V`; the `throws` clause does not change that descriptor.
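The same point can be checked from Java with reflection. In this sketch (class and method names are hypothetical), `MethodType.toMethodDescriptorString()` rebuilds the descriptor that instructions such as `invokevirtual` link against, built only from parameter and return types, while `Method.getExceptionTypes()` reads the separately stored `throws` list. As far as linkage goes, changing the declared exception type only affects the latter.

```java
import java.io.IOException;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.Arrays;

class DescriptorDemo {
    // Mirrors the REPL example: foo() declares a checked IOException.
    static class A {
        public void foo() throws IOException {
            throw new IOException("error!");
        }
    }

    private static Method fooMethod() {
        try {
            return A.class.getMethod("foo");
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // The JVM descriptor call sites link against: parameter and return types only.
    static String fooDescriptor() {
        Method m = fooMethod();
        return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                .toMethodDescriptorString();
    }

    // The declared exceptions, kept apart from the descriptor.
    static Class<?>[] fooDeclaredExceptions() {
        return fooMethod().getExceptionTypes();
    }

    public static void main(String[] args) {
        System.out.println(fooDescriptor());                           // ()V
        System.out.println(Arrays.toString(fooDeclaredExceptions())); // [class java.io.IOException]
    }
}
```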
[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10 and...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3239#issuecomment-62999337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23339/ Test PASSed.
[GitHub] spark pull request: SPARK-4375. no longer require -Pscala-2.10 and...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3239#issuecomment-62999331 [Test build #23339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23339/consoleFull) for PR 3239 at commit [`587f671`](https://github.com/apache/spark/commit/587f6713d0a4a0e6727807ff432e334bf08eeb4a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3112#issuecomment-62998569 @JoshRosen I have reverted the dot, which I think was introduced while modifying comments. The space between `!` and `args.isEmpty` in `ApplicationMasterArguments` is unnecessary, so I kept that change.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-62998448 Hi TD, this test is very flaky; it failed several times in my local runs:
```
- block addition, block to batch allocation and cleanup with write ahead log *** FAILED *** (21 milliseconds)
[info] java.io.FileNotFoundException: File /tmp/1415929501402-0/receivedBlockMetadata/log-0-1000 does not exist.
[info] at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
[info] at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:324)
[info] at org.apache.spark.streaming.util.WriteAheadLogSuite$.getLogFilesInDirectory(WriteAheadLogSuite.scala:344)
[info] at org.apache.spark.streaming.ReceivedBlockTrackerSuite.getWriteAheadLogFiles(ReceivedBlockTrackerSuite.scala:226)
[info] at org.apache.spark.streaming.ReceivedBlockTrackerSuite$$anonfun$4.apply$mcV$sp(ReceivedBlockTrackerSuite.scala:171)
[info] at org.apache.spark.streaming.ReceivedBlockTrackerSuite$$anonfun$4.apply(ReceivedBlockTrackerSuite.scala:96)
[info] at org.apache.spark.streaming.ReceivedBlockTrackerSuite$$anonfun$4.apply(ReceivedBlockTrackerSuite.scala:96)
[info] at
```
[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3112#issuecomment-62998382 [Test build #23347 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23347/consoleFull) for PR 3112 at commit [`ed1a25c`](https://github.com/apache/spark/commit/ed1a25c85b2c80802f29700f363b9ef05721b395). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1812] Scala 2.11 support.
Github user ScrapCodes closed the pull request at: https://github.com/apache/spark/pull/3111
[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3258#issuecomment-62997903 [Test build #23340 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23340/consoleFull) for PR 3258 at commit [`15e9a98`](https://github.com/apache/spark/commit/15e9a98e75928915b9a2c2c1c02d88bba3756485). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3258#issuecomment-62997907 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23340/ Test PASSed.
[GitHub] spark pull request: [SPARK-4390][SQL] Handle NaN cast to decimal c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3256#issuecomment-62997844 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23341/ Test PASSed.
[GitHub] spark pull request: [SPARK-4390][SQL] Handle NaN cast to decimal c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3256#issuecomment-62997840 [Test build #23341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23341/consoleFull) for PR 3256 at commit [`4c3ba46`](https://github.com/apache/spark/commit/4c3ba4617716bbc3bee95ff258ace2661b60f136). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3243#issuecomment-62997771 [Test build #23346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23346/consoleFull) for PR 3243 at commit [`4653378`](https://github.com/apache/spark/commit/4653378fb6addfdf4fb21e4e75570c163d601bfb). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3722][Docs]minor improvement and fix in...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/2579#issuecomment-62997806 @tgravescs Is it ok to go?
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-62997454 Let's see if this passes Jenkins; I hadn't tried that yet.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-62996965 [Test build #23345 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23345/consoleFull) for PR 2991 at commit [`5461f1c`](https://github.com/apache/spark/commit/5461f1c43b0e98aa7b583f14569eefd833b19df0). * This patch merges cleanly.
[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3243#issuecomment-62996707 [Test build #23344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23344/consoleFull) for PR 3243 at commit [`e9145e8`](https://github.com/apache/spark/commit/e9145e8ac6798bb9e2587e2eb67da6209456840f). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class NettyBlockTransferService(conf: SparkConf, securityManager: SecurityManager, numCores: Int)` * `public class JavaSimpleTextClassificationPipeline ` * `case class LabeledDocument(id: Long, text: String, label: Double)` * `case class Document(id: Long, text: String)` * `abstract class EdgeRDD[ED, VD](` * `abstract class VertexRDD[VD](` * `abstract class Estimator[M <: Model[M]] extends PipelineStage with Params ` * `abstract class Evaluator extends Identifiable ` * `abstract class Model[M <: Model[M]] extends Transformer ` * `abstract class PipelineStage extends Serializable with Logging ` * `class Pipeline extends Estimator[PipelineModel] ` * `abstract class Transformer extends PipelineStage with Params ` * `class LogisticRegression extends Estimator[LogisticRegressionModel] with LogisticRegressionParams ` * `class HashingTF extends UnaryTransformer[Iterable[_], Vector, HashingTF] ` * `class StandardScaler extends Estimator[StandardScalerModel] with StandardScalerParams ` * `class Tokenizer extends UnaryTransformer[String, Seq[String], Tokenizer] ` * `class Param[T] (` * `class DoubleParam(parent: Params, name: String, doc: String, defaultValue: Option[Double] = None)` * `class IntParam(parent: Params, name: String, doc: String, defaultValue: Option[Int] = None)` * `class FloatParam(parent: Params, name: String, doc: String, defaultValue: Option[Float] = None)` * `class LongParam(parent: Params, name: String, doc: String, defaultValue: Option[Long] = None)` * `class BooleanParam(parent: Params, name: String, doc: String, defaultValue: Option[Boolean] = None)` * `case class ParamPair[T](param: Param[T], value: T)` * `trait Params extends Identifiable with Serializable ` * `class CrossValidator extends Estimator[CrossValidatorModel] with CrossValidatorParams with Logging ` * `class ParamGridBuilder `
[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3243#issuecomment-62996710 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23344/ Test FAILed.
[GitHub] spark pull request: [SPARK-4092] [CORE] Fix InputMetrics for coale...
Github user ash211 commented on the pull request: https://github.com/apache/spark/pull/3120#issuecomment-62996573 @ksakellis it looks like this has a merge conflict now -- would you mind updating this PR?
[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3243#issuecomment-62996516 [Test build #23344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23344/consoleFull) for PR 3243 at commit [`e9145e8`](https://github.com/apache/spark/commit/e9145e8ac6798bb9e2587e2eb67da6209456840f). * This patch merges cleanly.
[GitHub] spark pull request: [Spark Core] SPARK-4380 Edit spilling log from...
Github user shenh062326 commented on a diff in the pull request: https://github.com/apache/spark/pull/3243#discussion_r20337096
--- Diff: core/src/main/scala/org/apache/spark/util/collection/Spillable.scala ---
@@ -105,7 +105,7 @@ private[spark] trait Spillable[C] {
    */
   @inline private def logSpillage(size: Long) {
     val threadId = Thread.currentThread().getId
-    logInfo("Thread %d spilling in-memory map of %d MB to disk (%d time%s so far)"
-      .format(threadId, size / (1024 * 1024), _spillCount, if (_spillCount > 1) "s" else ""))
+    logInfo("Thread %d spilling in-memory map of %d B to disk (%d time%s so far)"
--- End diff --
Thanks, Srowen; changed to Utils.bytesToString.
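Why the unit matters: with `%d` and `size / (1024 * 1024)`, integer division prints `0 MB` for any spill smaller than 1 MiB. Below is a minimal Java sketch of a human-readable formatter for that log line. `formatBytes` and `spillMessage` are hypothetical helpers written for illustration; they are not Spark's `Utils.bytesToString`, whose exact thresholds and output format may differ.

```java
import java.util.Locale;

class SpillLogDemo {
    private static final String[] UNITS = {"KB", "MB", "GB", "TB"};

    // Hypothetical formatter: divides by 1024 until the value fits the largest unit it reaches.
    static String formatBytes(long size) {
        if (size < 1024) {
            return size + " B";
        }
        double value = size;
        int unit = -1;
        while (value >= 1024 && unit < UNITS.length - 1) {
            value /= 1024;
            unit++;
        }
        return String.format(Locale.ROOT, "%.1f %s", value, UNITS[unit]);
    }

    // Builds the spill log line with a readable size and the same "time%s" pluralization.
    static String spillMessage(long threadId, long size, int spillCount) {
        return String.format(Locale.ROOT,
                "Thread %d spilling in-memory map of %s to disk (%d time%s so far)",
                threadId, formatBytes(size), spillCount, spillCount > 1 ? "s" : "");
    }

    public static void main(String[] args) {
        // Prints: Thread 42 spilling in-memory map of 512.0 KB to disk (1 time so far)
        System.out.println(spillMessage(42, 512L * 1024, 1));
    }
}
```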
[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3259#discussion_r20336546
--- Diff: core/src/main/scala/org/apache/spark/network/nio/ConnectionManager.scala ---
@@ -913,8 +918,10 @@ private[nio] class ConnectionManager(
       }
     }
+    val timoutTaskHandle = ackTimeoutMonitor.newTimeout(timeoutTask, ackTimeout, TimeUnit.SECONDS)
--- End diff --
timout?
[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3259#issuecomment-62995861 LGTM aside from the typo.
[GitHub] spark pull request: [SPARK-4079] [CORE] Consolidates Errors if a C...
Github user ksakellis commented on the pull request: https://github.com/apache/spark/pull/3119#issuecomment-62995656 @pwendell Can you please trigger the Jenkins tests for this PR?
[GitHub] spark pull request: [SPARK-4394][SQL] Data Sources API Improvement...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3260#issuecomment-62995573 [Test build #23343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23343/consoleFull) for PR 3260 at commit [`9a5e171`](https://github.com/apache/spark/commit/9a5e17166c5f8c75f067846ec5f515db0857f1ea). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4394][SQL] Data Sources API Improvement...
GitHub user marmbrus opened a pull request: https://github.com/apache/spark/pull/3260 [SPARK-4394][SQL] Data Sources API Improvements This PR adds two features to the data sources API: - Support for pushing down `IN` filters - The ability for relations to optionally provide information about their `sizeInBytes`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/marmbrus/spark sourcesImprovements Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3260.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3260 commit 2a04ab3deaa989738ef77b9e70dd00bba6ae4d1e Author: Michael Armbrust Date: 2014-11-14T00:59:46Z Simplify implementation of InSet. commit 416f167cb58edc088c449ea65f327fe4f8ed9e74 Author: Michael Armbrust Date: 2014-11-14T01:00:36Z Support for IN in data sources API. commit 99c0e6b1672ed8ec6fb40d9f90f887592b7eac46 Author: Michael Armbrust Date: 2014-11-14T01:01:02Z Add support for sizeInBytes. commit 9a5e17166c5f8c75f067846ec5f515db0857f1ea Author: Michael Armbrust Date: 2014-11-14T01:03:23Z Use method instead of configuration directly
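A rough Java sketch (Java 9+ for `List.of`/`Map.of`) of what the two features mean for a relation: it receives filter objects and can drop non-matching rows during its own scan, and it can report a size estimate. `Filter`, `In`, `InMemoryRelation`, and the 64-bytes-per-row estimate are hypothetical and only loosely mirror the Scala signatures listed in the build report above. The sketch also assumes pushed-down filters are advisory (the engine re-checks them), so unrecognized filters are simply not applied.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class InPushdownDemo {
    // Hypothetical filter hierarchy, loosely modeled on `In(attribute, values) extends Filter`.
    interface Filter {}

    static final class In implements Filter {
        final String attribute;
        final Set<Object> values;

        In(String attribute, Object... values) {
            this.attribute = attribute;
            this.values = new HashSet<>(Arrays.asList(values));
        }
    }

    // Hypothetical relation that applies pushed-down filters during its own scan.
    static final class InMemoryRelation {
        private final List<Map<String, Object>> rows;

        InMemoryRelation(List<Map<String, Object>> rows) {
            this.rows = rows;
        }

        // Optional size hint for the planner; 64 bytes per row is an arbitrary assumption.
        long sizeInBytes() {
            return 64L * rows.size();
        }

        List<Map<String, Object>> scan(List<Filter> filters) {
            List<Map<String, Object>> out = new ArrayList<>();
            for (Map<String, Object> row : rows) {
                boolean keep = true;
                for (Filter f : filters) {
                    // Unrecognized filters are skipped, assuming the engine re-checks them later.
                    if (f instanceof In) {
                        In in = (In) f;
                        keep &= in.values.contains(row.get(in.attribute));
                    }
                }
                if (keep) {
                    out.add(row);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = List.of(
                Map.<String, Object>of("id", 1),
                Map.<String, Object>of("id", 2),
                Map.<String, Object>of("id", 3));
        InMemoryRelation relation = new InMemoryRelation(rows);
        List<Filter> filters = List.of(new In("id", 1, 3));
        System.out.println(relation.scan(filters));  // [{id=1}, {id=3}]
        System.out.println(relation.sizeInBytes());  // 192
    }
}
```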
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-62994962 [Test build #23337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23337/consoleFull) for PR 3193 at commit [`78bf997`](https://github.com/apache/spark/commit/78bf997f13c6f08129671a9d6a3484620d5b37a2). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class RDDRangeSampler(RDDSamplerBase):`
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-62994968 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23337/ Test PASSed.
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-62994910 [Test build #519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/519/consoleFull) for PR 3193 at commit [`657de2d`](https://github.com/apache/spark/commit/657de2d8a536459157dfc535116428d7ce268297). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3259#issuecomment-62994235 [Test build #23342 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23342/consoleFull) for PR 3259 at commit [`3200c33`](https://github.com/apache/spark/commit/3200c33363d8daed187ecd10f7b5fc370d44f349). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3259#issuecomment-62994078 /cc @andrewor14 and @rxin for review.
[GitHub] spark pull request: [SPARK-4393] Fix memory leak in ConnectionMana...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/3259 [SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks. In the old code, each TimerTask held a reference to the message being sent, and a cancelled TimerTask won't necessarily be garbage-collected until it's scheduled to run; this caused huge buildups of messages that weren't garbage-collected until their timeouts expired, leading to OOMs. This patch addresses the problem by capturing only the message ID in the TimerTask instead of the whole message. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use case. Thanks to @cristianopris for narrowing down this issue! You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark connection-manager-timeout-bugfix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3259.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3259 commit f847dd4a4a8e7f92de879de9d5c9eb31743f8a26 Author: Josh Rosen Date: 2014-11-14T00:13:15Z Don't capture entire message in ACK timeout task. The old code caused memory leaks. commit 3200c33363d8daed187ecd10f7b5fc370d44f349 Author: Josh Rosen Date: 2014-11-14T00:45:41Z Use Netty HashedWheelTimer
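The leak described in SPARK-4393 can be illustrated with a small, hypothetical Java sketch. This is not Spark's ConnectionManager code, which is written in Scala and is not reproduced here. The sketch assumes a java.util.Timer-based scheduler. With that scheduler, TimerTask.cancel() only marks a task as cancelled, and the task stays in the timer's queue until its scheduled time unless Timer.purge() is called. The class AckTimeoutTracker and its send/ack/pendingCount/shutdown methods are invented for illustration. The point is that each task captures only the int message ID, so a cancelled but still-queued task no longer pins the message payload. The sketch deliberately does not use Netty's HashedWheelTimer, which the PR switches to.

```java
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntConsumer;

// Hypothetical ACK-timeout tracker: payloads are reachable only through the
// `pending` map, never through a scheduled TimerTask.
final class AckTimeoutTracker {
    // Daemon thread, so the timer does not keep the JVM alive on its own.
    private final Timer timer = new Timer("ack-timeout", true);
    // Message ID -> payload for messages still awaiting an ACK.
    private final Map<Integer, byte[]> pending = new ConcurrentHashMap<>();
    // Message ID -> its timeout task, so an ACK can cancel it.
    private final Map<Integer, TimerTask> timeouts = new ConcurrentHashMap<>();
    private final IntConsumer onTimeout;

    AckTimeoutTracker(IntConsumer onTimeout) {
        this.onTimeout = onTimeout;
    }

    // Records a message as pending (the actual network send is omitted).
    void send(int messageId, byte[] payload, long timeoutMillis) {
        pending.put(messageId, payload);
        // The task captures the int ID (plus this tracker), NOT the payload.
        TimerTask task = new TimerTask() {
            @Override
            public void run() {
                timeouts.remove(messageId);
                // Only report a timeout if no ACK removed the message first.
                if (pending.remove(messageId) != null) {
                    onTimeout.accept(messageId);
                }
            }
        };
        timeouts.put(messageId, task);
        timer.schedule(task, timeoutMillis);
    }

    // Handles an ACK: drops the payload and cancels the timeout task.
    void ack(int messageId) {
        pending.remove(messageId);
        TimerTask task = timeouts.remove(messageId);
        if (task != null) {
            // The cancelled task may linger in the Timer's queue until its
            // scheduled time, but it no longer references the payload.
            task.cancel();
        }
    }

    int pendingCount() {
        return pending.size();
    }

    void shutdown() {
        timer.cancel();
    }
}
```

Because the payload lives only in the pending map, an ACK makes the message bytes collectable immediately, even if the cancelled TimerTask object stays queued until its deadline. That capture rule matters regardless of which timer implementation schedules the tasks.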
[GitHub] spark pull request: [SQL] Minor cleanup of comments, errors and ov...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3257#issuecomment-62993447 [Test build #23336 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23336/consoleFull) for PR 3257 at commit [`2fdf903`](https://github.com/apache/spark/commit/2fdf903d24d4c7320cbb2b76f592082bac321a0c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SQL] Minor cleanup of comments, errors and ov...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3257#issuecomment-62993455 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23336/ Test PASSed.
[GitHub] spark pull request: [SPARK-4062][Streaming]Add ReliableKafkaReceiv...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/2991#issuecomment-62993250 OK, I will, thanks a lot, greatly appreciated.
[GitHub] spark pull request: [SPARK-4327] [PySpark] Python API for RDD.rand...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3193#issuecomment-62992769 [Test build #518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/518/consoleFull) for PR 3193 at commit [`657de2d`](https://github.com/apache/spark/commit/657de2d8a536459157dfc535116428d7ce268297). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class NettyBlockTransferService(conf: SparkConf, securityManager: SecurityManager, numCores: Int)` * `abstract class EdgeRDD[ED, VD](` * `abstract class VertexRDD[VD](` * `class RDDRangeSampler(RDDSamplerBase):`
[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3258#issuecomment-62992058 [Test build #23340 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23340/consoleFull) for PR 3258 at commit [`15e9a98`](https://github.com/apache/spark/commit/15e9a98e75928915b9a2c2c1c02d88bba3756485). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4390][SQL] Handle NaN cast to decimal c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3256#issuecomment-62991877 [Test build #23341 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23341/consoleFull) for PR 3256 at commit [`4c3ba46`](https://github.com/apache/spark/commit/4c3ba4617716bbc3bee95ff258ace2661b60f136). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3258#issuecomment-62991118 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23338/ Test FAILed.
[GitHub] spark pull request: [SPARK-4391][SQL] Configure parquet filters us...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3258#issuecomment-62991114 [Test build #23338 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/23338/consoleFull) for PR 3258 at commit [`75afd39`](https://github.com/apache/spark/commit/75afd39ba2a034fb67792c2773ba53dd92e92a71). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.