[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61052493 @pwendell @scwf What I mean is that com.esotericsoftware is again shaded in hive as org.apache.hive.com.esotericsoftware. I think that's the reason why the original hive package works against spark. But the spark-project:hive-exec does not include the shaded org.apache.hive.com.esotericsoftware and needs to be relinked, which causes the version conflict. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61052515 Okay, I think the issue is pretty tough. Unfortunately hive is directly using the shaded objenesis classes. However, Spark needs Kryo 2.21, which depends on the original objenesis classes. Here is the hive code that uses it: https://github.com/apache/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L186 So we can't just remove the Kryo that hive uses. This is pretty ugly. One solution might be to update chill in Spark so that Spark is using the same Kryo version as Hive.
[GitHub] spark pull request: [SPARK-4102] Remove unused ShuffleReader.stop(...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2966
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61052281 @pwendell, right, in hive 0.13.1 it uses the shaded ```com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy``` from kryo 2.22. So if we exclude it, we will get a ClassNotFoundException, because kryo 2.21 (which spark depends on via chill) does not have this class (the class in kryo 2.21 is org.objenesis.strategy.InstantiatorStrategy)
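A quick way to see which variant of the class a given classpath actually provides is a `Class.forName` probe. This is a hypothetical diagnostic helper, not code from Spark or Hive; only the two class names come from the discussion above:

```java
// Hypothetical probe to diagnose the shading mismatch: checks which
// variant of Objenesis' InstantiatorStrategy is visible on the classpath.
public class ShadingProbe {
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Kryo 2.21 (Spark, via chill) ships the unshaded class:
        System.out.println("unshaded: "
            + isOnClasspath("org.objenesis.strategy.InstantiatorStrategy"));
        // Hive 0.13 (Kryo 2.22) links against the shaded relocation:
        System.out.println("shaded:   "
            + isOnClasspath("com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy"));
    }
}
```

Running this on each side's classpath would show that exactly one of the two names resolves, which is why excluding one jar produces a ClassNotFoundException on the other side.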
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61052124 @pwendell com.esotericsoftware is already shaded in hive. Will it work if we keep it in hive-exec.jar? Please advise.
[GitHub] spark pull request: SPARK-3968 Use parquet-mr filter2 api in spark...
Github user saucam commented on the pull request: https://github.com/apache/spark/pull/2841#issuecomment-61052070 Yes. In the task-side metadata strategy, the tasks are spawned first, and each task then reads the metadata and drops the row groups. So if I am using YARN and the data is huge (so the metadata is large), the memory will be consumed on the YARN side; but in the client-side metadata strategy, the whole of the metadata is read on a single node before the tasks are spawned.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61052076 Another thing to notice is that Kryo 2.21 is a really weird release. [Kryo 2.21 POM](https://repo1.maven.org/maven2/com/esotericsoftware/kryo/kryo/2.21/kryo-2.21.pom) suggests that Objenesis classes are relocated to package `com.esotericsoftware.shaded.org.objenesis`, but classes within the Maven artifact jar file still reside in package `org.objenesis`. Also, Kryo GitHub repo doesn't provide 2.21 release download and the version number in the POM of [kryo-2.21 tag](https://github.com/EsotericSoftware/kryo/blob/kryo-2.21/pom.xml#L13) is actually `2.21-SNAPSHOT`.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61052007 @scwf the hive classes only link against kryo... they don't link against objenesis directly. As long as kryo did not make a binary-incompatible change between 2.21 and 2.22, it should be fine.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61052028 ```com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy``` is in kryo 2.22
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61051983 Actually, in the most recent failures it is using kryo 2.21.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61051898 @pwendell, spark depends on kryo 2.21, which does not shade objenesis, while hive 0.13 depends on kryo 2.22, which does shade it. So excluding it will not fix the problem, because hive cannot find the shaded class.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61051870 Based on the most recent failures, it seems like somehow the test classpath is still using kryo 2.22.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61051691 Just to make it more intuitive, I made a dependency graph to illustrate the issue: ![dependency-hell](http://tinyurl.com/q5opqe2)
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61051681 @scwf I checked dev/run-tests, and it does invoke python/run-tests. Didn't you also run it locally and succeed, or am I missing something?
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61051413 The problem here is that Hive 0.13 upgrades the Kryo version from 2.21 to 2.22. Spark depends on Kryo 2.21 via chill. In Kryo 2.22 they made a build change where they started inlining the objenesis dependency via shading. This patch somehow causes Spark to compile against Kryo 2.21 and run against Kryo 2.22, which is the root cause of the errors. My suggestion was to just exclude Kryo from Hive, hoping that it would result in us just keeping Kryo 2.21 and that Hive could deal with it. We might need to exclude it in places other than hive-exec. That could be the issue.
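As a sketch of the exclusion being suggested, a Maven dependency exclusion might look like the following. The coordinates are illustrative assumptions rather than quotes from Spark's actual pom.xml, and as noted above the exclusion may be needed in more modules than hive-exec alone:

```xml
<!-- Hypothetical sketch: excluding Hive's bundled Kryo from hive-exec -->
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>0.13.1</version>
  <exclusions>
    <exclusion>
      <groupId>com.esotericsoftware.kryo</groupId>
      <artifactId>kryo</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```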
[GitHub] spark pull request: [WIP][SPARK-4094][CORE] checkpoint should stil...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2956#issuecomment-61051336 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22522/ Test PASSed.
[GitHub] spark pull request: [WIP][SPARK-4094][CORE] checkpoint should stil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2956#issuecomment-61051328 [Test build #22522 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22522/consoleFull) for PR 2956 at commit [`a942bfa`](https://github.com/apache/spark/commit/a942bfa41be317cb68fe69ea1becd3059619a909). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH i...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/2711#issuecomment-61050455 Very good, thanks, @andrewor14 @vanzin
[GitHub] spark pull request: [SPARK-1720][SPARK-1719] Add the value of LD_L...
Github user witgo closed the pull request at: https://github.com/apache/spark/pull/1031
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61050366 @scwf Hmm, you mean dev/run-tests does not run pyspark? I ran dev/run-tests locally today and months ago, and didn't hit the pyspark error. How can I invoke the pyspark tests locally?
[GitHub] spark pull request: [SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2711
[GitHub] spark pull request: [SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH i...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/2711#issuecomment-61049248 Alright thanks, I'm merging this into master!
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61048834 @zhzhan, the original hive failed the pyspark tests, see #3004
[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61048818 [Test build #22523 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22523/consoleFull) for PR 3003 at commit [`47b144f`](https://github.com/apache/spark/commit/47b144f66badf9484966d3f2c74ccdb594350751). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3466] Limit size of results that a driv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/3003#issuecomment-61048612 @mateiz I have re-implemented it; now it checks the result size before sending from the executor and when fetching in the driver. Please review again.
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2931#issuecomment-61048438 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22521/ Test PASSed.
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2931#issuecomment-61048431 [Test build #22521 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22521/consoleFull) for PR 2931 at commit [`ed5fbf0`](https://github.com/apache/spark/commit/ed5fbf0765136da963f6a8447f1ff69191825392). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class WriteAheadLogBackedBlockRDDPartition(` * `class WriteAheadLogBackedBlockRDD[T: ClassTag](`
[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19588522 --- Diff: examples/src/main/python/mllib/word2vec.py --- @@ -0,0 +1,48 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# This example uses text8 file from http://mattmahoney.net/dc/text8.zip +# The file was unziped and split into multiple lines using +# grep -o -E '\w+(\W+\w+){0,15}' text8 > text8_lines +# This was done so that the example can be run in local mode --- End diff -- It's better to include the download and unzip steps.
[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19588505 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,40 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighlight %} + +{% highlight python %} +# This example uses text8 file from http://mattmahoney.net/dc/text8.zip +# The file was unziped and split into multiple lines using +# grep -o -E '\w+(\W+\w+){0,15}' text8 > text8_lines +# This was done so that the example can be run in local mode + +import sys + +from pyspark import SparkContext +from pyspark.mllib.feature import Word2Vec + +USAGE = ("bin/spark-submit --driver-memory 4g " --- End diff -- it should look like the scala one, I think the following should be enough: ``` from pyspark import SparkContext from pyspark.mllib.feature import Word2Vec sc = SparkContext(appName='Word2Vec') inp = sc.textFile("text8_lines").map(lambda row: row.split(" ")) word2vec = Word2Vec() model = word2vec.fit(inp) synonyms = model.findSynonyms('china', 40) for word, cosine_distance in synonyms: print "{}: {}".format(word, cosine_distance) ```
[GitHub] spark pull request: [WIP][SPARK-4094][CORE] checkpoint should stil...
Github user liyezhang556520 commented on the pull request: https://github.com/apache/spark/pull/2956#issuecomment-61047597 Since checkpoint will be done recursively on RDD parents, we need to avoid traversing one RDD multiple times. E.g. for the following lineage:

A -- B -- C -- D -- E
      `-- F --'

When we call E.count(), we should avoid traversing A and B twice, since there are two paths to these two nodes: E-D-C-B-A and E-D-F-B-A. Otherwise the traversal time can grow exponentially.
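The visited-set idea can be sketched as follows. This is an illustrative sketch of the traversal pattern, not Spark's actual checkpoint code; `Node`, `LineageWalk`, and `traverse` are hypothetical names:

```java
import java.util.*;

// Hypothetical lineage node: an RDD-like vertex with parent links.
class Node {
    final String name;
    final List<Node> parents = new ArrayList<>();
    Node(String name) { this.name = name; }
}

public class LineageWalk {
    // Walk the lineage DAG, visiting each node at most once even when
    // multiple paths (e.g. E-D-C-B-A and E-D-F-B-A) reach the same ancestor.
    static List<String> traverse(Node start) {
        List<String> order = new ArrayList<>();
        Set<Node> visited = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (!visited.add(n)) continue;  // already seen via another path
            order.add(n.name);
            for (Node p : n.parents) stack.push(p);
        }
        return order;
    }

    public static void main(String[] args) {
        // Lineage from the comment: B reaches D both via C and via F.
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"),
             d = new Node("D"), e = new Node("E"), f = new Node("F");
        b.parents.add(a);
        c.parents.add(b);
        f.parents.add(b);
        d.parents.add(c);
        d.parents.add(f);
        e.parents.add(d);
        System.out.println(traverse(e));  // each node appears exactly once
    }
}
```

Without the `visited` check, every node reachable through k diverging/reconverging path segments would be walked once per path, which is the exponential blowup described above.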
[GitHub] spark pull request: [WIP][SPARK-4094][CORE] checkpoint should stil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2956#issuecomment-61047499 [Test build #22522 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22522/consoleFull) for PR 2956 at commit [`a942bfa`](https://github.com/apache/spark/commit/a942bfa41be317cb68fe69ea1becd3059619a909). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61046800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22514/ Test PASSed.
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61046798 [Test build #22514 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22514/consoleFull) for PR 2983 at commit [`69dba42`](https://github.com/apache/spark/commit/69dba425dd28877212e359887d8c6c86f527e4b8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class DecimalType(DataType):` * `case class UnscaledValue(child: Expression) extends UnaryExpression ` * `case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression ` * `case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)` * `case class PrecisionInfo(precision: Int, scale: Int)` * `case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType ` * `final class Decimal extends Ordered[Decimal] with Serializable ` * ` trait DecimalIsConflicted extends Numeric[Decimal] `
[GitHub] spark pull request: [SPARK-4028][Streaming] ReceivedBlockHandler i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2940#issuecomment-61046731 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22520/ Test FAILed.
[GitHub] spark pull request: [SPARK-4028][Streaming] ReceivedBlockHandler i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2940#issuecomment-61046728 [Test build #22520 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22520/consoleFull) for PR 2940 at commit [`f192f47`](https://github.com/apache/spark/commit/f192f47d0e916e2b4b425581a4a76b7aaf782328). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user tianyi commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-61046528 Hi, @marmbrus @liancheng I have rebased this PR after [#2762](https://github.com/apache/spark/pull/2762). Any more comments on this?
[GitHub] spark pull request: [SPARK-4149][SQL] ISO 8601 support for json da...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3012#issuecomment-61046541 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22518/ Test PASSed.
[GitHub] spark pull request: [SPARK-4149][SQL] ISO 8601 support for json da...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3012#issuecomment-61046537 [Test build #22518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22518/consoleFull) for PR 3012 at commit [`c62b7e2`](https://github.com/apache/spark/commit/c62b7e2b924ab2a9d9c21580be1e077a24b8eb5d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4028][Streaming] ReceivedBlockHandler i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2940#issuecomment-61046333 [Test build #22516 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22516/consoleFull) for PR 2940 at commit [`df5f320`](https://github.com/apache/spark/commit/df5f3204afb1f3b6566df3dbed5f45b371c1ae67). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4028][Streaming] ReceivedBlockHandler i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2940#issuecomment-61046335 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22516/ Test FAILed.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-61046263 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22519/ Test PASSed.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-61046262 [Test build #22519 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22519/consoleFull) for PR 2542 at commit [`b708fc7`](https://github.com/apache/spark/commit/b708fc7636143562b950fda5fda778e1cd447ae1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-61045963 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22517/ Test PASSed.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-61045960 [Test build #22517 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22517/consoleFull) for PR 2542 at commit [`b708fc7`](https://github.com/apache/spark/commit/b708fc7636143562b950fda5fda778e1cd447ae1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61045831 I'm combing through the shading and dependency relationships among Hive, Spark, Chill, Kryo, and Objenesis, and will post a summary later. I don't think we can fix all the problems unless the relationships among these key components are crystal clear...
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61045639 I am not an expert on this, but it looks like com.esotericsoftware is already shaded in Hive (relocated to org.apache.hive.com.esotericsoftware). Would it help if org.spark-project:hive-exec included the shaded classes, given that the original Hive jar works without conflicts?
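The relocation zhzhan mentions can be checked empirically: whichever of the original or Hive-relocated package prefixes resolves at runtime tells you which artifact is on the classpath. Below is a minimal, illustrative probe; the two candidate class names are assembled from the package prefixes quoted in the discussion (`com.esotericsoftware` vs. `org.apache.hive.com.esotericsoftware`), and which of them is actually present depends entirely on the jars you run against.

```java
// Probe which variant of a shaded dependency is visible on the classpath.
// The package prefixes come from the discussion above; presence of either
// depends on whether Apache's shaded hive-exec or Spark's re-packaged
// hive-exec is on the classpath when this runs.
public class ShadedClassProbe {
    static boolean onClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "com.esotericsoftware.kryo.Kryo",               // original coordinates
            "org.apache.hive.com.esotericsoftware.kryo.Kryo" // Hive-shaded coordinates
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + onClasspath(name));
        }
    }
}
```

Running this with each candidate assembly on the classpath would show directly which relocation (if any) survived into the published jar.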
[GitHub] spark pull request: [SPARK-4148][PySpark] fix seed distribution an...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3010#issuecomment-61045548 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22513/ Test PASSed.
[GitHub] spark pull request: [SPARK-4148][PySpark] fix seed distribution an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3010#issuecomment-61045545 [Test build #22513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22513/consoleFull) for PR 3010 at commit [`c1bacd9`](https://github.com/apache/spark/commit/c1bacd9f46fe5559d4affa74dd986c79cced1611). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4150][PySpark] return self in rdd.setNa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3011#issuecomment-61045471 [Test build #22515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22515/consoleFull) for PR 3011 at commit [`4ac3bbd`](https://github.com/apache/spark/commit/4ac3bbdba145d5f5bd3a40906c4ca08daee4d9a8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4150][PySpark] return self in rdd.setNa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3011#issuecomment-61045477 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22515/ Test FAILed.
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-61045263 [Test build #22512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22512/consoleFull) for PR 2994 at commit [`0a45f1a`](https://github.com/apache/spark/commit/0a45f1ab5ba5f9440a78e47e48b48f0321d440c1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class KafkaWriter[T: ClassTag](@transient dstream: DStream[T]) extends Serializable with Logging `
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-61045266 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22512/ Test PASSed.
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/2931#issuecomment-61044847 Apart from the readability, does one have a performance benefit over the other?
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2931#issuecomment-61044841 [Test build #22521 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22521/consoleFull) for PR 2931 at commit [`ed5fbf0`](https://github.com/apache/spark/commit/ed5fbf0765136da963f6a8447f1ff69191825392). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2931#issuecomment-61044672 @rxin I updated. The only part I am not in agreement with is the preferred-location logic.
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2931#discussion_r19587466 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDDSuite.scala --- @@ -0,0 +1,155 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.streaming.rdd + +import java.io.File + +import scala.util.Random + +import com.google.common.io.Files +import org.apache.hadoop.conf.Configuration +import org.scalatest.{BeforeAndAfterAll, FunSuite} + +import org.apache.spark.{SparkConf, SparkContext} +import org.apache.spark.storage.{BlockId, BlockManager, StorageLevel, StreamBlockId} +import org.apache.spark.streaming.util.{WriteAheadLogFileSegment, WriteAheadLogWriter} + +class WriteAheadLogBackedBlockRDDSuite extends FunSuite with BeforeAndAfterAll { + val conf = new SparkConf() +.setMaster("local[2]") +.setAppName(this.getClass.getSimpleName) + val hadoopConf = new Configuration() + + var sparkContext: SparkContext = null + var blockManager: BlockManager = null + var dir: File = null + + override def beforeAll(): Unit = { +sparkContext = new SparkContext(conf) +blockManager = sparkContext.env.blockManager +dir = Files.createTempDir() + } + + override def afterAll(): Unit = { +// Copied from LocalSparkContext, simpler than introducing test dependencies on core tests. +sparkContext.stop() +dir.delete() +System.clearProperty("spark.driver.port") + } + + test("Read data available in block manager and write ahead log") { +testRDD(5, 5) + } + + test("Read data available only in block manager, not in write ahead log") { +testRDD(5, 0) + } + + test("Read data available only in write ahead log, not in block manager") { +testRDD(0, 5) + } + + test("Read data available only in write ahead log, and test storing in block manager") { +testRDD(0, 5, testStoreInBM = true) + } + + test("Read data with partially available in block manager, and rest in write ahead log") { +testRDD(3, 2) + } + + /** + * Test the WriteAheadLogBackedRDD, by writing some partitions of the data to block manager + * and the rest to a write ahead log, and then reading it all back using the RDD. + * It can also test if the partitions that were read from the log were again stored in + * block manager. 
+ * @param numPartitionssInBM Number of partitions to write to the Block Manager + * @param numPartitionsInWAL Number of partitions to write to the Write Ahead Log + * @param testStoreInBM Test whether blocks read from log are stored back into block manager + */ + private def testRDD( + numPartitionssInBM: Int, + numPartitionsInWAL: Int, + testStoreInBM: Boolean = false +) { +val numBlocks = numPartitionssInBM + numPartitionsInWAL +val data = Seq.tabulate(numBlocks) { _ => Seq.fill(10) { scala.util.Random.nextString(50) } } --- End diff -- Nice! Right!
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2931#discussion_r19587457 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.streaming.rdd + +import scala.reflect.ClassTag + +import org.apache.hadoop.conf.Configuration + +import org.apache.spark._ +import org.apache.spark.rdd.BlockRDD +import org.apache.spark.storage.{BlockId, StorageLevel} +import org.apache.spark.streaming.util.{HdfsUtils, WriteAheadLogFileSegment, WriteAheadLogRandomReader} + +/** + * Partition class for [[org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD]]. + * It contains information about the id of the blocks having this partition's data and + * the segment of the write ahead log that backs the partition. 
+ * @param index index of the partition + * @param blockId id of the block having the partition data + * @param segment segment of the write ahead log having the partition data + */ +private[streaming] +class WriteAheadLogBackedBlockRDDPartition( +val index: Int, +val blockId: BlockId, +val segment: WriteAheadLogFileSegment + ) extends Partition + + +/** + * This class represents a special case of the BlockRDD where the data blocks in + * the block manager are also backed by segments in write ahead logs. For reading + * the data, this RDD first looks up the blocks by their ids in the block manager. + * If it does not find them, it looks up the corresponding file segment. + * + * @param sc SparkContext + * @param hadoopConfig Hadoop configuration + * @param blockIds Ids of the blocks that contains this RDD's data + * @param segments Segments in write ahead logs that contain this RDD's data + * @param storeInBlockManager Whether to store in the block manager after reading from the segment + * @param storageLevel storage level to store when storing in block manager + * (applicable when storeInBlockManager = true) + */ +private[streaming] +class WriteAheadLogBackedBlockRDD[T: ClassTag]( +@transient sc: SparkContext, +@transient hadoopConfig: Configuration, +@transient override val blockIds: Array[BlockId], --- End diff -- For that matter, the `val` in the following lines were not needed either.
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2931#discussion_r19587361 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala --- @@ -0,0 +1,124 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.spark.streaming.rdd + +import scala.reflect.ClassTag + +import org.apache.hadoop.conf.Configuration + +import org.apache.spark._ +import org.apache.spark.rdd.BlockRDD +import org.apache.spark.storage.{BlockId, StorageLevel} +import org.apache.spark.streaming.util.{HdfsUtils, WriteAheadLogFileSegment, WriteAheadLogRandomReader} + +/** + * Partition class for [[org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD]]. + * It contains information about the id of the blocks having this partition's data and + * the segment of the write ahead log that backs the partition. 
+ * @param index index of the partition + * @param blockId id of the block having the partition data + * @param segment segment of the write ahead log having the partition data + */ +private[streaming] +class WriteAheadLogBackedBlockRDDPartition( +val index: Int, +val blockId: BlockId, +val segment: WriteAheadLogFileSegment + ) extends Partition + + +/** + * This class represents a special case of the BlockRDD where the data blocks in + * the block manager are also backed by segments in write ahead logs. For reading + * the data, this RDD first looks up the blocks by their ids in the block manager. + * If it does not find them, it looks up the corresponding file segment. + * + * @param sc SparkContext + * @param hadoopConfig Hadoop configuration + * @param blockIds Ids of the blocks that contains this RDD's data + * @param segments Segments in write ahead logs that contain this RDD's data + * @param storeInBlockManager Whether to store in the block manager after reading from the segment + * @param storageLevel storage level to store when storing in block manager + * (applicable when storeInBlockManager = true) + */ +private[streaming] +class WriteAheadLogBackedBlockRDD[T: ClassTag]( +@transient sc: SparkContext, +@transient hadoopConfig: Configuration, +@transient override val blockIds: Array[BlockId], +@transient val segments: Array[WriteAheadLogFileSegment], +val storeInBlockManager: Boolean, +val storageLevel: StorageLevel + ) extends BlockRDD[T](sc, blockIds) { + + require( +blockIds.length == segments.length, +s"Number of block ids (${blockIds.length}) must be " + + s"the same as number of segments (${segments.length}})!") + + // Hadoop configuration is not serializable, so broadcast it as a serializable. 
+ private val broadcastedHadoopConf = new SerializableWritable(hadoopConfig) + + override def getPartitions: Array[Partition] = { +assertValid() +Array.tabulate(blockIds.size) { i => + new WriteAheadLogBackedBlockRDDPartition(i, blockIds(i), segments(i)) } + } + + /** + * Gets the partition data by getting the corresponding block from the block manager. + * If the block does not exist, then the data is read from the corresponding segment + * in write ahead log files. + */ + override def compute(split: Partition, context: TaskContext): Iterator[T] = { +assertValid() +val hadoopConf = broadcastedHadoopConf.value +val blockManager = SparkEnv.get.blockManager +val partition = split.asInstanceOf[WriteAheadLogBackedBlockRDDPartition] +val blockId = partition.blockId +blockManager.get(blockId) match { + case Some(block) => // Data is in Block Manager +val iterator = block.data.asInstanceOf[Iterator[T]] +logDebug(s"Read partition data of $this from block manager, block $blockId") +iterator + case None => // Data
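The `compute()` method quoted in the diff above boils down to a cache-or-replay read: try the block manager first, and only on a miss replay the data from the write-ahead-log segment, optionally re-inserting the recovered value into the store. Stripped of Spark's types, the pattern can be sketched as follows; all names here are illustrative stand-ins, not Spark APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Cache-or-replay read: consult a fast in-memory block store first, and on a
// miss fall back to a durable log, optionally writing the recovered value
// back into the store (mirroring storeInBlockManager in the RDD above).
public class CacheOrReplay {
    private final Map<String, String> blockStore = new HashMap<>();

    public String read(String blockId, Supplier<String> readFromLog, boolean storeBack) {
        String cached = blockStore.get(blockId);
        if (cached != null) {
            return cached;                    // hit: data still in the block store
        }
        String recovered = readFromLog.get(); // miss: replay from the log segment
        if (storeBack) {
            blockStore.put(blockId, recovered);
        }
        return recovered;
    }

    public boolean contains(String blockId) {
        return blockStore.containsKey(blockId);
    }
}
```

The design question debated in the thread (preferred locations, whether recovered blocks should be re-stored) sits on top of exactly this fallback structure.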
[GitHub] spark pull request: [EC2] Factor out Mesos spark-ec2 branch
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/3008#issuecomment-61044027 LGTM
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2931#discussion_r19587321

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.rdd
+
+import scala.reflect.ClassTag
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark._
+import org.apache.spark.rdd.BlockRDD
+import org.apache.spark.storage.{BlockId, StorageLevel}
+import org.apache.spark.streaming.util.{HdfsUtils, WriteAheadLogFileSegment, WriteAheadLogRandomReader}
+
+/**
+ * Partition class for [[org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD]].
+ * It contains information about the id of the blocks having this partition's data and
+ * the segment of the write ahead log that backs the partition.
+ * @param index index of the partition
+ * @param blockId id of the block having the partition data
+ * @param segment segment of the write ahead log having the partition data
+ */
+private[streaming]
+class WriteAheadLogBackedBlockRDDPartition(
+    val index: Int,
+    val blockId: BlockId,
+    val segment: WriteAheadLogFileSegment
+  ) extends Partition
+
+
+/**
+ * This class represents a special case of the BlockRDD where the data blocks in
+ * the block manager are also backed by segments in write ahead logs. For reading
+ * the data, this RDD first looks up the blocks by their ids in the block manager.
+ * If it does not find them, it looks up the corresponding file segment.
+ *
+ * @param sc SparkContext
+ * @param hadoopConfig Hadoop configuration
+ * @param blockIds Ids of the blocks that contain this RDD's data
+ * @param segments Segments in write ahead logs that contain this RDD's data
+ * @param storeInBlockManager Whether to store in the block manager after reading from the segment
+ * @param storageLevel storage level to store when storing in block manager
+ *                     (applicable when storeInBlockManager = true)
+ */
+private[streaming]
+class WriteAheadLogBackedBlockRDD[T: ClassTag](
+    @transient sc: SparkContext,
+    @transient hadoopConfig: Configuration,
+    @transient override val blockIds: Array[BlockId],
+    @transient val segments: Array[WriteAheadLogFileSegment],
+    val storeInBlockManager: Boolean,
+    val storageLevel: StorageLevel
+  ) extends BlockRDD[T](sc, blockIds) {
+
+  require(
+    blockIds.length == segments.length,
+    s"Number of block ids (${blockIds.length}) must be " +
+      s"the same as number of segments (${segments.length})!")
+
+  // Hadoop configuration is not serializable, so wrap it in a SerializableWritable.
+  private val broadcastedHadoopConf = new SerializableWritable(hadoopConfig)
+
+  override def getPartitions: Array[Partition] = {
+    assertValid()
+    Array.tabulate(blockIds.size) { i =>
+      new WriteAheadLogBackedBlockRDDPartition(i, blockIds(i), segments(i))
+    }
+  }
+
+  /**
+   * Gets the partition data by getting the corresponding block from the block manager.
+   * If the block does not exist, then the data is read from the corresponding segment
+   * in write ahead log files.
+   */
+  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
+    assertValid()
+    val hadoopConf = broadcastedHadoopConf.value
+    val blockManager = SparkEnv.get.blockManager
+    val partition = split.asInstanceOf[WriteAheadLogBackedBlockRDDPartition]
+    val blockId = partition.blockId
+    blockManager.get(blockId) match {
+      case Some(block) => // Data is in Block Manager
+        val iterator = block.data.asInstanceOf[Iterator[T]]
+        logDebug(s"Read partition data of $this from block manager, block $blockId")
+        iterator
+      case None => //
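The compute() logic in the diff above (look up the block in the block manager, and fall back to reading the write-ahead-log segment only on a miss, optionally re-caching the result) can be sketched in Python. `BlockManagerStub` and the `read_segment` callback are hypothetical stand-ins, not Spark APIs:

```python
class BlockManagerStub:
    """Minimal stand-in for Spark's BlockManager (hypothetical)."""
    def __init__(self):
        self._blocks = {}

    def get(self, block_id):
        return self._blocks.get(block_id)

    def put(self, block_id, data):
        self._blocks[block_id] = data


def compute_partition(block_manager, read_segment, block_id, segment,
                      store_in_block_manager=False):
    """Try the block manager first; fall back to reading the WAL segment."""
    data = block_manager.get(block_id)
    if data is not None:
        return data                          # fast path: block still cached
    data = read_segment(segment)             # slow path: read write-ahead log
    if store_in_block_manager:
        block_manager.put(block_id, data)    # re-cache for later lookups
    return data
```

The fallback reader is only invoked on a cache miss, which is the behavior the RDD's `compute` guarantees.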
[GitHub] spark pull request: [SPARK-4028][Streaming] ReceivedBlockHandler i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2940#issuecomment-61043967 [Test build #22520 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22520/consoleFull) for PR 2940 at commit [`f192f47`](https://github.com/apache/spark/commit/f192f47d0e916e2b4b425581a4a76b7aaf782328). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4027][Streaming] HDFSBasedBlockRDD to r...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/2931#discussion_r19587299

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/HDFSBackedBlockRDD.scala ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streaming.rdd
+
+import scala.reflect.ClassTag
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.rdd.BlockRDD
+import org.apache.spark.storage.{BlockId, StorageLevel}
+import org.apache.spark.streaming.util.{WriteAheadLogFileSegment, HdfsUtils, WriteAheadLogRandomReader}
+import org.apache.spark._
+
+private[streaming]
+class HDFSBackedBlockRDDPartition(
+    val blockId: BlockId,
+    val index: Int,
+    val segment: WriteAheadLogFileSegment
+  ) extends Partition
+
+private[streaming]
+class HDFSBackedBlockRDD[T: ClassTag](
+    @transient sc: SparkContext,
+    @transient hadoopConfiguration: Configuration,
+    @transient override val blockIds: Array[BlockId],
+    @transient val segments: Array[WriteAheadLogFileSegment],
+    val storeInBlockManager: Boolean,
+    val storageLevel: StorageLevel
+  ) extends BlockRDD[T](sc, blockIds) {
+
+  require(blockIds.length == segments.length,
+    "Number of block ids must be the same as number of segments!")
+
+  // Hadoop Configuration is not serializable, so wrap it in a SerializableWritable and broadcast it.
+  val broadcastedHadoopConf = sc.broadcast(new SerializableWritable(hadoopConfiguration))
+
+  override def getPartitions: Array[Partition] = {
+    assertValid()
+    (0 until blockIds.size).map { i =>
+      new HDFSBackedBlockRDDPartition(blockIds(i), i, segments(i))
+    }.toArray
+  }
+
+  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
+    assertValid()
+    val hadoopConf = broadcastedHadoopConf.value.value
+    val blockManager = SparkEnv.get.blockManager
+    val partition = split.asInstanceOf[HDFSBackedBlockRDDPartition]
+    val blockId = partition.blockId
+    blockManager.get(blockId) match {
+      // Data is in Block Manager, grab it from there.
+      case Some(block) =>
+        block.data.asInstanceOf[Iterator[T]]
+      // Data not found in Block Manager, grab it from HDFS
+      case None =>
+        logInfo("Reading partition data from write ahead log " + partition.segment.path)
+        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
+        val dataRead = reader.read(partition.segment)
+        reader.close()
+        // Currently, we support storing the data to BM only in serialized form and not in
+        // deserialized form
+        if (storeInBlockManager) {
+          blockManager.putBytes(blockId, dataRead, storageLevel)
+        }
+        dataRead.rewind()
+        blockManager.dataDeserialize(blockId, dataRead).asInstanceOf[Iterator[T]]
+    }
+  }
+
+  override def getPreferredLocations(split: Partition): Seq[String] = {
+    val partition = split.asInstanceOf[HDFSBackedBlockRDDPartition]
+    val locations = getBlockIdLocations()
+    locations.getOrElse(partition.blockId,
--- End diff --

Isn't the alternative Josh suggested more intuitive? All the alternatives are clearly on one line, and it avoids redundant code such as `case Some(loc) => loc`.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-61043719 [Test build #22519 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22519/consoleFull) for PR 2542 at commit [`b708fc7`](https://github.com/apache/spark/commit/b708fc7636143562b950fda5fda778e1cd447ae1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4149][SQL] ISO 8601 support for json da...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3012#issuecomment-61043716 [Test build #22518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22518/consoleFull) for PR 3012 at commit [`c62b7e2`](https://github.com/apache/spark/commit/c62b7e2b924ab2a9d9c21580be1e077a24b8eb5d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61043663 Two potential workarounds for this: 1. change the Kryo version in Hive to fix the conflict; 2. shade chill. Any other ideas?
[GitHub] spark pull request: [SPARK-4028][Streaming] ReceivedBlockHandler i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2940#issuecomment-61043455 [Test build #22516 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22516/consoleFull) for PR 2940 at commit [`df5f320`](https://github.com/apache/spark/commit/df5f3204afb1f3b6566df3dbed5f45b371c1ae67). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-61043457 [Test build #22517 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22517/consoleFull) for PR 2542 at commit [`b708fc7`](https://github.com/apache/spark/commit/b708fc7636143562b950fda5fda778e1cd447ae1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4028][Streaming] ReceivedBlockHandler i...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2940#issuecomment-61043515 For reference: I spoke with @pwendell and @JoshRosen offline, and we decided that a slightly modified version of suggestion 3 (in my earlier comment) is the best middle ground that addresses all the concerns. What I have done is add a trait `ReceivedBlockStoreResult`. `ReceivedBlockHandler.storeBlock` returns a `ReceivedBlockStoreResult` object; the contents of that object are of no concern to `ReceiverSupervisorImpl`, which simply passes it on. Implementations of `ReceivedBlockHandler` all return `ReceivedBlockStoreResult`, so no generic typing is needed. This keeps the complexity low while keeping the `ReceiverSupervisorImpl` code generic, and it addresses Patrick's concern about `Option[Any]` being non-intuitive.
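The design tdas describes (an opaque `ReceivedBlockStoreResult` returned by `ReceivedBlockHandler.storeBlock` and passed through the supervisor untouched) can be sketched in Python with abstract base classes. The concrete subclass names below are hypothetical, not the names used in the actual patch:

```python
from abc import ABC, abstractmethod


class ReceivedBlockStoreResult(ABC):
    """Opaque marker type: callers pass it along without inspecting it."""


class BlockManagerStoreResult(ReceivedBlockStoreResult):
    """Hypothetical result for blocks stored only in the block manager."""
    def __init__(self, block_id):
        self.block_id = block_id


class ReceivedBlockHandler(ABC):
    @abstractmethod
    def store_block(self, block_id, block):
        """Store a block; return some ReceivedBlockStoreResult subtype."""


class InMemoryHandler(ReceivedBlockHandler):
    def store_block(self, block_id, block):
        return BlockManagerStoreResult(block_id)


def supervisor_store(handler, block_id, block):
    """The supervisor treats the result as opaque: no generics, no Option[Any]."""
    result = handler.store_block(block_id, block)
    assert isinstance(result, ReceivedBlockStoreResult)
    return result
```

Each handler implementation can return a richer subtype (e.g. one carrying a WAL segment), while the supervisor's code stays generic.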
[GitHub] spark pull request: [SPARK-4149][SQL] ISO 8601 support for json da...
GitHub user adrian-wang opened a pull request:

    https://github.com/apache/spark/pull/3012

[SPARK-4149][SQL] ISO 8601 support for json date time strings

This implements the feature @davies mentioned in https://github.com/apache/spark/pull/2901#discussion-diff-19313312

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adrian-wang/spark iso8601

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3012.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3012

commit c62b7e2b924ab2a9d9c21580be1e077a24b8eb5d
Author: Daoyuan Wang
Date: 2014-10-30T04:06:09Z

    json data timestamp ISO8601 support
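For readers unfamiliar with the feature, here is a minimal Python sketch of parsing an ISO 8601 date-time string such as the commit date above. It covers only one fixed format plus the 'Z' UTC suffix, not the full range of forms the PR may support:

```python
from datetime import datetime, timezone


def parse_iso8601(s):
    """Parse a small subset of ISO 8601 date-time strings (illustrative only)."""
    if s.endswith("Z"):  # 'Z' designator means UTC
        dt = datetime.strptime(s[:-1], "%Y-%m-%dT%H:%M:%S")
        return dt.replace(tzinfo=timezone.utc)
    # No zone designator: return a naive datetime
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")
```

A full implementation would also handle fractional seconds and numeric offsets like `+08:00`.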
[GitHub] spark pull request: [SPARK-4137] [EC2] Don't change working dir on...
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/2988#discussion_r19587131

--- Diff: ec2/spark_ec2.py ---
@@ -718,12 +726,16 @@ def get_num_disks(instance_type):
     return 1

-# Deploy the configuration file templates in a given local directory to
-# a cluster, filling in any template parameters with information about the
-# cluster (e.g. lists of masters and slaves). Files are only deployed to
-# the first master instance in the cluster, and we expect the setup
-# script to be run on that instance to copy them to other nodes.
 def deploy_files(conn, root_dir, opts, master_nodes, slave_nodes, modules):
+    """
+    Deploy the configuration file templates in a given local directory to
--- End diff --

Yeah, I thought I'd make this the first change toward having all the function descriptions be in docstrings, but for consistency's sake you're right: it should be a comment on top.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61043371 It seems we cannot upgrade Kryo in Spark, since the latest chill depends on Kryo 2.21.
[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2542#issuecomment-61043355 retest this please
[GitHub] spark pull request: hive 0.13 test issue
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3004#issuecomment-61043157 [Test build #22510 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22510/consoleFull) for PR 3004 at commit [`a433434`](https://github.com/apache/spark/commit/a433434910b0d69b32f82e91bd47ded564d490b1). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4137] [EC2] Don't change working dir on...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/2988#issuecomment-61043164 Functionality LGTM. I left a minor style question for @JoshRosen
[GitHub] spark pull request: hive 0.13 test issue
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3004#issuecomment-61043160 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22510/ Test FAILed.
[GitHub] spark pull request: [SPARK-1720][SPARK-1719] use LD_LIBRARY_PATH i...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/2711#issuecomment-61043113 @andrewor14 I've tested on Linux (YARN, Mesos) and Mac OS X (standalone).
[GitHub] spark pull request: [SPARK-4137] [EC2] Don't change working dir on...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/2988#discussion_r19586967

--- Diff: ec2/spark_ec2.py ---
@@ -718,12 +726,16 @@ def get_num_disks(instance_type):
     return 1

-# Deploy the configuration file templates in a given local directory to
-# a cluster, filling in any template parameters with information about the
-# cluster (e.g. lists of masters and slaves). Files are only deployed to
-# the first master instance in the cluster, and we expect the setup
-# script to be run on that instance to copy them to other nodes.
 def deploy_files(conn, root_dir, opts, master_nodes, slave_nodes, modules):
+    """
+    Deploy the configuration file templates in a given local directory to
--- End diff --

Should we change this style, given that other functions in this file have comments on top? Any thoughts, @JoshRosen?
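The two styles being debated above differ observably at runtime: only a docstring is attached to the function object. A minimal illustration with hypothetical function names:

```python
# Comment-on-top style, as used elsewhere in spark_ec2.py.
# Invisible to introspection tools.
def deploy_files_with_comment():
    return 1


def deploy_files_with_docstring():
    """Docstring style: introspectable via help() and __doc__."""
    return 1
```

Functionally the code is identical either way; the docstring version is just discoverable by `help()`, IDEs, and doc generators.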
[GitHub] spark pull request: [SPARK-4150][PySpark] return self in rdd.setNa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3011#issuecomment-61042785 [Test build #22515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22515/consoleFull) for PR 3011 at commit [`4ac3bbd`](https://github.com/apache/spark/commit/4ac3bbdba145d5f5bd3a40906c4ca08daee4d9a8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4150][PySpark] return self in rdd.setNa...
GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/3011

[SPARK-4150][PySpark] return self in rdd.setName

Then we can do `rdd.setName('abc').cache().count()`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark rdd-setname

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3011.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3011

commit 4ac3bbdba145d5f5bd3a40906c4ca08daee4d9a8
Author: Xiangrui Meng
Date: 2014-10-30T03:51:40Z

    return self in rdd.setName
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61042108 [Test build #22514 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22514/consoleFull) for PR 2983 at commit [`69dba42`](https://github.com/apache/spark/commit/69dba425dd28877212e359887d8c6c86f527e4b8). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61042000 I am testing with just upgrading Kryo in Spark, without excluding Hive's Kryo.
[GitHub] spark pull request: [SPARK-4148][PySpark] fix seed distribution an...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3010#issuecomment-61041820 [Test build #22513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22513/consoleFull) for PR 3010 at commit [`c1bacd9`](https://github.com/apache/spark/commit/c1bacd9f46fe5559d4affa74dd986c79cced1611). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4148][PySpark] fix seed distribution an...
GitHub user mengxr opened a pull request:

    https://github.com/apache/spark/pull/3010

[SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample

The current way of seed distribution makes the sequences sampled from partition i and i+1 offset by 1.

~~~
In [14]: import random

In [15]: r1 = random.Random(10)

In [16]: r1.randint(0, 1)
Out[16]: 1

In [17]: r1.random()
Out[17]: 0.4288890546751146

In [18]: r1.random()
Out[18]: 0.5780913011344704

In [19]: r2 = random.Random(10)

In [20]: r2.randint(0, 1)
Out[20]: 1

In [21]: r2.randint(0, 1)
Out[21]: 0

In [22]: r2.random()
Out[22]: 0.5780913011344704
~~~

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark SPARK-4148

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/3010.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3010

commit c1bacd9f46fe5559d4affa74dd986c79cced1611
Author: Xiangrui Meng
Date: 2014-10-30T03:22:13Z

    fix seed distribution and add some tests for rdd.sample
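The transcript above is from Python 2, where `randint` consumed exactly one underlying `random()` draw, so r2's first `random()` equals r1's second. The same correlation can be reproduced on any Python version by offsetting a single seeded stream directly. The per-partition seeding at the end is one common remedy, not necessarily the PR's exact change:

```python
import random


def partition_sample_stream(seed, offset, n=3):
    """Mimic a per-partition sampler that uses the same base seed but starts
    `offset` draws into the stream (hypothetical model of the bug)."""
    rng = random.Random(seed)
    for _ in range(offset):
        rng.random()                     # burn `offset` draws
    return [rng.random() for _ in range(n)]


# Partitions offset by one draw produce overlapping, shifted sequences:
# the "independent" per-partition samples are heavily correlated.
a = partition_sample_stream(10, offset=0)
b = partition_sample_stream(10, offset=1)
assert a[1:] == b[:-1]


def partition_rng(base_seed, partition_index):
    """Derive a distinct seed per partition from (base_seed, index)."""
    return random.Random(hash((base_seed, partition_index)))
```

With distinct derived seeds, neighboring partitions no longer share a shifted stream.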
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-61041536 [Test build #22512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22512/consoleFull) for PR 2994 at commit [`0a45f1a`](https://github.com/apache/spark/commit/0a45f1ab5ba5f9440a78e47e48b48f0321d440c1). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4137] [EC2] Don't change working dir on...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/2988#issuecomment-61041277 @shivaram I took your suggestion and tested to make sure `spark-ec2` still creates a functioning EC2 cluster. This is ready for another review.
[GitHub] spark pull request: [EC2] Factor out Mesos spark-ec2 branch
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3008#issuecomment-61041154 cc @shivaram @JoshRosen
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-61040608 I talked to people working on Kafka, and they assure me it is thread-safe. Also see this: https://github.com/apache/flume/blob/trunk/flume-ng-channels/flume-kafka-channel/src/main/java/org/apache/flume/channel/kafka/KafkaChannel.java There is a single producer that is written to by various threads; see the corresponding test, where it is written to from multiple threads. I have run it in loops several times on Travis and never seen a threading issue. By creating a producer per partition, this issue is avoided anyway. For now, we can keep it simple by creating a producer per partition; if we see this is a problem, we can revert to the ProducerCache.
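The "producer per partition" arrangement described above can be sketched in Python: each partition-writing task constructs and closes its own producer, so no producer instance is ever shared across threads. `KafkaProducerStub` is a hypothetical stand-in, not the Kafka client API:

```python
class KafkaProducerStub:
    """Hypothetical stand-in for a Kafka producer (records sends, tracks close)."""
    def __init__(self):
        self.sent = []
        self.closed = False

    def send(self, topic, message):
        self.sent.append((topic, message))

    def close(self):
        self.closed = True


def write_partition(records, topic, make_producer):
    """Run once per RDD partition: the task owns its producer for its lifetime,
    sidestepping any cross-thread sharing question entirely."""
    producer = make_producer()
    try:
        for record in records:
            producer.send(topic, record)
    finally:
        producer.close()                 # always release the connection
    return producer
```

The trade-off is one connection per task instead of a shared cached producer, which is the simplicity-vs-overhead choice the comment describes.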
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61040483 Just excluding kryo is not enough; should we reshade the hive 0.13.1 jar? @pwendell
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61040412 Yeah, in hive-0.13.1 (https://github.com/apache/hive/blob/release-0.13.1/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java#L186) the code uses ```com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy```, while com.twitter:chill_2.10:0.3.6 uses the unshaded ```org.objenesis.strategy.InstantiatorStrategy```:
```scala
class EmptyScalaKryoInstantiator extends KryoInstantiator {
  override def newKryo = {
    val k = new KryoBase
    k.setRegistrationRequired(false)
    k.setInstantiatorStrategy(new org.objenesis.strategy.StdInstantiatorStrategy)
    k
  }
}
```
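The mismatch comes from class-name relocation: when hive-exec shades its dependencies, every reference to `org.objenesis` is rewritten under a shaded prefix. A minimal sketch (Python, purely illustrative; `relocate` is a hypothetical helper mimicking what the Maven shade plugin does at the bytecode level):

```python
# Illustrative only: mimics the class-name relocation the shade plugin
# applies when org.objenesis is shaded into hive-exec.
def relocate(class_name: str, pattern: str, shaded_prefix: str) -> str:
    """Rewrite a fully-qualified class name the way shade relocation does."""
    if class_name.startswith(pattern):
        return shaded_prefix + class_name
    return class_name

original = "org.objenesis.strategy.InstantiatorStrategy"
shaded = relocate(original, "org.objenesis", "com.esotericsoftware.shaded.")

# Hive 0.13.1 compiled against the shaded name; chill 0.3.6 against the original.
print(shaded)    # com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy
print(original)  # org.objenesis.strategy.InstantiatorStrategy
```

Since the two artifacts were compiled against different fully-qualified names for the same class, an assembly that ships only one spelling cannot satisfy both.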
[GitHub] spark pull request: Delete jetty 6.1.26 from spark package
Github user KaiXinXiaoLei commented on the pull request: https://github.com/apache/spark/pull/2989#issuecomment-61040188 Using the maven-dependency-plugin to build Spark, I got the dependency tree and found that Jetty 6 is introduced by hdfs, yarn, flume, and hbase. From the dependency tree output, here is just the information about Jetty 6.

Jetty 6 is brought in by hdfs when building spark-core:
```
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.4.1:compile
[INFO] |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.1:compile
[INFO] |  |  \- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
```
Jetty 6 is brought in by yarn when building spark-yarn:
```
[INFO] +- org.apache.hadoop:hadoop-yarn-server-web-proxy:jar:2.4.1:compile
[INFO] |  +- org.apache.hadoop:hadoop-yarn-server-common:jar:2.4.1:compile
[INFO] |  +- commons-httpclient:commons-httpclient:jar:3.1:compile
[INFO] |  \- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] +- org.apache.hadoop:hadoop-yarn-client:jar:2.4.1:compile
[INFO] |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
```
Jetty 6 is brought in by flume when building spark-streaming-flume:
```
[INFO] +- org.apache.spark:spark-streaming-flume-sink_2.10:jar:1.2.0-SNAPSHOT:compile
[INFO] |  \- org.apache.flume:flume-ng-core:jar:1.4.0:compile
[INFO] |     +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
[INFO] |     +- org.mortbay.jetty:jetty:jar:6.1.26:compile
```
Jetty 6 is brought in by hbase when building spark-examples:
```
[INFO] +- org.apache.hbase:hbase:jar:0.94.6:compile
[INFO] |  +- org.mortbay.jetty:jetty:jar:6.1.26:compile
[INFO] |  +- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
```
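The Jetty 6 findings above can be checked mechanically. A small sketch (Python, assuming the `mvn dependency:tree` text format quoted in the comment; the embedded sample lines are taken from the hadoop-client subtree above):

```python
import re

# Sample lines in the format printed by `mvn dependency:tree`,
# taken from the hadoop-client subtree quoted above.
tree_output = """\
[INFO] +- org.apache.hadoop:hadoop-client:jar:2.4.1:compile
[INFO] |  +- org.apache.hadoop:hadoop-hdfs:jar:2.4.1:compile
[INFO] |  |  \\- org.mortbay.jetty:jetty-util:jar:6.1.26:compile
"""

# Matches coordinates of the form groupId:artifactId:jar:version:scope
coord = re.compile(r"([\w.\-]+):([\w.\-]+):jar:([\w.\-]+):(\w+)")

def find_jetty6(text):
    """Return (groupId, artifactId, version) for every Jetty 6 dependency."""
    hits = []
    for line in text.splitlines():
        m = coord.search(line)
        if m and m.group(1) == "org.mortbay.jetty" and m.group(3).startswith("6."):
            hits.append((m.group(1), m.group(2), m.group(3)))
    return hits

print(find_jetty6(tree_output))
# [('org.mortbay.jetty', 'jetty-util', '6.1.26')]
```

Running the same filter over the full tree dump would surface each of the four entry points (hdfs, yarn, flume, hbase) listed in the comment.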
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61039848 In the assembly jar the class is at ```/org/objenesis/strategy/InstantiatorStrategy.class```, so it seems the class name being referenced is wrong: it should be ```org.objenesis.strategy.InstantiatorStrategy``` without the ```com.esotericsoftware.shaded.``` prefix.
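The assembly-jar check described above can be sketched like this (illustrative Python; an in-memory zip stands in for the real assembly jar, with the entry name mirroring what was found in it):

```python
import io
import zipfile

# Build an in-memory stand-in for the assembly jar containing only the
# unshaded objenesis class, as observed in the comment above.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("org/objenesis/strategy/InstantiatorStrategy.class", b"")

def jar_has_class(jar_bytes, fqcn):
    """Check whether a fully-qualified class name is present in a jar."""
    entry = fqcn.replace(".", "/") + ".class"
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return entry in jar.namelist()

data = buf.getvalue()
print(jar_has_class(data, "org.objenesis.strategy.InstantiatorStrategy"))  # True
shaded_name = "com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy"
print(jar_has_class(data, shaded_name))  # False
```

This mirrors the failure mode: Hive asks for the shaded name, but the jar contains only the unshaded entry, hence the ClassNotFoundException.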
[GitHub] spark pull request: [EC2] Factor out Mesos spark-ec2 branch
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3008#issuecomment-61039757 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22507/ Test PASSed.
[GitHub] spark pull request: [EC2] Factor out Mesos spark-ec2 branch
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3008#issuecomment-61039753 [Test build #22507 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22507/consoleFull) for PR 3008 at commit [`10a6089`](https://github.com/apache/spark/commit/10a6089422fa81cb496363d13e428e33e58008a4).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61039529 Still failed, with this error:
```
..
Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.shaded.org.objenesis.strategy.InstantiatorStrategy
```
I am checking whether this class is in the assembly jar.
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61039238 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22511/ Test FAILed.
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61039237 [Test build #22511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22511/consoleFull) for PR 2983 at commit [`3360a0e`](https://github.com/apache/spark/commit/3360a0ecb8e383a6d1ae9f023fe343af8418db90).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class DecimalType(DataType):`
  * `case class UnscaledValue(child: Expression) extends UnaryExpression`
  * `case class MakeDecimal(child: Expression, precision: Int, scale: Int) extends UnaryExpression`
  * `case class MutableLiteral(var value: Any, dataType: DataType, nullable: Boolean = true)`
  * `case class PrecisionInfo(precision: Int, scale: Int)`
  * `case class DecimalType(precisionInfo: Option[PrecisionInfo]) extends FractionalType`
  * `final class Decimal extends Ordered[Decimal] with Serializable`
  * `trait DecimalIsConflicted extends Numeric[Decimal]`
[GitHub] spark pull request: [SPARK-3930] [SPARK-3933] Support fixed-precis...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2983#issuecomment-61039162 [Test build #22511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22511/consoleFull) for PR 2983 at commit [`3360a0e`](https://github.com/apache/spark/commit/3360a0ecb8e383a6d1ae9f023fe343af8418db90). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3838][examples][mllib][python] Word2Vec...
Github user anantasty commented on a diff in the pull request: https://github.com/apache/spark/pull/2952#discussion_r19585598 --- Diff: docs/mllib-feature-extraction.md --- @@ -162,6 +162,40 @@ for((synonym, cosineSimilarity) <- synonyms) { } {% endhighlight %} + +{% highlight python %} +# This example uses text8 file from http://mattmahoney.net/dc/text8.zip +# The file was unziped and split into multiple lines using +# grep -o -E '\w+(\W+\w+){0,15}' text8 > text8_lines +# This was done so that the example can be run in local mode + +import sys + +from pyspark import SparkContext +from pyspark.mllib.feature import Word2Vec + +USAGE = ("bin/spark-submit --driver-memory 4g " --- End diff -- @davies To simplify the docs, should I just remove the Usage line and the creation of the context?
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61038986 [Test build #22508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22508/consoleFull) for PR 2685 at commit [`18fb1ff`](https://github.com/apache/spark/commit/18fb1fff1c2a097604b573fffba92b9a7a3f3e8f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3826][SQL]enable hive-thriftserver to s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2685#issuecomment-61038997 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22508/ Test FAILed.