[GitHub] spark issue #14498: [SPARK-16904] [SQL] Removal of Hive Built-in Hash Functi...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14498 Based on the previous discussions in the other PRs, it sounds like these Hive-specific test cases are not very useful. Do we still need to port them back? ``` "auto_join19", "auto_join22", "auto_join25", "auto_join26", "auto_join27", "auto_join28", "auto_join30", "auto_join31", "auto_join_nulls", "auto_join_reordering_values", "correlationoptimizer1", "correlationoptimizer2", "correlationoptimizer3", "correlationoptimizer4", "multiMapJoin1", "orc_dictionary_threshold", "udf_hash" ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r86724596 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.feature + +import scala.util.Random + +import org.apache.hadoop.fs.Path + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT} +import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.param.shared.HasSeed +import org.apache.spark.ml.util._ +import org.apache.spark.sql.types.StructType + +/** + * :: Experimental :: + * + * Model produced by [[MinHash]], where multiple hash functions are stored. Each hash function is + * a perfect hash function: + *`h_i(x) = (x * k_i mod prime) mod numEntries` + * where `k_i` is the i-th coefficient, and both `x` and `k_i` are from `Z_prime^*` + * + * Reference: + * [[https://en.wikipedia.org/wiki/Perfect_hash_function Wikipedia on Perfect Hash Function]] + * + * @param numEntries The number of entries of the hash functions. 
+ * @param randCoefficients An array of random coefficients, each used by one hash function. + */ +@Experimental +@Since("2.1.0") +class MinHashModel private[ml] ( +override val uid: String, +@Since("2.1.0") val numEntries: Int, +@Since("2.1.0") val randCoefficients: Array[Int]) + extends LSHModel[MinHashModel] { + + @Since("2.1.0") + override protected[ml] val hashFunction: Vector => Vector = { +elems: Vector => + require(elems.numNonzeros > 0, "Must have at least 1 non zero entry.") + val elemsList = elems.toSparse.indices.toList + val hashValues = randCoefficients.map({ randCoefficient: Int => + elemsList.map({elem: Int => +(1 + elem) * randCoefficient.toLong % MinHash.prime % numEntries + }).min.toDouble + }) + Vectors.dense(hashValues) + } + + @Since("2.1.0") + override protected[ml] def keyDistance(x: Vector, y: Vector): Double = { +val xSet = x.toSparse.indices.toSet +val ySet = y.toSparse.indices.toSet +val intersectionSize = xSet.intersect(ySet).size.toDouble +val unionSize = xSet.size + ySet.size - intersectionSize +assert(unionSize > 0, "The union of two input sets must have at least 1 elements") +1 - intersectionSize / unionSize + } + + @Since("2.1.0") + override protected[ml] def hashDistance(x: Vector, y: Vector): Double = { +// Since it's generated by hashing, it will be a pair of dense vectors. +x.toDense.values.zip(y.toDense.values).map(pair => math.abs(pair._1 - pair._2)).min --- End diff -- Makes sense. `hashDistance` for MinHash should just be binary. I will make another PR to fix this.
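The fix Yunni agrees to here (making `hashDistance` binary for MinHash) can be sketched in plain Scala over the raw hash values. Both functions below are illustrative stand-ins operating on `Seq[Double]` rather than the `ml.linalg` vectors in the PR:

```scala
object HashDistanceSketch {
  // Current behavior in the diff above: minimum absolute difference
  // between corresponding hash values.
  def minAbsDiff(x: Seq[Double], y: Seq[Double]): Double =
    x.zip(y).map { case (a, b) => math.abs(a - b) }.min

  // Proposed binary behavior for MinHash: distance 0 if at least one
  // hash function collides, else 1. Nearness of non-colliding MinHash
  // values carries no similarity information.
  def binaryDistance(x: Seq[Double], y: Seq[Double]): Double =
    if (x.zip(y).exists { case (a, b) => a == b }) 0.0 else 1.0
}
```

Both distances agree when some hash collides (both return 0); they differ only in how they rank non-colliding pairs, which is exactly the case sethah questions below.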
[GitHub] spark issue #15795: [SPARK-18081] Add user guide for Locality Sensitive Hash...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15795 Can one of the admins verify this patch?
[GitHub] spark issue #14498: [SPARK-16904] [SQL] Removal of Hive Built-in Hash Functi...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14498 If this is no longer WIP, please update the title. Thanks.
[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15047 Do we need a test suite to verify that the generated hash value is identical to the value produced by Hive?
[GitHub] spark pull request #15795: [SPARK-18081] Add user guide for Locality Sensiti...
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/15795 [SPARK-18081] Add user guide for Locality Sensitive Hashing (LSH) ## What changes were proposed in this pull request? The user guide for LSH is added to ml-features.md, with several Scala/Java examples in spark-examples. ## How was this patch tested? Docs have been generated through Jekyll and checked by manual inspection. You can merge this pull request into a Git repository by running: $ git pull https://github.com/Yunni/spark SPARK-18081-lsh-guide Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15795.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15795 commit 8c7971bdcff9eeedc9a97936b4c2e0aac93c4edf Author: Yunni Date: 2016-11-07T07:23:36Z [SPARK-18081] Add user guide to LSH
[GitHub] spark issue #13259: [SPARK-15480][UI][Streaming]show missed InputInfo in str...
Github user mwws commented on the issue: https://github.com/apache/spark/pull/13259 @zsxwing This patch has been pending for a long time, could you help review it?
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Merged build finished. Test PASSed.
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15769 **[Test build #3417 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3417/consoleFull)** for PR 15769 at commit [`243b8ba`](https://github.com/apache/spark/commit/243b8ba106dd3a604a9053612d9d38482792a3db).
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68266/ Test PASSed.
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15769 I've disabled the test.
[GitHub] spark pull request #15745: [SPARK-18207][SQL] Fix a compilation error due to...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15745#discussion_r86723676 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -1658,4 +1658,49 @@ class DataFrameSuite extends QueryTest with SharedSQLContext { val df = spark.createDataFrame(spark.sparkContext.makeRDD(rows), schema) assert(df.filter($"array1" === $"array2").count() == 1) } + + test("SPARK-18207: Compute hash for wider table") { +import org.apache.spark.sql.types.{StructType, StringType} --- End diff -- I think this test is more like an end-to-end test; the hashing itself is not obvious here. Can we add a unit test, in `HashExpressionsSuite` I think? That way you can test `HiveHash` too.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659 **[Test build #68266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68266/consoleFull)** for PR 15659 at commit [`6540964`](https://github.com/apache/spark/commit/6540964e4e584479a67965f03c8c5fbf59f4e132). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14498: [SPARK-16904] [SQL] Removal of Hive Built-in Hash Functi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14498 **[Test build #68268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68268/consoleFull)** for PR 14498 at commit [`05390ad`](https://github.com/apache/spark/commit/05390ade4f9af7ecc66d365683657d2689e56157).
[GitHub] spark pull request #14498: [SPARK-16904] [SQL] Removal of Hive Built-in Hash...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/14498#discussion_r86723137 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/test/TestHive.scala --- @@ -487,24 +487,6 @@ private[hive] class TestHiveQueryExecution( } } - -private[hive] class TestHiveFunctionRegistry extends SimpleFunctionRegistry { --- End diff -- Yeah, I think so
[GitHub] spark issue #15668: [SPARK-18137][SQL]Fix RewriteDistinctAggregates Unresolv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15668 **[Test build #68267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68267/consoleFull)** for PR 15668 at commit [`8a6dd8d`](https://github.com/apache/spark/commit/8a6dd8daf11f7a0c29b3afc04706ccddc390a1bf).
[GitHub] spark pull request #15793: [SPARK-18296][SQL] Use consistent naming for expr...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15793
[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r86719955 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/MinHash.scala --- @@ -0,0 +1,194 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.ml.feature + +import scala.util.Random + +import org.apache.hadoop.fs.Path + +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.ml.linalg.{Vector, Vectors, VectorUDT} +import org.apache.spark.ml.param.ParamMap +import org.apache.spark.ml.param.shared.HasSeed +import org.apache.spark.ml.util._ +import org.apache.spark.sql.types.StructType + +/** + * :: Experimental :: + * + * Model produced by [[MinHash]], where multiple hash functions are stored. Each hash function is + * a perfect hash function: + *`h_i(x) = (x * k_i mod prime) mod numEntries` + * where `k_i` is the i-th coefficient, and both `x` and `k_i` are from `Z_prime^*` + * + * Reference: + * [[https://en.wikipedia.org/wiki/Perfect_hash_function Wikipedia on Perfect Hash Function]] + * + * @param numEntries The number of entries of the hash functions. 
+ * @param randCoefficients An array of random coefficients, each used by one hash function. + */ +@Experimental +@Since("2.1.0") +class MinHashModel private[ml] ( +override val uid: String, +@Since("2.1.0") val numEntries: Int, +@Since("2.1.0") val randCoefficients: Array[Int]) + extends LSHModel[MinHashModel] { + + @Since("2.1.0") + override protected[ml] val hashFunction: Vector => Vector = { +elems: Vector => + require(elems.numNonzeros > 0, "Must have at least 1 non zero entry.") + val elemsList = elems.toSparse.indices.toList + val hashValues = randCoefficients.map({ randCoefficient: Int => + elemsList.map({elem: Int => +(1 + elem) * randCoefficient.toLong % MinHash.prime % numEntries + }).min.toDouble + }) + Vectors.dense(hashValues) + } + + @Since("2.1.0") + override protected[ml] def keyDistance(x: Vector, y: Vector): Double = { +val xSet = x.toSparse.indices.toSet +val ySet = y.toSparse.indices.toSet +val intersectionSize = xSet.intersect(ySet).size.toDouble +val unionSize = xSet.size + ySet.size - intersectionSize +assert(unionSize > 0, "The union of two input sets must have at least 1 elements") +1 - intersectionSize / unionSize + } + + @Since("2.1.0") + override protected[ml] def hashDistance(x: Vector, y: Vector): Double = { +// Since it's generated by hashing, it will be a pair of dense vectors. +x.toDense.values.zip(y.toDense.values).map(pair => math.abs(pair._1 - pair._2)).min --- End diff -- Does this even make sense for `MinHash`? For the `RandomProjection` class I understand that the absolute difference between their hash values is a measure of their similarity, but for `MinHash` I don't think it is. It is true that dissimilar items have a lower likelihood of hash collisions, but it should not be true that they have a low likelihood to hash to buckets near each other. 
We use this `hashDistance` to ensure that we get enough near-neighbor candidates, but I don't see how this `hashDistance` corresponds to similarity in the case where there are no zero-distance elements.
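The hash family and key distance under discussion can be sketched outside Spark with plain Scala collections. This mirrors the formula in the doc comment, `h_i(x) = ((1 + x) * k_i mod prime) mod numEntries`, over a set of non-zero indices; the `prime`, `numEntries`, and coefficient values below are illustrative assumptions, not the constants in MinHash.scala:

```scala
object MinHashSketch {
  val prime = 2038074743L   // illustrative large prime for the outer modulus
  val numEntries = 1 << 20  // illustrative number of hash-table entries

  // One hash value per coefficient: the minimum of h(i) over the set's indices.
  // Indices are shifted by 1 so that index 0 does not hash everything to 0.
  def minHash(indices: Set[Int], coefficients: Seq[Long]): Seq[Long] =
    coefficients.map { k =>
      indices.map(i => ((1L + i) * k % prime) % numEntries).min
    }

  // keyDistance: Jaccard distance between the two index sets,
  // as in the diff's keyDistance over sparse-vector indices.
  def jaccardDistance(x: Set[Int], y: Set[Int]): Double = {
    val intersection = x.intersect(y).size.toDouble
    val union = x.size + y.size - intersection
    require(union > 0, "The union of the two input sets must be non-empty")
    1.0 - intersection / union
  }
}
```

The MinHash property being debated is that `P[minHash collides] ≈ 1 - jaccardDistance`; nothing in the construction makes non-colliding hash values numerically close for similar sets, which is the crux of sethah's objection to the min-absolute-difference `hashDistance`.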
[GitHub] spark issue #15793: [SPARK-18296][SQL] Use consistent naming for expression ...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15793 Thanks - merging in master/branch-2.1. I've disabled the flaky test also (via a commit push).
[GitHub] spark issue #15794: [SPARK-15659][YARN] Fail if SparkContext run a new Threa...
Github user smallyard commented on the issue: https://github.com/apache/spark/pull/15794 https://issues.apache.org/jira/browse/SPARK-18297
[GitHub] spark issue #15794: [SPARK-15659][YARN] Fail if SparkContext run a new Threa...
Github user smallyard commented on the issue: https://github.com/apache/spark/pull/15794 @AmplabJenkins Sorry, the patch doesn't fix this bug. ApplicationMaster has a problem here:
```
mainMethod.invoke(null, userArgs.toArray)
finish(FinalApplicationStatus.SUCCEEDED, ApplicationMaster.EXIT_SUCCESS)
```
The main method has finished, but sub-threads may not have finished yet, so it should not invoke finish to shut down the driver thread.
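The lifecycle problem smallyard describes can be reproduced outside YARN: the user's main method returns while a non-daemon thread it spawned is still working, so marking the application finished at that point is premature. This is a minimal illustration with hypothetical names, not ApplicationMaster code:

```scala
object SubThreadSketch {
  @volatile var subThreadDone = false

  // Stand-in for the user's main method invoked via mainMethod.invoke(...):
  // it spawns a worker thread and returns immediately.
  def userMain(): Thread = {
    val t = new Thread(new Runnable {
      def run(): Unit = {
        Thread.sleep(200) // simulated work that outlives userMain()
        subThreadDone = true
      }
    })
    t.start()
    t
  }

  def run(): (Boolean, Boolean) = {
    val t = userMain()
    val doneWhenMainReturned = subThreadDone // what an immediate finish(...) would observe
    t.join()                                 // waiting on sub-threads avoids the premature shutdown
    (doneWhenMainReturned, subThreadDone)
  }
}
```

At the point where `finish(...)` is called in the snippet above, the state corresponds to `doneWhenMainReturned`, which is still false even though the application's real work has not completed.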
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/15769 @rxin I've addressed your comment. It seems the test case is still failing... @ericl
[GitHub] spark issue #15684: [SPARK-18171][MESOS] Show correct framework address in m...
Github user lins05 commented on the issue: https://github.com/apache/spark/pull/15684 @zsxwing @mgummelt could you help review this PR?
[GitHub] spark issue #15044: [SQL][SPARK-17490] Optimize SerializeFromObject() for a ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15044 LGTM, can you also address https://github.com/apache/spark/pull/15044#discussion_r86665473?
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15770 ping @yanboliang
[GitHub] spark issue #15745: [SPARK-18207][SQL] Fix a compilation error due to HashEx...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15745 Merged build finished. Test PASSed.
[GitHub] spark issue #15745: [SPARK-18207][SQL] Fix a compilation error due to HashEx...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68264/ Test PASSed.
[GitHub] spark issue #15745: [SPARK-18207][SQL] Fix a compilation error due to HashEx...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15745 **[Test build #68264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68264/consoleFull)** for PR 15745 at commit [`6a57ba5`](https://github.com/apache/spark/commit/6a57ba564f81ba9f5f04a94f2ca516fa0c441fd0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15769 Merged build finished. Test FAILed.
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15769 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68265/ Test FAILed.
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15769 **[Test build #68265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68265/consoleFull)** for PR 15769 at commit [`243b8ba`](https://github.com/apache/spark/commit/243b8ba106dd3a604a9053612d9d38482792a3db). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15792: [SPARK-18295][SQL] Make to_json function null safe (matc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15792 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68261/ Test PASSed.
[GitHub] spark issue #15792: [SPARK-18295][SQL] Make to_json function null safe (matc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15792 Merged build finished. Test PASSed.
[GitHub] spark issue #15792: [SPARK-18295][SQL] Make to_json function null safe (matc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15792 **[Test build #68261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68261/consoleFull)** for PR 15792 at commit [`ce0edda`](https://github.com/apache/spark/commit/ce0eddae4ee03002642c60cd21cc858ab4ae12a2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86714156

--- Diff: python/pyspark/find_spark_home.py ---
@@ -0,0 +1,73 @@
+#!/usr/bin/python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This script attempts to determine the correct setting for SPARK_HOME given
+# that Spark may have been installed on the system with pip.
+
+from __future__ import print_function
+import os
+import sys
+
+
+def _find_spark_home():
+    """Find the SPARK_HOME."""
+    # If the environment has SPARK_HOME set trust it.
+    if "SPARK_HOME" in os.environ:
+        return os.environ["SPARK_HOME"]
+
+    def is_spark_home(path):
+        """Takes a path and returns true if the provided path could be a reasonable SPARK_HOME"""
+        return (os.path.isfile(os.path.join(path, "bin/spark-submit")) and
+                (os.path.isdir(os.path.join(path, "jars")) or
+                 os.path.isdir(os.path.join(path, "assembly"))))
+
+    paths = ["../", os.path.join(os.path.dirname(sys.argv[0]), "../")]
--- End diff --

I guess we could normalize the path of the pwd first and then use os.path.dirname on it. Is this something you think would make a difference?
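The normalization idea discussed in this review thread can be sketched in a few lines. This is an illustrative snippet with hypothetical helper names, not the code from the PR:

```python
import os
import sys

# Sketch of the idea from the review thread: resolve a candidate
# directory to an absolute, symlink-free path first, then take its
# parent with os.path.dirname, instead of relying on a bare "../".
def parent_of(path):
    return os.path.dirname(os.path.realpath(path))

# Candidate SPARK_HOME locations: the parent of the current working
# directory and the parent of the directory holding the running script.
candidates = [
    parent_of(os.getcwd()),
    parent_of(os.path.dirname(os.path.abspath(sys.argv[0]))),
]
print(candidates)
```

Either way the resulting candidates are absolute paths, which sidesteps the relative-path ambiguity being discussed.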
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user Yunni commented on the issue: https://github.com/apache/spark/pull/15148 @sethah I think you are right. OR-amplification is only applied inside NN search and similarity join through `hashDistance` and `explode`; `transform` itself does not apply amplification. Sorry for missing this. I will clarify this in the user guide, and I am happy to review the PR you send to fix the documentation. @jkbradley @MLnick
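The MinHash hash family this PR implements (`h_i(x) = (x * k_i mod prime) mod numEntries`) can be sketched as a toy in Python. The constants and helper names below are illustrative, not the `spark.ml` implementation:

```python
import random

PRIME = 2038074743  # a large prime; x and the coefficients k_i are non-zero mod PRIME

def random_coefficients(num_hashes, seed=42):
    """Draw one random coefficient k_i per hash function."""
    rng = random.Random(seed)
    return [rng.randrange(1, PRIME) for _ in range(num_hashes)]

def min_hash_signature(active_indices, coeffs, num_entries=1 << 20):
    """For each hash function h_i, take the minimum of h_i(x) over the
    non-zero indices of the sparse input vector."""
    return [min(((x * k) % PRIME) % num_entries for x in active_indices)
            for k in coeffs]

coeffs = random_coefficients(4)
sig_a = min_hash_signature({1, 2, 3}, coeffs)
sig_b = min_hash_signature({1, 2, 3}, coeffs)
```

With a fixed seed the signatures are deterministic, so identical input sets always hash to identical signatures; similar sets agree on a fraction of entries roughly equal to their Jaccard similarity.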
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659 **[Test build #68266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68266/consoleFull)** for PR 15659 at commit [`6540964`](https://github.com/apache/spark/commit/6540964e4e584479a67965f03c8c5fbf59f4e132).
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15659 Jenkins retest this please
[GitHub] spark issue #15793: [SPARK-18296][SQL] Use consistent naming for expression ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15793 Merged build finished. Test FAILed.
[GitHub] spark issue #15793: [SPARK-18296][SQL] Use consistent naming for expression ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15793 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68262/ Test FAILed.
[GitHub] spark issue #15793: [SPARK-18296][SQL] Use consistent naming for expression ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15793 **[Test build #68262 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68262/consoleFull)** for PR 15793 at commit [`33f5498`](https://github.com/apache/spark/commit/33f5498bd83c0ef9e53cc235b227e85a22fe8573).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class BitwiseExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper`
  * `class CollectionExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper`
  * `class MathExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper`
  * `class MiscExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper`
  * `class NullExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper`
  * `case class DoubleData(a: java.lang.Double, b: java.lang.Double)`
  * `case class NullDoubles(a: java.lang.Double)`
  * `class MathFunctionsSuite extends QueryTest with SharedSQLContext`
[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15255 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68263/ Test FAILed.
[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15255 Merged build finished. Test FAILed.
[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15255 **[Test build #68263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68263/consoleFull)** for PR 15255 at commit [`741d59c`](https://github.com/apache/spark/commit/741d59c2565f70404409cc4a8afcf002148c3d74).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Merged build finished. Test FAILed.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68260/ Test FAILed.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659 **[Test build #68260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68260/consoleFull)** for PR 15659 at commit [`6540964`](https://github.com/apache/spark/commit/6540964e4e584479a67965f03c8c5fbf59f4e132).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/15659#discussion_r86712504

--- Diff: python/pyspark/find_spark_home.py ---
@@ -0,0 +1,73 @@
+#!/usr/bin/python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This script attempts to determine the correct setting for SPARK_HOME given
+# that Spark may have been installed on the system with pip.
+
+from __future__ import print_function
+import os
+import sys
+
+
+def _find_spark_home():
+    """Find the SPARK_HOME."""
+    # If the environment has SPARK_HOME set trust it.
+    if "SPARK_HOME" in os.environ:
+        return os.environ["SPARK_HOME"]
+
+    def is_spark_home(path):
+        """Takes a path and returns true if the provided path could be a reasonable SPARK_HOME"""
+        return (os.path.isfile(os.path.join(path, "bin/spark-submit")) and
+                (os.path.isdir(os.path.join(path, "jars")) or
+                 os.path.isdir(os.path.join(path, "assembly"))))
+
+    paths = ["../", os.path.join(os.path.dirname(sys.argv[0]), "../")]
--- End diff --

To be clear it's going to the parent of the pwd and dirname respectively.
[GitHub] spark pull request #14215: [SPARK-16544][SQL][WIP] Support for conversion fr...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/14215
[GitHub] spark issue #14215: [SPARK-16544][SQL][WIP] Support for conversion from comp...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14215 Hm, I am trying to make another clean version but it seems to be taking a bit of time. I will close this and open it again when I am ready. Please feel free to take over in the meantime.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15659 Or @davies if you have some time?
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/11105 So the restructuring to keep binary compatibility resulted in a few more changes than I was expecting, but what are your thoughts on the restructured approach @squito?
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15659 Ping to @rxin & @matiez
[GitHub] spark issue #12257: [SPARK-14483][WEBUI] Display user name for each job and ...
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/12257 The screenshots look great, and you're probably right about a better solution. I'm not sure if a checkbox is much better though, especially since there's no precedent for it on these pages.
[GitHub] spark pull request #15742: [SPARK-16808][Core] History Server main page does...
Github user vijoshi commented on a diff in the pull request: https://github.com/apache/spark/pull/15742#discussion_r86711698

--- Diff: core/src/main/scala/org/apache/spark/deploy/history/HistoryServer.scala ---
@@ -143,6 +143,12 @@ class HistoryServer(
     appCache.stop()
   }
+  // For testing - override stop timeout used by jetty
+  private[history] def setStopTimeout(timeout: Long): Unit = {
--- End diff --

This was 30 secs additional for just the one test I added. If more such tests are added the time would add up, so I felt the option to bypass the wait was useful to have. Assuming more of the UI could transition to ajax etc., I could move this utility function up into the base `WebUI` class so it's available more generally for future UI tests, not just the history server?
[GitHub] spark pull request #15794: [SPARK-15659][YARN] Fail if SparkContext run a ne...
Github user smallyard closed the pull request at: https://github.com/apache/spark/pull/15794
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15769 Merged build finished. Test FAILed.
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15769 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68258/ Test FAILed.
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15769 **[Test build #68258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68258/consoleFull)** for PR 15769 at commit [`cfcd823`](https://github.com/apache/spark/commit/cfcd823225b3c1b0464b4b571c78ecf969a88f25).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148 @karlhigley Thanks for your detailed response. From the amplification section on [Wikipedia](https://en.wikipedia.org/wiki/Locality-sensitive_hashing#Amplification), it is pretty clear to me that this implementation is not doing OR/AND amplification. `outputDim` is just the number of concatenated random hash functions (`k` in the wiki article). For now we can clarify some of this a bit better in the documentation, and perhaps in the future we can extend this implementation to use optional AND/OR amplification. I can work on a PR for it this week, unless there are any objections. @jkbradley @Yunni @MLnick ?
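The AND/OR amplification structure sethah references (concatenating `k` hash functions within a band, then OR-ing across bands) can be sketched like this. This is a toy illustration under assumed names and constants, not the `spark.ml` implementation:

```python
import random

PRIME = 2038074743  # a large prime for the toy hash family

def make_bands(num_bands, band_size, seed=7):
    """b bands of k random linear hash functions h(x) = (a*x + c) mod p."""
    rng = random.Random(seed)
    return [[(rng.randrange(1, PRIME), rng.randrange(PRIME)) for _ in range(band_size)]
            for _ in range(num_bands)]

def signature(x, bands):
    # AND-amplification: within a band, the k hash values are concatenated,
    # so two inputs agree on a band only if every hash in it agrees.
    return [tuple((a * x + c) % PRIME for a, c in band) for band in bands]

def is_candidate_pair(sig1, sig2):
    # OR-amplification: the pair is a candidate if ANY band matches.
    return any(b1 == b2 for b1, b2 in zip(sig1, sig2))

bands = make_bands(num_bands=5, band_size=3)
```

Raising `band_size` (the `k` of the wiki article) lowers false positives, while raising `num_bands` lowers false negatives; the current implementation effectively has a single band of `outputDim` hashes.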
[GitHub] spark issue #15787: [SPARK-18286][ML][WIP] Add Scala/Java/Python examples fo...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15787 ok to test
[GitHub] spark issue #15769: [SPARK-18191][CORE] Port RDD API to use commit protocol
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15769 **[Test build #68265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68265/consoleFull)** for PR 15769 at commit [`243b8ba`](https://github.com/apache/spark/commit/243b8ba106dd3a604a9053612d9d38482792a3db).
[GitHub] spark issue #15794: [SPARK-15659][YARN] Fail if SparkContext run a new Threa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15794 Can one of the admins verify this patch?
[GitHub] spark pull request #15794: [SPARK-15659][YARN] Fail if SparkContext run a ne...
GitHub user smallyard opened a pull request: https://github.com/apache/spark/pull/15794 [SPARK-15659][YARN] Fail if SparkContext run a new Thread in yarn-clu…

## What changes were proposed in this pull request?

Fix: a SparkContext started from a user thread does not run correctly in yarn-cluster mode. Logs:
```
16/11/02 11:16:47 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
16/11/02 11:16:47 INFO yarn.ApplicationMaster: Waiting for spark context initialization
16/11/02 11:16:47 INFO yarn.ApplicationMaster: Waiting for spark context initialization ...
16/11/02 11:16:47 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
```
Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. …ster mode.
```
public static void main(String[] args) {
    Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Thread(new Runnable() {
        @Override
        public void run() {
            SparkConf conf = new SparkConf();
            conf.setAppName("SparkDemo");
            JavaSparkContext sparkContext = new JavaSparkContext(conf);
            JavaRDD<String> array = sparkContext.parallelize(Lists.newArrayList("1", "2", "3", "4"));
            System.out.println(array.count());
        }
    }), 0, 5000, TimeUnit.MILLISECONDS);
}
```
Fix so this program can run correctly. You can merge this pull request into a Git repository by running: $ git pull https://github.com/smallyard/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15794.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15794 commit 2c63972354c62221d504eb0d65d819b0adde707d Author: smallyard Date: 2016-11-07T03:43:47Z [SPARK-15659][YARN] Fail if SparkContext run a new Thread in yarn-cluster mode.
[GitHub] spark issue #15745: [SPARK-18207][SQL] Fix a compilation error due to HashEx...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15745 **[Test build #68264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68264/consoleFull)** for PR 15745 at commit [`6a57ba5`](https://github.com/apache/spark/commit/6a57ba564f81ba9f5f04a94f2ca516fa0c441fd0).
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68259/ Test FAILed.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15659 Merged build finished. Test FAILed.
[GitHub] spark issue #15659: [SPARK-1267][SPARK-18129] Allow PySpark to be pip instal...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15659 **[Test build #68259 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68259/consoleFull)** for PR 15659 at commit [`fb62a8a`](https://github.com/apache/spark/commit/fb62a8ae7a6a7ad1bff239f2a09e2170dda383a8).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15793: [SPARK-18296][SQL] Use consistent naming for expression ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15793 LGTM pending test
[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15255 **[Test build #68263 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68263/consoleFull)** for PR 15255 at commit [`741d59c`](https://github.com/apache/spark/commit/741d59c2565f70404409cc4a8afcf002148c3d74).
[GitHub] spark issue #15255: [SPARK-17680] [SQL] [TEST] Added a Testcase for Verifyin...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15255 retest this please
[GitHub] spark pull request #15233: [SPARK-17659] [SQL] Partitioned View is Not Suppo...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/15233
[GitHub] spark pull request #14883: [SPARK-17319] [SQL] Move addJar from HiveSessionS...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/14883
[GitHub] spark pull request #14793: [SPARK-17221] [SQL] Build File-based Test Cases f...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/14793
[GitHub] spark pull request #14618: [SPARK-17030] [SQL] Remove/Cleanup HiveMetastoreC...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/14618
[GitHub] spark pull request #14430: [SPARK-16825] [SQL] Replace hive.default.fileform...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/14430
[GitHub] spark pull request #13770: [SPARK-16054] [SQL] Verification of Multiple Data...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/13770
[GitHub] spark pull request #14228: [SPARK-16583] [SQL] [WIP] Improve Partition Pruni...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/14228
[GitHub] spark pull request #14322: [SPARK-16689] [SQL] FileSourceStrategy: Pruning P...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/14322
[GitHub] spark pull request #13728: [SPARK-16010] [SQL] Code Refactoring, Test Case I...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/13728
[GitHub] spark issue #15793: [SPARK-18296][SQL] Use consistent naming for expression ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15793 **[Test build #68262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68262/consoleFull)** for PR 15793 at commit [`33f5498`](https://github.com/apache/spark/commit/33f5498bd83c0ef9e53cc235b227e85a22fe8573).
[GitHub] spark pull request #15793: [SPARK-18296][SQL] Use consistent naming for expr...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/15793 [SPARK-18296][SQL] Use consistent naming for expression test suites

## What changes were proposed in this pull request?

We have an undocumented naming convention to call expression unit tests ExpressionsSuite, and the end-to-end tests FunctionsSuite. It'd be great to make all test suites consistent with this naming convention.

## How was this patch tested?

This is a test-only naming change.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-18296

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15793.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15793

commit 33f5498bd83c0ef9e53cc235b227e85a22fe8573
Author: Reynold Xin
Date: 2016-11-07T03:13:30Z
[SPARK-18296][SQL] Use consistent naming for expression test suites
[GitHub] spark issue #14136: [SPARK-16282][SQL] Implement percentile SQL function.
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/14136 ping @hvanhovell
[GitHub] spark issue #15484: [SPARK-17868][SQL] Do not use bitmasks during parsing an...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/15484 Does this version look good now?
[GitHub] spark issue #15791: Merge pull request #1 from apache/master
Github user taiyangdixia commented on the issue: https://github.com/apache/spark/pull/15791 Sorry, I just wanted to merge the new updates into my fork repository.
[GitHub] spark issue #15767: [SPARK-18269][SQL] CSV datasource should read null prope...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15767 Thank you @rxin!
[GitHub] spark pull request #15688: [SPARK-18173][SQL] data source tables should supp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15688
[GitHub] spark pull request #14562: [SPARK-16973][SQL] remove the buffer offsets in I...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/14562
[GitHub] spark issue #15792: [SPARK-18295][SQL] Make to_json function null safe (matc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15792 **[Test build #68261 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68261/consoleFull)** for PR 15792 at commit [`ce0edda`](https://github.com/apache/spark/commit/ce0eddae4ee03002642c60cd21cc858ab4ae12a2).
[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15688 Thanks - merging in master/branch-2.1.
[GitHub] spark pull request #15688: [SPARK-18173][SQL] data source tables should supp...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15688#discussion_r86707724

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala ---

```diff
@@ -1628,29 +1628,56 @@ class DDLSuite extends QueryTest with SharedSQLContext with BeforeAndAfterEach {
   test("truncate table - datasource table") {
     import testImplicits._
-    val data = (1 to 10).map { i => (i, i) }.toDF("width", "length")
+    val data = (1 to 10).map { i => (i, i) }.toDF("width", "length")
     // Test both a Hive compatible and incompatible code path.
     Seq("json", "parquet").foreach { format =>
       withTable("rectangles") {
         data.write.format(format).saveAsTable("rectangles")
         assume(spark.table("rectangles").collect().nonEmpty,
           "bad test; table was empty to begin with")
+        sql("TRUNCATE TABLE rectangles")
         assert(spark.table("rectangles").collect().isEmpty)
+
+        // not supported since the table is not partitioned
+        assertUnsupported("TRUNCATE TABLE rectangles PARTITION (width=1)")
       }
     }
+  }
-    withTable("rectangles", "rectangles2") {
-      data.write.saveAsTable("rectangles")
-      data.write.partitionBy("length").saveAsTable("rectangles2")
+  test("truncate partitioned table - datasource table") {
+    import testImplicits._
-      // not supported since the table is not partitioned
-      assertUnsupported("TRUNCATE TABLE rectangles PARTITION (width=1)")
+    val data = (1 to 10).map { i => (i % 3, i % 5, i) }.toDF("width", "length", "height")
+    withTable("partTable") {
+      data.write.partitionBy("width", "length").saveAsTable("partTable")
       // supported since partitions are stored in the metastore
-      sql("TRUNCATE TABLE rectangles2 PARTITION (width=1)")
-      assert(spark.table("rectangles2").collect().isEmpty)
+      sql("TRUNCATE TABLE partTable PARTITION (width=1, length=1)")
+      assert(spark.table("partTable").filter($"width" === 1).collect().nonEmpty)
+      assert(spark.table("partTable").filter($"width" === 1 && $"length" === 1).collect().isEmpty)
+    }
+
+    withTable("partTable") {
+      data.write.partitionBy("width", "length").saveAsTable("partTable")
+      // support partial partition spec
+      sql("TRUNCATE TABLE partTable PARTITION (width=1)")
+      assert(spark.table("partTable").collect().nonEmpty)
+      assert(spark.table("partTable").filter($"width" === 1).collect().isEmpty)
+    }
+
+    withTable("partTable") {
+      data.write.partitionBy("width", "length").saveAsTable("partTable")
+      // do nothing if no partition is matched for the given partial partition spec
+      sql("TRUNCATE TABLE partTable PARTITION (width=100)")
+      assert(spark.table("partTable").count() == data.count())
+
+      // do nothing if no partition is matched for the given non-partial partition spec
+      // TODO: This behaviour is different from Hive, we should decide whether we need to follow
+      // Hive's behaviour or stick with our existing behaviour later.
```

End diff --

I actually think Hive's behavior makes sense here. If I'm giving you an exact match, you should warn me if there is an issue.
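The partial-spec semantics exercised by the tests above can be sketched independently of Spark: a (possibly partial) partition spec matches a partition when the two agree on every column the spec names, and columns the spec omits act as wildcards. A hypothetical, Spark-free illustration (the `matches` helper and class name are ours, not Spark's):

```java
import java.util.Map;

public class PartitionSpecSketch {
    // A partition matches a (possibly partial) spec when it agrees with the
    // spec on every column the spec mentions; omitted columns are wildcards.
    static boolean matches(Map<String, String> partition, Map<String, String> spec) {
        return spec.entrySet().stream()
                .allMatch(e -> e.getValue().equals(partition.get(e.getKey())));
    }

    public static void main(String[] args) {
        Map<String, String> part = Map.of("width", "1", "length", "2");
        System.out.println(matches(part, Map.of("width", "1")));                // true: partial spec, like PARTITION (width=1)
        System.out.println(matches(part, Map.of("width", "100")));              // false: no match, like PARTITION (width=100)
        System.out.println(matches(part, Map.of("width", "1", "length", "2"))); // true: exact spec
    }
}
```

Under this reading, `TRUNCATE TABLE partTable PARTITION (width=1)` clears every partition whose `width` is 1 regardless of `length`, while a spec matching nothing leaves the table untouched, which is the behaviour the TODO in the diff debates against Hive's.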
[GitHub] spark issue #15688: [SPARK-18173][SQL] data source tables should support tru...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15688 @cloud-fan can you create a follow-up PR to switch over to Hive's behavior?
[GitHub] spark pull request #11673: [SPARK-13853][SQL] QueryPlan sub-classes should o...
Github user cloud-fan closed the pull request at: https://github.com/apache/spark/pull/11673
[GitHub] spark issue #15792: [SPARK-18295][SQL] Make to_json function null safe
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15792 cc @marmbrus, do you mind taking a look, please?
[GitHub] spark pull request #15767: [SPARK-18269][SQL] CSV datasource should read nul...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15767
[GitHub] spark pull request #15792: [SPARK-18295][SQL] Make to_json function null saf...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/15792 [SPARK-18295][SQL] Make to_json function null safe

## What changes were proposed in this pull request?

This PR proposes to match the null-safety behaviour of `to_json` to that of the `from_json` function. Currently, it throws a `NullPointerException`; this PR fixes it to produce `null` instead.

With the data below:

```scala
import spark.implicits._

val df = Seq(Some(Tuple1(Tuple1(1))), None).toDF("a")
df.show()
```

```
+----+
|   a|
+----+
| [1]|
|null|
+----+
```

the code below

```scala
import org.apache.spark.sql.functions._

df.select(to_json($"a")).show()
```

produces:

**Before**

throws a `NullPointerException` as below:

```
java.lang.NullPointerException
  at org.apache.spark.sql.catalyst.json.JacksonGenerator.org$apache$spark$sql$catalyst$json$JacksonGenerator$$writeFields(JacksonGenerator.scala:138)
  at org.apache.spark.sql.catalyst.json.JacksonGenerator$$anonfun$write$1.apply$mcV$sp(JacksonGenerator.scala:194)
  at org.apache.spark.sql.catalyst.json.JacksonGenerator.org$apache$spark$sql$catalyst$json$JacksonGenerator$$writeObject(JacksonGenerator.scala:131)
  at org.apache.spark.sql.catalyst.json.JacksonGenerator.write(JacksonGenerator.scala:193)
  at org.apache.spark.sql.catalyst.expressions.StructToJson.eval(jsonExpressions.scala:544)
  at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:142)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:48)
  at org.apache.spark.sql.catalyst.expressions.InterpretedProjection.apply(Projection.scala:30)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
```

**After**

```
+---------------+
|structtojson(a)|
+---------------+
|       {"_1":1}|
|           null|
+---------------+
```

## How was this patch tested?

Unit tests in `JsonExpressionsSuite.scala` and `JsonFunctionsSuite.scala`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-18295

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15792.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15792

commit 5c534727b0d72015104c242e369d7edc5b0fe910
Author: hyukjinkwon
Date: 2016-11-07T02:34:28Z
Make to_json expression/function null safe

commit ce0eddae4ee03002642c60cd21cc858ab4ae12a2
Author: hyukjinkwon
Date: 2016-11-07T02:53:51Z
Clean up the test
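The fix pattern described above is plain null propagation: check for a null input struct before handing it to the JSON writer, rather than letting the writer dereference it. A hypothetical, Spark-free sketch of that pattern (the `Row` and `toJson` names here are ours, not Spark's):

```java
public class NullSafeToJsonSketch {
    // Stand-in for a one-field struct value; a null reference models a
    // null struct column in the DataFrame above.
    static final class Row {
        final int _1;
        Row(int _1) { this._1 = _1; }
    }

    // Null-safe conversion: a null struct yields a null JSON string
    // instead of a NullPointerException, mirroring the patched behaviour.
    static String toJson(Row row) {
        if (row == null) {
            return null;
        }
        return "{\"_1\":" + row._1 + "}";
    }

    public static void main(String[] args) {
        System.out.println(toJson(new Row(1))); // prints {"_1":1}
        System.out.println(toJson(null));       // prints null
    }
}
```

The null check happens before serialization begins, which is exactly where the "Before" stack trace shows the generator blowing up.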
[GitHub] spark issue #15767: [SPARK-18269][SQL] CSV datasource should read null prope...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15767 Merging in master/branch-2.1. Thanks.
[GitHub] spark issue #15791: Merge pull request #1 from apache/master
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15791 I think this PR was created by mistake, can you please delete it?
[GitHub] spark pull request #15791: Merge pull request #1 from apache/master
Github user taiyangdixia closed the pull request at: https://github.com/apache/spark/pull/15791
[GitHub] spark pull request #15791: Merge pull request #1 from apache/master
GitHub user taiyangdixia opened a pull request: https://github.com/apache/spark/pull/15791 Merge pull request #1 from apache/master

## What changes were proposed in this pull request?

merge on 2016.11.7

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/taiyangdixia/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15791.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15791

commit ca05ceccde9aff5029da0bb45ba67cea2cc403cd
Author: taiyangdixia
Date: 2016-08-18T08:13:41Z
Merge pull request #1 from apache/master