[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8494#issuecomment-135637877 [Test build #41720 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41720/console) for PR 8494 at commit [`bfd40d9`](https://github.com/apache/spark/commit/bfd40d999b6530bc04fc03ea6591c0093e10e534). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/8455#discussion_r38171523
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -30,6 +30,7 @@ import org.apache.spark.sql.{DataFrame, Row}
 /**
+ * :: Experimental ::
--- End diff --
This is also necessary.
[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8455#issuecomment-135637836 LGTM except the comment above. I'll merge it after 1.5.
[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8494#issuecomment-135638024 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135640246 I'd follow postgres here.
[GitHub] spark pull request: [SPARK-8057][Core]Call TaskAttemptContext.getT...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/6599#issuecomment-135598700 I think that we should also backport this to branch-1.4. +1 since we fix it in 1.5.0. Just confirmed this one didn't have conflicts with branch-1.4.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135603859 [Test build #41717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41717/consoleFull) for PR 8464 at commit [`7dcd502`](https://github.com/apache/spark/commit/7dcd502fc7278978fab5a233f4a81fefcca8bf72).
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38166499
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
--- End diff --
I think we still need `filter` or `map` for these iterator trees. @rxin is there anything I misunderstand about the `LocalNode` design?
[GitHub] spark pull request: [SPARK-8505][SparkR] Add settings to kick `lin...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/7883#issuecomment-135615906 Alright, I'm going to merge this, as it's better to do so before more breaking style changes get in. I will watch Jenkins for the next couple of hours to make sure things are fine.
[GitHub] spark pull request: [SPARK-8813][SQL] Combine files when there're ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/8125#discussion_r38167285
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/CombineSmallFile.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.sources
+
+import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.SQLContext
+
+object CombineSmallFile {
+  def combineWithFiles[T](rdd: RDD[T], sqlContext: SQLContext, inputFiles: Array[FileStatus])
+    : RDD[T] = {
+    if (sqlContext.conf.combineSmallFile) {
+      val totalLen = inputFiles.map { file =>
+        if (file.isDir) 0L else file.getLen
+      }.sum
+      val numPartitions = (totalLen / sqlContext.conf.splitSize + 1).toInt
+      rdd.coalesce(numPartitions)
--- End diff --
I think this is a very hacky way to solve the problem. We cannot tell how a data source will be split; even for Hadoop, the split size is just a hint, so using it to compute the number of partitions is probably too risky for a generic data processing framework.
Also, `RDD.coalesce` combines the splits in an arbitrary way, which will probably cause data skew, since we are likely to combine several large partitions into a single task. IMO, we should investigate in depth how Hive combines small partitions with `CombineHiveInputFormat` or `HiveInputFormat`, which seem to have a strategy for selecting partitions according to the input format while also keeping the load balanced.
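To make the skew concern concrete, here is a minimal sketch of size-aware grouping, as opposed to deriving only a partition count from the total length. It is not what `CombineHiveInputFormat` does and it is not part of the PR; it is a generic greedy heuristic (largest file first, placed into the currently lightest group). `InputFile`, `combine`, and `targetBytes` are hypothetical names introduced only for this illustration, and no Spark or Hadoop classes are used.

```scala
object BalancedCombineSketch {
  // Hypothetical stand-in for a file and its length in bytes (not Hadoop's FileStatus).
  final case class InputFile(path: String, length: Long)

  /**
   * Groups files so that each group holds roughly `targetBytes` of data.
   * Files are visited largest first and each one goes to the lightest group so far.
   */
  def combine(files: Seq[InputFile], targetBytes: Long): List[List[InputFile]] = {
    require(targetBytes > 0, "targetBytes must be positive")
    if (files.isEmpty) {
      Nil
    } else {
      val total = files.map(_.length).sum
      // Number of groups needed if every group were filled exactly to the target.
      val numGroups = math.max(1, math.ceil(total.toDouble / targetBytes).toInt)
      val groups = Array.fill(numGroups)(List.empty[InputFile])
      val loads = Array.fill(numGroups)(0L)
      files.sortBy(f => -f.length).foreach { f =>
        val lightest = loads.indexOf(loads.min)
        groups(lightest) = f :: groups(lightest)
        loads(lightest) += f.length
      }
      groups.toList.filter(_.nonEmpty).map(_.reverse)
    }
  }

  def main(args: Array[String]): Unit = {
    val files = Seq(100L, 60L, 40L, 10L, 10L).zipWithIndex.map {
      case (len, i) => InputFile(s"part-$i", len)
    }
    combine(files, targetBytes = 110L).foreach { g =>
      println(s"${g.map(_.length).sum} bytes: ${g.map(_.path).mkString(", ")}")
    }
  }
}
```

With the sample sizes in `main`, both groups end up at 110 bytes. A real implementation would also need to consider data locality and whether the input format is splittable, which this sketch ignores.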
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135616119 Thanks @lresende -- LGTM
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135620132 [Test build #41718 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41718/console) for PR 8343 at commit [`472c767`](https://github.com/apache/spark/commit/472c76714c25b909e281d8079b7ead6c152d4512). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8494#issuecomment-135620007 [Test build #41720 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41720/consoleFull) for PR 8494 at commit [`bfd40d9`](https://github.com/apache/spark/commit/bfd40d999b6530bc04fc03ea6591c0093e10e534).
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135621721 [Test build #41722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41722/consoleFull) for PR 8495 at commit [`4758a87`](https://github.com/apache/spark/commit/4758a87ea3b74914ffd2870e1a736472944c4a04).
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135624869 LGTM
[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8493#issuecomment-135629824 [Test build #41716 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41716/console) for PR 8493 at commit [`a14dba5`](https://github.com/apache/spark/commit/a14dba5233526f844a68d77c5d765d98b0534e2a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8493#issuecomment-135629856 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8493#issuecomment-135629857 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41716/ Test FAILed.
[GitHub] spark pull request: [SPARK-10082] [MLlib] Validate i, j in apply D...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8271#issuecomment-135638595 I think this requires a micro-benchmark. I want to see the overhead of the two additional checks. We can also test a single `require` statement that contains both checks.
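A rough sketch of the comparison requested above, assuming a column-major layout: one accessor with no checks, one with two separate `require` calls, and one with a single combined `require`. `Mat` is a hypothetical stand-in and not Spark's `DenseMatrix`, and timing with `System.nanoTime` is only indicative; a harness such as JMH would give more trustworthy numbers. No expected timings are shown because they depend on the JVM and the machine.

```scala
object RequireOverheadSketch {
  // Hypothetical column-major dense matrix, used only for this illustration.
  final class Mat(val numRows: Int, val numCols: Int) {
    private val values: Array[Double] = Array.tabulate(numRows * numCols)(_.toDouble)

    def unchecked(i: Int, j: Int): Double = values(i + numRows * j)

    def twoChecks(i: Int, j: Int): Double = {
      // The message argument of require is by-name, so it is only built on failure.
      require(i >= 0 && i < numRows, s"row index $i out of range")
      require(j >= 0 && j < numCols, s"column index $j out of range")
      values(i + numRows * j)
    }

    def oneCheck(i: Int, j: Int): Double = {
      require(i >= 0 && i < numRows && j >= 0 && j < numCols, s"index ($i, $j) out of range")
      values(i + numRows * j)
    }
  }

  /** Reads every element `reps` times through `get`; returns (sum, elapsed nanoseconds). */
  def run(m: Mat, reps: Int)(get: (Int, Int) => Double): (Double, Long) = {
    val start = System.nanoTime()
    var sum = 0.0
    var r = 0
    while (r < reps) {
      var j = 0
      while (j < m.numCols) {
        var i = 0
        while (i < m.numRows) { sum += get(i, j); i += 1 }
        j += 1
      }
      r += 1
    }
    (sum, System.nanoTime() - start)
  }

  def main(args: Array[String]): Unit = {
    val m = new Mat(200, 200)
    val variants = Seq[(String, (Int, Int) => Double)](
      "unchecked" -> ((i, j) => m.unchecked(i, j)),
      "two requires" -> ((i, j) => m.twoChecks(i, j)),
      "one require" -> ((i, j) => m.oneCheck(i, j))
    )
    // Warm up each variant so the JIT has compiled it before the timed run.
    variants.foreach { case (_, get) => run(m, 200)(get) }
    variants.foreach { case (name, get) =>
      val (_, nanos) = run(m, 1000)(get)
      println(f"$name%-13s ${nanos / 1e6}%.1f ms")
    }
  }
}
```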
[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-135639349 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9741][SQL] Approximate Count Distinct u...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8362#issuecomment-135639351 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41719/ Test PASSed.
[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8481#issuecomment-135639367 Merging this into master and branch-1.5. Thanks.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135595941 Merged build started.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135595930 Merged build triggered.
[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8493#issuecomment-135601511 Merged build started.
[GitHub] spark pull request: [SPARK-10326] [yarn] Fix app submission on win...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8493#issuecomment-135601503 Merged build triggered.
[GitHub] spark pull request: [SPARK-10327][SQL] Cache Table is not working ...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/8494#issuecomment-135618510 cc @marmbrus
[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8455#issuecomment-135637446 Merged build triggered.
[GitHub] spark pull request: [SPARK-9679][ML][PYSPARK] Add Python API for S...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/8118#discussion_r38171484
--- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala ---
@@ -29,14 +29,14 @@ import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructTyp
 /**
  * stop words list
  */
-private object StopWords {
+protected[spark] object StopWords {
--- End diff --
`private[spark]` should be the same but appears more often
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135637490 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41725/ Test FAILed.
[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8455#issuecomment-135637464 Merged build started.
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135637488 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8455#issuecomment-135637651 [Test build #41728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41728/consoleFull) for PR 8455 at commit [`2c0a4d0`](https://github.com/apache/spark/commit/2c0a4d0e2cd6da8371bab064c83e8e155aa5183f).
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38164639
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/local/LimitNode.scala ---
@@ -0,0 +1,45 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements. See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License. You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+
+package org.apache.spark.sql.execution.local
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.Attribute
+
+
+case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode {
+
+  private[this] var count = 0
+
+  override def output: Seq[Attribute] = child.output
+
+  override def open(): Unit = child.open()
--- End diff --
LocalNode cannot be reused, just like Iterator. So it's not necessary to reset it.
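The comparison with `Iterator` can be illustrated with a small sketch that uses plain Scala iterators rather than the `LocalNode` API, whose full interface is not shown in this thread. `LimitIterator` is a hypothetical class written for this example: it keeps a private `count`, never resets it, and is therefore single-use, just like the iterator it wraps.

```scala
object LimitIteratorSketch {
  /** Single-use iterator that yields at most `limit` elements of `child`. */
  final class LimitIterator[A](limit: Int, child: Iterator[A]) extends Iterator[A] {
    // Never reset: once the limit is reached, the iterator stays exhausted.
    private var count = 0

    override def hasNext: Boolean = count < limit && child.hasNext

    override def next(): A = {
      if (!hasNext) throw new NoSuchElementException("limit reached or child exhausted")
      count += 1
      child.next()
    }
  }

  def main(args: Array[String]): Unit = {
    // A tiny iterator tree: source -> filter -> map -> limit.
    val source = Iterator(1, 2, 3, 4, 5, 6)
    val limited = new LimitIterator(2, source.filter(_ % 2 == 0).map(_ * 10))
    println(limited.toList) // List(20, 40)
    println(limited.toList) // List(), because the iterator cannot be reused
  }
}
```

The second `toList` returns an empty list because `count` has already reached the limit, which is the behaviour the review comment describes: a consumer that needs the rows again builds a new tree instead of resetting the old one.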
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135604744 [Test build #41711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41711/console) for PR 7878 at commit [`cf58c49`](https://github.com/apache/spark/commit/cf58c49c3be31c8e33639ba68eca16398f98c7f6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135604771 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41711/ Test FAILed.
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135604767 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8952] [SPARKR] - Wrap normalizePath cal...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8343#issuecomment-135617402 [Test build #41718 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41718/consoleFull) for PR 8343 at commit [`472c767`](https://github.com/apache/spark/commit/472c76714c25b909e281d8079b7ead6c152d4512).
[GitHub] spark pull request: [SPARKR] [SPARK-10328] Fix generic for na.omit
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8495#issuecomment-135620933 @yu-iskw I also found a minor bug in lint-r that I just fixed. Please let me know if that looks good. With this change, lint-r passes on my machine.
[GitHub] spark pull request: [SPARK-10260][ML] Add @Since annotation to ml....
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/8455#discussion_r38169722 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -30,8 +30,11 @@ import org.apache.spark.sql.{DataFrame, Row} /** + * :: Experimental :: * Common params for KMeans and KMeansModel */ +@Since("1.5.0") +@Experimental --- End diff -- Neither `Since` nor `Experimental` is necessary because this is package private.
[GitHub] spark pull request: [SPARK-10323] [SQL] fix nullability of In/InSe...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/8492#issuecomment-135634760 From PostgreSQL: ``` If the array expression yields a null array, the result of ANY will be null. If the left-hand expression yields null, the result of ANY is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no true comparison result is obtained, the result of ANY will be null, not false (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values. ``` PostgreSQL is more consistent here, so I'd like to follow it. cc @rxin @marmbrus
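The three rules in the quoted PostgreSQL text can be sketched with `Option` standing in for SQL NULL. This is a minimal illustration assuming a strict equality comparison on integers; `anyEquals` is a hypothetical helper written for this note, not code from the pull request, and it does not try to model edge cases the quote leaves open (such as an empty array).

```scala
object AnyNullSketch {
  // None models SQL NULL; Some(true) / Some(false) model the two Boolean values.
  def anyEquals(left: Option[Int], array: Option[Seq[Option[Int]]]): Option[Boolean] =
    (left, array) match {
      case (_, None) => None // null array: result is null
      case (None, _) => None // null left-hand side: result is (ordinarily) null
      case (Some(v), Some(elems)) =>
        if (elems.contains(Some(v))) Some(true) // a true comparison was found
        else if (elems.contains(None)) None     // no match, but a null element: null, not false
        else Some(false)                        // no match and no null elements
    }
}
```

For example, `anyEquals(Some(2), Some(Seq(Some(1), None)))` evaluates to `None`, which corresponds to `2 = ANY(ARRAY[1, NULL])` being null rather than false.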
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135635298 Merged build triggered.
[GitHub] spark pull request: SPARK-9545, SPARK-9547: Use Maven in PRB if ti...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/7878#issuecomment-135635256 Jenkins, test this please.
[GitHub] spark pull request: [Core] whitespace fixes in RangePartitioner
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/8480#issuecomment-135366372 Yes, I think that commit was to pass style checks though. I assume this doesn't fail anything? I mean, I don't mind just merging this, but in my personal opinion I'd like to lightly push back on very small non-functional changes.
[GitHub] spark pull request: [SPARK-10257][MLlib] Removes Guava from all sp...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/8451#issuecomment-135366571 The other sibling PRs look good and I can merge them. This looks good after the `Strings` change.
[GitHub] spark pull request: [SPARK-10256][ML] Removes guava dependency fro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8447
[GitHub] spark pull request: [SPARK-10254][ML]Removes Guava dependencies in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8445
[GitHub] spark pull request: [SPARK-10255][ML] Removes Guava dependencies f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8446
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135368814 [Test build #41676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41676/console) for PR 8464 at commit [`62b8d24`](https://github.com/apache/spark/commit/62b8d2411d5f3be1460f68c02e6af6e3ab10fdce). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class LimitNode(limit: Int, child: LocalNode) extends UnaryLocalNode ` * `case class UnionNode(children: Seq[LocalNode]) extends LocalNode `
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135368942 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135368946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41676/ Test PASSed.
[GitHub] spark pull request: [SPARK-9613] [HOTFIX] Fix usage of JavaConvert...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8479#issuecomment-135369577 [Test build #1696 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1696/console) for PR 8479 at commit [`b6c17e7`](https://github.com/apache/spark/commit/b6c17e7daad09096e0bed94e677226b61d349bc1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class LogisticRegressionModel @Since("1.3.0") (` * `class SVMModel @Since("1.1.0") (` * `class GaussianMixtureModel @Since("1.3.0") (` * `class KMeansModel @Since("1.1.0") (@Since("1.0.0") val clusterCenters: Array[Vector])` * `class PowerIterationClusteringModel @Since("1.3.0") (` * `class StreamingKMeansModel @Since("1.2.0") (` * `class StreamingKMeans @Since("1.2.0") (` * `class BinaryClassificationMetrics @Since("1.3.0") (` * `class MulticlassMetrics @Since("1.1.0") (predictionAndLabels: RDD[(Double, Double)]) ` * `class MultilabelMetrics @Since("1.2.0") (predictionAndLabels: RDD[(Array[Double], Array[Double])]) ` * `class RegressionMetrics @Since("1.2.0") (` * `class ChiSqSelectorModel @Since("1.3.0") (` * `class ChiSqSelector @Since("1.3.0") (` * `class ElementwiseProduct @Since("1.4.0") (` * `class IDF @Since("1.2.0") (@Since("1.2.0") val minDocFreq: Int) ` * `class Normalizer @Since("1.1.0") (p: Double) extends VectorTransformer ` * `class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int) ` * `class StandardScaler @Since("1.1.0") (withMean: Boolean, withStd: Boolean) extends Logging ` * `class StandardScalerModel @Since("1.3.0") (` * `class FPGrowthModel[Item: ClassTag] @Since("1.3.0") (` * ` class FreqItemset[Item] @Since("1.3.0") (` * ` class FreqSequence[Item] @Since("1.5.0") (` * `class PrefixSpanModel[Item] @Since("1.5.0") (` * `class DenseMatrix @Since("1.3.0") (` * `class SparseMatrix @Since("1.3.0") (` * `class DenseVector @Since("1.0.0") (` * `class SparseVector @Since("1.0.0") (` * `class BlockMatrix @Since("1.3.0") (` * `class CoordinateMatrix @Since("1.0.0") (` * `class IndexedRowMatrix @Since("1.0.0") (` * `class RowMatrix @Since("1.0.0") (` * `class PoissonGenerator @Since("1.1.0") (` * `class ExponentialGenerator @Since("1.3.0") (` * `class GammaGenerator @Since("1.3.0") (` * `class LogNormalGenerator @Since("1.3.0") (` * `case class Rating @Since("0.8.0") (` * `class MatrixFactorizationModel @Since("0.8.0") (` * `abstract class GeneralizedLinearModel @Since("1.0.0") (` * `class IsotonicRegressionModel @Since("1.3.0") (` * `case class LabeledPoint @Since("1.0.0") (` * `class LassoModel @Since("1.1.0") (` * `class LinearRegressionModel @Since("1.1.0") (` * `class RidgeRegressionModel @Since("1.1.0") (` * `class MultivariateGaussian @Since("1.3.0") (` * `case class BoostingStrategy @Since("1.4.0") (` * `class Strategy @Since("1.3.0") (` * `class DecisionTreeModel @Since("1.0.0") (` * `class Node @Since("1.2.0") (` * `class Predict @Since("1.2.0") (` * `class RandomForestModel @Since("1.2.0") (` * `class GradientBoostedTreesModel @Since("1.2.0") (` * `abstract class SetOperation(left: LogicalPlan, right: LogicalPlan) extends BinaryNode ` * `case class Union(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right) ` * `case class Intersect(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)` * `case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation(left, right)`
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/8464#discussion_r38080354 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/local/LocalNodeTest.scala --- @@ -0,0 +1,192 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +package org.apache.spark.sql.execution.local + +import scala.util.control.NonFatal + +import org.apache.spark.SparkFunSuite +import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow} +import org.apache.spark.sql.catalyst.util._ +import org.apache.spark.sql.{DataFrame, Row} +import org.apache.spark.sql.types.StructType + +class LocalNodeTest extends SparkFunSuite { + + /** + * Runs the LocalNode and makes sure the answer matches the expected result. + * @param input the input data to be used. + * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate + * the local physical operator that's being tested. + * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s. + * @param sortAnswers if true, the answers will be sorted by their toString representations prior + * to being compared. + */ + protected def checkAnswer( + input: DataFrame, + nodeFunction: LocalNode => LocalNode, + expectedAnswer: Seq[Row], + sortAnswers: Boolean = true): Unit = { + doCheckAnswer( + input :: Nil, + nodes => nodeFunction(nodes.head), + expectedAnswer, + sortAnswers) + } + + /** + * Runs the LocalNode and makes sure the answer matches the expected result. + * @param left the left input data to be used. + * @param right the right input data to be used. + * @param nodeFunction a function which accepts the input LocalNode and uses it to instantiate + * the local physical operator that's being tested. + * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s. + * @param sortAnswers if true, the answers will be sorted by their toString representations prior + * to being compared. + */ + protected def checkAnswer2( + left: DataFrame, + right: DataFrame, + nodeFunction: (LocalNode, LocalNode) => LocalNode, + expectedAnswer: Seq[Row], + sortAnswers: Boolean = true): Unit = { + doCheckAnswer( + left :: right :: Nil, + nodes => nodeFunction(nodes(0), nodes(1)), + expectedAnswer, + sortAnswers) + } + + /** + * Runs the `LocalNode`s and makes sure the answer matches the expected result. + * @param input the input data to be used. + * @param nodeFunction a function which accepts a sequence of input `LocalNode`s and uses them to + * instantiate the local physical operator that's being tested. + * @param expectedAnswer the expected result in a [[Seq]] of [[Row]]s. + * @param sortAnswers if true, the answers will be sorted by their toString representations prior + * to being compared. + */ + protected def doCheckAnswer( + input: Seq[DataFrame], + nodeFunction: Seq[LocalNode] => LocalNode, + expectedAnswer: Seq[Row], + sortAnswers: Boolean = true): Unit = { + LocalNodeTest.checkAnswer( + input.map(dataFrameToSeqScanNode), nodeFunction, expectedAnswer, sortAnswers) match { + case Some(errorMessage) => fail(errorMessage) + case None => + } + } + + protected def dataFrameToSeqScanNode(df: DataFrame): SeqScanNode = { + val output = df.queryExecution.sparkPlan.output + val converter = + CatalystTypeConverters.createToCatalystConverter(StructType.fromAttributes(output)) + new SeqScanNode( + output, + df.collect().map(r => converter(r).asInstanceOf[InternalRow])) --- End diff -- Cool. Fixed it.
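The `sortAnswers` behaviour described in the doc comments of the quoted test helper (sort both sides by their `toString` representation, then compare, and report a mismatch as an optional error message) can be sketched independently of Spark. `compare` and the message format below are assumptions made for illustration; the helper in the diff works on `Row`s and `LocalNode`s and may differ in detail.

```scala
object CheckAnswerSketch {
  // Returns None when the answers match, or Some(errorMessage) otherwise,
  // mirroring the `case Some(errorMessage) => fail(errorMessage)` pattern in the diff.
  def compare[A](actual: Seq[A], expected: Seq[A], sortAnswers: Boolean = true): Option[String] = {
    // Sorting by toString gives an order-insensitive comparison for unordered results.
    def prepare(rows: Seq[A]): Seq[A] = if (sortAnswers) rows.sortBy(_.toString) else rows
    val a = prepare(actual)
    val e = prepare(expected)
    if (a == e) None
    else Some(s"Results do not match.\nExpected: $e\nActual:   $a")
  }
}
```

A caller in a test would typically turn `Some(message)` into a test failure and treat `None` as success.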
[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes typo in non-public H...
GitHub user liancheng opened a pull request: https://github.com/apache/spark/pull/8481 [SPARK-SQL] [MINOR] Fixes typo in non-public HiveContext method name You can merge this pull request into a Git repository by running: $ git pull https://github.com/liancheng/spark hive-context-typo Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8481.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8481 commit ff85c949087fb9e64858f918c5c831672ae562ed Author: Cheng Lian l...@databricks.com Date: 2015-08-27T10:03:39Z Fixes typo in non-public HiveContext method name
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135370380 Merged build started.
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135370354 Merged build triggered.
[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes typo in non-public H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8481#issuecomment-135370363 Merged build started.
[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes typo in non-public H...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8481#issuecomment-135370350 Merged build triggered.
[GitHub] spark pull request: [SPARK-9613] [HOTFIX] Fix usage of JavaConvert...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8479
[GitHub] spark pull request: [SPARK-9986][SPARK-9991][SPARK-9993][SQL]Creat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8464#issuecomment-135371382 [Test build #41681 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41681/consoleFull) for PR 8464 at commit [`22e7bc0`](https://github.com/apache/spark/commit/22e7bc0b9882b637bb06ee39a66d3ece789042fa).
[GitHub] spark pull request: [SPARK-2365] Add IndexedRDD, an efficient upda...
Github user zerosign commented on the pull request: https://github.com/apache/spark/pull/1297#issuecomment-135373104 Hi Ankur, any update on this pull request?
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135304152 The normalization is not done by StructObjectInspector or OrcStructObjectInspector, but in the `SemanticAnalyzer` of Hive. I've checked with Hive: even when the ORC column names are in capitals, Hive works well. The only thing I am not sure about is column pruning and predicate push-down; `explain extended select xx` in Hive doesn't seem to give that information. Maybe @zhzhan can comment on this.
[GitHub] spark pull request: [SPARK-10251][CORE] some common types are not ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8465
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/7520#discussion_r38065116 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala --- @@ -253,7 +260,7 @@ private[orc] case class OrcTableScan( maybeStructOI.map { soi => val (fieldRefs, fieldOrdinals) = nonPartitionKeyAttrs.map { case (attr, ordinal) => - soi.getStructFieldRef(attr.name.toLowerCase) -> ordinal --- End diff -- If we don't do the normalization, is this the only place we need to change? I ask because both `StructObjectInspector` and `OrcStructObjectInspector` serve the same purpose.
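The removed line in the diff above pairs each looked-up field with its ordinal after lower-casing the attribute name. A rough, self-contained sketch of that lookup pattern is shown below. It assumes a plain `Map` in place of Hive's inspector, so `resolve`, `fileFields` and `requested` are hypothetical names, and it says nothing about how `StructObjectInspector` or `OrcStructObjectInspector` actually resolve names.

```scala
import java.util.Locale

object FieldLookupSketch {
  // fileFields: column names as stored in the file schema (any case).
  // requested:  (attribute name, output ordinal) pairs, as in the diff.
  // Returns (field positions in the file, output ordinals).
  // Throws NoSuchElementException if a requested name is not in the file schema.
  def resolve(fileFields: Seq[String], requested: Seq[(String, Int)]): (Seq[Int], Seq[Int]) = {
    // Normalize both sides to lower case so the lookup is case-insensitive.
    val byLowerName = fileFields.zipWithIndex.map {
      case (name, idx) => name.toLowerCase(Locale.ROOT) -> idx
    }.toMap
    requested.map {
      case (name, ordinal) => byLowerName(name.toLowerCase(Locale.ROOT)) -> ordinal
    }.unzip
  }
}
```

Dropping the normalization in this sketch would make the lookup of `"name"` against a file column `"Name"` fail, which is the kind of behaviour change the review question is probing.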
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135308659 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135308665 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41671/ Test PASSed.
[GitHub] spark pull request: [SPARK-9170][SQL] Instead of StandardStructObj...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135308292 [Test build #41671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41671/console) for PR 7520 at commit [`055cd09`](https://github.com/apache/spark/commit/055cd09a09fff47cf43578a19ac78b77610231ce). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10251][CORE] some common types are not ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8465#issuecomment-135304194 Thanks - I've merged this.
[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8481#issuecomment-135373483 Merged build started.
[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8481#issuecomment-135373539 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8481#issuecomment-135373540 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41680/ Test FAILed.
[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8481#issuecomment-135373462 Merged build triggered.
[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135374072 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135373958 [Test build #41677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41677/console) for PR 7520 at commit [`a389746`](https://github.com/apache/spark/commit/a38974647ac75a359ae7495af39b93152a437d72). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135376204 [Test build #41683 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41683/consoleFull) for PR 8458 at commit [`02c64eb`](https://github.com/apache/spark/commit/02c64eb93b75d9ac0e2a12d8dd5a8c1ed5d143f2).
[GitHub] spark pull request: SPARK-10314 [CORE]RDD persist to OFF_HEAP tach...
GitHub user romansew opened a pull request: https://github.com/apache/spark/pull/8482 SPARK-10314 [CORE]RDD persist to OFF_HEAP tachyon got block rdd_x_x n… SPARK-10314 [CORE]RDD persist to OFF_HEAP tachyon got block rdd_x_x not found exception when parallelism is big than data split size You can merge this pull request into a Git repository by running: $ git pull https://github.com/jd-ode/spark branch-1.4 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8482.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8482 commit 1da81ab8cd6e7e45e1b2d03352ecbbb1635f644c Author: wangxiaoyu8 wangxiao...@jd.com Date: 2015-08-27T10:41:15Z SPARK-10314 [CORE]RDD persist to OFF_HEAP tachyon got block rdd_x_x not found exception when parallelism is big than data split size
[GitHub] spark pull request: SPARK-10314 [CORE]RDD persist to OFF_HEAP tach...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8482#issuecomment-135378312 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-10315] remove spark.akka.failure-detect...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8483#issuecomment-135384552 Merged build started.
[GitHub] spark pull request: [SPARK-10315] remove spark.akka.failure-detect...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8483#issuecomment-135384520 Merged build triggered.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135387576 /cc @JoshRosen @davies
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135388498 Merged build started.
[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135374076 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41677/ Test PASSed.
[GitHub] spark pull request: [SPARK-SQL] [MINOR] Fixes some typos in HiveCo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8481#issuecomment-135374158 [Test build #41682 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41682/consoleFull) for PR 8481 at commit [`2b414e4`](https://github.com/apache/spark/commit/2b414e4b4c8ecb9183d8497c5d5cc1c16bcde470).
[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135375355 Merged build triggered.
[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135375375 Merged build started.
[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135376437 [Test build #41683 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41683/console) for PR 8458 at commit [`02c64eb`](https://github.com/apache/spark/commit/02c64eb93b75d9ac0e2a12d8dd5a8c1ed5d143f2). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135376440 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41683/ Test FAILed.
[GitHub] spark pull request: [SPARK-10049][SPARKR][WIP] Support collecting ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8458#issuecomment-135376439 Merged build finished. Test FAILed.
[GitHub] spark pull request: REMOVE spark.akka.failure-detector.threshold
GitHub user CodingCat opened a pull request: https://github.com/apache/spark/pull/8483 REMOVE spark.akka.failure-detector.threshold You can merge this pull request into a Git repository by running: $ git pull https://github.com/CodingCat/spark SPARK_10315 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8483.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8483 commit 70c8f7be4b3d080aa29ae9b37e1e45c6a204bb9c Author: CodingCat zhunans...@gmail.com Date: 2015-08-27T11:01:50Z REMOVE spark.akka.failure-detector.threshold
[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8483#issuecomment-135386761 [Test build #41684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41684/consoleFull) for PR 8483 at commit [`70c8f7b`](https://github.com/apache/spark/commit/70c8f7be4b3d080aa29ae9b37e1e45c6a204bb9c).
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/8484 [SPARK-10065][SQL] Avoid triple copying of var-length objects in Array in tungsten projection JIRA: https://issues.apache.org/jira/browse/SPARK-10065 Currently we do unnecessary copying of objects in the array. We should avoid them. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 avoid-triple-obj-copying Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8484.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8484 commit 9e118c228e57b5a78dd1c370f261cd40a42ec1d3 Author: Liang-Chi Hsieh vii...@appier.com Date: 2015-08-27T11:06:29Z Avoid triple copying of var-length objects in Array in tungsten projection.
[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...
Github user CodingCat commented on the pull request: https://github.com/apache/spark/pull/8483#issuecomment-135388747 @srowen, mind taking a look?
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135388439 Merged build triggered.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135389447 [Test build #41685 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41685/consoleFull) for PR 8484 at commit [`9e118c2`](https://github.com/apache/spark/commit/9e118c228e57b5a78dd1c370f261cd40a42ec1d3).
[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8483#issuecomment-135392719 [Test build #41684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41684/console) for PR 8483 at commit [`70c8f7b`](https://github.com/apache/spark/commit/70c8f7be4b3d080aa29ae9b37e1e45c6a204bb9c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8483#issuecomment-135392859 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9170][SQL] User-provided columns should...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/7520#issuecomment-135392980

One thing to note is that case sensitivity of Spark SQL is configurable ([see here] [1]). So I don't think we should make `StructType` completely case insensitive (yet case preserving).

If I understand this issue correctly, the root problem here is that, while writing schema information to physical ORC files, our current approach isn't case preserving. As suggested by @chenghao-intel, when saving a DataFrame as Hive metastore tables using ORC, Spark SQL 1.5 now saves it in a Hive compatible way, so that we can read the data back using Hive. This implies that changes made in this PR should also be compatible with Hive.

After investigating Hive's behavior for a while, I found a few interesting things. Snippets below were executed against Hive 1.2.1 (with a PostgreSQL metastore) and Spark SQL 1.5-SNAPSHOT ([revision 05c] [2]).

Firstly, let's prepare a Hive ORC table:

```
hive> CREATE TABLE orc_test STORED AS ORC AS SELECT 1 AS CoL;
...
hive> SELECT col FROM orc_test;
OK
1
Time taken: 0.056 seconds, Fetched: 1 row(s)
hive> SELECT COL FROM orc_test;
OK
1
Time taken: 0.056 seconds, Fetched: 1 row(s)
hive> DESC orc_test;
OK
col                     int
Time taken: 0.047 seconds, Fetched: 1 row(s)
```

So Hive is neither case sensitive nor case preserving. We can further prove this by checking the metastore table `COLUMNS_V2`:

```
metastore_hive121> SELECT * FROM COLUMNS_V2
+---------+-----------+---------------+-------------+---------------+
|   CD_ID | COMMENT   | COLUMN_NAME   | TYPE_NAME   |   INTEGER_IDX |
|---------+-----------+---------------+-------------+---------------|
|      22 | null      | col           | int         |             0 |
+---------+-----------+---------------+-------------+---------------+
```

(I cleared my local Hive warehouse, so the only column record here is the one created above.)

Now let's read the physical ORC files directly using Spark:

```
scala> sqlContext.read.orc("hdfs://localhost:9000/user/hive/warehouse_hive121/orc_test").printSchema()
root
 |-- _col0: integer (nullable = true)

scala> sqlContext.read.orc("hdfs://localhost:9000/user/hive/warehouse_hive121/orc_test").show()
+-----+
|_col0|
+-----+
|    1|
+-----+
```

Huh? Why is it `_col0` instead of `col`? Let's inspect the physical ORC file written by Hive:

```
$ hive --orcfiledump /user/hive/warehouse_hive121/orc_test/00_0
Structure for /user/hive/warehouse_hive121/orc_test/00_0
File Version: 0.12 with HIVE_8732
15/08/27 19:07:15 INFO orc.ReaderImpl: Reading ORC rows from /user/hive/warehouse_hive121/orc_test/00_0 with {include: null, offset: 0, length: 9223372036854775807}
15/08/27 19:07:15 INFO orc.RecordReaderFactory: Schema is not specified on read. Using file schema.
Rows: 1
Compression: ZLIB
Compression size: 262144
Type: struct<_col0:int>    !!!
...
```

Surprise! So, when writing ORC files, *Hive doesn't even preserve the column names*.

Conclusions:

1. Making `StructType` completely case insensitive is unacceptable.
2. Concrete column names written into ORC files by Spark SQL don't affect interoperability with Hive.
3. It would be good for Spark SQL to be case preserving when writing ORC files. And I think this is the task this PR should aim for.

[1]: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/SQLConf.scala#L247-L249
[2]: https://github.com/apache/spark/commit/bb1640529725c6c38103b95af004f8bd905c
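The behaviours contrasted in the comment above (case sensitive lookup, case insensitive yet case preserving lookup, Hive-style lowercasing on write, and positional `_colN` names in the file) can be illustrated with a small standalone Python sketch. This is only a toy model under stated assumptions: `resolve`, `hive_like_store`, and `positional_names` are hypothetical helpers invented for illustration, not Spark, Hive, or ORC APIs, and the first matching field wins if several differ only by case.

```python
from typing import List, Optional


def resolve(fields: List[str], name: str, case_sensitive: bool) -> Optional[str]:
    """Return the stored field name matching `name`, or None (hypothetical helper).

    The stored spelling is returned unchanged (case preserving); only the
    comparison depends on `case_sensitive`.
    """
    for field in fields:
        if field == name:
            return field
        if not case_sensitive and field.lower() == name.lower():
            return field
    return None


def hive_like_store(fields: List[str]) -> List[str]:
    """Model a catalog that lowercases names on write (not case preserving)."""
    return [field.lower() for field in fields]


def positional_names(fields: List[str]) -> List[str]:
    """Model a file writer that drops names and keeps only column positions."""
    return [f"_col{i}" for i in range(len(fields))]


schema = ["CoL"]  # the column created by `SELECT 1 AS CoL`
print(resolve(schema, "col", case_sensitive=True))   # None
print(resolve(schema, "col", case_sensitive=False))  # CoL
print(hive_like_store(schema))                       # ['col']
print(positional_names(schema))                      # ['_col0']
```

In this model, only `resolve` with `case_sensitive=False` keeps the original spelling `CoL` while still accepting `col` or `COL`, which is the "case insensitive yet case preserving" behaviour discussed in the thread.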
[GitHub] spark pull request: [SPARK-10315] remove document on spark.akka.fa...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8483#issuecomment-135392861 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41684/ Test PASSed.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135394838 [Test build #41685 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/41685/console) for PR 8484 at commit [`9e118c2`](https://github.com/apache/spark/commit/9e118c228e57b5a78dd1c370f261cd40a42ec1d3). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10065][SQL] Avoid triple copying of var...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8484#issuecomment-135394871 Merged build finished. Test FAILed.