[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8959#discussion_r41173657 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -1183,11 +1194,22 @@ object Client extends Logging { private def getUserClasspath( --- End diff -- (I see the other one calls this; both could be merged into a single method, though.) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10669] [Docs] Link to each language's A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8977#issuecomment-145609285 [Test build #1842 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1842/console) for PR 8977 at commit [`5e298fb`](https://github.com/apache/spark/commit/5e298fb610ba35136cfdc6a2eeea5e9abe1b81fc). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10852][PySpark][SQL] Override built-in ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/8934#issuecomment-145609363 @viirya It would be great if we could fix it magically, but I'm worried that the current approach will introduce some performance regressions. Since we always have a way to work around it using `row["count"]` (similar to escaping column names in SQL), it's not a blocker for users.
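To illustrate why `row["count"]` works as a workaround, here is a minimal sketch. It does not use PySpark; `MiniRow` is a hypothetical stand-in that only assumes the row type subclasses `tuple` (as the `class Row(tuple)` hunk header in this PR suggests) and resolves field names through `__getattr__`.

```python
class MiniRow(tuple):
    """Hypothetical tuple-based row; an illustration, not PySpark's Row."""

    def __new__(cls, **fields):
        row = super().__new__(cls, fields.values())
        row._names = list(fields)  # field names, in insertion order
        return row

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup fails, so
        # inherited tuple methods such as count() and index() always win.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self[self._names.index(name)]
        except ValueError:
            raise AttributeError(name)

    def __getitem__(self, key):
        # String keys look the field up by name and bypass attribute lookup.
        if isinstance(key, str):
            return super().__getitem__(self._names.index(key))
        return super().__getitem__(key)


row = MiniRow(name="Alice", count=3)
print(row.name)             # field access through __getattr__
print(callable(row.count))  # the inherited tuple.count method, not the field
print(row["count"])         # the field value, reached by name
```

Under these assumptions, attribute access to a field named `count` is shadowed by the inherited method, while item access by name is unaffected.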
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145609327 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8747#issuecomment-145609517 Merged build triggered.
[GitHub] spark pull request: [SPARK-10917] [SQL] improve performance of com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8971#issuecomment-145611558 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10917] [SQL] improve performance of com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8971#issuecomment-145611452 [Test build #43241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43241/console) for PR 8971 at commit [`59bb2f9`](https://github.com/apache/spark/commit/59bb2f969cef9079606e2918289ce33db3201db4). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10917] [SQL] improve performance of com...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8971#issuecomment-145611560 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43241/ Test FAILed.
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8920#issuecomment-145613690 [Test build #43244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43244/consoleFull) for PR 8920 at commit [`05f9009`](https://github.com/apache/spark/commit/05f9009f4ec42b7ccf26519a6bb9746cb9a2ccd6).
[GitHub] spark pull request: [SPARK-1267][PYSPARK] Adds pip installer for p...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8318#discussion_r41175012

    --- Diff: python/pyspark/__init__.py ---
    @@ -36,6 +36,31 @@ Finer-grained cache persistence levels.
     """
     
    +import os
    +import sys
    +
    +import xml.etree.ElementTree as ET
    +
    +if (os.environ.get("SPARK_HOME", "not found") == "not found"):
    +    raise ImportError("Environment variable SPARK_HOME is undefined.")
    +
    +spark_home = os.environ['SPARK_HOME']
    +pom_xml_file_path = os.path.join(spark_home, 'pom.xml')
    --- End diff --

    There is no pom file inside the released bin package, so I think we should look for another way to find out the Spark version.
[GitHub] spark pull request: [SPARK-10648] Proposed bug fix when oracle ret...
Github user travishegner commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8780#discussion_r41172742

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala ---
    @@ -140,7 +140,12 @@ object DecimalType extends AbstractDataType {
       }
     
       private[sql] def bounded(precision: Int, scale: Int): DecimalType = {
    -    DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
    --- End diff --

    I will take your word for the risk involved; I am very new to this project. From a layman's perspective, it seems that doing some basic checks when instantiating the type would make the type more robust. If I understand correctly, a `precision <= 0` is not allowed, so this patch returns a /default/ decimal. Similarly, a `scale > precision` is not allowed, so it returns a decimal with the scale truncated to the size of the precision. My thought is that this will catch unexpected inputs and still behave in an expected way. Users instantiating these decimals in the ways that are intended will still get the same type back.
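The checks described in this comment can be sketched roughly as follows. This is an illustration in Python, not the patch itself: the added lines of the diff are not shown in this thread, so the constants `MAX_PRECISION`, `MAX_SCALE` and `DEFAULT`, and the `(precision, scale)` tuple representation, are all assumptions.

```python
# Assumed limits and default; the real values live in DecimalType.scala.
MAX_PRECISION = 38
MAX_SCALE = 38
DEFAULT = (10, 0)  # hypothetical "default" decimal as (precision, scale)


def bounded(precision, scale):
    """Return a (precision, scale) pair clamped to valid bounds."""
    if precision <= 0:
        # A non-positive precision is not allowed: fall back to the default.
        return DEFAULT
    p = min(precision, MAX_PRECISION)
    # The scale may exceed neither MAX_SCALE nor the (clamped) precision.
    s = min(scale, MAX_SCALE, p)
    return (p, s)


print(bounded(10, 2))   # a valid input passes through unchanged
print(bounded(0, 5))    # invalid precision falls back to DEFAULT
print(bounded(5, 10))   # scale is truncated to the precision
```

Valid inputs come back unchanged, which matches the claim that users instantiating decimals in the intended ways still get the same type back.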
[GitHub] spark pull request: [SPARK-10917] [SQL] improve performance of com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8971#issuecomment-145607323 [Test build #43241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43241/consoleFull) for PR 8971 at commit [`59bb2f9`](https://github.com/apache/spark/commit/59bb2f969cef9079606e2918289ce33db3201db4).
[GitHub] spark pull request: [SPARK-10852][PySpark][SQL] Override built-in ...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8934#discussion_r41173178

    --- Diff: python/pyspark/sql/types.py ---
    @@ -1189,6 +1189,16 @@ class Row(tuple):
         >>> Person("Alice", 11)
         Row(name='Alice', age=11)
    +
    +    Some special column names such as aggregated column count, should
    --- End diff --

    These kinds of tests should be in sql/tests.py.
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8920#issuecomment-145610834 Merged build started.
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8920#issuecomment-145610772 Merged build triggered.
[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8747#issuecomment-145610401 [Test build #43243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43243/consoleFull) for PR 8747 at commit [`d7f941d`](https://github.com/apache/spark/commit/d7f941d4edc6e3165790f2546fc3e7f378f04250).
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145606571 Merged build started.
[GitHub] spark pull request: [SPARK-10895][SQL] Push down string filters to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-145606676 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10895][SQL] Push down string filters to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-145606518 [Test build #43238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43238/console) for PR 8956 at commit [`eb134b9`](https://github.com/apache/spark/commit/eb134b993720a42154c430e508847f852882c5c1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class StringFilter(`
[GitHub] spark pull request: [SPARK-1267][PYSPARK] Adds pip installer for p...
Github user davies commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8318#discussion_r41173642

    --- Diff: python/pyspark/__init__.py ---
    @@ -36,6 +36,31 @@ Finer-grained cache persistence levels.
     """
     
    +import os
    +import sys
    +
    +import xml.etree.ElementTree as ET
    +
    +if (os.environ.get("SPARK_HOME", "not found") == "not found"):
    --- End diff --

    `if os.environ.get("SPARK_HOME") is None:`
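A small sketch of the suggested check, wrapped in a helper so it can be exercised without touching the real environment. The function name `require_spark_home` and its `environ` parameter are hypothetical and exist only for this illustration; the error message is the one from the diff.

```python
import os


def require_spark_home(environ=os.environ):
    """Return SPARK_HOME from `environ`, or raise ImportError if it is unset."""
    spark_home = environ.get("SPARK_HOME")
    # Mapping.get() returns None for a missing key, so no sentinel string
    # such as "not found" is needed.
    if spark_home is None:
        raise ImportError("Environment variable SPARK_HOME is undefined.")
    return spark_home


print(require_spark_home({"SPARK_HOME": "/opt/spark"}))

try:
    require_spark_home({})
except ImportError as exc:
    print("error:", exc)
```

Besides being shorter, the `is None` form avoids the corner case where the variable is legitimately set to the sentinel string itself.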
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145619279 Sorry, typo. I'm running scalastyle manually and will post an updated patch shortly.
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8920#issuecomment-145619261 [Test build #43244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43244/console) for PR 8920 at commit [`05f9009`](https://github.com/apache/spark/commit/05f9009f4ec42b7ccf26519a6bb9746cb9a2ccd6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8920#issuecomment-145619405 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43244/ Test PASSed.
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8920#issuecomment-145619402 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10829][SQL]Filter combine partition key...
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8916#discussion_r41177762

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala ---
    @@ -62,7 +62,30 @@ private[sql] object DataSourceStrategy extends Strategy with Logging {
         // Scanning partitioned HadoopFsRelation
         case PhysicalOperation(projects, filters, l @ LogicalRelation(t: HadoopFsRelation))
             if t.partitionSpec.partitionColumns.nonEmpty =>
    -      val selectedPartitions = prunePartitions(filters, t.partitionSpec).toArray
    +      // We divide the filter expressions into 3 parts
    +      val partitionColumnNames = t.partitionSpec.partitionColumns.map(_.name).toSet
    +      val filterMap = filters.groupBy { f =>
    +        // TODO this is case-senstive
    +        val referencedColumnNames = f.references.map(_.name).toSet
    +        if (referencedColumnNames.subsetOf(partitionColumnNames)) {
    +          // Only reference the partition key
    +          0
    +        } else if (referencedColumnNames.intersect(partitionColumnNames).isEmpty) {
    +          // Not reference any partition key at all. can be push down
    +          1
    +        } else {
    +          // Reference both partition key and attributes
    +          2
    +        }
    +      }
    +      // Only prunning the partition keys
    +      val partitionFilters = filterMap.getOrElse(0, Nil)
    +      // Only pushes down predicates that do not reference partition keys.
    +      val pushedFilters = filterMap.getOrElse(1, Nil)
    +      // Predicates with both partition keys and attributes
    +      val combineFilters = filterMap.getOrElse(2, Nil)
    --- End diff --

    Instead of using `groupBy`, can we just use `filter` 3 times to split these 3 kinds of filters? I think performance doesn't matter here.
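The "filter 3 times" suggestion can be sketched as follows. This is a Python illustration rather than the Scala in the diff: each filter is modelled simply as the set of column names it references, and `split_filters` is a hypothetical helper name. The three predicates mirror the three branches of the `groupBy` in the diff, including the order in which they are tested.

```python
def split_filters(filters, partition_columns):
    """Split filters into (partition-only, data-only, mixed) lists."""

    def only_partition(refs):
        # Every referenced column is a partition column.
        return refs <= partition_columns

    def only_data(refs):
        # Not partition-only, and no partition column is referenced at all.
        return not only_partition(refs) and refs.isdisjoint(partition_columns)

    def mixed(refs):
        # References both partition columns and ordinary attributes.
        return not only_partition(refs) and not refs.isdisjoint(partition_columns)

    partition_filters = [f for f in filters if only_partition(f)]
    pushed_filters = [f for f in filters if only_data(f)]
    combine_filters = [f for f in filters if mixed(f)]
    return partition_filters, pushed_filters, combine_filters


partition_columns = {"year", "month"}
filters = [{"year"}, {"name"}, {"year", "name"}, {"month", "year"}]
print(split_filters(filters, partition_columns))
```

Three passes over the list cost a little more than one `groupBy`, but the intent of each resulting list is stated by a named predicate instead of the integer keys 0, 1 and 2.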
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145620685 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145620682 [Test build #43245 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43245/console) for PR 8959 at commit [`e627adc`](https://github.com/apache/spark/commit/e627adc4c5e239ff52ec9c1e33fe57dfe8294b0d). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10917] [SQL] improve performance of com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8971#issuecomment-145621181 [Test build #1843 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1843/consoleFull) for PR 8971 at commit [`59bb2f9`](https://github.com/apache/spark/commit/59bb2f969cef9079606e2918289ce33db3201db4).
[GitHub] spark pull request: [SPARK-10900][Streaming]Add output operation e...
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8958#discussion_r41178108

    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/OutputOperationInfo.scala ---
    @@ -0,0 +1,41 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements. See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License. You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.streaming.scheduler
    +
    +import org.apache.spark.annotation.DeveloperApi
    +
    +/**
    + * :: DeveloperApi ::
    + * Class having information on output operations.
    + * @param id Id of this output operation. Different output operations have different ids in a batch.
    + * @param description the description of this output operation.
    + * @param startTime Clock time of when the output operation started processing
    + * @param endTime Clock time of when the output operation started processing
    + */
    +@DeveloperApi
    +case class OutputOperationInfo(
    +    id: Int,
    --- End diff --

    This probably should have some reference to which batch this output operation belongs to. Batch time probably.
[GitHub] spark pull request: [SPARK-8514] LU factorization on BlockMatrix
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8563#issuecomment-145623644 [Test build #43247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43247/consoleFull) for PR 8563 at commit [`e6e5c86`](https://github.com/apache/spark/commit/e6e5c86d0aaae57c043b82734b274382945d52af).
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145624942 [Test build #43246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43246/consoleFull) for PR 8959 at commit [`ac3ffcf`](https://github.com/apache/spark/commit/ac3ffcfca047804a2cbdea47064a6234ae6f9fac).
[GitHub] spark pull request: [SPARK-8654][SQL] Fix Analysis exception when ...
GitHub user dilipbiswal opened a pull request:

    https://github.com/apache/spark/pull/8983

    [SPARK-8654][SQL] Fix Analysis exception when using NULL IN (...)

    In the analysis phase, while processing the rules for the IN predicate, we compare the in-list types to the lhs expression type and generate a cast operation if necessary. In the case of NULL [NOT] IN expr1, we end up generating a cast from the in-list types to NULL, like cast (1 as NULL), which is not a valid cast.

    The fix is to not generate such a cast if the lhs type is a NullType; instead we translate the expression to Literal(Null).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dilipbiswal/spark spark_8654

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/8983.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #8983

commit 38f973bb124c63c1caabe14ee6e5cca7b764b15a
Author: Dilip Biswal
Date: 2015-10-02T23:20:56Z

    [SPARK-8654] Analysis exception when using NULL IN (...) : invalid cast

    In the analysis phase, while processing the rules for the IN predicate, we compare the in-list types to the lhs expression type and generate a cast operation if necessary. In the case of NULL [NOT] IN expr1, we end up generating a cast from the in-list types to NULL, like cast (1 as NULL), which is not a valid cast.

    The fix is to not generate such a cast if the lhs type is a NullType; instead we translate the expression to Literal(Null).
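The rule described in this pull request can be sketched with a toy expression tree. This is an illustration in Python, not Catalyst code: `Literal`, `Cast`, `In` and `coerce_in` are hypothetical classes and names defined only for this sketch, and types are modelled as plain strings.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Literal:
    value: Any
    dtype: str  # e.g. "int", "string", or "null"


@dataclass(frozen=True)
class Cast:
    child: Literal
    to: str


@dataclass(frozen=True)
class In:
    value: Literal
    items: List[Any]


def coerce_in(expr):
    """Cast in-list items to the lhs type, except when the lhs is NULL."""
    lhs = expr.value
    if lhs.dtype == "null":
        # Casting anything to the null type is invalid, and NULL IN (...)
        # evaluates to NULL anyway, so replace the whole predicate.
        return Literal(None, "null")
    items = [i if i.dtype == lhs.dtype else Cast(i, lhs.dtype) for i in expr.items]
    return In(lhs, items)


print(coerce_in(In(Literal(None, "null"), [Literal(1, "int")])))
print(coerce_in(In(Literal("a", "string"), [Literal(1, "int")])))
```

The first call takes the short-circuit path for a NULL left-hand side; the second shows the ordinary path, where mismatched in-list items are wrapped in a cast.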
[GitHub] spark pull request: [SPARK-10709] [SQL] When loading a json datase...
Github user piggybox commented on the pull request: https://github.com/apache/spark/pull/8899#issuecomment-145629205 This is indeed confusing. I once used sqlContext.read.json() to read a path to a folder of JSON files and that worked, so I then tried the parent folder to read recursively and saw this error. I also tried the parent path ending with '/*' or '/*/', a glob I have used in Hadoop, and got the same error. I want to add that it's also confusing because in the log I can see things like " HadoopRDD: Input split: " followed by the path to one of the actual JSON files, so it seems Spark does find the file and starts reading.
[GitHub] spark pull request: [SPARK-8514] LU factorization on BlockMatrix
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8563#issuecomment-145634505 [Test build #43247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43247/console) for PR 8563 at commit [`e6e5c86`](https://github.com/apache/spark/commit/e6e5c86d0aaae57c043b82734b274382945d52af). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8514] LU factorization on BlockMatrix
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8563#issuecomment-145634658 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43247/ Test PASSed.
[GitHub] spark pull request: [SPARK-10895][SQL] Push down string filters to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8956#issuecomment-145606678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43238/ Test PASSed.
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145606547 Merged build triggered.
[GitHub] spark pull request: [SPARK-10852][PySpark][SQL] Override built-in ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8934#discussion_r41172960 --- Diff: python/pyspark/sql/types.py --- @@ -1209,6 +1219,12 @@ def __new__(self, *args, **kwargs): else: raise ValueError("No args or kwargs") +def __init__(self, *args, **kwargs): +if hasattr(self, "__fields__") and "count" in self.__fields__: --- End diff -- Should we check all the method names? `self.__fields__` is a list, so the `in` check could be expensive.
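To make the cost concern in the comment above concrete: a membership test on a list scans the list, so checking each method name separately repeats that scan. One possible alternative, sketched below under the assumption that the row type is a `tuple` subclass, is to precompute the set of public `tuple` method names once and intersect it with the field names. The helper `shadowed_method_names` is a hypothetical name introduced for this note and is not part of PySpark.

```python
# Public method names of the built-in tuple type, computed once at import time.
_TUPLE_METHOD_NAMES = frozenset(
    name for name in dir(tuple) if not name.startswith("_")
)


def shadowed_method_names(fields):
    """Return the field names that collide with built-in tuple methods.

    `fields` may be any iterable of strings; a single set intersection
    covers every method name, instead of one list scan per name.
    """
    return sorted(_TUPLE_METHOD_NAMES.intersection(fields))


print(shadowed_method_names(["name", "count", "age"]))  # prints ['count']
```

The set is built once, so each row construction only pays for walking its own field list.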
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145609903 LGTM pending tests and a minor cleanup.
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145619563 Merged build started.
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145619542 Merged build triggered.
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145620687 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43245/ Test FAILed.
[GitHub] spark pull request: [SPARK-6284][MESOS] Add mesos role, principal ...
Github user tnachen commented on the pull request: https://github.com/apache/spark/pull/4960#issuecomment-145625978 Hi @AndriiOmelianenko, I have a PR out to fix that here https://github.com/apache/spark/pull/8872
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145609320 [Test build #43242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43242/console) for PR 8959 at commit [`b860226`](https://github.com/apache/spark/commit/b860226c563fc86a84d2ed80353c6502852829f9). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/8959#discussion_r41173518 --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala --- @@ -1183,11 +1194,22 @@ object Client extends Logging { private def getUserClasspath( --- End diff -- This method is not needed anymore, is it? Only the one that takes a `SparkConf` directly.
[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8747#issuecomment-145609556 Merged build started.
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145609329 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43242/ Test FAILed.
[GitHub] spark pull request: [SPARK-10648] Proposed bug fix when oracle ret...
Github user travishegner commented on the pull request: https://github.com/apache/spark/pull/8780#issuecomment-145609065 @cloud-fan @bdolbeare @davies I'm certainly open to doing this in an Oracle-specific way if that is what is required. I was simply hoping to solve my problem while simultaneously making the whole project more robust. I completely understand if you don't believe that it's the right direction. Thanks for looking into it with me!
[GitHub] spark pull request: [SPARK-1267][PYSPARK] Adds pip installer for p...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8318#discussion_r41175819 --- Diff: python/pyspark/pyspark_version.py --- @@ -0,0 +1,17 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +__version__ = '1.5.0' --- End diff -- I think it's error-prone to have multiple copies of the version in different places; if someone forgets to update this one, PySpark will break (even within the repo). I'd vote for generating the version while generating the PyPI package. If PySpark comes along with Spark, we don't need this check (at least it shouldn't fail or be slow).
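The suggestion above, generating the version file at packaging time instead of keeping a hand-edited copy, might look roughly like the following. This is a sketch under assumptions: the function names are hypothetical, and where the version string comes from (for example, the Spark build) is left to the caller because the thread does not specify it.

```python
def render_version_module(version):
    """Return the source text of a generated version module."""
    return "__version__ = {!r}\n".format(version)


def write_version_module(version, path):
    """Write the generated module; intended to run as a packaging step."""
    with open(path, "w") as handle:
        handle.write(render_version_module(version))


def read_version_module(path):
    """Load __version__ from a generated module without importing the package."""
    namespace = {}
    with open(path) as handle:
        exec(compile(handle.read(), path, "exec"), namespace)
    return namespace["__version__"]
```

With this arrangement the repository holds no second copy of the version to forget; the file exists only in the built package.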
[GitHub] spark pull request: [SPARK-1267][PYSPARK] Adds pip installer for p...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/8318#discussion_r41176383 --- Diff: python/setup.py --- @@ -0,0 +1,18 @@ +#!/usr/bin/env python + +from setuptools import setup + +exec(compile(open("pyspark/pyspark_version.py").read(), + "pyspark/pyspark_version.py", 'exec')) +VERSION = __version__ + +setup(name='pyspark', +version=VERSION, +description='Apache Spark Python API', +author='Spark Developers', +author_email='d...@spark.apache.org', +url='https://github.com/apache/spark/tree/master/python', +packages=['pyspark', 'pyspark.mllib', 'pyspark.ml', 'pyspark.sql', 'pyspark.streaming'], +install_requires=['numpy>=1.7', 'py4j==0.8.2.1', 'pandas'], --- End diff -- pyspark does not depend on numpy and pandas, only pyspark.mllib/ml do. pyspark.sql does not require pandas.
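One way to act on the comment above is to keep only the hard dependency in `install_requires` and move the rest into setuptools' `extras_require`. The sketch below only builds the keyword arguments and does not call `setup()`. The extra names (`mllib`, `ml`, `sql`) and the choice to list pandas as an optional extra for `pyspark.sql` are assumptions made for illustration, not something the thread settles.

```python
def dependency_arguments():
    """Return setup() keyword arguments that separate hard and optional deps."""
    return {
        # Needed by every pyspark import.
        "install_requires": ["py4j==0.8.2.1"],
        # Optional groups; the group names here are hypothetical.
        "extras_require": {
            "mllib": ["numpy>=1.7"],
            "ml": ["numpy>=1.7"],
            "sql": ["pandas"],
        },
    }
```

A `setup.py` could then pass these along with `setup(name='pyspark', ..., **dependency_arguments())`, and users would opt in with something like `pip install pyspark[mllib]`.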
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145620136 [Test build #43245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43245/consoleFull) for PR 8959 at commit [`e627adc`](https://github.com/apache/spark/commit/e627adc4c5e239ff52ec9c1e33fe57dfe8294b0d).
[GitHub] spark pull request: [SPARK-10900][Streaming]Add output operation e...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/8958#discussion_r41178301 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/StreamingListenerSuite.scala --- @@ -140,6 +140,27 @@ class StreamingListenerSuite extends TestSuiteBase with Matchers { } } + test("output operation reporting") { +ssc = new StreamingContext("local[2]", "test", Milliseconds(1000)) +val inputStream = ssc.receiverStream(new StreamingListenerSuiteReceiver) +inputStream.foreachRDD(_.count()) +inputStream.foreachRDD(_.collect()) +inputStream.foreachRDD(_.count()) + +val collector = new OutputOperationInfoCollector +ssc.addStreamingListener(collector) + +ssc.start() +try { + eventually(timeout(30 seconds), interval(20 millis)) { --- End diff -- Makes sense, especially now that there is an output operation info.
[GitHub] spark pull request: [SPARK-8654][SQL] Fix Analysis exception when ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8983#issuecomment-145630551 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-8514] LU factorization on BlockMatrix
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8563#issuecomment-145634657 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8514] LU factorization on BlockMatrix
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8563#issuecomment-145623267 Merged build started.
[GitHub] spark pull request: [SPARK-8514] LU factorization on BlockMatrix
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8563#issuecomment-145623227 Merged build triggered.
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145608874 [Test build #43242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43242/consoleFull) for PR 8959 at commit [`b860226`](https://github.com/apache/spark/commit/b860226c563fc86a84d2ed80353c6502852829f9).
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145623250 Merged build started.
[GitHub] spark pull request: [SPARK-10901] [YARN] spark.yarn.user.classpath...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8959#issuecomment-145623218 Merged build triggered.
[GitHub] spark pull request: [SPARK-8654][SQL] Fix Analysis exception when ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/8983#discussion_r41183045 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala --- @@ -135,4 +135,11 @@ class AnalysisSuite extends AnalysisTest { plan = testRelation.select(CreateStructUnsafe(Seq(a, (a + 1).as("a+1"))).as("col")) checkAnalysis(plan, plan) } + + test("SPARK-8654: invalid CAST in NULL IN(...) expression") { +val plan = Project(Alias(In(Literal(null), Seq(Literal(1), Literal(2))), "a")() :: Nil, + LocalRelation() +) +assertAnalysisSuccess(plan, false) --- End diff -- Why change the default value of `caseSensitive`?
[GitHub] spark pull request: [SPARK-10841][SQL] Add pushdown support of UDF...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/8922#issuecomment-145632836 "daily sql query" is not sufficiently descriptive. Please post actual benchmark results with code when making pull requests that claim to improve performance.

It would also be good to evaluate the cost in degenerate cases. For example, I think you are adding an object allocation per input tuple when boxing for any queries that filter by UDF in parquet. Are you slowing down cases where the filter is not selective?

If we want to improve the set of things that we push down, I don't think specializing for just UDFs in comparison operations is worth it given how much you are widening the API. Could we just have a single Function filter:

```scala
case class FilterFunction(attribute: String, function: Any => Boolean)
```

or maybe some specialized variants:

```scala
case class IntegerFilter(attribute: String, function: Int => Boolean)
...
```
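The single-function filter proposed above can be sketched outside Spark to show how a data source might consume it. The Python below is a language-neutral illustration only: `FilterFunction` mirrors the shape of the proposed case class, while `apply_filter` and the dict-based rows are hypothetical and do not correspond to the Spark data source API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List


@dataclass(frozen=True)
class FilterFunction:
    """One opaque predicate applied to a single named attribute."""
    attribute: str
    function: Callable[[Any], bool]


def apply_filter(rows: Iterable[Dict[str, Any]], flt: FilterFunction) -> List[Dict[str, Any]]:
    """Keep the rows whose attribute value satisfies the predicate."""
    return [row for row in rows if flt.function(row[flt.attribute])]


rows = [{"id": 1}, {"id": 5}, {"id": 9}]
print(apply_filter(rows, FilterFunction("id", lambda value: value > 2)))
# prints [{'id': 5}, {'id': 9}]
```

The source only needs to understand one filter type, at the cost of treating the predicate as a black box that it cannot inspect or translate.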
[GitHub] spark pull request: [SPARK-10337][SQL] fix hive views on non-hive-...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/8990#discussion_r41225536 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -1248,4 +1248,12 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton { """.stripMargin), Row("b", 6.0) :: Row("a", 7.0) :: Nil) } } + + test("SPARK-10337: correctly handle hive views") { +withSQLConf("spark.sql.hive.nonNativeView" -> "true") { + sqlContext.range(1, 10).write.format("json").saveAsTable("jt") + sql("CREATE VIEW testView AS SELECT id FROM jt") + checkAnswer(sql("SELECT * FROM testView ORDER BY id"), (1 to 9).map(i => Row(i))) +} + } --- End diff -- Also, do we need to design more cases to test when this flag is true?
[GitHub] spark pull request: [SPARK-8848] [SQL] Refactors Parquet write pat...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8988#issuecomment-145745724 [Test build #43272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43272/consoleFull) for PR 8988 at commit [`6fd20f7`](https://github.com/apache/spark/commit/6fd20f70baa535b1772c1c30a3f651ea673560f2).
[GitHub] spark pull request: [SPARK-10885][Streaming]Display the failed out...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8950#issuecomment-145738954 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10337][SQL] fix hive views on non-hive-...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/8990#discussion_r41225160 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -563,6 +580,77 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C } } + case view @ Token("TOK_CREATEVIEW", children) +if children.collect { case t @ Token("TOK_QUERY", _) => t }.nonEmpty => + + val Seq( +Some(viewNameParts), +Some(query), +maybeComment, +allowExisting, +maybeProperties, +maybeColumns, +maybePartCols + ) = getClauses( +Seq( + "TOK_TABNAME", + "TOK_QUERY", + "TOK_TABLECOMMENT", + "TOK_IFNOTEXISTS", + "TOK_TABLEPROPERTIES", + "TOK_TABCOLNAME", + "TOK_VIEWPARTCOLS"), +children) + + // If the view is partitioned, we let hive handle it. + if (maybePartCols.isDefined) { +NativePlaceholder + } else { +val (db, viewName) = extractDbNameTableName(viewNameParts) + +val originalText = context.getTokenRewriteStream + .toString(query.getTokenStartIndex, query.getTokenStopIndex) + +val schema = maybeColumns.map { cols => + BaseSemanticAnalyzer.getColumns(cols, true).asScala.map { field => +HiveColumn(field.getName, field.getType, field.getComment) + } +}.getOrElse(Seq.empty[HiveColumn]) + +val properties = scala.collection.mutable.Map.empty[String, String] + +maybeProperties.foreach { + case Token("TOK_TABLEPROPERTIES", list :: Nil) => +properties ++= getProperties(list) +} + +maybeComment.foreach { + case Token("TOK_TABLECOMMENT", child :: Nil) => +val comment = BaseSemanticAnalyzer.unescapeSQLString(child.getText) +if (comment ne null) { + properties += ("comment" -> comment) +} +} + +val tableDesc = HiveTable( + specifiedDatabase = db, + name = viewName, + schema = schema, + partitionColumns = Seq.empty[HiveColumn], + properties = properties.toMap, + serdeProperties = Map[String, String](), + tableType = VirtualView, + location = None, + inputFormat = None, + outputFormat = None, + serde = None, + viewText = Some(originalText)) 
+ +val sql = context.getTokenRewriteStream + .toString(view.getTokenStartIndex, view.getTokenStopIndex) --- End diff -- Let's add a comment here.
[GitHub] spark pull request: [SPARK-10941] [SQL] Refactor AggregateFunction...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8973#issuecomment-145743801 [Test build #43271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43271/consoleFull) for PR 8973 at commit [`e34e22e`](https://github.com/apache/spark/commit/e34e22ef25dabcf5ee03fd51631a3d8f1a227070).
[GitHub] spark pull request: [SPARK-10885][Streaming]Display the failed out...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8950#issuecomment-145738897 [Test build #43270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43270/console) for PR 8950 at commit [`ca68ac8`](https://github.com/apache/spark/commit/ca68ac858462fff107d8a5ce7a5af3cf9416aca3). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10885][Streaming]Display the failed out...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8950#issuecomment-145738955 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43270/ Test PASSed.
[GitHub] spark pull request: [SPARK-10337][SQL] fix hive views on non-hive-...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/8990#discussion_r41225256 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -563,6 +580,77 @@ https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C } } + case view @ Token("TOK_CREATEVIEW", children) +if children.collect { case t @ Token("TOK_QUERY", _) => t }.nonEmpty => + + val Seq( +Some(viewNameParts), +Some(query), +maybeComment, +allowExisting, +maybeProperties, +maybeColumns, +maybePartCols + ) = getClauses( +Seq( + "TOK_TABNAME", + "TOK_QUERY", + "TOK_TABLECOMMENT", + "TOK_IFNOTEXISTS", + "TOK_TABLEPROPERTIES", + "TOK_TABCOLNAME", + "TOK_VIEWPARTCOLS"), +children) + + // If the view is partitioned, we let hive handle it. + if (maybePartCols.isDefined) { +NativePlaceholder + } else { +val (db, viewName) = extractDbNameTableName(viewNameParts) + +val originalText = context.getTokenRewriteStream + .toString(query.getTokenStartIndex, query.getTokenStopIndex) + +val schema = maybeColumns.map { cols => + BaseSemanticAnalyzer.getColumns(cols, true).asScala.map { field => +HiveColumn(field.getName, field.getType, field.getComment) --- End diff -- Does Hive allow column types to be specified in the CREATE VIEW command?
[GitHub] spark pull request: [SPARK-10938] [SQL] remove typeId in columnar ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8989#discussion_r41225242 --- Diff: project/MimaExcludes.scala --- @@ -42,7 +42,9 @@ object MimaExcludes { excludePackage("org.spark-project.jetty"), MimaBuild.excludeSparkPackage("unused"), // SQL execution is considered private. -excludePackage("org.apache.spark.sql.execution") +excludePackage("org.apache.spark.sql.execution"), +// SQL columnar is considered private. +excludePackage("org.apache.spark.sql.columnar") --- End diff -- Yup. It would be great to minimize the number of top packages we have. You can do it in a followup pr too.
[GitHub] spark pull request: [SPARK-10932] [PROJECT INFRA] Port two minor c...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/8986#issuecomment-145740694 Ah I see - didn't know it was failing Jenkins as well. Code changes from rxin/spark-utils LGTM.
[GitHub] spark pull request: [SPARK-10941] [SQL] Refactor AggregateFunction...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8973#issuecomment-145743534 Merged build started.
[GitHub] spark pull request: [SPARK-10885][Streaming]Display the failed out...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/8950#issuecomment-145743591 This looks cool! Also I like that you used "details" for that. In the case of "Failed", unless the details are opened, there is no indication of why it failed. So it might be better to show "Failed due to error: $exceptionMessage", and put the full stack trace in the details.
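The suggestion above (a one-line summary that names the error, with the full stack trace kept in the expandable details) could be sketched roughly as follows. This is only an illustration in plain Scala: `FailureDisplaySketch`, `FailureCell`, and `describe` are hypothetical names, not code from the pull request or from the Spark UI.

```scala
import java.io.{PrintWriter, StringWriter}

// Hypothetical helper for illustration; not part of Spark or of PR 8950.
object FailureDisplaySketch {
  // summary: short text that is always visible; details: text shown on expand.
  final case class FailureCell(summary: String, details: String)

  def describe(error: Throwable): FailureCell = {
    // Render the full stack trace into a string for the expandable part.
    val buffer = new StringWriter()
    val writer = new PrintWriter(buffer)
    error.printStackTrace(writer)
    writer.flush()
    FailureCell(s"Failed due to error: ${error.getMessage}", buffer.toString)
  }
}
```

With this shape, the failure reason is visible even when the details stay collapsed, which is the point of the review comment.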
[GitHub] spark pull request: [SPARK-10941] [SQL] Refactor AggregateFunction...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8973#issuecomment-145743522 Merged build triggered.
[GitHub] spark pull request: [SPARK-8848] [SQL] Refactors Parquet write pat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8988#issuecomment-145745373 Merged build started.
[GitHub] spark pull request: [SPARK-8848] [SQL] Refactors Parquet write pat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8988#issuecomment-145745361 Merged build triggered.
[GitHub] spark pull request: [SPARK-10337][SQL] fix hive views on non-hive-...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/8990#discussion_r41225405 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala --- @@ -1248,4 +1248,12 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton { """.stripMargin), Row("b", 6.0) :: Row("a", 7.0) :: Nil) } } + + test("SPARK-10337: correctly handle hive views") { +withSQLConf("spark.sql.hive.nonNativeView" -> "true") { + sqlContext.range(1, 10).write.format("json").saveAsTable("jt") + sql("CREATE VIEW testView AS SELECT id FROM jt") + checkAnswer(sql("SELECT * FROM testView ORDER BY id"), (1 to 9).map(i => Row(i))) +} + } --- End diff -- Do we have view tests (to make sure we are good when this flag is false) in the hive compatibility suite?
[GitHub] spark pull request: [SPARK-10941] [SQL] Refactor AggregateFunction...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8973#issuecomment-145744272 [Test build #43271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43271/console) for PR 8973 at commit [`e34e22e`](https://github.com/apache/spark/commit/e34e22ef25dabcf5ee03fd51631a3d8f1a227070). * This patch **fails to build**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Average(child: Expression) extends ExpressionAggregateFunction ` * `case class Count(child: Expression) extends ExpressionAggregateFunction ` * `case class First(child: Expression) extends ExpressionAggregateFunction ` * `case class Last(child: Expression) extends ExpressionAggregateFunction ` * `case class Max(child: Expression) extends ExpressionAggregateFunction ` * `case class Min(child: Expression) extends ExpressionAggregateFunction ` * `abstract class StddevAgg(child: Expression) extends ExpressionAggregateFunction ` * `case class Sum(child: Expression) extends ExpressionAggregateFunction ` * `sealed abstract class AggregateFunction2 extends Expression with ImplicitCastInputTypes ` * `abstract class ImperativeAggregateFunction extends AggregateFunction2 `
[GitHub] spark pull request: [SPARK-10941] [SQL] Refactor AggregateFunction...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8973#issuecomment-145744274 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43271/ Test FAILed.
[GitHub] spark pull request: [SPARK-10941] [SQL] Refactor AggregateFunction...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8973#issuecomment-145744273 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10772][Streaming][Scala]: NullPointerEx...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/8881#discussion_r41189182 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/dstream/TransformedDStream.scala --- @@ -38,6 +39,11 @@ class TransformedDStream[U: ClassTag] ( override def compute(validTime: Time): Option[RDD[U]] = { val parentRDDs = parents.map(_.getOrCompute(validTime).orNull).toSeq -Some(transformFunc(parentRDDs, validTime)) +val transformedRDD = transformFunc(parentRDDs, validTime) +if (transformedRDD == null) { + throw new SparkException("Transform function may not return null. " + --- End diff -- I would say "must not return null". Saying "may not" is a little ambiguous.
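The pattern under review (call the user-supplied transform, then fail fast with an unambiguous message if it returns null) can be sketched in plain Scala without any Spark dependency. `NullCheckSketch`, `TransformException`, and `applyTransform` are hypothetical stand-ins for illustration; the actual change lives in `TransformedDStream.compute` and throws `SparkException`.

```scala
// Hypothetical stand-in for SparkException, used for illustration only.
final class TransformException(message: String) extends RuntimeException(message)

object NullCheckSketch {
  // Applies a user-supplied function and rejects a null result immediately,
  // instead of letting it surface later as a NullPointerException.
  def applyTransform[A <: AnyRef, B <: AnyRef](input: A)(transformFunc: A => B): B = {
    val result = transformFunc(input)
    if (result == null) {
      // "must not" states the requirement; "may not" can be read as "might not".
      throw new TransformException("Transform function must not return null.")
    }
    result
  }
}
```

Failing at the call site keeps the error close to the offending function, which is easier to diagnose than a later null dereference.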
[GitHub] spark pull request: [SPARK-10917] [SQL] improve performance of com...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8971#issuecomment-145662420 [Test build #1843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1843/console) for PR 8971 at commit [`59bb2f9`](https://github.com/apache/spark/commit/59bb2f969cef9079606e2918289ce33db3201db4). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8747#issuecomment-145645748 [Test build #43243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43243/console) for PR 8747 at commit [`d7f941d`](https://github.com/apache/spark/commit/d7f941d4edc6e3165790f2546fc3e7f378f04250). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class BufferHolder ` * `public class UnsafeArrayWriter ` * `public class UnsafeRowWriter `
[GitHub] spark pull request: [SPARK-8654][SQL] Fix Analysis exception when ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/8983#discussion_r41188938 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala --- @@ -305,12 +305,17 @@ object HiveTypeCoercion { /** * Convert all expressions in in() list to the left operator type + * except when the left operator type is NullType. In case when left hand + * operator type is NullType create a Literal(Null). */ object InConversion extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan resolveExpressions { // Skip nodes who's children have not been resolved yet. case e if !e.childrenResolved => e + case i @ In(a, b) if (a.dataType == NullType) => +Literal.create(null, BooleanType) --- End diff -- instead of just casting null to boolean, can we come up with a better idea according to the data types of `b`?
[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8747#issuecomment-145645883 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/43243/ Test PASSed.
[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8747#issuecomment-145645874 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10863][SPARKR] Method coltypes() to get...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8984#issuecomment-145648518 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-10772][Streaming][Scala]: NullPointerEx...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/8881#issuecomment-145650573 @jhu-chang Could you fix the style issues and the one minor issue that I pointed out? Style issues: ``` [error] /home/jenkins/workspace/NewSparkPullRequestBuilder/streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala:213:0: Whitespace at end of line [error] /home/jenkins/workspace/NewSparkPullRequestBuilder/streaming/src/test/scala/org/apache/spark/streaming/BasicOperationsSuite.scala:221:10: Whitespace at end of line ```
[GitHub] spark pull request: [SPARK-10585][SQL] only copy data once when ge...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8747
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user NarineK commented on the pull request: https://github.com/apache/spark/pull/8920#issuecomment-145657747 @sun-rui Thanks! I've made the changes; please check them out.
[GitHub] spark pull request: [SPARK-10913][SPARKR] attach() function suppor...
GitHub user adrian555 opened a pull request: https://github.com/apache/spark/pull/8985 [SPARK-10913][SPARKR] attach() function support You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian555/spark attach_and_with Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8985.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8985 commit 319905f7d9db44ce326dee9ff2aa16f6e350ab50 Author: adrian555 Date: 2015-10-02T22:33:22Z attach() function support commit 0aa94a2f4b1fbab975d6061db111e01fb9ccce2c Author: adrian555 Date: 2015-10-03T05:46:07Z [SPARK-10913] [SPARKR] attach() function support
[GitHub] spark pull request: [SPARK-10913][SPARKR] attach() function suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8985#issuecomment-145661230 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-10669] [Docs] Link to each language's A...
GitHub user keypointt opened a pull request: https://github.com/apache/spark/pull/8977 [SPARK-10669] [Docs] Link to each language's API in codetabs in ML docs: spark.mllib In the Markdown docs for the spark.mllib Programming Guide, we have code examples with codetabs for each language. We should link to each language's API docs within the corresponding codetab, but we are inconsistent about this. For an example of what we want to do, see the "ChiSqSelector" section in https://github.com/apache/spark/blob/64743870f23bffb8d96dcc8a0181c1452782a151/docs/mllib-feature-extraction.md This JIRA is just for spark.mllib, not spark.ml. Please let me know if more work is needed, thanks a lot. You can merge this pull request into a Git repository by running: $ git pull https://github.com/keypointt/spark SPARK-10669 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8977.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8977 commit f8289891d5b32fffdc6a4ce077d8d206e015119f Author: Xin Ren Date: 2015-10-02T07:00:36Z [SPARK-10669] test modify commit 67c67158c93a4a8c5a963ecf821f5e85e1228bf3 Author: Xin Ren Date: 2015-10-04T16:54:35Z [SPARK-10669] Link to each language API in codetabs in spark mllib docs commit 31960f6790a75fd037ffd879a8d17e546c5fa6fa Author: Xin Ren Date: 2015-10-04T16:59:56Z [SPARK-10669] minor correction commit 82528d238a52a07c9e47c86050abb471775d2b20 Author: Xin Ren Date: 2015-10-05T05:37:45Z [SPARK-10669] re-commit, Link to each language API in codetabs in spark mllib docs commit ce92c03a3ea688e35560c7411fecf56971138c2b Author: Xin Ren Date: 2015-10-05T05:48:10Z [SPARK-10669] add up API links commit 5fa2ef77aa1d04c4d2e210062bafd9bca0b48bd9 Author: Xin Ren Date: 2015-10-05T06:27:39Z [SPARK-10669] undo previous wrong changes commit c75d79da41edae8c08b838bbe44a96fb88bc5f71 Author: Xin Ren Date: 2015-10-05T06:32:02Z [SPARK-10669] undo previous wrong changes for mllib-ensembles.md commit 5beb6ddc5eb6e90c531e326335a797205eb6a505 Author: Xin Ren Date: 2015-10-05T06:34:06Z [SPARK-10669] undo previous wrong changes for mllib-frequent-pattern-mining.md
[GitHub] spark pull request: [SPARK-10669] [Docs] Link to each language's A...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8977#issuecomment-145446371 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-10856][SQL] Mapping TimestampType to DA...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/8978 [SPARK-10856][SQL] Mapping TimestampType to DATETIME for SQL Server jdbc dialect JIRA: https://issues.apache.org/jira/browse/SPARK-10856 For Microsoft SQL Server, TimestampType should be mapped to DATETIME instead of TIMESTAMP. Related information for the datatype mapping: https://msdn.microsoft.com/en-us/library/ms378878(v=sql.110).aspx You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 mysql-jdbc-timestamp Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8978.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8978 commit 4e82ef152dca875b4e1523052ba130a6f6853d75 Author: Liang-Chi Hsieh Date: 2015-10-05T07:41:47Z For Microsoft SQL Server, TimestampType should be mapped to DATETIME instead of TIMESTAMP.
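The idea behind this change (a per-database dialect that overrides the column type emitted for timestamps) can be modelled in a few lines of plain Scala. The sketch below is a simplified assumption, not the patch itself: `ColumnType`, `TargetType`, and `SqlServerMappingSketch` are hypothetical stand-ins, and only `java.sql.Types` is a real API.

```scala
import java.sql.Types

// Hypothetical, simplified stand-ins for illustration; not Spark's classes.
sealed trait ColumnType
case object TimestampCol extends ColumnType
case object StringCol extends ColumnType

// The DDL type name to emit plus the java.sql.Types code used for nulls.
final case class TargetType(definition: String, jdbcNullType: Int)

object SqlServerMappingSketch {
  // A dialect applies only to connection URLs for its own database.
  def canHandle(url: String): Boolean = url.startsWith("jdbc:sqlserver")

  // Override only the types that need special handling; None means
  // "fall back to the default mapping".
  def targetType(columnType: ColumnType): Option[TargetType] = columnType match {
    case TimestampCol => Some(TargetType("DATETIME", Types.TIMESTAMP))
    case _            => None
  }
}
```

Returning `None` for everything else keeps the override narrow, so only the timestamp mapping differs from the default behaviour.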
[GitHub] spark pull request: [SPARK-10856][SQL] Mapping TimestampType to DA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8978#issuecomment-145454237 Merged build triggered.
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/8920#discussion_r41112866 --- Diff: R/pkg/R/DataFrame.R --- @@ -1304,24 +1306,62 @@ setClassUnion("characterOrColumn", c("character", "Column")) #' path <- "path/to/file.json" #' df <- jsonFile(sqlContext, path) #' arrange(df, df$col1) -#' arrange(df, "col1") #' arrange(df, asc(df$col1), desc(abs(df$col2))) +#' arrange(df, "col1") +#' arrange(df, "col2", FALSE) --- End diff -- remove this line
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user sun-rui commented on a diff in the pull request: https://github.com/apache/spark/pull/8920#discussion_r41112848 --- Diff: R/pkg/R/DataFrame.R --- @@ -1304,24 +1306,62 @@ setClassUnion("characterOrColumn", c("character", "Column")) #' path <- "path/to/file.json" #' df <- jsonFile(sqlContext, path) #' arrange(df, df$col1) -#' arrange(df, "col1") #' arrange(df, asc(df$col1), desc(abs(df$col2))) +#' arrange(df, "col1") +#' arrange(df, "col2", FALSE) +#' arrange(df, "col1", decreasing=TRUE) +#' arrange(df, "col1", "col2", c(TRUE, FALSE)) #' } setMethod("arrange", - signature(x = "DataFrame", col = "characterOrColumn"), + signature(x = "DataFrame", col="Column"), function(x, col, ...) { -if (class(col) == "character") { - sdf <- callJMethod(x@sdf, "sort", col, list(...)) -} else if (class(col) == "Column") { jcols <- lapply(list(col, ...), function(c) { c@jc }) - sdf <- callJMethod(x@sdf, "sort", jcols) -} + +sdf <- callJMethod(x@sdf, "sort", jcols) dataFrame(sdf) }) #' @rdname arrange +#' @export +setMethod("arrange", + signature(x = "DataFrame", col="character"), + function(x, col, ..., decreasing=FALSE) { + +# all sorting columns +by <- list(col, ...) + +# extracting the last element and uses it as decreasing if it is boolean --- End diff -- Remove this block of code. The sorting direction must be specified by "decreasing = ". There is no need to check whether the last element in "..." is the sorting direction.
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8920#discussion_r41112869

    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1304,24 +1306,62 @@ setClassUnion("characterOrColumn", c("character", "Column"))
     #' path <- "path/to/file.json"
     #' df <- jsonFile(sqlContext, path)
     #' arrange(df, df$col1)
    -#' arrange(df, "col1")
     #' arrange(df, asc(df$col1), desc(abs(df$col2)))
    +#' arrange(df, "col1")
    +#' arrange(df, "col2", FALSE)
    +#' arrange(df, "col1", decreasing=TRUE)
    --- End diff --

    Coding style: put spaces around "=", i.e. decreasing = TRUE
[GitHub] spark pull request: [SPARK-10836] [SparkR] Added sort(x, decreasin...
Github user sun-rui commented on a diff in the pull request:

    https://github.com/apache/spark/pull/8920#discussion_r41112888

    --- Diff: R/pkg/R/DataFrame.R ---
    @@ -1304,24 +1306,62 @@ setClassUnion("characterOrColumn", c("character", "Column"))
     #' path <- "path/to/file.json"
     #' df <- jsonFile(sqlContext, path)
     #' arrange(df, df$col1)
    -#' arrange(df, "col1")
     #' arrange(df, asc(df$col1), desc(abs(df$col2)))
    +#' arrange(df, "col1")
    +#' arrange(df, "col2", FALSE)
    +#' arrange(df, "col1", decreasing=TRUE)
    +#' arrange(df, "col1", "col2", c(TRUE, FALSE))
     #' }
     setMethod("arrange",
    -          signature(x = "DataFrame", col = "characterOrColumn"),
    +          signature(x = "DataFrame", col="Column"),
    --- End diff --

    Coding style: put spaces around "=", i.e. col = "Column"