[GitHub] spark pull request: [SPARK-8477][sql][pyspark] Add in operator to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6908#issuecomment-113626980 [Test build #35314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35314/consoleFull) for PR 6908 at commit [`be795e0`](https://github.com/apache/spark/commit/be795e0c4112b5e30e3387e6d1fc98b7df26c81f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113629316 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113629287 [Test build #35309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35309/console) for PR 5748 at commit [`fa04313`](https://github.com/apache/spark/commit/fa043131902fd5633a2ecaf5651b3414bd728669).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` // class ParentClass(parentField: Int)`
  * ` // class ChildClass(childField: Int) extends ParentClass(1)`
  * ` // If the class type corresponding to current slot has writeObject() defined,`
  * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
  * `case class Md5(child: Expression)`
[GitHub] spark pull request: [HotFIX] Fix scala style in DFSReadWriteTest t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6907#issuecomment-113632888 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113634569 [Test build #35320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35320/consoleFull) for PR 6888 at commit [`bdef29c`](https://github.com/apache/spark/commit/bdef29c4327245e33e3a6f8b6e9402dbc2ac9e4d).
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113637840 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113637801 [Test build #35307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35307/console) for PR 6876 at commit [`a0626ed`](https://github.com/apache/spark/commit/a0626edbf758c89a45a8c85285057e79ec6a2bce).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113643422 Merged build triggered.
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113643472 Merged build started.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32872490 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll { val received = rdd.map(_._2).collect.toSet assert(received === messages) + +// size-related method optimizations return sane results +assert(rdd.count === messages.size) +assert(rdd.countApprox(0).getFinalValue.mean === messages.size) +assert(! rdd.isEmpty) +assert(rdd.take(1).size === 1) +assert(messages(rdd.take(1).head._2)) --- End diff -- It's asserting that the item taken from the RDD is a member of the set of messages sent. On Fri, Jun 19, 2015 at 4:07 PM, Tathagata Das notificati...@github.com wrote: In external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala https://github.com/apache/spark/pull/6632#discussion_r32869380: What does this check? Shouldn't it check that `rdd.take(1) === the // whatever is expected` Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/6632/files#r32869380.
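The membership assertion discussed above relies on Scala's `Set` being a `Function1`: applying a set to an element is shorthand for `contains`. A minimal standalone sketch (hypothetical message values, not taken from the Spark test suite):

```scala
object SetMembershipSketch {
  def main(args: Array[String]): Unit = {
    // In Scala, Set[A] extends (A => Boolean), so set(x) means set.contains(x).
    val messages = Set("msg-1", "msg-2", "msg-3")

    // Analogous to assert(messages(rdd.take(1).head._2)) in KafkaRDDSuite:
    // take one element and check it is among the messages that were sent.
    val taken = Seq("msg-2").head
    assert(messages(taken)) // same as messages.contains(taken)
    assert(!messages("msg-9"))
    println("membership checks passed")
  }
}
```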
[GitHub] spark pull request: [SPARK-8498] Add regression test for SPARK-847...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113654119 Merged build triggered.
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113656046 Merged build finished. Test FAILed.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113656013 Merged build triggered.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113656030 Merged build started.
[GitHub] spark pull request: [SPARK-8376][Docs]Add common lang3 to the Spar...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6829#issuecomment-113656100 I see. So Kafka is present only through the flume-kafka-source http://mvnrepository.com/artifact/org.apache.flume.flume-ng-sources/flume-kafka-source/1.6.0 Furthermore, this is not available for Flume 1.4.0, as the Kafka source was added only in 1.6.0. So here are two questions: 1. Do installations of Flume always have all the sources loaded? If not, then it's an incorrect assumption that Scala will always be present. 2. Even if 1 is true, we have to upgrade Flume in Spark Streaming to version 1.6.0 for this to be feasible. That's a whole different issue. I don't know enough about Flume, but I will be very surprised if the Kafka source is always loaded in the classpath in all Flume installations. @harishreedharan please comment. On Fri, Jun 19, 2015 at 2:50 PM, Sean Owen notificati...@github.com wrote: That looks like just the API module. I suspect it comes via the actual implementation such as in http://mvnrepository.com/artifact/org.apache.flume/flume-ng-sources/1.6.0 but I don't know Flume well. Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/6829#issuecomment-113653289.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113661246 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113661226 [Test build #35319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35319/console) for PR 6888 at commit [`1f09adf`](https://github.com/apache/spark/commit/1f09adf7622590becf096ca798066bec3ad03f50).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5482][PySpark] Allow individual test su...
Github user potix2 commented on the pull request: https://github.com/apache/spark/pull/4269#issuecomment-113661417 Sorry for the confusion; I agree with you. As a first step, we should rewrite run-tests in Python, then add new features. I took a look at #6866; I think it has some useful functions for rewriting the bash code in Python. If you don't mind, I want to wait for it to be merged.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113673700 [Test build #941 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/941/consoleFull) for PR 6759 at commit [`8e2d56f`](https://github.com/apache/spark/commit/8e2d56fffc0560f0e9b915a705d92d70ae4676e9).
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113675079 Thanks! I am merging it to master and branch 1.4.
[GitHub] spark pull request: SPARK-4644 blockjoin
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6883#issuecomment-113629566 Merged build started.
[GitHub] spark pull request: [SPARK-8468][ML] Take the negative of some met...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6905#issuecomment-113629487 [Test build #35311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35311/console) for PR 6905 at commit [`16e3b2c`](https://github.com/apache/spark/commit/16e3b2cbe4f0027a66e0cc68622b53ae503c2a37).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8468][ML] Take the negative of some met...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6905#issuecomment-113629511 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113632909 [Test build #35319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35319/consoleFull) for PR 6888 at commit [`1f09adf`](https://github.com/apache/spark/commit/1f09adf7622590becf096ca798066bec3ad03f50).
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113642553 Merged build triggered.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113642670 [Test build #35321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35321/consoleFull) for PR 6759 at commit [`4891efb`](https://github.com/apache/spark/commit/4891efbb6b5f277082c06ea56400c83bc4678f35).
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113642575 Merged build started.
[GitHub] spark pull request: [SPARK-8093] [SQL] Remove empty structs inferr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6799#issuecomment-113645291 [Test build #940 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/940/consoleFull) for PR 6799 at commit [`76ac3e8`](https://github.com/apache/spark/commit/76ac3e865d2354ec85417149dea87b83d90ec261).
[GitHub] spark pull request: [SPARK-8359][SQL] Fix incorrect decimal precis...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6814#discussion_r32869361 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/decimal/DecimalSuite.scala --- @@ -162,4 +162,9 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { assert(new Decimal().set(100L, 10, 0).toUnscaledLong === 100L) assert(Decimal(Long.MaxValue, 100, 0).toUnscaledLong === Long.MaxValue) } + + test("accurate precision after multiplication") { +val decimal = (Decimal(Long.MaxValue, 100, 0) * Decimal(Long.MaxValue, 100, 0)).toJavaBigDecimal --- End diff -- We can use 38 in this test case
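The suggestion to use 38 lines up with the arithmetic in the test: `Long.MaxValue` has 19 decimal digits, so its square needs at most 38, which is also the maximum precision Spark SQL's decimal type supports. A sketch of that expected precision using plain `java.math.BigDecimal` (not Spark's `Decimal` class):

```scala
object DecimalPrecisionSketch {
  def main(args: Array[String]): Unit = {
    // Long.MaxValue = 9223372036854775807 has 19 decimal digits.
    val max = java.math.BigDecimal.valueOf(Long.MaxValue)
    assert(max.precision == 19)

    // The product of two 19-digit numbers has at most 38 digits,
    // which is why 38 is a sensible expected precision for this test.
    val product = max.multiply(max)
    assert(product.precision == 38)
    println(product.precision)
  }
}
```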
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32869380 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll { val received = rdd.map(_._2).collect.toSet assert(received === messages) + +// size-related method optimizations return sane results +assert(rdd.count === messages.size) +assert(rdd.countApprox(0).getFinalValue.mean === messages.size) +assert(! rdd.isEmpty) +assert(rdd.take(1).size === 1) +assert(messages(rdd.take(1).head._2)) --- End diff -- What does this check? Shouldn't it check that `rdd.take(1) === the // whatever is expected`
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32869403 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll { val received = rdd.map(_._2).collect.toSet assert(received === messages) + +// size-related method optimizations return sane results +assert(rdd.count === messages.size) +assert(rdd.countApprox(0).getFinalValue.mean === messages.size) +assert(! rdd.isEmpty) --- End diff -- There is no check whether isEmpty is successful.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32871422 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala --- @@ -360,7 +367,7 @@ private[parquet] class MutableRowWriteSupport extends RowWriteSupport { case FloatType => writer.addFloat(record.getFloat(index)) case BooleanType => writer.addBoolean(record.getBoolean(index)) case DateType => writer.addInteger(record.getInt(index)) - case TimestampType => writeTimestamp(record(index).asInstanceOf[Long]) + case TimestampType => writeTimestamp(record.getLong(index)) --- End diff -- Nice catch.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32871383 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala --- @@ -313,10 +314,16 @@ private[parquet] class RowWriteSupport extends WriteSupport[InternalRow] with Lo writer.addBinary(Binary.fromByteArray(scratchBytes, 0, numBytes)) } + // array used to write Timestamp as Int96 (fixed-length binary) + private val int96buf = new Array[Byte](12) + private[parquet] def writeTimestamp(ts: Long): Unit = { -val binaryNanoTime = CatalystTimestampConverter.convertFromTimestamp( - DateUtils.toJavaTimestamp(ts)) -writer.addBinary(binaryNanoTime) +val (julianDay, timeOfDayNanos) = DateTimeUtils.toJulianDay(ts) +val buf = ByteBuffer.wrap(int96buf) --- End diff -- Actually, do you know if there are any static methods that we could call that would just put the longs and ints directly into the byte array at given offsets?
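One answer to the question above: `java.nio.ByteBuffer` offers absolute (index-based) `putLong`/`putInt` overloads that write directly into the backing array at a given offset without advancing the buffer's position. A minimal sketch (hypothetical helper, assuming the Int96 layout of 8 bytes nanos-of-day followed by a 4-byte Julian day, little-endian):

```scala
import java.nio.{ByteBuffer, ByteOrder}

object Int96Sketch {
  // Pack a timestamp into a reusable 12-byte buffer using absolute-offset
  // puts: putLong(index, v) and putInt(index, v) write at fixed offsets
  // and do not mutate the buffer position.
  def pack(dest: Array[Byte], julianDay: Int, timeOfDayNanos: Long): Array[Byte] = {
    val bb = ByteBuffer.wrap(dest).order(ByteOrder.LITTLE_ENDIAN)
    bb.putLong(0, timeOfDayNanos)
    bb.putInt(8, julianDay)
    dest
  }

  def main(args: Array[String]): Unit = {
    val buf = new Array[Byte](12)
    pack(buf, 2440588, 123456789L)
    val bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN)
    assert(bb.getLong(0) == 123456789L)
    assert(bb.getInt(8) == 2440588)
    println("int96 packed")
  }
}
```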
[GitHub] spark pull request: [SPARKR][SPARK-8452] expose jobGroup API in Sp...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6889#issuecomment-113650123 LGTM, waiting for tests.
[GitHub] spark pull request: [SPARK-8477][sql][pyspark] Add in operator to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6908#issuecomment-113651550 [Test build #35314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35314/console) for PR 6908 at commit [`be795e0`](https://github.com/apache/spark/commit/be795e0c4112b5e30e3387e6d1fc98b7df26c81f).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARKR][SPARK-8452] expose jobGroup API in Sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6889#issuecomment-113656892 [Test build #35316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35316/console) for PR 6889 at commit [`9ce9f1e`](https://github.com/apache/spark/commit/9ce9f1ea0fd19209fd543a0650a20b46901d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-8398 hadoop input/output format advanced...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6848#issuecomment-113657049 [Test build #35330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35330/consoleFull) for PR 6848 at commit [`df2c2ae`](https://github.com/apache/spark/commit/df2c2ae2fe88c4532dd680290d7d91e43a8b4f9b).
[GitHub] spark pull request: [SPARKR][SPARK-8452] expose jobGroup API in Sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6889#issuecomment-113656998 Merged build finished. Test PASSed.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113657139 [Test build #35331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35331/consoleFull) for PR 6632 at commit [`321340d`](https://github.com/apache/spark/commit/321340d6e88bd424d62c1417d2f2a2111e7ac986).
[GitHub] spark pull request: [SPARK-8483][Streaming] Remove commons-lang3 d...
GitHub user harishreedharan opened a pull request: https://github.com/apache/spark/pull/6910 [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0 You can merge this pull request into a Git repository by running: $ git pull https://github.com/harishreedharan/spark remove-commons-lang3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6910.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6910 commit ca35eb085a71a44e8e7e36d0e6a96b951727f0a1 Author: Hari Shreedharan hshreedha...@apache.org Date: 2015-06-19T22:42:40Z [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
[GitHub] spark pull request: [SPARKR][SPARK-8452] expose jobGroup API in Sp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6889
[GitHub] spark pull request: [SPARK-8483][Streaming] Remove commons-lang3 d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6910#discussion_r32876062 --- Diff: external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkAvroCallbackHandler.scala --- @@ -53,7 +53,7 @@ private[flume] class SparkAvroCallbackHandler(val threads: Int, val channel: Cha // Since the new txn may not have the same sequence number we must guard against accidentally // committing a new transaction. To reduce the probability of that happening a random string is // prepended to the sequence number. Does not change for life of sink - private val seqBase = RandomStringUtils.randomAlphanumeric(8) + private val seqBase = UUID.randomUUID().toString.substring(0, 8) --- End diff -- Why not just use the Scala random string functionality instead?
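For reference, the Scala standard-library route the reviewer alludes to is `scala.util.Random.alphanumeric.take(8).mkString`. The same idea as a sketch in Python (the function name is ours, not the PR's):

```python
import random
import string

def random_alphanumeric(n: int) -> str:
    # Draw n characters uniformly from [A-Za-z0-9], analogous to
    # scala.util.Random.alphanumeric.take(n).mkString.
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choice(alphabet) for _ in range(n))
```

One trade-off worth noting: a UUID substring draws only from the 16 hex digits (plus dashes), so an 8-character prefix carries noticeably less entropy than 8 characters drawn from the full 62-symbol alphanumeric alphabet.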
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113674331 [Test build #35328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35328/console) for PR 6909 at commit [`5e9d688`](https://github.com/apache/spark/commit/5e9d68840ecd2441f1accca00c125d31fb1dbde9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` // class ParentClass(parentField: Int)` * ` // class ChildClass(childField: Int) extends ParentClass(1)` * ` // If the class type corresponding to current slot has writeObject() defined,` * ` // then its not obvious which fields of the class will be serialized as the writeObject()` * `class StreamingKMeansModel(KMeansModel):` * `class StreamingKMeans(object):` * `abstract class GeneratedClass ` * `case class Md5(child: Expression)`
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113674309 [Test build #35327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35327/console) for PR 6909 at commit [`7ede573`](https://github.com/apache/spark/commit/7ede57317ff331b72d0de2449ceaf81defdd4ce6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` // class ParentClass(parentField: Int)` * ` // class ChildClass(childField: Int) extends ParentClass(1)` * ` // If the class type corresponding to current slot has writeObject() defined,` * ` // then its not obvious which fields of the class will be serialized as the writeObject()` * `class StreamingKMeansModel(KMeansModel):` * `class StreamingKMeans(object):` * `abstract class GeneratedClass ` * `case class Bin(child: Expression)` * `case class Md5(child: Expression)`
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113674327 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-6749] [SQL] Make metastore client robus...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6912#issuecomment-113677840 Merged build started.
[GitHub] spark pull request: [SPARK-6749] [SQL] Make metastore client robus...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6912#issuecomment-113677726 Merged build triggered.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113676707 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6888
[GitHub] spark pull request: [SPARK-8477][sql][pyspark] Add in operator to ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6908#discussion_r32864207 --- Diff: python/pyspark/sql/column.py --- @@ -326,6 +326,27 @@ def between(self, lowerBound, upperBound): return (self >= lowerBound) & (self <= upperBound) +@since(1.5) +def In(self, *values): + +A boolean expression that is evaluated to true if the value of this +expression is any of the given columns. +NOTE: Normally, we should name this function the small case `in`. However, `in` is +a reserved word in Python. So we can't help naming this the upper case `In`. + + >>> df.select(df.name, df.age, df.age.In(2, 4)).show() +-----+---+---------+ | name|age|(age = 2)| +-----+---+---------+ |Alice| 2| true| | Bob| 5|false| +-----+---+---------+ + +for v in values: --- End diff -- This approach will not scale if you have many values; please call the Java API `in`.
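The scaling concern can be made concrete with a toy expression model (not Spark's actual classes): folding `(col == v1) | (col == v2) | ...` produces one `Or` node per value, so the predicate tree's depth grows linearly with the value list, while a single in-list node stays flat however many values it holds.

```python
class Col:
    """Toy column expression; == and | build a nested tuple tree."""
    def __init__(self, node):
        self.node = node
    def __eq__(self, other):
        return Col(("eq", self.node, other))
    def __or__(self, other):
        return Col(("or", self.node, other.node))

def depth(node):
    # Height of the expression tree; leaves (names, literals) count as 0.
    if isinstance(node, tuple):
        return 1 + max(depth(child) for child in node[1:])
    return 0

def chained_in(col, values):
    # The approach the review flags: one extra Or node per value.
    pred = col == values[0]
    for v in values[1:]:
        pred = pred | (col == v)
    return pred

def flat_in(col, values):
    # A single node carrying the whole list, as a JVM-side `in` would build.
    return Col(("in", col.node, tuple(values)))
```

With 50 values the chained form is 50 levels deep while the flat form stays at depth 2; deep trees are more expensive to analyze and optimize, which is one reason to delegate to the JVM-side `in` rather than build the OR chain in Python.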
[GitHub] spark pull request: [SPARK-4176] Support decimal types with precis...
Github user rtreffer commented on the pull request: https://github.com/apache/spark/pull/6796#issuecomment-113629961 I've pushed the hive generated parquet file and I'll call it a day. I think I'll have to relax the validation of column types for DECIMAL.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113632149 @yhuai updated to avoid changing equality behavior.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113635239 lgtm
[GitHub] spark pull request: [SPARK-8368] [SPARK-8058] [SQL] HiveContext ma...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6895#issuecomment-113638271 Closing it.
[GitHub] spark pull request: [SPARK-8368] [SPARK-8058] [SQL] HiveContext ma...
Github user yhuai closed the pull request at: https://github.com/apache/spark/pull/6895
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6901#discussion_r32867731 --- Diff: docs/programming-guide.md --- @@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s to disk, incurring the additional overhead of disk I/O and increased garbage collection. Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files --- End diff -- Oh! I thought you meant it as the latter ... as of the latest version. This is a little confusing. :/ Maybe it makes sense to remove it completely. The GC-based behavior has been present for 4 versions now, since Spark 1.0, and it's not going to change in the foreseeable future. So it's best to remove it. The only thing that may change in Spark 1.5 is that we induce GC periodically ourselves.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113645504 Just a couple of more comments on the tests.
[GitHub] spark pull request: [SPARK-5037][STREAMING] dynamically loaded DSt...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3858#issuecomment-113648094 That gives me some context. We are adding more stuff to the Python API for parity with Scala and Java. We have added a full Kafka Python API, Flume is being added, and hopefully Kinesis will also be added. The technique by which they have been added is similar to your approach at a high level, but not quite the same. Take a look at the Python KafkaUtils in the current code. Since we want to maintain consistency in design, I am happy to take a look if you can update the current PR to use the existing style. On Fri, Jun 19, 2015 at 1:22 PM, industrial-sloth notificati...@github.com wrote: Sure thing @tdas https://github.com/tdas. First a caveat: I haven't been keeping up with the spark community since ~March 2015, so the issues I originally hit might no longer exist w/ more recent spark releases. As of December 2014 we were exploring streaming options for real time analysis in Thunder (https://github.com/thunder-project). Thunder is pyspark based; at that time our pyspark dstream options, as I recall, were basically either file-based (watch a directory for new files) or to integrate with Kafka. Specifically there was no option to listen to a ZeroMQ stream or to many of the other dstream types available in the scala API. We wanted to be pushing a high-bandwidth stream of microscope images over to pyspark for further analysis. ZeroMQ seemed ideal; Kafka seemed like too much and file-based seemed to necessitate an additional unnecessary disk IO. So I put together a ZMQ solution for pyspark streaming and threw it out there in this PR. Again, haven't been keeping up, not sure whether this is still a concern w/ current releases of pyspark. I agree this is potentially an unusual use case - our workaround at the time was to go to the file-based dstream implementation, which was functional but perhaps not optimal.
Any further comment on this @freeman-lab https://github.com/freeman-lab or @andrewosh https://github.com/andrewosh? — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3858#issuecomment-113631381.
[GitHub] spark pull request: [SPARK-8498] Add regression test for SPARK-847...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113652053 @marmbrus @yhuai
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32872239 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala --- @@ -21,19 +21,31 @@ import java.sql.Timestamp import org.apache.spark.SparkFunSuite -class DateUtilsSuite extends SparkFunSuite { +class DateTimeUtilsSuite extends SparkFunSuite { - test("timestamp") { + test("timestamp and 100ns") { val now = new Timestamp(System.currentTimeMillis()) now.setNanos(100) -val ns = DateUtils.fromJavaTimestamp(now) +val ns = DateTimeUtils.fromJavaTimestamp(now) assert(ns % 1000L == 1) -assert(DateUtils.toJavaTimestamp(ns) == now) +assert(DateTimeUtils.toJavaTimestamp(ns) == now) List(-L, -1L, 0, 1L, L).foreach { t => - val ts = DateUtils.toJavaTimestamp(t) - assert(DateUtils.fromJavaTimestamp(ts) == t) - assert(DateUtils.toJavaTimestamp(DateUtils.fromJavaTimestamp(ts)) == ts) + val ts = DateTimeUtils.toJavaTimestamp(t) + assert(DateTimeUtils.fromJavaTimestamp(ts) == t) + assert(DateTimeUtils.toJavaTimestamp(DateTimeUtils.fromJavaTimestamp(ts)) == ts) } } + + test("100ns and julian day") { --- End diff -- Are there any other inputs that are worth testing here? It wouldn't be super hard to fuzz this using the invariant that some of these methods should be inverses.
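The fuzzing idea is cheap to sketch. Since midnight 1970-01-01 UTC falls at Julian day 2440587.5 (Julian days start at noon), a conversion to (day, nanos-of-day) and back should be an exact inverse for any input. Here is a simplified reimplementation working in microseconds — a sketch consistent with the `toJulianDay(0)` giving day 2440587 expectation in the test above, not Spark's actual code:

```python
import random

MICROS_PER_DAY = 86400 * 1000 * 1000
EPOCH_JULIAN_DAY = 2440588  # midnight 1970-01-01 UTC is Julian day 2440587.5

def to_julian_day(us):
    # Shift the epoch so t=0 lands half a day into Julian day 2440587.
    julian_us = us + EPOCH_JULIAN_DAY * MICROS_PER_DAY - MICROS_PER_DAY // 2
    return julian_us // MICROS_PER_DAY, (julian_us % MICROS_PER_DAY) * 1000

def from_julian_day(day, nanos):
    # Exact inverse of to_julian_day (nanos is a whole number of microseconds).
    return (day - EPOCH_JULIAN_DAY) * MICROS_PER_DAY + MICROS_PER_DAY // 2 + nanos // 1000

# Fuzz: the pair must round-trip for arbitrary timestamps.
for us in [random.randrange(-2**60, 2**60) for _ in range(1000)]:
    assert from_julian_day(*to_julian_day(us)) == us
```

The same property-style loop translates directly into a ScalaTest case over randomly generated longs, which is the stronger test being suggested.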
[GitHub] spark pull request: [SPARK-8376][Docs]Add common lang3 to the Spar...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6829#issuecomment-113652186 I don't see a dependency on Kafka in Flume 1.4.0 http://mvnrepository.com/artifact/org.apache.flume/flume-ng-sdk/1.4.0 What am I missing? On Fri, Jun 19, 2015 at 2:21 PM, Hari Shreedharan notificati...@github.com wrote: Kafka brings in 2.10. — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/6829#issuecomment-113648154.
[GitHub] spark pull request: [SPARK-8498] Add regression test for SPARK-847...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113652204 Merged build started.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32872739 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll { val received = rdd.map(_._2).collect.toSet assert(received === messages) + +// size-related method optimizations return sane results +assert(rdd.count === messages.size) +assert(rdd.countApprox(0).getFinalValue.mean === messages.size) +assert(!rdd.isEmpty) +assert(rdd.take(1).size === 1) +assert(messages(rdd.take(1).head._2)) --- End diff -- Shouldn't the test be stronger, i.e. check that it returns the expected message from the right offset and not just any of the messages? Basically, if there is a bug in the code where take(1) returns the last message in the offset range rather than the first message, it won't be caught.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32872749 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala --- @@ -21,19 +21,31 @@ import java.sql.Timestamp import org.apache.spark.SparkFunSuite -class DateUtilsSuite extends SparkFunSuite { +class DateTimeUtilsSuite extends SparkFunSuite { - test("timestamp") { + test("timestamp and 100ns") { val now = new Timestamp(System.currentTimeMillis()) now.setNanos(100) -val ns = DateUtils.fromJavaTimestamp(now) +val ns = DateTimeUtils.fromJavaTimestamp(now) assert(ns % 1000L == 1) -assert(DateUtils.toJavaTimestamp(ns) == now) +assert(DateTimeUtils.toJavaTimestamp(ns) == now) List(-L, -1L, 0, 1L, L).foreach { t => - val ts = DateUtils.toJavaTimestamp(t) - assert(DateUtils.fromJavaTimestamp(ts) == t) - assert(DateUtils.toJavaTimestamp(DateUtils.fromJavaTimestamp(ts)) == ts) + val ts = DateTimeUtils.toJavaTimestamp(t) + assert(DateTimeUtils.fromJavaTimestamp(ts) == t) + assert(DateTimeUtils.toJavaTimestamp(DateTimeUtils.fromJavaTimestamp(ts)) == ts) } } + + test("100ns and julian day") { +val (d, ns) = DateTimeUtils.toJulianDay(0) +assert(d == 2440587) --- End diff -- Could use `===` here.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113656191 [Test build #35329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35329/consoleFull) for PR 6632 at commit [`5a05d0f`](https://github.com/apache/spark/commit/5a05d0f633b66ffe42f8e7bb8f4e09308d79fa29).
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6707#issuecomment-113656399 Thank you very much for this patch. This was a very important one, especially the tests. On Thu, Jun 18, 2015 at 8:02 PM, asfgit notificati...@github.com wrote: Closed #6707 https://github.com/apache/spark/pull/6707 via 3eaed87 https://github.com/apache/spark/commit/3eaed8769c16e887edb9d54f5816b4ee6da23de5. — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/6707#event-334837731.
[GitHub] spark pull request: [SPARK-8483][Streaming] Remove commons-lang3 d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6910#issuecomment-113663947 Merged build started.
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113669127 Merged build started.
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113669074 Merged build triggered.
[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5717#issuecomment-113670190 Merged build triggered.
[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5717#issuecomment-113670244 [Test build #35337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35337/consoleFull) for PR 5717 at commit [`211e101`](https://github.com/apache/spark/commit/211e1012dc28ed610d294d0678b1d5621a901e53).
[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5717#issuecomment-113670198 Merged build started.
[GitHub] spark pull request: [SPARK-8492] [SQL] support binaryType in Unsaf...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6911#issuecomment-113675867 [Test build #35339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35339/consoleFull) for PR 6911 at commit [`447dea0`](https://github.com/apache/spark/commit/447dea051b13da73e0b84e3de72fd16e6d466765).
[GitHub] spark pull request: [SPARK-6749] [SQL] Make metastore client robus...
Github user ericl commented on the pull request: https://github.com/apache/spark/pull/6912#issuecomment-113676454 @yhuai
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113676455 [Test build #35329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35329/console) for PR 6632 at commit [`5a05d0f`](https://github.com/apache/spark/commit/5a05d0f633b66ffe42f8e7bb8f4e09308d79fa29).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` // class ParentClass(parentField: Int)`
  * ` // class ChildClass(childField: Int) extends ParentClass(1)`
  * ` // If the class type corresponding to current slot has writeObject() defined,`
  * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
  * `abstract class GeneratedClass `
  * `case class Bin(child: Expression)`
  * `case class Md5(child: Expression)`
[GitHub] spark pull request: [SPARK-6749] [SQL] Make metastore client robus...
GitHub user ericl opened a pull request: https://github.com/apache/spark/pull/6912 [SPARK-6749] [SQL] Make metastore client robust to underlying socket connection loss

This works around a bug in the underlying RetryingMetaStoreClient (HIVE-10384) by refreshing the metastore client on thrift exceptions. We attempt to emulate the proper hive behavior by retrying only as configured by hiveconf.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark spark-6749

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6912.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6912

commit 7c8ee1691e76545dab5cfe9101e3ecf290117818
Author: Eric Liang e...@databricks.com
Date: 2015-06-19T23:50:57Z

    Work around RetryingMetaStoreClient bug
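The idea in this PR description -- recreate the underlying client when a transport-level exception surfaces, bounded by a configured retry limit -- can be sketched generically. The names below (`TransportError`, `call_with_reconnect`, `make_client`) are hypothetical placeholders for illustration, not the actual Hive or Spark APIs:

```python
class TransportError(Exception):
    """Stand-in for a thrift-level socket/transport exception."""


def call_with_reconnect(make_client, method, *args, max_retries=3):
    """Invoke `method` on a client, rebuilding the client on transport errors.

    make_client: factory returning a fresh (re-connected) client.
    max_retries: plays the role of the hiveconf-configured retry limit
                 mentioned in the PR description.
    """
    client = make_client()
    last_err = None
    for _ in range(max_retries + 1):
        try:
            return getattr(client, method)(*args)
        except TransportError as err:
            last_err = err
            # Refresh the client, analogous to the patch refreshing the
            # metastore client on thrift exceptions.
            client = make_client()
    raise last_err
```

The key design point is that the retry loop owns the client reference, so a stale socket never escapes back to callers.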
[GitHub] spark pull request: SPARK-4644 blockjoin
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6883#issuecomment-113630110 [Test build #35315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35315/consoleFull) for PR 6883 at commit [`adef52e`](https://github.com/apache/spark/commit/adef52ed4c335980e73c61036abb2a2806965de3).
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-113630096 So the JIRA was actually about updating the examples. It's great that you have updated the docs AND the tests, but it would be ideal if the examples DirectKafkaWordCount and JavaDirectKafkaWordCount were updated to show how the offset ranges can be accessed. Since you have updated the tests, mind updating the examples as well?
[GitHub] spark pull request: [SPARK-8481] [MLlib] GaussianMixtureModel pred...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6906#issuecomment-113632802 [Test build #35318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35318/consoleFull) for PR 6906 at commit [`cb87180`](https://github.com/apache/spark/commit/cb87180516973caf772c95405a39b3f9bd627272).
[GitHub] spark pull request: [HotFIX] Fix scala style in DFSReadWriteTest t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6907#issuecomment-113632842 [Test build #35304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35304/console) for PR 6907 at commit [`c53f188`](https://github.com/apache/spark/commit/c53f1883409648723ec543a06b4e0efde0b5ba0e).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8186] [SPARK-8187] [SQL] datetime funct...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/6782#discussion_r32867754

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeFunctions.scala ---
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
+import org.apache.spark.sql.catalyst.util.DateUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'.
+ */
+case class DateAdd(startDate: Expression, days: Expression) extends Expression {
+  override def children: Seq[Expression] = startDate :: days :: Nil
+
+  override def foldable: Boolean = startDate.foldable && days.foldable
+  override def nullable: Boolean = startDate.nullable || days.nullable
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val supportedLeftType = Seq(StringType, DateType, TimestampType, NullType)
+    if (!supportedLeftType.contains(startDate.dataType)) {
+      TypeCheckResult.TypeCheckFailure(
+        s"type of startdate expression in DateAdd should be string/timestamp/date," +
+          s" not ${startDate.dataType}")
+    } else if (days.dataType != IntegerType && days.dataType != NullType) {
+      TypeCheckResult.TypeCheckFailure(
+        s"type of days expression in DateAdd should be int, not ${days.dataType}.")
+    } else {
+      TypeCheckResult.TypeCheckSuccess
+    }
+  }
+
+  override def dataType: DataType = StringType
--- End diff --

In general, though, we can cast back to a string whenever you need to. From an efficiency standpoint it seems much better to keep it a date. /cc @rxin
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113644879 [Test build #35310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35310/console) for PR 6632 at commit [`f68bd32`](https://github.com/apache/spark/commit/f68bd3266df27fc8238195ac443c3e2cdb37803a).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` // class ParentClass(parentField: Int)`
  * ` // class ChildClass(childField: Int) extends ParentClass(1)`
  * ` // If the class type corresponding to current slot has writeObject() defined,`
  * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
  * `abstract class GeneratedClass `
  * `case class Bin(child: Expression)`
  * `case class Md5(child: Expression)`
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32869331

--- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala ---
@@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll {
     val received = rdd.map(_._2).collect.toSet
     assert(received === messages)
+
+    // size-related method optimizations return sane results
+    assert(rdd.count === messages.size)
+    assert(rdd.countApprox(0).getFinalValue.mean === messages.size)
+    assert(! rdd.isEmpty)
--- End diff --

nit: extra space
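The size-related optimizations these tests exercise rest on the fact that a KafkaRDD knows its offset ranges up front, so `count` and `isEmpty` can be answered from metadata without fetching any messages. A minimal sketch of that idea, using plain `(fromOffset, untilOffset)` tuples as a simplified stand-in for the actual OffsetRange class:

```python
def rdd_count(offset_ranges):
    """Message count is fully determined by the per-partition offset ranges."""
    return sum(until - start for (start, until) in offset_ranges)


def rdd_is_empty(offset_ranges):
    """Empty iff every partition's range is empty -- no Kafka fetch needed."""
    return rdd_count(offset_ranges) == 0
```

This is why the test can assert `rdd.count === messages.size` cheaply: the answer is arithmetic over the ranges the RDD was constructed with, not a scan.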
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113644969 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8359][SQL] Fix incorrect decimal precis...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6814#discussion_r32869329

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -286,6 +288,9 @@ object Decimal {
   /** Maximum number of decimal digits a Long can represent */
   val MAX_LONG_DIGITS = 18

+  /** Maximum precision a Decimal can support */
+  val MAX_PRECISION = 38
--- End diff --

After a short discussion with @marmbrus: we aren't going to have a fixed maximum precision in the short term; we will still support higher, even unlimited, precision.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113647862 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113647849 [Test build #35321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35321/console) for PR 6759 at commit [`4891efb`](https://github.com/apache/spark/commit/4891efbb6b5f277082c06ea56400c83bc4678f35).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8265] [MLlib] [PySpark] Add LinearDataG...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6715#issuecomment-113648863 [Test build #35312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35312/console) for PR 6715 at commit [`6182884`](https://github.com/apache/spark/commit/618288411ff36fee254f4304acf7137018c01ec3).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` // class ParentClass(parentField: Int)`
  * ` // class ChildClass(childField: Int) extends ParentClass(1)`
  * ` // If the class type corresponding to current slot has writeObject() defined,`
  * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
  * `class StreamingKMeansModel(KMeansModel):`
  * `class StreamingKMeans(object):`
  * `class LinearDataGenerator(object):`
  * `abstract class GeneratedClass `
  * `case class Md5(child: Expression)`
[GitHub] spark pull request: [SPARK-8265] [MLlib] [PySpark] Add LinearDataG...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6715#issuecomment-113648895 Merged build finished. Test PASSed.
[GitHub] spark pull request: Expose regionName setting in Kinesis receiver ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/5375#issuecomment-113648887 This is not needed any more as Spark 1.4.0 has fixed this issue. Mind closing this PR?
[GitHub] spark pull request: [SPARK-8376][Docs]Add common lang3 to the Spar...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6829#issuecomment-113653289 That looks like just the API module. I suspect it comes via the actual implementation such as in http://mvnrepository.com/artifact/org.apache.flume/flume-ng-sources/1.6.0 but I don't know Flume well.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32872412

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala ---
@@ -313,10 +314,16 @@ private[parquet] class RowWriteSupport extends WriteSupport[InternalRow] with Lo
     writer.addBinary(Binary.fromByteArray(scratchBytes, 0, numBytes))
   }

+  // array used to write Timestamp as Int96 (fixed-length binary)
+  private val int96buf = new Array[Byte](12)
+
   private[parquet] def writeTimestamp(ts: Long): Unit = {
-    val binaryNanoTime = CatalystTimestampConverter.convertFromTimestamp(
-      DateUtils.toJavaTimestamp(ts))
-    writer.addBinary(binaryNanoTime)
+    val (julianDay, timeOfDayNanos) = DateTimeUtils.toJulianDay(ts)
+    val buf = ByteBuffer.wrap(int96buf)
--- End diff --

Yeah, let's just leave this as-is.
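For context on the diff above: Parquet's INT96 timestamp layout packs the nanoseconds-of-day into the first 8 bytes and the Julian day number into the last 4, both little-endian. A minimal sketch of packing a `(julianDay, timeOfDayNanos)` pair into such a 12-byte buffer, assuming that layout (this is an illustration, not the Spark code itself):

```python
import struct


def pack_int96_timestamp(julian_day, time_of_day_nanos):
    """Pack into Parquet's 12-byte INT96 layout:
    8 bytes nanos-of-day (little-endian int64), then 4 bytes Julian day."""
    return struct.pack("<qi", time_of_day_nanos, julian_day)


def unpack_int96_timestamp(buf):
    """Inverse of pack_int96_timestamp; returns (julian_day, time_of_day_nanos)."""
    nanos, jday = struct.unpack("<qi", buf)
    return jday, nanos
```

Reusing one preallocated 12-byte buffer per writer, as the patch does with `int96buf`, avoids allocating a fresh array for every row written.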
[GitHub] spark pull request: spark-7300 remove temporary directories after ...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5834#discussion_r32873174

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -251,6 +251,11 @@ case class InsertIntoHiveTable(
     }
   }

+       //remove temporary directories
+       val fs = outputPath.getFileSystem(jobConf)
+       if ( outputPath.getParent.isRoot == false )
+         fs.delete(outputPath.getParent,true)
--- End diff --

also, please unindent these by 3 spaces
[GitHub] spark pull request: [SPARK-8376][Docs]Add common lang3 to the Spar...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/6829#issuecomment-113659788 Yes, all of the libs in the flume-ng/lib directory get added to the classpath, so scala would get added to the classpath but loaded only as required (which is normal JVM protocol). We'd have to bump our dependency to 1.6.0 for scala to be automagically available. Even if we don't upgrade, we don't need to change the dependency set, as the behavior is the same as before (add scala to flume-ng/lib or the plugins dir). Apart from the assembly part, nothing else changes. I am sending a PR soon to get rid of the commons-lang3 dependency anyway.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113659586 [Test build #35332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35332/consoleFull) for PR 6759 at commit [`634b9f5`](https://github.com/apache/spark/commit/634b9f5540b8045b24c20c7296d8cd73193c).
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113661550 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113661524 [Test build #35320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35320/console) for PR 6888 at commit [`bdef29c`](https://github.com/apache/spark/commit/bdef29c4327245e33e3a6f8b6e9402dbc2ac9e4d).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113668048 retest this please
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113670534 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113670505 [Test build #35322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35322/console) for PR 6876 at commit [`32d9811`](https://github.com/apache/spark/commit/32d981137fd24d1e55c3a4c2c23bb19e494b4f65).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.