[GitHub] spark pull request: [SPARK-8477][sql][pyspark] Add in operator to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6908#issuecomment-113626980 [Test build #35314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35314/consoleFull) for PR 6908 at commit [`be795e0`](https://github.com/apache/spark/commit/be795e0c4112b5e30e3387e6d1fc98b7df26c81f). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113629316 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7045] [MLlib] Avoid intermediate repres...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5748#issuecomment-113629287 [Test build #35309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35309/console) for PR 5748 at commit [`fa04313`](https://github.com/apache/spark/commit/fa043131902fd5633a2ecaf5651b3414bd728669).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` // class ParentClass(parentField: Int)`
  * ` // class ChildClass(childField: Int) extends ParentClass(1)`
  * ` // If the class type corresponding to current slot has writeObject() defined,`
  * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
  * `case class Md5(child: Expression)`
[GitHub] spark pull request: [HotFIX] Fix scala style in DFSReadWriteTest t...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6907#issuecomment-113632888 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113634569 [Test build #35320 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35320/consoleFull) for PR 6888 at commit [`bdef29c`](https://github.com/apache/spark/commit/bdef29c4327245e33e3a6f8b6e9402dbc2ac9e4d).
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113637840 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113637801 [Test build #35307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35307/console) for PR 6876 at commit [`a0626ed`](https://github.com/apache/spark/commit/a0626edbf758c89a45a8c85285057e79ec6a2bce).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113643422 Merged build triggered.
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113643472 Merged build started.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32872490 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll { val received = rdd.map(_._2).collect.toSet assert(received === messages) + +// size-related method optimizations return sane results +assert(rdd.count === messages.size) +assert(rdd.countApprox(0).getFinalValue.mean === messages.size) +assert(! rdd.isEmpty) +assert(rdd.take(1).size === 1) +assert(messages(rdd.take(1).head._2)) --- End diff -- It's asserting that the item taken from the RDD is a member of the set of messages sent. On Fri, Jun 19, 2015 at 4:07 PM, Tathagata Das notificati...@github.com wrote: In external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala https://github.com/apache/spark/pull/6632#discussion_r32869380: What does this check? Shouldn't it check that `rdd.take(1) === the // whatever is expected` Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/6632/files#r32869380.
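The membership assertion discussed above relies on Scala's `Set` being a `Function1`: applying a set to an element is shorthand for `contains`. A minimal standalone sketch (hypothetical message values, not taken from the Spark test suite):

```scala
object SetMembershipSketch {
  def main(args: Array[String]): Unit = {
    // In Scala, Set[A] extends (A => Boolean), so set(x) means set.contains(x).
    val messages = Set("msg-1", "msg-2", "msg-3")

    // Analogous to assert(messages(rdd.take(1).head._2)) in KafkaRDDSuite:
    // take one element and check it is among the messages that were sent.
    val taken = Seq("msg-2").head
    assert(messages(taken)) // same as messages.contains(taken)
    assert(!messages("msg-9"))
    println("membership checks passed")
  }
}
```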
[GitHub] spark pull request: [SPARK-8498] Add regression test for SPARK-847...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113654119 Merged build triggered.
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113656046 Merged build finished. Test FAILed.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113656013 Merged build triggered.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113656030 Merged build started.
[GitHub] spark pull request: [SPARK-8376][Docs]Add common lang3 to the Spar...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6829#issuecomment-113656100 I see. So Kafka is present only through the flume-kafka-source http://mvnrepository.com/artifact/org.apache.flume.flume-ng-sources/flume-kafka-source/1.6.0 Furthermore, this is not available for Flume 1.4.0, as the Kafka source was added only in 1.6.0. So here are two questions: 1. Do installations of Flume always have all the sources loaded? If not, then it's an incorrect assumption that Scala will always be present. 2. Even if 1 is true, we have to upgrade Flume in Spark Streaming to version 1.6.0 for this to be feasible. That's a whole different issue. I don't know enough about Flume, but I will be very surprised if the Kafka source is always loaded in the classpath in all Flume installations. @harishreedharan please comment. On Fri, Jun 19, 2015 at 2:50 PM, Sean Owen notificati...@github.com wrote: That looks like just the API module. I suspect it comes via the actual implementation such as in http://mvnrepository.com/artifact/org.apache.flume/flume-ng-sources/1.6.0 but I don't know Flume well. Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/6829#issuecomment-113653289.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113661246 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113661226 [Test build #35319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35319/console) for PR 6888 at commit [`1f09adf`](https://github.com/apache/spark/commit/1f09adf7622590becf096ca798066bec3ad03f50).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-5482][PySpark] Allow individual test su...
Github user potix2 commented on the pull request: https://github.com/apache/spark/pull/4269#issuecomment-113661417 Sorry for the confusion; I agree with you. As a first step, we should rewrite run-tests in Python, then add new features. I took a look at #6866; I think it has some useful functions for rewriting the bash code in Python. If you don't mind, I want to wait for it to be merged.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113673700 [Test build #941 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/941/consoleFull) for PR 6759 at commit [`8e2d56f`](https://github.com/apache/spark/commit/8e2d56fffc0560f0e9b915a705d92d70ae4676e9).
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113675079 Thanks! I am merging it to master and branch 1.4.
[GitHub] spark pull request: SPARK-4644 blockjoin
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6883#issuecomment-113629566 Merged build started.
[GitHub] spark pull request: [SPARK-8468][ML] Take the negative of some met...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6905#issuecomment-113629487 [Test build #35311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35311/console) for PR 6905 at commit [`16e3b2c`](https://github.com/apache/spark/commit/16e3b2cbe4f0027a66e0cc68622b53ae503c2a37).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8468][ML] Take the negative of some met...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6905#issuecomment-113629511 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113632909 [Test build #35319 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35319/consoleFull) for PR 6888 at commit [`1f09adf`](https://github.com/apache/spark/commit/1f09adf7622590becf096ca798066bec3ad03f50).
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113642553 Merged build triggered.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113642670 [Test build #35321 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35321/consoleFull) for PR 6759 at commit [`4891efb`](https://github.com/apache/spark/commit/4891efbb6b5f277082c06ea56400c83bc4678f35).
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113642575 Merged build started.
[GitHub] spark pull request: [SPARK-8093] [SQL] Remove empty structs inferr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6799#issuecomment-113645291 [Test build #940 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/940/consoleFull) for PR 6799 at commit [`76ac3e8`](https://github.com/apache/spark/commit/76ac3e865d2354ec85417149dea87b83d90ec261).
[GitHub] spark pull request: [SPARK-8359][SQL] Fix incorrect decimal precis...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6814#discussion_r32869361 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/types/decimal/DecimalSuite.scala --- @@ -162,4 +162,9 @@ class DecimalSuite extends SparkFunSuite with PrivateMethodTester { assert(new Decimal().set(100L, 10, 0).toUnscaledLong === 100L) assert(Decimal(Long.MaxValue, 100, 0).toUnscaledLong === Long.MaxValue) } + + test("accurate precision after multiplication") { +val decimal = (Decimal(Long.MaxValue, 100, 0) * Decimal(Long.MaxValue, 100, 0)).toJavaBigDecimal --- End diff -- We can use 38 in this test case
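The suggestion to use 38 lines up with the arithmetic in the test: `Long.MaxValue` has 19 decimal digits, so its square needs at most 38, which is also the maximum precision Spark SQL's decimal type supports. A sketch of that expected precision using plain `java.math.BigDecimal` (not Spark's `Decimal` class):

```scala
object DecimalPrecisionSketch {
  def main(args: Array[String]): Unit = {
    // Long.MaxValue = 9223372036854775807 has 19 decimal digits.
    val max = java.math.BigDecimal.valueOf(Long.MaxValue)
    assert(max.precision == 19)

    // The product of two 19-digit numbers has at most 38 digits,
    // which is why 38 is a sensible expected precision for this test.
    val product = max.multiply(max)
    assert(product.precision == 38)
    println(product.precision)
  }
}
```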
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32869380 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll { val received = rdd.map(_._2).collect.toSet assert(received === messages) + +// size-related method optimizations return sane results +assert(rdd.count === messages.size) +assert(rdd.countApprox(0).getFinalValue.mean === messages.size) +assert(! rdd.isEmpty) +assert(rdd.take(1).size === 1) +assert(messages(rdd.take(1).head._2)) --- End diff -- What does this check? Shouldn't it check that `rdd.take(1) === the // whatever is expected`
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32869403 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll { val received = rdd.map(_._2).collect.toSet assert(received === messages) + +// size-related method optimizations return sane results +assert(rdd.count === messages.size) +assert(rdd.countApprox(0).getFinalValue.mean === messages.size) +assert(! rdd.isEmpty) --- End diff -- There is no check whether isEmpty is successful.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32871422 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala --- @@ -360,7 +367,7 @@ private[parquet] class MutableRowWriteSupport extends RowWriteSupport { case FloatType => writer.addFloat(record.getFloat(index)) case BooleanType => writer.addBoolean(record.getBoolean(index)) case DateType => writer.addInteger(record.getInt(index)) - case TimestampType => writeTimestamp(record(index).asInstanceOf[Long]) + case TimestampType => writeTimestamp(record.getLong(index)) --- End diff -- Nice catch.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32871383 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala --- @@ -313,10 +314,16 @@ private[parquet] class RowWriteSupport extends WriteSupport[InternalRow] with Lo writer.addBinary(Binary.fromByteArray(scratchBytes, 0, numBytes)) } + // array used to write Timestamp as Int96 (fixed-length binary) + private val int96buf = new Array[Byte](12) + private[parquet] def writeTimestamp(ts: Long): Unit = { -val binaryNanoTime = CatalystTimestampConverter.convertFromTimestamp( - DateUtils.toJavaTimestamp(ts)) -writer.addBinary(binaryNanoTime) +val (julianDay, timeOfDayNanos) = DateTimeUtils.toJulianDay(ts) +val buf = ByteBuffer.wrap(int96buf) --- End diff -- Actually, do you know if there are any static methods that we could call that would just put the longs and ints directly into the byte array at given offsets?
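One answer to the question above: `java.nio.ByteBuffer` offers absolute (index-based) `putLong`/`putInt` overloads that write directly into the backing array at a given offset without advancing the buffer's position. A minimal sketch (hypothetical helper, assuming the Int96 layout of 8 bytes nanos-of-day followed by a 4-byte Julian day, little-endian):

```scala
import java.nio.{ByteBuffer, ByteOrder}

object Int96Sketch {
  // Pack a timestamp into a reusable 12-byte buffer using absolute-offset
  // puts: putLong(index, v) and putInt(index, v) write at fixed offsets
  // and do not mutate the buffer position.
  def pack(dest: Array[Byte], julianDay: Int, timeOfDayNanos: Long): Array[Byte] = {
    val bb = ByteBuffer.wrap(dest).order(ByteOrder.LITTLE_ENDIAN)
    bb.putLong(0, timeOfDayNanos)
    bb.putInt(8, julianDay)
    dest
  }

  def main(args: Array[String]): Unit = {
    val buf = new Array[Byte](12)
    pack(buf, 2440588, 123456789L)
    val bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN)
    assert(bb.getLong(0) == 123456789L)
    assert(bb.getInt(8) == 2440588)
    println("int96 packed")
  }
}
```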
[GitHub] spark pull request: [SPARKR][SPARK-8452] expose jobGroup API in Sp...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/6889#issuecomment-113650123 LGTM, waiting for tests.
[GitHub] spark pull request: [SPARK-8477][sql][pyspark] Add in operator to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6908#issuecomment-113651550 [Test build #35314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35314/console) for PR 6908 at commit [`be795e0`](https://github.com/apache/spark/commit/be795e0c4112b5e30e3387e6d1fc98b7df26c81f).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARKR][SPARK-8452] expose jobGroup API in Sp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6889#issuecomment-113656892 [Test build #35316 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35316/console) for PR 6889 at commit [`9ce9f1e`](https://github.com/apache/spark/commit/9ce9f1ea0fd19209fd543a0650a20b46901d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-8398 hadoop input/output format advanced...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6848#issuecomment-113657049 [Test build #35330 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35330/consoleFull) for PR 6848 at commit [`df2c2ae`](https://github.com/apache/spark/commit/df2c2ae2fe88c4532dd680290d7d91e43a8b4f9b).
[GitHub] spark pull request: [SPARKR][SPARK-8452] expose jobGroup API in Sp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6889#issuecomment-113656998 Merged build finished. Test PASSed.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113657139 [Test build #35331 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35331/consoleFull) for PR 6632 at commit [`321340d`](https://github.com/apache/spark/commit/321340d6e88bd424d62c1417d2f2a2111e7ac986).
[GitHub] spark pull request: [SPARK-8483][Streaming] Remove commons-lang3 d...
GitHub user harishreedharan opened a pull request: https://github.com/apache/spark/pull/6910 [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0 You can merge this pull request into a Git repository by running: $ git pull https://github.com/harishreedharan/spark remove-commons-lang3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6910.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6910 commit ca35eb085a71a44e8e7e36d0e6a96b951727f0a1 Author: Hari Shreedharan hshreedha...@apache.org Date: 2015-06-19T22:42:40Z [SPARK-8483][Streaming] Remove commons-lang3 dependency from Flume Sink. Also bump Flume version to 1.6.0
[GitHub] spark pull request: [SPARKR][SPARK-8452] expose jobGroup API in Sp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6889
[GitHub] spark pull request: [SPARK-8483][Streaming] Remove commons-lang3 d...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6910#discussion_r32876062 --- Diff: external/flume-sink/src/main/scala/org/apache/spark/streaming/flume/sink/SparkAvroCallbackHandler.scala --- @@ -53,7 +53,7 @@ private[flume] class SparkAvroCallbackHandler(val threads: Int, val channel: Cha // Since the new txn may not have the same sequence number we must guard against accidentally // committing a new transaction. To reduce the probability of that happening a random string is // prepended to the sequence number. Does not change for life of sink - private val seqBase = RandomStringUtils.randomAlphanumeric(8) + private val seqBase = UUID.randomUUID().toString.substring(0, 8) --- End diff -- Why not just use the Scala random string functionality instead?
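For reference, the Scala standard-library route the reviewer alludes to is `scala.util.Random.alphanumeric.take(8).mkString`. The same idea as a sketch in Python (the function name is ours, not the PR's):

```python
import random
import string

def random_alphanumeric(n: int) -> str:
    # Draw n characters uniformly from [A-Za-z0-9], analogous to
    # scala.util.Random.alphanumeric.take(n).mkString.
    alphabet = string.ascii_letters + string.digits
    return "".join(random.choice(alphabet) for _ in range(n))
```

One trade-off worth noting: a UUID substring draws only from the 16 hex digits (plus dashes), so an 8-character prefix carries noticeably less entropy than 8 characters drawn from the full 62-symbol alphanumeric alphabet.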
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113674331 [Test build #35328 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35328/console) for PR 6909 at commit [`5e9d688`](https://github.com/apache/spark/commit/5e9d68840ecd2441f1accca00c125d31fb1dbde9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` // class ParentClass(parentField: Int)` * ` // class ChildClass(childField: Int) extends ParentClass(1)` * ` // If the class type corresponding to current slot has writeObject() defined,` * ` // then its not obvious which fields of the class will be serialized as the writeObject()` * `class StreamingKMeansModel(KMeansModel):` * `class StreamingKMeans(object):` * `abstract class GeneratedClass ` * `case class Md5(child: Expression)`
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113674309 [Test build #35327 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35327/console) for PR 6909 at commit [`7ede573`](https://github.com/apache/spark/commit/7ede57317ff331b72d0de2449ceaf81defdd4ce6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` // class ParentClass(parentField: Int)` * ` // class ChildClass(childField: Int) extends ParentClass(1)` * ` // If the class type corresponding to current slot has writeObject() defined,` * ` // then its not obvious which fields of the class will be serialized as the writeObject()` * `class StreamingKMeansModel(KMeansModel):` * `class StreamingKMeans(object):` * `abstract class GeneratedClass ` * `case class Bin(child: Expression)` * `case class Md5(child: Expression)`
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113674327 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-6749] [SQL] Make metastore client robus...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6912#issuecomment-113677840 Merged build started.
[GitHub] spark pull request: [SPARK-6749] [SQL] Make metastore client robus...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6912#issuecomment-113677726 Merged build triggered.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113676707 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6888
[GitHub] spark pull request: [SPARK-8477][sql][pyspark] Add in operator to ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6908#discussion_r32864207 --- Diff: python/pyspark/sql/column.py --- @@ -326,6 +326,27 @@ def between(self, lowerBound, upperBound): return (self >= lowerBound) & (self <= upperBound) +@since(1.5) +def In(self, *values): + +A boolean expression that is evaluated to true if the value of this +expression is any of the given columns. +NOTE: Normally, we should name this function the small case `in`. However, `in` is +a reserved word in Python. So we can't help naming this the upper case `In`. + + >>> df.select(df.name, df.age, df.age.In(2, 4)).show() +-----+---+---------+ | name|age|(age = 2)| +-----+---+---------+ |Alice| 2| true| | Bob| 5|false| +-----+---+---------+ + +for v in values: --- End diff -- This approach will not scale if you have many values; please call the Java API `in`.
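The scaling concern can be made concrete with a toy expression model (not Spark's actual classes): folding `(col == v1) | (col == v2) | ...` produces one `Or` node per value, so the predicate tree's depth grows linearly with the value list, while a single in-list node stays flat however many values it holds.

```python
class Col:
    """Toy column expression; == and | build a nested tuple tree."""
    def __init__(self, node):
        self.node = node
    def __eq__(self, other):
        return Col(("eq", self.node, other))
    def __or__(self, other):
        return Col(("or", self.node, other.node))

def depth(node):
    # Height of the expression tree; leaves (names, literals) count as 0.
    if isinstance(node, tuple):
        return 1 + max(depth(child) for child in node[1:])
    return 0

def chained_in(col, values):
    # The approach the review flags: one extra Or node per value.
    pred = col == values[0]
    for v in values[1:]:
        pred = pred | (col == v)
    return pred

def flat_in(col, values):
    # A single node carrying the whole list, as a JVM-side `in` would build.
    return Col(("in", col.node, tuple(values)))
```

With 50 values the chained form is 50 levels deep while the flat form stays at depth 2; deep trees are more expensive to analyze and optimize, which is one reason to delegate to the JVM-side `in` rather than build the OR chain in Python.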
[GitHub] spark pull request: [SPARK-4176] Support decimal types with precis...
Github user rtreffer commented on the pull request: https://github.com/apache/spark/pull/6796#issuecomment-113629961 I've pushed the hive generated parquet file and I'll call it a day. I think I'll have to relax the validation of column types for DECIMAL.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113632149 @yhuai updated to avoid changing equality behavior.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113635239 lgtm
[GitHub] spark pull request: [SPARK-8368] [SPARK-8058] [SQL] HiveContext ma...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/6895#issuecomment-113638271 Closing it.
[GitHub] spark pull request: [SPARK-8368] [SPARK-8058] [SQL] HiveContext ma...
Github user yhuai closed the pull request at: https://github.com/apache/spark/pull/6895
[GitHub] spark pull request: [SPARK-5836] [DOCS] [STREAMING] Clarify what m...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6901#discussion_r32867731 --- Diff: docs/programming-guide.md --- @@ -1144,9 +1144,11 @@ generate these on the reduce side. When data does not fit in memory Spark will s to disk, incurring the additional overhead of disk I/O and increased garbage collection. Shuffle also generates a large number of intermediate files on disk. As of Spark 1.3, these files --- End diff -- Oh! I thought you meant it as the latter ... as of the latest version. This is a little confusing. :/ Maybe it makes sense to remove it completely. The GC-based behavior has been present for 4 versions now, since Spark 1.0, and it's not going to change in the foreseeable future. So it's best to remove it. The only thing that may change in Spark 1.5 is that we induce GC periodically ourselves.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113645504 Just a couple of more comments on the tests.
[GitHub] spark pull request: [SPARK-5037][STREAMING] dynamically loaded DSt...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3858#issuecomment-113648094 That gives me some context. We are adding more stuff to the Python API for parity with Scala and Java. We have added a full Kafka Python API, Flume is being added, and hopefully Kinesis will also be added. The technique by which they have been added is similar to your approach at a high level, but not quite the same. Take a look at the Python KafkaUtils in the current code. Since we want to maintain consistency in design, I am happy to take a look if you can update the current PR to use the existing style. On Fri, Jun 19, 2015 at 1:22 PM, industrial-sloth notificati...@github.com wrote: Sure thing @tdas https://github.com/tdas. First a caveat: I haven't been keeping up with the spark community since ~March 2015, so the issues I originally hit might no longer exist w/ more recent spark releases. As of December 2014 we were exploring streaming options for real time analysis in Thunder (https://github.com/thunder-project). Thunder is pyspark based; at that time our pyspark dstream options, as I recall, were basically either file-based (watch a directory for new files) or to integrate with Kafka. Specifically there was no option to listen to a ZeroMQ stream or to many of the other dstream types available in the scala API. We wanted to be pushing a high-bandwidth stream of microscope images over to pyspark for further analysis. ZeroMQ seemed ideal; Kafka seemed like too much and file-based seemed to necessitate an additional unnecessary disk IO. So I put together a ZMQ solution for pyspark streaming and threw it out there in this PR. Again, haven't been keeping up, not sure whether this is still a concern w/ current releases of pyspark. I agree this is potentially an unusual use case - our workaround at the time was to go to the file-based dstream implementation, which was functional but perhaps not optimal.
Any further comment on this @freeman-lab https://github.com/freeman-lab or @andrewosh https://github.com/andrewosh? — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/3858#issuecomment-113631381.
[GitHub] spark pull request: [SPARK-8498] Add regression test for SPARK-847...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113652053 @marmbrus @yhuai
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32872239 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala --- @@ -21,19 +21,31 @@ import java.sql.Timestamp import org.apache.spark.SparkFunSuite -class DateUtilsSuite extends SparkFunSuite { +class DateTimeUtilsSuite extends SparkFunSuite { - test("timestamp") { + test("timestamp and 100ns") { val now = new Timestamp(System.currentTimeMillis()) now.setNanos(100) -val ns = DateUtils.fromJavaTimestamp(now) +val ns = DateTimeUtils.fromJavaTimestamp(now) assert(ns % 1000L == 1) -assert(DateUtils.toJavaTimestamp(ns) == now) +assert(DateTimeUtils.toJavaTimestamp(ns) == now) List(-L, -1L, 0, 1L, L).foreach { t => - val ts = DateUtils.toJavaTimestamp(t) - assert(DateUtils.fromJavaTimestamp(ts) == t) - assert(DateUtils.toJavaTimestamp(DateUtils.fromJavaTimestamp(ts)) == ts) + val ts = DateTimeUtils.toJavaTimestamp(t) + assert(DateTimeUtils.fromJavaTimestamp(ts) == t) + assert(DateTimeUtils.toJavaTimestamp(DateTimeUtils.fromJavaTimestamp(ts)) == ts) } } + + test("100ns and julian day") { --- End diff -- Are there any other inputs that are worth testing here? It wouldn't be super hard to fuzz this using the invariant that some of these methods should be inverses.
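The fuzzing idea is cheap to sketch. Since midnight 1970-01-01 UTC falls at Julian day 2440587.5 (Julian days start at noon), a conversion to (day, nanos-of-day) and back should be an exact inverse for any input. Here is a simplified reimplementation working in microseconds — a sketch consistent with the `toJulianDay(0)` giving day 2440587 expectation in the test above, not Spark's actual code:

```python
import random

MICROS_PER_DAY = 86400 * 1000 * 1000
EPOCH_JULIAN_DAY = 2440588  # midnight 1970-01-01 UTC is Julian day 2440587.5

def to_julian_day(us):
    # Shift the epoch so t=0 lands half a day into Julian day 2440587.
    julian_us = us + EPOCH_JULIAN_DAY * MICROS_PER_DAY - MICROS_PER_DAY // 2
    return julian_us // MICROS_PER_DAY, (julian_us % MICROS_PER_DAY) * 1000

def from_julian_day(day, nanos):
    # Exact inverse of to_julian_day (nanos is a whole number of microseconds).
    return (day - EPOCH_JULIAN_DAY) * MICROS_PER_DAY + MICROS_PER_DAY // 2 + nanos // 1000

# Fuzz: the pair must round-trip for arbitrary timestamps.
for us in [random.randrange(-2**60, 2**60) for _ in range(1000)]:
    assert from_julian_day(*to_julian_day(us)) == us
```

The same property-style loop translates directly into a ScalaTest case over randomly generated longs, which is the stronger test being suggested.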
[GitHub] spark pull request: [SPARK-8376][Docs]Add common lang3 to the Spar...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6829#issuecomment-113652186 I don't see a dependency on Kafka in Flume 1.4.0 http://mvnrepository.com/artifact/org.apache.flume/flume-ng-sdk/1.4.0 What am I missing? On Fri, Jun 19, 2015 at 2:21 PM, Hari Shreedharan notificati...@github.com wrote: Kafka brings in 2.10. — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/6829#issuecomment-113648154.
[GitHub] spark pull request: [SPARK-8498] Add regression test for SPARK-847...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113652204 Merged build started.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32872739 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll { val received = rdd.map(_._2).collect.toSet assert(received === messages) + +// size-related method optimizations return sane results +assert(rdd.count === messages.size) +assert(rdd.countApprox(0).getFinalValue.mean === messages.size) +assert(!rdd.isEmpty) +assert(rdd.take(1).size === 1) +assert(messages(rdd.take(1).head._2)) --- End diff -- Shouldn't the test be stronger, i.e. check that it returns the expected message from the right offset and not just any of the messages? Basically, if there is a bug in the code where take(1) returns the last message in the offset range rather than the first message, it won't be caught.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32872749 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala --- @@ -21,19 +21,31 @@ import java.sql.Timestamp import org.apache.spark.SparkFunSuite -class DateUtilsSuite extends SparkFunSuite { +class DateTimeUtilsSuite extends SparkFunSuite { - test("timestamp") { + test("timestamp and 100ns") { val now = new Timestamp(System.currentTimeMillis()) now.setNanos(100) -val ns = DateUtils.fromJavaTimestamp(now) +val ns = DateTimeUtils.fromJavaTimestamp(now) assert(ns % 1000L == 1) -assert(DateUtils.toJavaTimestamp(ns) == now) +assert(DateTimeUtils.toJavaTimestamp(ns) == now) List(-L, -1L, 0, 1L, L).foreach { t => - val ts = DateUtils.toJavaTimestamp(t) - assert(DateUtils.fromJavaTimestamp(ts) == t) - assert(DateUtils.toJavaTimestamp(DateUtils.fromJavaTimestamp(ts)) == ts) + val ts = DateTimeUtils.toJavaTimestamp(t) + assert(DateTimeUtils.fromJavaTimestamp(ts) == t) + assert(DateTimeUtils.toJavaTimestamp(DateTimeUtils.fromJavaTimestamp(ts)) == ts) } } + + test("100ns and julian day") { +val (d, ns) = DateTimeUtils.toJulianDay(0) +assert(d == 2440587) --- End diff -- Could use `===` here.
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113656191 [Test build #35329 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35329/consoleFull) for PR 6632 at commit [`5a05d0f`](https://github.com/apache/spark/commit/5a05d0f633b66ffe42f8e7bb8f4e09308d79fa29).
[GitHub] spark pull request: [SPARK-8080][STREAMING] Receiver.store with It...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6707#issuecomment-113656399 Thank you very much for this patch. This was a very important one, especially the tests. On Thu, Jun 18, 2015 at 8:02 PM, asfgit notificati...@github.com wrote: Closed #6707 https://github.com/apache/spark/pull/6707 via 3eaed87 https://github.com/apache/spark/commit/3eaed8769c16e887edb9d54f5816b4ee6da23de5. — Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/6707#event-334837731.
[GitHub] spark pull request: [SPARK-8483][Streaming] Remove commons-lang3 d...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6910#issuecomment-113663947 Merged build started.
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113669127 Merged build started.
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113669074 Merged build triggered.
[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5717#issuecomment-113670190 Merged build triggered.
[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5717#issuecomment-113670244 [Test build #35337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35337/consoleFull) for PR 5717 at commit [`211e101`](https://github.com/apache/spark/commit/211e1012dc28ed610d294d0678b1d5621a901e53).
[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5717#issuecomment-113670198 Merged build started.
[GitHub] spark pull request: [SPARK-8492] [SQL] support binaryType in Unsaf...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6911#issuecomment-113675867 [Test build #35339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35339/consoleFull) for PR 6911 at commit [`447dea0`](https://github.com/apache/spark/commit/447dea051b13da73e0b84e3de72fd16e6d466765).
[GitHub] spark pull request: [SPARK-6749] [SQL] Make metastore client robus...
Github user ericl commented on the pull request: https://github.com/apache/spark/pull/6912#issuecomment-113676454 @yhuai
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113676455 [Test build #35329 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35329/console) for PR 6632 at commit [`5a05d0f`](https://github.com/apache/spark/commit/5a05d0f633b66ffe42f8e7bb8f4e09308d79fa29).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` // class ParentClass(parentField: Int)`
  * ` // class ChildClass(childField: Int) extends ParentClass(1)`
  * ` // If the class type corresponding to current slot has writeObject() defined,`
  * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
  * `abstract class GeneratedClass `
  * `case class Bin(child: Expression)`
  * `case class Md5(child: Expression)`
[GitHub] spark pull request: [SPARK-6749] [SQL] Make metastore client robus...
GitHub user ericl opened a pull request: https://github.com/apache/spark/pull/6912 [SPARK-6749] [SQL] Make metastore client robust to underlying socket connection loss

This works around a bug in the underlying RetryingMetaStoreClient (HIVE-10384) by refreshing the metastore client on thrift exceptions. We attempt to emulate the proper hive behavior by retrying only as configured by hiveconf.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark spark-6749

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/6912.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #6912

commit 7c8ee1691e76545dab5cfe9101e3ecf290117818
Author: Eric Liang e...@databricks.com
Date: 2015-06-19T23:50:57Z

    Work around RetryingMetaStoreClient bug
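The idea in this PR description -- recreate the underlying client when a transport-level exception surfaces, bounded by a configured retry limit -- can be sketched generically. The names below (`TransportError`, `call_with_reconnect`, `make_client`) are hypothetical placeholders for illustration, not the actual Hive or Spark APIs:

```python
class TransportError(Exception):
    """Stand-in for a thrift-level socket/transport exception."""


def call_with_reconnect(make_client, method, *args, max_retries=3):
    """Invoke `method` on a client, rebuilding the client on transport errors.

    make_client: factory returning a fresh (re-connected) client.
    max_retries: plays the role of the hiveconf-configured retry limit
                 mentioned in the PR description.
    """
    client = make_client()
    last_err = None
    for _ in range(max_retries + 1):
        try:
            return getattr(client, method)(*args)
        except TransportError as err:
            last_err = err
            # Refresh the client, analogous to the patch refreshing the
            # metastore client on thrift exceptions.
            client = make_client()
    raise last_err
```

The key design point is that the retry loop owns the client reference, so a stale socket never escapes back to callers.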
[GitHub] spark pull request: SPARK-4644 blockjoin
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6883#issuecomment-113630110 [Test build #35315 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35315/consoleFull) for PR 6883 at commit [`adef52e`](https://github.com/apache/spark/commit/adef52ed4c335980e73c61036abb2a2806965de3).
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-113630096 So the JIRA was actually about updating the examples. It's great that you have updated the docs AND the tests, but it would be ideal if the examples DirectKafkaWordCount and JavaDirectKafkaWordCount were updated to show how the offset ranges can be accessed. Since you have updated the tests, mind updating the examples as well?
[GitHub] spark pull request: [SPARK-8481] [MLlib] GaussianMixtureModel pred...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6906#issuecomment-113632802 [Test build #35318 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35318/consoleFull) for PR 6906 at commit [`cb87180`](https://github.com/apache/spark/commit/cb87180516973caf772c95405a39b3f9bd627272).
[GitHub] spark pull request: [HotFIX] Fix scala style in DFSReadWriteTest t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6907#issuecomment-113632842 [Test build #35304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35304/console) for PR 6907 at commit [`c53f188`](https://github.com/apache/spark/commit/c53f1883409648723ec543a06b4e0efde0b5ba0e).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8186] [SPARK-8187] [SQL] datetime funct...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/6782#discussion_r32867754

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeFunctions.scala ---
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions.codegen.{GeneratedExpressionCode, CodeGenContext}
+import org.apache.spark.sql.catalyst.util.DateUtils
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * Adds a number of days to startdate: date_add('2008-12-31', 1) = '2009-01-01'.
+ */
+case class DateAdd(startDate: Expression, days: Expression) extends Expression {
+  override def children: Seq[Expression] = startDate :: days :: Nil
+
+  override def foldable: Boolean = startDate.foldable && days.foldable
+  override def nullable: Boolean = startDate.nullable || days.nullable
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val supportedLeftType = Seq(StringType, DateType, TimestampType, NullType)
+    if (!supportedLeftType.contains(startDate.dataType)) {
+      TypeCheckResult.TypeCheckFailure(
+        s"type of startdate expression in DateAdd should be string/timestamp/date," +
+          s" not ${startDate.dataType}")
+    } else if (days.dataType != IntegerType && days.dataType != NullType) {
+      TypeCheckResult.TypeCheckFailure(
+        s"type of days expression in DateAdd should be int, not ${days.dataType}.")
+    } else {
+      TypeCheckResult.TypeCheckSuccess
+    }
+  }
+
+  override def dataType: DataType = StringType
--- End diff --

In general, though, we can cast back to a string whenever you need to. From an efficiency standpoint it seems much better to keep it a date. /cc @rxin
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113644879 [Test build #35310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35310/console) for PR 6632 at commit [`f68bd32`](https://github.com/apache/spark/commit/f68bd3266df27fc8238195ac443c3e2cdb37803a).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` // class ParentClass(parentField: Int)`
  * ` // class ChildClass(childField: Int) extends ParentClass(1)`
  * ` // If the class type corresponding to current slot has writeObject() defined,`
  * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
  * `abstract class GeneratedClass `
  * `case class Bin(child: Expression)`
  * `case class Md5(child: Expression)`
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32869331

--- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala ---
@@ -68,6 +68,21 @@ class KafkaRDDSuite extends SparkFunSuite with BeforeAndAfterAll {
     val received = rdd.map(_._2).collect.toSet
     assert(received === messages)
+
+    // size-related method optimizations return sane results
+    assert(rdd.count === messages.size)
+    assert(rdd.countApprox(0).getFinalValue.mean === messages.size)
+    assert(! rdd.isEmpty)
--- End diff --

nit: extra space
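The size-related optimizations these tests exercise rest on the fact that a KafkaRDD knows its offset ranges up front, so `count` and `isEmpty` can be answered from metadata without fetching any messages. A minimal sketch of that idea, using plain `(fromOffset, untilOffset)` tuples as a simplified stand-in for the actual OffsetRange class:

```python
def rdd_count(offset_ranges):
    """Message count is fully determined by the per-partition offset ranges."""
    return sum(until - start for (start, until) in offset_ranges)


def rdd_is_empty(offset_ranges):
    """Empty iff every partition's range is empty -- no Kafka fetch needed."""
    return rdd_count(offset_ranges) == 0
```

This is why the test can assert `rdd.count === messages.size` cheaply: the answer is arithmetic over the ranges the RDD was constructed with, not a scan.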
[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113644969 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8359][SQL] Fix incorrect decimal precis...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/6814#discussion_r32869329

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala ---
@@ -286,6 +288,9 @@ object Decimal {
   /** Maximum number of decimal digits a Long can represent */
   val MAX_LONG_DIGITS = 18

+  /** Maximum precision a Decimal can support */
+  val MAX_PRECISION = 38
--- End diff --

After a short discussion with @marmbrus: we aren't going to have a fixed maximum precision in the short term; we will still support higher, even unlimited, precision.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113647862 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113647849 [Test build #35321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35321/console) for PR 6759 at commit [`4891efb`](https://github.com/apache/spark/commit/4891efbb6b5f277082c06ea56400c83bc4678f35).
* This patch **fails MiMa tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8265] [MLlib] [PySpark] Add LinearDataG...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6715#issuecomment-113648863 [Test build #35312 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35312/console) for PR 6715 at commit [`6182884`](https://github.com/apache/spark/commit/618288411ff36fee254f4304acf7137018c01ec3).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * ` // class ParentClass(parentField: Int)`
  * ` // class ChildClass(childField: Int) extends ParentClass(1)`
  * ` // If the class type corresponding to current slot has writeObject() defined,`
  * ` // then its not obvious which fields of the class will be serialized as the writeObject()`
  * `class StreamingKMeansModel(KMeansModel):`
  * `class StreamingKMeans(object):`
  * `class LinearDataGenerator(object):`
  * `abstract class GeneratedClass `
  * `case class Md5(child: Expression)`
[GitHub] spark pull request: [SPARK-8265] [MLlib] [PySpark] Add LinearDataG...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6715#issuecomment-113648895 Merged build finished. Test PASSed.
[GitHub] spark pull request: Expose regionName setting in Kinesis receiver ...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/5375#issuecomment-113648887 This is not needed any more as Spark 1.4.0 has fixed this issue. Mind closing this PR?
[GitHub] spark pull request: [SPARK-8376][Docs]Add common lang3 to the Spar...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6829#issuecomment-113653289 That looks like just the API module. I suspect it comes via the actual implementation such as in http://mvnrepository.com/artifact/org.apache.flume/flume-ng-sources/1.6.0 but I don't know Flume well.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/6759#discussion_r32872412

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala ---
@@ -313,10 +314,16 @@ private[parquet] class RowWriteSupport extends WriteSupport[InternalRow] with Lo
     writer.addBinary(Binary.fromByteArray(scratchBytes, 0, numBytes))
   }

+  // array used to write Timestamp as Int96 (fixed-length binary)
+  private val int96buf = new Array[Byte](12)
+
   private[parquet] def writeTimestamp(ts: Long): Unit = {
-    val binaryNanoTime = CatalystTimestampConverter.convertFromTimestamp(
-      DateUtils.toJavaTimestamp(ts))
-    writer.addBinary(binaryNanoTime)
+    val (julianDay, timeOfDayNanos) = DateTimeUtils.toJulianDay(ts)
+    val buf = ByteBuffer.wrap(int96buf)
--- End diff --

Yeah, let's just leave this as-is.
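For context on the diff above: Parquet's INT96 timestamp layout packs the nanoseconds-of-day into the first 8 bytes and the Julian day number into the last 4, both little-endian. A minimal sketch of packing a `(julianDay, timeOfDayNanos)` pair into such a 12-byte buffer, assuming that layout (this is an illustration, not the Spark code itself):

```python
import struct


def pack_int96_timestamp(julian_day, time_of_day_nanos):
    """Pack into Parquet's 12-byte INT96 layout:
    8 bytes nanos-of-day (little-endian int64), then 4 bytes Julian day."""
    return struct.pack("<qi", time_of_day_nanos, julian_day)


def unpack_int96_timestamp(buf):
    """Inverse of pack_int96_timestamp; returns (julian_day, time_of_day_nanos)."""
    nanos, jday = struct.unpack("<qi", buf)
    return jday, nanos
```

Reusing one preallocated 12-byte buffer per writer, as the patch does with `int96buf`, avoids allocating a fresh array for every row written.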
[GitHub] spark pull request: spark-7300 remove temporary directories after ...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5834#discussion_r32873174

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
@@ -251,6 +251,11 @@ case class InsertIntoHiveTable(
     }
   }

+       //remove temporary directories
+       val fs = outputPath.getFileSystem(jobConf)
+       if ( outputPath.getParent.isRoot == false )
+         fs.delete(outputPath.getParent,true)
--- End diff --

also, please unindent these by 3 spaces
[GitHub] spark pull request: [SPARK-8376][Docs]Add common lang3 to the Spar...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/6829#issuecomment-113659788 Yes, all of the libs in the flume-ng/lib directory get added to the classpath, so scala would get added to the classpath but loaded only as required (which is normal JVM protocol). We'd have to bump our dependency to 1.6.0 for scala to be automagically available. Even if we don't upgrade, we don't need to change the dependency set, as the behavior is the same as before (add scala to flume-ng/lib or the plugins dir). Apart from the assembly part, nothing else changes. I am sending a PR soon to get rid of the commons-lang3 dependency anyway.
[GitHub] spark pull request: [SPARK-8307] [SQL] improve timestamp from parq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6759#issuecomment-113659586 [Test build #35332 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35332/consoleFull) for PR 6759 at commit [`634b9f5`](https://github.com/apache/spark/commit/634b9f5540b8045b24c20c7296d8cd73193c).
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113661550 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8420][SQL] Fix comparision of timestamp...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6888#issuecomment-113661524 [Test build #35320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35320/console) for PR 6888 at commit [`bdef29c`](https://github.com/apache/spark/commit/bdef29c4327245e33e3a6f8b6e9402dbc2ac9e4d).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8498] [SQL] Add regression test for SPA...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6909#issuecomment-113668048 retest this please
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113670534 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8432] [SQL] fix hashCode() and equals()...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6876#issuecomment-113670505 [Test build #35322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/35322/console) for PR 6876 at commit [`32d9811`](https://github.com/apache/spark/commit/32d981137fd24d1e55c3a4c2c23bb19e494b4f65).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.