[GitHub] spark pull request: [SPARK-8238][SPARK-8239][SPARK-8242][SPARK-824...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6843#issuecomment-117935917
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8783] [SQL] CTAS with WITH clause does ...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7180#issuecomment-117935906
  
  [Test build #36356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36356/consoleFull) for PR 7180 at commit [`e4428f0`](https://github.com/apache/spark/commit/e4428f004ad7832795ba77343633bb76d8e11697).





[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] [WIP] Binary process...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6444#issuecomment-117935843
  
  [Test build #36357 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36357/consoleFull) for PR 6444 at commit [`1b54894`](https://github.com/apache/spark/commit/1b54894824e4dd2b77874ac60a749638b04755d0).





[GitHub] spark pull request: [SPARK-8238][SPARK-8239][SPARK-8242][SPARK-824...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6843#issuecomment-117935849
  
  [Test build #36341 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36341/console) for PR 6843 at commit [`947a88a`](https://github.com/apache/spark/commit/947a88a99a84ea9af5e9f5fdd8e55ce69b2d915b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait ExpectsInputTypes`
  * `trait AutoCastInputTypes`
  * `abstract class BinaryExpression extends Expression with trees.BinaryNode[Expression]`
  * `abstract class BinaryOperator extends BinaryExpression`
  * `abstract class BinaryArithmetic extends BinaryOperator`
  * `abstract class BinaryComparison extends BinaryOperator with Predicate`
  * `case class Ascii(child: Expression) extends UnaryExpression with AutoCastInputTypes`
  * `case class Base64(child: Expression) extends UnaryExpression with AutoCastInputTypes`
  * `case class UnBase64(child: Expression) extends UnaryExpression with AutoCastInputTypes`
  * `case class Decode(bin: Expression, charset: Expression)`
  * `case class Encode(value: Expression, charset: Expression)`






[GitHub] spark pull request: [SPARK-8783] [SQL] CTAS with WITH clause does ...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7180#issuecomment-117935787
  
Merged build started.





[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] [WIP] Binary process...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6444#issuecomment-117935770
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] [WIP] Binary process...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6444#issuecomment-117935784
  
Merged build started.





[GitHub] spark pull request: [SPARK-8783] [SQL] CTAS with WITH clause does ...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7180#issuecomment-117935766
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread tarekauel
Github user tarekauel commented on the pull request:

https://github.com/apache/spark/pull/7178#issuecomment-117935673
  
Thanks for your feedback. I removed the type information from the Python
description and changed it for the DataFrame API. I hope it's clear now.

One comment about Python: Python's maximum integer value is the maximum long value of
Java/Scala, so there is no value in specifying the result type for Python.
```
>>> type(sqlContext.createDataFrame([(sys.maxint,)], 
['a']).select(shiftLeft('a', 1).alias('r')).first().asDict().get('r'))

```
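
For comparison, on the JVM side the result type simply follows the integral width of the input, which is what the Scala/Java docs describe. A quick illustrative check in a Scala REPL, not code from this PR:
```scala
// Shifts on the JVM keep (or promote to) the input's integral width:
(1.toByte << 1).getClass       // int: tinyint/smallint inputs promote to int
(1 << 1).getClass              // int
(Long.MaxValue << 1).getClass  // long: bigint stays bigint
```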





[GitHub] spark pull request: [SPARK-8783] [SQL] CTAS with WITH clause does ...

2015-07-01 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/7180#issuecomment-117935702
  
ok to test.





[GitHub] spark pull request: [SPARK-8777] [SQL] Add random data generator t...

2015-07-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/7176#issuecomment-117935616
  
Here's another neat use-case for this data generator: in my Tungsten SQL 
external sort patch, I used these generators to produce random input rows for 
testing my sort operator: 
https://github.com/JoshRosen/spark/blob/92e9bd1f50f2aa93cb8df0ce588fd03f1bfee5d5/sql/core/src/test/scala/org/apache/spark/sql/execution/UnsafeExternalSortSuite.scala#L41
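
As a rough illustration of the idea only (the names and signature below are illustrative, not the actual API introduced by the patch), a per-type random value generator can be as small as:
```scala
import scala.util.Random
import org.apache.spark.sql.types._

// Illustrative sketch: map a DataType to a thunk producing random values of that type,
// returning None for types we don't know how to generate.
def randomValueFor(dataType: DataType, rand: Random): Option[() => Any] = dataType match {
  case IntegerType => Some(() => rand.nextInt())
  case LongType    => Some(() => rand.nextLong())
  case DoubleType  => Some(() => rand.nextDouble())
  case StringType  => Some(() => rand.nextString(rand.nextInt(10)))
  case _           => None
}
```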





[GitHub] spark pull request: [SPARK-8016] YARN cluster / client modes have ...

2015-07-01 Thread ehnalis
Github user ehnalis commented on the pull request:

https://github.com/apache/spark/pull/6671#issuecomment-117935529
  
It would be nice to have at least some clarification on this matter in the docs.
Alternatively, we might consider deprecating `SparkContext.setAppName`, since it
cannot provide a solution for every deployment scenario.





[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/7178#discussion_r33751162
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1299,6 +1299,44 @@ object functions {
   def rint(columnName: String): Column = rint(Column(columnName))
 
   /**
+   * Shift the the given value numBits left. Returns int for tinyint, smallint and int and
+   * bigint for bigint a.
+   *
+   * @group math_funcs
+   * @since 1.5.0
+   */
+  def shiftLeft(e: Column, numBits: Integer): Column = ShiftLeft(e.expr, lit(numBits).expr)
--- End diff --

I mean we should use `Int` in the signature. 
```
def shiftLeft(e: Column, numBits: Int)
```
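
Spelled out, the suggestion amounts to the reviewed definition with `Integer` replaced by `Int` (a sketch of the reviewed line only):
```scala
def shiftLeft(e: Column, numBits: Int): Column = ShiftLeft(e.expr, lit(numBits).expr)
```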





[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/7178#discussion_r33751078
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala ---
@@ -351,6 +351,108 @@ case class Pow(left: Expression, right: Expression)
   }
 }
 
+case class ShiftLeft(left: Expression, right: Expression) extends BinaryExpression {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    (left.dataType, right.dataType) match {
+      case (NullType, _) | (_, NullType) => return TypeCheckResult.TypeCheckSuccess
+      case (_, IntegerType) => left.dataType match {
+        case LongType | IntegerType | ShortType | ByteType =>
+          return TypeCheckResult.TypeCheckSuccess
+        case _ => // failed
+      }
+      case _ => // failed
+    }
+    TypeCheckResult.TypeCheckFailure(
+      s"ShiftLeft expects long, integer, short or byte value as first argument and an " +
+        s"integer value as second argument, not (${left.dataType}, ${right.dataType})")
+  }
+
+  override def eval(input: InternalRow): Any = {
+    val valueLeft = left.eval(input)
+    if (valueLeft != null) {
+      val valueRight = right.eval(input)
+      if (valueRight != null) {
+        valueLeft match {
+          case l: Long => l << valueRight.asInstanceOf[Integer]
+          case i: Integer => i << valueRight.asInstanceOf[Integer]
+          case s: Short => s << valueRight.asInstanceOf[Integer]
+          case b: Byte => b << valueRight.asInstanceOf[Integer]
--- End diff --

yes, thanks!





[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] [WIP] Binary process...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6444#issuecomment-117934941
  
Merged build started.





[GitHub] spark pull request: [SPARK-7078] [SPARK-7079] [WIP] Binary process...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6444#issuecomment-117934882
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8213][SQL]Add function factorial

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6822#issuecomment-117934332
  
  [Test build #36354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36354/consoleFull) for PR 6822 at commit [`26dbed2`](https://github.com/apache/spark/commit/26dbed27cba6ca14e0c453a0b4a37b8e3c9fd132).





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7179#discussion_r33750893
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -42,4 +47,47 @@ class CodeGenerationSuite extends SparkFunSuite {
 
     futures.foreach(Await.result(_, 10.seconds))
   }
+
+  // Test GenerateOrdering for all common types. For each type, we construct random input rows that
+  // contain two columns of that type, then for pairs of randomly-generated rows we check that
+  // GenerateOrdering agrees with RowOrdering.
+  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
+    test(s"GenerateOrdering with $dataType") {
+      val rowOrdering = RowOrdering.forSchema(Seq(dataType, dataType))
+      val genOrdering = GenerateOrdering.generate(
+        BoundReference(0, dataType, nullable = true).asc ::
+        BoundReference(1, dataType, nullable = true).asc :: Nil)
+      val rowType = StructType(
+        StructField("a", dataType, nullable = true) ::
+        StructField("b", dataType, nullable = true) :: Nil)
+      val toCatalyst = CatalystTypeConverters.createToCatalystConverter(rowType)
+      // Sort ordering is not defined for NaN, so skip any random inputs that contain it:
+      def isIncomparable(v: Any): Boolean = v match {
--- End diff --

damn





[GitHub] spark pull request: [SPARK-8213][SQL]Add function factorial

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6822#issuecomment-117932225
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8213][SQL]Add function factorial

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6822#issuecomment-117932376
  
Merged build started.





[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread tarekauel
Github user tarekauel commented on a diff in the pull request:

https://github.com/apache/spark/pull/7178#discussion_r33750813
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1299,6 +1299,44 @@ object functions {
   def rint(columnName: String): Column = rint(Column(columnName))
 
   /**
+   * Shift the the given value numBits left. Returns int for tinyint, smallint and int and
+   * bigint for bigint a.
+   *
+   * @group math_funcs
+   * @since 1.5.0
+   */
+  def shiftLeft(e: Column, numBits: Integer): Column = ShiftLeft(e.expr, lit(numBits).expr)
--- End diff --

If your comment is about the `lit(numBits)`, yes, that's true. I tried to keep it consistent
with the other functions (sha2 does it the same way, see
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1458,
as does log, see
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1161).
Am I getting something wrong?





[GitHub] spark pull request: SPARK-8398 hadoop input/output format advanced...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6848#issuecomment-117930897
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: SPARK-8398 hadoop input/output format advanced...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6848#issuecomment-117930360
  
  [Test build #36337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36337/console) for PR 6848 at commit [`96ccee0`](https://github.com/apache/spark/commit/96ccee0d1eaf08303777042e094cbabcc51a0140).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `   * converters, but then we couldn't have an object for every subclass of Writable (you can't`






[GitHub] spark pull request: [SPARK-8783] [SQL] CTAS with WITH clause does ...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7180#issuecomment-117926802
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-8783] [SQL] CTAS with WITH clause does ...

2015-07-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/7180#issuecomment-117924835
  
Jenkins, this is ok to test.





[GitHub] spark pull request: [Spark-7879][MLlib] KMeans API for spark.ml Pi...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6756#issuecomment-117924373
  
  [Test build #36353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36353/consoleFull) for PR 6756 at commit [`4d2ad1e`](https://github.com/apache/spark/commit/4d2ad1ea4adf24836c43caeb69379c43d63bd3ec).





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7179#discussion_r33750487
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -42,4 +47,47 @@ class CodeGenerationSuite extends SparkFunSuite {
 
     futures.foreach(Await.result(_, 10.seconds))
   }
+
+  // Test GenerateOrdering for all common types. For each type, we construct random input rows that
+  // contain two columns of that type, then for pairs of randomly-generated rows we check that
+  // GenerateOrdering agrees with RowOrdering.
+  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
+    test(s"GenerateOrdering with $dataType") {
+      val rowOrdering = RowOrdering.forSchema(Seq(dataType, dataType))
+      val genOrdering = GenerateOrdering.generate(
+        BoundReference(0, dataType, nullable = true).asc ::
+        BoundReference(1, dataType, nullable = true).asc :: Nil)
+      val rowType = StructType(
+        StructField("a", dataType, nullable = true) ::
+        StructField("b", dataType, nullable = true) :: Nil)
+      val toCatalyst = CatalystTypeConverters.createToCatalystConverter(rowType)
+      // Sort ordering is not defined for NaN, so skip any random inputs that contain it:
+      def isIncomparable(v: Any): Boolean = v match {
--- End diff --

It turns out that it's actually possible to crash the `Sort` operator with 
"Comparison method violates its general contract!" errors if NaNs are present 
in the column being sorted.
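
For background, a minimal sketch of why NaN breaks a comparator's contract (standard JVM behavior, not code from this patch):
```scala
// Every ordering comparison against NaN is false, so a comparator built only from
// '<' and '>' is not a total order; TimSort (used by java.util.Arrays.sort) can detect
// the inconsistency and throw "Comparison method violates its general contract!".
val nan = Double.NaN
assert(!(nan < 1.0) && !(nan > 1.0) && !(nan == 1.0))
```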





[GitHub] spark pull request: [SPARK-8783] [SQL] CTAS with WITH clause does ...

2015-07-01 Thread sirpkt
GitHub user sirpkt opened a pull request:

https://github.com/apache/spark/pull/7180

[SPARK-8783] [SQL] CTAS with WITH clause does not work

Currently, CTESubstitution only handles the case where WITH is at the top of the plan.
I think it SHOULD also handle the case where WITH is a child of CTAS.
This patch simply changes 'match' to 'transform' so that it searches recursively for WITH in the plan.
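
For reference, a minimal sketch of the kind of query this affects (hypothetical table and column names, assuming a Hive-enabled SQLContext):
```scala
// A CTAS whose child plan begins with a WITH (CTE) clause; before this patch,
// CTESubstitution only rewrote a WITH node at the root of the plan, so the CTE
// in a query like this was not substituted.
sqlContext.sql(
  """CREATE TABLE t1 AS
    |WITH cte AS (SELECT 1 AS col)
    |SELECT col FROM cte
  """.stripMargin)
```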

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sirpkt/spark SPARK-8783

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7180.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7180


commit 1671c77139d4e2e65e4b00acabf5f82512129dd1
Author: Keuntae Park 
Date:   2015-07-02T06:09:18Z

WITH clause can be inside CTAS

commit e4428f004ad7832795ba77343633bb76d8e11697
Author: Keuntae Park 
Date:   2015-07-02T06:20:45Z

Merge remote-tracking branch 'upstream/master' into CTASwithWITH







[GitHub] spark pull request: [Spark-7879][MLlib] KMeans API for spark.ml Pi...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6756#issuecomment-117924302
  
Merged build started.





[GitHub] spark pull request: [Spark-7879][MLlib] KMeans API for spark.ml Pi...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6756#issuecomment-117924293
  
 Merged build triggered.





[GitHub] spark pull request: [Spark-7879][MLlib] KMeans API for spark.ml Pi...

2015-07-01 Thread yu-iskw
Github user yu-iskw commented on the pull request:

https://github.com/apache/spark/pull/6756#issuecomment-117924077
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7179#discussion_r33750308
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -42,4 +47,47 @@ class CodeGenerationSuite extends SparkFunSuite {
 
     futures.foreach(Await.result(_, 10.seconds))
   }
+
+  // Test GenerateOrdering for all common types. For each type, we construct random input rows that
+  // contain two columns of that type, then for pairs of randomly-generated rows we check that
+  // GenerateOrdering agrees with RowOrdering.
+  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
+    test(s"GenerateOrdering with $dataType") {
--- End diff --

The nesting of the loops here is slightly misleading, because we'll always 
report a passed test for types where we don't have a data generator. We at 
least test that we're able to generate code for the ordering even if we don't 
actually execute that code. Maybe this is an okay trade-off, but it's a concern 
to watch out for.





[GitHub] spark pull request: [SPARK-8677][SQL] Fix non-terminating decimal ...

2015-07-01 Thread JihongMA
Github user JihongMA commented on the pull request:

https://github.com/apache/spark/pull/7056#issuecomment-117923839
  
Thanks for fixing this division problem. After rebasing with the fix, I noticed one more
issue w.r.t. the accuracy of Decimal computation.

scala> val aa = Decimal(2) / Decimal(3);
aa: org.apache.spark.sql.types.Decimal = 1

When a Decimal is defined as Decimal.Unlimited and we inherit the scale of the result from
its parent, we see a big accuracy issue (as in the example output above) once we go a couple
of rounds of division over decimal data vs. double data. Below is a sample output from my run
while testing my code change; as you can see, the decimal result is far off from its double
counterpart. Since you have been fixing issues around Decimal, I would like to see if we can
work out a more proper fix in this context. Is there a guideline about precision/scale
settings for Decimal.Unlimited when it comes to division?

10:27:46.042 WARN org.apache.spark.sql.catalyst.expressions.CombinePartialStdFunction: COMBINE STDDEV DOUBLE---4.0 , 0.8VALUE
10:27:46.137 WARN org.apache.spark.sql.catalyst.expressions.CombinePartialStdFunction: COMBINE STDDEV DECIMAL---4.29000 , 0.858VALUE
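
For comparison, plain `java.math.BigDecimal` shows the same tension (a standalone sketch, not Spark's `Decimal` code): without an explicit precision the exact quotient does not terminate, and with one it keeps its fractional digits.
```scala
import java.math.{BigDecimal => JBigDecimal, MathContext}

// Throws ArithmeticException("Non-terminating decimal expansion ..."):
// new JBigDecimal(2).divide(new JBigDecimal(3))

// With an explicit precision, the quotient keeps 34 significant digits:
new JBigDecimal(2).divide(new JBigDecimal(3), MathContext.DECIMAL128)
// => 0.6666666666666666666666666666666667
```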





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7179#discussion_r33750029
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -42,4 +47,47 @@ class CodeGenerationSuite extends SparkFunSuite {
 
     futures.foreach(Await.result(_, 10.seconds))
   }
+
+  // Test GenerateOrdering for all common types. For each type, we construct random input rows that
+  // contain two columns of that type, then for pairs of randomly-generated rows we check that
+  // GenerateOrdering agrees with RowOrdering.
+  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
+    test(s"GenerateOrdering with $dataType") {
+      val rowOrdering = RowOrdering.forSchema(Seq(dataType, dataType))
+      val genOrdering = GenerateOrdering.generate(
+        BoundReference(0, dataType, nullable = true).asc ::
+        BoundReference(1, dataType, nullable = true).asc :: Nil)
+      val rowType = StructType(
+        StructField("a", dataType, nullable = true) ::
+        StructField("b", dataType, nullable = true) :: Nil)
+      val toCatalyst = CatalystTypeConverters.createToCatalystConverter(rowType)
+      // Sort ordering is not defined for NaN, so skip any random inputs that contain it:
+      def isIncomparable(v: Any): Boolean = v match {
--- End diff --

Given that we might use sorting for clustering as part of a sort-based 
distinct operator, I wonder whether this has any bad implications for 
performing distinct on columns that contain NaN. Should we warn about this 
undefined behavior somewhere in our documentation?
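
For context, the ambiguity is already visible at the JVM level (a generic illustration, not Spark code):
```scala
Double.NaN == Double.NaN                                 // false: primitive equality never holds for NaN
java.lang.Double.valueOf(Double.NaN).equals(Double.NaN)  // true: boxed equals treats NaN as equal to itself
```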





[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33750007
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -174,8 +171,7 @@ case class Sha1(child: Expression) extends UnaryExpression with AutoCastInputTyp
  * A function that computes a cyclic redundancy check value and returns it as a bigint
  * For input of type [[BinaryType]]
  */
-case class Crc32(child: Expression)
-  extends UnaryExpression with AutoCastInputTypes {
+case class Crc32(child: Expression) extends UnaryExpression with ExpectsInputTypes {
--- End diff --

+1 for StringType -> BinaryType





[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33750006
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -174,8 +171,7 @@ case class Sha1(child: Expression) extends UnaryExpression with AutoCastInputTyp
  * A function that computes a cyclic redundancy check value and returns it as a bigint
  * For input of type [[BinaryType]]
  */
-case class Crc32(child: Expression)
-  extends UnaryExpression with AutoCastInputTypes {
+case class Crc32(child: Expression) extends UnaryExpression with ExpectsInputTypes {
--- End diff --

I think that's a good idea, but it probably makes things more complicated for auto casting.
(Which data type should we cast to?)





[GitHub] spark pull request: [SPARK-8777] Add random data generator test ut...

2015-07-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/7176#issuecomment-117923085
  
#7179 has a fun example of how I used these utilities to uncover a bug in 
`GenerateOrdering` while writing a high-coverage test for that class.





[GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...

2015-07-01 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/7080#discussion_r33749927
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -98,6 +98,15 @@ class LogisticRegression(override val uid: String)
   def setFitIntercept(value: Boolean): this.type = set(fitIntercept, value)
   setDefault(fitIntercept -> true)
 
+  /**
+   * Whether to standardize the training features prior to fitting the model sequence.
--- End diff --

This is copied from R's description. I think it's about fitting a sequence 
of models with different regularization. I will modify it to `to fitting the 
model`.

```R
standardize 
Logical flag for x variable standardization, prior to fitting the model 
sequence. 
The coefficients are always returned on the original scale. Default is 
standardize=TRUE. 
If variables are in the same units already, you might not wish to 
standardize. 
See details below for y standardization with family="gaussian".
```





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7179#issuecomment-117923013
  
  [Test build #36352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36352/consoleFull) for PR 7179 at commit [`f9efbb5`](https://github.com/apache/spark/commit/f9efbb5f317d28f8d38e1de9943fa9f976e8b5e5).





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7179#discussion_r33749911
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -42,4 +47,47 @@ class CodeGenerationSuite extends SparkFunSuite {
 
     futures.foreach(Await.result(_, 10.seconds))
   }
+
+  // Test GenerateOrdering for all common types. For each type, we construct random input rows that
+  // contain two columns of that type, then for pairs of randomly-generated rows we check that
+  // GenerateOrdering agrees with RowOrdering.
+  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
+    test(s"GenerateOrdering with $dataType") {
+      val rowOrdering = RowOrdering.forSchema(Seq(dataType, dataType))
+      val genOrdering = GenerateOrdering.generate(
+        BoundReference(0, dataType, nullable = true).asc ::
+        BoundReference(1, dataType, nullable = true).asc :: Nil)
+      val rowType = StructType(
+        StructField("a", dataType, nullable = true) ::
+        StructField("b", dataType, nullable = true) :: Nil)
+      val toCatalyst = CatalystTypeConverters.createToCatalystConverter(rowType)
+      // Sort ordering is not defined for NaN, so skip any random inputs that contain it:
+      def isIncomparable(v: Any): Boolean = v match {
--- End diff --

While working on this, I discovered that `RowOrdering` and `GenerateOrdering` disagree for
inputs containing NaN. This isn't a bug per se, since many systems have undefined behavior
when sorting on NaN. For this reason, I think that some databases prohibit NaN and Infinity
from being used.
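
One plausible source of such a disagreement, shown here only as a generic JVM illustration rather than the exact code paths involved: `java.lang.Double.compare` defines a total order that puts NaN after every other value, while primitive comparisons treat NaN as unordered.
```scala
java.lang.Double.compare(Double.NaN, 1.0)  // > 0: the boxed comparator sorts NaN last
Double.NaN > 1.0                           // false
Double.NaN < 1.0                           // false: primitive comparisons are all false for NaN
```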





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7179#issuecomment-117922918
  
Merged build started.





[GitHub] spark pull request: [SPARK-8656][WebUI] Fix the webUI and JSON API...

2015-07-01 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/7038#issuecomment-117922942
  
what's the non-compatible part?





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7179#issuecomment-117922900
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33749855
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -174,8 +171,7 @@ case class Sha1(child: Expression) extends UnaryExpression with AutoCastInputTyp
  * A function that computes a cyclic redundancy check value and returns it as a bigint
  * For input of type [[BinaryType]]
  */
-case class Crc32(child: Expression)
-  extends UnaryExpression with AutoCastInputTypes {
+case class Crc32(child: Expression) extends UnaryExpression with ExpectsInputTypes {
--- End diff --

I was thinking about having an AbstractDataType that is a TypeCollection, into which
expressions can put arbitrary types. Basically similar to the Seq[Any] idea, but with
better type safety.
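
A rough sketch of that idea, with hypothetical names just to make the shape concrete (not the actual implementation):
```scala
import org.apache.spark.sql.types.{BinaryType, DataType, StringType}

// Hypothetical: an abstract "expected type" that a TypeCollection can implement by
// accepting any one of several concrete DataTypes.
abstract class AbstractDataType {
  def acceptsType(other: DataType): Boolean
}

case class TypeCollection(types: Seq[DataType]) extends AbstractDataType {
  override def acceptsType(other: DataType): Boolean = types.contains(other)
}

// An expression such as Length could then declare something like:
//   inputTypes = Seq(TypeCollection(Seq(StringType, BinaryType)))
val stringOrBinary = TypeCollection(Seq(StringType, BinaryType))
```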






[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/7179#discussion_r33749834
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala ---
@@ -42,4 +47,47 @@ class CodeGenerationSuite extends SparkFunSuite {
 
     futures.foreach(Await.result(_, 10.seconds))
   }
+
+  // Test GenerateOrdering for all common types. For each type, we construct random input rows that
+  // contain two columns of that type, then for pairs of randomly-generated rows we check that
+  // GenerateOrdering agrees with RowOrdering.
+  (DataTypeTestUtils.atomicTypes ++ Set(NullType)).foreach { dataType =>
--- End diff --

This test is total overkill, but it's a neat example of how randomized data 
generation plus a list of types can be used for exploratory testing.





[GitHub] spark pull request: spark ssc.textFileStream returns empty

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/6837#issuecomment-117922732
  
@sduchh this is opened against the wrong branch. Please submit the change to the master
branch and file a JIRA here to describe the issue: https://issues.apache.org/jira/browse/SPARK.
In the meantime we should close this issue.





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/7179#issuecomment-117922725
  
This should block on #7176 being merged.





[GitHub] spark pull request: [SPARK-8496] [TEST] Do not run slow tests for ...

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/6917#issuecomment-117922651
  
I'm going to close this until we can do that. Note to self: DO NOT delete 
the branch.





[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33749803
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
@@ -704,19 +704,46 @@ object HiveTypeCoercion {
 
   /**
    * Casts types according to the expected input types for Expressions that have the trait
-   * [[AutoCastInputTypes]].
+   * [[ExpectsInputTypes]].
    */
   object ImplicitTypeCasts extends Rule[LogicalPlan] {
     def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
       // Skip nodes who's children have not been resolved yet.
       case e if !e.childrenResolved => e
 
-      case e: AutoCastInputTypes if e.children.map(_.dataType) != e.inputTypes =>
-        val newC = (e.children, e.children.map(_.dataType), e.inputTypes).zipped.map {
-          case (child, actual, expected) =>
-            if (actual == expected) child else Cast(child, expected)
+      case e: ExpectsInputTypes =>
+        val children: Seq[Expression] = e.children.zip(e.inputTypes).map { case (in, expected) =>
+          implicitCast(in, expected)
         }
-        e.withNewChildren(newC)
+        e.withNewChildren(children)
+    }
+
+    /**
+     * If needed, cast the expression into the expected type.
+     * If the implicit cast is not allowed, return the expression itself.
+     */
+    def implicitCast(e: Expression, expectedType: AbstractDataType): Expression = {
+      (e, expectedType) match {
+        // Cast null type (usually from null literals) into target types
+        case (in @ NullType(), target: DataType) => Cast(in, target.defaultConcreteType)
+
+        // Implicit cast among numeric types
+        case (in @ NumericType(), target: NumericType) if in.dataType != target =>
+          Cast(in, target)
+
+        // Implicit cast between date time types
+        case (in @ DateType(), TimestampType) => Cast(in, TimestampType)
+        case (in @ TimestampType(), DateType) => Cast(in, DateType)
+
+        // Implicit from string to atomic types, and vice versa
+        case (in @ StringType(), target: AtomicType) if target != StringType =>
+          Cast(in, target.defaultConcreteType)
+        case (in, StringType) if in.dataType != StringType =>
--- End diff --

Yes, that's what I was thinking, if we need to work with 
`Expression.checkInputTypes` or combine the logic.





[GitHub] spark pull request: [SPARK-8782] [SQL] Fix code generation for ORD...

2015-07-01 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/7179

[SPARK-8782] [SQL] Fix code generation for ORDER BY NULL

This fixes code generation for queries containing `ORDER BY NULL`.  
Previously, the generated code would fail to compile.

I added a traditional regression test for this issue, plus a much larger 
fuzz test built on top of the randomized testing utilities introduced in #7176. 
 This test failed to uncover any additional bugs in GenerateOrdering but did 
manage to discover an interesting corner-case involving NaNs.
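
For reference, a minimal way to exercise the affected code path (hypothetical table name):
```scala
// Before this patch, generating the ordering for the constant NULL sort key
// produced code that failed to compile.
sqlContext.sql("SELECT * FROM t ORDER BY NULL").collect()
```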

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark generate-order-fixes

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/7179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #7179


commit d2b4a4a9a2139b1a6c2be5d1f1aa3d98a6c9ed99
Author: Josh Rosen 
Date:   2015-07-02T03:18:05Z

Add random data generator test utilities to Spark SQL.

commit ab76cbd89bf800d590b7833f5a25c62df4ec2a95
Author: Josh Rosen 
Date:   2015-07-02T04:37:38Z

Move code to Catalyst package.

commit 5acdd5ccf36487ba49815e8e0429f4c99558d427
Author: Josh Rosen 
Date:   2015-07-02T05:15:13Z

Infinity and NaN are interesting.

commit b55875a05e4805cfdf2c3468a6cd50eec6a30578
Author: Josh Rosen 
Date:   2015-07-02T05:23:55Z

Generate doubles and floats over entire possible range.

commit 7d5c13ea39cc0b811cc57b58b4214395026b1432
Author: Josh Rosen 
Date:   2015-07-02T05:40:55Z

Add regression test for SPARK-8782 (ORDER BY NULL)

commit e7dc4fbb7c9e441c4367af7680c3acb42440ef33
Author: Josh Rosen 
Date:   2015-07-02T06:15:49Z

Add very generic test for ordering

commit f9efbb5f317d28f8d38e1de9943fa9f976e8b5e5
Author: Josh Rosen 
Date:   2015-07-02T06:17:28Z

Fix ORDER BY NULL




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8496] [TEST] Do not run slow tests for ...

2015-07-01 Thread andrewor14
Github user andrewor14 closed the pull request at:

https://github.com/apache/spark/pull/6917


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33749707
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
 ---
@@ -174,8 +171,7 @@ case class Sha1(child: Expression) extends 
UnaryExpression with AutoCastInputTyp
  * A function that computes a cyclic redundancy check value and returns it 
as a bigint
  * For input of type [[BinaryType]]
  */
-case class Crc32(child: Expression)
-  extends UnaryExpression with AutoCastInputTypes {
+case class Crc32(child: Expression) extends UnaryExpression with 
ExpectsInputTypes {
--- End diff --

I am not sure we can always cast a string to binary correctly, as it 
produces different binary output depending on the encoder. This is really a 
case of an expression accepting multiple `DataType`s. 
The same goes for `Length`, which supports both `StringType` and `BinaryType`.

We probably need another PR for this improvement.
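
To make the encoding concern concrete, a small standalone illustration using plain 
JVM APIs (not the Spark expression): the same string checksums differently 
depending on which charset turns it into bytes.

```scala
import java.util.zip.CRC32

// Compute a CRC32 checksum over an arbitrary byte array.
def crc32(bytes: Array[Byte]): Long = {
  val checksum = new CRC32()
  checksum.update(bytes)
  checksum.getValue
}

val s = "spark"
// Different encoders produce different bytes, hence different checksums.
println(crc32(s.getBytes("UTF-8")))
println(crc32(s.getBytes("UTF-16LE")))
```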


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8656][WebUI] Fix the webUI and JSON API...

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7038#issuecomment-117922427
  
Hm, I just saw @srowen's comment. Actually this change is not backward 
compatible, so I'm not sure we can merge it. @rxin, any thoughts?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8707] RDD#toDebugString fails if any ca...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7127#issuecomment-117922333
  
  [Test build #36350 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36350/consoleFull)
 for   PR 7127 at commit 
[`acf3661`](https://github.com/apache/spark/commit/acf36617d045c2fe28f1f2c6e1bf332f4d6cd463).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8213][SQL]Add function factorial

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6822#issuecomment-117922330
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8213][SQL]Add function factorial

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6822#issuecomment-117922327
  
  [Test build #36351 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36351/console)
 for   PR 6822 at commit 
[`c51a2e2`](https://github.com/apache/spark/commit/c51a2e266ad58fe2c868a37e4ddda3fd629318d5).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class Factorial(child: Expression)`
  * `case class UnHex(child: Expression) extends UnaryExpression with 
Serializable `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8687][YARN]Fix bug: Executor can't fetc...

2015-07-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7066


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8213][SQL]Add function factorial

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6822#issuecomment-117922112
  
  [Test build #36351 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36351/consoleFull)
 for   PR 6822 at commit 
[`c51a2e2`](https://github.com/apache/spark/commit/c51a2e266ad58fe2c868a37e4ddda3fd629318d5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread tarekauel
Github user tarekauel commented on a diff in the pull request:

https://github.com/apache/spark/pull/7178#discussion_r33749593
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
 ---
@@ -351,6 +351,108 @@ case class Pow(left: Expression, right: Expression)
   }
 }
 
+case class ShiftLeft(left: Expression, right: Expression) extends 
BinaryExpression {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+(left.dataType, right.dataType) match {
+  case (NullType, _) | (_, NullType) => return 
TypeCheckResult.TypeCheckSuccess
+  case (_, IntegerType) => left.dataType match {
+case LongType | IntegerType | ShortType | ByteType =>
+  return TypeCheckResult.TypeCheckSuccess
+case _ => // failed
+  }
+  case _ => // failed
+}
+TypeCheckResult.TypeCheckFailure(
+s"ShiftLeft expects long, integer, short or byte value as first 
argument and an " +
+  s"integer value as second argument, not (${left.dataType}, 
${right.dataType})")
+  }
+
+  override def eval(input: InternalRow): Any = {
+val valueLeft = left.eval(input)
+if (valueLeft != null) {
+  val valueRight = right.eval(input)
+  if (valueRight != null) {
+valueLeft match {
+  case l: Long => l << valueRight.asInstanceOf[Integer]
+  case i: Integer => i << valueRight.asInstanceOf[Integer]
+  case s: Short => s << valueRight.asInstanceOf[Integer]
+  case b: Byte => b << valueRight.asInstanceOf[Integer]
--- End diff --

What do you mean by overflow?
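
For context, a quick standalone check (plain Scala, nothing Spark-specific) of how 
`<<` already behaves for the narrower integral types: Byte and Short are promoted 
to Int before shifting, so the result can exceed the range of the input type.

```scala
val b: Byte = 64
val shiftedByte: Int = b << 2    // 256: promoted to Int, no longer fits in a Byte
val s: Short = 1024
val shiftedShort: Int = s << 20  // 1073741824: still representable as an Int
val l: Long = 1L
val shiftedLong: Long = l << 40  // 1099511627776: Long stays Long
println((shiftedByte, shiftedShort, shiftedLong))
```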


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8687][YARN]Fix bug: Executor can't fetc...

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7066#issuecomment-117922073
  
Merging into master. Thanks @SaintBacchus!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33749566
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -704,19 +704,46 @@ object HiveTypeCoercion {
 
   /**
* Casts types according to the expected input types for Expressions 
that have the trait
-   * [[AutoCastInputTypes]].
+   * [[ExpectsInputTypes]].
*/
   object ImplicitTypeCasts extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan 
transformAllExpressions {
   // Skip nodes who's children have not been resolved yet.
   case e if !e.childrenResolved => e
 
-  case e: AutoCastInputTypes if e.children.map(_.dataType) != 
e.inputTypes =>
-val newC = (e.children, e.children.map(_.dataType), 
e.inputTypes).zipped.map {
-  case (child, actual, expected) =>
-if (actual == expected) child else Cast(child, expected)
+  case e: ExpectsInputTypes =>
+val children: Seq[Expression] = e.children.zip(e.inputTypes).map { 
case (in, expected) =>
+  implicitCast(in, expected)
 }
-e.withNewChildren(newC)
+e.withNewChildren(children)
+}
+
+/**
+ * If needed, cast the expression into the expected type.
+ * If the implicit cast is not allowed, return the expression itself.
+ */
+def implicitCast(e: Expression, expectedType: AbstractDataType): 
Expression = {
+  (e, expectedType) match {
+// Cast null type (usually from null literals) into target types
+case (in @ NullType(), target: DataType) => Cast(in, 
target.defaultConcreteType)
+
+// Implicit cast among numeric types
+case (in @ NumericType(), target: NumericType) if in.dataType != 
target =>
+  Cast(in, target)
+
+// Implicit cast between date time types
+case (in @ DateType(), TimestampType) => Cast(in, TimestampType)
+case (in @ TimestampType(), DateType) => Cast(in, DateType)
+
+// Implicit from string to atomic types, and vice versa
+case (in @ StringType(), target: AtomicType) if target != 
StringType =>
+  Cast(in, target.defaultConcreteType)
+case (in, StringType) if in.dataType != StringType =>
--- End diff --

Btw, "never" only to a certain extent. I think there can be exceptions, but 
very few.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8213][SQL]Add function factorial

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6822#issuecomment-117921999
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8707] RDD#toDebugString fails if any ca...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7127#issuecomment-117922019
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8213][SQL]Add function factorial

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6822#issuecomment-117922024
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8707] RDD#toDebugString fails if any ca...

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7127#issuecomment-117921765
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8591][CORE]Block failed to unroll to me...

2015-07-01 Thread tdas
Github user tdas commented on a diff in the pull request:

https://github.com/apache/spark/pull/6990#discussion_r33749523
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -833,8 +833,10 @@ private[spark] class BlockManager(
 logDebug("Put block %s locally took %s".format(blockId, 
Utils.getUsedTimeMs(startTimeMs)))
 
 // Either we're storing bytes and we asynchronously started 
replication, or we're storing
-// values and need to serialize and replicate them now:
-if (putLevel.replication > 1) {
+// values and need to serialize and replicate them now.
+// Should not replicate the block if its StorageLevel is 
StorageLevel.NONE or
+// putting it to local is failed.
+if (!putBlockInfo.isFailed && putLevel.replication > 1) {
--- End diff --

Since those blocks are never used and eventually get evicted, the only
problem is slightly higher memory usage for some time. So I don't really see
a critical problem that needs to be solved at the cost of de-optimizing
existing code paths.

On Wed, Jul 1, 2015 at 11:01 PM, Dibyendu Bhattacharya <
notificati...@github.com> wrote:

> In core/src/main/scala/org/apache/spark/storage/BlockManager.scala
> :
>
> > @@ -833,8 +833,10 @@ private[spark] class BlockManager(
> >  logDebug("Put block %s locally took %s".format(blockId, 
Utils.getUsedTimeMs(startTimeMs)))
> >
> >  // Either we're storing bytes and we asynchronously started 
replication, or we're storing
> > -// values and need to serialize and replicate them now:
> > -if (putLevel.replication > 1) {
> > +// values and need to serialize and replicate them now.
> > +// Should not replicate the block if its StorageLevel is 
StorageLevel.NONE or
> > +// putting it to local is failed.
> > +if (!putBlockInfo.isFailed && putLevel.replication > 1) {
>
> The problem here is , if local memory got filled up and block store
> failed, blocks still get replicated to remote and used up memory but same
> blocks never used in Streaming jobs... Even though those blocks will
> eventually evicted , but this fix will optimize the memory. I understand
> your concern about RDD partition which can still use the remote replica 
for
> speedup even local store failed.
>
> —
> Reply to this email directly or view it on GitHub
> .
>



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8707] RDD#toDebugString fails if any ca...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7127#issuecomment-117921992
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3071] Increase default driver memory

2015-07-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7132


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/7178#discussion_r33749366
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
 ---
@@ -351,6 +351,108 @@ case class Pow(left: Expression, right: Expression)
   }
 }
 
+case class ShiftLeft(left: Expression, right: Expression) extends 
BinaryExpression {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+(left.dataType, right.dataType) match {
+  case (NullType, _) | (_, NullType) => return 
TypeCheckResult.TypeCheckSuccess
+  case (_, IntegerType) => left.dataType match {
+case LongType | IntegerType | ShortType | ByteType =>
+  return TypeCheckResult.TypeCheckSuccess
+case _ => // failed
+  }
+  case _ => // failed
+}
+TypeCheckResult.TypeCheckFailure(
+s"ShiftLeft expects long, integer, short or byte value as first 
argument and an " +
+  s"integer value as second argument, not (${left.dataType}, 
${right.dataType})")
+  }
+
+  override def eval(input: InternalRow): Any = {
+val valueLeft = left.eval(input)
+if (valueLeft != null) {
+  val valueRight = right.eval(input)
+  if (valueRight != null) {
+valueLeft match {
+  case l: Long => l << valueRight.asInstanceOf[Integer]
+  case i: Integer => i << valueRight.asInstanceOf[Integer]
+  case s: Short => s << valueRight.asInstanceOf[Integer]
+  case b: Byte => b << valueRight.asInstanceOf[Integer]
+}
+  } else {
+null
+  }
+} else {
+  null
+}
+  }
+
+  override def dataType: DataType = {
+left.dataType match {
+  case LongType => LongType
+  case IntegerType | ShortType | ByteType => IntegerType
+  case _ => NullType
+}
+  }
+
+  override protected def genCode(ctx: CodeGenContext, ev: 
GeneratedExpressionCode): String = {
+nullSafeCodeGen(ctx, ev, (result, left, right) => s"$result = $left << 
$right;")
+  }
+
+  override def toString: String = s"ShiftLeft($left, $right)"
--- End diff --

Should we use prettyName instead of toString?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8777] Add random data generator test ut...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7176#issuecomment-117921522
  
  [Test build #36339 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36339/console)
 for   PR 7176 at commit 
[`ab76cbd`](https://github.com/apache/spark/commit/ab76cbd89bf800d590b7833f5a25c62df4ec2a95).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8777] Add random data generator test ut...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7176#issuecomment-117921676
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3071] Increase default driver memory

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7132#issuecomment-117921384
  
Merging into master! Thanks @ilganeli. We'll update the release notes 
separately.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7178#discussion_r33749445
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
 ---
@@ -351,6 +351,108 @@ case class Pow(left: Expression, right: Expression)
   }
 }
 
+case class ShiftLeft(left: Expression, right: Expression) extends 
BinaryExpression {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+(left.dataType, right.dataType) match {
+  case (NullType, _) | (_, NullType) => return 
TypeCheckResult.TypeCheckSuccess
+  case (_, IntegerType) => left.dataType match {
+case LongType | IntegerType | ShortType | ByteType =>
+  return TypeCheckResult.TypeCheckSuccess
+case _ => // failed
+  }
+  case _ => // failed
+}
+TypeCheckResult.TypeCheckFailure(
+s"ShiftLeft expects long, integer, short or byte value as first 
argument and an " +
+  s"integer value as second argument, not (${left.dataType}, 
${right.dataType})")
+  }
+
+  override def eval(input: InternalRow): Any = {
+val valueLeft = left.eval(input)
+if (valueLeft != null) {
+  val valueRight = right.eval(input)
+  if (valueRight != null) {
+valueLeft match {
+  case l: Long => l << valueRight.asInstanceOf[Integer]
+  case i: Integer => i << valueRight.asInstanceOf[Integer]
+  case s: Short => s << valueRight.asInstanceOf[Integer]
+  case b: Byte => b << valueRight.asInstanceOf[Integer]
+}
+  } else {
+null
+  }
+} else {
+  null
+}
+  }
+
+  override def dataType: DataType = {
+left.dataType match {
+  case LongType => LongType
+  case IntegerType | ShortType | ByteType => IntegerType
+  case _ => NullType
+}
+  }
+
+  override protected def genCode(ctx: CodeGenContext, ev: 
GeneratedExpressionCode): String = {
+nullSafeCodeGen(ctx, ev, (result, left, right) => s"$result = $left << 
$right;")
+  }
+
+  override def toString: String = s"ShiftLeft($left, $right)"
--- End diff --

Actually, you don't need a toString here; just remove it.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8765] [MLlib] Fix PySpark PowerIteratio...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7177#issuecomment-117920641
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/7178#discussion_r33749385
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
 ---
@@ -351,6 +351,108 @@ case class Pow(left: Expression, right: Expression)
   }
 }
 
+case class ShiftLeft(left: Expression, right: Expression) extends 
BinaryExpression {
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+(left.dataType, right.dataType) match {
+  case (NullType, _) | (_, NullType) => return 
TypeCheckResult.TypeCheckSuccess
+  case (_, IntegerType) => left.dataType match {
+case LongType | IntegerType | ShortType | ByteType =>
+  return TypeCheckResult.TypeCheckSuccess
+case _ => // failed
+  }
+  case _ => // failed
+}
+TypeCheckResult.TypeCheckFailure(
+s"ShiftLeft expects long, integer, short or byte value as first 
argument and an " +
+  s"integer value as second argument, not (${left.dataType}, 
${right.dataType})")
+  }
+
+  override def eval(input: InternalRow): Any = {
+val valueLeft = left.eval(input)
+if (valueLeft != null) {
+  val valueRight = right.eval(input)
+  if (valueRight != null) {
+valueLeft match {
+  case l: Long => l << valueRight.asInstanceOf[Integer]
+  case i: Integer => i << valueRight.asInstanceOf[Integer]
+  case s: Short => s << valueRight.asInstanceOf[Integer]
+  case b: Byte => b << valueRight.asInstanceOf[Integer]
--- End diff --

Should we handle overflow?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33749395
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
 ---
@@ -174,8 +171,7 @@ case class Sha1(child: Expression) extends 
UnaryExpression with AutoCastInputTyp
  * A function that computes a cyclic redundancy check value and returns it 
as a bigint
  * For input of type [[BinaryType]]
  */
-case class Crc32(child: Expression)
-  extends UnaryExpression with AutoCastInputTypes {
+case class Crc32(child: Expression) extends UnaryExpression with 
ExpectsInputTypes {
--- End diff --

I need to think about whether we should support implicit casts from string 
to binary. SQL Server does support that. Hive doesn't, but Hive chose to make a 
lot of its UDFs work against both types.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8765] [MLlib] Fix PySpark PowerIteratio...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7177#issuecomment-117920387
  
  [Test build #36349 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36349/console)
 for   PR 7177 at commit 
[`392ae54`](https://github.com/apache/spark/commit/392ae5429f7daa6f5b06daabfde467a596162cfe).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/7178#discussion_r33749410
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1299,6 +1299,44 @@ object functions {
   def rint(columnName: String): Column = rint(Column(columnName))
 
   /**
+   * Shift the the given value numBits left. Returns int for tinyint, 
smallint and int and
+   * bigint for bigint a.
+   *
+   * @group math_funcs
+   * @since 1.5.0
+   */
+  def shiftLeft(e: Column, numBits: Integer): Column = ShiftLeft(e.expr, 
lit(numBits).expr)
--- End diff --

Integer -> Int ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8769][TRIVIAL][DOCS] toLocalIterator sh...

2015-07-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7171


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8740] [PROJECT INFRA] Support GitHub OA...

2015-07-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7136


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8740] [PROJECT INFRA] Support GitHub OA...

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7136#issuecomment-117920080
  
Merging into master, thanks @JoshRosen!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33749314
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -704,19 +704,46 @@ object HiveTypeCoercion {
 
   /**
* Casts types according to the expected input types for Expressions 
that have the trait
-   * [[AutoCastInputTypes]].
+   * [[ExpectsInputTypes]].
*/
   object ImplicitTypeCasts extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan 
transformAllExpressions {
   // Skip nodes who's children have not been resolved yet.
   case e if !e.childrenResolved => e
 
-  case e: AutoCastInputTypes if e.children.map(_.dataType) != 
e.inputTypes =>
-val newC = (e.children, e.children.map(_.dataType), 
e.inputTypes).zipped.map {
-  case (child, actual, expected) =>
-if (actual == expected) child else Cast(child, expected)
+  case e: ExpectsInputTypes =>
+val children: Seq[Expression] = e.children.zip(e.inputTypes).map { 
case (in, expected) =>
+  implicitCast(in, expected)
 }
-e.withNewChildren(newC)
+e.withNewChildren(children)
+}
+
+/**
+ * If needed, cast the expression into the expected type.
+ * If the implicit cast is not allowed, return the expression itself.
+ */
+def implicitCast(e: Expression, expectedType: AbstractDataType): 
Expression = {
+  (e, expectedType) match {
+// Cast null type (usually from null literals) into target types
+case (in @ NullType(), target: DataType) => Cast(in, 
target.defaultConcreteType)
+
+// Implicit cast among numeric types
+case (in @ NumericType(), target: NumericType) if in.dataType != 
target =>
+  Cast(in, target)
+
+// Implicit cast between date time types
+case (in @ DateType(), TimestampType) => Cast(in, TimestampType)
+case (in @ TimestampType(), DateType) => Cast(in, DateType)
+
+// Implicit from string to atomic types, and vice versa
+case (in @ StringType(), target: AtomicType) if target != 
StringType =>
+  Cast(in, target.defaultConcreteType)
+case (in, StringType) if in.dataType != StringType =>
--- End diff --

I don't think that makes sense w.r.t. "semantic context".

As a matter of fact, most mature databases have well-defined semantics for 
implicit type casting. If we make numeric type -> StringType an implicit cast 
rule, we should apply it for all expressions. If we don't want that implicit 
cast rule, then we should never do the implicit cast. I'd argue that having 
some expressions do implicit casting and others not is extremely 
confusing.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8769][TRIVIAL][DOCS] toLocalIterator sh...

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7171#issuecomment-117919970
  
Merging into master 1.4.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8771][TRIVIAL] Add a version to the dep...

2015-07-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7172


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5016][MLLib] Distribute GMM mixture com...

2015-07-01 Thread feynmanliang
Github user feynmanliang commented on the pull request:

https://github.com/apache/spark/pull/7166#issuecomment-117919850
  
You're totally right; sorry about that! Totally slipped my mind. Will do
next time.

On Wed, Jul 1, 2015 at 9:01 PM Manoj Kumar  wrote:

> Thanks for picking up on this. Next time it might be better to just
> cherry-pick the commits from the branch as contributor and commit history
> is not lost.
>
> —
> Reply to this email directly or view it on GitHub
> .
>



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8223][SPARK-8224][SQL] shift left and s...

2015-07-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/7178#discussion_r33749252
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -412,6 +412,32 @@ def sha2(col, numBits):
 return Column(jc)
 
 
+@since(1.5)
+def shiftLeft(col, numBits):
+"""Shift the the given value numBits left. Returns int for tinyint, 
smallint and int and
+bigint for bigint a.
--- End diff --

Cannot understand the last sentence


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8771][TRIVIAL] Add a version to the dep...

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7172#issuecomment-117919790
  
Merging into master 1.4. Thanks @holdenk.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8688][YARN]Bug fix: disable the cache f...

2015-07-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7069


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8591][CORE]Block failed to unroll to me...

2015-07-01 Thread dibbhatt
Github user dibbhatt commented on a diff in the pull request:

https://github.com/apache/spark/pull/6990#discussion_r33749161
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -833,8 +833,10 @@ private[spark] class BlockManager(
 logDebug("Put block %s locally took %s".format(blockId, 
Utils.getUsedTimeMs(startTimeMs)))
 
 // Either we're storing bytes and we asynchronously started 
replication, or we're storing
-// values and need to serialize and replicate them now:
-if (putLevel.replication > 1) {
+// values and need to serialize and replicate them now.
+// Should not replicate the block if its StorageLevel is 
StorageLevel.NONE or
+// putting it to local is failed.
+if (!putBlockInfo.isFailed && putLevel.replication > 1) {
--- End diff --

The problem here is: if local memory gets filled up and the block store fails, 
blocks still get replicated to remote nodes and use up memory there, but those 
blocks are never used in Streaming jobs... Even though those blocks will 
eventually be evicted, this fix will optimize the memory usage. I understand 
your concern about RDD partitions, which can still use the remote replica for a 
speedup even when the local store fails. 




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8688][YARN]Bug fix: disable the cache f...

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7069#issuecomment-117919596
  
Alright then, I'm merging this into master 1.4.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8754][YARN] YarnClientSchedulerBackend ...

2015-07-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7153


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33749145
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 ---
@@ -704,19 +704,46 @@ object HiveTypeCoercion {
 
   /**
* Casts types according to the expected input types for Expressions 
that have the trait
-   * [[AutoCastInputTypes]].
+   * [[ExpectsInputTypes]].
*/
   object ImplicitTypeCasts extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan 
transformAllExpressions {
   // Skip nodes who's children have not been resolved yet.
   case e if !e.childrenResolved => e
 
-  case e: AutoCastInputTypes if e.children.map(_.dataType) != 
e.inputTypes =>
-val newC = (e.children, e.children.map(_.dataType), 
e.inputTypes).zipped.map {
-  case (child, actual, expected) =>
-if (actual == expected) child else Cast(child, expected)
+  case e: ExpectsInputTypes =>
+val children: Seq[Expression] = e.children.zip(e.inputTypes).map { 
case (in, expected) =>
+  implicitCast(in, expected)
 }
-e.withNewChildren(newC)
+e.withNewChildren(children)
+}
+
+/**
+ * If needed, cast the expression into the expected type.
+ * If the implicit cast is not allowed, return the expression itself.
+ */
+def implicitCast(e: Expression, expectedType: AbstractDataType): 
Expression = {
+  (e, expectedType) match {
+// Cast null type (usually from null literals) into target types
+case (in @ NullType(), target: DataType) => Cast(in, 
target.defaultConcreteType)
+
+// Implicit cast among numeric types
+case (in @ NumericType(), target: NumericType) if in.dataType != 
target =>
+  Cast(in, target)
+
+// Implicit cast between date time types
+case (in @ DateType(), TimestampType) => Cast(in, TimestampType)
+case (in @ TimestampType(), DateType) => Cast(in, DateType)
+
+// Implicit from string to atomic types, and vice versa
+case (in @ StringType(), target: AtomicType) if target != 
StringType =>
+  Cast(in, target.defaultConcreteType)
+case (in, StringType) if in.dataType != StringType =>
--- End diff --

Agree with @yhuai, we should consider more of the semantic context when 
casting to `StringType`.
E.g. for `trim(value)`, the `value` should be exactly `StringType`,
   but for `repeat(value, times)`, the `value` could be `StringType` or even 
any of the `NumericType`s.
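
As a rough illustration of that point, with hypothetical Scala helpers standing in 
for the SQL functions (not Catalyst expressions): under a blanket numeric-to-string 
implicit cast both calls below type-check, even though only the second one is 
likely what the user meant.

```scala
import scala.language.implicitConversions

// Hypothetical helpers standing in for the SQL functions under discussion.
def trim(value: String): String = value.trim
def repeat(value: String, times: Int): String = value * times

// A blanket "numeric -> string" implicit cast, modeled as a Scala implicit conversion.
implicit def intToString(i: Int): String = i.toString

// trim(42) compiles and returns "42", which is rarely what the user intended,
// while repeat(42, 2) returns "4242", which plausibly is.
println(trim(42))
println(repeat(42, 2))
```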


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7175#issuecomment-117919036
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/7175#discussion_r33749127
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
 ---
@@ -174,8 +171,7 @@ case class Sha1(child: Expression) extends 
UnaryExpression with AutoCastInputTyp
  * A function that computes a cyclic redundancy check value and returns it 
as a bigint
  * For input of type [[BinaryType]]
  */
-case class Crc32(child: Expression)
-  extends UnaryExpression with AutoCastInputTypes {
+case class Crc32(child: Expression) extends UnaryExpression with 
ExpectsInputTypes {
--- End diff --

Crc32 should be able to work with StringType, but StringType cannot be 
implicitly cast to BinaryType, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8772][SQL] Implement implicit type cast...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7175#issuecomment-117918991
  
  [Test build #36336 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36336/console)
 for   PR 7175 at commit 
[`f0ff97f`](https://github.com/apache/spark/commit/f0ff97feeb3fc6a9c41a0ec6dc7a1daf1230dad6).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait ExpectsInputTypes `
  * `abstract class BinaryExpression extends Expression with 
trees.BinaryNode[Expression] `
  * `abstract class BinaryOperator extends BinaryExpression `
  * `abstract class BinaryArithmetic extends BinaryOperator `
  * `case class Md5(child: Expression) extends UnaryExpression with 
ExpectsInputTypes `
  * `case class Sha1(child: Expression) extends UnaryExpression with 
ExpectsInputTypes `
  * `case class Crc32(child: Expression) extends UnaryExpression with 
ExpectsInputTypes `
  * `case class Not(child: Expression) extends UnaryExpression with 
Predicate with ExpectsInputTypes `
  * `abstract class BinaryComparison extends BinaryOperator with Predicate `
  * `trait StringRegexExpression extends ExpectsInputTypes `
  * `trait CaseConversionExpression extends ExpectsInputTypes `
  * `trait StringComparison extends ExpectsInputTypes `
  * `case class StringLength(child: Expression) extends UnaryExpression 
with ExpectsInputTypes `
  * `protected[sql] abstract class AtomicType extends DataType `
  * `abstract class NumericType extends AtomicType `
  * `abstract class DataType extends AbstractDataType `



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8754][YARN] YarnClientSchedulerBackend ...

2015-07-01 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/7153#issuecomment-117919170
  
Merging into master 1.4. Thanks @devaraj-kavali 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8708][MLlib] Paritition ALS ratings bas...

2015-07-01 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/7121#discussion_r33748863
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 ---
@@ -88,12 +88,25 @@ class MatrixFactorizationModel(
* @return RDD of Ratings.
*/
   def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
-val users = userFeatures.join(usersProducts).map {
-  case (user, (uFeatures, product)) => (product, (user, uFeatures))
-}
-users.join(productFeatures).map {
-  case (product, ((user, uFeatures), pFeatures)) =>
-Rating(user, product, blas.ddot(uFeatures.length, uFeatures, 1, 
pFeatures, 1))
+val usersCount = usersProducts.keys.countApproxDistinct()
--- End diff --

Btw, if we really want to optimize the number of passes, we can copy the code 
from 
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1170
 and use one pass to count both columns.
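
For the record, a simplified non-Spark sketch of the "one pass, both columns" 
idea, using exact `Set`s instead of the approximate counters the linked code 
presumably relies on:

```scala
// Fold once over the (user, product) pairs, maintaining one distinct-set per column.
val usersProducts = Seq((1, 10), (1, 11), (2, 10), (3, 12))
val (users, products) =
  usersProducts.foldLeft((Set.empty[Int], Set.empty[Int])) {
    case ((userSet, productSet), (user, product)) =>
      (userSet + user, productSet + product)
  }
println(s"distinct users = ${users.size}, distinct products = ${products.size}")
```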


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8765] [MLlib] Fix PySpark PowerIteratio...

2015-07-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7177#issuecomment-117918101
  
  [Test build #36349 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/36349/consoleFull)
 for   PR 7177 at commit 
[`392ae54`](https://github.com/apache/spark/commit/392ae5429f7daa6f5b06daabfde467a596162cfe).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8765] [MLlib] Fix PySpark PowerIteratio...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7177#issuecomment-117918010
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8765] [MLlib] Fix PySpark PowerIteratio...

2015-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7177#issuecomment-117917993
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


