[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054651

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -340,3 +341,91 @@ object CaseKeyWhen {
     CaseWhen(cases, elseValue)
   }
 }
+
+/**
+ * A function that returns the index of str in (str1, str2, ...) list or 0 if not found.
+ * It takes at least 2 parameters, and all parameters' types should be subtypes of AtomicType.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
+  extended = """
+    Examples:
+      > SELECT _FUNC_(10, 9, 3, 10, 4);
+       3
+  """)
+case class Field(children: Seq[Expression]) extends Expression {
+
+  override def nullable: Boolean = false
+  override def foldable: Boolean = children.forall(_.foldable)
+
+  private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (children.length <= 1) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments")
+    } else if (!children.forall(_.dataType.isInstanceOf[AtomicType])) {
+      TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType")
+    } else
+      TypeCheckResult.TypeCheckSuccess
+  }
+
+  override def dataType: DataType = IntegerType
+
+  override def eval(input: InternalRow): Any = {
+    val target = children.head.eval(input)
+    val targetDataType = children.head.dataType
+    def findEqual(target: Any, params: Seq[Expression], index: Int): Int = {
+      params.toList match {
+        case Nil => 0
+        case head::tail if targetDataType == head.dataType
+          && head.eval(input) != null && ordering.equiv(target, head.eval(input)) => index
+        case _ => findEqual(target, params.tail, index + 1)
+      }
+    }
+    if(target == null)
+      0
+    else
+      findEqual(target, children.tail, 1)
--- End diff --

`findEqual(target, children.tail, index = 1)`
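For orientation, the expression performs a 1-based first-match scan over its trailing arguments; the reviewer's note merely suggests passing the starting index as a named argument. A self-contained sketch of the same logic on plain Scala values (illustrative only, not the PR's code; null handling elided):

```scala
// 1-based position of the first candidate equal to target, or 0 if absent —
// a plain-Scala analogue of the PR's findEqual recursion.
def fieldIndex[A](target: A, candidates: Seq[A]): Int = {
  @annotation.tailrec
  def loop(rest: Seq[A], index: Int): Int = rest match {
    case Seq()        => 0
    case head +: tail => if (head == target) index else loop(tail, index + 1)
  }
  loop(candidates, index = 1)
}

fieldIndex(10, Seq(9, 3, 10, 4))  // == 3, matching the docstring example
```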
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054605

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala ---
@@ -137,4 +139,48 @@ class ConditionalExpressionSuite extends SparkFunSuite with ExpressionEvalHelper
     checkEvaluation(CaseKeyWhen(c6, Seq(c5, c2, c4, c3)), null, row)
     checkEvaluation(CaseKeyWhen(literalNull, Seq(c2, c5, c1, c6)), null, row)
   }
+
+  test("case field") {
+    val str1 = Literal("花花世界")
+    val str2 = Literal("a")
+    val str3 = Literal("b")
+    val str4 = Literal("")
+    val str5 = Literal("999")
+    val strNull = Literal.create(null, StringType)
+
+    val bool1 = Literal(true)
+    val bool2 = Literal(false)
+
+    val int1 = Literal(1)
+    val int2 = Literal(2)
+    val int3 = Literal(3)
+    val int4 = Literal(999)
+    val intNull = Literal.create(null, IntegerType)
+
+    val double1 = Literal(1.221)
+    val double2 = Literal(1.222)
+    val double3 = Literal(1.224)
+
+    val timeStamp1 = Literal(new Timestamp(2016, 12, 27, 14, 22, 1, 1))
+    val timeStamp2 = Literal(new Timestamp(1988, 6, 3, 1, 1, 1, 1))
+    val timeStamp3 = Literal(new Timestamp(1990, 6, 5, 1, 1, 1, 1))
+
+    val date1 = Literal(new Date(1949, 1, 1))
+    val date2 = Literal(new Date(1979, 1, 1))
+    val date3 = Literal(new Date(1989, 1, 1))
+
+    checkEvaluation(Field(Seq(str1, str2, str3, str1)), 3)
+    checkEvaluation(Field(Seq(str2, str2, str2, str1)), 1)
+    checkEvaluation(Field(Seq(str4, str4, str4, str1)), 1)
+    checkEvaluation(Field(Seq(bool1, bool2, bool1, bool1)), 2)
+    checkEvaluation(Field(Seq(int1, int2, int3, int1)), 3)
+    checkEvaluation(Field(Seq(double2, double3, double1, double2)), 3)
+    checkEvaluation(Field(Seq(timeStamp1, timeStamp2, timeStamp3, timeStamp1)), 3)
+    checkEvaluation(Field(Seq(date1, date1, date2, date3)), 1)
+    checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int4)), 6)
+    checkEvaluation(Field(Seq(str5, str1, str2, str4)), 0)
+    checkEvaluation(Field(Seq(int4, double3, str5, bool1, date1, timeStamp2, int3)), 0)
+    checkEvaluation(Field(Seq(int1, strNull, intNull, bool1, date1, timeStamp2, int3)), 0)
--- End diff --

What is the purpose of these checks? Based on MySQL's `field` function, the type-casting rules are described as:

```
If all arguments to FIELD() are strings, all arguments are compared as strings.
If all arguments are numbers, they are compared as numbers.
Otherwise, the arguments are compared as double.
```
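As a rough sketch, the MySQL rule the reviewer quotes amounts to picking a common comparison type before scanning. The helper below is hypothetical (name and placement are assumptions, not part of the PR):

```scala
import org.apache.spark.sql.types._

// Hypothetical helper mirroring MySQL's FIELD() comparison rule:
// all strings -> compare as strings, all integers -> compare as numbers,
// anything mixed -> compare as double.
def commonComparisonType(types: Seq[DataType]): DataType = {
  val integral: Set[DataType] = Set(ByteType, ShortType, IntegerType, LongType)
  if (types.forall(_ == StringType)) StringType
  else if (types.forall(integral.contains)) LongType
  else DoubleType
}
```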
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054575

--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala ---
(same "case field" test hunk as quoted above, ending with:)
+    checkEvaluation(Field(Seq(strNull, int1, str1, str2, str3)), 0)
--- End diff --

This is to test `null`. Could you add the description?

```
If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value.
```
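Mirroring the suite's own pattern, the two NULL behaviors could be asserted explicitly — a sketch reusing the literals defined in the test above:

```scala
// NULL search value: equality with NULL always fails, so the result is 0.
checkEvaluation(Field(Seq(strNull, str1, str2)), 0)
// A NULL candidate is skipped; a later match still reports its 1-based position.
checkEvaluation(Field(Seq(int1, intNull, int1)), 2)
```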
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054547

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
(same Field hunk as quoted above; the comment targets the unbraced branches:)
+    if(target == null)
+      0
+    else
+      findEqual(target, children.tail, 1)
--- End diff --

Could you fix the style, based on https://github.com/databricks/scala-style-guide#curly?
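Per that guide, if/else branches that span their own lines should always be braced. A minimal sketch of the fix, folding in the named-argument suggestion from earlier in the thread:

```scala
// Braced branches per the Databricks Scala style guide.
if (target == null) {
  0
} else {
  findEqual(target, children.tail, index = 1)
}
```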
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054508

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
(same Field hunk as quoted above; the comment targets the documented examples:)
+  extended = """
+    Examples:
+      > SELECT _FUNC_(10, 9, 3, 10, 4);
+       3
+  """)
--- End diff --

More examples, please?
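Candidate additions that would exercise the documented behaviors — a string match, a miss, and a NULL search (suggested examples only, not from the PR; results follow the eval logic quoted earlier):

```scala
  extended = """
    Examples:
      > SELECT _FUNC_(10, 9, 3, 10, 4);
       3
      > SELECT _FUNC_('b', 'a', 'b', 'c');
       2
      > SELECT _FUNC_(1, 2, 3);
       0
      > SELECT _FUNC_(NULL, 1, 2);
       0
  """)
```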
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r95054387

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
(same Field hunk as quoted above; the comment targets the usage string:)
+  usage = "_FUNC_(str, str1, str2, ...) - Returns the index of str in the str1,str2,... or 0 if not found.",
--- End diff --

Can we use `expr1, expr2, expr3` here? The type can be any atomic type?
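A possible rewording along the lines the reviewer asks for (a suggestion only, not the merged text):

```scala
usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the 1-based index of the first exprN that equals expr, or 0 if none matches. Arguments may be of any atomic type.",
```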
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71006 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71006/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/16493

retest this please
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71005/
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71005/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493

Merged build finished. Test FAILed.
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95053274

--- Diff: docs/sql-programming-guide.md ---
@@ -1362,6 +1362,13 @@ options.
  - Dataset and DataFrame API `explode` has been deprecated, alternatively, use `functions.explode()` with `select` or `flatMap`
  - Dataset and DataFrame API `registerTempTable` has been deprecated and replaced by `createOrReplaceTempView`
+ - Changes to `CREATE TABLE ... LOCATION` behavior.
+   - From Spark 2.0, `CREATE TABLE ... LOCATION` is equivalent to `CREATE EXTERNAL TABLE ... LOCATION`
+     in order to prevent accidental dropping the existing data in the user-provided locations.
+     Please see [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276) for details.
+   - As a result, `DROP TABLE` statements on those tables will not remove the data.
+     Note that this is different than the Hive behavior.
--- End diff --

Now, we can remove this sentence.
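For context, the behavior being documented looks like this in practice (illustrative statements; the path is made up):

```sql
-- In Spark 2.0+, a user-supplied LOCATION makes the Hive table external.
CREATE TABLE t (id INT) LOCATION '/some/existing/dir';

-- Drops only the metadata; the files under /some/existing/dir are kept.
DROP TABLE t;
```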
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95053222

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+   - From Spark 2.0, `CREATE TABLE ... LOCATION` is equivalent to `CREATE EXTERNAL TABLE ... LOCATION`
+     in order to prevent accidental dropping the existing data in the user-provided locations.
--- End diff --

Also add two more sentences here:

```
That means a Hive table created in Spark SQL with a user-specified location is a Hive external table. Users are not allowed to specify the location for Hive managed tables.
```
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71004/
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493

Merged build finished. Test PASSed.
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71004 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71004/testReport)** for PR 16493 at commit [`f9f0b01`](https://github.com/apache/spark/commit/f9f0b01e5cf6e8a6a212324686e82b0c4bf1b5fc).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #16488: [MINOR] Bump R version to 2.2.0.
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/16488

Yeah, I think it does get automatically updated during the release, but it's good to keep this in sync just for the development builds etc.
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71005/testReport)** for PR 16493 at commit [`3c779d5`](https://github.com/apache/spark/commit/3c779d59fa54f9ed62a3ebef260b097695c0eff1).
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user actuaryzhang commented on the issue: https://github.com/apache/spark/pull/16344

@srowen @yanboliang I'm closing this PR since it does not seem to integrate cleanly into the current GLM setup. I appreciate all the comments and discussions.
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95052715

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+   - From Spark 2.0, `CREATE TABLE ... LOCATION` is equivalent to `CREATE EXTERNAL TABLE ... LOCATION`
+     in order to prevent accidental dropping the existing data in the user-provided locations.
--- End diff --

Wait, I need to rephrase it.
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95052705

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+     Please see [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276) for details.
--- End diff --

Nit: no need to show the JIRA here. Please remove it.
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95052691

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+   - From Spark 2.0, `CREATE TABLE ... LOCATION` is equivalent to `CREATE EXTERNAL TABLE ... LOCATION`
+     in order to prevent accidental dropping the existing data in the user-provided locations.
--- End diff --

Also add one more sentence here: `Users are not allowed to specify the location for managed tables.`
[GitHub] spark pull request #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user actuaryzhang closed the pull request at: https://github.com/apache/spark/pull/16344
[GitHub] spark pull request #16400: [SPARK-18941][SQL][DOC] Add a new behavior docume...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16400#discussion_r95052654

--- Diff: docs/sql-programming-guide.md ---
(same `CREATE TABLE ... LOCATION` hunk as quoted above; the comment targets:)
+ - Changes to `CREATE TABLE ... LOCATION` behavior.
--- End diff --

`behavior.` -> `behavior for Hive tables.`
[GitHub] spark issue #15505: [SPARK-17931][CORE] taskScheduler has some unneeded seri...
Github user witgo commented on the issue: https://github.com/apache/spark/pull/15505

@kayousterhout Okay, I'll do the code revision this weekend.
[GitHub] spark pull request #16439: [SPARK-19026]SPARK_LOCAL_DIRS(multiple directorie...
Github user zuotingbing commented on a diff in the pull request: https://github.com/apache/spark/pull/16439#discussion_r95052089

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
@@ -445,12 +445,24 @@ private[deploy] class Worker(
       // Create local dirs for the executor. These are passed to the executor via the
       // SPARK_EXECUTOR_DIRS environment variable, and deleted by the Worker when the
       // application finishes.
-      val appLocalDirs = appDirectories.getOrElse(appId,
-        Utils.getOrCreateLocalRootDirs(conf).map { dir =>
-          val appDir = Utils.createDirectory(dir, namePrefix = "executor")
-          Utils.chmod700(appDir)
-          appDir.getAbsolutePath()
-        }.toSeq)
+      val appLocalDirs = appDirectories.getOrElse(appId, {
+        val dirs = Utils.getOrCreateLocalRootDirs(conf).flatMap { dir =>
+          try {
+            val appDir = Utils.createDirectory(dir, namePrefix = "executor")
+            Utils.chmod700(appDir)
+            Some(appDir.getAbsolutePath())
+          } catch {
+            case e: IOException =>
+              logWarning(s"${e.getMessage}. Ignoring this directory.")
+              None
+          }
+        }.toSeq
+        if (dirs.isEmpty) {
+          throw new IOException("None subfolder can be created in " +
+            s"${Utils.getOrCreateLocalRootDirs(conf).mkString(",")}.")
--- End diff --

Thanks vanzin. I will commit it.
[GitHub] spark pull request #16439: [SPARK-19026]SPARK_LOCAL_DIRS(multiple directorie...
Github user zuotingbing commented on a diff in the pull request: https://github.com/apache/spark/pull/16439#discussion_r95052088

--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ---
(same Worker hunk as quoted above; the comment targets:)
+        if (dirs.isEmpty) {
+          throw new IOException("None subfolder can be created in " +
--- End diff --

Thanks vanzin. I will commit it.
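Presumably the change being committed includes a wording fix to the exception message itself; a guess at the corrected form (vanzin's exact suggestion is not quoted in this thread):

```scala
// Grammatical fix to the error message; same semantics as the diff above.
throw new IOException("No subfolder can be created in " +
  s"${Utils.getOrCreateLocalRootDirs(conf).mkString(",")}.")
```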
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95052038

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
@@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext
       case i: InMemoryRelation => i
     }.size == 1)
   }
+
+  test("SPARK-19093 Caching in side subquery") {
+    withTempView("t1") {
+      Seq(1).toDF("c1").createOrReplaceTempView("t1")
+      spark.catalog.cacheTable("t1")
+      val cachedPlan =
+        sql(
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |NOT EXISTS (SELECT * FROM t1)
+          """.stripMargin).queryExecution.optimizedPlan
+      assert(
+        cachedPlan.collect {
+          case i: InMemoryRelation => i
+        }.size == 2)
+      spark.catalog.uncacheTable("t1")
+    }
+  }
+
+  test("SPARK-19093 scalar and nested predicate query") {
+    def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = {
+      plan collect {
+        case i: InMemoryRelation => i
+      }
+    }
+    withTempView("t1", "t2", "t3", "t4") {
+      Seq(1).toDF("c1").createOrReplaceTempView("t1")
+      Seq(2).toDF("c1").createOrReplaceTempView("t2")
+      Seq(1).toDF("c1").createOrReplaceTempView("t3")
+      Seq(1).toDF("c1").createOrReplaceTempView("t4")
+      spark.catalog.cacheTable("t1")
+      spark.catalog.cacheTable("t2")
+      spark.catalog.cacheTable("t3")
+      spark.catalog.cacheTable("t4")
+
+      // Nested predicate subquery
+      val cachedPlan =
+        sql(
+          """
+            |SELECT * FROM t1
+            |WHERE
+            |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
+          """.stripMargin).queryExecution.optimizedPlan
+
+      assert(
+        cachedPlan.collect {
+          case i: InMemoryRelation => i
+        }.size == 3)
+
+      // Scalar subquery and predicate subquery
+      val cachedPlan2 =
+        sql(
+          """
+            |SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1)
+            |WHERE
+            |c1 = (SELECT max(c1) FROM t2 GROUP BY c1)
+            |OR
+            |EXISTS (SELECT c1 FROM t3)
+            |OR
+            |c1 IN (SELECT c1 FROM t4)
+          """.stripMargin).queryExecution.optimizedPlan
+
+      val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]]
+      cachedRelations += getCachedPlans(cachedPlan2)
+      cachedPlan2 transformAllExpressions {
+        case e: SubqueryExpression =>
+          cachedRelations += getCachedPlans(e.plan)
+          e
+      }
+      assert(cachedRelations.flatten.size == 4)
+
+      spark.catalog.uncacheTable("t1")
+      spark.catalog.uncacheTable("t2")
+      spark.catalog.uncacheTable("t3")
+      spark.catalog.uncacheTable("t4")
--- End diff --

@gatorsmile Sorry, missed this one. Will make the change.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95051560

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
(same SPARK-19093 test hunk as quoted above; the comment targets:)
+  test("SPARK-19093 scalar and nested predicate query") {
+
+
--- End diff --

Nit: remove these two lines.
[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16495

Merged build finished. Test PASSed.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95051550

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
(same SPARK-19093 test hunk as quoted above; the comment targets:)
+      assert(
+        cachedPlan.collect {
+          case i: InMemoryRelation => i
+        }.size == 2)
--- End diff --

The same here.
[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16495

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71003/
[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16495

**[Test build #71003 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71003/testReport)** for PR 16495 at commit [`d3e2dad`](https://github.com/apache/spark/commit/d3e2dadaa767bd3fec10ba329625ceaa5ccabcbb).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95051546

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
(same SPARK-19093 test hunk as quoted above; the comment targets:)
+      spark.catalog.uncacheTable("t1")
+      spark.catalog.uncacheTable("t2")
+      spark.catalog.uncacheTable("t3")
+      spark.catalog.uncacheTable("t4")
--- End diff --

How about this? @dilipbiswal
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493

**[Test build #71004 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71004/testReport)** for PR 16493 at commit [`f9f0b01`](https://github.com/apache/spark/commit/f9f0b01e5cf6e8a6a212324686e82b0c4bf1b5fc).
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95051398

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
@@ -131,6 +132,12 @@ class CacheManager extends Logging {
   /** Replaces segments of the given logical plan with cached versions where possible. */
   def useCachedData(plan: LogicalPlan): LogicalPlan = {
+    useCachedDataInternal(plan) transformAllExpressions {
+      case s: SubqueryExpression => s.withNewPlan(useCachedData(s.plan))
+    }
+  }
+
+  private def useCachedDataInternal(plan: LogicalPlan): LogicalPlan = {
--- End diff --

@gatorsmile Thank you very much. I have addressed your comments.
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15119

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71001/
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15119

Merged build finished. Test PASSed.
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15119

**[Test build #71001 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71001/testReport)** for PR 15119 at commit [`7a7b6ba`](https://github.com/apache/spark/commit/7a7b6ba213e57a705642d42220037a7b9a18e3a6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050805

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
(same CacheManager hunk as quoted above; the comment targets:)
+  private def useCachedDataInternal(plan: LogicalPlan): LogicalPlan = {
--- End diff --

@gatorsmile Sure.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050799

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala ---
(same SPARK-19093 test hunk as quoted above; the comment targets:)
+      assert(cachedRelations.flatten.size == 4)
--- End diff --

@gatorsmile Thanks, I will make the change.
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050745

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala ---
(same CacheManager hunk as quoted above; the comment targets:)
+  private def useCachedDataInternal(plan: LogicalPlan): LogicalPlan = {
--- End diff --

After rethinking it, we do not need to add a new function. We can combine them into a single one, like:

```Scala
/** Replaces segments of the given logical plan with cached versions where possible. */
def useCachedData(plan: LogicalPlan): LogicalPlan = {
  val newPlan = plan transformDown {
    case currentFragment =>
      lookupCachedData(currentFragment)
        .map(_.cachedRepresentation.withOutput(currentFragment.output))
        .getOrElse(currentFragment)
  }
  newPlan transformAllExpressions {
    case s: SubqueryExpression => s.withNewPlan(useCachedData(s.plan))
  }
}
```
[GitHub] spark issue #16485: [SPARK-19099] correct the wrong time display in history ...
Github user 351zyf commented on the issue: https://github.com/apache/spark/pull/16485

But the time displayed on the history server web UI is not correct: it is 8 hours earlier than the actual time here. Am I using the wrong configuration?
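One thing worth checking (a guess at the cause, not a confirmed diagnosis from this thread): the web UI formats timestamps with the history server JVM's default time zone, which can be pinned explicitly. The zone below is an assumption based on the reported 8-hour offset:

```bash
# conf/spark-env.sh — pin the history server JVM to an explicit time zone
export SPARK_HISTORY_OPTS="$SPARK_HISTORY_OPTS -Duser.timezone=Asia/Shanghai"
```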
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050711 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext case i: InMemoryRelation => i }.size == 1) } + + test("SPARK-19093 Caching in side subquery") { +withTempView("t1") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + spark.catalog.cacheTable("t1") + val cachedPlan = +sql( + """ +|SELECT * FROM t1 +|WHERE +|NOT EXISTS (SELECT * FROM t1) + """.stripMargin).queryExecution.optimizedPlan + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 2) + spark.catalog.uncacheTable("t1") +} + } + + test("SPARK-19093 scalar and nested predicate query") { +def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = { + plan collect { +case i: InMemoryRelation => i + } +} +withTempView("t1", "t2", "t3", "t4") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(2).toDF("c1").createOrReplaceTempView("t2") + Seq(1).toDF("c1").createOrReplaceTempView("t3") + Seq(1).toDF("c1").createOrReplaceTempView("t4") + spark.catalog.cacheTable("t1") + spark.catalog.cacheTable("t2") + spark.catalog.cacheTable("t3") + spark.catalog.cacheTable("t4") + + // Nested predicate subquery + val cachedPlan = +sql( +""" + |SELECT * FROM t1 + |WHERE + |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1)) +""".stripMargin).queryExecution.optimizedPlan + + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 3) --- End diff -- Then, this can be simplified to ```Scala assert(getNumInMemoryRelations(cachedPlan) == 3) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95050708 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext case i: InMemoryRelation => i }.size == 1) } + + test("SPARK-19093 Caching in side subquery") { +withTempView("t1") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + spark.catalog.cacheTable("t1") + val cachedPlan = +sql( + """ +|SELECT * FROM t1 +|WHERE +|NOT EXISTS (SELECT * FROM t1) + """.stripMargin).queryExecution.optimizedPlan + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 2) + spark.catalog.uncacheTable("t1") +} + } + + test("SPARK-19093 scalar and nested predicate query") { +def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = { + plan collect { +case i: InMemoryRelation => i + } +} +withTempView("t1", "t2", "t3", "t4") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(2).toDF("c1").createOrReplaceTempView("t2") + Seq(1).toDF("c1").createOrReplaceTempView("t3") + Seq(1).toDF("c1").createOrReplaceTempView("t4") + spark.catalog.cacheTable("t1") + spark.catalog.cacheTable("t2") + spark.catalog.cacheTable("t3") + spark.catalog.cacheTable("t4") + + // Nested predicate subquery + val cachedPlan = +sql( +""" + |SELECT * FROM t1 + |WHERE + |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1)) +""".stripMargin).queryExecution.optimizedPlan + + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 3) + + // Scalar subquery and predicate subquery + val cachedPlan2 = +sql( + """ +|SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1) +|WHERE +|c1 = (SELECT max(c1) FROM t2 GROUP BY c1) +|OR +|EXISTS (SELECT c1 FROM t3) +|OR +|c1 IN (SELECT c1 FROM t4) + """.stripMargin).queryExecution.optimizedPlan + + + val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]] + cachedRelations += getCachedPlans(cachedPlan2) + cachedPlan2 transformAllExpressions { +case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan) + e + } + assert(cachedRelations.flatten.size == 4) --- End diff -- Then, this can be simplified to ```Scala assert (getNumInMemoryRelations(cachedPlan2) == 4) ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16493 In the test suite, we can have such a helper function to count `InMemoryRelation` ```Scala private def getNumInMemoryRelations(plan: LogicalPlan): Int = { var sum = plan.collect { case _: InMemoryRelation => 1 }.sum plan.transformAllExpressions { case e: SubqueryExpression => sum += getNumInMemoryRelations(e.plan) e } sum } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
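For illustration, a usage sketch of the helper above (hypothetical snippet; it assumes `t1` and `t2` are cached temp views, as in the tests under review):
```Scala
// Counts InMemoryRelation nodes in the optimized plan and, through the
// recursive helper, in every subquery expression's plan as well.
val plan = sql(
  """
    |SELECT * FROM t1
    |WHERE c1 IN (SELECT c1 FROM t2)
  """.stripMargin).queryExecution.optimizedPlan
assert(getNumInMemoryRelations(plan) == 2)  // outer t1 plus the cached t2 in the subquery
```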
[GitHub] spark pull request #16480: [SPARK-18194][ML] Log instrumentation in OneVsRes...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16480 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15413: [SPARK-17847][ML] Reduce shuffled data size of GaussianM...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15413 OK, I'll just wait so @sethah can make a final pass and so @yanboliang can merge the 2 tests. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16454: [SPARK-19055][SQL][PySpark] Fix SparkSession init...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/16454#discussion_r95050517 --- Diff: python/pyspark/sql/tests.py --- @@ -1886,6 +1887,28 @@ def test_hivecontext(self): self.assertTrue(os.path.exists(metastore_path)) +class SQLTests2(ReusedPySparkTestCase): --- End diff -- Is there any particular reason this is built on a `ReusedPySparkTestCase`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/16480 Merging with master. Not backporting unless people request it since this memory leak is very minor. Thanks @sueann ! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16495: SPARK-16920: Add a stress test for evaluateEachIteration...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16495 **[Test build #71003 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71003/testReport)** for PR 16495 at commit [`d3e2dad`](https://github.com/apache/spark/commit/d3e2dadaa767bd3fec10ba329625ceaa5ccabcbb). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16495: SPARK-16920: Add a stress test for evaluateEachIt...
GitHub user mhmoudr opened a pull request: https://github.com/apache/spark/pull/16495 SPARK-16920: Add a stress test for evaluateEachIteration for 2000 trees ## What changes were proposed in this pull request? Just adding a test to prove that the per-tree error calculation works for 2000 trees; before the fix for SPARK-15858 it failed to complete the calculation after a long time. ## How was this patch tested? Just run the test You can merge this pull request into a Git repository by running: $ git pull https://github.com/mhmoudr/spark SPARK-16920 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16495.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16495 commit d3e2dadaa767bd3fec10ba329625ceaa5ccabcbb Author: Mahmoud Rawas Date: 2017-01-07T02:35:46Z SPARK-16920: Add a stress test for calculating error by tree (evaluateEachIteration) for 2000 trees --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
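A minimal sketch of what such a stress test could look like, assuming the mllib tree APIs (`Node`, `Predict`, `GradientBoostedTreesModel`, `evaluateEachIteration`) and an existing SparkContext `sc`; the `leafTree` helper is hypothetical:
```Scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.tree.configuration.Algo
import org.apache.spark.mllib.tree.loss.SquaredError
import org.apache.spark.mllib.tree.model.{DecisionTreeModel, GradientBoostedTreesModel, Node, Predict}

// Build one trivial single-leaf regression tree.
def leafTree(value: Double): DecisionTreeModel = {
  val leaf = new Node(1, new Predict(value, 0.0), 0.0, true, None, None, None, None)
  new DecisionTreeModel(leaf, Algo.Regression)
}

// 2000 identical trees with unit weights, mirroring the scale exercised here.
val model = new GradientBoostedTreesModel(
  Algo.Regression, Array.fill(2000)(leafTree(0.5)), Array.fill(2000)(1.0))

val data = sc.parallelize(Seq(LabeledPoint(1.0, Vectors.dense(0.0))))
// Should finish quickly and return one error value per iteration.
assert(model.evaluateEachIteration(data, SquaredError).length == 2000)
```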
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70999/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16493 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493 **[Test build #70999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70999/testReport)** for PR 16493 at commit [`f733f90`](https://github.com/apache/spark/commit/f733f90325b975973e60272ba6708dff5059f9dd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16492 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70998/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16492 **[Test build #70998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70998/testReport)** for PR 16492 at commit [`59a1161`](https://github.com/apache/spark/commit/59a11611999fddd0670218b16b991e691bcc574e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70997/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16138 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138 **[Test build #70997 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70997/testReport)** for PR 16138 at commit [`1eb2ad0`](https://github.com/apache/spark/commit/1eb2ad00f4d033134d1d66d5dda24eee8cd29489). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95049506 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext case i: InMemoryRelation => i }.size == 1) } + + test("SPARK-19093 Caching in side subquery") { +withTempView("t1") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + spark.catalog.cacheTable("t1") + val cachedPlan = +sql( + """ +|SELECT * FROM t1 +|WHERE +|NOT EXISTS (SELECT * FROM t1) + """.stripMargin).queryExecution.optimizedPlan + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 2) + spark.catalog.uncacheTable("t1") +} + } + + test("SPARK-19093 scalar and nested predicate query") { +def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = { + plan collect { +case i: InMemoryRelation => i + } +} +withTempView("t1", "t2", "t3", "t4") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(2).toDF("c1").createOrReplaceTempView("t2") + Seq(1).toDF("c1").createOrReplaceTempView("t3") + Seq(1).toDF("c1").createOrReplaceTempView("t4") + spark.catalog.cacheTable("t1") + spark.catalog.cacheTable("t2") + spark.catalog.cacheTable("t3") + spark.catalog.cacheTable("t4") + + // Nested predicate subquery + val cachedPlan = +sql( +""" + |SELECT * FROM t1 + |WHERE + |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1)) +""".stripMargin).queryExecution.optimizedPlan + + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 3) + + // Scalar subquery and predicate subquery + val cachedPlan2 = +sql( + """ +|SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1) +|WHERE +|c1 = (SELECT max(c1) FROM t2 GROUP BY c1) +|OR +|EXISTS (SELECT c1 FROM t3) +|OR +|c1 IN (SELECT c1 FROM t4) + """.stripMargin).queryExecution.optimizedPlan + + + val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]] + cachedRelations += getCachedPlans(cachedPlan2) + cachedPlan2 transformAllExpressions { +case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan) + e + } + assert(cachedRelations.flatten.size == 4) + + spark.catalog.uncacheTable("t1") + spark.catalog.uncacheTable("t2") + spark.catalog.uncacheTable("t3") + spark.catalog.uncacheTable("t4") --- End diff -- ```Scala override def afterEach(): Unit = { try { clearCache() } finally { super.afterEach() } } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16493#discussion_r95049495 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala --- @@ -565,4 +567,82 @@ class CachedTableSuite extends QueryTest with SQLTestUtils with SharedSQLContext case i: InMemoryRelation => i }.size == 1) } + + test("SPARK-19093 Caching in side subquery") { +withTempView("t1") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + spark.catalog.cacheTable("t1") + val cachedPlan = +sql( + """ +|SELECT * FROM t1 +|WHERE +|NOT EXISTS (SELECT * FROM t1) + """.stripMargin).queryExecution.optimizedPlan + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 2) + spark.catalog.uncacheTable("t1") +} + } + + test("SPARK-19093 scalar and nested predicate query") { +def getCachedPlans(plan: LogicalPlan): Seq[LogicalPlan] = { + plan collect { +case i: InMemoryRelation => i + } +} +withTempView("t1", "t2", "t3", "t4") { + Seq(1).toDF("c1").createOrReplaceTempView("t1") + Seq(2).toDF("c1").createOrReplaceTempView("t2") + Seq(1).toDF("c1").createOrReplaceTempView("t3") + Seq(1).toDF("c1").createOrReplaceTempView("t4") + spark.catalog.cacheTable("t1") + spark.catalog.cacheTable("t2") + spark.catalog.cacheTable("t3") + spark.catalog.cacheTable("t4") + + // Nested predicate subquery + val cachedPlan = +sql( +""" + |SELECT * FROM t1 + |WHERE + |c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1)) +""".stripMargin).queryExecution.optimizedPlan + + assert( +cachedPlan.collect { + case i: InMemoryRelation => i +}.size == 3) + + // Scalar subquery and predicate subquery + val cachedPlan2 = +sql( + """ +|SELECT * FROM (SELECT max(c1) FROM t1 GROUP BY c1) +|WHERE +|c1 = (SELECT max(c1) FROM t2 GROUP BY c1) +|OR +|EXISTS (SELECT c1 FROM t3) +|OR +|c1 IN (SELECT c1 FROM t4) + """.stripMargin).queryExecution.optimizedPlan + + + val cachedRelations = scala.collection.mutable.MutableList.empty[Seq[LogicalPlan]] + cachedRelations += getCachedPlans(cachedPlan2) + cachedPlan2 transformAllExpressions { +case e: SubqueryExpression => cachedRelations += getCachedPlans(e.plan) + e + } + assert(cachedRelations.flatten.size == 4) + + spark.catalog.uncacheTable("t1") + spark.catalog.uncacheTable("t2") + spark.catalog.uncacheTable("t3") + spark.catalog.uncacheTable("t4") --- End diff -- You can call `clearCache()` and then no need to uncache each table. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16493 @dilipbiswal Could you post the nested subquery in the PR description? It can help the other reviewers understand the fix. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16494 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16493 Although the test cases can be improved, the code fix looks good to me. cc @JoshRosen @hvanhovell --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71002/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16494 **[Test build #71002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71002/testReport)** for PR 16494 at commit [`0d1c475`](https://github.com/apache/spark/commit/0d1c475c80a6fd0373108610ca8e41f7af0e6d01). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #14451: [SPARK-16848][SQL] Check schema validation for user-spec...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14451 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16387#discussion_r95048733 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -115,6 +115,7 @@ class ExternalAppendOnlyMap[K, V, C]( private val keyComparator = new HashComparator[K] private val ser = serializer.newInstance() + @volatile private var isReadingIterator: Boolean = false --- End diff -- Yeah, alternatively we can remove the assert and check if `readingIterator` is null or not. I just want to keep the original behavior. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
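For reference, the null-check alternative mentioned here might look like the following sketch (`forceSpill`, `readingIterator`, and `currentMap` are assumed member names from ExternalAppendOnlyMap; this is not the PR's actual diff):
```Scala
// Drop the assert and tolerate a null readingIterator instead of tracking
// a separate @volatile isReadingIterator flag.
override protected[this] def forceSpill(): Boolean = {
  if (readingIterator != null) {
    val isSpilled = readingIterator.spill()
    if (isSpilled) {
      currentMap = null
    }
    isSpilled
  } else {
    false // nothing is being read yet, so there is nothing to spill
  }
}
```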
[GitHub] spark pull request #16430: [SPARK-17077] [SQL] Cardinality estimation for pr...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/16430#discussion_r95048331 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/ProjectEstimationSuite.scala --- @@ -0,0 +1,51 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.statsEstimation + +import org.apache.spark.sql.catalyst.expressions.{Alias, AttributeMap, AttributeReference} +import org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.catalyst.plans.logical.statsEstimation.EstimationUtils._ +import org.apache.spark.sql.types.IntegerType + + +class ProjectEstimationSuite extends StatsEstimationTestBase { + + test("estimate project with alias") { +val ar1 = AttributeReference("key1", IntegerType)() +val ar2 = AttributeReference("key2", IntegerType)() +val colStat1 = ColumnStat(2, Some(1), Some(2), 0, 4, 4) +val colStat2 = ColumnStat(1, Some(10), Some(10), 0, 4, 4) + +val child = StatsTestPlan( + outputList = Seq(ar1, ar2), + stats = Statistics( +sizeInBytes = 2 * (4 + 4), +rowCount = Some(2), +attributeStats = AttributeMap(Seq(ar1 -> colStat1, ar2 -> colStat2 + +val project = Project(Seq(ar1, Alias(ar2, "abc")()), child) +val expectedColStats = Seq("key1" -> colStat1, "abc" -> colStat2) +val expectedAttrStats = toAttributeMap(expectedColStats, project) +// The number of rows won't change for project. +val expectedStats = Statistics( + sizeInBytes = 2 * getRowSize(project.output, expectedAttrStats), --- End diff -- I tested getRowSize for int type. But yes, we should have a separate test for this method. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
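Such a standalone test might look like the following sketch (hypothetical; it asserts relative growth so it does not hard-code the fixed per-row overhead):
```Scala
test("getRowSize accounts for every output column") {
  val a = AttributeReference("a", IntegerType)()
  val b = AttributeReference("b", IntegerType)()
  val stats = AttributeMap(Seq(
    a -> ColumnStat(2, Some(1), Some(2), 0, 4, 4),
    b -> ColumnStat(2, Some(1), Some(2), 0, 4, 4)))
  // Two int columns must yield a larger estimated row size than one.
  assert(getRowSize(Seq(a, b), stats) > getRowSize(Seq(a), stats))
}
```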
[GitHub] spark pull request #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory ...
Github user merlintang closed the pull request at: https://github.com/apache/spark/pull/15819 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...
Github user merlintang commented on the issue: https://github.com/apache/spark/pull/15819 Many thanks, Xiao. I learned a lot. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15819 @merlintang Can you close this PR? Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15819: [SPARK-18372][SQL][Branch-1.6].Staging directory fail to...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15819 Thanks! Merging to 1.6 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16492 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16492 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70995/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16492 **[Test build #70995 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70995/testReport)** for PR 16492 at commit [`080a269`](https://github.com/apache/spark/commit/080a2698928366e4a17d165cebebf4f44c797f40). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/16494 @jkbradley @vanzin @skyluc @luluorta @uncleGen @kanzhang Could you please take a look at this pull request, which fixes the fromEdges method in the EdgeRDD class used by LDA? Thank you! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16480 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16491: [SPARK-19110][ML][MLLIB]:DistributedLDAModel retu...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16491#discussion_r95047025 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -260,6 +260,14 @@ class LDASuite extends SparkFunSuite with MLlibTestSparkContext with DefaultRead Vectors.dense(model2.topicsMatrix.toArray) absTol 1e-6) assert(Vectors.dense(model.getDocConcentration) ~== Vectors.dense(model2.getDocConcentration) absTol 1e-6) + val logPrior = model.asInstanceOf[DistributedLDAModel].logPrior + val logPrior2 = model2.asInstanceOf[DistributedLDAModel].logPrior + val trainingLogLikelihood = +model.asInstanceOf[DistributedLDAModel].trainingLogLikelihood + val trainingLogLikelihood2 = +model2.asInstanceOf[DistributedLDAModel].trainingLogLikelihood + assert(logPrior ~== logPrior2 absTol 1e-6) + assert(trainingLogLikelihood ~== trainingLogLikelihood2 absTol 1e-6) --- End diff -- Ok, I guess I remember this wrong because of the other PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16480 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70996/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16480 **[Test build #70996 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70996/testReport)** for PR 16480 at commit [`0034461`](https://github.com/apache/spark/commit/00344616edfcc11d48fee5775186f26c3d49b118). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with Cla...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16494 **[Test build #71002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71002/testReport)** for PR 16494 at commit [`0d1c475`](https://github.com/apache/spark/commit/0d1c475c80a6fd0373108610ca8e41f7af0e6d01). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16494: [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing w...
GitHub user imatiach-msft opened a pull request: https://github.com/apache/spark/pull/16494 [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with ClassCastException ## What changes were proposed in this pull request? LDA fails with a ClassCastException when run on a dataset with at least one row that contains an empty sparse vector. The error occurs in the fromEdges method, where the edge RDD may already be an EdgeRDDImpl and does not need to be converted again. ## How was this patch tested? I first ran LDA on the dataset provided by the JIRA submitter and was able to reproduce the issue. I then fixed the issue based on the submitter's suggestion and simplified the test case so that we wouldn't need to read in a file. You can merge this pull request into a Git repository by running: $ git pull https://github.com/imatiach-msft/spark ilmat/fix-EMLDA Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16494.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16494 commit 0a201b713276b92a20db330ccd20b9a562694f5a Author: Ilya Matiach Date: 2017-01-06T21:36:19Z adding test case to reproduce the error commit 66dbfea60fec23fb8b39e23adf1861cfa02d7d42 Author: Ilya Matiach Date: 2017-01-07T00:42:49Z [SPARK-17975][MLLIB] Fix EMLDAOptimizer failing with ClassCastException commit 0d1c475c80a6fd0373108610ca8e41f7af0e6d01 Author: Ilya Matiach Date: 2017-01-07T01:04:40Z Optimizing test case --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
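A sketch of the kind of guard the description suggests (hypothetical helper; the actual patch modifies fromEdges itself):
```Scala
import scala.reflect.ClassTag

import org.apache.spark.graphx.{Edge, EdgeRDD}
import org.apache.spark.graphx.impl.EdgeRDDImpl
import org.apache.spark.rdd.RDD

// If the RDD is already an EdgeRDDImpl, reuse it rather than re-wrapping it,
// which is what led to the ClassCastException.
def toEdgeRDD[ED: ClassTag, VD: ClassTag](edges: RDD[Edge[ED]]): EdgeRDD[ED] =
  edges match {
    case impl: EdgeRDDImpl[ED @unchecked, VD @unchecked] => impl
    case other => EdgeRDD.fromEdges[ED, VD](other)
  }
```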
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/15119 @vanzin and @themodernlife , I have this fixed up to use the additional repositories when loading ivy settings from a file. Added a new test for loading a settings and fixed up the docs for `spark.jars.ivy` - I agree that the name is confusing, but hopefully clear in the docs. Thanks @themodernlife for helping out with the docs too. If you're able to try out this latest revision too, that would be great! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16473 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16473 **[Test build #71000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71000/testReport)** for PR 16473 at commit [`a1c0e59`](https://github.com/apache/spark/commit/a1c0e59bd7c5b139c2a682603a1fc4ca8ad211b1). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16473 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71000/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16491: [SPARK-19110][ML][MLLIB]:DistributedLDAModel retu...
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/16491#discussion_r95045854 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -260,6 +260,14 @@ class LDASuite extends SparkFunSuite with MLlibTestSparkContext with DefaultRead Vectors.dense(model2.topicsMatrix.toArray) absTol 1e-6) assert(Vectors.dense(model.getDocConcentration) ~== Vectors.dense(model2.getDocConcentration) absTol 1e-6) + val logPrior = model.asInstanceOf[DistributedLDAModel].logPrior + val logPrior2 = model2.asInstanceOf[DistributedLDAModel].logPrior + val trainingLogLikelihood = +model.asInstanceOf[DistributedLDAModel].trainingLogLikelihood + val trainingLogLikelihood2 = +model2.asInstanceOf[DistributedLDAModel].trainingLogLikelihood + assert(logPrior ~== logPrior2 absTol 1e-6) + assert(trainingLogLikelihood ~== trainingLogLikelihood2 absTol 1e-6) --- End diff -- `LocalLDAModel` doesn't extend `DistributedLDAModel` and vice versa. I am not clear how to check `trainingLogLikelihood ` and `logPrior` in `LocalLDAModel`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15119: [SPARK-17568][CORE][DEPLOY] Add spark-submit option to o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15119 **[Test build #71001 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71001/testReport)** for PR 15119 at commit [`7a7b6ba`](https://github.com/apache/spark/commit/7a7b6ba213e57a705642d42220037a7b9a18e3a6). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16493: [SPARK-19093][SQL] Cached tables are not used in Subquer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16493 **[Test build #70999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70999/testReport)** for PR 16493 at commit [`f733f90`](https://github.com/apache/spark/commit/f733f90325b975973e60272ba6708dff5059f9dd). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16473 **[Test build #71000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71000/testReport)** for PR 16473 at commit [`a1c0e59`](https://github.com/apache/spark/commit/a1c0e59bd7c5b139c2a682603a1fc4ca8ad211b1). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16473 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16493: [SPARK-19093][SQL] Cached tables are not used in ...
GitHub user dilipbiswal opened a pull request: https://github.com/apache/spark/pull/16493 [SPARK-19093][SQL] Cached tables are not used in SubqueryExpression ## What changes were proposed in this pull request? Consider the plans inside subquery expressions when looking up the cache manager, to make use of cached data. Currently, CacheManager.useCachedData does not consider the subquery expressions in the plan. ## How was this patch tested? Added new tests in CachedTableSuite. You can merge this pull request into a Git repository by running: $ git pull https://github.com/dilipbiswal/spark SPARK-19093 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16493.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16493 commit f733f90325b975973e60272ba6708dff5059f9dd Author: Dilip Biswal Date: 2017-01-07T00:18:23Z [SPARK-19093] Cached tables are not used in SubqueryExpression --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
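To make the symptom concrete, an illustration based on the tests added in this PR (a sketch, assuming `t1` is a cached temp view):
```Scala
spark.catalog.cacheTable("t1")
// t1 appears both in the outer query and inside the subquery.
val plan = sql(
  """
    |SELECT * FROM t1
    |WHERE NOT EXISTS (SELECT * FROM t1)
  """.stripMargin).queryExecution.optimizedPlan
// Before this fix only the outer reference was replaced with the cached
// InMemoryRelation; with the fix both references use the cached data.
```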
[GitHub] spark pull request #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16387#discussion_r95044381 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala --- @@ -115,6 +115,7 @@ class ExternalAppendOnlyMap[K, V, C]( private val keyComparator = new HashComparator[K] private val ser = serializer.newInstance() + @volatile private var isReadingIterator: Boolean = false --- End diff -- I'm a little confused. Isn't this having the same effect as just removing the assert, since you're setting this to `true` right after instantiating `readingIterator`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16492: [SPARK-19113][SS][Tests]Set UncaughtExceptionHandler in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16492

**[Test build #70998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70998/testReport)** for PR 16492 at commit [`59a1161`](https://github.com/apache/spark/commit/59a11611999fddd0670218b16b991e691bcc574e).
[GitHub] spark pull request #16443: [SPARK-19042] Remove query string from jar url fo...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/16443#discussion_r95044096

--- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
@@ -34,12 +34,14 @@ import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.internal.Logging
 import org.apache.spark.memory.TaskMemoryManager
 import org.apache.spark.rpc.RpcTimeout
-import org.apache.spark.scheduler.{AccumulableInfo, DirectTaskResult, IndirectTaskResult, Task}
+import org.apache.spark.scheduler.{DirectTaskResult, IndirectTaskResult, Task}
 import org.apache.spark.shuffle.FetchFailedException
 import org.apache.spark.storage.{StorageLevel, TaskResultBlockId}
 import org.apache.spark.util._
 import org.apache.spark.util.io.ChunkedByteBuffer
+
--- End diff --

Don't add these.
[GitHub] spark issue #16138: [WIP][SPARK-16609] Add to_date/to_timestamp with format ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16138

**[Test build #70997 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70997/testReport)** for PR 16138 at commit [`1eb2ad0`](https://github.com/apache/spark/commit/1eb2ad00f4d033134d1d66d5dda24eee8cd29489).
[GitHub] spark issue #16480: [SPARK-18194][ML] Log instrumentation in OneVsRest, Cros...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16480

**[Test build #70996 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70996/testReport)** for PR 16480 at commit [`0034461`](https://github.com/apache/spark/commit/00344616edfcc11d48fee5775186f26c3d49b118).