[GitHub] spark issue #15958: [SPARK-17932][SQL] Support SHOW TABLES EXTENDED LIKE 'id...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15958
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15958: [SPARK-17932][SQL] Support SHOW TABLES EXTENDED LIKE 'id...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15958
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69393/
Test FAILed.





[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16048
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16067
  
Merged build finished. Test FAILed.






[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15975
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15975: [SPARK-18538] [SQL] Fix Concurrent Table Fetching Using ...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15975
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69392/
Test FAILed.





[GitHub] spark issue #16060: [SPARK-18220][SQL] read Hive orc table with varchar colu...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16060
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16048: [DO_NOT_MERGE]Test kafka deletion

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16048
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69391/
Test FAILed.





[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16067
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69388/
Test FAILed.





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15979
  
Merged build finished. Test FAILed.





[GitHub] spark issue #16072: [SPARK-18639] Build only a single pip package

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16072
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69384/
Test FAILed.





[GitHub] spark issue #16072: [SPARK-18639] Build only a single pip package

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16072
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15979
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69387/
Test FAILed.





[GitHub] spark issue #16060: [SPARK-18220][SQL] read Hive orc table with varchar colu...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16060
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69389/
Test FAILed.





[GitHub] spark issue #16072: [SPARK-18639] Build only a single pip package

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16072
  
**[Test build #3445 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3445/consoleFull)** for PR 16072 at commit [`88b53c3`](https://github.com/apache/spark/commit/88b53c3b542d1423c169af7b4e52ecd6da067ced).





[GitHub] spark issue #16060: [SPARK-18220][SQL] read Hive orc table with varchar colu...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16060
  
**[Test build #3446 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3446/consoleFull)** for PR 16060 at commit [`8b697be`](https://github.com/apache/spark/commit/8b697be520bb9c070462bebc8c72796eca8c8517).





[GitHub] spark pull request #15620: [SPARK-18091] [SQL] Deep if expressions cause Gen...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15620#discussion_r90185362
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -64,19 +64,72 @@ case class If(predicate: Expression, trueValue: 
Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-ev.copy(code = s"""
-  ${condEval.code}
-  boolean ${ev.isNull} = false;
-  ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
-  if (!${condEval.isNull} && ${condEval.value}) {
-${trueEval.code}
-${ev.isNull} = ${trueEval.isNull};
-${ev.value} = ${trueEval.value};
-  } else {
-${falseEval.code}
-${ev.isNull} = ${falseEval.isNull};
-${ev.value} = ${falseEval.value};
-  }""")
+// place generated code of condition, true value and false value in 
separate methods if
+// their code combined is large
+val combinedLength = condEval.code.length + trueEval.code.length + 
falseEval.code.length
+val generatedCode = if (combinedLength > 1024 &&
--- End diff --

The limit is 64 KB per method, so we don't need to be this conservative
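The splitting strategy under discussion can be illustrated with a simplified, self-contained sketch. The names here (`CodegenSplitSketch`, `splitIfLarge`, `Fragment`) are hypothetical and not Spark's codegen API: once the combined generated source passes a threshold, each fragment is wrapped in its own method so no single generated method approaches the JVM's 64 KB-of-bytecode-per-method limit.

```scala
// Simplified, hypothetical illustration of the splitting strategy
// discussed above; NOT Spark's actual codegen API.
object CodegenSplitSketch {
  // Threshold in characters of generated source. This is a heuristic:
  // the hard JVM limit is 64 KB of *bytecode* per method, not source
  // characters, which is why the threshold choice is debatable.
  val SplitThreshold = 1024

  case class Fragment(name: String, code: String)

  // If the combined fragments are large, wrap each in its own helper
  // method and return (helper definitions, call-site code); otherwise
  // keep everything inline.
  def splitIfLarge(fragments: Seq[Fragment]): (Seq[String], String) = {
    val combined = fragments.map(_.code.length).sum
    if (combined > SplitThreshold) {
      val helpers = fragments.map { f =>
        s"private void ${f.name}(InternalRow i) { ${f.code} }"
      }
      val callSite = fragments.map(f => s"${f.name}(i);").mkString("\n")
      (helpers, callSite)
    } else {
      (Seq.empty, fragments.map(_.code).mkString("\n"))
    }
  }
}
```

With a higher threshold, as suggested in the comment, small expressions stay inline and only genuinely large trees pay the cost of the extra method calls and member fields.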





[GitHub] spark pull request #15620: [SPARK-18091] [SQL] Deep if expressions cause Gen...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15620#discussion_r90185562
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -64,19 +64,72 @@ case class If(predicate: Expression, trueValue: 
Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-ev.copy(code = s"""
-  ${condEval.code}
-  boolean ${ev.isNull} = false;
-  ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
-  if (!${condEval.isNull} && ${condEval.value}) {
-${trueEval.code}
-${ev.isNull} = ${trueEval.isNull};
-${ev.value} = ${trueEval.value};
-  } else {
-${falseEval.code}
-${ev.isNull} = ${falseEval.isNull};
-${ev.value} = ${falseEval.value};
-  }""")
+// place generated code of condition, true value and false value in 
separate methods if
+// their code combined is large
+val combinedLength = condEval.code.length + trueEval.code.length + 
falseEval.code.length
+val generatedCode = if (combinedLength > 1024 &&
+  // Split these expressions only if they are created from a row object
+  (ctx.INPUT_ROW != null && ctx.currentVars == null)) {
+
+  val (condFuncName, condGlobalIsNull, condGlobalValue) =
+createAndAddFunction(ctx, condEval, predicate, "evalIfCondExpr")
+  val (trueFuncName, trueGlobalIsNull, trueGlobalValue) =
+createAndAddFunction(ctx, trueEval, trueValue, "evalIfTrueExpr")
+  val (falseFuncName, falseGlobalIsNull, falseGlobalValue) =
+createAndAddFunction(ctx, falseEval, falseValue, "evalIfFalseExpr")
+  s"""
+$condFuncName(${ctx.INPUT_ROW});
+boolean ${ev.isNull} = false;
+${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
+if (!$condGlobalIsNull && $condGlobalValue) {
+  $trueFuncName(${ctx.INPUT_ROW});
+  ${ev.isNull} = $trueGlobalIsNull;
+  ${ev.value} = $trueGlobalValue;
+} else {
+  $falseFuncName(${ctx.INPUT_ROW});
+  ${ev.isNull} = $falseGlobalIsNull;
+  ${ev.value} = $falseGlobalValue;
+}
+  """
+}
+else {
+  s"""
+${condEval.code}
+boolean ${ev.isNull} = false;
+${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
+if (!${condEval.isNull} && ${condEval.value}) {
+  ${trueEval.code}
+  ${ev.isNull} = ${trueEval.isNull};
+  ${ev.value} = ${trueEval.value};
+} else {
+  ${falseEval.code}
+  ${ev.isNull} = ${falseEval.isNull};
+  ${ev.value} = ${falseEval.value};
+}
+  """
+}
+
+ev.copy(code = generatedCode)
+  }
+
+  private def createAndAddFunction(ctx: CodegenContext, ev: ExprCode, 
expr: Expression,
--- End diff --

please follow the existing code style in Spark, i.e.
```
def xxx(
  param1: xxx,
  param2: xxx): xxx
```
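Applied to the signature under review, the suggested formatting would look roughly like the sketch below. The stub types (`CodegenContext`, `ExprCode`, `Expression`) stand in for Spark's real classes so the formatting example is self-contained; the body is a placeholder, not the real implementation.

```scala
// Stub types standing in for Spark's real classes, so this formatting
// sketch compiles on its own.
case class CodegenContext()
case class ExprCode(code: String)
trait Expression

object StyleSketch {
  // Spark's multi-line signature style: each parameter on its own line,
  // indented four extra spaces, with the return type on the closing line.
  private def createAndAddFunction(
      ctx: CodegenContext,
      ev: ExprCode,
      expr: Expression,
      baseFuncName: String): (String, String, String) = {
    // Placeholder body for illustration only.
    (baseFuncName + "1", baseFuncName + "IsNull", baseFuncName + "Value")
  }

  def demo(): (String, String, String) =
    createAndAddFunction(CodegenContext(), ExprCode(""), new Expression {}, "evalIfCondExpr")
}
```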





[GitHub] spark pull request #15620: [SPARK-18091] [SQL] Deep if expressions cause Gen...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15620#discussion_r90185627
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -64,19 +64,72 @@ case class If(predicate: Expression, trueValue: 
Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-ev.copy(code = s"""
-  ${condEval.code}
-  boolean ${ev.isNull} = false;
-  ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
-  if (!${condEval.isNull} && ${condEval.value}) {
-${trueEval.code}
-${ev.isNull} = ${trueEval.isNull};
-${ev.value} = ${trueEval.value};
-  } else {
-${falseEval.code}
-${ev.isNull} = ${falseEval.isNull};
-${ev.value} = ${falseEval.value};
-  }""")
+// place generated code of condition, true value and false value in 
separate methods if
+// their code combined is large
+val combinedLength = condEval.code.length + trueEval.code.length + 
falseEval.code.length
+val generatedCode = if (combinedLength > 1024 &&
+  // Split these expressions only if they are created from a row object
+  (ctx.INPUT_ROW != null && ctx.currentVars == null)) {
+
+  val (condFuncName, condGlobalIsNull, condGlobalValue) =
+createAndAddFunction(ctx, condEval, predicate, "evalIfCondExpr")
+  val (trueFuncName, trueGlobalIsNull, trueGlobalValue) =
+createAndAddFunction(ctx, trueEval, trueValue, "evalIfTrueExpr")
+  val (falseFuncName, falseGlobalIsNull, falseGlobalValue) =
+createAndAddFunction(ctx, falseEval, falseValue, "evalIfFalseExpr")
+  s"""
+$condFuncName(${ctx.INPUT_ROW});
+boolean ${ev.isNull} = false;
+${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
+if (!$condGlobalIsNull && $condGlobalValue) {
+  $trueFuncName(${ctx.INPUT_ROW});
+  ${ev.isNull} = $trueGlobalIsNull;
+  ${ev.value} = $trueGlobalValue;
+} else {
+  $falseFuncName(${ctx.INPUT_ROW});
+  ${ev.isNull} = $falseGlobalIsNull;
+  ${ev.value} = $falseGlobalValue;
+}
+  """
+}
+else {
+  s"""
+${condEval.code}
+boolean ${ev.isNull} = false;
+${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
+if (!${condEval.isNull} && ${condEval.value}) {
+  ${trueEval.code}
+  ${ev.isNull} = ${trueEval.isNull};
+  ${ev.value} = ${trueEval.value};
+} else {
+  ${falseEval.code}
+  ${ev.isNull} = ${falseEval.isNull};
+  ${ev.value} = ${falseEval.value};
+}
+  """
+}
+
+ev.copy(code = generatedCode)
+  }
+
+  private def createAndAddFunction(ctx: CodegenContext, ev: ExprCode, 
expr: Expression,
+   baseFuncName: String): (String, String, 
String) = {
+val globalIsNull = ctx.freshName("isNull")
+ctx.addMutableState("boolean", globalIsNull, s"$globalIsNull = false;")
+val globalValue = ctx.freshName("value")
+ctx.addMutableState(ctx.javaType(expr.dataType), globalValue,
--- End diff --

looks like we don't need to pass in the `expr`, but only its data type
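The comment's point — only the data type is needed, not the whole `expr` — follows from what the mutable state is for: a split-out method cannot return both an isNull flag and a value, so it writes them into generated member fields, and declaring those fields requires nothing but the Java type. A hypothetical sketch of that field-declaration step (not Spark's `CodegenContext` API):

```scala
// Hypothetical sketch of declaring the "global" isNull/value pair that a
// split-out evaluation method writes its result into.
object MutableStateSketch {
  private var counter = 0

  // Fresh, collision-free names, mimicking CodegenContext.freshName.
  def freshName(base: String): String = { counter += 1; s"$base$counter" }

  // Declares the member-field pair for one expression. Note it needs only
  // the expression's Java type, not the expression itself.
  def declareResultFields(javaType: String): (String, String, String) = {
    val isNull = freshName("isNull")
    val value = freshName("value")
    val decl =
      s"""private boolean $isNull = false;
         |private $javaType $value;""".stripMargin
    (isNull, value, decl)
  }
}
```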





[GitHub] spark issue #15958: [SPARK-17932][SQL] Support SHOW TABLES EXTENDED LIKE 'id...

2016-11-30 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/15958
  
retest this please.





[GitHub] spark pull request #15620: [SPARK-18091] [SQL] Deep if expressions cause Gen...

2016-11-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15620#discussion_r90185819
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -64,19 +64,72 @@ case class If(predicate: Expression, trueValue: 
Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-ev.copy(code = s"""
-  ${condEval.code}
-  boolean ${ev.isNull} = false;
-  ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
-  if (!${condEval.isNull} && ${condEval.value}) {
-${trueEval.code}
-${ev.isNull} = ${trueEval.isNull};
-${ev.value} = ${trueEval.value};
-  } else {
-${falseEval.code}
-${ev.isNull} = ${falseEval.isNull};
-${ev.value} = ${falseEval.value};
-  }""")
+// place generated code of condition, true value and false value in 
separate methods if
+// their code combined is large
+val combinedLength = condEval.code.length + trueEval.code.length + 
falseEval.code.length
+val generatedCode = if (combinedLength > 1024 &&
+  // Split these expressions only if they are created from a row object
+  (ctx.INPUT_ROW != null && ctx.currentVars == null)) {
--- End diff --

So if the condition and the true/false expressions are bound to `currentVars`, we can still exceed the JVM code size limit?





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15979
  
retest this please





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15979
  
**[Test build #69394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69394/consoleFull)** for PR 15979 at commit [`70dd650`](https://github.com/apache/spark/commit/70dd650a7e43a44a056c4aa95dbbd88d23cbfbee).





[GitHub] spark issue #15958: [SPARK-17932][SQL] Support SHOW TABLES EXTENDED LIKE 'id...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15958
  
**[Test build #69395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69395/consoleFull)** for PR 15958 at commit [`958fe8b`](https://github.com/apache/spark/commit/958fe8b083feb6a312b02abe8325b973bc91500f).





[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16067
  
retest this please





[GitHub] spark issue #16067: [SPARK-17897] [SQL] Fixed IsNotNull Constraint Inference...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16067
  
**[Test build #69396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69396/consoleFull)** for PR 16067 at commit [`54c0dd1`](https://github.com/apache/spark/commit/54c0dd10d4aabc4700d4a33206c481703c16fb83).





[GitHub] spark issue #15987: [SPARK-17732][SPARK-18515][SQL] ALTER TABLE DROP PARTITI...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15987
  
**[Test build #69397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69397/consoleFull)** for PR 15987 at commit [`c35aeab`](https://github.com/apache/spark/commit/c35aeabe05b50762e3a7ea620ea3009b02f0231d).





[GitHub] spark issue #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto pick" fe...

2016-11-30 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/16052
  
@rxin OK, I will backport it to branch-2.0





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90189388
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -590,7 +591,12 @@ object ScalaReflection extends ScalaReflection {
   "cannot be used as field name\n" + 
walkedTypePath.mkString("\n"))
   }
 
-  val fieldValue = Invoke(inputObject, fieldName, 
dataTypeFor(fieldType))
+  // we know primitive type takes only non-null, or
+  // we can infer correct nullability for struct's fieldValue by a 
guard using If(IsNull())
--- End diff --

where do we add the `If(IsNull)`?
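The null-guard idea being asked about — wrapping a field access so that a null parent object yields a null field value instead of a spurious non-null default — can be sketched outside Catalyst's expression tree like this (hypothetical names; the real code builds an `If(IsNull(input), ...)` expression rather than evaluating directly):

```scala
object NullGuardSketch {
  // A guarded field access: if the input object is null, the field value
  // is null too; otherwise the accessor runs. This mirrors the
  // If(IsNull(input), Literal(null), Invoke(input, field)) pattern.
  def guardedField[A, B >: Null](input: A, accessor: A => B): B =
    if (input == null) null else accessor(input)
}
```

The guard is what makes it safe to mark the field's own type as non-nullable for primitives: any null can only come from the parent object, and the guard handles that case.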





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90189486
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala ---
@@ -590,7 +591,12 @@ object ScalaReflection extends ScalaReflection {
   "cannot be used as field name\n" + 
walkedTypePath.mkString("\n"))
   }
 
-  val fieldValue = Invoke(inputObject, fieldName, 
dataTypeFor(fieldType))
+  // we know primitive type takes only non-null, or
+  // we can infer correct nullability for struct's fieldValue by a 
guard using If(IsNull())
--- End diff --

let's mention that it's the last line of this section





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90189759
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
 ---
@@ -405,13 +407,12 @@ case class WrapOption(child: Expression, optType: 
DataType)
  * A place holder for the loop variable used in [[MapObjects]].  This 
should never be constructed
  * manually, but will instead be passed into the provided lambda function.
  */
-case class LambdaVariable(value: String, isNull: String, dataType: DataType) extends LeafExpression
+case class LambdaVariable(value: String, isNull: String, dataType: DataType,
+    nullable: Boolean = true) extends LeafExpression
--- End diff --

nit: code style should be
```
case class xxx(
    param1: xxx,
    param2: xxx) extends ...
```





[GitHub] spark pull request #15620: [SPARK-18091] [SQL] Deep if expressions cause Gen...

2016-11-30 Thread kapilsingh5050
Github user kapilsingh5050 commented on a diff in the pull request:

https://github.com/apache/spark/pull/15620#discussion_r90189958
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -64,19 +64,72 @@ case class If(predicate: Expression, trueValue: 
Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-    ev.copy(code = s"""
-      ${condEval.code}
-      boolean ${ev.isNull} = false;
-      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
-      if (!${condEval.isNull} && ${condEval.value}) {
-        ${trueEval.code}
-        ${ev.isNull} = ${trueEval.isNull};
-        ${ev.value} = ${trueEval.value};
-      } else {
-        ${falseEval.code}
-        ${ev.isNull} = ${falseEval.isNull};
-        ${ev.value} = ${falseEval.value};
-      }""")
+    // place generated code of condition, true value and false value in separate methods if
+    // their code combined is large
+    val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length
+    val generatedCode = if (combinedLength > 1024 &&
--- End diff --

I used the same limit as in the following change:

https://github.com/apache/spark/pull/14692/files#diff-8bcc5aea39c73d4bf38aef6f6951d42cL595

which is based on some benchmarks. In addition, for the JIT to do its 
optimisations, I think the limit is 8k.
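The splitting decision being discussed can be sketched self-contained as follows. The 1024-character threshold matches the one in the PR, but the helper-method names and the emitted Java skeleton here are made up for illustration, not the actual generated code:

```scala
// Threshold from the PR discussion: split generated code into separate
// methods once the combined branch code exceeds this many characters,
// to keep individual methods small enough for JIT optimisation.
val SplitThreshold = 1024

def genIf(condCode: String, trueCode: String, falseCode: String): String = {
  val combined = condCode.length + trueCode.length + falseCode.length
  if (combined > SplitThreshold) {
    // Each piece becomes its own method so no single method grows too large.
    s"""private boolean evalCond() { $condCode }
       |private Object evalTrue() { $trueCode }
       |private Object evalFalse() { $falseCode }
       |Object result = evalCond() ? evalTrue() : evalFalse();""".stripMargin
  } else {
    // Small enough: emit the conditional inline.
    s"Object result = ($condCode) ? ($trueCode) : ($falseCode);"
  }
}

val small = genIf("a > 0", "x", "y")          // combined length 7: stays inline
val big = genIf("a > 0", "x" * 2000, "y")     // combined length 2006: gets split
```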





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90190203
  
--- Diff: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoderSuite.scala
 ---
@@ -338,6 +338,18 @@ class ExpressionEncoderSuite extends PlanTest with 
AnalysisTest {
 }
   }
 
+  test("nullable of encoder serializer") {
+def checkNullable[T: Encoder](nullable: Boolean*): Unit = {
--- End diff --

this is over-designed; we only pass a single parameter. Or are you going to 
add some more non-flat cases?





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15671
  
**[Test build #69398 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69398/consoleFull)**
 for PR 15671 at commit 
[`a8ff7f7`](https://github.com/apache/spark/commit/a8ff7f74de8af1e3f0655b21d5235a6eb6e1cd03).





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/15979
  
Good to merge pending Jenkins. Thanks!





[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-30 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16052#discussion_r90191009
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala 
---
@@ -869,6 +891,31 @@ object TestReceiver {
   val counter = new AtomicInteger(1)
 }
 
+class FakeByteArrayReceiver extends 
Receiver[Array[Byte]](StorageLevel.MEMORY_ONLY) with Logging {
--- End diff --

@zsxwing yes, the failure occurs when the receiver stores `Array[Byte]` data: the 
automatic serializer selection picks JavaSerializer because the data type has been 
erased to Object. However, after fetching from the remote executor, the 
input-stream data is deserialized with KryoSerializer, since the Task can determine 
the data type properly, leading to **com.esotericsoftware.kryo.KryoException: 
Encountered unregistered class ID: 13994**
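The mismatch can be sketched without Spark: serializer choice is keyed off the `ClassTag` in scope, and the receiver side only sees the erased element type. The names below are illustrative, not Spark's actual `SerializerManager` API:

```scala
import scala.reflect.ClassTag

sealed trait Chosen
case object JavaSer extends Chosen
case object KryoSer extends Chosen

// Auto-selection sketch: use Kryo only for types it is known to handle
// (here, primitive byte arrays); fall back to Java for anything else,
// including a type erased to Object.
def pickSerializer[T](implicit ct: ClassTag[T]): Chosen =
  if (ct.runtimeClass == classOf[Array[Byte]]) KryoSer else JavaSer

// On the receiver side the element type has been erased to AnyRef,
// so Java serialization is chosen for the stored blocks...
def receiverSide: Chosen = pickSerializer[AnyRef]

// ...but the task keeps the real ClassTag and picks Kryo for reading,
// which is exactly the write/read disagreement behind the KryoException.
def taskSide: Chosen = pickSerializer[Array[Byte]]
```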





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15671
  
**[Test build #69398 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69398/consoleFull)**
 for PR 15671 at commit 
[`a8ff7f7`](https://github.com/apache/spark/commit/a8ff7f74de8af1e3f0655b21d5235a6eb6e1cd03).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15671
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69398/
Test FAILed.





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15671
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16052: [SPARK-18617][CORE][STREAMING] Close "kryo auto p...

2016-11-30 Thread uncleGen
Github user uncleGen commented on a diff in the pull request:

https://github.com/apache/spark/pull/16052#discussion_r90191359
  
--- Diff: 
streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala 
---
@@ -869,6 +891,31 @@ object TestReceiver {
   val counter = new AtomicInteger(1)
 }
 
+class FakeByteArrayReceiver extends 
Receiver[Array[Byte]](StorageLevel.MEMORY_ONLY) with Logging {
--- End diff --

BTW, the existing unit tests cover the other cases besides the `Array[Byte]` type.





[GitHub] spark issue #15915: [SPARK-18485][CORE] Underlying integer overflow when cre...

2016-11-30 Thread uncleGen
Github user uncleGen commented on the issue:

https://github.com/apache/spark/pull/15915
  
Is there any update? @JoshRosen 





[GitHub] spark pull request #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero sen...

2016-11-30 Thread AnthonyTruchet
GitHub user AnthonyTruchet opened a pull request:

https://github.com/apache/spark/pull/16078

[SPARK-18471][MLLIB] Fix huge vectors of zero send in closure in L-BFGS

## What changes were proposed in this pull request?

Introduced the utility TreeAggregatorWithZeroGenerator and used it to avoid 
sending a huge zero vector in L-BFGS or similar aggregations.

## How was this patch tested?

Ran a custom L-BFGS using this aggregator instead of treeAggregate, with 
significantly increased performance and stability.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/criteo-forks/spark ENG-17719-wrapper

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16078.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16078


commit b4333a3b89362952e900acd7824e5d40500e3b9f
Author: Anthony Truchet 
Date:   2016-11-29T18:20:38Z

[SPARK-18471][MLLIB] Fix huge vectors of zero send in closure in L-BFGS

Introduced the utility TreeAggregatorWithZeroGenerator to avoid sending a huge
zero vector in L-BFGS or similar aggregations, as only the size of the zero
value to be generated is captured in the closure.







[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90191825
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
   "cannot be used as field name\n" + 
walkedTypePath.mkString("\n"))
   }
 
-  val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType))
+  // primitive take only non-null or struct takes non-null object guarded by isNull
--- End diff --

The concept of `None` doesn't fit well with Tuple, except for Tuple1.

For a Tuple2, for example, we encode it as a row of 2 columns. If it is 
None, should we encode it as `[null]` or `[null, null]`? Conceptually, `[null]` 
looks like the correct answer. However, practically it becomes a row with only one 
column, and there is a conflict in data format.

Currently, if given a None for Tuple2 data, we encode it as `[null, 
null]`, as seen in the test @ueshin mentioned.







[GitHub] spark pull request #15620: [SPARK-18091] [SQL] Deep if expressions cause Gen...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15620#discussion_r90191951
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -64,19 +64,72 @@ case class If(predicate: Expression, trueValue: 
Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-    ev.copy(code = s"""
-      ${condEval.code}
-      boolean ${ev.isNull} = false;
-      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
-      if (!${condEval.isNull} && ${condEval.value}) {
-        ${trueEval.code}
-        ${ev.isNull} = ${trueEval.isNull};
-        ${ev.value} = ${trueEval.value};
-      } else {
-        ${falseEval.code}
-        ${ev.isNull} = ${falseEval.isNull};
-        ${ev.value} = ${falseEval.value};
-      }""")
+    // place generated code of condition, true value and false value in separate methods if
+    // their code combined is large
+    val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length
+    val generatedCode = if (combinedLength > 1024 &&
+      // Split these expressions only if they are created from a row object
+      (ctx.INPUT_ROW != null && ctx.currentVars == null)) {
--- End diff --

I'm curious about how other similar patches handle whole stage codegen, any 
ideas?





[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue:

https://github.com/apache/spark/pull/16038
  
@mridulm we don't know how to monitor the size of the serialized task. Sure, 
it would not be 10MB, due to all those zeros. But we nonetheless measured a 
significant increase in performance and (more importantly) in stability when 
using the workaround and our custom TreeAggregatorWithZeroGenerator.

@srowen when the density of a sparse vector increases it becomes *very* 
inefficient, not only because it uses almost twice the memory, but 
mainly because you fragment memory accesses and overload the GC :( 

See #16078 for a generic wrapper around treeAggregate which uses Option[U] 
to denote the zero of U by None and generates a full representation on demand when 
calling seqOp or combOp. 
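The wrapper idea can be sketched in plain Scala (no Spark; the function name, the grouping into partitions, and the signatures below are illustrative, not the actual PR's API):

```scala
// Sketch: the zero is carried as None and a real zero value is generated
// lazily inside seqOp/combOp, so the closure captures only the generator
// (here, effectively just a size), never a huge dense zero vector.
def aggregateWithZeroGenerator[T, U](
    data: Seq[T],
    zeroGen: () => U)(
    seqOp: (U, T) => U,
    combOp: (U, U) => U): U = {
  // Materialize a full zero only when an operation actually needs one.
  def materialize(o: Option[U]): U = o.getOrElse(zeroGen())
  // Stand-in for per-partition aggregation.
  val partials = data.grouped(2).map { part =>
    part.foldLeft(Option.empty[U]) { (acc, t) => Some(seqOp(materialize(acc), t)) }
  }
  // Stand-in for combining partition results.
  partials.foldLeft(Option.empty[U]) {
    case (a, Some(b)) => Some(combOp(materialize(a), b))
    case (a, None)    => a
  }.getOrElse(zeroGen())
}

// Summing counts into a length-3 "gradient" array without ever shipping
// the zero array in a closure -- only the generator is captured.
val result = aggregateWithZeroGenerator(Seq(0, 1, 2, 0, 1), () => new Array[Double](3))(
  (acc, i) => { acc(i) += 1.0; acc },
  (a, b) => { for (i <- a.indices) a(i) += b(i); a })
```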





[GitHub] spark pull request #15620: [SPARK-18091] [SQL] Deep if expressions cause Gen...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15620#discussion_r90191991
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -64,19 +64,72 @@ case class If(predicate: Expression, trueValue: 
Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-    ev.copy(code = s"""
-      ${condEval.code}
-      boolean ${ev.isNull} = false;
-      ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
-      if (!${condEval.isNull} && ${condEval.value}) {
-        ${trueEval.code}
-        ${ev.isNull} = ${trueEval.isNull};
-        ${ev.value} = ${trueEval.value};
-      } else {
-        ${falseEval.code}
-        ${ev.isNull} = ${falseEval.isNull};
-        ${ev.value} = ${falseEval.value};
-      }""")
+    // place generated code of condition, true value and false value in separate methods if
+    // their code combined is large
+    val combinedLength = condEval.code.length + trueEval.code.length + falseEval.code.length
+    val generatedCode = if (combinedLength > 1024 &&
--- End diff --

ah i see





[GitHub] spark issue #15991: [SPARK-17843][WEB UI] Indicate event logs pending for pr...

2016-11-30 Thread vijoshi
Github user vijoshi commented on the issue:

https://github.com/apache/spark/pull/15991
  
@tgravescs I would like to have this in 2.0 along with the other improvement 
that got accepted for backport (SPARK-18010). So would you consider allowing 
this into 2.0 as well?





[GitHub] spark issue #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero send in cl...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16078
  
Can one of the admins verify this patch?





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90192290
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
   "cannot be used as field name\n" + 
walkedTypePath.mkString("\n"))
   }
 
-  val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType))
+  // primitive take only non-null or struct takes non-null object guarded by isNull
--- End diff --

So I think we can't model an optional data type like `Option[Tuple2[Int, String]]` 
very well with a row.





[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15314
  
**[Test build #69399 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69399/consoleFull)**
 for PR 15314 at commit 
[`194f7b4`](https://github.com/apache/spark/commit/194f7b47f8f02e6c9d09d928d9ec7bd83b0ee921).





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90192638
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
   "cannot be used as field name\n" + 
walkedTypePath.mkString("\n"))
   }
 
-  val fieldValue = Invoke(inputObject, fieldName, dataTypeFor(fieldType))
+  // primitive take only non-null or struct takes non-null object guarded by isNull
--- End diff --

using `Dataset[Option[xxx]]` is going to be banned, see 
https://github.com/apache/spark/pull/15979





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15671
  
**[Test build #69400 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69400/consoleFull)**
 for PR 15671 at commit 
[`e3d8676`](https://github.com/apache/spark/commit/e3d8676513f75d839daadd81ccc776f479602dc1).





[GitHub] spark issue #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero send in cl...

2016-11-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16078
  
This is getting a bit out of hand, with 6 pull requests now. To be clear, I 
do not think we should merge this change. It's not necessary to address the 
problem.





[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16038
  
As in https://github.com/apache/spark/pull/16078 I do not think we should 
merge a change like this. Let's fix the problem directly in 
https://github.com/apache/spark/pull/16037





[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16037
  
Following https://github.com/apache/spark/pull/16038 I suggest this proceed 
by making the zero value a sparse vector, and then making it dense in the seqOp 
immediately.
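That suggestion can be sketched in plain Scala. The `SparseVec` type below is a toy stand-in for MLlib's sparse vector, and the fold stands in for a per-partition seqOp; none of this is Spark's actual API:

```scala
// Toy sparse vector: only indices/values are stored, so an all-zero vector
// of any size serializes to almost nothing.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double]) {
  def toDense: Array[Double] = {
    val d = new Array[Double](size)
    indices.zip(values).foreach { case (i, v) => d(i) = v }
    d
  }
}

sealed trait Vec
case class Sparse(v: SparseVec) extends Vec
case class Dense(v: Array[Double]) extends Vec

// Densify lazily: the sparse zero becomes a dense array on first use,
// i.e. once per partition on the executor, never on the driver.
def densify(acc: Vec): Array[Double] = acc match {
  case Sparse(s) => s.toDense
  case Dense(d)  => d
}

def seqOp(acc: Vec, idx: Int): Vec = {
  val d = densify(acc)
  d(idx) += 1.0
  Dense(d)
}

// The driver only ships this tiny sparse zero, never a huge dense array.
val zero: Vec = Sparse(SparseVec(4, Array.empty, Array.empty))
val out = densify(Seq(0, 3, 3).foldLeft(zero)(seqOp))
```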





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90193978
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
   "cannot be used as field name\n" + 
walkedTypePath.mkString("\n"))
   }
 
-  val fieldValue = Invoke(inputObject, fieldName, 
dataTypeFor(fieldType))
+  // primitive take only non-null or struct takes non-null object 
guarded by isNull
--- End diff --

+1

`Option[Int]`, `Option[String]`, `Option[Tuple1[Int]]`, 
`Option[Tuple1[String]]` are ok?
If so I'd like you to add the following tests to `ExpressionEncoderSuite`:

```scala
encodeDecodeTest(Option(31), "option of int")
encodeDecodeTest(Option("abc"), "option of string")
encodeDecodeTest(Option.empty[Int], "empty option of int")
encodeDecodeTest(Option.empty[String], "empty option of string")
encodeDecodeTest(Option(Tuple1(31)), "option of tuple1 of int")
encodeDecodeTest(Option.empty[Tuple1[Int]], "empty option of tuple1 of int")
encodeDecodeTest(Option(Tuple1("abc")), "option of tuple1 of string")
encodeDecodeTest(Option.empty[Tuple1[String]], "empty option of tuple1 of string")
```






[GitHub] spark issue #13706: [SPARK-15988] [SQL] Implement DDL commands: Create/Drop ...

2016-11-30 Thread lshmouse
Github user lshmouse commented on the issue:

https://github.com/apache/spark/pull/13706
  
@lianhuiwang 

Just a feedback.  With this patch, creating a MACRO throws the following 
exception.
Any suggestion? I am trying to debug it.

```
16/11/30 16:59:18 INFO execution.SparkSqlParser: Parsing command: CREATE 
TEMPORARY MACRO flr(time_ms bigint) FLOOR(time_ms/1000/3600)*3600
16/11/30 16:59:18 ERROR thriftserver.SparkExecuteStatementOperation: Error 
executing query, currentState RUNNING, 
org.apache.spark.sql.AnalysisException: Cannot resolve 
'(FLOOR(((boundreference() / 1000) / 3600)) * 3600)' for CREATE TEMPORARY MACRO 
flr, due to data type mismatch: differing types in '(FLOOR(((boundreference() / 
1000) / 3600)) * 3600)' (bigint and int).;
  at 
org.apache.spark.sql.execution.command.CreateMacroCommand.run(macros.scala:70)  

  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
  
  at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
 
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
  
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:120)
  
  at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:141)

  at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)  

  at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:138)  

  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:119)  
 
  at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
 
  at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)

  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:186)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:167)
 
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65) 
 
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)  
 
  at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:682)  
 
  at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:221)
  at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:165)
  at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:162)
  at java.security.AccessController.doPrivileged(Native Method) 
 
  at javax.security.auth.Subject.doAs(Subject.java:415) 
 
  at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1854)

  at 
org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:175)
  at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) 

  at java.util.concurrent.FutureTask.run(FutureTask.java:262)   
 
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 

  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 

  at java.lang.Thread.run(Thread.java:745)
```





[GitHub] spark pull request #15620: [SPARK-18091] [SQL] Deep if expressions cause Gen...

2016-11-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15620#discussion_r90188857
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -64,19 +64,72 @@ case class If(predicate: Expression, trueValue: 
Expression, falseValue: Expressi
 val trueEval = trueValue.genCode(ctx)
 val falseEval = falseValue.genCode(ctx)
 
-ev.copy(code = s"""
-  ${condEval.code}
-  boolean ${ev.isNull} = false;
-  ${ctx.javaType(dataType)} ${ev.value} = 
${ctx.defaultValue(dataType)};
-  if (!${condEval.isNull} && ${condEval.value}) {
-${trueEval.code}
-${ev.isNull} = ${trueEval.isNull};
-${ev.value} = ${trueEval.value};
-  } else {
-${falseEval.code}
-${ev.isNull} = ${falseEval.isNull};
-${ev.value} = ${falseEval.value};
-  }""")
+// place generated code of condition, true value and false value in 
separate methods if
+// their code combined is large
+val combinedLength = condEval.code.length + trueEval.code.length + 
falseEval.code.length
+val generatedCode = if (combinedLength > 1024 &&
+  // Split these expressions only if they are created from a row object
+  (ctx.INPUT_ROW != null && ctx.currentVars == null)) {
+
+  val (condFuncName, condGlobalIsNull, condGlobalValue) =
+createAndAddFunction(ctx, condEval, predicate, "evalIfCondExpr")
+  val (trueFuncName, trueGlobalIsNull, trueGlobalValue) =
+createAndAddFunction(ctx, trueEval, trueValue, "evalIfTrueExpr")
+  val (falseFuncName, falseGlobalIsNull, falseGlobalValue) =
+createAndAddFunction(ctx, falseEval, falseValue, "evalIfFalseExpr")
--- End diff --

I don't think we need to split the true/false blocks here; they handle the 
64KB limit well as-is. It appears to work with the same blocks, without 
splitting, in my local test.
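
The heuristic in the diff above (move each sub-expression's generated code into its own method once the combined code grows past a threshold) can be sketched roughly as follows. This is plain Python with illustrative names taken from the diff; nothing below is actual Spark API:

```python
# Sketch of the splitting heuristic from the diff: when the combined
# generated code for condition/true/false is large, register each piece
# as a separate helper function and emit calls to them, so no single
# generated method exceeds the JVM's method size limit.
def create_and_add_function(functions, body, name):
    # Register a helper holding `body` and return its name
    # (stand-in for ctx.addNewFunction in the real codegen context).
    functions[name] = body
    return name

def gen_if_code(cond_code, true_code, false_code, threshold=1024):
    functions = {}
    combined = len(cond_code) + len(true_code) + len(false_code)
    if combined > threshold:
        c = create_and_add_function(functions, cond_code, "evalIfCondExpr")
        t = create_and_add_function(functions, true_code, "evalIfTrueExpr")
        f = create_and_add_function(functions, false_code, "evalIfFalseExpr")
        code = f"if ({c}()) {{ {t}(); }} else {{ {f}(); }}"
    else:
        # Small enough: inline everything, as the pre-patch codegen did.
        code = f"if ({cond_code}) {{ {true_code} }} else {{ {false_code} }}"
    return code, functions
```

Whether the true/false branches also need their own methods (the point under discussion) only changes how many helpers get registered; the threshold check stays the same.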





[GitHub] spark issue #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to Pyspark ...

2016-11-30 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/15817
  
Sorry for delay - this LGTM. Given it's been around for a while and given 
RC2 is likely to be cut, I've gone ahead and merged to master / branch-2.1. 
Thanks!





[GitHub] spark pull request #15817: [SPARK-18366][PYSPARK][ML] Add handleInvalid to P...

2016-11-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15817





[GitHub] spark issue #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero send in cl...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue:

https://github.com/apache/spark/pull/16078
  
It is necessary to address it in L-BFGS at least. We proposed a solution in 
core, which can be legitimately rejected as not relevant there, and two 
solutions in MLlib: one provides a reusable utility to aggregate large vectors, 
the other is a specific fix in L-BFGS. At least one of them deserves to be 
worked on, I believe, as sparse vectors are not an option for performance 
reasons.





[GitHub] spark pull request #16038: [SPARK-18471][CORE] New treeAggregate overload fo...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet closed the pull request at:

https://github.com/apache/spark/pull/16038





[GitHub] spark issue #16038: [SPARK-18471][CORE] New treeAggregate overload for big l...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue:

https://github.com/apache/spark/pull/16038
  
Agreed, this is too ML-specific to deserve a fix in core.





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90199326
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
   "cannot be used as field name\n" + 
walkedTypePath.mkString("\n"))
   }
 
-  val fieldValue = Invoke(inputObject, fieldName, 
dataTypeFor(fieldType))
+  // primitive take only non-null or struct takes non-null object 
guarded by isNull
--- End diff --

I see that `Option[Tuple1[Int]]`, `Option[Tuple1[String]]` are invalid.





[GitHub] spark pull request #15780: [SPARK-18284][SQL] Make ExpressionEncoder.seriali...

2016-11-30 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/15780#discussion_r90199899
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 
---
@@ -590,7 +591,11 @@ object ScalaReflection extends ScalaReflection {
   "cannot be used as field name\n" + 
walkedTypePath.mkString("\n"))
   }
 
-  val fieldValue = Invoke(inputObject, fieldName, 
dataTypeFor(fieldType))
+  // primitive take only non-null or struct takes non-null object 
guarded by isNull
--- End diff --

Anyway, after #15979, I believe we can use the previous code I suggested:

```scala
val fieldValue = Invoke(
  AssertNotNull(inputObject, walkedTypePath), fieldName, 
dataTypeFor(fieldType),
  returnNullable = !fieldType.typeSymbol.asClass.isPrimitive)
```

What do you think?





[GitHub] spark issue #16060: [SPARK-18220][SQL] read Hive orc table with varchar colu...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16060
  
**[Test build #3446 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3446/consoleFull)**
 for PR 16060 at commit 
[`8b697be`](https://github.com/apache/spark/commit/8b697be520bb9c070462bebc8c72796eca8c8517).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15979
  
**[Test build #69394 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69394/consoleFull)**
 for PR 15979 at commit 
[`70dd650`](https://github.com/apache/spark/commit/70dd650a7e43a44a056c4aa95dbbd88d23cbfbee).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16040: [SPARK-18612][MLLIB] Delete broadcasted variable in LBFG...

2016-11-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16040
  
Merged to master/2.1





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15979
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69394/
Test FAILed.





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15979
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #16040: [SPARK-18612][MLLIB] Delete broadcasted variable ...

2016-11-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16040





[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-30 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/16037
  
This is all a bit confusing - can we highlight which PR is actually to be 
reviewed? 





[GitHub] spark issue #16072: [SPARK-18639] Build only a single pip package

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16072
  
**[Test build #3445 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3445/consoleFull)**
 for PR 16072 at commit 
[`88b53c3`](https://github.com/apache/spark/commit/88b53c3b542d1423c169af7b4e52ecd6da067ced).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16069: [WIP][SPARK-18638][BUILD] Upgrade sbt to 0.13.13

2016-11-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16069
  
Regarding plugin updates: I wouldn't mind updating the Maven plugins too 
while you're at it. `mvn versions:display-plugin-updates` will show you all the 
candidates. Some changes we can't or won't take but simple maintenance updates 
to plugins are generally fine.





[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/16037
  
This should be the main PR @MLnick 





[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15314
  
**[Test build #69399 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69399/consoleFull)**
 for PR 15314 at commit 
[`194f7b4`](https://github.com/apache/spark/commit/194f7b47f8f02e6c9d09d928d9ec7bd83b0ee921).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15314
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69399/
Test PASSed.





[GitHub] spark issue #12896: [SPARK-14489][ML][PYSPARK] ALS unknown user/item predict...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/12896
  
**[Test build #69401 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69401/consoleFull)**
 for PR 12896 at commit 
[`a439899`](https://github.com/apache/spark/commit/a4398995fbd3180f04ef1837113ce88a8703a6ee).





[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15314
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero send in cl...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet commented on the issue:

https://github.com/apache/spark/pull/16078
  
In order to reduce the mess around multiple PRs, I'll close this one and 
rebase the change into #16037 as you requested.

By the way, what is the right way, convenient for you, to share a proposal 
without creating a PR?





[GitHub] spark pull request #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero sen...

2016-11-30 Thread AnthonyTruchet
Github user AnthonyTruchet closed the pull request at:

https://github.com/apache/spark/pull/16078





[GitHub] spark issue #11974: [SPARK-14174][ML] Accelerate KMeans via Mini-Batch EM

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/11974
  
**[Test build #69402 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69402/consoleFull)**
 for PR 11974 at commit 
[`c27f128`](https://github.com/apache/spark/commit/c27f128136817fed810602df4ccf7ed6e77894c9).





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/15979
  
retest this please





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15671
  
**[Test build #69400 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69400/consoleFull)**
 for PR 15671 at commit 
[`e3d8676`](https://github.com/apache/spark/commit/e3d8676513f75d839daadd81ccc776f479602dc1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15671
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69400/
Test PASSed.





[GitHub] spark issue #15671: [SPARK-18206][ML]Add instrumentation for MLP,NB,LDA,AFT,...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15671
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15979: [SPARK-18251][SQL] the type of Dataset can't be Option o...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15979
  
**[Test build #69403 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69403/consoleFull)**
 for PR 15979 at commit 
[`70dd650`](https://github.com/apache/spark/commit/70dd650a7e43a44a056c4aa95dbbd88d23cbfbee).





[GitHub] spark issue #16078: [SPARK-18471][MLLIB] Fix huge vectors of zero send in cl...

2016-11-30 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/16078
  
@AnthonyTruchet I think in this case it was just confusing to have many PRs 
opened against the issue. One option is to adjust the existing PR with the 
changes, so that only one PR is open.

You can mark the PR as "[WIP]" to indicate it's still somewhat up for 
discussion.

An even better approach is to first discuss on the JIRA ticket and try to 
settle on an accepted solution, then open the PR. This is especially important 
if there is a proposed change to APIs or new core methods etc.





[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-30 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/16037
  
Right ok. So I think the approach of making the zero vector sparse then 
calling `toDense` in `seqOp` as @srowen suggested makes most sense.

Currently the gradient vector *must* be dense in MLlib, since both `axpy` 
and the logic for multinomial logreg require it. So the thing that is 
initially serialized with the task should be tiny, and the call to `toDense` 
for the first instance in each partition will essentially generate the dense 
zero vector. Thereafter it should be a no-op, as the vector will already be 
dense and `toDense` will just be a ref to the values array.

Can we see if this works:
```scala
val zeroVector = Vectors.sparse(n, Seq())
val (gradientSum, lossSum) = data.treeAggregate((zeroVector, 0.0))(
  seqOp = (c, v) => (c, v) match {
    case ((grad, loss), (label, features)) =>
      val denseGrad = grad.toDense
      val l = localGradient.compute(features, label, bcW.value, denseGrad)
      (denseGrad, loss + l)
  },
  combOp = (c1, c2) => (c1, c2) match {
    case ((grad1, loss1), (grad2, loss2)) =>
      axpy(1.0, grad2, grad1)
      (grad1, loss1 + loss2)
  })
```
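
As a minimal model of the idea (plain Python with illustrative names, not Spark API): the sparse zero is cheap to ship with each task, densifying it on first use inside the per-partition fold materializes the dense buffer once, and every later update reuses that same buffer:

```python
# Toy model of treeAggregate with a sparse zero vector. A sparse vector is
# a {index: value} dict; a dense one is a plain list. The zero shipped with
# each "task" is the tiny sparse dict; to_dense expands it lazily.
def to_dense(vec, n):
    # No-op if already dense; otherwise expand the sparse {idx: val} dict.
    if isinstance(vec, list):
        return vec
    dense = [0.0] * n
    for i, v in vec.items():
        dense[i] = v
    return dense

def tree_aggregate(partitions, n):
    zero = {}  # sparse zero: tiny to serialize, unlike [0.0] * n
    part_sums = []
    for part in partitions:          # seqOp, per partition
        grad = zero
        for (label, features) in part:
            grad = to_dense(grad, n)           # densify once per partition
            for i, x in enumerate(features):   # stand-in for gradient update
                grad[i] += label * x
        part_sums.append(grad)
    total = [0.0] * n
    for g in part_sums:              # combOp: axpy(1.0, g, total)
        g = to_dense(g, n)           # empty partitions may still be sparse
        for i in range(n):
            total[i] += g[i]
    return total
```

Note the `to_dense` in the combine step: a partition with no data never densifies its accumulator, which is exactly why the real `combOp` also has to cope with a still-sparse side.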





[GitHub] spark issue #16037: [SPARK-18471][MLLIB] In LBFGS, avoid sending huge vector...

2016-11-30 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/16037
  
What worries me more, actually, is that the initial vector should be
compressed when sent in the closure. So why is this issue occurring? Is it a
problem with serialization/compression, or is the vector still too large
even after compression? It would be good to understand that.





[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread a-roberts
Github user a-roberts commented on the issue:

https://github.com/apache/spark/pull/15736
  
@srowen how about this for profiling?

```
private[spark] object WritablePartitionedPairCollection {
  /**
   * Takes an optional parameter (keyComparator), uses it if provided,
   * and returns a comparator for the partitions.
   */
  def getComparator[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] = {
    if (keyComparator.isDefined) {
      val theKeyComp = keyComparator.get
      new Comparator[(Int, K)] {
        // We know we have a non-empty comparator here
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          if (a._1 == b._1) {
            theKeyComp.compare(a._2, b._2)
          } else {
            a._1 - b._1
          }
        }
      }
    } else return new Comparator[(Int, K)] {
      override def compare(a: (Int, K), b: (Int, K)): Int = {
        a._1 - b._1
      }
    }
  }
}
```





[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/15736
  
Looks right except you just want to write

```
if (...) {
  ...
} else {
  new Comparator...
}
```
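
Expanding on that, one way to avoid the `else return` altogether (a sketch of my own, not the actual patch) is to fold over the `Option`, building the partition-only comparator once and layering the key comparator on top when present:

```scala
import java.util.Comparator

object ComparatorSketch {
  def getComparator[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] = {
    // Compare by partition id only.
    val byPartition: Comparator[(Int, K)] = new Comparator[(Int, K)] {
      override def compare(a: (Int, K), b: (Int, K)): Int = a._1 - b._1
    }
    // If a key comparator exists, break partition ties with it.
    keyComparator.fold(byPartition) { kc =>
      new Comparator[(Int, K)] {
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          val p = a._1 - b._1
          if (p == 0) kc.compare(a._2, b._2) else p
        }
      }
    }
  }

  def main(args: Array[String]): Unit = {
    val cmp = getComparator[String](Some(Comparator.naturalOrder[String]()))
    assert(cmp.compare((1, "b"), (1, "a")) > 0)  // same partition: key decides
    assert(cmp.compare((0, "z"), (1, "a")) < 0)  // partition id decides
    println("ok")
  }
}
```

Both versions are equivalent; this form just keeps the method a single expression.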






[GitHub] spark issue #15713: [SPARK-18196] [CORE] Optimise CompactBuffer implementati...

2016-11-30 Thread a-roberts
Github user a-roberts commented on the issue:

https://github.com/apache/spark/pull/15713
  
In response to @rxin's question: for HiBench, CompactBuffers are **used only 
in PageRank** (none of the other 11 workloads), and these buffers mostly hold 
between 3 and 40 elements, never more than 60, and never only two. The 
PageRank workload processes 500k pages (large profile): we see 500k 
CompactBuffer constructor calls and 500k prints in the `+=` method when 
curSize <= 2, indicating the buffers always expand.

I don't know of any cases where we add only a couple of elements. I also ran 
SparkSqlPerf (all 100 queries); again there is no output indicating that this 
class is used (no prints from the constructor, growToSize, or the `+=` 
methods).

Here's a breakdown of growToSize invocations (printing the curSize variable) 
with PageRank, so we have an idea of how big the CompactBuffers actually become.

I used the Spark WordCount example on the 677 MB stdout file containing my 
prints to generate this data; in total there are 18,762,361 growth events.

```
(3,50), (4,50), (5,50), (6,50), (7,50), (8,50), 
(9,50), (10,50), (11,50), (12,50), (13,50), (14,50), 
(15,50), (16,50), (17,50), (18,50), (19,50), (20,50), 
(21,48), (22,45), (23,42), (24,499978), (25,499951), (26,499879), 
(27,499729), (28,499321), (29,498517), (30,496984), (31,494114), (32,488878), 
(33,480328), (34,467214), (35,447829), (36,421619), (37,387790), (38,346826), 
(39,300660), (40,251266), (41,201702), (42,155372) (43,114024), (44,79886), 
(45,53196), (46,33580), (47,20146), (48,11569), (49,6222), (50,3143), 
(51,1491), (52,684), (53,289), (54,126), (55,39), (56,15), (57,6), (58,1), 
(59,1), (60,1)
```
On the left is the CompactBuffer size in elements, and on the right the 
number of times that size appeared in the output file (i.e. how many times a 
CompactBuffer grew to that many elements).

If there are better ways to figure this out, or other workloads to suggest, 
do let me know; I've got code ready that replaces CompactBuffer with 
ArrayBuffer(2) for profiling and testing.
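
For reference, a histogram like the one above can be derived from the printed curSize values with a simple fold (a sketch of my own; the object and method names are illustrative, not from the patch):

```scala
object GrowthHistogram {
  // Count how many times each buffer size appears in the log of curSize prints.
  def histogram(curSizes: Iterator[Int]): Map[Int, Int] =
    curSizes.foldLeft(Map.empty[Int, Int]) { (acc, s) =>
      acc.updated(s, acc.getOrElse(s, 0) + 1)
    }

  def main(args: Array[String]): Unit = {
    // A tiny stand-in for the 677 MB stdout file of curSize prints.
    val sizes = Iterator(3, 3, 4, 24, 24, 24)
    println(histogram(sizes).toSeq.sorted.mkString(", "))  // prints (3,2), (4,1), (24,3)
  }
}
```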





[GitHub] spark pull request #16079: [SPARK-18645][Deploy] Fix spark-daemon.sh argumen...

2016-11-30 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/16079

[SPARK-18645][Deploy] Fix spark-daemon.sh arguments error lead to throws 
Unrecognized option

## What changes were proposed in this pull request?

spark-daemon.sh loses the single quotes around arguments after #15338, as follows:
```
execute_command nice -n 0 bash 
/opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit
 --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name Thrift 
JDBC/ODBC Server --conf spark.driver.extraJavaOptions=-XX:+UseG1GC 
-XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
```
With this fix, as follows:
```
execute_command nice -n 0 bash 
/opt/cloudera/parcels/SPARK-2.1.0-cdh5.4.3.d20161129-21.04.38/lib/spark/bin/spark-submit
 --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --name 
'Thrift JDBC/ODBC Server' --conf 'spark.driver.extraJavaOptions=-XX:+UseG1GC 
-XX:-HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp'
```

## How was this patch tested?

- Manual tests
- Built the package and ran start-thriftserver.sh with `--conf 
'spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:-HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/tmp'`



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-18645

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16079.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16079


commit 563645f9fe6e999adc2ee22422cfb236278f3c10
Author: Yuming Wang 
Date:   2016-11-30T10:00:13Z

Fix Unrecognized option







[GitHub] spark issue #16079: [SPARK-18645][Deploy] Fix spark-daemon.sh arguments erro...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16079
  
**[Test build #69404 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69404/consoleFull)**
 for PR 16079 at commit 
[`563645f`](https://github.com/apache/spark/commit/563645f9fe6e999adc2ee22422cfb236278f3c10).





[GitHub] spark issue #15987: [SPARK-17732][SPARK-18515][SQL] ALTER TABLE DROP PARTITI...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15987
  
**[Test build #69397 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69397/consoleFull)**
 for PR 15987 at commit 
[`c35aeab`](https://github.com/apache/spark/commit/c35aeabe05b50762e3a7ea620ea3009b02f0231d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait CommandWithExpression extends LeafNode `





[GitHub] spark issue #15987: [SPARK-17732][SPARK-18515][SQL] ALTER TABLE DROP PARTITI...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15987
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15736: [SPARK-18224] [CORE] Optimise PartitionedPairBuffer impl...

2016-11-30 Thread a-roberts
Github user a-roberts commented on the issue:

https://github.com/apache/spark/pull/15736
  
Good point, done. Shall I start profiling the code below then? It builds 
fine with no scalastyle problems.

```
  def getComparator[K](keyComparator: Option[Comparator[K]]): Comparator[(Int, K)] = {
    if (keyComparator.isDefined) {
      val theKeyComp = keyComparator.get
      new Comparator[(Int, K)] {
        // We know we have a non-empty comparator here
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          if (a._1 == b._1) {
            theKeyComp.compare(a._2, b._2)
          } else {
            a._1 - b._1
          }
        }
      }
    } else {
      new Comparator[(Int, K)] {
        override def compare(a: (Int, K), b: (Int, K)): Int = {
          a._1 - b._1
        }
      }
    }
  }
```





[GitHub] spark issue #15987: [SPARK-17732][SPARK-18515][SQL] ALTER TABLE DROP PARTITI...

2016-11-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15987
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69397/
Test PASSed.





[GitHub] spark issue #15958: [SPARK-17932][SQL] Support SHOW TABLES EXTENDED LIKE 'id...

2016-11-30 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15958
  
**[Test build #69395 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69395/consoleFull)**
 for PR 15958 at commit 
[`958fe8b`](https://github.com/apache/spark/commit/958fe8b083feb6a312b02abe8325b973bc91500f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.




