[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread daroo
Github user daroo commented on the issue:

https://github.com/apache/spark/pull/19789
  
It seems that your "magic spell" didn't work. No build was triggered 


---




[GitHub] spark pull request #19747: [Spark-22431][SQL] Ensure that the datatype in th...

2017-11-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19747#discussion_r152734937
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala 
---
@@ -895,6 +898,19 @@ private[hive] object HiveClientImpl {
     Option(hc.getComment).map(field.withComment).getOrElse(field)
   }
 
+  private def verifyColumnDataType(schema: StructType): Unit = {
+    schema.foreach(field => {
+      val typeString = field.dataType.catalogString
--- End diff --

`catalogString` is generated by Spark. It is not related to Hive's restrictions. 

See my fix: 
https://github.com/gatorsmile/spark/commit/bdcb9c8d29db022d9703eb91ef3f74c35bc24ec1

After applying my fix, you will also need to update the test cases so that the exception types are consistent. 
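
For context, a minimal sketch (not part of the PR, schema made up for illustration) of what `catalogString` produces for a nested column; this is the Spark-generated string that the new check would be validating:

```scala
import org.apache.spark.sql.types._

val schema = new StructType()
  .add("q", new StructType().add("$a", IntegerType).add("b", StringType))

// catalogString is the Spark-side string form of the column's data type
println(schema("q").dataType.catalogString)  // struct<$a:int,b:string>
```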


---




[GitHub] spark pull request #19747: [Spark-22431][SQL] Ensure that the datatype in th...

2017-11-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19747#discussion_r152734005
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -174,6 +174,87 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA
   test("alter datasource table add columns - partitioned - orc") {
     testAddColumnPartitioned("orc")
   }
+
+  test("SPARK-22431: illegal nested type") {
+    val queries = Seq(
+      "CREATE TABLE t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q",
+      "CREATE TABLE t(q STRUCT<`$a`:INT, col2:STRING>, i1 INT)",
+      "CREATE VIEW t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q")
+
+    queries.foreach(query => {
+      val err = intercept[AnalysisException] {
+        spark.sql(query)
+      }.getMessage
+      assert(err.contains("Cannot recognize the data type"))
+    })
+
+    withView("v") {
+      spark.sql("CREATE VIEW v AS SELECT STRUCT('a' AS `a`, 1 AS b) q")
+      assert(spark.sql("SELECT * FROM v").count() == 1L)
--- End diff --

Could you check the contents instead of just the row count?
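
A minimal sketch of what that could look like, assuming the `checkAnswer` helper available to this suite; the expected row is inferred from the view definition in the diff above:

```scala
import org.apache.spark.sql.Row

checkAnswer(
  spark.sql("SELECT * FROM v"),
  Row(Row("a", 1)) :: Nil)
```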


---




[GitHub] spark pull request #19747: [Spark-22431][SQL] Ensure that the datatype in th...

2017-11-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19747#discussion_r152734026
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -174,6 +174,87 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA
   test("alter datasource table add columns - partitioned - orc") {
     testAddColumnPartitioned("orc")
   }
+
+  test("SPARK-22431: illegal nested type") {
+    val queries = Seq(
+      "CREATE TABLE t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q",
+      "CREATE TABLE t(q STRUCT<`$a`:INT, col2:STRING>, i1 INT)",
+      "CREATE VIEW t AS SELECT STRUCT('a' AS `$a`, 1 AS b) q")
+
+    queries.foreach(query => {
+      val err = intercept[AnalysisException] {
+        spark.sql(query)
+      }.getMessage
+      assert(err.contains("Cannot recognize the data type"))
+    })
+
+    withView("v") {
+      spark.sql("CREATE VIEW v AS SELECT STRUCT('a' AS `a`, 1 AS b) q")
+      assert(spark.sql("SELECT * FROM v").count() == 1L)
--- End diff --

The same applies to the other test cases


---




[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19518
  
I'd prefer the inner class approach.


---




[GitHub] spark issue #19788: [SPARK-9853][Core] Optimize shuffle fetch of contiguous ...

2017-11-22 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/19788
  
@yucai would you mind adding more explanation to your PR description?


---




[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18692#discussion_r152725251
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala 
---
@@ -152,3 +152,71 @@ object EliminateOuterJoin extends Rule[LogicalPlan] with PredicateHelper {
       if (j.joinType == newJoinType) f else Filter(condition, j.copy(joinType = newJoinType))
   }
 }
+
+/**
+ * A rule that uses propagated constraints to infer join conditions. The optimization is applicable
+ * only to CROSS joins.
+ *
+ * For instance, if there is a CROSS join, where the left relation has 'a = 1' and the right
+ * relation has 'b = 1', then the rule infers 'a = b' as a join predicate.
+ */
+object InferJoinConditionsFromConstraints extends Rule[LogicalPlan] with PredicateHelper {
+
+  def apply(plan: LogicalPlan): LogicalPlan = {
+    if (SQLConf.get.constraintPropagationEnabled) {
+      inferJoinConditions(plan)
+    } else {
+      plan
+    }
+  }
+
+  private def inferJoinConditions(plan: LogicalPlan): LogicalPlan = plan transform {
+    case join @ Join(left, right, Cross, conditionOpt) =>
+      val leftConstraints = join.constraints.filter(_.references.subsetOf(left.outputSet))
+      val rightConstraints = join.constraints.filter(_.references.subsetOf(right.outputSet))
+      val inferredJoinPredicates = inferJoinPredicates(leftConstraints, rightConstraints)
+
+      val newConditionOpt = conditionOpt match {
+        case Some(condition) =>
+          val existingPredicates = splitConjunctivePredicates(condition)
+          val newPredicates = findNewPredicates(inferredJoinPredicates, existingPredicates)
+          if (newPredicates.nonEmpty) Some(And(newPredicates.reduce(And), condition)) else None
+        case None =>
+          inferredJoinPredicates.reduceOption(And)
+      }
+      if (newConditionOpt.isDefined) Join(left, right, Inner, newConditionOpt) else join
--- End diff --

Yes. In this PR, we only need to consider cross joins without any join condition. 

We can extend it in the future. 
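
For illustration only (hypothetical data, not from the PR), this is the query shape the rule targets, as run in a spark-shell session: both sides of the cross join are constrained to a literal, so an equi-join condition can be inferred and the Cross join rewritten as an Inner join:

```scala
// t1 is constrained to a = 1 and t2 to b = 1, so `a = b` can be inferred.
val t1 = spark.range(10).toDF("a").filter("a = 1")
val t2 = spark.range(10).toDF("b").filter("b = 1")
t1.crossJoin(t2).explain(true)
// With the rule enabled, the optimized plan may show an Inner join with condition (a = b).
```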


---




[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19797
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19797
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84124/
Test FAILed.


---




[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19797
  
**[Test build #84124 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84124/testReport)**
 for PR 19797 at commit 
[`4d9657a`](https://github.com/apache/spark/commit/4d9657ada452f8fee85e89818026cf15aea3aafc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on...

2017-11-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19370


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19370
  
It seems there is a conflict when backporting to branch-2.2. @jsnowacki, would you mind opening a backport PR against branch-2.2, please?

I think this is important for many Windows users, and I would guess it is relatively low-risk.


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19370
  
Merged to master.


---




[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

2017-11-22 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19797#discussion_r152719853
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -799,9 +799,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
 
   private[this] def castToByteCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
     case StringType =>
-      val wrapper = ctx.freshName("wrapper")
-      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
-        s"$wrapper = new UTF8String.IntWrapper();")
+      val wrapper = "intWrapper"
--- End diff --

Are you worried about name collisions across different methods? Since the lifetime of this object is very short, I intentionally use the same name for the same type (`IntWrapper` or `LongWrapper`) so that the object is reused.
We could use different names across different methods.


---




[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

2017-11-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19797#discussion_r152719657
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -87,35 +87,41 @@ private [sql] object GenArrayData {
       elementType: DataType,
       elementsCode: Seq[ExprCode],
       isMapKey: Boolean): (String, Seq[String], String, String) = {
-    val arrayName = ctx.freshName("array")
     val arrayDataName = ctx.freshName("arrayData")
     val numElements = elementsCode.length
 
     if (!ctx.isPrimitiveType(elementType)) {
+      val arrayName = "arrayObject"
       val genericArrayClass = classOf[GenericArrayData].getName
-      ctx.addMutableState("Object[]", arrayName,
-        s"$arrayName = new Object[$numElements];")
+      if (!ctx.mutableStates.exists(s => s._1 == arrayName)) {
+        ctx.addMutableState("Object[]", arrayName)
+      }
 
       val assignments = elementsCode.zipWithIndex.map { case (eval, i) =>
-        val isNullAssignment = if (!isMapKey) {
-          s"$arrayName[$i] = null;"
+        val isNullAssignment = if (eval.isNull == "false") {
+          ""
--- End diff --

This doesn't really simplify the code, since you still have the `else` block below. It actually complicates the code here, so I think it is better to leave it unchanged.


---




[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

2017-11-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19797#discussion_r152719475
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -799,9 +799,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
 
   private[this] def castToByteCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
     case StringType =>
-      val wrapper = ctx.freshName("wrapper")
-      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
-        s"$wrapper = new UTF8String.IntWrapper();")
+      val wrapper = "intWrapper"
+      if (!ctx.mutableStates.exists(s => s._1 == wrapper)) {
+        ctx.addMutableState("UTF8String.IntWrapper", wrapper,
+          s"$wrapper = new UTF8String.IntWrapper();")
+      }
--- End diff --

Add a helper method to `CodegenContext`? Something like `reuseOrAddMutableState`?
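
A rough sketch of what such a helper might look like; `reuseOrAddMutableState` is only the name suggested here, not an existing `CodegenContext` API, and the duplicate check simply mirrors the one used in this PR's diff:

```scala
def reuseOrAddMutableState(javaType: String, variableName: String, initCode: String = ""): String = {
  // Register the mutable state only once; later callers share the same field.
  if (!mutableStates.exists(s => s._1 == variableName)) {
    addMutableState(javaType, variableName, initCode)
  }
  variableName
}
```

Call sites could then collapse the repeated `exists`/`addMutableState` pair into a single call.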


---




[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

2017-11-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19797#discussion_r152719412
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -799,9 +799,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
 
   private[this] def castToByteCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
     case StringType =>
-      val wrapper = ctx.freshName("wrapper")
-      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
-        s"$wrapper = new UTF8String.IntWrapper();")
+      val wrapper = "intWrapper"
--- End diff --

Should we worry about name collision?


---




[GitHub] spark issue #19757: [SPARK-22529] [SQL] Relation stats should be consistent ...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19757
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84123/
Test FAILed.


---




[GitHub] spark issue #19757: [SPARK-22529] [SQL] Relation stats should be consistent ...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19757
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19757: [SPARK-22529] [SQL] Relation stats should be consistent ...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19757
  
**[Test build #84123 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84123/testReport)**
 for PR 19757 at commit 
[`4c7d12e`](https://github.com/apache/spark/commit/4c7d12ee0d7f4026265e3b0177d72c041133969a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19621: [SPARK-11215][ML] Add multiple columns support to...

2017-11-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19621#discussion_r152717925
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -130,21 +160,49 @@ class StringIndexer @Since("1.4.0") (
   @Since("1.4.0")
   def setOutputCol(value: String): this.type = set(outputCol, value)
 
+  /** @group setParam */
+  @Since("2.3.0")
+  def setInputCols(value: Array[String]): this.type = set(inputCols, value)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setOutputCols(value: Array[String]): this.type = set(outputCols, value)
+
   @Since("2.0.0")
   override def fit(dataset: Dataset[_]): StringIndexerModel = {
     transformSchema(dataset.schema, logging = true)
-    val values = dataset.na.drop(Array($(inputCol)))
-      .select(col($(inputCol)).cast(StringType))
-      .rdd.map(_.getString(0))
-    val labels = $(stringOrderType) match {
-      case StringIndexer.frequencyDesc => values.countByValue().toSeq.sortBy(-_._2)
-        .map(_._1).toArray
-      case StringIndexer.frequencyAsc => values.countByValue().toSeq.sortBy(_._2)
-        .map(_._1).toArray
-      case StringIndexer.alphabetDesc => values.distinct.collect.sortWith(_ > _)
-      case StringIndexer.alphabetAsc => values.distinct.collect.sortWith(_ < _)
+
+    val inputCols = getInOutCols._1
+
+    val zeroState = Array.fill(inputCols.length)(new OpenHashMap[String, Long]())
+
+    val countByValueArray = dataset.na.drop(inputCols)
+      .select(inputCols.map(col(_).cast(StringType)): _*)
+      .rdd.aggregate(zeroState)(
--- End diff --

Is `treeAggregate` better? I think it should be faster?
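
For reference, a sketch of the suggestion; `RDD.treeAggregate` takes the same zero value and seq/comb operators as `aggregate` but merges partial results in a multi-level tree (default depth 2), which can reduce the final merge cost when there are many partitions. `seqOp` and `combOp` below stand for the two functions already defined in this diff:

```scala
val countByValueArray = dataset.na.drop(inputCols)
  .select(inputCols.map(col(_).cast(StringType)): _*)
  .rdd
  .treeAggregate(zeroState)(seqOp, combOp, depth = 2)
```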


---




[GitHub] spark pull request #19621: [SPARK-11215][ML] Add multiple columns support to...

2017-11-22 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19621#discussion_r152717985
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala ---
@@ -130,21 +160,49 @@ class StringIndexer @Since("1.4.0") (
   @Since("1.4.0")
   def setOutputCol(value: String): this.type = set(outputCol, value)
 
+  /** @group setParam */
+  @Since("2.3.0")
+  def setInputCols(value: Array[String]): this.type = set(inputCols, value)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setOutputCols(value: Array[String]): this.type = set(outputCols, value)
+
   @Since("2.0.0")
   override def fit(dataset: Dataset[_]): StringIndexerModel = {
     transformSchema(dataset.schema, logging = true)
-    val values = dataset.na.drop(Array($(inputCol)))
-      .select(col($(inputCol)).cast(StringType))
-      .rdd.map(_.getString(0))
-    val labels = $(stringOrderType) match {
-      case StringIndexer.frequencyDesc => values.countByValue().toSeq.sortBy(-_._2)
-        .map(_._1).toArray
-      case StringIndexer.frequencyAsc => values.countByValue().toSeq.sortBy(_._2)
-        .map(_._1).toArray
-      case StringIndexer.alphabetDesc => values.distinct.collect.sortWith(_ > _)
-      case StringIndexer.alphabetAsc => values.distinct.collect.sortWith(_ < _)
+
+    val inputCols = getInOutCols._1
+
+    val zeroState = Array.fill(inputCols.length)(new OpenHashMap[String, Long]())
+
+    val countByValueArray = dataset.na.drop(inputCols)
+      .select(inputCols.map(col(_).cast(StringType)): _*)
+      .rdd.aggregate(zeroState)(
+      (state: Array[OpenHashMap[String, Long]], row: Row) => {
+        for (i <- 0 until inputCols.length) {
+          state(i).changeValue(row.getString(i), 1L, _ + 1)
+        }
+        state
+      },
+      (state1: Array[OpenHashMap[String, Long]], state2: Array[OpenHashMap[String, Long]]) => {
+        for (i <- 0 until inputCols.length) {
+          state2(i).foreach { case (key: String, count: Long) =>
+            state1(i).changeValue(key, count, _ + count)
+          }
+        }
+        state1
+      }
+    )
+    val labelsArray = countByValueArray.map { countByValue =>
+      $(stringOrderType) match {
+        case StringIndexer.frequencyDesc => countByValue.toSeq.sortBy(-_._2).map(_._1).toArray
+        case StringIndexer.frequencyAsc => countByValue.toSeq.sortBy(_._2).map(_._1).toArray
+        case StringIndexer.alphabetDesc => countByValue.toSeq.map(_._1).sortWith(_ > _).toArray
+        case StringIndexer.alphabetAsc => countByValue.toSeq.map(_._1).sortWith(_ < _).toArray
--- End diff --

If the dataset is large, it might be. We can leave it as it is; if we find it is a bottleneck, we can still revisit it.


---




[GitHub] spark issue #19518: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-11-22 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19518
  
@bdrillard @cloud-fan @maropu
I created and ran a benchmark program. I think that using an array for compaction is slower than using scalar instance variables; in my case below, it is about 20% slower in the best time. 

Thus, I would like to take the approach of creating inner classes so that values are kept in scalar instance variables.  
WDYT? Any comments are very much appreciated.

Here are [Test.java](https://gist.github.com/kiszk/63c2829488cb777d7ca78d45d20c021f) and [myInstance.py](https://gist.github.com/kiszk/049a62f5d1259481c400a86299bd0228) that I used.

```
$ cat /proc/cpuinfo | grep "model name" | uniq
model name  : Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
$ python myInstance.py > MyInstance.java && javac Test.java && java Test


Result(us): Array
   0: 145333.227
   1: 144288.262
   2: 144233.871
   3: 144536.350
   4: 144503.269
   5: 144836.117
   6: 18.053
   7: 144744.725
   8: 144688.652
   9: 144727.823
  10: 17.789
  11: 144500.638
  12: 144641.592
  13: 144464.106
  14: 144518.914
  15: 144844.639
  16: 144780.464
  17: 144617.363
  18: 144463.271
  19: 144508.170
  20: 144929.451
  21: 144529.697
  22: 144273.167
  23: 144362.926
  24: 144296.854
  25: 144398.665
  26: 144490.813
  27: 144435.732
  28: 144675.997
  29: 144483.581
BEST: 144233.871000, AVG: 144566.806

Result(us): Vars
   0: 120375.384
   1: 119800.238
   2: 119822.842
   3: 119830.761
   4: 119836.781
   5: 120185.751
   6: 120208.140
   7: 120274.925
   8: 120112.109
   9: 120082.120
  10: 120063.456
  11: 120112.493
  12: 120144.937
  13: 119964.356
  14: 119941.633
  15: 119825.758
  16: 119677.506
  17: 119833.236
  18: 119749.781
  19: 119723.932
  20: 120197.394
  21: 120052.820
  22: 120006.650
  23: 119939.335
  24: 119857.469
  25: 120176.229
  26: 120153.605
  27: 120345.581
  28: 120163.129
  29: 120038.673
BEST: 119677.506, AVG: 120016.567
```

Small MyInstance.java (N = 16, M = 4)
```
class MyInstance {
  final int N = 16;
  int[] instance = new int[N];
  void accessArrays0() {
instance[8] = instance[0];
instance[9] = instance[1];
instance[10] = instance[2];
instance[11] = instance[3];
  }
  void accessArrays1() {
instance[12] = instance[4];
instance[13] = instance[5];
instance[14] = instance[6];
instance[15] = instance[7];
  }
  void accessArrays2() {
instance[0] = instance[8];
instance[1] = instance[9];
instance[2] = instance[10];
instance[3] = instance[11];
  }
  void accessArrays3() {
instance[4] = instance[12];
instance[5] = instance[13];
instance[6] = instance[14];
instance[7] = instance[15];
  }
  void accessArray() {
accessArrays0();
accessArrays1();
accessArrays2();
accessArrays3();
  }

  int instance0;
  int instance1;
  int instance2;
  int instance3;
  int instance4;
  int instance5;
  int instance6;
  int instance7;
  int instance8;
  int instance9;
  int instance00010;
  int instance00011;
  int instance00012;
  int instance00013;
  int instance00014;
  int instance00015;
  void accessVars0() {
instance8 = instance0;
instance9 = instance1;
instance00010 = instance2;
instance00011 = instance3;
  }
  void accessVars1() {
instance00012 = instance4;
instance00013 = instance5;
instance00014 = instance6;
instance00015 = instance7;
  }
  void accessVars2() {
instance0 = instance8;
instance1 = instance9;
instance2 = instance00010;
instance3 = instance00011;
  }
  void accessVars3() {
instance4 = instance00012;
instance5 = instance00013;
instance6 = instance00014;
instance7 = instance00015;
  }
  void accessVars() {
accessVars0();
accessVars1();
accessVars2();
accessVars3();
  }
}
```



---


[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19746
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84122/
Test FAILed.


---




[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19746
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19746
  
**[Test build #84122 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84122/testReport)**
 for PR 19746 at commit 
[`2b1ed0a`](https://github.com/apache/spark/commit/2b1ed0a3a85385f9b4042415889335942b65b9c9).
 * This patch **fails from timeout after a configured wait of \`250m\`**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19780: [SPARK-22551][SQL] Prevent possible 64kb compile ...

2017-11-22 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/19780


---




[GitHub] spark issue #19780: [SPARK-22551][SQL] Prevent possible 64kb compile error f...

2017-11-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19780
  
I can't reproduce the issue after #19767 was merged. Hopefully it solves this issue too, so I will close this. If it turns out otherwise and I can reproduce it later, I will reopen this. 


---




[GitHub] spark issue #19795: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Backport PR...

2017-11-22 Thread vinodkc
Github user vinodkc commented on the issue:

https://github.com/apache/spark/pull/19795
  
Thank you


---




[GitHub] spark pull request #19795: [SPARK-17920][SPARK-19580][SPARK-19878][SQL] Back...

2017-11-22 Thread vinodkc
Github user vinodkc closed the pull request at:

https://github.com/apache/spark/pull/19795


---




[GitHub] spark issue #19756: [SPARK-22527][SQL] Reuse coordinated exchanges if possib...

2017-11-22 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/19756
  
My first thought was also to fix `Exchange.sameResult`. But I soon realized that it is not the right fix. The reason is that the result of a coordinated exchange depends on all exchanges coordinated by the same coordinator. Even if `Exchange.sameResult` returns true for two exchanges when their coordinators are excluded, the coordinated shuffle results can differ if the exchanges have different siblings.

For example, consider two groups of coordinated exchanges:

```
RootNode
- Node 1
-- Exchange A (coordinator 1)
-- Exchange B (coordinator 1)
...
- Node 2
-- Exchange C (coordinator 2)
-- Exchange D (coordinator 2)
-- Exchange E (coordinator 2)
```

Say exchange A and exchange D have the same result if we don't consider the coordinator. In this case we can't replace exchange D with exchange A, because exchange D's shuffle partitions might be different from exchange A's due to coordination.

---




[GitHub] spark issue #19717: [SPARK-18278] [Submission] Spark on Kubernetes - basic s...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84121/
Test PASSed.


---




[GitHub] spark issue #19717: [SPARK-18278] [Submission] Spark on Kubernetes - basic s...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19717: [SPARK-18278] [Submission] Spark on Kubernetes - basic s...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84121 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84121/testReport)**
 for PR 19717 at commit 
[`60234a2`](https://github.com/apache/spark/commit/60234a29846955b8a6e8cb6fbb1dc35da3c3b4f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `require(mainClass.isDefined, \"Main class must be specified via 
--main-class\")`


---




[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-22 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19621
  
@MLnick Ah, I didn't express it exactly. For the first case, what I mean is: sort by frequency, but when frequencies are equal, sort alphabetically.
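
A sketch of that ordering (illustrative only, not the PR's code, reusing the `countByValue` map from the diff): sort primarily by descending count and break ties alphabetically on the label:

```scala
val labels: Array[String] = countByValue.toSeq
  .sortBy { case (label, count) => (-count, label) }  // frequency first, then alphabetical
  .map(_._1)
  .toArray
```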


---




[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19797
  
**[Test build #84124 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84124/testReport)**
 for PR 19797 at commit 
[`4d9657a`](https://github.com/apache/spark/commit/4d9657ada452f8fee85e89818026cf15aea3aafc).


---




[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...

2017-11-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19792
  
@gberger, BTW, just to be clear, IIRC the type inference and merging code paths here are shared with other data types, for example dict, namedtuple, row, etc.


---




[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

2017-11-22 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19797#discussion_r152712011
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -851,9 +855,11 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
 
   private[this] def castToIntCode(from: DataType, ctx: CodegenContext): CastFunction = from match {
     case StringType =>
-      val wrapper = ctx.freshName("wrapper")
-      ctx.addMutableState("UTF8String.IntWrapper", wrapper,
-        s"$wrapper = new UTF8String.IntWrapper();")
+      val wrapper = "intWrapper"
+      if (!ctx.mutableStates.exists(s => s._1 == wrapper)) {
+        ctx.addMutableState("UTF8String.IntWrapper", wrapper,
+          s"$wrapper = new UTF8String.IntWrapper();")
+      }
       (c, evPrim, evNull) =>
         s"""
           if ($c.toInt($wrapper)) {
--- End diff --

That would work well, too. We may have some overhead from creating a small `IntWrapper` object and collecting it at GC time.


---




[GitHub] spark issue #19757: [SPARK-22529] [SQL] Relation stats should be consistent ...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19757
  
**[Test build #84123 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84123/testReport)**
 for PR 19757 at commit 
[`4c7d12e`](https://github.com/apache/spark/commit/4c7d12ee0d7f4026265e3b0177d72c041133969a).


---




[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...

2017-11-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19792
  
Thanks @ueshin. Yup, +1 for adding some tests.

I just wonder if we could have a similar form of error message for type verification. I remember I fixed a similar issue for type verification - https://github.com/apache/spark/pull/18521 (see the links in "Before" and "After"), for example:

```
field c in field b in field a: can not merge type IntegerType and StringType
element in array element in array field a: can not merge type IntegerType and StringType
```

Let's make sure there is no performance regression as well (I was about to make that mistake myself before).


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/19370
  
Build started: [SparkR] `ALL` 
[![PR-19370](https://ci.appveyor.com/api/projects/status/github/spark-test/spark?branch=A5AAFE0C-D09F-4FBE-B834-811E8AC9FF06&svg=true)](https://ci.appveyor.com/project/spark-test/spark/branch/A5AAFE0C-D09F-4FBE-B834-811E8AC9FF06)
Diff: 
https://github.com/apache/spark/compare/master...spark-test:A5AAFE0C-D09F-4FBE-B834-811E8AC9FF06


---




[GitHub] spark issue #19798: [SPARK-22583] First delegation token renewal time is not...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19798
  
**[Test build #3990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3990/testReport)**
 for PR 19798 at commit 
[`988edf7`](https://github.com/apache/spark/commit/988edf7703fa8e36f57b016fb2f2a558f094cc42).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19798: [SPARK-22583] First delegation token renewal time is not...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19798
  
**[Test build #3990 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3990/testReport)**
 for PR 19798 at commit 
[`988edf7`](https://github.com/apache/spark/commit/988edf7703fa8e36f57b016fb2f2a558f094cc42).


---




[GitHub] spark issue #19798: [SPARK-22583] First delegation token renewal time is not...

2017-11-22 Thread ArtRand
Github user ArtRand commented on the issue:

https://github.com/apache/spark/pull/19798
  
@kalvinnchau thanks for catching this mistake. @vanzin can we get a quick 
merge? 


---




[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-11-22 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19439


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19370
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19370
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84119/
Test PASSed.


---




[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19439
  
Merging with master.
This is awesome to get in. Thanks a lot @imatiach-msft and everyone who contributed and reviewed!!


---




[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19370
  
**[Test build #84119 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84119/testReport)**
 for PR 19370 at commit 
[`b58f740`](https://github.com/apache/spark/commit/b58f74054f9c02b0548254984dfe46516fe14e18).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19763: [SPARK-22537][core] Aggregation of map output sta...

2017-11-22 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/19763#discussion_r152702214
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -472,15 +475,66 @@ private[spark] class MapOutputTrackerMaster(
     shuffleStatuses.get(shuffleId).map(_.findMissingPartitions())
   }
 
+  /**
+   * Grouped function of Range, this is to avoid traverse of all elements of Range using
+   * IterableLike's grouped function.
+   */
+  def rangeGrouped(range: Range, size: Int): Seq[Range] = {
+    val start = range.start
+    val step = range.step
+    val end = range.end
+    for (i <- start.until(end, size * step)) yield {
+      i.until(i + size * step, step)
+    }
+  }
+
+  /**
+   * To equally divide n elements into m buckets, basically each bucket should have n/m elements,
+   * for the remaining n%m elements, add one more element to the first n%m buckets each.
+   */
+  def equallyDivide(numElements: Int, numBuckets: Int): Seq[Seq[Int]] = {
+    val elementsPerBucket = numElements / numBuckets
+    val remaining = numElements % numBuckets
+    val splitPoint = (elementsPerBucket + 1) * remaining
+    if (elementsPerBucket == 0) {
+      rangeGrouped(0.until(splitPoint), elementsPerBucket + 1)
+    } else {
+      rangeGrouped(0.until(splitPoint), elementsPerBucket + 1) ++
+        rangeGrouped(splitPoint.until(numElements), elementsPerBucket)
+    }
+  }
+
   /**
    * Return statistics about all of the outputs for a given shuffle.
    */
   def getStatistics(dep: ShuffleDependency[_, _, _]): MapOutputStatistics = {
     shuffleStatuses(dep.shuffleId).withMapStatuses { statuses =>
       val totalSizes = new Array[Long](dep.partitioner.numPartitions)
-      for (s <- statuses) {
-        for (i <- 0 until totalSizes.length) {
-          totalSizes(i) += s.getSizeForBlock(i)
+      val parallelAggThreshold = conf.get(
+        SHUFFLE_MAP_OUTPUT_PARALLEL_AGGREGATION_THRESHOLD)
+      val parallelism = math.min(
+        Runtime.getRuntime.availableProcessors(),
+        statuses.length * totalSizes.length / parallelAggThreshold + 1)
--- End diff --

`statuses.length.toLong`. It's easy to overflow here.
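
To make the overflow concrete (the numbers are hypothetical): both operands are `Int`, so the product wraps around before the division is applied unless one side is widened to `Long` first:

```scala
val numStatuses   = 200000  // statuses.length (hypothetical)
val numPartitions = 20000   // totalSizes.length (hypothetical)

numStatuses * numPartitions          // Int overflow: -294967296
numStatuses.toLong * numPartitions   // 4000000000L, as intended
```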


---




[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-22 Thread MrBago
Github user MrBago commented on a diff in the pull request:

https://github.com/apache/spark/pull/19588#discussion_r152701758
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala ---
@@ -311,22 +342,39 @@ class VectorIndexerModel private[ml] (
   // TODO: Check more carefully about whether this whole class will be included in a closure.
 
   /** Per-vector transform function */
-  private val transformFunc: Vector => Vector = {
+  private lazy val transformFunc: Vector => Vector = {
--- End diff --

@WeichenXu123 Thanks for the clarification, I'm still working on 
understanding Params and when they can & can't be modified :).


---




[GitHub] spark issue #19763: [SPARK-22537][core] Aggregation of map output statistics...

2017-11-22 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/19763
  
> We can shut down the pool after some certain idle time, but not sure if 
it's worth the complexity

Yeah, that's just what the cached thread pool does :)
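
For reference, this is standard JDK behavior; a cached pool already reclaims idle threads:

```scala
import java.util.concurrent.Executors

// Threads are created on demand and terminated after being idle for 60 seconds,
// which is the "shut down after some idle time" behavior discussed above.
val pool = Executors.newCachedThreadPool()
```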


---




[GitHub] spark issue #19763: [SPARK-22537][core] Aggregation of map output statistics...

2017-11-22 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/19763
  
OK to test


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r152699261
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
 ---
@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.io._
+import org.apache.orc.mapred.{OrcList, OrcMap, OrcStruct, OrcTimestamp}
+import org.apache.orc.storage.serde2.io.{DateWritable, HiveDecimalWritable}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution.datasources.orc.OrcUtils.withNullSafe
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+private[orc] class OrcDeserializer(
+dataSchema: StructType,
+requiredSchema: StructType,
+missingColumnNames: Seq[String]) {
+
+  private[this] val mutableRow = new 
SpecificInternalRow(requiredSchema.map(_.dataType))
+
+  private[this] val length = requiredSchema.length
+
+  private[this] val unwrappers = requiredSchema.map { f =>
+if (missingColumnNames.contains(f.name)) {
+  (value: Any, row: InternalRow, ordinal: Int) => 
row.setNullAt(ordinal)
--- End diff --

Yep. That's correct. It was defensive code. I'll return null.


---




[GitHub] spark pull request #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new O...

2017-11-22 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/19651#discussion_r152699116
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcDeserializer.scala
 ---
@@ -0,0 +1,216 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import scala.collection.JavaConverters._
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.io._
+import org.apache.orc.mapred.{OrcList, OrcMap, OrcStruct, OrcTimestamp}
+import org.apache.orc.storage.serde2.io.{DateWritable, HiveDecimalWritable}
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.SpecificInternalRow
+import org.apache.spark.sql.catalyst.util._
+import org.apache.spark.sql.execution.datasources.orc.OrcUtils.withNullSafe
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+private[orc] class OrcDeserializer(
+dataSchema: StructType,
+requiredSchema: StructType,
+missingColumnNames: Seq[String]) {
+
+  private[this] val mutableRow = new 
SpecificInternalRow(requiredSchema.map(_.dataType))
+
+  private[this] val length = requiredSchema.length
+
+  private[this] val unwrappers = 
requiredSchema.map(_.dataType).map(unwrapperFor).toArray
+
+  def deserialize(orcStruct: OrcStruct): InternalRow = {
+var i = 0
+val names = orcStruct.getSchema.getFieldNames
+while (i < length) {
+  val name = requiredSchema(i).name
+  val writable = if (missingColumnNames.contains(name)) {
+null
+  } else {
+if (names.contains(name)) {
+  orcStruct.getFieldValue(name)
+} else {
+  orcStruct.getFieldValue("_col" + dataSchema.fieldIndex(name))
+}
+  }
+  if (writable == null) {
+mutableRow.setNullAt(i)
+  } else {
+unwrappers(i)(writable, mutableRow, i)
+  }
+  i += 1
+}
+mutableRow
+  }
+
+  private[this] def unwrapperFor(dataType: DataType): (Any, InternalRow, 
Int) => Unit =
+dataType match {
+  case NullType =>
+(value: Any, row: InternalRow, ordinal: Int) =>
+  row.setNullAt(ordinal)
+
+  case BooleanType =>
+(value: Any, row: InternalRow, ordinal: Int) =>
+  row.setBoolean(ordinal, value.asInstanceOf[BooleanWritable].get)
+
+  case ByteType =>
+(value: Any, row: InternalRow, ordinal: Int) =>
+  row.setByte(ordinal, value.asInstanceOf[ByteWritable].get)
+
+  case ShortType =>
+(value: Any, row: InternalRow, ordinal: Int) =>
+  row.setShort(ordinal, value.asInstanceOf[ShortWritable].get)
+
+  case IntegerType =>
+(value: Any, row: InternalRow, ordinal: Int) =>
+  row.setInt(ordinal, value.asInstanceOf[IntWritable].get)
+
+  case LongType =>
+(value: Any, row: InternalRow, ordinal: Int) =>
+  row.setLong(ordinal, value.asInstanceOf[LongWritable].get)
+
+  case FloatType =>
+(value: Any, row: InternalRow, ordinal: Int) =>
+  row.setFloat(ordinal, value.asInstanceOf[FloatWritable].get)
+
+  case DoubleType =>
+(value: Any, row: InternalRow, ordinal: Int) =>
+  row.setDouble(ordinal, value.asInstanceOf[DoubleWritable].get)
+
+  case _ =>
+val unwrapper = getValueUnwrapper(dataType)
+(value: Any, row: InternalRow, ordinal: Int) =>
+  row(ordinal) = unwrapper(value)
--- End diff --

I see. Thanks.


---




[GitHub] spark pull request #19763: [SPARK-22537][core] Aggregation of map output sta...

2017-11-22 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/19763#discussion_r152693924
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -472,15 +475,66 @@ private[spark] class MapOutputTrackerMaster(
 shuffleStatuses.get(shuffleId).map(_.findMissingPartitions())
   }
 
+  /**
+   * Grouped function of Range, this is to avoid traverse of all elements 
of Range using
+   * IterableLike's grouped function.
+   */
+  def rangeGrouped(range: Range, size: Int): Seq[Range] = {
+val start = range.start
+val step = range.step
+val end = range.end
+for (i <- start.until(end, size * step)) yield {
+  i.until(i + size * step, step)
+}
+  }
+
+  /**
+   * To equally divide n elements into m buckets, basically each bucket 
should have n/m elements,
+   * for the remaining n%m elements, add one more element to the first n%m 
buckets each.
+   */
+  def equallyDivide(numElements: Int, numBuckets: Int): Seq[Seq[Int]] = {
+val elementsPerBucket = numElements / numBuckets
+val remaining = numElements % numBuckets
+val splitPoint = (elementsPerBucket + 1) * remaining
+if (elementsPerBucket == 0) {
+  rangeGrouped(0.until(splitPoint), elementsPerBucket + 1)
+} else {
+  rangeGrouped(0.until(splitPoint), elementsPerBucket + 1) ++
+rangeGrouped(splitPoint.until(numElements), elementsPerBucket)
+}
+  }
+
   /**
* Return statistics about all of the outputs for a given shuffle.
*/
   def getStatistics(dep: ShuffleDependency[_, _, _]): MapOutputStatistics 
= {
 shuffleStatuses(dep.shuffleId).withMapStatuses { statuses =>
   val totalSizes = new Array[Long](dep.partitioner.numPartitions)
-  for (s <- statuses) {
-for (i <- 0 until totalSizes.length) {
-  totalSizes(i) += s.getSizeForBlock(i)
+  val parallelAggThreshold = conf.get(
+SHUFFLE_MAP_OUTPUT_PARALLEL_AGGREGATION_THRESHOLD)
+  val parallelism = math.min(
+Runtime.getRuntime.availableProcessors(),
+statuses.length * totalSizes.length / parallelAggThreshold + 1)
+  if (parallelism <= 1) {
+for (s <- statuses) {
+  for (i <- 0 until totalSizes.length) {
+totalSizes(i) += s.getSizeForBlock(i)
+  }
+}
+  } else {
+try {
+  val threadPool = 
ThreadUtils.newDaemonFixedThreadPool(parallelism, "map-output-aggregate")
+  implicit val executionContext = 
ExecutionContext.fromExecutor(threadPool)
+  val mapStatusSubmitTasks = equallyDivide(totalSizes.length, 
parallelism).map {
+reduceIds => Future {
+  for (s <- statuses; i <- reduceIds) {
+totalSizes(i) += s.getSizeForBlock(i)
+  }
+}
+  }
+  ThreadUtils.awaitResult(Future.sequence(mapStatusSubmitTasks), 
Duration.Inf)
+} finally {
+  threadpool.shutdown()
--- End diff --

@gczsjdy could you fix the compile error?
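
For what it's worth, a sketch of one way to fix it, keeping the names from the diff above: `threadPool` is declared inside the `try` block but referenced in `finally` as `threadpool`, so the declaration needs to move before the `try` and the identifier's case must match:

```scala
val threadPool = ThreadUtils.newDaemonFixedThreadPool(parallelism, "map-output-aggregate")
try {
  implicit val executionContext = ExecutionContext.fromExecutor(threadPool)
  val mapStatusSubmitTasks = equallyDivide(totalSizes.length, parallelism).map { reduceIds =>
    Future {
      for (s <- statuses; i <- reduceIds) {
        totalSizes(i) += s.getSizeForBlock(i)
      }
    }
  }
  ThreadUtils.awaitResult(Future.sequence(mapStatusSubmitTasks), Duration.Inf)
} finally {
  threadPool.shutdown()
}
```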


---




[GitHub] spark pull request #19763: [SPARK-22537][core] Aggregation of map output sta...

2017-11-22 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/19763#discussion_r152693851
  
--- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala ---
@@ -472,15 +475,66 @@ private[spark] class MapOutputTrackerMaster(
 shuffleStatuses.get(shuffleId).map(_.findMissingPartitions())
   }
 
+  /**
+   * Grouped function of Range, this is to avoid traverse of all elements 
of Range using
+   * IterableLike's grouped function.
+   */
+  def rangeGrouped(range: Range, size: Int): Seq[Range] = {
+val start = range.start
+val step = range.step
+val end = range.end
+for (i <- start.until(end, size * step)) yield {
+  i.until(i + size * step, step)
+}
+  }
+
+  /**
+   * To equally divide n elements into m buckets, basically each bucket 
should have n/m elements,
+   * for the remaining n%m elements, add one more element to the first n%m 
buckets each.
+   */
+  def equallyDivide(numElements: Int, numBuckets: Int): Seq[Seq[Int]] = {
+val elementsPerBucket = numElements / numBuckets
+val remaining = numElements % numBuckets
+val splitPoint = (elementsPerBucket + 1) * remaining
+if (elementsPerBucket == 0) {
+  rangeGrouped(0.until(splitPoint), elementsPerBucket + 1)
+} else {
+  rangeGrouped(0.until(splitPoint), elementsPerBucket + 1) ++
+rangeGrouped(splitPoint.until(numElements), elementsPerBucket)
+}
+  }
+
   /**
* Return statistics about all of the outputs for a given shuffle.
*/
   def getStatistics(dep: ShuffleDependency[_, _, _]): MapOutputStatistics 
= {
 shuffleStatuses(dep.shuffleId).withMapStatuses { statuses =>
   val totalSizes = new Array[Long](dep.partitioner.numPartitions)
-  for (s <- statuses) {
-for (i <- 0 until totalSizes.length) {
-  totalSizes(i) += s.getSizeForBlock(i)
+  val parallelAggThreshold = conf.get(
+SHUFFLE_MAP_OUTPUT_PARALLEL_AGGREGATION_THRESHOLD)
+  val parallelism = math.min(
+Runtime.getRuntime.availableProcessors(),
+statuses.length * totalSizes.length / parallelAggThreshold + 1)
+  if (parallelism <= 1) {
+for (s <- statuses) {
+  for (i <- 0 until totalSizes.length) {
+totalSizes(i) += s.getSizeForBlock(i)
+  }
+}
+  } else {
+try {
+  val threadPool = 
ThreadUtils.newDaemonFixedThreadPool(parallelism, "map-output-aggregate")
+  implicit val executionContext = 
ExecutionContext.fromExecutor(threadPool)
+  val mapStatusSubmitTasks = equallyDivide(totalSizes.length, 
parallelism).map {
+reduceIds => Future {
+  for (s <- statuses; i <- reduceIds) {
+totalSizes(i) += s.getSizeForBlock(i)
+  }
+}
+  }
+  ThreadUtils.awaitResult(Future.sequence(mapStatusSubmitTasks), 
Duration.Inf)
+} finally {
+  threadpool.shutdown()
--- End diff --

I'm fine with creating a thread pool every time, since this code path doesn't seem to run very frequently, because:
- Using a shared cached thread pool is effectively the same as creating a new pool each time: the idle timeout of a thread is long relative to how often this runs, so idle threads are likely killed before the next call.
- Using a shared fixed thread pool is a waste for most use cases.
- The cost of creating threads is trivial compared to the total time of a job.
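
For reference, a minimal sketch of the per-call pool shape being discussed (a sketch only, not the PR's final code; `parallelism`, `statuses`, `totalSizes`, and `equallyDivide` are the names from the diff above, and the `finally` block must use the same identifier as the `val`, which is the compile error flagged earlier):

```scala
val threadPool = ThreadUtils.newDaemonFixedThreadPool(parallelism, "map-output-aggregate")
try {
  implicit val executionContext = ExecutionContext.fromExecutor(threadPool)
  // equallyDivide(10, 3) groups reduce ids as (0,1,2,3), (4,5,6), (7,8,9):
  // the n % m leftover elements go to the first buckets, one each.
  val tasks = equallyDivide(totalSizes.length, parallelism).map { reduceIds =>
    Future {
      for (s <- statuses; i <- reduceIds) {
        totalSizes(i) += s.getSizeForBlock(i)
      }
    }
  }
  ThreadUtils.awaitResult(Future.sequence(tasks), Duration.Inf)
} finally {
  threadPool.shutdown() // the pool is created per call, so it is also torn down per call
}
```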


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19746
  
**[Test build #84122 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84122/testReport)**
 for PR 19746 at commit 
[`2b1ed0a`](https://github.com/apache/spark/commit/2b1ed0a3a85385f9b4042415889335942b65b9c9).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19746: [SPARK-22346][ML] VectorSizeHint Transformer for using V...

2017-11-22 Thread MrBago
Github user MrBago commented on the issue:

https://github.com/apache/spark/pull/19746
  
jenkins retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19439
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19439
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84118/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19439
  
**[Test build #84118 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84118/testReport)**
 for PR 19439 at commit 
[`a76496b`](https://github.com/apache/spark/commit/a76496be9ebc8b4aba1cd1cd4e3132411649597e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-18278] [Submission] Spark on Kubernetes - basic s...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84121 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84121/testReport)**
 for PR 19717 at commit 
[`60234a2`](https://github.com/apache/spark/commit/60234a29846955b8a6e8cb6fbb1dc35da3c3b4f2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19439
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84117/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19439
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19439
  
**[Test build #84117 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84117/testReport)**
 for PR 19439 at commit 
[`3bcedcd`](https://github.com/apache/spark/commit/3bcedcdff173af6f707ebc6c7e59d8fbe8c2aca3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-18278] [Submission] Spark on Kubernetes - basic s...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-18278] [Submission] Spark on Kubernetes - basic s...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19717
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84120/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-18278] [Submission] Spark on Kubernetes - basic s...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84120 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84120/testReport)**
 for PR 19717 at commit 
[`f38144b`](https://github.com/apache/spark/commit/f38144bcd11a2a261c857d09dc220d11da5ead5f).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `require(mainClass.isDefined, \"Main class must be specified via 
--main-class\")`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19717: [SPARK-18278] [Submission] Spark on Kubernetes - basic s...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19717
  
**[Test build #84120 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84120/testReport)**
 for PR 19717 at commit 
[`f38144b`](https://github.com/apache/spark/commit/f38144bcd11a2a261c857d09dc220d11da5ead5f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19468
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84116/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19468
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19468
  
**[Test build #84116 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84116/testReport)**
 for PR 19468 at commit 
[`cb12fec`](https://github.com/apache/spark/commit/cb12fecb9cc8b6686b08ef1e82de3e62f32b4b73).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19468
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19468
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84114/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19468
  
**[Test build #84114 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84114/testReport)**
 for PR 19468 at commit 
[`e5a6a67`](https://github.com/apache/spark/commit/e5a6a67a704bdaf8cab9beb21b628eb279cc865d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/19789
  
ok to test

On Wed, Nov 22, 2017 at 2:49 PM, Daroo  wrote:

> Cool. Could you please authorize it for testing?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread daroo
Github user daroo commented on the issue:

https://github.com/apache/spark/pull/19789
  
Cool. Could you please authorize it for testing?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19370
  
**[Test build #84119 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84119/testReport)**
 for PR 19370 at commit 
[`b58f740`](https://github.com/apache/spark/commit/b58f74054f9c02b0548254984dfe46516fe14e18).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19370: [SPARK-22495] Fix setup of SPARK_HOME variable on Window...

2017-11-22 Thread jsnowacki
Github user jsnowacki commented on the issue:

https://github.com/apache/spark/pull/19370
  
Thanks for looking into it again. I've followed your suggestions and 
updated the PR. It also seems to work for me now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/19789
  
Seems reasonable.

On Wed, Nov 22, 2017 at 1:52 PM, Daroo  wrote:

> It fails on the current master branch and doesn't after the patch
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19390: [SPARK-18935][MESOS] Fix dynamic reservations on mesos

2017-11-22 Thread ArtRand
Github user ArtRand commented on the issue:

https://github.com/apache/spark/pull/19390
  
Just to reiterate our conversation here. 

The `AllocationInfo` tells the scheduler which `role`'s allocation of 
resources the `Offer` is coming from, whereas the `ReservationInfo` tells the 
scheduler whether or not the `Offer` contains reserved resources. We needed the 
latter to effectively use dynamically reserved resources.

From your (old) PR it looks like the `AllocationInfo` is mostly just 
"forwarded", i.e. there isn't any logic around its contents. So your 
new PR removes references to `AllocationInfo` to avoid breaking with Mesos 
1.3- and to remain non-MULTI_ROLE.

The purpose of `AllocationInfo` is (in the future) to allow some logic around 
which tasks get launched on resources allocated to the various roles a framework 
is subscribed as - but we don't have that yet, and it isn't addressed in this 
patch. 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19797: [SPARK-22570][SQL] Avoid to create a lot of globa...

2017-11-22 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19797#discussion_r152664308
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 ---
@@ -851,9 +855,11 @@ case class Cast(child: Expression, dataType: DataType, 
timeZoneId: Option[String
 
   private[this] def castToIntCode(from: DataType, ctx: CodegenContext): 
CastFunction = from match {
 case StringType =>
-  val wrapper = ctx.freshName("wrapper")
-  ctx.addMutableState("UTF8String.IntWrapper", wrapper,
-s"$wrapper = new UTF8String.IntWrapper();")
+  val wrapper = "intWrapper"
+  if (!ctx.mutableStates.exists(s => s._1 == wrapper)) {
+ctx.addMutableState("UTF8String.IntWrapper", wrapper,
+  s"$wrapper = new UTF8String.IntWrapper();")
+  }
   (c, evPrim, evNull) =>
 s"""
   if ($c.toInt($wrapper)) {
--- End diff --

what if we create a new wrapper every time?
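
A minimal sketch of one way to read that suggestion (assuming the cast body that the diff truncates; `UTF8String.IntWrapper` exposes a public `value` field):

```scala
case StringType =>
  (c, evPrim, evNull) => {
    // Allocate the wrapper as a local variable in the generated code instead of
    // registering a global mutable state; freshName keeps the name unique per cast.
    val wrapper = ctx.freshName("intWrapper")
    s"""
      UTF8String.IntWrapper $wrapper = new UTF8String.IntWrapper();
      if ($c.toInt($wrapper)) {
        $evPrim = $wrapper.value;
      } else {
        $evNull = true;
      }
    """
  }
```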


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread daroo
Github user daroo commented on the issue:

https://github.com/apache/spark/pull/19789
  
It fails on the current master branch and doesn't after the patch


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread koeninger
Github user koeninger commented on the issue:

https://github.com/apache/spark/pull/19789
  
What are you actually asserting in that test and/or does it reliably fail
if run on the version of your code before the patch?

On Wed, Nov 22, 2017 at 1:33 PM, Daroo  wrote:

> I've added a test. @koeninger  is it
> something you had in mind?
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19439
  
Thanks!  LGTM pending tests


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18692: [SPARK-21417][SQL] Infer join conditions using pr...

2017-11-22 Thread aokolnychyi
Github user aokolnychyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/18692#discussion_r152660385
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/joins.scala 
---
@@ -152,3 +152,71 @@ object EliminateOuterJoin extends Rule[LogicalPlan] 
with PredicateHelper {
   if (j.joinType == newJoinType) f else Filter(condition, 
j.copy(joinType = newJoinType))
   }
 }
+
+/**
+ * A rule that uses propagated constraints to infer join conditions. The 
optimization is applicable
+ * only to CROSS joins.
+ *
+ * For instance, if there is a CROSS join, where the left relation has 'a 
= 1' and the right
+ * relation has 'b = 1', then the rule infers 'a = b' as a join predicate.
+ */
+object InferJoinConditionsFromConstraints extends Rule[LogicalPlan] with 
PredicateHelper {
+
+  def apply(plan: LogicalPlan): LogicalPlan = {
+if (SQLConf.get.constraintPropagationEnabled) {
+  inferJoinConditions(plan)
+} else {
+  plan
+}
+  }
+
+  private def inferJoinConditions(plan: LogicalPlan): LogicalPlan = plan 
transform {
+case join @ Join(left, right, Cross, conditionOpt) =>
+  val leftConstraints = 
join.constraints.filter(_.references.subsetOf(left.outputSet))
+  val rightConstraints = 
join.constraints.filter(_.references.subsetOf(right.outputSet))
+  val inferredJoinPredicates = inferJoinPredicates(leftConstraints, 
rightConstraints)
+
+  val newConditionOpt = conditionOpt match {
+case Some(condition) =>
+  val existingPredicates = splitConjunctivePredicates(condition)
+  val newPredicates = findNewPredicates(inferredJoinPredicates, 
existingPredicates)
+  if (newPredicates.nonEmpty) Some(And(newPredicates.reduce(And), 
condition)) else None
+case None =>
+  inferredJoinPredicates.reduceOption(And)
+  }
+  if (newConditionOpt.isDefined) Join(left, right, Inner, 
newConditionOpt) else join
--- End diff --

@gatorsmile Thanks for getting back.

``CheckCartesianProducts`` identifies a join of type ``Inner | LeftOuter | 
RightOuter | FullOuter`` as a cartesian product if there is no join predicate 
that has references to both relations.

If we agree to ignore joins of type Cross that have a condition (in this 
PR), then the use case in this 
[discussion](https://github.com/apache/spark/pull/18692#discussion_r144466472) 
is no longer possible (even if you remove t1.col1 >= t2.col1). Correct? 
``PushPredicateThroughJoin`` will push ``t1.col1 = t1.col2 + t2.col2 and 
t2.col1 = t1.col2 + t2.col2`` into the join condition and the proposed rule 
will not infer anything and the 
final join will be of type Cross with a condition that covers both 
relations. According to the logic of ``CheckCartesianProducts``, it is not 
considered to be a cartesian product (since there exists a join predicate that 
covers both relations, e.g. ``t1.col1 = t1.col2 + t2.col2``).

So, if I have a confirmation that we need to consider only joins of type 
Cross and without any join conditions, I can update the PR accordingly.
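
To make the scenario concrete (hypothetical tables `t1(col1, col2)` and `t2(col1, col2)`, adapted from the linked discussion rather than this PR's tests), a sketch of the query shape in question:

```scala
// The CROSS join is written without an explicit join condition, but the WHERE
// clause references both sides. PushPredicateThroughJoin moves both predicates
// into the join condition, so by the time the proposed rule runs the join is a
// Cross join *with* a condition covering both relations, and
// CheckCartesianProducts no longer flags it as a cartesian product.
val q = spark.sql(
  """
    |SELECT *
    |FROM t1 CROSS JOIN t2
    |WHERE t1.col1 = t1.col2 + t2.col2
    |  AND t2.col1 = t1.col2 + t2.col2
  """.stripMargin)
```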


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread daroo
Github user daroo commented on the issue:

https://github.com/apache/spark/pull/19789
  
I've added a test. @koeninger is it something you had in mind?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-11-22 Thread erikerlandson
Github user erikerlandson commented on the issue:

https://github.com/apache/spark/pull/19468
  
@reviewers, just as a PSA: we have to rebase the second PR (containing the 
submission client) _after_ this one merges, and then we can submit that second 
one against upstream. This is a series of large PRs, so review needs to be 
thorough; but once reviewer comments are addressed, this PR is the one blocking 
the next.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19439
  
**[Test build #84118 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84118/testReport)**
 for PR 19439 at commit 
[`a76496b`](https://github.com/apache/spark/commit/a76496be9ebc8b4aba1cd1cd4e3132411649597e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/19439
  
@jkbradley good catch - I added the missing link to the license file and I 
rebased the code against the very very latest master.  Thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19439
  
**[Test build #84117 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84117/testReport)**
 for PR 19439 at commit 
[`3bcedcd`](https://github.com/apache/spark/commit/3bcedcdff173af6f707ebc6c7e59d8fbe8c2aca3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread imatiach-msft
Github user imatiach-msft commented on the issue:

https://github.com/apache/spark/pull/19439
  
good catch, it's from here:
https://ccsearch.creativecommons.org/image/detail/B2CVP_j5KjwZm7UAVJ3Hvw==
let me add it to the list


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19607: [SPARK-22395][SQL][PYTHON] Fix the behavior of ti...

2017-11-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19607#discussion_r152646370
  
--- Diff: python/setup.py ---
@@ -201,7 +201,7 @@ def _supports_symlinks():
 extras_require={
 'ml': ['numpy>=1.7'],
 'mllib': ['numpy>=1.7'],
-'sql': ['pandas>=0.13.0']
+'sql': ['pandas>=0.19.2']
--- End diff --

Document this requirement and behavior changes in `Migration Guide`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19607: [SPARK-22395][SQL][PYTHON] Fix the behavior of ti...

2017-11-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19607#discussion_r152645369
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -997,6 +997,14 @@ object SQLConf {
   .intConf
   .createWithDefault(1)
 
+  val PANDAS_RESPECT_SESSION_LOCAL_TIMEZONE =
+buildConf("spark.sql.execution.pandas.respectSessionTimeZone")
+  .internal()
+  .doc("When true, make Pandas DataFrame with timestamp type 
respecting session local " +
+"timezone when converting to/from Pandas DataFrame.")
--- End diff --

Emphasize the conf will be deprecated?

> When true, make Pandas DataFrame with timestamp type respecting session 
local timezone when converting to/from Pandas DataFrame. This configuration 
will be deprecated in the future releases.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19798: [SPARK-22583] First delegation token renewal time is not...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19798
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19798: [SPARK-22583] First delegation token renewal time...

2017-11-22 Thread kalvinnchau
GitHub user kalvinnchau opened a pull request:

https://github.com/apache/spark/pull/19798

[SPARK-22583] First delegation token renewal time is not 75% of renewal 
time in Mesos

The first scheduled renewal time is set to the exact expiration time,
while all subsequent renewal times are 75% of the renewal interval. This change
makes the initial renewal time 75% as well.

## What changes were proposed in this pull request?

Set the initial renewal time to be 75% of renewal time.
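
A hedged sketch of the intended computation (names are illustrative, not the actual Mesos credential renewer code):

```scala
// Schedule the *first* renewal at 75% of the remaining token lifetime as well,
// instead of at the exact expiration time.
def initialRenewalDelay(expirationTimeMs: Long, nowMs: Long): Long =
  math.max(0L, ((expirationTimeMs - nowMs) * 0.75).toLong)
```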

## How was this patch tested?

Tested locally in a test HDFS cluster, checking various renewal times.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kalvinnchau/spark fix-inital-renewal-time

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19798.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19798


commit 988edf7703fa8e36f57b016fb2f2a558f094cc42
Author: Kalvin Chau 
Date:   2017-11-22T18:09:52Z

first renewal time to be 75% of renewal time

The first scheduled renewal time is set to the exact expiration time,
and all subsequent renewal times are 75% of the renewal interval. This makes
it so that the initial renewal time is also 75%.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19752: [SPARK-22520][SQL] Support code generation for la...

2017-11-22 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19752#discussion_r152643842
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
 ---
@@ -211,111 +231,61 @@ abstract class CaseWhenBase(
 val elseCase = elseValue.map(" ELSE " + _.sql).getOrElse("")
 "CASE" + cases + elseCase + " END"
   }
-}
-
-
-/**
- * Case statements of the form "CASE WHEN a THEN b [WHEN c THEN d]* [ELSE 
e] END".
- * When a = true, returns b; when c = true, returns d; else returns e.
- *
- * @param branches seq of (branch condition, branch value)
- * @param elseValue optional value for the else branch
- */
-// scalastyle:off line.size.limit
-@ExpressionDescription(
-  usage = "CASE WHEN expr1 THEN expr2 [WHEN expr3 THEN expr4]* [ELSE 
expr5] END - When `expr1` = true, returns `expr2`; else when `expr3` = true, 
returns `expr4`; else returns `expr5`.",
-  arguments = """
-Arguments:
-  * expr1, expr3 - the branch condition expressions should all be 
boolean type.
-  * expr2, expr4, expr5 - the branch value expressions and else value 
expression should all be
-  same type or coercible to a common type.
-  """,
-  examples = """
-Examples:
-  > SELECT CASE WHEN 1 > 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
-   1
-  > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 > 0 THEN 2.0 ELSE 1.2 END;
-   2
-  > SELECT CASE WHEN 1 < 0 THEN 1 WHEN 2 < 0 THEN 2.0 ELSE null END;
-   NULL
-  """)
-// scalastyle:on line.size.limit
-case class CaseWhen(
-val branches: Seq[(Expression, Expression)],
-val elseValue: Option[Expression] = None)
-  extends CaseWhenBase(branches, elseValue) with CodegenFallback with 
Serializable {
-
-  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-super[CodegenFallback].doGenCode(ctx, ev)
-  }
-
-  def toCodegen(): CaseWhenCodegen = {
-CaseWhenCodegen(branches, elseValue)
-  }
-}
-
-/**
- * CaseWhen expression used when code generation condition is satisfied.
- * OptimizeCodegen optimizer replaces CaseWhen into CaseWhenCodegen.
- *
- * @param branches seq of (branch condition, branch value)
- * @param elseValue optional value for the else branch
- */
-case class CaseWhenCodegen(
-val branches: Seq[(Expression, Expression)],
-val elseValue: Option[Expression] = None)
-  extends CaseWhenBase(branches, elseValue) with Serializable {
 
   override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
-// Generate code that looks like:
-//
-// condA = ...
-// if (condA) {
-//   valueA
-// } else {
-//   condB = ...
-//   if (condB) {
-// valueB
-//   } else {
-// condC = ...
-// if (condC) {
-//   valueC
-// } else {
-//   elseValue
-// }
-//   }
-// }
+val conditionMet = ctx.freshName("caseWhenConditionMet")
+ctx.addMutableState("boolean", ev.isNull, "")
+ctx.addMutableState(ctx.javaType(dataType), ev.value, "")
 val cases = branches.map { case (condExpr, valueExpr) =>
   val cond = condExpr.genCode(ctx)
   val res = valueExpr.genCode(ctx)
   s"""
-${cond.code}
-if (!${cond.isNull} && ${cond.value}) {
-  ${res.code}
-  ${ev.isNull} = ${res.isNull};
-  ${ev.value} = ${res.value};
+if(!$conditionMet) {
+  ${cond.code}
+  if (!${cond.isNull} && ${cond.value}) {
+${res.code}
+${ev.isNull} = ${res.isNull};
+${ev.value} = ${res.value};
+$conditionMet = true;
+  }
 }
   """
 }
 
-var generatedCode = cases.mkString("", "\nelse {\n", "\nelse {\n")
-
-elseValue.foreach { elseExpr =>
+val elseCode = elseValue.map { elseExpr =>
   val res = elseExpr.genCode(ctx)
-  generatedCode +=
-s"""
+  s"""
+if(!$conditionMet) {
   ${res.code}
   ${ev.isNull} = ${res.isNull};
   ${ev.value} = ${res.value};
-"""
-}
+}
+  """
+}.getOrElse("")
 
-generatedCode += "}\n" * cases.size
+val casesCode = if (ctx.INPUT_ROW == null || ctx.currentVars != null) {
+  cases.mkString("\n")
+} else {
+  ctx.splitExpressions(cases, "caseWhen",
--- End diff --

Then, could you show us a test case? It can be a performance test if it is hard 
for this function to hit the 64KB limit. 
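
For reference, a hedged sketch of the kind of test being asked for (the branch count, names, and surrounding QueryTest-style suite are assumptions, not taken from the PR):

```scala
test("SPARK-22520: many CASE WHEN branches should not break the 64KB method limit") {
  // 500 branches is arbitrary; the point is to make the generated CASE WHEN code
  // large enough that, without splitting the expressions, it would previously fail
  // with a "grows beyond 64 KB" codegen error.
  val branches = (1 to 500).map(i => s"WHEN id = $i THEN $i").mkString(" ")
  val df = spark.range(1).selectExpr(s"CASE $branches ELSE 0 END AS result")
  checkAnswer(df, Seq(Row(0)))  // id = 0 matches no branch, so the ELSE value is returned
}
```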


---


[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-11-22 Thread jkbradley
Github user jkbradley commented on the issue:

https://github.com/apache/spark/pull/19439
  
I just noticed: Where is data/mllib/images/kittens/DP153539.jpg from?  
(It's missing in the license list.)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #19797: [SPARK-22570][SQL] Avoid to create a lot of global varia...

2017-11-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19797
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84115/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


