[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98383/
Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #98383 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98383/testReport)**
 for PR 22482 at commit 
[`b6ccecd`](https://github.com/apache/spark/commit/b6ccecdfa3a3d31305667541b8d8fd761e5d3aee).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22847
  
**[Test build #98384 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98384/testReport)**
 for PR 22847 at commit 
[`65a5355`](https://github.com/apache/spark/commit/65a5355a352ad228786b930e41628e0f255e9b59).


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-11-01 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/22847
  
@cloud-fan @rednaxelafx I missed that! Please help review.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #98383 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98383/testReport)**
 for PR 22482 at commit 
[`b6ccecd`](https://github.com/apache/spark/commit/b6ccecdfa3a3d31305667541b8d8fd761e5d3aee).


---




[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22864
  
**[Test build #4406 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4406/testReport)**
 for PR 22864 at commit 
[`72cf70a`](https://github.com/apache/spark/commit/72cf70a47bef979e3e625edc8fb8610632f886d3).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98379/
Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #98379 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98379/testReport)**
 for PR 22482 at commit 
[`ee67bca`](https://github.com/apache/spark/commit/ee67bcaf6fa2d1ab17e755cb7d5edd5dd10115bc).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98380/
Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #98380 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98380/testReport)**
 for PR 22482 at commit 
[`ee67bca`](https://github.com/apache/spark/commit/ee67bcaf6fa2d1ab17e755cb7d5edd5dd10115bc).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22906: [SPARK-25895][Core]Adding testcase to compare Lz4...

2018-11-01 Thread Udbhav30
Github user Udbhav30 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22906#discussion_r230269652
  
--- Diff: 
core/src/test/scala/org/apache/spark/io/CompressionCodecSuite.scala ---
@@ -128,6 +130,69 @@ class CompressionCodecSuite extends SparkFunSuite {
 }
   }
 
+  test("SPARK-25895 Zstd shuffle Read/Write/spill comparison w.r.t lz4") {
--- End diff --

Thanks for your suggestion, I will go through the benchmarks and update the PR.


---




[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread KyleLi1985
Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/22893
  
> Hm, actually that's the best case. You're exercising the case where the 
code path you prefer is fast. And the case where the precision bound applies is 
exactly the case where the branch you deleted helps. As I say, you'd have to 
show this is not impacting other cases significantly, and I think it should. 
Consider the sparse-sparse case.

Here are the test results.

First, define labels for the branch paths in the sparse-sparse and sparse-dense cases:
if precisionBound1 is met, we call it LOGIC1;
if neither precisionBound1 nor precisionBound2 is met, we call it LOGIC2;
if precisionBound1 is not met but precisionBound2 is, we call it LOGIC3.
(There is a trick: you can manually change the precision value to hit each of these situations.)

**Sparse-sparse case, time cost (milliseconds)**

| Branch | Before patch | After patch |
| --- | --- | --- |
| LOGIC1 | 7786, 7970, 8086 | 7729, 7653, 7903 |
| LOGIC2 | 8412, 9029, 8606 | 8603, 8724, 9024 |
| LOGIC3 | 19365, 19146, 19351 | 18917, 19007, 19074 |

**Sparse-dense case, time cost (milliseconds)**

| Branch | Before patch | After patch |
| --- | --- | --- |
| LOGIC1 | 4195, 4014, 4409 | 4081, 3971, 4151 |
| LOGIC2 | 4968, 5579, 5080 | 4980, 5472, 5148 |
| LOGIC3 | 11848, 12077, 12168 | 11718, 11874, 11743 |

And for the dense-dense case, as already discussed in the comments, only sqdist is used to calculate the distance.

**Dense-dense case, time cost (milliseconds)**

| Before patch | After patch |
| --- | --- |
| 7340, 7816, 7672 | 5752, 5800, 5753 |

The above results are based on fastSquaredDistance, which is shown below.

```scala
private[mllib] def fastSquaredDistance(
    v1: Vector,
    norm1: Double,
    v2: Vector,
    norm2: Double,
    precision: Double = 1e-6): Double = {
  val n = v1.size
  require(v2.size == n)
  require(norm1 >= 0.0 && norm2 >= 0.0)
  val sumSquaredNorm = norm1 * norm1 + norm2 * norm2
  val normDiff = norm1 - norm2
  var sqDist = 0.0
  /*
   * The relative error is
   *
   * EPSILON * ( \|a\|_2^2 + \|b\|_2^2 + 2 |a^T b|) / ( \|a - b\|_2^2 ),
   *
   * which is bounded by
   *
   * 2.0 * EPSILON * ( \|a\|_2^2 + \|b\|_2^2 ) / ( (\|a\|_2 - \|b\|_2)^2 ).
   *
   * The bound doesn't need the inner product, so we can use it as a sufficient
   * condition to check quickly whether the inner product approach is accurate.
   */
  val precisionBound1 = 2.0 * EPSILON * sumSquaredNorm / (normDiff * normDiff + EPSILON)

  if (precisionBound1 < precision &&
      (!v1.isInstanceOf[DenseVector] || !v2.isInstanceOf[DenseVector])) {
    sqDist = sumSquaredNorm - 2.0 * dot(v1, v2)
  } else if (v1.isInstanceOf[SparseVector] || v2.isInstanceOf[SparseVector]) {
    val dotValue = dot(v1, v2)
    sqDist = math.max(sumSquaredNorm - 2.0 * dotValue, 0.0)
    val precisionBound2 = EPSILON * (sumSquaredNorm + 2.0 * math.abs(dotValue)) /
      (sqDist + EPSILON)
    if (precisionBound2 > precision) {
      sqDist = Vectors.sqdist(v1, v2)
    }
  } else {
    sqDist = Vectors.sqdist(v1, v2)
  }

  sqDist
}
```
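
For reference, a minimal driver sketch (not from the PR; the vectors and the precision overrides below are made up) showing how each branch can be forced by tuning `precision` against the copy of the function shown above:

```scala
import org.apache.spark.mllib.linalg.Vectors

// Two arbitrary sparse vectors and their L2 norms; assumes access to the
// fastSquaredDistance shown above (e.g. a local copy of it).
val v1 = Vectors.sparse(1000, Array(1, 5, 9), Array(1.0, 2.0, 3.0))
val v2 = Vectors.sparse(1000, Array(1, 7, 9), Array(0.5, 1.0, 2.0))
val (n1, n2) = (Vectors.norm(v1, 2.0), Vectors.norm(v2, 2.0))

// A loose precision makes precisionBound1 < precision hold -> LOGIC1.
fastSquaredDistance(v1, n1, v2, n2, precision = 1.0)
// A very tight precision fails precisionBound1 but trips precisionBound2,
// so the call falls through to Vectors.sqdist -> LOGIC3.
fastSquaredDistance(v1, n1, v2, n2, precision = 1e-20)
```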


---




[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22927
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22927
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98377/
Test FAILed.


---




[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22927
  
**[Test build #98377 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98377/testReport)**
 for PR 22927 at commit 
[`85a5864`](https://github.com/apache/spark/commit/85a5864a5b6a910f3cc702d0407a5e015de2efcc).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22255
  
Also cc @hvanhovell 


---




[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22255
  
@dongjoon-hyun Do you want to take this over?


---




[GitHub] spark issue #22255: [SPARK-25102][Spark Core] Write Spark version informatio...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22255
  

https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L902

@rdblue  Can we use created_by?

```
  /** String for application that wrote this file.  This should be in the format
   *  <Application> version <App Version> (build <App Build Hash>).
   * e.g. impala version 1.0 (build 6cf94d29b2b7115df4de2c06e2ab4326d721eb55)
   **/
  6: optional string created_by
```
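
Purely as a sketch (not part of this PR; the file path is made up), this is how `created_by` can be read back through parquet-mr, which is where a Spark version string written this way would show up:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Open the footer of an arbitrary Parquet file and read the writer string.
val inputFile = HadoopInputFile.fromPath(new Path("/tmp/example.parquet"), new Configuration())
val reader = ParquetFileReader.open(inputFile)
try {
  val createdBy = reader.getFooter.getFileMetaData.getCreatedBy
  println(createdBy) // e.g. "parquet-mr version 1.10.0 (build ...)"
} finally {
  reader.close()
}
```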


---




[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread KyleLi1985
Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/22893
  
> Hm, actually that's the best case. You're exercising the case where the 
code path you prefer is fast. And the case where the precision bound applies is 
exactly the case where the branch you deleted helps. As I say, you'd have to 
show this is not impacting other cases significantly, and I think it should. 
Consider the sparse-sparse case.

Here is my test for the sparse-sparse, dense-dense, and sparse-dense cases:

[SparkMLlibTest.txt](https://github.com/apache/spark/files/2541007/SparkMLlibTest.txt)

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}
import com.github.fommil.netlib.F2jBLAS
import java.util.Date

object SparkMLlibTest {
  class VectorWithNorm(val vector: Vector, val norm: Double) extends Serializable {

    def this(vector: Vector) = this(vector, Vectors.norm(vector, 2.0))

    def this(array: Array[Double]) = this(Vectors.dense(array))

    /** Converts the vector to a dense vector. */
    def toDense: VectorWithNorm = new VectorWithNorm(Vectors.dense(vector.toArray), norm)
  }

  def dot(x: Vector, y: Vector): Double = {
    require(x.size == y.size,
      "BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes:" +
        " x.size = " + x.size + ", y.size = " + y.size)
    (x, y) match {
      case (dx: DenseVector, dy: DenseVector) =>
        dot(dx, dy)
      case (sx: SparseVector, dy: DenseVector) =>
        dot(sx, dy)
      case (dx: DenseVector, sy: SparseVector) =>
        dot(sy, dx)
      case (sx: SparseVector, sy: SparseVector) =>
        dot(sx, sy)
      case _ =>
        throw new IllegalArgumentException(s"dot doesn't support (${x.getClass}, ${y.getClass}).")
    }
  }

  def dot(x: DenseVector, y: DenseVector): Double = {
    val n = x.size
    new F2jBLAS().ddot(n, x.values, 1, y.values, 1)
  }

  def dot(x: SparseVector, y: DenseVector): Double = {
    val xValues = x.values
    val xIndices = x.indices
    val yValues = y.values
    val nnz = xIndices.length

    var sum = 0.0
    var k = 0
    while (k < nnz) {
      sum += xValues(k) * yValues(xIndices(k))
      k += 1
    }
    sum
  }

  /**
   * dot(x, y)
   */
  def dot(x: SparseVector, y: SparseVector): Double = {
    val xValues = x.values
    val xIndices = x.indices
    val yValues = y.values
    val yIndices = y.indices
    val nnzx = xIndices.length
    val nnzy = yIndices.length

    var kx = 0
    var ky = 0
    var sum = 0.0
    // y catching x
    while (kx < nnzx && ky < nnzy) {
      val ix = xIndices(kx)
      while (ky < nnzy && yIndices(ky) < ix) {
        ky += 1
      }
      if (ky < nnzy && yIndices(ky) == ix) {
        sum += xValues(kx) * yValues(ky)
        ky += 1
      }
      kx += 1
    }
    sum
  }

  lazy val EPSILON = {
    var eps = 1.0
    while ((1.0 + (eps / 2.0)) != 1.0) {
      eps /= 2.0
    }
    eps
  }

  def fastSquaredDistanceAddedPatch(
      v1: Vector,
      norm1: Double,
      v2: Vector,
      norm2: Double,
      precision: Double = 1e-6): Double = {
    val n = v1.size
    require(v2.size == n)
    require(norm1 >= 0.0 && norm2 >= 0.0)
    val sumSquaredNorm = norm1 * norm1 + norm2 * norm2
    val normDiff = norm1 - norm2
    var sqDist = 0.0
    /*
     * The relative error is
     *
     * EPSILON * ( \|a\|_2^2 + \|b\|_2^2 + 2 |a^T b|) / ( \|a - b\|_2^2 ),
     *
     * which is bounded by
     *
     * 2.0 * EPSILON * ( \|a\|_2^2 + \|b\|_2^2 ) / ( (\|a\|_2 - \|b\|_2)^2 ).
     *
     * The bound doesn't need the inner product, so we can use it as a sufficient
     * condition to check quickly whether the inner product approach is accurate.
     */
    val precisionBound1 = 2.0 * EPSILON * sumSquaredNorm / (normDiff * normDiff + EPSILON)

    if (precisionBound1 < precision &&
        (!v1.isInstanceOf[DenseVector] || !v2.isInstanceOf[DenseVector])) {
      sqDist = sumSquaredNorm - 2.0 * dot(v1, v2)
    } else if (v1.isInstanceOf[SparseVector] ||
```

[GitHub] spark issue #22893: [SPARK-25868][MLlib] One part of Spark MLlib Kmean Logic...

2018-11-01 Thread KyleLi1985
Github user KyleLi1985 commented on the issue:

https://github.com/apache/spark/pull/22893
  
Here is my test for the sparse-sparse, dense-dense, and sparse-dense cases:
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector, Vectors}
import com.github.fommil.netlib.F2jBLAS
import java.util.Date

object SparkMLlibTest {
  class VectorWithNorm(val vector: Vector, val norm: Double) extends Serializable {

    def this(vector: Vector) = this(vector, Vectors.norm(vector, 2.0))

    def this(array: Array[Double]) = this(Vectors.dense(array))

    /** Converts the vector to a dense vector. */
    def toDense: VectorWithNorm = new VectorWithNorm(Vectors.dense(vector.toArray), norm)
  }

  def dot(x: Vector, y: Vector): Double = {
    require(x.size == y.size,
      "BLAS.dot(x: Vector, y:Vector) was given Vectors with non-matching sizes:" +
        " x.size = " + x.size + ", y.size = " + y.size)
    (x, y) match {
      case (dx: DenseVector, dy: DenseVector) =>
        dot(dx, dy)
      case (sx: SparseVector, dy: DenseVector) =>
        dot(sx, dy)
      case (dx: DenseVector, sy: SparseVector) =>
        dot(sy, dx)
      case (sx: SparseVector, sy: SparseVector) =>
        dot(sx, sy)
      case _ =>
        throw new IllegalArgumentException(s"dot doesn't support (${x.getClass}, ${y.getClass}).")
    }
  }

  def dot(x: DenseVector, y: DenseVector): Double = {
    val n = x.size
    new F2jBLAS().ddot(n, x.values, 1, y.values, 1)
  }

  def dot(x: SparseVector, y: DenseVector): Double = {
    val xValues = x.values
    val xIndices = x.indices
    val yValues = y.values
    val nnz = xIndices.length

    var sum = 0.0
    var k = 0
    while (k < nnz) {
      sum += xValues(k) * yValues(xIndices(k))
      k += 1
    }
    sum
  }

  /**
   * dot(x, y)
   */
  def dot(x: SparseVector, y: SparseVector): Double = {
    val xValues = x.values
    val xIndices = x.indices
    val yValues = y.values
    val yIndices = y.indices
    val nnzx = xIndices.length
    val nnzy = yIndices.length

    var kx = 0
    var ky = 0
    var sum = 0.0
    // y catching x
    while (kx < nnzx && ky < nnzy) {
      val ix = xIndices(kx)
      while (ky < nnzy && yIndices(ky) < ix) {
        ky += 1
      }
      if (ky < nnzy && yIndices(ky) == ix) {
        sum += xValues(kx) * yValues(ky)
        ky += 1
      }
      kx += 1
    }
    sum
  }

  lazy val EPSILON = {
    var eps = 1.0
    while ((1.0 + (eps / 2.0)) != 1.0) {
      eps /= 2.0
    }
    eps
  }

  def fastSquaredDistanceAddedPatch(
      v1: Vector,
      norm1: Double,
      v2: Vector,
      norm2: Double,
      precision: Double = 1e-6): Double = {
    val n = v1.size
    require(v2.size == n)
    require(norm1 >= 0.0 && norm2 >= 0.0)
    val sumSquaredNorm = norm1 * norm1 + norm2 * norm2
    val normDiff = norm1 - norm2
    var sqDist = 0.0
    /*
     * The relative error is
     *
     * EPSILON * ( \|a\|_2^2 + \|b\|_2^2 + 2 |a^T b|) / ( \|a - b\|_2^2 ),
     *
     * which is bounded by
     *
     * 2.0 * EPSILON * ( \|a\|_2^2 + \|b\|_2^2 ) / ( (\|a\|_2 - \|b\|_2)^2 ).
     *
     * The bound doesn't need the inner product, so we can use it as a sufficient
     * condition to check quickly whether the inner product approach is accurate.
     */
    val precisionBound1 = 2.0 * EPSILON * sumSquaredNorm / (normDiff * normDiff + EPSILON)

    if (precisionBound1 < precision &&
        (!v1.isInstanceOf[DenseVector] || !v2.isInstanceOf[DenseVector])) {
      sqDist = sumSquaredNorm - 2.0 * dot(v1, v2)
    } else if (v1.isInstanceOf[SparseVector] ||
        v2.isInstanceOf[SparseVector]) {
      val dotValue = dot(v1, v2)
      sqDist = math.max(sumSquaredNorm - 2.0 * dotValue, 0.0)
      val precisionBound2 = EPSILON * (sumSquaredNorm + 2.0 * math.abs(dotValue)) /
        (sqDist + EPSILON)
      if (precisionBound2 > precision) {
        sqDist = Vectors.sqdist(v1, v2)
      }
    } else {
      sqDist =
```

[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #98382 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98382/testReport)**
 for PR 22482 at commit 
[`5a76383`](https://github.com/apache/spark/commit/5a7638397554f4082d5ffd99fe06955a14854ede).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98382/
Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22482
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #98382 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98382/testReport)**
 for PR 22482 at commit 
[`5a76383`](https://github.com/apache/spark/commit/5a7638397554f4082d5ffd99fe06955a14854ede).


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-11-01 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22847
  
did you address 
https://github.com/apache/spark/pull/22847#issuecomment-434836278 ?


---




[GitHub] spark issue #22788: [SPARK-25769][SQL]make UnresolvedAttribute.sql escape ne...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22788
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98376/
Test FAILed.


---




[GitHub] spark issue #22788: [SPARK-25769][SQL]make UnresolvedAttribute.sql escape ne...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22788
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22788: [SPARK-25769][SQL]make UnresolvedAttribute.sql escape ne...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22788
  
**[Test build #98376 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98376/testReport)**
 for PR 22788 at commit 
[`3c81840`](https://github.com/apache/spark/commit/3c81840f80432c4d341bf94ce80a399c43a0ef4e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
Kubernetes integration test status success
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4714/



---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4714/
Test PASSed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98381/
Test PASSed.


---




[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22504
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98374/
Test FAILed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
**[Test build #98381 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98381/testReport)**
 for PR 22897 at commit 
[`da4856e`](https://github.com/apache/spark/commit/da4856ee09017be0ffe4c9b66c64b9cfcee16e4b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22504
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22504: [SPARK-25118][Submit] Persist Driver Logs in Client mode...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22504
  
**[Test build #98374 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98374/testReport)**
 for PR 22504 at commit 
[`4df08bd`](https://github.com/apache/spark/commit/4df08bd56b4cd51c4072aa026bf7f46bc574421d).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22878: [SPARK-25789][SQL] Support for Dataset of Avro

2018-11-01 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/22878#discussion_r230041592
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/AvroEncoder.scala ---
@@ -0,0 +1,533 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.avro
+
+import java.io._
+import java.util.{Map => JMap}
+
+import scala.collection.JavaConverters._
+import scala.language.existentials
+import scala.reflect.ClassTag
+
+import org.apache.avro.Schema
+import org.apache.avro.Schema.Parser
+import org.apache.avro.Schema.Type._
+import org.apache.avro.generic.{GenericData, IndexedRecord}
+import org.apache.avro.reflect.ReflectData
+import org.apache.avro.specific.SpecificRecord
+
+import org.apache.spark.sql.Encoder
+import org.apache.spark.sql.avro.SchemaConverters._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.{GetColumnByOrdinal, 
UnresolvedExtractValue}
+import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.catalyst.expressions.objects.{LambdaVariable 
=> _, _}
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, 
GenericArrayData}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+/**
+ * A Spark-SQL Encoder for Avro objects
+ */
+object AvroEncoder {
+  /**
+   * Provides an Encoder for Avro objects of the given class
+   *
+   * @param avroClass the class of the Avro object for which to generate 
the Encoder
+   * @tparam T the type of the Avro class, must implement SpecificRecord
+   * @return an Encoder for the given Avro class
+   */
+  def of[T <: SpecificRecord](avroClass: Class[T]): Encoder[T] = {
+AvroExpressionEncoder.of(avroClass)
+  }
+  /**
+   * Provides an Encoder for Avro objects implementing the given schema
+   *
+   * @param avroSchema the Schema of the Avro object for which to generate 
the Encoder
+   * @tparam T the type of the Avro class that implements the Schema, must 
implement IndexedRecord
+   * @return an Encoder for the given Avro Schema
+   */
+  def of[T <: IndexedRecord](avroSchema: Schema): Encoder[T] = {
--- End diff --

In `from_avro`, we are using the Avro schema as a JSON-format string; should we consider changing to that here?
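
If we went that way, a rough sketch of the alternative (assuming `AvroExpressionEncoder` also exposes a factory that accepts a parsed `Schema`, which is not shown in this diff) could be:

```scala
/**
 * Provides an Encoder for Avro records described by a JSON-format schema string,
 * mirroring how `from_avro` takes its schema.
 */
def of[T <: IndexedRecord](jsonFormatSchema: String): Encoder[T] = {
  val avroSchema = new Schema.Parser().parse(jsonFormatSchema)
  // Assumes an AvroExpressionEncoder factory that accepts a parsed Schema.
  AvroExpressionEncoder.of(avroSchema)
}
```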


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4714/



---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22608
  
It depends on how we're getting the Hadoop images. If we're building 
everything from scratch, we could run everything in one container - though 
having a container run more than one process simultaneously isn't common. It's 
more common to have a single container have a single responsibility / process. 
But you can group multiple containers that have related responsibilities into a 
single pod, hence we'll use 3 containers in one pod here.

If we're pulling Hadoop images from elsewhere - which it sounds like we 
aren't doing in the Apache ecosystem in general though - then we'd need to 
build our own separate image for the KDC anyways.

Multiple containers in the same pod all share the same resource footprint 
and limit boundaries.
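
As a rough sketch only (container and image names below are made up, not from this PR), grouping the three processes into one pod with the fabric8 builder already used by the k8s backend would look roughly like:

```scala
import io.fabric8.kubernetes.api.model.{ContainerBuilder, PodBuilder}

// One pod, three single-responsibility containers sharing the same resource
// footprint: the KDC, the HDFS NameNode, and the DataNode.
val containers = Seq(
  "kdc" -> "spark-kerberos-kdc:latest",
  "hdfs-nn" -> "spark-hadoop-namenode:latest",
  "hdfs-dn" -> "spark-hadoop-datanode:latest"
).map { case (name, image) =>
  new ContainerBuilder().withName(name).withImage(image).build()
}

val kerberosTestPod = new PodBuilder()
  .withNewMetadata().withName("kerberos-test").endMetadata()
  .withNewSpec().withContainers(containers: _*).endSpec()
  .build()
```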


---




[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22864
  
**[Test build #4405 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4405/testReport)**
 for PR 22864 at commit 
[`72cf70a`](https://github.com/apache/spark/commit/72cf70a47bef979e3e625edc8fb8610632f886d3).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
**[Test build #98381 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98381/testReport)**
 for PR 22897 at commit 
[`da4856e`](https://github.com/apache/spark/commit/da4856ee09017be0ffe4c9b66c64b9cfcee16e4b).


---




[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...

2018-11-01 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r230255203
  
--- Diff: 
resource-managers/kubernetes/docker/src/test/hadoop/conf/yarn-site.xml ---
@@ -0,0 +1,26 @@
+
+
+
+
+
+
+
+  
--- End diff --

You could put this in hdfs-site.xml and avoid having to deal with this 
extra file.


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-11-01 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/22847
  
@cloud-fan @gatorsmile How about merging this PR first? We can then discuss the performance issues in follow-up PRs:
1. One PR to improve WideTableBenchmark (#22823, WIP).
2. One PR to add more tests to WideTableBenchmark.
3. If we can figure out a good split threshold based on 2, another PR to update that value.


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-11-01 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/22608
  
> Think we want different images for each

You don't need to, right? You can have a single image with all the stuff needed. That would also make setting up the test faster (fewer images to build).

> just run a pod with those three containers 

That's mostly me still getting used to names here; to me pod == one 
container running with some stuff.

But in any case, my main concern here is resource utilization - if we can keep things slimmer by running fewer containers, I think that's better. Individually, the NN, DN and the KDC don't need a lot of resources for this particular test to run.


---




[GitHub] spark issue #15599: [SPARK-18022][SQL] java.lang.NullPointerException instea...

2018-11-01 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15599
  
@shatestest Your problem is different from the issue this PR tries to 
resolve. If you can provide a test case to reproduce it, feel free to open a 
JIRA


---




[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22921
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22921
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98373/
Test FAILed.


---




[GitHub] spark issue #22921: [SPARK-25908][CORE][SQL] Remove old deprecated items in ...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22921
  
**[Test build #98373 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98373/testReport)**
 for PR 22921 at commit 
[`bd4f5ab`](https://github.com/apache/spark/commit/bd4f5ab56f0999b915432d07303dde91b258fc6b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4713/
Test FAILed.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4713/



---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Build finished. Test FAILed.


---




[GitHub] spark issue #22919: [SPARK-25906][SHELL] Restores '-i' option's behaviour in...

2018-11-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22919
  
Let me go ahead with documenting one then tomorrow.


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #98380 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98380/testReport)**
 for PR 22482 at commit 
[`ee67bca`](https://github.com/apache/spark/commit/ee67bcaf6fa2d1ab17e755cb7d5edd5dd10115bc).


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4713/



---




[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22905#discussion_r230250684
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala
 ---
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types.StructType
+
+/**
+ * An optional mix-in for columnar [[FileFormat]]s. This trait provides 
some helpful metadata when
+ * debugging a physical query plan.
+ */
+private[sql] trait ColumnarFileFormat {
--- End diff --

Thanks for the explanation. Mind changing it to `private[datasources]`? I am still not clear on:

1. Whether this information is valuable enough to belong in the metadata, and why it should be there
2. Whether this information can be generalised
3. The purpose of this info - is it to check whether the columns are actually being pruned or not?


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue:

https://github.com/apache/spark/pull/22482
  
retest this, please


---




[GitHub] spark issue #22482: WIP - [SPARK-10816][SS] Support session window natively

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22482
  
**[Test build #98379 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98379/testReport)**
 for PR 22482 at commit 
[`ee67bca`](https://github.com/apache/spark/commit/ee67bcaf6fa2d1ab17e755cb7d5edd5dd10115bc).


---




[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22818
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22818
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98371/
Test FAILed.


---




[GitHub] spark issue #22818: [SPARK-25904][CORE] Allocate arrays smaller than Int.Max...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22818
  
**[Test build #98371 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98371/testReport)**
 for PR 22818 at commit 
[`ca3efd8`](https://github.com/apache/spark/commit/ca3efd8f636706abf8c994cb75c14432f4e4939a).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22927
  
For the SparkR failure, https://issues.apache.org/jira/browse/SPARK-25923 has been filed.


---




[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-01 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22905#discussion_r230249905
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala 
---
@@ -306,7 +306,15 @@ case class FileSourceScanExec(
   withOptPartitionCount
 }
 
-withSelectedBucketsCount
+val withOptColumnCount = relation.fileFormat match {
+  case columnar: ColumnarFileFormat =>
+val sqlConf = relation.sparkSession.sessionState.conf
+val columnCount = columnar.columnCountForSchema(sqlConf, 
requiredSchema)
+withSelectedBucketsCount + ("ColumnCount" -> columnCount.toString)
--- End diff --

The purpose of this info is to check the number of columns actually selected, and that information can be shown via logging, no? Why should it be exposed in the metadata, then?

Maybe use debug logging that shows the number of columns actually being selected by the underlying source.
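
For illustration, a minimal sketch of that logging alternative, reusing the names from the diff above (not a concrete proposal):

```scala
relation.fileFormat match {
  case columnar: ColumnarFileFormat =>
    val sqlConf = relation.sparkSession.sessionState.conf
    val columnCount = columnar.columnCountForSchema(sqlConf, requiredSchema)
    // Visible only with DEBUG logging, instead of being carried in the node metadata.
    logDebug(s"Columnar scan selects $columnCount column(s) for $requiredSchema")
  case _ => // non-columnar formats: nothing to report
}
```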


---




[GitHub] spark issue #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integration T...

2018-11-01 Thread mccheah
Github user mccheah commented on the issue:

https://github.com/apache/spark/pull/22608
  
> You seem to be running different pods for KDC, NN and DN. Is there an 
advantage to that?
> 
> Seems to me you could do the same thing with a single pod and simplify 
things here.
> 
> The it README also mentions "3 CPUs and 4G of memory". Is that still 
enough with these new things that are run?

Think we want different images for each, but that's fine - just run a pod 
with those three containers in it.


---




[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
**[Test build #98378 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98378/testReport)**
 for PR 22897 at commit 
[`4152c76`](https://github.com/apache/spark/commit/4152c76872f8c3145c1ac5038652ea8c4be13441).


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98370/
Test FAILed.


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21688
  
**[Test build #98370 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98370/testReport)**
 for PR 21688 at commit 
[`66a14bc`](https://github.com/apache/spark/commit/66a14bcc89207834e39cb290cc422dbaa252acb0).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22847: [SPARK-25850][SQL] Make the split threshold for t...

2018-11-01 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22847#discussion_r230248773
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -812,6 +812,17 @@ object SQLConf {
 .intConf
 .createWithDefault(65535)
 
+  val CODEGEN_METHOD_SPLIT_THRESHOLD = 
buildConf("spark.sql.codegen.methodSplitThreshold")
+.internal()
+.doc("The threshold of source code length without comment of a single 
Java function by " +
+  "codegen to be split. When the generated Java function source code 
exceeds this threshold" +
+  ", it will be split into multiple small functions. We can't know how 
many bytecode will " +
+  "be generated, so use the code length as metric. A function's 
bytecode should not go " +
+  "beyond 8KB, otherwise it will not be JITted; it also should not be 
too small, otherwise " +
+  "there will be many function calls.")
+.intConf
--- End diff --

@rednaxelafx the wide table benchmark I used has 400 columns, so whole-stage codegen is disabled by default (it exceeds spark.sql.codegen.maxFields).
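
As a minimal sketch for trying the new knob locally (assuming this PR is applied; the value `2048` is only an example, not a recommendation):

```scala
// Sketch only: assumes this PR is applied and a running `spark` session (e.g. spark-shell).
// The conf is internal, so it is set by name.
spark.conf.set("spark.sql.codegen.methodSplitThreshold", "2048")

val df = spark.range(1 << 20)
val columns = (0 until 400).map(i => s"id as id$i")
df.selectExpr(columns: _*).foreach(_ => ())
```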


---




[GitHub] spark issue #22847: [SPARK-25850][SQL] Make the split threshold for the code...

2018-11-01 Thread yucai
Github user yucai commented on the issue:

https://github.com/apache/spark/pull/22847
  
I used the WideTableBenchmark to test this configuration.
Four scenarios are tested; `2048` is always better than `1024`, is good overall, and looks safer for avoiding the 8KB limitation.

**Scenarios**
1. projection on wide table: simple
```
val N = 1 << 20
val df = spark.range(N)
val columns = (0 until 400).map{ i => s"id as id$i"}
df.selectExpr(columns: _*).foreach(identity(_))
```
2. projection on wide table: long alias names
```
val longName = "averylongaliasname" * 20
val columns = (0 until 400).map{ i => s"id as ${longName}_id$i"}
df.selectExpr(columns: _*).foreach(identity(_))
```
3. projection on wide table: many complex expressions
```
// 400 columns, whole stage codegen is disabled for 
spark.sql.codegen.maxFields
val columns = (0 until 400).map{ i => s"case when id = $i then $i else 800 
end as id$i"}
df.selectExpr(columns: _*).foreach(identity(_))
```
4. projection on wide table: a big complex expressions
```
// Because of spark.sql.subexpressionElimination.enabled,
// the whole case when codes will be put into one function,
// and it will be invoked once only.
val columns = (0 until 400).map{ i =>
s"case when id = ${N + 1} then 1
   when id = ${N + 2} then 1
   ...
   when id = ${N + 6} then 1
   else sqrt(N) end as id$i"}
df.selectExpr(columns: _*).foreach(identity(_))
```

**Perf Results**
All runs: Java HotSpot(TM) 64-Bit Server VM 1.8.0_162-b12 on Mac OS X 10.13.6, Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz.

**projection on wide table: simple**

| Split threshold | Best/Avg Time (ms) | Rate (M/s) | Per Row (ns) | Relative |
| --- | --- | --- | --- | --- |
| 10 | 7553 / 7676 | 0.1 | 7202.7 | 1.0X |
| 100 | 5463 / 5504 | 0.2 | 5210.0 | 1.4X |
| 1024 | 2981 / 3017 | 0.4 | 2843.0 | 2.5X |
| 2048 | 2857 / 2897 | 0.4 | 2724.2 | 2.6X |
| 4096 | 3128 / 3187 | 0.3 | 2983.3 | 2.4X |
| 8196 | 3755 / 3793 | 0.3 | 3581.3 | 2.0X |
| 65536 | 27616 / 27685 | 0.0 | 26336.2 | 0.3X |

**projection on wide table: long alias names**

| Split threshold | Best/Avg Time (ms) | Rate (M/s) | Per Row (ns) | Relative |
| --- | --- | --- | --- | --- |
| 10 | 7513 / 7566 | 0.1 | 7164.6 | 1.0X |
| 100 | 5363 / 5410 | 0.2 | 5114.4 | 1.4X |
| 1024 | 2966 / 2998 | 0.4 | 2828.3 | 2.5X |
| 2048 | 2840 / 2864 | 0.4 | 2708.0 | 2.6X |
| 4096 | 3126 / 3166 | 0.3 | 2981.2 | 2.4X |
| 8196 | 3756 / 3823 | 0.3 | 3582.3 | 2.0X |
| 65536 | 27542 / 27729 | 0.0 | 26266.4 | 0.3X |

**projection on wide table: many complex expressions**

| Split threshold | Best/Avg Time (ms) | Rate (M/s) | Per Row (ns) | Relative |
| --- | --- | --- | --- | --- |
| 10 | 8758 / 9007 | 0.1 | 8352.3 | 1.0X |
| 100 | 8675 / 8754 | 0.1 | 8272.9 | |

[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22927
  
Thank you, @squito and @gatorsmile. I addressed the review comments.
The SparkR failure looks unrelated to this change; I also observed it in another unrelated PR (https://github.com/apache/spark/pull/22924).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22927
  
**[Test build #98377 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98377/testReport)**
 for PR 22927 at commit 
[`85a5864`](https://github.com/apache/spark/commit/85a5864a5b6a910f3cc702d0407a5e015de2efcc).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22864
  
**[Test build #4406 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4406/testReport)**
 for PR 22864 at commit 
[`72cf70a`](https://github.com/apache/spark/commit/72cf70a47bef979e3e625edc8fb8610632f886d3).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22927
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22927
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4712/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22788: [SPARK-25769][SQL]make UnresolvedAttribute.sql escape ne...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22788
  
**[Test build #98376 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98376/testReport)**
 for PR 22788 at commit 
[`3c81840`](https://github.com/apache/spark/commit/3c81840f80432c4d341bf94ce80a399c43a0ef4e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22788: [SPARK-25769][SQL]make UnresolvedAttribute.sql escape ne...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22788
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4711/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22788: [SPARK-25769][SQL]make UnresolvedAttribute.sql escape ne...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22788
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22917: [SPARK-25827][CORE] Avoid converting incoming encrypted ...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22917
  
**[Test build #4404 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4404/testReport)**
 for PR 22917 at commit 
[`d485c9a`](https://github.com/apache/spark/commit/d485c9a6ce21565ebb7fe734305862415bcdc814).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22927
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22927
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98368/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22927: [SPARK-25918][SQL] LOAD DATA LOCAL INPATH should handle ...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22927
  
**[Test build #98368 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98368/testReport)**
 for PR 22927 at commit 
[`efb99da`](https://github.com/apache/spark/commit/efb99da8fb505aaeeb0d95fff99c245bd3c0a0b8).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22864: [SPARK-25861][Minor][WEBUI] Remove unused refreshInterva...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22864
  
**[Test build #4403 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4403/testReport)**
 for PR 22864 at commit 
[`72cf70a`](https://github.com/apache/spark/commit/72cf70a47bef979e3e625edc8fb8610632f886d3).
 * This patch **fails SparkR unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22919: [SPARK-25906][SHELL] Restores '-i' option's behaviour in...

2018-11-01 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22919
  
True, but it's hard to call that a downside, because `-i` has a fairly major issue as described above: user code that relies on implicits (like the symbol `'id` or `.toDF`, which are pretty common) suddenly stops working across a minor release bump. Also, it looks like the only reason `:load` was changed to `:paste` for the `-i` option was a bug (SI-7898).

For documentation, I can mention that the plain Scala shell flags can be used instead. It looks like @cloud-fan is going to leave this JIRA as a known issue, so I guess we are good. I can also leave an additional note in the migration guide if we don't go ahead with a fix; see the sketch below for the kind of init script that is affected.
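
For illustration only, a minimal sketch (the file name `init.scala` is hypothetical) of the kind of `-i` init script that depends on the shell's pre-imported implicits and is therefore sensitive to how `-i` is processed:

```
// init.scala (hypothetical), intended for: spark-shell -i init.scala
// Both lines depend on implicits the shell normally imports (spark.implicits._):
// Seq(...).toDF and the Symbol-to-Column conversion behind 'id.
val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
df.select('id).show()
```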


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4710/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4710/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...

2018-11-01 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r230227737
  
--- Diff: dev/make-distribution.sh ---
@@ -191,7 +191,8 @@ fi
 # Only create and copy the dockerfiles directory if the kubernetes artifacts were built.
 if [ -d "$SPARK_HOME"/resource-managers/kubernetes/core/target/ ]; then
   mkdir -p "$DISTDIR/kubernetes/"
-  cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src/main/dockerfiles "$DISTDIR/kubernetes/"
+  cp -a "$SPARK_HOME"/resource-managers/kubernetes/docker/src "$DISTDIR/kubernetes/"
+  cp -a "$SPARK_HOME"/resource-managers/kubernetes/integration-tests/scripts "$DISTDIR/kubernetes/"
--- End diff --

This follows the existing pattern on the line below, but is there a purpose in packaging these test artifacts with a binary Spark distribution?

It seems to me they should just be left in the source package.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...

2018-11-01 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r230228691
  
--- Diff: 
resource-managers/kubernetes/docker/src/test/scripts/run-kerberos-test.sh ---
@@ -0,0 +1,40 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+sed -i -e 's/#//' -e 's/default_ccache_name/# default_ccache_name/' /etc/krb5.conf
+export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true"
+export HADOOP_JAAS_DEBUG=true
+export HADOOP_ROOT_LOGGER=DEBUG,console
+cp ${TMP_KRB_LOC} /etc/krb5.conf
+cp ${TMP_CORE_LOC} /opt/spark/hconf/core-site.xml
+cp ${TMP_HDFS_LOC} /opt/spark/hconf/hdfs-site.xml
+mkdir -p /etc/krb5.conf.d
+/opt/spark/bin/spark-submit \
+  --deploy-mode cluster \
+  --class ${CLASS_NAME} \
+  --master k8s://${MASTER_URL} \
+  --conf spark.kubernetes.namespace=${NAMESPACE} \
+  --conf spark.executor.instances=1 \
+  --conf spark.app.name=spark-hdfs \
+  --conf spark.driver.extraClassPath=/opt/spark/hconf/core-site.xml:/opt/spark/hconf/hdfs-site.xml:/opt/spark/hconf/yarn-site.xml:/etc/krb5.conf \
--- End diff --

Adding files to the classpath does not do anything.

```
$ scala -cp /etc/krb5.conf
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = null

$ scala -cp /etc
scala> getClass().getResource("/krb5.conf")
res0: java.net.URL = file:/etc/krb5.conf
```

So this seems unnecessary, especially since I'd expect spark-submit or the k8s backend code to add the Hadoop conf to the driver's classpath somehow.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22608: [SPARK-25750][K8S][TESTS] Kerberos Support Integr...

2018-11-01 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22608#discussion_r230228286
  
--- Diff: 
resource-managers/kubernetes/docker/src/test/scripts/populate-data.sh ---
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+export JAVA_HOME=/usr/lib/jvm/jre-1.8.0-openjdk
+export PATH=/hadoop/bin:$PATH
+export HADOOP_CONF_DIR=/hadoop/etc/hadoop
+export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true ${HADOOP_OPTS}"
+export KRB5CCNAME=KRBCONF
+mkdir -p /hadoop/etc/data
+cp ${TMP_KRB_LOC} /etc/krb5.conf
+cp ${TMP_CORE_LOC} /hadoop/etc/hadoop/core-site.xml
+cp ${TMP_HDFS_LOC} /hadoop/etc/hadoop/hdfs-site.xml
+
+until kinit -kt /var/keytabs/hdfs.keytab hdfs/nn.${NAMESPACE}.svc.cluster.local; do sleep 2; done
+
+until (echo > /dev/tcp/nn.${NAMESPACE}.svc.cluster.local/9000) >/dev/null 2>&1; do sleep 2; done
+
+hdfs dfsadmin -safemode wait
+
+
+hdfs dfs -mkdir -p /user/userone/
+hdfs dfs -copyFromLocal /people.txt /user/userone
+
+hdfs dfs -chmod -R 755 /user/userone
+hdfs dfs -chown -R ifilonenko /user/userone
--- End diff --

`ifilonenko`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
**[Test build #98375 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98375/testReport)**
 for PR 22897 at commit 
[`5601d16`](https://github.com/apache/spark/commit/5601d16fde34bb6b35d773466332b400905a8e30).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22897
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98375/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22924: [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22924
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98362/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22924: [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1

2018-11-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22924
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22924: [SPARK-25891][PYTHON] Upgrade to Py4J 0.10.8.1

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22924
  
**[Test build #98362 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98362/testReport)**
 for PR 22924 at commit 
[`78127fd`](https://github.com/apache/spark/commit/78127fdbafd4ec6bb9d5a6aeb6f86fbb480d7742).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22897: [SPARK-25875][k8s] Merge code to set up driver command i...

2018-11-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22897
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/4710/



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


