[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21889
  
Just FYI, we are unable to merge it if it has a correctness bug. 


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21103
  
**[Test build #93654 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93654/testReport)** for PR 21103 at commit [`e902974`](https://github.com/apache/spark/commit/e9029746a9cbc204d043cb7a0f9c1c3285284b54).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21857
  
**[Test build #93656 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93656/testReport)** for PR 21857 at commit [`1f107aa`](https://github.com/apache/spark/commit/1f107aaa1fb4e6f261c1720058877b943c46706d).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21837
  
**[Test build #93658 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93658/testReport)** for PR 21837 at commit [`5f83902`](https://github.com/apache/spark/commit/5f83902e2876745f8be245681e7cb41d69421778).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21879
  
**[Test build #93655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93655/testReport)** for PR 21879 at commit [`93c34da`](https://github.com/apache/spark/commit/93c34da713136eb7b4ed8bb8775353c8219efa22).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21103
  
**[Test build #93653 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93653/testReport)** for PR 21103 at commit [`4d01c98`](https://github.com/apache/spark/commit/4d01c9848e021006e2412ebb2db3e37782b5f41a).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21516: [SPARK-24501][MESOS] Add Dispatcher and Driver metrics

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21516
  
**[Test build #93657 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93657/testReport)** for PR 21516 at commit [`50c1c1e`](https://github.com/apache/spark/commit/50c1c1e810fa27480ae7e72640cc8f67b44a60f1).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21837
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93658/
Test FAILed.


---




[GitHub] spark issue #21516: [SPARK-24501][MESOS] Add Dispatcher and Driver metrics

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21516
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93657/
Test FAILed.


---




[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21879
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21857
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93656/
Test FAILed.


---




[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21879
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93655/
Test FAILed.


---




[GitHub] spark issue #21516: [SPARK-24501][MESOS] Add Dispatcher and Driver metrics

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21516
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21837
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21857: [SPARK-21274][SQL] Implement EXCEPT ALL clause.

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21857
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93653/
Test FAILed.


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21103: [SPARK-23915][SQL] Add array_except function

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21103
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93654/
Test FAILed.


---




[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression

2018-07-27 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21837
  
jenkins, retest this, please


---




[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21837
  
**[Test build #93659 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93659/testReport)** for PR 21837 at commit [`5f83902`](https://github.com/apache/spark/commit/5f83902e2876745f8be245681e7cb41d69421778).


---




[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function

2018-07-27 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21103#discussion_r205685897
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in array1 but not in array2, without duplicates.
+ */
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+    without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
--- End diff --

WDYT?


---




[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r205685498
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
     result
   }
 
-  private def resolveNullableType(avroType: Schema, nullable: Boolean): Schema = {
-    if (nullable) {
-      // avro uses union to represent nullable type.
-      val fields = avroType.getTypes.asScala
-      assert(fields.length == 2)
-      val actualType = fields.filter(_.getType != NULL)
-      assert(actualType.length == 1)
-      actualType.head
+  // Resolve an Avro union against a supplied DataType, i.e. a LongType compared against
+  // a ["null", "long"] should return a schema of type Schema.Type.LONG
+  // This function also handles resolving a DataType against unions of 2 or more types, i.e.
+  // an IntType resolves against a ["int", "long", "null"] will correctly return a schema of
+  // type Schema.Type.LONG
+  private def resolveNullableType(avroType: Schema, catalystType: DataType,
+      nullable: Boolean): Schema = {
+    (nullable, avroType.getType) match {
+      case (false, Type.UNION) | (true, Type.UNION) =>
+        // avro uses union to represent nullable type.
+        val fieldTypes = avroType.getTypes.asScala
+
+        // If we're nullable, we need to have at least two types.  Cases with more than two types
+        // are captured in test("read read-write, read-write w/ schema, read") w/ test.avro input
+        assert(fieldTypes.length >= 2)
+
+        val actualType = catalystType match {
+          case NullType => fieldTypes.filter(_.getType == Type.NULL)
+          case BooleanType => fieldTypes.filter(_.getType == Type.BOOLEAN)
+          case ByteType => fieldTypes.filter(_.getType == Type.INT)
+          case BinaryType =>
+            val at = fieldTypes.filter(x => x.getType == Type.BYTES || x.getType == Type.FIXED)
+            if (at.length > 1) {
+              throw new IncompatibleSchemaException(
+                s"Cannot resolve schema of ${catalystType} against union ${avroType.toString}")
+            } else {
+              at
+            }
+          case ShortType | IntegerType => fieldTypes.filter(_.getType == Type.INT)
+          case LongType => fieldTypes.filter(_.getType == Type.LONG)
+          case FloatType => fieldTypes.filter(_.getType == Type.FLOAT)
+          case DoubleType => fieldTypes.filter(_.getType == Type.DOUBLE)
+          case d: DecimalType => fieldTypes.filter(_.getType == Type.STRING)
+          case StringType => fieldTypes
+            .filter(x => x.getType == Type.STRING || x.getType == Type.ENUM)
+          case DateType => fieldTypes.filter(x => x.getType == Type.INT || x.getType == Type.LONG)
+          case TimestampType => fieldTypes.filter(_.getType == Type.LONG)
+          case ArrayType(et, containsNull) =>
+            // Find array that matches the element type specified
+            fieldTypes.filter(x => x.getType == Type.ARRAY
+              && typeMatchesSchema(et, x.getElementType))
+          case st: StructType => // Find the matching record!
+            val recordTypes = fieldTypes.filter(x => x.getType == Type.RECORD)
+            if (recordTypes.length > 1) {
+              throw new IncompatibleSchemaException(
+                "Unions of multiple record types are NOT supported with user-specified schema")
+            }
+            recordTypes
+          case MapType(kt, vt, valueContainsNull) =>
+            // Find the map that matches the value type.  Maps in Avro are always key type string
+            fieldTypes.filter(x => x.getType == Type.MAP && typeMatchesSchema(vt, x.getValueType))
+          case other =>
+            throw new IncompatibleSchemaException(s"Unexpected type: $other")
+        }
+
+        assert(actualType.length == 1)
+        actualType.head
+      case (false, _) | (true, _) => avroType
--- End diff --

case _ => avroType


---




[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r205685302
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
     result
   }
 
-  private def resolveNullableType(avroType: Schema, nullable: Boolean): Schema = {
-    if (nullable) {
-      // avro uses union to represent nullable type.
-      val fields = avroType.getTypes.asScala
-      assert(fields.length == 2)
-      val actualType = fields.filter(_.getType != NULL)
-      assert(actualType.length == 1)
-      actualType.head
+  // Resolve an Avro union against a supplied DataType, i.e. a LongType compared against
+  // a ["null", "long"] should return a schema of type Schema.Type.LONG
+  // This function also handles resolving a DataType against unions of 2 or more types, i.e.
+  // an IntType resolves against a ["int", "long", "null"] will correctly return a schema of
+  // type Schema.Type.LONG
+  private def resolveNullableType(avroType: Schema, catalystType: DataType,
--- End diff --

rename to `resolveUnionType`?


---




[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r205648911
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -87,17 +88,33 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
       case d: DecimalType =>
         (getter, ordinal) => getter.getDecimal(ordinal, d.precision, d.scale).toString
       case StringType =>
-        (getter, ordinal) => new Utf8(getter.getUTF8String(ordinal).getBytes)
+        (getter, ordinal) =>
+          if (avroType.getType == Type.ENUM) {
+            new GenericData.EnumSymbol(avroType, getter.getUTF8String(ordinal).toString)
+          } else {
+            new Utf8(getter.getUTF8String(ordinal).getBytes)
+          }
       case BinaryType =>
-        (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
+        (getter, ordinal) =>
+          val data = getter.getBinary(ordinal)
+          if (avroType.getType == Type.FIXED) {
+            // Handles fixed-type fields in output schema.  Test case is included in test.avro
+            // as it includes several fixed fields that would fail if we specify schema
+            // on-write without this condition
+            val fixed = new GenericData.Fixed(avroType)
+            fixed.bytes(data)
+            fixed
+          } else {
+            ByteBuffer.wrap(data)
+          }
--- End diff --

This might be slow. In the executors, when each row is going to be serialized, the whole `if-else` will be executed again and again to get a specialized converter. We can consider resolving the specialized types earlier, in the driver, by

```scala
import org.apache.avro.generic.GenericData.{Fixed, EnumSymbol}
...
      case StringType =>
        if (avroType.getType == Type.ENUM) {
          (getter, ordinal) => new EnumSymbol(avroType, getter.getUTF8String(ordinal).toString)
        } else {
          (getter, ordinal) => new Utf8(getter.getUTF8String(ordinal).getBytes)
        }
      case BinaryType =>
        if (avroType.getType == Type.FIXED) {
          (getter, ordinal) => new Fixed(avroType, getter.getBinary(ordinal))
        } else {
          (getter, ordinal) => ByteBuffer.wrap(getter.getBinary(ordinal))
        }
```

so the returned lambda expression will not have any check on `FIXED` or `ENUM` types.



---




[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r205685728
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
     result
   }
 
-  private def resolveNullableType(avroType: Schema, nullable: Boolean): Schema = {
-    if (nullable) {
-      // avro uses union to represent nullable type.
-      val fields = avroType.getTypes.asScala
-      assert(fields.length == 2)
-      val actualType = fields.filter(_.getType != NULL)
-      assert(actualType.length == 1)
-      actualType.head
+  // Resolve an Avro union against a supplied DataType, i.e. a LongType compared against
+  // a ["null", "long"] should return a schema of type Schema.Type.LONG
+  // This function also handles resolving a DataType against unions of 2 or more types, i.e.
+  // an IntType resolves against a ["int", "long", "null"] will correctly return a schema of
+  // type Schema.Type.LONG
+  private def resolveNullableType(avroType: Schema, catalystType: DataType,
+      nullable: Boolean): Schema = {
+    (nullable, avroType.getType) match {
+      case (false, Type.UNION) | (true, Type.UNION) =>
+        // avro uses union to represent nullable type.
+        val fieldTypes = avroType.getTypes.asScala
+
+        // If we're nullable, we need to have at least two types.  Cases with more than two types
+        // are captured in test("read read-write, read-write w/ schema, read") w/ test.avro input
+        assert(fieldTypes.length >= 2)
--- End diff --

When it's non-nullable, is it possible to have `fieldTypes.length == 1`?
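
For reference, Avro itself accepts a single-branch union, so it looks possible; a hedged illustration (not from the PR):

```scala
import org.apache.avro.Schema

// A one-branch union is legal Avro, and here getTypes has length 1,
// which would trip the `fieldTypes.length >= 2` assert above.
val oneBranchUnion = new Schema.Parser().parse("""["long"]""")
assert(oneBranchUnion.getTypes.size == 1)
```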


---




[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r205683257
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -148,7 +165,8 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
     val avroFields = avroStruct.getFields
     assert(avroFields.size() == catalystStruct.length)
     val fieldConverters = catalystStruct.zip(avroFields.asScala).map {
-      case (f1, f2) => newConverter(f1.dataType, resolveNullableType(f2.schema(), f1.nullable))
+      case (f1, f2) => newConverter(f1.dataType, resolveNullableType(
+        f2.schema(), f1.dataType, f1.nullable))
--- End diff --

Nit, formatting:

```scala
      case (f1, f2) =>
        newConverter(f1.dataType, resolveNullableType(f2.schema(), f1.dataType, f1.nullable))
```


---




[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21879
  
retest this please


---




[GitHub] spark issue #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO support shou...

2018-07-27 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/21847
  
Since the data type in Spark is certain, why do we need to support an output Avro schema like `["int", "long", "null"]`?
Can we just forbid such usage by having a rule for the schema: if the Avro type is a union, it has at most two types and one of them is the null type?
Otherwise things can get complicated.
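
A minimal sketch of such a rule (a hypothetical helper, assuming Avro's `Schema` API; not code from the PR):

```scala
import org.apache.avro.Schema
import org.apache.avro.Schema.Type
import scala.collection.JavaConverters._

// Accept a union only if it has at most two branches and one of them is
// "null"; anything else is rejected up front instead of resolved case by case.
def isSupportedUnion(schema: Schema): Boolean =
  schema.getType != Type.UNION || {
    val branches = schema.getTypes.asScala
    branches.length <= 2 && branches.exists(_.getType == Type.NULL)
  }
```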



---




[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r205684257
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
     result
   }
 
-  private def resolveNullableType(avroType: Schema, nullable: Boolean): Schema = {
-    if (nullable) {
-      // avro uses union to represent nullable type.
-      val fields = avroType.getTypes.asScala
-      assert(fields.length == 2)
-      val actualType = fields.filter(_.getType != NULL)
-      assert(actualType.length == 1)
-      actualType.head
+  // Resolve an Avro union against a supplied DataType, i.e. a LongType compared against
+  // a ["null", "long"] should return a schema of type Schema.Type.LONG
+  // This function also handles resolving a DataType against unions of 2 or more types, i.e.
+  // an IntType resolves against a ["int", "long", "null"] will correctly return a schema of
+  // type Schema.Type.LONG
+  private def resolveNullableType(avroType: Schema, catalystType: DataType,
+      nullable: Boolean): Schema = {
+    (nullable, avroType.getType) match {
--- End diff --

Since the code is complicated and long, maybe it's easier to read with just an old-fashioned `if-else`.


---




[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r205687174
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
     result
   }
 
-  private def resolveNullableType(avroType: Schema, nullable: Boolean): Schema = {
-    if (nullable) {
-      // avro uses union to represent nullable type.
-      val fields = avroType.getTypes.asScala
-      assert(fields.length == 2)
-      val actualType = fields.filter(_.getType != NULL)
-      assert(actualType.length == 1)
-      actualType.head
+  // Resolve an Avro union against a supplied DataType, i.e. a LongType compared against
+  // a ["null", "long"] should return a schema of type Schema.Type.LONG
+  // This function also handles resolving a DataType against unions of 2 or more types, i.e.
+  // an IntType resolves against a ["int", "long", "null"] will correctly return a schema of
+  // type Schema.Type.LONG
+  private def resolveNullableType(avroType: Schema, catalystType: DataType,
+      nullable: Boolean): Schema = {
+    (nullable, avroType.getType) match {
+      case (false, Type.UNION) | (true, Type.UNION) =>
+        // avro uses union to represent nullable type.
+        val fieldTypes = avroType.getTypes.asScala
+
+        // If we're nullable, we need to have at least two types.  Cases with more than two types
+        // are captured in test("read read-write, read-write w/ schema, read") w/ test.avro input
+        assert(fieldTypes.length >= 2)
+
+        val actualType = catalystType match {
+          case NullType => fieldTypes.filter(_.getType == Type.NULL)
+          case BooleanType => fieldTypes.filter(_.getType == Type.BOOLEAN)
+          case ByteType => fieldTypes.filter(_.getType == Type.INT)
+          case BinaryType =>
+            val at = fieldTypes.filter(x => x.getType == Type.BYTES || x.getType == Type.FIXED)
+            if (at.length > 1) {
+              throw new IncompatibleSchemaException(
+                s"Cannot resolve schema of ${catalystType} against union ${avroType.toString}")
+            } else {
+              at
+            }
+          case ShortType | IntegerType => fieldTypes.filter(_.getType == Type.INT)
+          case LongType => fieldTypes.filter(_.getType == Type.LONG)
+          case FloatType => fieldTypes.filter(_.getType == Type.FLOAT)
+          case DoubleType => fieldTypes.filter(_.getType == Type.DOUBLE)
+          case d: DecimalType => fieldTypes.filter(_.getType == Type.STRING)
+          case StringType => fieldTypes
+            .filter(x => x.getType == Type.STRING || x.getType == Type.ENUM)
+          case DateType => fieldTypes.filter(x => x.getType == Type.INT || x.getType == Type.LONG)
+          case TimestampType => fieldTypes.filter(_.getType == Type.LONG)
+          case ArrayType(et, containsNull) =>
+            // Find array that matches the element type specified
+            fieldTypes.filter(x => x.getType == Type.ARRAY
+              && typeMatchesSchema(et, x.getElementType))
+          case st: StructType => // Find the matching record!
+            val recordTypes = fieldTypes.filter(x => x.getType == Type.RECORD)
+            if (recordTypes.length > 1) {
+              throw new IncompatibleSchemaException(
+                "Unions of multiple record types are NOT supported with user-specified schema")
+            }
+            recordTypes
+          case MapType(kt, vt, valueContainsNull) =>
+            // Find the map that matches the value type.  Maps in Avro are always key type string
+            fieldTypes.filter(x => x.getType == Type.MAP && typeMatchesSchema(vt, x.getValueType))
+          case other =>
+            throw new IncompatibleSchemaException(s"Unexpected type: $other")
+        }
+
+        assert(actualType.length == 1)
--- End diff --

We need to show an error message if the length is not 1.
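
One hedged way to do that, replacing the bare `assert` (a sketch, not necessarily the final wording):

```scala
if (actualType.length != 1) {
  throw new IncompatibleSchemaException(
    s"Cannot resolve schema of $catalystType against union $avroType: " +
      s"found ${actualType.length} matching branches")
}
actualType.head
```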


---




[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21879
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1397/
Test PASSed.


---




[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21879
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21879
  
**[Test build #93660 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93660/testReport)** for PR 21879 at commit [`93c34da`](https://github.com/apache/spark/commit/93c34da713136eb7b4ed8bb8775353c8219efa22).


---




[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r205692778
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
     result
   }
 
-  private def resolveNullableType(avroType: Schema, nullable: Boolean): Schema = {
-    if (nullable) {
-      // avro uses union to represent nullable type.
-      val fields = avroType.getTypes.asScala
-      assert(fields.length == 2)
-      val actualType = fields.filter(_.getType != NULL)
-      assert(actualType.length == 1)
-      actualType.head
+  // Resolve an Avro union against a supplied DataType, i.e. a LongType compared against
+  // a ["null", "long"] should return a schema of type Schema.Type.LONG
+  // This function also handles resolving a DataType against unions of 2 or more types, i.e.
+  // an IntType resolves against a ["int", "long", "null"] will correctly return a schema of
+  // type Schema.Type.LONG
+  private def resolveNullableType(avroType: Schema, catalystType: DataType,
+      nullable: Boolean): Schema = {
+    (nullable, avroType.getType) match {
+      case (false, Type.UNION) | (true, Type.UNION) =>
+        // avro uses union to represent nullable type.
+        val fieldTypes = avroType.getTypes.asScala
+
+        // If we're nullable, we need to have at least two types.  Cases with more than two types
+        // are captured in test("read read-write, read-write w/ schema, read") w/ test.avro input
+        assert(fieldTypes.length >= 2)
+
+        val actualType = catalystType match {
+          case NullType => fieldTypes.filter(_.getType == Type.NULL)
+          case BooleanType => fieldTypes.filter(_.getType == Type.BOOLEAN)
+          case ByteType => fieldTypes.filter(_.getType == Type.INT)
+          case BinaryType =>
+            val at = fieldTypes.filter(x => x.getType == Type.BYTES || x.getType == Type.FIXED)
+            if (at.length > 1) {
+              throw new IncompatibleSchemaException(
+                s"Cannot resolve schema of ${catalystType} against union ${avroType.toString}")
+            } else {
+              at
+            }
+          case ShortType | IntegerType => fieldTypes.filter(_.getType == Type.INT)
+          case LongType => fieldTypes.filter(_.getType == Type.LONG)
+          case FloatType => fieldTypes.filter(_.getType == Type.FLOAT)
+          case DoubleType => fieldTypes.filter(_.getType == Type.DOUBLE)
+          case d: DecimalType => fieldTypes.filter(_.getType == Type.STRING)
+          case StringType => fieldTypes
+            .filter(x => x.getType == Type.STRING || x.getType == Type.ENUM)
+          case DateType => fieldTypes.filter(x => x.getType == Type.INT || x.getType == Type.LONG)
+          case TimestampType => fieldTypes.filter(_.getType == Type.LONG)
+          case ArrayType(et, containsNull) =>
+            // Find array that matches the element type specified
+            fieldTypes.filter(x => x.getType == Type.ARRAY
+              && typeMatchesSchema(et, x.getElementType))
+          case st: StructType => // Find the matching record!
+            val recordTypes = fieldTypes.filter(x => x.getType == Type.RECORD)
+            if (recordTypes.length > 1) {
+              throw new IncompatibleSchemaException(
+                "Unions of multiple record types are NOT supported with user-specified schema")
+            }
+            recordTypes
+          case MapType(kt, vt, valueContainsNull) =>
+            // Find the map that matches the value type.  Maps in Avro are always key type string
+            fieldTypes.filter(x => x.getType == Type.MAP && typeMatchesSchema(vt, x.getValueType))
+          case other =>
+            throw new IncompatibleSchemaException(s"Unexpected type: $other")
+        }
+
+        assert(actualType.length == 1)
--- End diff --

Can you elaborate on when `actualType.length == 0` or `actualType.length > 1`?

Is it possible that `catalystType` is `Int`, and `fieldTypes` only contains `Long`? Do we want to do the promotion?
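
For a concrete case (a hedged illustration, not from the PR): `IntegerType` against a `["null", "long"]` union matches no branch, so the length is 0.

```scala
import java.util.Arrays
import org.apache.avro.Schema
import org.apache.avro.Schema.Type
import scala.collection.JavaConverters._

// IntegerType filters on Type.INT, but this union only offers NULL and LONG,
// so the filter is empty and the assert upstream fails unless we promote.
val union = Schema.createUnion(Arrays.asList(Schema.create(Type.NULL), Schema.create(Type.LONG)))
val matches = union.getTypes.asScala.filter(_.getType == Type.INT)
assert(matches.isEmpty)
```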


---




[GitHub] spark pull request #21847: [SPARK-24855][SQL][EXTERNAL]: Built-in AVRO suppo...

2018-07-27 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/21847#discussion_r205692946
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroSerializer.scala ---
@@ -165,16 +183,112 @@ class AvroSerializer(rootCatalystType: DataType, rootAvroType: Schema, nullable:
     result
   }
 
-  private def resolveNullableType(avroType: Schema, nullable: Boolean): Schema = {
-    if (nullable) {
-      // avro uses union to represent nullable type.
-      val fields = avroType.getTypes.asScala
-      assert(fields.length == 2)
-      val actualType = fields.filter(_.getType != NULL)
-      assert(actualType.length == 1)
-      actualType.head
+  // Resolve an Avro union against a supplied DataType, i.e. a LongType compared against
+  // a ["null", "long"] should return a schema of type Schema.Type.LONG
+  // This function also handles resolving a DataType against unions of 2 or more types, i.e.
+  // an IntType resolves against a ["int", "long", "null"] will correctly return a schema of
+  // type Schema.Type.LONG
+  private def resolveNullableType(avroType: Schema, catalystType: DataType,
+      nullable: Boolean): Schema = {
+    (nullable, avroType.getType) match {
+      case (false, Type.UNION) | (true, Type.UNION) =>
+        // avro uses union to represent nullable type.
+        val fieldTypes = avroType.getTypes.asScala
+
+        // If we're nullable, we need to have at least two types.  Cases with more than two types
+        // are captured in test("read read-write, read-write w/ schema, read") w/ test.avro input
+        assert(fieldTypes.length >= 2)
+
+        val actualType = catalystType match {
+          case NullType => fieldTypes.filter(_.getType == Type.NULL)
+          case BooleanType => fieldTypes.filter(_.getType == Type.BOOLEAN)
+          case ByteType => fieldTypes.filter(_.getType == Type.INT)
+          case BinaryType =>
+            val at = fieldTypes.filter(x => x.getType == Type.BYTES || x.getType == Type.FIXED)
+            if (at.length > 1) {
+              throw new IncompatibleSchemaException(
+                s"Cannot resolve schema of ${catalystType} against union ${avroType.toString}")
+            } else {
+              at
+            }
+          case ShortType | IntegerType => fieldTypes.filter(_.getType == Type.INT)
+          case LongType => fieldTypes.filter(_.getType == Type.LONG)
+          case FloatType => fieldTypes.filter(_.getType == Type.FLOAT)
+          case DoubleType => fieldTypes.filter(_.getType == Type.DOUBLE)
+          case d: DecimalType => fieldTypes.filter(_.getType == Type.STRING)
+          case StringType => fieldTypes
+            .filter(x => x.getType == Type.STRING || x.getType == Type.ENUM)
+          case DateType => fieldTypes.filter(x => x.getType == Type.INT || x.getType == Type.LONG)
+          case TimestampType => fieldTypes.filter(_.getType == Type.LONG)
+          case ArrayType(et, containsNull) =>
+            // Find array that matches the element type specified
+            fieldTypes.filter(x => x.getType == Type.ARRAY
+              && typeMatchesSchema(et, x.getElementType))
+          case st: StructType => // Find the matching record!
+            val recordTypes = fieldTypes.filter(x => x.getType == Type.RECORD)
+            if (recordTypes.length > 1) {
+              throw new IncompatibleSchemaException(
+                "Unions of multiple record types are NOT supported with user-specified schema")
+            }
+            recordTypes
+          case MapType(kt, vt, valueContainsNull) =>
+            // Find the map that matches the value type.  Maps in Avro are always key type string
+            fieldTypes.filter(x => x.getType == Type.MAP && typeMatchesSchema(vt, x.getValueType))
+          case other =>
+            throw new IncompatibleSchemaException(s"Unexpected type: $other")
+        }
+
+        assert(actualType.length == 1)
+        actualType.head
+      case (false, _) | (true, _) => avroType
+    }
+  }
+
+  // Given a Schema and a DataType, do they match?
+  private def typeMatchesSchema(catalystType: DataType, avroSchema: Schema): Boolean = {
+    if (catalystType.isInstanceOf[StructType]) {
+      val avroFields = resolveNullableType(avroSchema, catalystType,
+        avroSchema.getType == Type.UNION)
+        .getFields
+      if (avroFields.size() == catalystType.asInstanceOf[StructType].length) {
+        catalystType.asInstanceOf[StructType].zip(avroFields.asScala).forall {
+          case (f1, f2) => typeMatchesSchema(f1.dataType, f2.schema)

[GitHub] spark issue #21067: [SPARK-23980][K8S] Resilient Spark driver on Kubernetes

2018-07-27 Thread baluchicken
Github user baluchicken commented on the issue:

https://github.com/apache/spark/pull/21067
  
Thanks for the responses, I learned a lot from this:) I am going to close 
this PR for now, and maybe collaborate on the Kubernetes ticket raised by this 
PR. Thanks.


---




[GitHub] spark pull request #21067: [SPARK-23980][K8S] Resilient Spark driver on Kube...

2018-07-27 Thread baluchicken
Github user baluchicken closed the pull request at:

https://github.com/apache/spark/pull/21067


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
@gatorsmile, just for clarification, you mean regressions such as a correctness bug in existing features, right?


---




[GitHub] spark pull request #21891: [SPARK-24931][CORE]CoarseGrainedExecutorBackend s...

2018-07-27 Thread bingbai0912
GitHub user bingbai0912 opened a pull request:

https://github.com/apache/spark/pull/21891

[SPARK-24931][CORE]CoarseGrainedExecutorBackend send wrong 'Reason' w…

## What changes were proposed in this pull request?

When CoarseGrainedExecutorBackend finds that the executor is not available, it should send a "RemoveExecutor" message with an "ExecutorExited" reason instead of a bare "ExecutorLossReason". That way it can tell the driver whether the executor's exit was caused by the app ("exitCausedByApp", which should be false here), so the driver's TaskSetManager can handle the failed task correctly and avoid the task failure count reaching "maxTaskFailures" and finally failing the job.

## How was this patch tested?

tested in my own cluster


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bingbai0912/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21891.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21891


commit 3b3f224d6ac2dc3d3a0c21ed14502329af3cbae8
Author: baibing 
Date:   2018-07-27T07:49:50Z

[SPARK-24931][CORE]CoarseGrainedExecutorBackend send wrong 'Reason' when 
executor exits which leading to job failed.




---




[GitHub] spark issue #21891: [SPARK-24931][CORE]CoarseGrainedExecutorBackend send wro...

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21891
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21891: [SPARK-24931][CORE]CoarseGrainedExecutorBackend send wro...

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21891
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #21891: [SPARK-24931][CORE]CoarseGrainedExecutorBackend send wro...

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21891
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20451: [SPARK-23146][WIP] Support client mode for Kubernetes in...

2018-07-27 Thread echarles
Github user echarles commented on the issue:

https://github.com/apache/spark/pull/20451
  
See #21748


---




[GitHub] spark pull request #20451: [SPARK-23146][WIP] Support client mode for Kubern...

2018-07-27 Thread echarles
Github user echarles closed the pull request at:

https://github.com/apache/spark/pull/20451


---




[GitHub] spark pull request #21802: [SPARK-23928][SQL] Add shuffle collection functio...

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21802#discussion_r205702898
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2382,6 +2382,20 @@ def array_sort(col):
     return Column(sc._jvm.functions.array_sort(_to_java_column(col)))
 
 
+@since(2.4)
+def shuffle(col):
+    """
+    Collection function: Generates a random permutation of the given array.
+
+    .. note:: The function is non-deterministic because its results depends on order of rows which
--- End diff --

typo: `results depends` found while reading this one.


---




[GitHub] spark pull request #21802: [SPARK-23928][SQL] Add shuffle collection functio...

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21802#discussion_r205703459
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -1184,6 +1184,110 @@ case class ArraySort(child: Expression) extends UnaryExpression with ArraySortLike {
   override def prettyName: String = "array_sort"
 }
 
+/**
+ * Returns a random permutation of the given array.
+ */
+@ExpressionDescription(
+  usage = "_FUNC_(array) - Returns a random permutation of the given array.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 20, 3, 5));
+       [3, 1, 5, 20]
+      > SELECT _FUNC_(array(1, 20, null, 3));
+       [20, null, 3, 1]
+  """, since = "2.4.0")
--- End diff --

We could add `note` here too.
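
A hedged sketch of that (assuming `ExpressionDescription`'s `note` field, as other non-deterministic expressions use):

```scala
@ExpressionDescription(
  usage = "_FUNC_(array) - Returns a random permutation of the given array.",
  examples = """
    Examples:
      > SELECT _FUNC_(array(1, 20, 3, 5));
       [3, 1, 5, 20]
  """,
  note = "The function is non-deterministic.",
  since = "2.4.0")
```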


---




[GitHub] spark pull request #21802: [SPARK-23928][SQL] Add shuffle collection functio...

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21802#discussion_r205703558
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -3545,6 +3545,14 @@ object functions {
    */
   def array_max(e: Column): Column = withExpr { ArrayMax(e.expr) }
 
+  /**
+   * Returns a random permutation of the given array.
+   *
+   * @group collection_funcs
+   * @since 2.4.0
--- End diff --

Shall we match the documentation here as well?
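
A hedged sketch of the matching Scaladoc (assuming the PR's `Shuffle` expression):

```scala
  /**
   * Returns a random permutation of the given array.
   *
   * @note The function is non-deterministic.
   *
   * @group collection_funcs
   * @since 2.4.0
   */
  def shuffle(e: Column): Column = withExpr { Shuffle(e.expr) }
```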


---




[GitHub] spark pull request #21802: [SPARK-23928][SQL] Add shuffle collection functio...

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21802#discussion_r205704109
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -2382,6 +2382,20 @@ def array_sort(col):
     return Column(sc._jvm.functions.array_sort(_to_java_column(col)))
 
 
+@since(2.4)
+def shuffle(col):
+    """
+    Collection function: Generates a random permutation of the given array.
+
+    .. note:: The function is non-deterministic because its results depends on order of rows which
+        may be non-deterministic after a shuffle.
+
+    :param col: name of column or expression
--- End diff --

Python doctest looks missing.
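
A hedged sketch of what it could look like (output illustrative only, since the result is non-deterministic):

```python
>>> df = spark.createDataFrame([([1, 20, 3, 5],), ([1, 20, None, 3],)], ['data'])
>>> df.select(shuffle(df.data).alias('s')).collect()  # doctest: +SKIP
[Row(s=[3, 1, 5, 20]), Row(s=[20, None, 3, 1])]
```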


---




[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21802
  
Looks good to me too


---




[GitHub] spark pull request #21826: [SPARK-24872] Replace the symbol '||' of Or opera...

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21826#discussion_r205713034
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/PredicateSuite.scala ---
@@ -455,4 +456,10 @@ class PredicateSuite extends SparkFunSuite with ExpressionEvalHelper {
     interpreted.initialize(0)
     assert(interpreted.eval(new UnsafeRow()))
   }
+
+  test("[SPARK-24872] Replace the symbol '||' of Or operator with 'or'") {
--- End diff --

tiny nit: `[SPARK-24872] ` -> `SPARK-24872: `


---




[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-07-27 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/21631
  
> do we still hit the bug when parsing csv data?

I have checked uniVocity 2.7.2; the problem does not occur in this version.


---




[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21802
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21802
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1398/
Test PASSed.


---




[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21802
  
**[Test build #93661 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93661/testReport)** for PR 21802 at commit [`4135690`](https://github.com/apache/spark/commit/4135690f2cf1eea375a1a4f1697c0ffdb7436627).


---




[GitHub] spark issue #21631: [SPARK-24645][SQL] Skip parsing when csvColumnPruning en...

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21631
  
@MaxGekk, thanks. Mind opening a PR to upgrade?


---




[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.

2018-07-27 Thread skonto
Github user skonto commented on a diff in the pull request:

https://github.com/apache/spark/pull/21748#discussion_r205721972
  
--- Diff: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ClientModeTestsSuite.scala ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.integrationtest
+
+import org.scalatest.concurrent.Eventually
+import scala.collection.JavaConverters._
+
+import org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{k8sTestTag, INTERVAL, TIMEOUT}
+
+trait ClientModeTestsSuite { k8sSuite: KubernetesSuite =>
+
+  test("Run in client mode.", k8sTestTag) {
+    val labels = Map("spark-app-selector" -> driverPodName)
+    val driverPort = 7077
+    val blockManagerPort = 1
+    val driverService = testBackend
+      .getKubernetesClient
+      .services()
+      .inNamespace(kubernetesTestComponents.namespace)
+      .createNew()
+        .withNewMetadata()
+          .withName(s"$driverPodName-svc")
+          .endMetadata()
+        .withNewSpec()
+          .withClusterIP("None")
+          .withSelector(labels.asJava)
+          .addNewPort()
+            .withName("driver-port")
+            .withPort(driverPort)
+            .withNewTargetPort(driverPort)
+            .endPort()
+          .addNewPort()
+            .withName("block-manager")
+            .withPort(blockManagerPort)
+            .withNewTargetPort(blockManagerPort)
+            .endPort()
+          .endSpec()
+        .done()
+    try {
+      val driverPod = testBackend
+        .getKubernetesClient
+        .pods()
+        .inNamespace(kubernetesTestComponents.namespace)
+        .createNew()
+          .withNewMetadata()
+          .withName(driverPodName)
+          .withLabels(labels.asJava)
+          .endMetadata()
+        .withNewSpec()
+          .withServiceAccountName("default")
--- End diff --

@mccheah if people use spark-rbac.yaml this will fail.


---




[GitHub] spark issue #21844: Spark 24873

2018-07-27 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21844
  
@hejiefang can you close this?


---




[GitHub] spark pull request #21103: [SPARK-23915][SQL] Add array_except function

2018-07-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21103#discussion_r205733831
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala ---
@@ -3805,3 +3799,233 @@ object ArrayUnion {
     new GenericArrayData(arrayBuffer)
   }
 }
+
+/**
+ * Returns an array of the elements in array1 but not in array2, without duplicates.
+ */
+@ExpressionDescription(
+  usage = """
+  _FUNC_(array1, array2) - Returns an array of the elements in array1 but not in array2,
+    without duplicates.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_(array(1, 2, 3), array(1, 3, 5));
+       array(2)
+  """,
+  since = "2.4.0")
+case class ArrayExcept(left: Expression, right: Expression) extends ArraySetLike {
--- End diff --

We can overwrite `dataType` while still extending `ComplexTypeMergingExpression` to use the checks.
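
A hedged sketch of that shape (assuming catalyst's `ComplexTypeMergingExpression` trait; not the PR's final code):

```scala
case class ArrayExcept(left: Expression, right: Expression)
  extends BinaryExpression with ComplexTypeMergingExpression {

  // Keep the trait's input-compatibility checks over both children, but pin
  // the result type to the left-hand array instead of the merged type.
  override def dataType: DataType = left.dataType
}
```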


---




[GitHub] spark pull request #21882: [SPARK-24934][SQL] Explicitly whitelist supported...

2018-07-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21882#discussion_r205734835
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -183,6 +183,18 @@ case class InMemoryTableScanExec(
   private val stats = relation.partitionStatistics
   private def statsFor(a: Attribute) = stats.forAttribute(a)
 
+  // Currently, only use statistics from atomic types except binary type only.
+  private object ExtractableLiteral {
+    def unapply(expr: Expression): Option[Literal] = expr match {
+      case lit: Literal => lit.dataType match {
+        case BinaryType => None
--- End diff --

Can we also add a test for the binary type?
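
A hedged sketch of such a test (helper names assumed from the surrounding suite, e.g. `testImplicits` and `functions.lit`):

```scala
test("SPARK-24934: filter on a binary column of a cached table") {
  import testImplicits._
  // Binary literals carry no usable cache stats; the scan should still answer
  // the query correctly rather than fail while evaluating partition filters.
  val df = Seq(Array[Byte](1, 2), Array[Byte](3, 4)).toDF("b").cache()
  val filtered = df.filter($"b" === lit(Array[Byte](1, 2)))
  assert(filtered.count() == 1)
}
```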


---




[GitHub] spark issue #21882: [SPARK-24934][SQL] Explicitly whitelist supported types ...

2018-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21882
  
LGTM


---




[GitHub] spark pull request #21844: Spark 24873

2018-07-27 Thread hejiefang
Github user hejiefang closed the pull request at:

https://github.com/apache/spark/pull/21844


---




[GitHub] spark issue #21816: [SPARK-24794][CORE] Driver launched through rest should ...

2018-07-27 Thread bsikander
Github user bsikander commented on the issue:

https://github.com/apache/spark/pull/21816
  
@vanzin Could you please have a look at this change?


---




[GitHub] spark issue #21844: Spark 24873

2018-07-27 Thread hejiefang
Github user hejiefang commented on the issue:

https://github.com/apache/spark/pull/21844
  
OK. Sorry, I didn't know I could close it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21854: [SPARK-24896][SQL] Uuid should produce different values ...

2018-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21854
  
Regardless of the implementation, is it expected to produce different UUIDs 
for different micro-batches? Personally I think that's reasonable; micro-batch 
and continuous execution should produce the same result.

cc @tdas @zsxwing @jose-torres 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21850: [SPARK-24892] [SQL] Simplify `CaseWhen` to `If` w...

2018-07-27 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21850#discussion_r205737337
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
 ---
@@ -414,6 +414,9 @@ object SimplifyConditionals extends Rule[LogicalPlan] 
with PredicateHelper {
 // these branches can be pruned away
 val (h, t) = branches.span(_._1 != TrueLiteral)
 CaseWhen( h :+ t.head, None)
+
+  case CaseWhen(Seq((cond, trueValue)), elseValue) =>
+If(cond, trueValue, elseValue.getOrElse(Literal(null, 
trueValue.dataType)))
--- End diff --

> optimization rules in If which may not be implemented for CaseWhen case.

shall we just implement more optimizer rules for CASE WHEN to cover all the 
cases?
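
For reference, the rewrite in the diff turns a single-branch CASE WHEN into If, e.g. `CASE WHEN cond THEN v END` becomes `if(cond, v, NULL)`. A quick way to observe it (assuming an active `SparkSession` named `spark`):

```scala
// The single-branch CASE WHEN below should show up as if(...) in the
// optimized plan once the rule applies, with NULL as the else-value.
spark.sql("SELECT CASE WHEN id > 1 THEN 'x' END FROM range(5)")
  .queryExecution.optimizedPlan
```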


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20405: [SPARK-23229][SQL] Dataset.hint should use planWithBarri...

2018-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20405
  
I think this can be removed in favor of 
https://github.com/apache/spark/pull/21822


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21706: [SPARK-24702] Fix Unable to cast to calendar interval in...

2018-07-27 Thread dmateusp
Github user dmateusp commented on the issue:

https://github.com/apache/spark/pull/21706
  
hey @HyukjinKwon thanks for coming back to me on this :)

I'll close the PR now, and start a thread later today on the dev mailing 
list


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21706: [SPARK-24702] Fix Unable to cast to calendar inte...

2018-07-27 Thread dmateusp
Github user dmateusp closed the pull request at:

https://github.com/apache/spark/pull/21706


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21837
  
**[Test build #93659 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93659/testReport)**
 for PR 21837 at commit 
[`5f83902`](https://github.com/apache/spark/commit/5f83902e2876745f8be245681e7cb41d69421778).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21879
  
**[Test build #93660 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93660/testReport)**
 for PR 21879 at commit 
[`93c34da`](https://github.com/apache/spark/commit/93c34da713136eb7b4ed8bb8775353c8219efa22).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21837
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21837: [SPARK-24881][SQL] New Avro option - compression

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21837
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93659/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21879
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21879: [SPARK-24927][BUILD][BRANCH-2.3] The scope of snappy-jav...

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21879
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93660/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21732: [SPARK-24762][SQL] Aggregator should be able to use Opti...

2018-07-27 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21732
  
Again, can we always support `Option[Product]` with some special handling 
for the top-level encoder expression?
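
For context, a sketch of the shape under discussion; whether the top-level `Option[Product]` encoder below is accepted is exactly what this PR is about, so treat it as illustrative rather than working code:

```scala
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

case class Stats(sum: Long, n: Long)

// An aggregator whose output is an Option of a case class.
object SafeAvg extends Aggregator[Long, Stats, Option[Stats]] {
  def zero: Stats = Stats(0L, 0L)
  def reduce(b: Stats, a: Long): Stats = Stats(b.sum + a, b.n + 1)
  def merge(x: Stats, y: Stats): Stats = Stats(x.sum + y.sum, x.n + y.n)
  def finish(r: Stats): Option[Stats] = if (r.n == 0) None else Some(r)
  def bufferEncoder: Encoder[Stats] = ExpressionEncoder()
  def outputEncoder: Encoder[Option[Stats]] = ExpressionEncoder() // top-level Option[Product]
}
```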


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.

2018-07-27 Thread ifilonenko
Github user ifilonenko commented on a diff in the pull request:

https://github.com/apache/spark/pull/21748#discussion_r205752304
  
--- Diff: 
resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/ClientModeTestsSuite.scala
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.deploy.k8s.integrationtest
+
+import org.scalatest.concurrent.Eventually
+import scala.collection.JavaConverters._
+
+import 
org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.{k8sTestTag, 
INTERVAL, TIMEOUT}
+
+trait ClientModeTestsSuite { k8sSuite: KubernetesSuite =>
+
+  test("Run in client mode.", k8sTestTag) {
+val labels = Map("spark-app-selector" -> driverPodName)
+val driverPort = 7077
+val blockManagerPort = 1
+val driverService = testBackend
+  .getKubernetesClient
+  .services()
+  .inNamespace(kubernetesTestComponents.namespace)
+  .createNew()
+.withNewMetadata()
+  .withName(s"$driverPodName-svc")
+  .endMetadata()
+.withNewSpec()
+  .withClusterIP("None")
+  .withSelector(labels.asJava)
+  .addNewPort()
+.withName("driver-port")
+.withPort(driverPort)
+.withNewTargetPort(driverPort)
+.endPort()
+  .addNewPort()
+.withName("block-manager")
+.withPort(blockManagerPort)
+.withNewTargetPort(blockManagerPort)
+.endPort()
+  .endSpec()
+.done()
+try {
+  val driverPod = testBackend
+.getKubernetesClient
+.pods()
+.inNamespace(kubernetesTestComponents.namespace)
+.createNew()
+  .withNewMetadata()
+  .withName(driverPodName)
+  .withLabels(labels.asJava)
+  .endMetadata()
+.withNewSpec()
+  .withServiceAccountName("default")
--- End diff --

+1 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21809: [SPARK-24851][UI] Map a Stage ID to it's Associat...

2018-07-27 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21809#discussion_r205755070
  
--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusStore.scala 
---
@@ -112,10 +112,14 @@ private[spark] class AppStatusStore(
 }
   }
 
-  def stageAttempt(stageId: Int, stageAttemptId: Int, details: Boolean = 
false): v1.StageData = {
+  def stageAttempt(stageId: Int, stageAttemptId: Int,
--- End diff --

Changing the return type to (StageData, jobIds) might be simpler.
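
A rough sketch of that shape (details handling elided; helper and field names are assumptions):

```scala
def stageAttempt(
    stageId: Int,
    stageAttemptId: Int,
    details: Boolean = false): (v1.StageData, Seq[Int]) = {
  val wrapper = store.read(classOf[StageDataWrapper], Array(stageId, stageAttemptId))
  (wrapper.info, wrapper.jobIds.toSeq)
}
```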


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21809: [SPARK-24851][UI] Map a Stage ID to it's Associat...

2018-07-27 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21809#discussion_r205752305
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -105,16 +105,29 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 val stageAttemptId = parameterAttempt.toInt
 
 val stageHeader = s"Details for Stage $stageId (Attempt 
$stageAttemptId)"
-val stageDataWrapper = parent.store.stageAttempt(stageId, 
stageAttemptId, details = false)
-val stageData = parent.store
-  .asOption(stageDataWrapper.info)
-  .getOrElse {
+var stageDataWrapper: StageDataWrapper = null
+try {
+  stageDataWrapper = parent.store.stageAttempt(stageId, 
stageAttemptId, details = false)
+} catch {
+  case e: NoSuchElementException => e.getMessage
+}
+var stageData: StageData = null
+if (stageDataWrapper != null) {
+  stageData = parent.store
+.asOption(stageDataWrapper.info)
+.get
+} else {
+  stageData = {
--- End diff --

this code branch is unreachable.
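
One way to drop both the `var` and the dead branch is to fold the lookup into `asOption`, which this file already uses to turn a `NoSuchElementException` into `None` (a sketch, not tested):

```scala
// Single Option pipeline; the existing fallback that renders the
// "no information" page then hangs off a getOrElse, as before.
val stageData: Option[StageData] = parent.store.asOption(
  parent.store.stageAttempt(stageId, stageAttemptId, details = false).info)
```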


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21809: [SPARK-24851][UI] Map a Stage ID to it's Associat...

2018-07-27 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request:

https://github.com/apache/spark/pull/21809#discussion_r205754677
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -182,6 +198,15 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
   {Utils.bytesToString(stageData.diskBytesSpilled)}
 
   }}
+  {if (!stageJobIds.isEmpty) {
+
+  Associated Job Ids: 
+  {for (jobId <- stageJobIds) yield {val detailUrl = 
"%s/jobs/job/?id=%s".format(
--- End diff --

Using `map` is more readable.
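
Roughly, the yield block rewritten with `map` (markup trimmed; `basePathUri` stands in for whatever base-URL prefix the page already uses):

```scala
{stageJobIds.map { jobId =>
  val detailUrl = "%s/jobs/job/?id=%s".format(basePathUri, jobId)
  <a href={detailUrl}>{jobId}</a>
}}
```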


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
**[Test build #93662 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93662/testReport)**
 for PR 21584 at commit 
[`131f11f`](https://github.com/apache/spark/commit/131f11f8deb96fa7fa4f78522b73e5bbf2b9345e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
**[Test build #93663 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93663/testReport)**
 for PR 21584 at commit 
[`1f0cba5`](https://github.com/apache/spark/commit/1f0cba59b650e4458e9472933068928d52a54777).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1399/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
**[Test build #93662 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93662/testReport)**
 for PR 21584 at commit 
[`131f11f`](https://github.com/apache/spark/commit/131f11f8deb96fa7fa4f78522b73e5bbf2b9345e).
 * This patch **fails to build**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93662/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
**[Test build #93663 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93663/testReport)**
 for PR 21584 at commit 
[`1f0cba5`](https://github.com/apache/spark/commit/1f0cba59b650e4458e9472933068928d52a54777).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait BarrierTaskContext extends TaskContext `
  * `class BarrierTaskInfo(val address: String)`
  * `class RDDBarrier[T: ClassTag](rdd: RDD[T]) `
  * `case class WorkerOffer(`
  * `trait AnalysisHelper extends QueryPlan[LogicalPlan] `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93663/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread ifilonenko
Github user ifilonenko commented on the issue:

https://github.com/apache/spark/pull/21584
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
**[Test build #93664 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93664/testReport)**
 for PR 21584 at commit 
[`1f0cba5`](https://github.com/apache/spark/commit/1f0cba59b650e4458e9472933068928d52a54777).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1400/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21584
  
**[Test build #93664 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93664/testReport)**
 for PR 21584 at commit 
[`1f0cba5`](https://github.com/apache/spark/commit/1f0cba59b650e4458e9472933068928d52a54777).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait BarrierTaskContext extends TaskContext `
  * `class BarrierTaskInfo(val address: String)`
  * `class RDDBarrier[T: ClassTag](rdd: RDD[T]) `
  * `case class WorkerOffer(`
  * `trait AnalysisHelper extends QueryPlan[LogicalPlan] `


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93664/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21584
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21802
  
**[Test build #93661 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93661/testReport)**
 for PR 21802 at commit 
[`4135690`](https://github.com/apache/spark/commit/4135690f2cf1eea375a1a4f1697c0ffdb7436627).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21802
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21802: [SPARK-23928][SQL] Add shuffle collection function.

2018-07-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21802
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93661/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


