[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21739: [SPARK-22187][SS] Update unsaferow format for saved stat...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21739
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21451
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93255/
Test PASSed.


---




[GitHub] spark issue #21739: [SPARK-22187][SS] Update unsaferow format for saved stat...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21739
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93258/
Test PASSed.


---




[GitHub] spark issue #21451: [SPARK-24296][CORE][WIP] Replicate large blocks as a str...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21451
  
**[Test build #93255 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93255/testReport)** for PR 21451 at commit [`335e26d`](https://github.com/apache/spark/commit/335e26d168dc99e7317175da8732ff691ff512f2).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `public class UploadBlockStream extends BlockTransferMessage `


---




[GitHub] spark issue #21739: [SPARK-22187][SS] Update unsaferow format for saved stat...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21739
  
**[Test build #93258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93258/testReport)** for PR 21739 at commit [`c262e87`](https://github.com/apache/spark/commit/c262e87afe8736febcb546827f0af22da14a02d9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21774: [SPARK-24811][SQL]Avro: add new function from_avro and t...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21774
  
Since Spark doesn't have a persistent UDF API like Hive's, I think this
is the best we can do for now. In the future we should migrate this to a UDF
API so that we can register it with a name and use it in SQL.


---




[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21774#discussion_r203607394
  
--- Diff: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroCatalystDataConversionSuite.scala ---
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.avro
+
+import org.apache.avro.Schema
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.{AvroDataToCatalyst, CatalystDataToAvro, RandomDataGenerator}
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions.{ExpressionEvalHelper, GenericInternalRow, Literal}
+import org.apache.spark.sql.catalyst.util.{ArrayBasedMapData, GenericArrayData, MapData}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+class AvroCatalystDataConversionSuite extends SparkFunSuite with ExpressionEvalHelper {
+
+  private def roundTripTest(data: Literal): Unit = {
+    val avroType = SchemaConverters.toAvroType(data.dataType, data.nullable)
+    checkResult(data, avroType, data.eval())
+  }
+
+  private def checkResult(data: Literal, avroType: Schema, expected: Any): Unit = {
+    checkEvaluation(
+      AvroDataToCatalyst(CatalystDataToAvro(data), new SerializableSchema(avroType)),
+      prepareExpectedResult(expected))
+  }
+
+  private def assertFail(data: Literal, avroType: Schema): Unit = {
+    intercept[java.io.EOFException] {
+      AvroDataToCatalyst(CatalystDataToAvro(data), new SerializableSchema(avroType)).eval()
+    }
+  }
+
+  private val testingTypes = Seq(
+    BooleanType,
+    ByteType,
+    ShortType,
+    IntegerType,
+    LongType,
+    FloatType,
+    DoubleType,
+    DecimalType(8, 0),   // 32 bits decimal without fraction
+    DecimalType(8, 4),   // 32 bits decimal
+    DecimalType(16, 0),  // 64 bits decimal without fraction
+    DecimalType(16, 11), // 64 bits decimal
+    DecimalType(38, 0),
+    DecimalType(38, 38),
+    StringType,
+    BinaryType)
+
+  protected def prepareExpectedResult(expected: Any): Any = expected match {
+    // Spark decimal is converted to avro string=
+    case d: Decimal => UTF8String.fromString(d.toString)
+    // Spark byte and short both map to avro int
+    case b: Byte => b.toInt
+    case s: Short => s.toInt
+    case row: GenericInternalRow => InternalRow.fromSeq(row.values.map(prepareExpectedResult))
+    case array: GenericArrayData => new GenericArrayData(array.array.map(prepareExpectedResult))
+    case map: MapData =>
+      val keys = new GenericArrayData(
+        map.keyArray().asInstanceOf[GenericArrayData].array.map(prepareExpectedResult))
+      val values = new GenericArrayData(
+        map.valueArray().asInstanceOf[GenericArrayData].array.map(prepareExpectedResult))
+      new ArrayBasedMapData(keys, values)
+    case other => other
+  }
+
+  testingTypes.foreach { dt =>
+    val seed = scala.util.Random.nextLong()
+    test(s"single $dt with seed $seed") {
+      val rand = new scala.util.Random(seed)
+      val data = RandomDataGenerator.forType(dt, rand = rand).get.apply()
+      val converter = CatalystTypeConverters.createToCatalystConverter(dt)
+      val input = Literal.create(converter(data), dt)
+      roundTripTest(input)
+    }
+  }
+
+  for (_ <- 1 to 5) {
+    val seed = scala.util.Random.nextLong()
+    val rand = new scala.util.Random(seed)
+    val schema = RandomDataGenerator.randomSchema(rand, 5, testingTypes)
+    test(s"flat schema ${schema.catalogString} with seed $seed") {
+      val data = RandomDataGenerator.randomRow(rand, schema)
+      val converter = CatalystTypeConverters.createToCatalystConverter(schema)
+      val input = Literal.create(converter(data), schema)
+  

[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21774#discussion_r203606947
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/package.scala ---
@@ -36,4 +40,27 @@ package object avro {
     @scala.annotation.varargs
     def avro(sources: String*): DataFrame = reader.format("avro").load(sources: _*)
   }
+
--- End diff --

Because the Avro data source is an external package, like the Kafka data
source, it's not available in `org.apache.spark.sql.functions`.


---




[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21774#discussion_r203606818
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/CatalystDataToAvro.scala ---
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.io.ByteArrayOutputStream
+
+import org.apache.avro.generic.GenericDatumWriter
+import org.apache.avro.io.{BinaryEncoder, EncoderFactory}
+
+import org.apache.spark.sql.avro.{AvroSerializer, SchemaConverters}
+import org.apache.spark.sql.catalyst.expressions.{Expression, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
+import org.apache.spark.sql.types.{BinaryType, DataType}
+
+case class CatalystDataToAvro(child: Expression) extends UnaryExpression with CodegenFallback {
+
+  override lazy val dataType: DataType = BinaryType
--- End diff --

+1


---




[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21774#discussion_r203606783
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.avro.generic.GenericDatumReader
+import org.apache.avro.io.{BinaryDecoder, DecoderFactory}
+
+import org.apache.spark.sql.avro.{AvroDeserializer, SchemaConverters, SerializableSchema}
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType}
+
+case class AvroDataToCatalyst(child: Expression, avroType: SerializableSchema)
+  extends UnaryExpression with CodegenFallback with ExpectsInputTypes {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(BinaryType)
+
+  override lazy val dataType: DataType =
--- End diff --

The `dataType` is needed on the executor side to build the `AvroDeserializer`,
so it's better to serialize it than to recompute it on the executor side.


---




[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21774#discussion_r203606677
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.avro.generic.GenericDatumReader
+import org.apache.avro.io.{BinaryDecoder, DecoderFactory}
+
+import org.apache.spark.sql.avro.{AvroDeserializer, SchemaConverters, SerializableSchema}
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType}
+
+case class AvroDataToCatalyst(child: Expression, avroType: SerializableSchema)
+  extends UnaryExpression with CodegenFallback with ExpectsInputTypes {
--- End diff --

good point. Since the implementation is short, I think it should be easy to 
codegen it.


---




[GitHub] spark pull request #21774: [SPARK-24811][SQL]Avro: add new function from_avr...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21774#discussion_r203606284
  
--- Diff: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDataToCatalyst.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.avro.generic.GenericDatumReader
+import org.apache.avro.io.{BinaryDecoder, DecoderFactory}
+
+import org.apache.spark.sql.avro.{AvroDeserializer, SchemaConverters, SerializableSchema}
+import org.apache.spark.sql.catalyst.expressions.{ExpectsInputTypes, Expression, UnaryExpression}
+import org.apache.spark.sql.catalyst.expressions.codegen.CodegenFallback
+import org.apache.spark.sql.types.{AbstractDataType, BinaryType, DataType}
+
--- End diff --

This is not a function expression like the ones in SQL core, so 
`ExpressionDescription` can't apply here.  I think we can leave it for now.


---




[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20838
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20838
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93253/
Test PASSed.


---




[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20838
  
**[Test build #93253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93253/testReport)** for PR 20838 at commit [`2c4f15c`](https://github.com/apache/spark/commit/2c4f15c13efa8b181c8c53bd9a90f4f578a40169).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21700
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21700
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93256/
Test PASSed.


---




[GitHub] spark issue #21700: [SPARK-24717][SS] Split out max retain version of state ...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21700
  
**[Test build #93256 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93256/testReport)** for PR 21700 at commit [`cf78a2a`](https://github.com/apache/spark/commit/cf78a2a25791a683c0ee36b08bdc79edd54f212a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21782: [SPARK-24816][SQL] SQL interface support repartit...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21782#discussion_r203604973
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/FilterPushdownBenchmark.scala ---
@@ -394,6 +394,41 @@ class FilterPushdownBenchmark extends SparkFunSuite with BenchmarkBeforeAndAfter
   }
 }
   }
+
+  ignore("Pushdown benchmark for RANGE PARTITION BY/DISTRIBUTE BY") {
--- End diff --

how is this related to pushdown?


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21469
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93257/
Test PASSed.


---




[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21469
  
**[Test build #93257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93257/testReport)** for PR 21469 at commit [`32d0418`](https://github.com/apache/spark/commit/32d041878b7dcd20794250853063dab4bac09118).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21813: [SPARK 24424][SQL] Support ANSI-SQL compliant syntax for...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21813
  
**[Test build #93260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93260/testReport)** for PR 21813 at commit [`b5ada3f`](https://github.com/apache/spark/commit/b5ada3feb7d243859714c04ec4fb8c225c1781e0).


---




[GitHub] spark issue #21813: [SPARK 24424][SQL] Support ANSI-SQL compliant syntax for...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21813
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified//
Test PASSed.


---




[GitHub] spark issue #21813: [SPARK 24424][SQL] Support ANSI-SQL compliant syntax for...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21813
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21813: [SPARK 24424] Support ANSI-SQL compliant syntax f...

2018-07-18 Thread dilipbiswal
GitHub user dilipbiswal opened a pull request:

https://github.com/apache/spark/pull/21813

[SPARK 24424] Support ANSI-SQL compliant syntax for GROUPING SET

## What changes were proposed in this pull request?

Enhances the parser and analyzer to support ANSI-compliant syntax for
GROUPING SET. As part of this change, we derive the grouping expressions from
the user-supplied groupings in the grouping sets clause.

```SQL
SELECT c1, c2, max(c3) 
FROM t1
GROUP BY GROUPING SETS ((c1), (c1, c2))
```
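
To make the intended semantics concrete, here is a minimal sketch in plain Scala collections (no Spark involved). It assumes the usual reading of `GROUPING SETS ((c1), (c1, c2))` as the union of one aggregation per grouping set, with columns missing from a set reported as NULL. The `T1Row` type, the sample rows, and all function names are hypothetical and only serve the illustration.

```scala
// Hypothetical rows of t1(c1, c2, c3); the data is invented for illustration.
final case class T1Row(c1: String, c2: String, c3: Int)

object GroupingSetsSketch {
  val t1: Seq[T1Row] = Seq(
    T1Row("a", "x", 1),
    T1Row("a", "y", 5),
    T1Row("b", "x", 3)
  )

  // One output row: (c1, c2 or None when c2 is not in the grouping set, max(c3)).
  type Out = (String, Option[String], Int)

  // Grouping set (c1): c2 is not grouped on, so it is reported as NULL (None).
  def groupByC1(rows: Seq[T1Row]): Seq[Out] =
    rows.groupBy(_.c1).toSeq.map { case (c1, rs) =>
      (c1, Option.empty[String], rs.map(_.c3).max)
    }

  // Grouping set (c1, c2).
  def groupByC1C2(rows: Seq[T1Row]): Seq[Out] =
    rows.groupBy(r => (r.c1, r.c2)).toSeq.map { case ((c1, c2), rs) =>
      (c1, Option(c2), rs.map(_.c3).max)
    }

  // GROUPING SETS ((c1), (c1, c2)) modeled as the union of both aggregations.
  def groupingSets(rows: Seq[T1Row]): Set[Out] =
    (groupByC1(rows) ++ groupByC1C2(rows)).toSet
}
```

The grouping expressions here (`c1`, `c2`) are derived from the grouping sets themselves, mirroring the query above, which has no separate list of grouping columns after `GROUP BY`.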


## How was this patch tested?
Added tests in SQLQueryTestSuite and ResolveGroupingAnalyticsSuite.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dilipbiswal/spark spark-24424

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21813.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21813


commit b5ada3feb7d243859714c04ec4fb8c225c1781e0
Author: Dilip Biswal 
Date:   2018-07-19T05:12:33Z

[SPARK-24424] Support ANSI-SQL compliant syntax for GROUPING SET




---




[GitHub] spark issue #21732: [SPARK-24762][SQL] Aggregator should be able to use Opti...

2018-07-18 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/21732
  
ping @cloud-fan 


---




[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-07-18 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21698
  
> > checkpoint can not guarantee that you shall always get the same output 
...
> 
> IIRC we can checkpoint to HDFS? Then it becomes reliable.

Sure, thanks for clarifying that.


---




[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-18 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21758
  
@mridulm Sorry, I missed that message. I've now updated the comment, so we
can continue the discussion on that thread.


---




[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21698
  
> checkpoint can not guarantee that you shall always get the same output ...

IIRC we can checkpoint to HDFS? Then it becomes reliable.


---




[GitHub] spark issue #20856: [SPARK-23731][SQL] FileSourceScanExec throws NullPointer...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20856
  
@HyukjinKwon good analysis!

Currently Spark is a little messy about what should be serialized and sent
to executors. Sometimes we send an entire query tree but only read a few
properties of it.

It seems to me it would be better to always do codegen on the driver side, to
avoid complex expression/plan operations on the executor side (not sure if
it's possible, cc @viirya @rednaxelafx @kiszk).

For this particular problem, I think we can just change these `val`s to
`lazy val` or `def` in `FileSourceScanExec`, with a unit test.
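
As a rough illustration of why `lazy val` helps, the sketch below uses hypothetical stand-in classes, not the actual `FileSourceScanExec` code. It assumes the failure mode is an eagerly initialized `val` that dereferences a field which is `null` when the node is instantiated on an executor; the comment above does not spell out the mechanism.

```scala
import scala.util.Try

// Hypothetical stand-in for driver-only state that is not shipped to executors.
class DriverOnlyRelation(val name: String)

// Eager version: `description` is evaluated in the constructor,
// so it dereferences `relation` as soon as the node is created.
class EagerScan(@transient val relation: DriverOnlyRelation) extends Serializable {
  val description: String = relation.name.toUpperCase
}

// Lazy version: `description` is only evaluated on first access.
class LazyScan(@transient val relation: DriverOnlyRelation) extends Serializable {
  lazy val description: String = relation.name.toUpperCase
}

object LazyValSketch {
  // Simulate creating the node where the driver-only field is unavailable (null).
  val eagerOnExecutor: Try[EagerScan] = Try(new EagerScan(null))
  val lazyOnExecutor: Try[LazyScan] = Try(new LazyScan(null))
}
```

In this sketch, constructing `EagerScan` with a `null` relation fails immediately with a `NullPointerException`, while `LazyScan` can be constructed and only fails if `description` is actually read. A `def` behaves the same way at construction time but recomputes the value on every call.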


---




[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-07-18 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21698
  
> I see some discussion about making shuffles deterministic, but it proved 
to be very difficult. Is there a prior discussion on this you can point me to? 
Is it that even if you used fetch-to-disk and had the shuffle-fetch side read 
the map-output in a set order, you'd still have random variations in spills?

Related discussion can be found at https://github.com/apache/spark/pull/20414 .
Also, let me list some of the scenarios that might generate non-deterministic
row ordering:

**Random Shuffle Blocks Fetch**
We randomize the ordering of block fetches on purpose, to avoid all the
nodes fetching from the same node at the same time. That means we send out
multiple concurrent fetch requests, and the fetched blocks are processed in
FIFO order. Therefore, the row ordering of the shuffle output is
non-deterministic.

**Shuffle Merge With Spills**
A shuffle that uses an Aggregator (for instance, combineByKey) uses
ExternalAppendOnlyMap to combine the values. ExternalAppendOnlyMap claims
that it keeps the row order, but it actually uses the hash to compare the
elements (i.e., HashComparator). Even though the sort algorithm is stable, the
map sizes can differ when spilling happens, and the requests for additional
memory might arrive in different orders. Spilling can therefore be
non-deterministic, and so the resulting order can still be non-deterministic.

**Read From External Data Source**
Some external data sources might return rows in a different order on
different read requests.

> since we only need to do this sort on RDDs post shuffle

IIUC this is not the case in RDD.repartition() (see
https://github.com/apache/spark/blob/94c67a76ec1fda908a671a47a2a1fa63b3ab1b06/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L453~L461
): it requires the input rows to be ordered and then performs a round-robin
style data transformation, so I don't see what we can do if the input data
type is not sortable.
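
The dependence on input order can be shown with a small sketch in plain Scala. This is a simplified model of round-robin assignment, not the code in `RDD.repartition()`; the function name and the sample data are made up for illustration.

```scala
object RoundRobinSketch {
  // Assign each row to a partition purely by its position in the input.
  def roundRobin[T](rows: Seq[T], numPartitions: Int): Map[Int, Seq[T]] =
    rows.zipWithIndex
      .groupBy { case (_, index) => index % numPartitions }
      .map { case (partition, pairs) => partition -> pairs.map(_._1) }

  // The same rows arriving in two different orders, e.g. on a task retry.
  val firstAttempt: Seq[Int] = Seq(3, 1, 2, 4)
  val retryAttempt: Seq[Int] = Seq(1, 3, 4, 2)
}
```

With two partitions, the two attempts place the rows differently, so a retried task can produce partitions that disagree with those already consumed downstream. Sorting the rows first makes both attempts yield the same assignment, which is why the approach needs a sortable input type.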

> on a fetch-failure in repartition, fail the entire job

Currently I can't think of a case where a user would vote for this
behavior change, especially since FetchFailure tends to occur more often on
long-running jobs over big datasets than on interactive queries.

> We could add logic to detect whether even an order-dependent operation 
was safe to retry -- eg. repartition just after reading from hdfs or after a 
checkpoint can be done as it is now. Each stage would need to know this based 
on extra properties of all the RDDs it was defined from.

This is something I'm also trying to figure out: we should enable users
to tell Spark that an RDD will generate deterministic output, so that they
don't have to worry about data correctness issues with those RDDs. Please also
note that a checkpoint actually cannot guarantee that you always get the same
output on each read operation, because an executor may be lost, and then the
partitions have to be recomputed and may fetch different data.

> Honestly I consider this bug so serious I'd consider loudly warning from 
every api which suffers from this if we can't fix -- make them deprecated and 
log a warning.

We should definitely update the comments, but should we deprecate the
APIs? I can't say whether I agree or disagree with this. I'm still trying to
extend the current approach to ensure data correctness, and the code changes
will be guarded by a flag. Maybe we can revisit the proposal to deprecate the
APIs after that?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21795: [SPARK-24840][SQL] do not use dummy filter to swi...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21795


---




[GitHub] spark issue #21795: [SPARK-24840][SQL] do not use dummy filter to switch cod...

2018-07-18 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21795
  
thanks, merging to master!


---




[GitHub] spark pull request #21635: [SPARK-24594][YARN] Introducing metrics for YARN

2018-07-18 Thread attilapiros
Github user attilapiros commented on a diff in the pull request:

https://github.com/apache/spark/pull/21635#discussion_r203594956
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMasterSource.scala
 ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy.yarn
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.metrics.source.Source
+
+private[spark] class ApplicationMasterSource(yarnAllocator: YarnAllocator) extends Source {
+
+  override val sourceName: String = "applicationMaster"
--- End diff --

I like the idea of making the metric names more app-specific, so I will 
prepend the app ID to the `sourceName` and rerun my test.
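
A minimal sketch of what prepending the app ID could look like. The `Source` trait here is a local stand-in reduced to the one field under discussion, and the `appId` constructor parameter is an assumption; the PR may obtain the ID differently.

```scala
object MetricNameSketch {
  // Stand-in for org.apache.spark.metrics.source.Source, reduced to one member.
  trait Source { def sourceName: String }

  // `appId` is a hypothetical parameter used only for illustration.
  final class ApplicationMasterSource(appId: String) extends Source {
    override val sourceName: String = s"$appId.applicationMaster"
  }
}
```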


---




[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...

2018-07-18 Thread markhamstra
Github user markhamstra commented on the issue:

https://github.com/apache/spark/pull/21589
  
Thank you, @HyukjinKwon 

There are a significant number of Spark users who use the Job Scheduler 
model with a SparkContext shared across many users and many Jobs. Promoting 
tools and patterns based upon the number of cores or executors that a 
SparkContext has access to, encouraging users to create Jobs that try to use 
all of the available cores, very much leads those users in the wrong direction.

As much as possible, the public API should target policy that addresses 
real user problems (all users, not just a subset), and avoid targeting the 
particulars of Spark's internal implementation. A `repartition` that is 
extended to support policy or goal declarations (things along the lines of 
`repartition(availableCores)`, `repartition(availableDataNodes)`, 
`repartition(availableExecutors)`, `repartition(unreservedCores)`, etc.), 
relying upon Spark's internals (with its complete knowledge of the total number 
of cores and executors, scheduling pool shares, number of reserved Task nodes 
sought in barrier scheduling, number of active Jobs, Stages, Tasks and 
Sessions, etc.) may be something that I can get behind. Exposing a couple of 
current Spark scheduler implementation details in the expectation that some 
subset of users in some subset of use cases will be able to make correct use of 
them is not. 
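
As a rough illustration of the goal-based style described above, the sketch below resolves a declared goal to a partition count from a scheduler snapshot. None of these names exist in Spark: `RepartitionGoal`, `SchedulerSnapshot`, and `resolve` are hypothetical, and the formulas are placeholders for whatever policy the scheduler would actually apply.

```scala
object RepartitionGoalSketch {
  // Hypothetical goals a caller could declare instead of a raw number.
  sealed trait RepartitionGoal
  case object AvailableCores extends RepartitionGoal
  case object AvailableExecutors extends RepartitionGoal
  case object UnreservedCores extends RepartitionGoal

  // Hypothetical view of the state that only the scheduler knows.
  final case class SchedulerSnapshot(
      totalCores: Int,
      executors: Int,
      reservedCores: Int,
      poolShare: Double) // fraction of cores this job's pool is entitled to

  // The caller states intent; the resolution stays inside the scheduler.
  def resolve(goal: RepartitionGoal, s: SchedulerSnapshot): Int = {
    val n = goal match {
      case AvailableCores     => (s.totalCores * s.poolShare).toInt
      case AvailableExecutors => s.executors
      case UnreservedCores    => s.totalCores - s.reservedCores
    }
    math.max(1, n)
  }
}
```

The point of the design is that user code never reads the raw core or executor count, so a shared SparkContext can change how it answers without breaking callers.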


---




[GitHub] spark pull request #21638: [SPARK-22357][CORE] SparkContext.binaryFiles igno...

2018-07-18 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/21638#discussion_r203589083
  
--- Diff: 
core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
   def setMinPartitions(sc: SparkContext, context: JobContext, minPartitions: Int) {
     val defaultMaxSplitBytes = sc.getConf.get(config.FILES_MAX_PARTITION_BYTES)
     val openCostInBytes = sc.getConf.get(config.FILES_OPEN_COST_IN_BYTES)
-    val defaultParallelism = sc.defaultParallelism
+    val defaultParallelism = Math.max(sc.defaultParallelism, minPartitions)
--- End diff --

I mentioned `BinaryFileRDD`, not this method; you can check the code to see 
how it handles the default value.
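
For readers following the diff, the sketch below works through how a larger parallelism value could lower the maximum split size. Only the three values visible in the diff come from the source; the `totalBytes`, `bytesPerCore`, and final `min`/`max` steps are an assumption about the rest of the method, and `SplitSizeSketch` is a standalone stand-in that does not call Spark.

```scala
object SplitSizeSketch {
  // Assumed continuation of setMinPartitions, expressed as a pure function.
  def maxSplitSize(
      fileSizes: Seq[Long],
      defaultMaxSplitBytes: Long,
      openCostInBytes: Long,
      defaultParallelism: Int,
      minPartitions: Int): Long = {
    // The change in the diff: never use less parallelism than requested.
    val parallelism = math.max(defaultParallelism, minPartitions)
    val totalBytes = fileSizes.map(_ + openCostInBytes).sum
    val bytesPerCore = totalBytes / parallelism
    math.min(defaultMaxSplitBytes, math.max(openCostInBytes, bytesPerCore))
  }

  def main(args: Array[String]): Unit = {
    val MiB = 1024L * 1024L
    val files = Seq.fill(8)(60 * MiB) // eight 60 MiB files, 4 MiB open cost each
    // With minPartitions = 1 the split size is capped by the 128 MiB default.
    println(maxSplitSize(files, 128 * MiB, 4 * MiB, defaultParallelism = 2, minPartitions = 1) / MiB)
    // With minPartitions = 16 the split size drops, so more splits are produced.
    println(maxSplitSize(files, 128 * MiB, 4 * MiB, defaultParallelism = 2, minPartitions = 16) / MiB)
  }
}
```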


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-07-18 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21533
  
Please also update the title and PR description, because the proposed 
solution changed partway through the review.


---




[GitHub] spark pull request #21455: [SPARK-24093][DStream][Minor]Make some fields of ...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21455


---




[GitHub] spark pull request #21812: SPARK UI K8S : this parameter's illustration(spar...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21812


---




[GitHub] spark issue #21777: [WIP][SPARK-24498][SQL] Add JDK compiler for runtime cod...

2018-07-18 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/21777
  
btw, it seems this PR exceeds the current timeout... Is there any way to 
temporarily make the timeout longer? Do we always need to configure the timeout 
on the Jenkins side, as in 
https://github.com/apache/spark/pull/20222#issuecomment-357004091?


---




[GitHub] spark pull request #18477: [SPARK-21261][DOCS]SQL Regex document fix

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18477


---




[GitHub] spark pull request #21781: [INFRA] Close stale PR

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21781


---




[GitHub] spark pull request #21787: [SPARK-24568] Code refactoring for DataType equal...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21787


---




[GitHub] spark pull request #21095: [SPARK-23529][K8s] Support mounting hostPath volu...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21095


---




[GitHub] spark pull request #19233: [Spark-22008][Streaming]Spark Streaming Dynamic A...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19233


---




[GitHub] spark pull request #21240: [SPARK-21274][SQL] Add a new generator function r...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21240


---




[GitHub] spark pull request #12904: [SPARK-15125][SQL] Changing CSV data source mappi...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12904


---




[GitHub] spark pull request #20100: [SPARK-22913][SQL] Improved Hive Partition Prunin...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20100


---




[GitHub] spark pull request #16910: [SPARK-19575][SQL]Reading from or writing to a hi...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16910


---




[GitHub] spark pull request #21731: Update example to work locally

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21731


---




[GitHub] spark pull request #21453: Test branch to see how Scala 2.11.12 performs

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21453


---




[GitHub] spark pull request #12951: [SPARK-15176][Core] Add maxShares setting to Pool...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12951


---




[GitHub] spark pull request #18918: [SPARK-21707][SQL]Improvement a special case for ...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18918


---




[GitHub] spark pull request #13143: [SPARK-15359] [Mesos] Mesos dispatcher should han...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13143


---




[GitHub] spark pull request #21726: Branch 2.3

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21726


---




[GitHub] spark pull request #17422: [SPARK-20087] Attach accumulators / metrics to 'T...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17422


---




[GitHub] spark pull request #19510: [SPARK-22292][Mesos] Added spark.mem.max support ...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19510


---




[GitHub] spark pull request #18268: [SPARK-21054] [SQL] Reset Command support reset s...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18268


---




[GitHub] spark pull request #20437: [SPARK-23270][Streaming][WEB-UI]FileInputDStream ...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20437


---




[GitHub] spark pull request #18034: [SPARK-20797][MLLIB]fix LocalLDAModel.save() bug.

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18034


---




[GitHub] spark pull request #20319: [SPARK-22884][ML][TESTS] ML test for StructuredSt...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20319


---




[GitHub] spark pull request #20543: [SPARK-23357][CORE] 'SHOW TABLE EXTENDED LIKE pat...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20543


---




[GitHub] spark pull request #19758: [SPARK-3162][MLlib] Local Tree Training Pt 1: Ref...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19758


---




[GitHub] spark pull request #19274: [SPARK-22056][Streaming] Add subconcurrency for K...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19274


---




[GitHub] spark pull request #17973: [SPARK-20731][SQL] Add ability to change or omit ...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17973


---




[GitHub] spark pull request #18229: [SPARK-20691][CORE] Difference between Storage Me...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18229


---




[GitHub] spark pull request #20304: [SPARK-23139]Read eventLog file with mixed encodi...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20304


---




[GitHub] spark pull request #21261: [SPARK-24203][core] Add spark.executor.bindAddres...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21261


---




[GitHub] spark pull request #20177: [SPARK-22954][SQL] Fix the exception thrown by An...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20177


---




[GitHub] spark pull request #17894: [WIP][SPARK-17134][ML] Use level 2 BLAS operation...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17894


---




[GitHub] spark pull request #19420: [SPARK-22191] [SQL] Add hive serde example with s...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19420


---




[GitHub] spark pull request #18125: [SPARK-20891][SQL] Reduce duplicate code typedagg...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18125


---




[GitHub] spark pull request #19456: [SPARK] [Scheduler] Configurable default scheduli...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19456


---




[GitHub] spark pull request #17092: [SPARK-18450][ML] Scala API Change for LSH AND-am...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17092


---




[GitHub] spark pull request #17619: [SPARK-19755][Mesos] Blacklist is always active f...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17619


---




[GitHub] spark pull request #14653: [SPARK-10931][PYSPARK][ML] PySpark ML Models shou...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14653


---




[GitHub] spark pull request #20090: [SPARK-22907]Clean broadcast garbage when IOExcep...

2018-07-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20090


---




[GitHub] spark issue #21781: [INFRA] Close stale PR

2018-07-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21781
  
Merged to master.


---




[GitHub] spark issue #21812: SPARK UI K8S : this parameter's illustration(spark.kuber...

2018-07-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21812
  
@hehuiyuan, please ask a question via a mailing list. See also 
https://spark.apache.org/community.html


---




[GitHub] spark issue #11005: [SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBC...

2018-07-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/11005
  
BTW, DataSource V2 is also in progress and will allow more pushdowns (see 
[SPARK-22386](https://issues.apache.org/jira/browse/SPARK-22386)). You might 
want to take a look.


---




[GitHub] spark issue #11005: [SPARK-12506][SPARK-12126][SQL]use CatalystScan for JDBC...

2018-07-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/11005
  
Technical reason: it's kind of risky to rely on `CatalystScan` and 
completely replace the interface. I think I already see that some tests were 
disabled here. Also, there appear to be potentially better suggestions above.

Practical reason: there are too many pending PRs, as you can see. If the author 
is not responsive and the PR is inactive in response to review comments, we had 
better leave it closed for now; it seems to be already stuck for a few technical 
reasons. The author is welcome to reopen it, and other contributors are welcome 
to take over.



---




[GitHub] spark pull request #21804: [SPARK-24268][SQL] Use datatype.catalogString in ...

2018-07-18 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21804#discussion_r203585703
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala 
---
@@ -145,7 +145,7 @@ abstract class NumericType extends AtomicType {
 }
 
 
-private[sql] object NumericType extends AbstractDataType {
+private[spark] object NumericType extends AbstractDataType {
--- End diff --

(This is just a question...) Is it OK for some types to be 
`private[spark]` and others to be `private[sql]`? That policy feels a little 
inconsistent to me. Since the other components (e.g., `ml` and `mllib`) depend 
on the `sql` type system, would it be bad to consistently make the modifiers on 
all of these types `private[spark]`?
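
The difference between the two qualifiers can be sketched in a single file. Nested objects stand in for the `org.apache.spark`, `sql`, and `ml` packages here (Scala's qualified `private[X]` accepts an enclosing object as well as a package), and the type names are made up for illustration.

```scala
object spark {
  object sql {
    // Visible only inside `sql`.
    private[sql] object SqlOnlyType { val name = "sql-only" }
    // Visible anywhere inside `spark`, including `ml`.
    private[spark] object SparkWideType { val name = "spark-wide" }

    val fromSql: Seq[String] = Seq(SqlOnlyType.name, SparkWideType.name)
  }

  object ml {
    val fromMl: String = sql.SparkWideType.name
    // val broken = sql.SqlOnlyType.name // would not compile: not accessible from `ml`
  }
}
```

This is why widening `NumericType` to `private[spark]` lets `ml` code reference it, and why a mix of the two qualifiers means sibling components can see some types but not others.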


---




[GitHub] spark issue #21782: [SPARK-24816][SQL] SQL interface support repartitionByRa...

2018-07-18 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/21782
  
cc @cloud-fan 


---




[GitHub] spark issue #21638: [SPARK-22357][CORE] SparkContext.binaryFiles ignore minP...

2018-07-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21638
  
Yea, it's internal to Spark. Might be good to keep it but that concern 
should be secondary IMHO.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-07-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20146
  
Meanwhile, I will try to take another look at reducing the time, or see if 
we can split the tests; as a last resort, we can request another increase of 
the time limit.


---




[GitHub] spark issue #20146: [SPARK-11215][ML] Add multiple columns support to String...

2018-07-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20146
  
Yea, that looks like it is due to the time limit. I sometimes see builds hit 
the time limit.

The problem is that it's difficult to increase the build time limit (see also 
https://github.com/appveyor/ci/issues/517). I already increased it from 1 to 
1.5 hours before, and it looks like they encourage splitting the tests when a 
build hits the time limit, which is quite difficult in our case because most of 
the time is spent on the build itself.

It probably wouldn't happen often in the master branch build, because 
caching is allowed there, but the cache does not work in the PR builder.

Usually I try to reduce the time the SparkR tests take (for instance, 
https://github.com/apache/spark/pull/19816).

From my observation of the AppVeyor build history so far 
(https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/history), I 
thought we had 20ish minutes left. So my rough guess is that the time limit 
issue is a temporary one in AppVeyor. Can you close and reopen this PR to 
retrigger the AppVeyor build?


---




[GitHub] spark issue #21799: [SPARK-24852][ML] Update spark.ml to use Instrumentation...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21799
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21799: [SPARK-24852][ML] Update spark.ml to use Instrumentation...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21799
  
**[Test build #93254 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93254/testReport)** for PR 21799 at commit [`dddccf6`](https://github.com/apache/spark/commit/dddccf6090413867c1be5e8714acd5e463d0970a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21799: [SPARK-24852][ML] Update spark.ml to use Instrumentation...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21799
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93254/
Test PASSed.


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-07-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21533
  
**[Test build #93259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93259/testReport)** for PR 21533 at commit [`eb46ccf`](https://github.com/apache/spark/commit/eb46ccfec084c2439a26eee38015381f091fe164).


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21533
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21533
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1110/
Test PASSed.


---




[GitHub] spark pull request #21734: [SPARK-24149][YARN][FOLLOW-UP] Add a config to co...

2018-07-18 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21734#discussion_r203584220
  
--- Diff: 
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala
 ---
@@ -193,8 +193,7 @@ object YarnSparkHadoopUtil {
   sparkConf: SparkConf,
   hadoopConf: Configuration): Set[FileSystem] = {
 val filesystemsToAccess = sparkConf.get(FILESYSTEMS_TO_ACCESS)
-  .map(new Path(_).getFileSystem(hadoopConf))
-  .toSet
+val isRequestAllDelegationTokens = filesystemsToAccess.isEmpty
--- End diff --

@wangyum spark.yarn.access.hadoopFileSystems could be set with HA. 
For example:

`--conf spark.yarn.access.namenodes=hdfs://cluster1-ha,hdfs://cluster2-ha`
in hdfs-site.xml
`<property>`
`<name>dfs.nameservices</name>`
`<value>cluster1-ha,cluster2-ha</value>`
`</property>`
`<property>`
`<name>dfs.ha.namenodes.cluster1-ha</name>`
`<value>nn1,nn2</value>`
`</property>`
`<property>`
`<name>dfs.ha.namenodes.cluster2-ha</name>`
`<value>nn1,nn2</value>`
`</property>`



---




[GitHub] spark issue #21812: SPARK UI K8S : this parameter's illustration(spark.kuber...

2018-07-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21812
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #21803: [SPARK-24849][SQL] Converting a value of StructTy...

2018-07-18 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/21803#discussion_r203584039
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -436,6 +436,14 @@ object StructType extends AbstractDataType {
*/
   def fromDDL(ddl: String): StructType = CatalystSqlParser.parseTableSchema(ddl)
 
+  /**
+   * Converts a value of StructType to a string in DDL format. For example:
+   * `StructType(Seq(StructField("a", IntegerType)))` should be converted to `a int`
+   */
+  def toDDL(struct: StructType): String = {
+    struct.map(field => s"${quoteIdentifier(field.name)} ${field.dataType.sql}").mkString(",")
--- End diff --

Can this also handle special characters ('\n', '\t', '\', ...) that need 
escaping?
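
To make the question concrete, here is a standalone sketch of the conversion. `Field` is a simplified stand-in for `StructField`, and the local `quoteIdentifier` is an assumption about what the catalyst helper does (wrap in backticks and double any embedded backtick); it is not imported from Spark.

```scala
object ToDdlSketch {
  // Simplified stand-in for StructField: a name plus its SQL type string.
  final case class Field(name: String, sqlType: String)

  // Assumed behaviour of the helper used in the diff.
  def quoteIdentifier(name: String): String =
    "`" + name.replace("`", "``") + "`"

  def toDDL(fields: Seq[Field]): String =
    fields.map(f => s"${quoteIdentifier(f.name)} ${f.sqlType}").mkString(",")
}
```

Under that assumption a backtick in a field name is escaped, but a newline or tab passes through unchanged, so whether the resulting string parses back with `fromDDL` would still need a test.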


---




[GitHub] spark issue #21533: [SPARK-24195][Core] Bug fix for local:/ path in SparkCon...

2018-07-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21533
  
retest this please.


---



