[GitHub] spark pull request #22203: [SPARK-25029][BUILD][CORE] Janino "Two non-abstra...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/22203#discussion_r212535154
--- Diff: dev/deps/spark-deps-hadoop-2.6 ---
@@ -98,7 +98,7 @@
 jackson-module-jaxb-annotations-2.6.7.jar
 jackson-module-paranamer-2.7.9.jar
 jackson-module-scala_2.11-2.6.7.1.jar
 jackson-xc-1.9.13.jar
-janino-3.0.8.jar
+janino-3.0.9.jar
--- End diff --
Sorry for being late. I confirmed all of the changes in the changelog. Two issues, https://github.com/janino-compiler/janino/pull/54 and https://github.com/janino-compiler/janino/pull/46, are for Java 8 and later support. The others are bug fixes. LGTM.
---
- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22199: [SPARK-25073][Yarn] AM and Executor Memory validation me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22199 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95196/
[GitHub] spark issue #22199: [SPARK-25073][Yarn] AM and Executor Memory validation me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22199 Merged build finished. Test PASSed.
[GitHub] spark issue #22199: [SPARK-25073][Yarn] AM and Executor Memory validation me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22199 **[Test build #95196 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95196/testReport)** for PR 22199 at commit [`3f4051c`](https://github.com/apache/spark/commit/3f4051cd0895debe1c4f288890b397d5dfe1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21912 Merged build finished. Test FAILed.
[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21912 **[Test build #95197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95197/testReport)** for PR 21912 at commit [`92428cd`](https://github.com/apache/spark/commit/92428cdad9757df9e48b726109da8c3edfa299e5).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21912 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95197/
[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21912 **[Test build #95197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95197/testReport)** for PR 21912 at commit [`92428cd`](https://github.com/apache/spark/commit/92428cdad9757df9e48b726109da8c3edfa299e5).
[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21912 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2519/
[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21912 Merged build finished. Test PASSed.
[GitHub] spark pull request #22184: [SPARK-25132][SQL][DOC] Add migration doc for cas...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22184#discussion_r212533857
--- Diff: docs/sql-programming-guide.md ---
@@ -1895,6 +1895,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`.
   - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation.
+## Upgrading From Spark SQL 2.3.1 to 2.3.2 and above
+
+  - In version 2.3.1 and earlier, when reading from a Parquet table, Spark always returns null for any column whose column names in Hive metastore schema and Parquet schema are in different letter cases, no matter whether `spark.sql.caseSensitive` is set to true or false. Since 2.3.2, when `spark.sql.caseSensitive` is set to false, Spark does case insensitive column name resolution between Hive metastore schema and Parquet schema, so even column names are in different letter cases, Spark returns corresponding column values. An exception is thrown if there is ambiguity, i.e. more than one Parquet column is matched.
--- End diff --
Could you add a test case for the one you did?
[GitHub] spark pull request #22184: [SPARK-25132][SQL][DOC] Add migration doc for cas...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/22184#discussion_r212533706
--- Diff: docs/sql-programming-guide.md ---
@@ -1895,6 +1895,10 @@ working with timestamps in `pandas_udf`s to get the best performance, see
   - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`.
   - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation.
+## Upgrading From Spark SQL 2.3.1 to 2.3.2 and above
+
+  - In version 2.3.1 and earlier, when reading from a Parquet table, Spark always returns null for any column whose column names in Hive metastore schema and Parquet schema are in different letter cases, no matter whether `spark.sql.caseSensitive` is set to true or false. Since 2.3.2, when `spark.sql.caseSensitive` is set to false, Spark does case insensitive column name resolution between Hive metastore schema and Parquet schema, so even column names are in different letter cases, Spark returns corresponding column values. An exception is thrown if there is ambiguity, i.e. more than one Parquet column is matched.
--- End diff --
We should respect `spark.sql.caseSensitive` in both modes, but also add a legacy SQLConf to enable users to revert back to the previous behavior.
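The case-insensitive resolution rule quoted in the migration note above can be sketched outside of Spark. The following is an illustrative toy in plain Java, not Spark's actual implementation; the class and method names are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CaseInsensitiveResolver {
    /**
     * Resolve a Hive metastore column name against Parquet field names.
     * Case-sensitive mode requires an exact match; case-insensitive mode
     * matches ignoring case and throws if more than one Parquet field matches.
     */
    static String resolve(String metastoreName, List<String> parquetFields, boolean caseSensitive) {
        if (caseSensitive) {
            return parquetFields.contains(metastoreName) ? metastoreName : null;
        }
        List<String> matches = new ArrayList<>();
        for (String f : parquetFields) {
            if (f.equalsIgnoreCase(metastoreName)) {
                matches.add(f);
            }
        }
        if (matches.size() > 1) {
            // Mirrors the "exception on ambiguity" rule from the migration note.
            throw new RuntimeException("Ambiguous column: " + metastoreName + " matches " + matches);
        }
        return matches.isEmpty() ? null : matches.get(0);
    }

    public static void main(String[] args) {
        List<String> parquet = Arrays.asList("ID", "Name");
        // Exact-match lookup: no field literally named "id", so nothing resolves
        // (this is why pre-2.3.2 reads returned null for such columns).
        System.out.println(resolve("id", parquet, true));
        // Case-insensitive lookup (spark.sql.caseSensitive=false since 2.3.2): resolves to "ID".
        System.out.println(resolve("id", parquet, false));
    }
}
```

A name matching two Parquet fields that differ only in case (e.g. `ID` and `id`) raises the ambiguity error instead of silently picking one.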
[GitHub] spark issue #22199: [SPARK-25073][Yarn] AM and Executor Memory validation me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22199 **[Test build #95196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95196/testReport)** for PR 22199 at commit [`3f4051c`](https://github.com/apache/spark/commit/3f4051cd0895debe1c4f288890b397d5dfe1).
[GitHub] spark issue #22199: [SPARK-25073][Yarn] AM and Executor Memory validation me...
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/22199 ok to test.
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r212530383
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2062,8 +2062,10 @@ private[spark] object Utils extends Logging {
       try {
         val properties = new Properties()
         properties.load(inReader)
-        properties.stringPropertyNames().asScala.map(
-          k => (k, properties.getProperty(k).trim)).toMap
+        properties.stringPropertyNames().asScala
+          .map(k => (k, properties.getProperty(k)))
--- End diff --
@gerashegalov would you please elaborate on the use case here? I saw that you're treating `\n` as a property value; what is the specific usage scenario?
[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22171
> How can this happen? When Spark writes decimal out, the external systems will get decimal values, not string values.

I have the same question.
[GitHub] spark issue #17254: [SPARK-19917][SQL]qualified partition path stored in cat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17254 **[Test build #95195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95195/testReport)** for PR 17254 at commit [`36a3463`](https://github.com/apache/spark/commit/36a34632dbb000799c35727c00d1542d4bb1ce00).
[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/20637 @cloud-fan @mgaido91 Do you have any other comments on this?
[GitHub] spark pull request #22187: [SPARK-25178][SQL] Directly ship the StructType o...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22187
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22213#discussion_r212528202
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -2062,8 +2062,10 @@ private[spark] object Utils extends Logging {
       try {
         val properties = new Properties()
         properties.load(inReader)
-        properties.stringPropertyNames().asScala.map(
-          k => (k, properties.getProperty(k).trim)).toMap
+        properties.stringPropertyNames().asScala
+          .map(k => (k, properties.getProperty(k)))
--- End diff --
Isn't this going to break existing apps dependent on trimmed values?
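For context on why the removed `.trim` matters: `java.util.Properties.load` already skips whitespace *before* a value, but it keeps trailing whitespace and honors `\n` escapes, so Spark's extra `.trim` is what erases them. A small standalone stdlib demonstration (not Spark code):

```java
import java.io.StringReader;
import java.util.Properties;

public class PropsTrim {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // In the properties format, a backslash-n escape becomes a literal newline,
        // and trailing spaces before the end of a line are kept in the value.
        props.load(new StringReader("a.key=value \nb.key=\\n"));

        String a = props.getProperty("a.key");   // "value " — trailing space preserved by load
        String b = props.getProperty("b.key");   // a single newline character

        System.out.println("[" + a + "] length=" + a.length());
        System.out.println("b is a newline: " + b.equals("\n"));
        // Applying .trim (as the pre-change Spark code did) erases both.
        System.out.println("[" + a.trim() + "] length=" + a.trim().length());
    }
}
```

So the diff changes observable behavior exactly for values like these, which is what both review comments are probing at.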
[GitHub] spark issue #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggregate.row....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21931 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95190/
[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22171 I don't have a strong opinion about the display; scientific notation is fine to me. I'm curious about "... but also in dataset write operations. External databases like netezza fails to save the result ..." How can this happen? When Spark writes decimal out, the external systems will get decimal values, not string values.
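On the display question: whether a small decimal renders in scientific notation comes down to which `BigDecimal` string form is used. A minimal stdlib illustration, independent of Spark's internals:

```java
import java.math.BigDecimal;

public class DecimalDisplay {
    public static void main(String[] args) {
        BigDecimal small = new BigDecimal("0.00000057");
        // toString switches to scientific notation once the adjusted exponent
        // drops below -6; toPlainString never does.
        System.out.println(small.toString());       // 5.7E-7
        System.out.println(small.toPlainString());  // 0.00000057
    }
}
```

This is why a value can "look fine" numerically yet surprise a consumer that parses its string rendering.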
[GitHub] spark issue #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggregate.row....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21931 Merged build finished. Test PASSed.
[GitHub] spark issue #22187: [SPARK-25178][SQL] Directly ship the StructType objects ...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22187 Thanks! merging to master.
[GitHub] spark issue #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggregate.row....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21931 **[Test build #95190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95190/testReport)** for PR 21931 at commit [`6abeb06`](https://github.com/apache/spark/commit/6abeb06e52ed51ff9a30acc5801b0794ce778e2c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #22207: [SPARK-25214][SS]Fix the issue that Kafka v2 sour...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/22207#discussion_r212526373
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala ---
@@ -0,0 +1,281 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.util.Properties
+import java.util.concurrent.atomic.AtomicInteger
+
+import scala.collection.mutable
+import scala.util.Random
+
+import org.scalatest.time.SpanSugar._
+
+import org.apache.spark.SparkContext
+import org.apache.spark.sql.{DataFrame, Dataset, ForeachWriter}
+import org.apache.spark.sql.streaming.{StreamTest, Trigger}
+import org.apache.spark.sql.test.{SharedSQLContext, TestSparkSession}
+
+/**
+ * This is a basic test trait which will set up a Kafka cluster that keeps only several records in
+ * a topic and ages out records very quickly. This is a helper trait to test
+ * "failonDataLoss=false" case with missing offsets.
+ *
+ * Note: there is a hard-coded 30 seconds delay (kafka.log.LogManager.InitialTaskDelayMs) to clean
+ * up records. Hence each class extending this trait needs to wait at least 30 seconds (or even
+ * longer when running on a slow Jenkins machine) before records start to be removed. To make sure
+ * a test does see missing offsets, you can check the earliest offset in `eventually` and make sure
+ * it's not 0 rather than sleeping a hard-coded duration.
+ */
+trait KafkaMissingOffsetsTest extends SharedSQLContext {
+
+  protected var testUtils: KafkaTestUtils = _
+
+  override def createSparkSession(): TestSparkSession = {
+    // Set maxRetries to 3 to handle NPE from `poll` when deleting a topic
+    new TestSparkSession(new SparkContext("local[2,3]", "test-sql-context", sparkConf))
+  }
+
+  override def beforeAll(): Unit = {
+    super.beforeAll()
+    testUtils = new KafkaTestUtils {
+      override def brokerConfiguration: Properties = {
+        val props = super.brokerConfiguration
+        // Try to make Kafka clean up messages as fast as possible. However, there is a hard-coded
+        // 30 seconds delay (kafka.log.LogManager.InitialTaskDelayMs) so this test should run at
+        // least 30 seconds.
+        props.put("log.cleaner.backoff.ms", "100")
+        // The size of RecordBatch V2 increases to support transactional write.
+        props.put("log.segment.bytes", "70")
+        props.put("log.retention.bytes", "40")
+        props.put("log.retention.check.interval.ms", "100")
+        props.put("delete.retention.ms", "10")
+        props.put("log.flush.scheduler.interval.ms", "10")
+        props
+      }
+    }
+    testUtils.setup()
+  }
+
+  override def afterAll(): Unit = {
+    if (testUtils != null) {
+      testUtils.teardown()
+      testUtils = null
+    }
+    super.afterAll()
+  }
+}
+
+class KafkaDontFailOnDataLossSuite extends KafkaMissingOffsetsTest {
+
+  import testImplicits._
+
+  private val topicId = new AtomicInteger(0)
+
+  private def newTopic(): String = s"failOnDataLoss-${topicId.getAndIncrement()}"
+
+  /**
+   * @param testStreamingQuery whether to test a streaming query or a batch query.
+   * @param writeToTable the function to write the specified [[DataFrame]] to the given table.
+   */
+  private def verifyMissingOffsetsDontCauseDuplicatedRecords(
+      testStreamingQuery: Boolean)(writeToTable: (DataFrame, String) => Unit): Unit = {
+    val topic = newTopic()
+    testUtils.createTopic(topic, partitions = 1)
+    testUtils.sendMessages(topic, (0 until 50).map(_.toString).toArray)
+
+    eventually(timeout(60.seconds)) {
+      assert(
+        testUtils.getEarliestOffsets(Set(topic)).head._2 > 0,
+        "Kafka didn't delete records after 1 minute")
+    }
+
[GitHub] spark pull request #22079: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-251...
Github user bersprockets closed the pull request at: https://github.com/apache/spark/pull/22079
[GitHub] spark pull request #22042: [SPARK-25005][SS]Support non-consecutive offsets ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/22042#discussion_r212507190
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ---
@@ -160,6 +160,23 @@ abstract class KafkaSourceTest extends StreamTest with SharedSQLContext {
     s"AddKafkaData(topics = $topics, data = $data, message = $message)"
   }
+  object WithKafkaProducer {
+    def apply(
+        topic: String,
+        producer: KafkaProducer[String, String])(
--- End diff --
Ping on this comment. Maybe you missed this?
[GitHub] spark pull request #22042: [SPARK-25005][SS]Support non-consecutive offsets ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/22042#discussion_r212522664
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ---
@@ -161,6 +161,22 @@ abstract class KafkaSourceTest extends StreamTest with SharedSQLContext with Kaf
     s"AddKafkaData(topics = $topics, data = $data, message = $message)"
   }
+  object WithKafkaProducer {
--- End diff --
nit: This is not creating a KafkaProducer, as most `With***` methods do. The point of this is to force synchronization of the consumer. So maybe rename it to `WithOffsetSync { ... }`?
[GitHub] spark pull request #22042: [SPARK-25005][SS]Support non-consecutive offsets ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/22042#discussion_r212521083
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ---
@@ -239,56 +335,74 @@ private[kafka010] case class InternalKafkaConsumer(
   }

   /**
-   * Get the record for the given offset if available. Otherwise it will either throw error
-   * (if failOnDataLoss = true), or return the next available offset within [offset, untilOffset),
-   * or null.
+   * Get the fetched record for the given offset if available.
+   *
+   * If the record is invisible (either a transaction message, or an aborted message when the
+   * consumer's `isolation.level` is `read_committed`), it will return a `FetchedRecord` with the
+   * next offset to fetch.
+   *
+   * This method also will try the best to detect data loss. If `failOnDataLoss` is `true`, it will
+   * throw an exception when we detect an unavailable offset. If `failOnDataLoss` is `false`, this
+   * method will return `null` if the next available record is within [offset, untilOffset).
    *
    * @throws OffsetOutOfRangeException if `offset` is out of range
    * @throws TimeoutException if cannot fetch the record in `pollTimeoutMs` milliseconds.
    */
-  private def fetchData(
+  private def fetchRecord(
       offset: Long,
       untilOffset: Long,
       pollTimeoutMs: Long,
-      failOnDataLoss: Boolean): ConsumerRecord[Array[Byte], Array[Byte]] = {
-    if (offset != nextOffsetInFetchedData || !fetchedData.hasNext()) {
-      // This is the first fetch, or the last pre-fetched data has been drained.
-      // Seek to the offset because we may call seekToBeginning or seekToEnd before this.
-      seek(offset)
-      poll(pollTimeoutMs)
-    }
-
-    if (!fetchedData.hasNext()) {
-      // We cannot fetch anything after `poll`. Two possible cases:
-      // - `offset` is out of range so that Kafka returns nothing. Just throw
-      //   `OffsetOutOfRangeException` to let the caller handle it.
-      // - Cannot fetch any data before timeout. TimeoutException will be thrown.
-      val range = getAvailableOffsetRange()
-      if (offset < range.earliest || offset >= range.latest) {
-        throw new OffsetOutOfRangeException(
-          Map(topicPartition -> java.lang.Long.valueOf(offset)).asJava)
+      failOnDataLoss: Boolean): FetchedRecord = {
+    if (offset != fetchedData.nextOffsetInFetchedData) {
+      // This is the first fetch, or the fetched data has been reset.
+      // Fetch records from Kafka and update `fetchedData`.
+      fetchData(offset, pollTimeoutMs)
+    } else if (!fetchedData.hasNext) { // The last pre-fetched data has been drained.
+      if (offset < fetchedData.offsetAfterPoll) {
+        // Offsets in [offset, fetchedData.offsetAfterPoll) are invisible. Return a record to ask
+        // the next call to start from `fetchedData.offsetAfterPoll`.
+        fetchedData.reset()
+        return fetchedRecord.withRecord(null, fetchedData.offsetAfterPoll)
       } else {
-        throw new TimeoutException(
-          s"Cannot fetch record for offset $offset in $pollTimeoutMs milliseconds")
+        // Fetch records from Kafka and update `fetchedData`.
+        fetchData(offset, pollTimeoutMs)
       }
+    }
+
+    if (!fetchedData.hasNext) {
+      // When we reach here, we have already tried to poll from Kafka. As `fetchedData` is still
+      // empty, all messages in [offset, fetchedData.offsetAfterPoll) are invisible. Return a
+      // record to ask the next call to start from `fetchedData.offsetAfterPoll`.
+      assert(offset <= fetchedData.offsetAfterPoll,
+        s"seek to $offset and poll but the offset was reset to ${fetchedData.offsetAfterPoll}")
+      fetchedRecord.withRecord(null, fetchedData.offsetAfterPoll)
    } else {
      val record = fetchedData.next()
-      nextOffsetInFetchedData = record.offset + 1
      // In general, Kafka uses the specified offset as the start point, and tries to fetch the next
      // available offset. Hence we need to handle offset mismatch.
      if (record.offset > offset) {
+        val range = getAvailableOffsetRange()
+        if (range.earliest <= offset) {
+          // `offset` is still valid but the corresponding message is invisible. We should skip it
+          // and jump to `record.offset`. Here we move `fetchedData` back so that the next call of
+          // `fetchRecord` can just return `record` directly.
+          fetchedData.previous()
+          return fetchedRecord.withRecord(null, record.offset)
+        }
        // This
[GitHub] spark pull request #22042: [SPARK-25005][SS]Support non-consecutive offsets ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/22042#discussion_r212522432
--- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ---
@@ -161,6 +161,22 @@ abstract class KafkaSourceTest extends StreamTest with SharedSQLContext with Kaf
     s"AddKafkaData(topics = $topics, data = $data, message = $message)"
   }
+  object WithKafkaProducer {
+    def apply(
+        topic: String,
+        producer: KafkaProducer[String, String])(
+        func: KafkaProducer[String, String] => Unit): AssertOnQuery = {
--- End diff --
nit: AssertOnQuery -> StreamAction
[GitHub] spark pull request #22042: [SPARK-25005][SS]Support non-consecutive offsets ...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/22042#discussion_r212504622
--- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReadSupport.scala ---
@@ -331,6 +331,7 @@ private[kafka010] case class KafkaMicroBatchPartitionReader(
     offsetRange.topicPartition, executorKafkaParams, reuseKafkaConsumer)
   private val rangeToRead = resolveRange(offsetRange)
+
--- End diff --
unnecessary
[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21087 **[Test build #95194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95194/testReport)** for PR 21087 at commit [`ebd9265`](https://github.com/apache/spark/commit/ebd926530c1d8b2f515a4a233f5963eafc17e460).
[GitHub] spark pull request #21087: [SPARK-23997][SQL] Configurable maximum number of...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21087#discussion_r212521169
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -164,9 +165,12 @@ case class BucketSpec(
     numBuckets: Int,
     bucketColumnNames: Seq[String],
     sortColumnNames: Seq[String]) {
-  if (numBuckets <= 0 || numBuckets >= 10) {
+  def conf: SQLConf = SQLConf.get
+
+  if (numBuckets <= 0 || numBuckets > conf.bucketingMaxBuckets) {
--- End diff --
We can merge this PR first.
[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21087 retest this please
[GitHub] spark pull request #21087: [SPARK-23997][SQL] Configurable maximum number of...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21087#discussion_r212521145
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -164,9 +165,12 @@ case class BucketSpec(
     numBuckets: Int,
     bucketColumnNames: Seq[String],
     sortColumnNames: Seq[String]) {
-  if (numBuckets <= 0 || numBuckets >= 10) {
+  def conf: SQLConf = SQLConf.get
+
+  if (numBuckets <= 0 || numBuckets > conf.bucketingMaxBuckets) {
--- End diff --
Could you submit a followup PR to address this message issue?
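The diff above replaces a hard-coded upper bound with a value read from `SQLConf`. A minimal sketch of that kind of configurable bounds check, with a hypothetical constant standing in for `conf.bucketingMaxBuckets` (the actual Spark default is not shown here):

```java
public class BucketSpecCheck {
    // Hypothetical stand-in for the configurable SQLConf.get.bucketingMaxBuckets value.
    static final int MAX_BUCKETS = 100000;

    /** Rejects bucket counts outside (0, MAX_BUCKETS], as the quoted diff does. */
    static void validate(int numBuckets) {
        if (numBuckets <= 0 || numBuckets > MAX_BUCKETS) {
            throw new IllegalArgumentException(
                "Number of buckets should be greater than 0 but less than or equal to "
                + MAX_BUCKETS + " (got " + numBuckets + ")");
        }
    }

    public static void main(String[] args) {
        validate(8);  // fine
        try {
            validate(0);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Including both the limit and the offending value in the message is the sort of wording fix the "followup PR" comment asks for.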
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #95193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95193/testReport)** for PR 21732 at commit [`5f95bd0`](https://github.com/apache/spark/commit/5f95bd0cf1bd308c7df55c41caef7a9f19368f5d).
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2518/
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.
[GitHub] spark pull request #21732: [SPARK-24762][SQL] Enable Option of Product encod...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21732#discussion_r212520685 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TypedAggregateExpression.scala --- @@ -19,25 +19,85 @@ package org.apache.spark.sql.execution.aggregate import scala.language.existentials -import org.apache.spark.sql.Encoder +import org.apache.spark.sql.{AnalysisException, Encoder} import org.apache.spark.sql.catalyst.InternalRow -import org.apache.spark.sql.catalyst.analysis.UnresolvedDeserializer -import org.apache.spark.sql.catalyst.encoders.encoderFor +import org.apache.spark.sql.catalyst.analysis.{GetColumnByOrdinal, UnresolvedDeserializer} +import org.apache.spark.sql.catalyst.encoders.{encoderFor, ExpressionEncoder} import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.aggregate.{AggregateFunction, DeclarativeAggregate, TypedImperativeAggregate} import org.apache.spark.sql.catalyst.expressions.codegen.GenerateSafeProjection -import org.apache.spark.sql.catalyst.expressions.objects.Invoke +import org.apache.spark.sql.catalyst.expressions.objects.{AssertNotNull, Invoke, NewInstance, WrapOption} import org.apache.spark.sql.expressions.Aggregator import org.apache.spark.sql.types._ import org.apache.spark.util.Utils object TypedAggregateExpression { + + // Checks if given encoder is for `Option[Product]`. + def isOptProductEncoder(encoder: ExpressionEncoder[_]): Boolean = { +// Only Option[Product] is non-flat. +encoder.clsTag.runtimeClass == classOf[Option[_]] && !encoder.flat + } + + /** + * Flattens serializers and deserializer of given encoder. We only flatten encoder + * of `Option[Product]` class. + */ + def flattenOptProductEncoder(encoder: ExpressionEncoder[_]): ExpressionEncoder[_] = { --- End diff -- I will go to add some tests against this. 
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2517/ Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #95192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95192/testReport)** for PR 21732 at commit [`9fc3f61`](https://github.com/apache/spark/commit/9fc3f6165156051142a8366a32726badaaa16bb7).
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.
[GitHub] spark pull request #22203: [SPARK-25029][BUILD][CORE] Janino "Two non-abstra...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22203
[GitHub] spark issue #22079: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22079 @bersprockets Could you please close this PR?
[GitHub] spark issue #22203: [SPARK-25029][BUILD][CORE] Janino "Two non-abstract meth...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22203 Thanks! Merged to master.
[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21320
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2516/ Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Build finished. Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 @cloud-fan I made an attempt to remove the `topLevel` parameter. The approach is to flatten the serializers and deserializer at `TypedAggregateExpression`, so users are not aware of any difference when using an expression encoder.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21320 Thanks! Merged to master. BTW, we can keep thinking about whether there are other, better solutions for nested column pruning. Also cc @dongjoon-hyun, in case you are interested in the work of supporting ORC nested column pruning.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #95191 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95191/testReport)** for PR 21732 at commit [`ed3d5cb`](https://github.com/apache/spark/commit/ed3d5cb697b10af2e2cf4c78ab521d4d0b2f3c9b).
[GitHub] spark issue #22211: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22211 cc @jiangxb1987
[GitHub] spark issue #22206: SPARK-25213: Add project to v2 scans before python filte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22206 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95187/ Test PASSed.
[GitHub] spark issue #22206: SPARK-25213: Add project to v2 scans before python filte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22206 Merged build finished. Test PASSed.
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Merged build finished. Test PASSed.
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95188/ Test PASSed.
[GitHub] spark issue #22206: SPARK-25213: Add project to v2 scans before python filte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22206 **[Test build #95187 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95187/testReport)** for PR 22206 at commit [`550368e`](https://github.com/apache/spark/commit/550368eaeebdc87f2c89bad7214f2624784eeb04).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22190: [SPARK-25188][SQL] Add WriteConfig to v2 write API.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22190 **[Test build #95188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95188/testReport)** for PR 22190 at commit [`847300f`](https://github.com/apache/spark/commit/847300f76391c5e171e3f54b21bf6f2efc177f0e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21273: [SPARK-17916][SQL] Fix empty string being parsed as null...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21273 @koertkuipers do you want to make a PR to make it configurable?
[GitHub] spark issue #22213: [SPARK-25221][DEPLOY] Consistent trailing whitespace tre...
Github user gerashegalov commented on the issue: https://github.com/apache/spark/pull/22213 @witgo please take a look since you worked on #2379
[GitHub] spark issue #22213: [SPARK-25221][DEPLOY] Consistent trailing whitespace tre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22213 Can one of the admins verify this patch?
[GitHub] spark pull request #22213: [SPARK-25221][DEPLOY] Consistent trailing whitesp...
GitHub user gerashegalov opened a pull request: https://github.com/apache/spark/pull/22213

[SPARK-25221][DEPLOY] Consistent trailing whitespace treatment of conf values

## What changes were proposed in this pull request?

Stop trimming values of properties loaded from a file.

## How was this patch tested?

Added a unit test demonstrating the issue hit in production.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gerashegalov/spark gera/SPARK-25221

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22213.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22213

commit 874b2c82a83d94c338d8b1ef9c3b37074cc2e4cc
Author: Gera Shegalov
Date: 2018-08-23T19:09:05Z

    [SPARK-25221][DEPLOY] Consistent trailing whitespace treatment of conf values
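The behavioral difference the PR describes is easy to picture outside Spark. Below is a minimal plain-Python sketch (not Spark's actual property loader; `load_conf` is an illustrative name) of parsing `key=value` lines while preserving trailing whitespace in values, which is what "stop trimming values" amounts to:

```python
def load_conf(lines):
    """Parse key=value conf lines. The key is trimmed, but the value is
    kept byte-for-byte -- including trailing whitespace -- mirroring the
    proposed behavior for properties loaded from a file."""
    props = {}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.rstrip("\n").partition("=")
        props[key.strip()] = value  # note: no .strip() on the value
    return props

conf = load_conf(["spark.hadoop.textinputformat.record.delimiter=| \n"])
print(repr(conf["spark.hadoop.textinputformat.record.delimiter"]))  # '| '
```

A trimming loader would silently turn the delimiter `"| "` into `"|"`, which is exactly the kind of production issue the unit test in the PR demonstrates.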
[GitHub] spark issue #22211: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22211 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95189/ Test PASSed.
[GitHub] spark issue #22211: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22211 Merged build finished. Test PASSed.
[GitHub] spark issue #22211: [SPARK-23207][SPARK-22905][SPARK-24564][SPARK-25114][SQL...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22211 **[Test build #95189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95189/testReport)** for PR 22211 at commit [`d269015`](https://github.com/apache/spark/commit/d26901516f414d16f8eb38183053d22dc3b056e3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Merged build finished. Test PASSed.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21320 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95185/ Test PASSed.
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2515/ Test PASSed.
[GitHub] spark issue #11105: [SPARK-12469][CORE] Data Property accumulators for Spark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11105 Build finished. Test PASSed.
[GitHub] spark issue #21273: [SPARK-17916][SQL] Fix empty string being parsed as null...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/21273 I would suggest, at the least, that when the quote character is changed, the empty value should change accordingly: an empty value of ```""``` makes no sense if the quote character is not ```"```. Also, if we could agree on a quote character that means no quotes at all, then I would suggest changing the empty value back to null when that particular quote character is set, because no quoted empty string makes sense if the user is trying to write out unquoted values.
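The ambiguity being pointed out can be demonstrated with Python's stdlib `csv` module (used here purely as an illustration of CSV quoting semantics, not Spark's parser): the byte sequence `""` only reads back as an empty string when the reader's quote character is actually `"`.

```python
import csv
import io

# With the default quote character '"', the field "" parses as an empty string.
default_row = next(csv.reader(io.StringIO('a,"",c')))
print(default_row)  # ['a', '', 'c']

# With a different quotechar, the very same bytes are two literal characters,
# so an emptyValue of "" no longer denotes "empty" at all.
single_row = next(csv.reader(io.StringIO('a,"",c'), quotechar="'"))
print(single_row)  # ['a', '""', 'c']
```

This is why tying the empty-value marker to the configured quote character, as suggested above, avoids producing output that other readers cannot interpret.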
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21320 **[Test build #95185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95185/testReport)** for PR 21320 at commit [`e6baf68`](https://github.com/apache/spark/commit/e6baf681e06e229d740af120491d1bf0f426af99).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22212: [SPARK-25220] Seperate kubernetes node selector config b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22212 Can one of the admins verify this patch?
[GitHub] spark pull request #22201: [SPARK-25209][SQL] Avoid deserializer check in Da...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22201
[GitHub] spark pull request #22212: [SPARK-25220] Seperate kubernetes node selector c...
GitHub user jweaver-personal opened a pull request: https://github.com/apache/spark/pull/22212

[SPARK-25220] Seperate kubernetes node selector config between driver and executors.

Separated the node selector config option between executors and driver. This removes the spark.kubernetes.node.selector config option and separates it into spark.kubernetes.driver.selector and spark.kubernetes.executor.selector to allow separate node selectors on drivers and executors.

My personal use case for this change is that on AWS we have cheap spot instances that can terminate at any moment, which is okay for executors but not the driver. With a single node selector option I am unable to use the spot instances to save costs on the executor nodes.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jweaver-personal/spark jweaver-dev

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22212.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22212

commit a6c6371f8d0ba5e7ff55e83f7c8c955bf24096aa
Author: ?
Date: 2018-08-24T01:55:06Z

    KUBERNETES K8S Separated node selector config option between executors and driver. This removes the spark.kubernetes.node.selector config option and separates it into spark.kubernetes.driver.selector and spark.kubernetes.executor.selector to allow separate node selectors on drivers and executors. My personal use case for this change is that on AWS we have cheap spot instances that can terminate at any moment, which is okay for executors but not the driver. With a single node selector option I am unable to use the spot instances to save costs on the executor nodes.
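Under the split this PR proposes, a job could pin the driver to stable capacity while letting executors use spot capacity. A sketch of what a submission might look like — the node label `lifecycle` and its values, the cluster URL, and the app jar path are all hypothetical, and the final config key names are whatever the PR settles on:

```shell
# Hypothetical example: keep the driver off interruptible capacity,
# let executors run on cheap spot instances.
spark-submit \
  --master k8s://https://example-cluster:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.driver.selector.lifecycle=on-demand \
  --conf spark.kubernetes.executor.selector.lifecycle=spot \
  local:///opt/app/example-app.jar
```

With the single `spark.kubernetes.node.selector.*` option that exists today, both pods would be forced onto the same node set, which is exactly the limitation described above.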
[GitHub] spark issue #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggregate.row....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21931 **[Test build #95190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95190/testReport)** for PR 21931 at commit [`6abeb06`](https://github.com/apache/spark/commit/6abeb06e52ed51ff9a30acc5801b0794ce778e2c).
[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22063#discussion_r212504250

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -40,7 +40,14 @@ import org.apache.spark.sql.types.DataType
 case class UserDefinedFunction protected[sql] (
     f: AnyRef,
     dataType: DataType,
-    inputTypes: Option[Seq[DataType]]) {
+    inputTypes: Option[Seq[DataType]],
+    nullableTypes: Option[Seq[Boolean]] = None) {
--- End diff --

+1 to all your comments. I'm overhauling this whole PR and will force-push with a rebase once it seems to basically work.
[GitHub] spark pull request #20611: [SPARK-23425][SQL]Support wildcard in HDFS path f...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20611
[GitHub] spark issue #20611: [SPARK-23425][SQL]Support wildcard in HDFS path for load...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20611 Merged to master.
[GitHub] spark issue #22195: [SPARK-25205][CORE] Fix typo in spark.network.crypto.key...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22195 cc @jerryshao
[GitHub] spark pull request #22195: [SPARK-25205][CORE] Fix typo in spark.network.cry...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22195
[GitHub] spark issue #22195: [SPARK-25205][CORE] Fix typo in spark.network.crypto.key...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22195 Merged to master and branch-2.3.
[GitHub] spark pull request #22208: [SPARK-25216][SQL] Improve error message when a c...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22208#discussion_r212500744

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -216,8 +216,16 @@ class Dataset[T] private[sql](
   private[sql] def resolve(colName: String): NamedExpression = {
     queryExecution.analyzed.resolveQuoted(colName, sparkSession.sessionState.analyzer.resolver)
       .getOrElse {
-        throw new AnalysisException(
-          s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")})""")
+        if (schema.fieldNames.contains(colName)) {
+          throw new AnalysisException(
+            s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}).
+               | Try adding backticks to the column name, i.e., `$colName`"""
--- End diff --

I would explain it, for instance, as: if the name parts in the column should be kept as part of its column name, try quoting them with backticks.
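As a rough illustration of why the backticks matter here (a toy resolver, not Spark's actual `resolveQuoted`): a dotted name is normally split into nested-field parts, while a backquoted name is kept as one literal column name.

```python
def split_column_name(name):
    """Toy version of column-name resolution: a name wrapped in
    backticks is one literal part; otherwise dots separate parts."""
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return [name[1:-1]]
    return name.split(".")

print(split_column_name("a.b"))    # ['a', 'b'] -> field b of column a
print(split_column_name("`a.b`"))  # ['a.b']    -> one column literally named a.b
```

So a DataFrame whose schema contains a column literally named `a.b` can only be resolved by writing `` `a.b` ``, which is what the improved error message steers users toward.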
[GitHub] spark issue #22171: [SPARK-25177][SQL] When dataframe decimal type column ha...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22171 cc @cloud-fan and @gatorsmile, WDYT? Actually, I happened to hit this myself and thought it had better be fixed.
[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21977 Seems okay.
[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22063#discussion_r212499322

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -40,7 +40,14 @@ import org.apache.spark.sql.types.DataType
 case class UserDefinedFunction protected[sql] (
     f: AnyRef,
     dataType: DataType,
-    inputTypes: Option[Seq[DataType]]) {
+    inputTypes: Option[Seq[DataType]],
+    nullableTypes: Option[Seq[Boolean]] = None) {
--- End diff --

or `Option[Seq[ScalaReflection.Schema]]`
[GitHub] spark pull request #22063: [WIP][SPARK-25044][SQL] Address translation of LM...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22063#discussion_r212499164

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
@@ -57,8 +59,21 @@ case class ScalaUDF(
     children: Seq[Expression],
     inputTypes: Seq[DataType],
     udfName: Option[String]) = {
-    this(
-      function, dataType, children, inputTypes, udfName, nullable = true, udfDeterministic = true)
+    this(function, dataType, children, inputTypes, udfName,
+      nullable = true, udfDeterministic = true, nullableTypes = Nil)
+  }
+
+  // Constructor from Spark 2.3
--- End diff --

By convention, everything under the catalyst package is private, so compatibility is not a concern here.
[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r212499194

--- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala ---
@@ -60,14 +61,20 @@ private[spark] object PythonEvalType {
  */
 private[spark] abstract class BasePythonRunner[IN, OUT](
     funcs: Seq[ChainedPythonFunctions],
-    bufferSize: Int,
-    reuseWorker: Boolean,
     evalType: Int,
-    argOffsets: Array[Array[Int]])
+    argOffsets: Array[Array[Int]],
+    conf: SparkConf)
   extends Logging {

   require(funcs.length == argOffsets.length, "argOffsets should have the same length as funcs")

+  private val bufferSize = conf.getInt("spark.buffer.size", 65536)
+  private val reuseWorker = conf.getBoolean("spark.python.worker.reuse", true)
+  // each python worker gets an equal part of the allocation. the worker pool will grow to the
+  // number of concurrent tasks, which is determined by the number of cores in this executor.
+  private val memoryMb = conf.get(PYSPARK_EXECUTOR_MEMORY)
+    .map(_ / conf.getInt("spark.executor.cores", 1))
--- End diff --

> I don't think that the DataBricks style guide applies to Apache projects.

I sent an email to the dev mailing list - http://apache-spark-developers-list.1001551.n3.nabble.com/Porting-or-explicitly-linking-project-style-in-Apache-Spark-based-on-https-github-com-databricks-scae-td24790.html

I was thinking 2 indents for continuation lines are more common in the codebase and thought it better to follow this.
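The `memoryMb` expression in the diff above just gives each Python worker an equal slice of the executor's pyspark memory, since the worker pool grows to one worker per executor core. A sketch of that arithmetic (the function name is illustrative, not part of Spark):

```python
def per_worker_memory_mb(pyspark_memory_mb, executor_cores=1):
    """Each python worker gets an equal part of the executor's pyspark
    memory allocation; the worker pool grows to `executor_cores`
    concurrent workers, so divide the allocation evenly among them."""
    return pyspark_memory_mb // executor_cores

print(per_worker_memory_mb(4096, 4))  # 1024
print(per_worker_memory_mb(512))      # 512 (default of one core)
```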
[GitHub] spark pull request #22209: [SPARK-24415][Core] Fixed the issue where stage p...
Github user ankuriitg commented on a diff in the pull request: https://github.com/apache/spark/pull/22209#discussion_r212498716

--- Diff: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ---
@@ -350,11 +350,16 @@ private[spark] class AppStatusListener(
       val e = it.next()
       if (job.stageIds.contains(e.getKey()._1)) {
         val stage = e.getValue()
-        stage.status = v1.StageStatus.SKIPPED
-        job.skippedStages += stage.info.stageId
-        job.skippedTasks += stage.info.numTasks
-        it.remove()
-        update(stage, now)
+        // Only update the stage if it has not finished already
+        if (v1.StageStatus.ACTIVE.equals(stage.status) ||
+            v1.StageStatus.PENDING.equals(stage.status)) {
+          stage.status = v1.StageStatus.SKIPPED
+          job.skippedStages += stage.info.stageId
+          job.skippedTasks += stage.info.numTasks
+          job.activeStages -= 1
+          it.remove()
--- End diff --

I think the assumption here is that we will always receive the onStageCompleted event before the onJobEnd event. If that does not occur for some reason, then any active stages are marked as skipped. I don't know of a scenario where the onStageCompleted event is not received before the onJobEnd event (or not received at all); let me look further into it. Additionally, I will also fix the bug for updating the pool.
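The guard being discussed boils down to a simple rule: at job end, only stages that never finished (active or pending) get re-labelled as skipped, while completed or failed stages keep their final status. A standalone sketch of that rule (the enum mirrors `v1.StageStatus`; the function is illustrative):

```python
from enum import Enum, auto

class StageStatus(Enum):
    ACTIVE = auto()
    PENDING = auto()
    COMPLETE = auto()
    FAILED = auto()
    SKIPPED = auto()

def status_at_job_end(status):
    """Only stages that had not finished when the job ended are
    marked SKIPPED; completed/failed stages keep their final status."""
    if status in (StageStatus.ACTIVE, StageStatus.PENDING):
        return StageStatus.SKIPPED
    return status

print(status_at_job_end(StageStatus.PENDING))   # StageStatus.SKIPPED
print(status_at_job_end(StageStatus.COMPLETE))  # StageStatus.COMPLETE
```

Without the guard, a stage whose onStageCompleted event arrived before onJobEnd would be incorrectly flipped back to SKIPPED, which is the stage-page bug this PR fixes.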
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21320 Seems fine to me too for the similar reasons.
[GitHub] spark pull request #22206: SPARK-25213: Add project to v2 scans before pytho...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22206#discussion_r212497182

--- Diff: python/pyspark/sql/tests.py ---
@@ -6394,6 +6394,17 @@ def test_invalid_args(self):
             df.withColumn('mean_v', mean_udf(df['v']).over(ow))

+class DataSourceV2Tests(ReusedSQLTestCase):
+    def test_pyspark_udf_SPARK_25213(self):
+        from pyspark.sql.functions import udf
+
+        df = self.spark.read.format("org.apache.spark.sql.sources.v2.SimpleDataSourceV2").load()
+        result = df.withColumn('x', udf(lambda x: x, 'int')(df['i']))
--- End diff --

This only tests Project with a scalar PythonUDF? It might be better to also test the Filter case.
[GitHub] spark pull request #22206: SPARK-25213: Add project to v2 scans before pytho...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22206#discussion_r212496291

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala ---
@@ -130,10 +133,22 @@ object DataSourceV2Strategy extends Strategy {
       config)

     val filterCondition = postScanFilters.reduceLeftOption(And)
-    val withFilter = filterCondition.map(FilterExec(_, scan)).getOrElse(scan)
+
+    val withFilter = if (filterCondition.exists(hasScalarPythonUDF)) {
+      // add a projection before FilterExec to ensure that the rows are converted to unsafe
+      val filterExpr = filterCondition.get
+      FilterExec(filterExpr, ProjectExec(filterExpr.references.toSeq, scan))
+    } else {
+      filterCondition.map(FilterExec(_, scan)).getOrElse(scan)
+    }

     // always add the projection, which will produce unsafe rows required by some operators
-    ProjectExec(project, withFilter) :: Nil
+    if (project.exists(hasScalarPythonUDF)) {
+      val references = project.map(_.references).reduce(_ ++ _).toSeq
+      ProjectExec(project, ProjectExec(references, withFilter)) :: Nil
--- End diff --

OK. Let's leave it as it is for now.
[GitHub] spark issue #22210: [SPARK-25218][Core]Fix potential resource leaks in Trans...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22210 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95184/ Test PASSed.
[GitHub] spark issue #22210: [SPARK-25218][Core]Fix potential resource leaks in Trans...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22210 Merged build finished. Test PASSed.
[GitHub] spark issue #22210: [SPARK-25218][Core]Fix potential resource leaks in Trans...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22210 **[Test build #95184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95184/testReport)** for PR 22210 at commit [`6f2248d`](https://github.com/apache/spark/commit/6f2248d45716332ac78e44b1314011806f59deb8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22192: [SPARK-24918][Core] Executor Plugin API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22192 Merged build finished. Test PASSed.