[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@sujith71955 Thanks. I see. The case is somewhat different from the problem 
this PR wants to solve, but I think it is a reasonable use case. Would you 
mind creating a ticket so we can track it?


---




[GitHub] spark issue #22687: [SPARK-25702][SQL] Push down filters with `Not` operator...

2018-10-10 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22687
  
@viirya @HyukjinKwon I made the code changes and then found the condition 
is not reachable, as I stated in the PR description.

I just feel that it won't hurt to have such handling in the data source 
module, since the code change is short.

I am OK with closing this one.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22688
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3849/
Test PASSed.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22688
  
cc @cloud-fan and @rdblue, this is more conservative, but I would prefer to 
revert it at https://github.com/apache/spark/pull/22686 rather than exposing 
append mode in 2.4. I don't think it's a great idea that read code paths are 
executed in the write path.


---




[GitHub] spark issue #22688: [SPARK-25700][SQL] Creates ReadSupport in only Append Mo...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22688
  
**[Test build #97196 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97196/testReport)** for PR 22688 at commit [`a445e16`](https://github.com/apache/spark/commit/a445e163675eb330eade07f1cbdd88b99caab117).


---




[GitHub] spark pull request #22688: [SPARK-25700][SQL] Creates ReadSupport in only Ap...

2018-10-10 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22688

[SPARK-25700][SQL] Creates ReadSupport in only Append Mode in Data Source 
V2 write path

## What changes were proposed in this pull request?

Alternative to https://github.com/apache/spark/pull/22686: in the other save 
modes, we don't need to create a read support and read the schema.

## How was this patch tested?

Unit test and manual tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark append-revert-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22688.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22688


commit a445e163675eb330eade07f1cbdd88b99caab117
Author: hyukjinkwon 
Date:   2018-10-10T10:38:55Z

Creates ReadSupport in only Append Mode in Data Source V2 write path




---




[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22009#discussion_r224023061
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
@@ -169,15 +174,16 @@ object DataSourceV2Relation {
      options: Map[String, String],
      tableIdent: Option[TableIdentifier] = None,
      userSpecifiedSchema: Option[StructType] = None): DataSourceV2Relation = {
-    val reader = source.createReader(options, userSpecifiedSchema)
+    val readSupport = source.createReadSupport(options, userSpecifiedSchema)
--- End diff --

Another point is which schema you would expect the data source to return in 
that case. For instance, `spark.range(1).write.format("source").save()`. It's 
odd that `source` should provide a schema; I mean, it's logically weird. How 
does the source provide the schema?
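
To make the scenario concrete, a minimal sketch (the `"source"` format name is 
hypothetical; this fails at runtime unless such a source is registered):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// The input already fixes the schema (one LongType column named "id"), so a
// sink-only source has no schema of its own to report; requiring it to create
// a read support here just to fetch a schema is the oddity described above.
spark.range(1).write.format("source").save()
```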


---




[GitHub] spark pull request #22687: [SPARK-25702][SQL] Push down filters with `Not` o...

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22687#discussion_r224021798
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -534,6 +534,13 @@ private[parquet] class ParquetFilters(
        createFilterHelper(nameToParquetField, rhs, canPartialPushDownConjuncts = false)
    } yield FilterApi.or(lhsFilter, rhsFilter)
 
+      case sources.Not(sources.Or(lhs, rhs)) if canPartialPushDownConjuncts =>
+        createFilterHelper(nameToParquetField,
+          sources.And(sources.Not(lhs), sources.Not(rhs)), canPartialPushDownConjuncts = true)
+
+      case sources.Not(sources.Not(pred)) if canPartialPushDownConjuncts =>
--- End diff --

hm, is this actually reachable?


---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22685
  
I think you're mostly targeting style changes. If there is no actual benefit 
beyond style, I wouldn't do this and would instead help review other PRs.


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/16677
  
Mainly, I think we are trying to interpolate the number of partitions.


---




[GitHub] spark issue #22686: [WIP][SPARK-25700][SQL] Partially revert append mode sup...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22686
  
**[Test build #97195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97195/testReport)** for PR 22686 at commit [`227920a`](https://github.com/apache/spark/commit/227920a23caf3b9a556a35df2d01c37dcddf244b).


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/16677
  
@viirya I have a use case where a normal query takes around 5 seconds while 
the same query with limit 5000 takes around 17 seconds. When I was 
investigating, I found the bottleneck in the above-mentioned flow.
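
In shape, the comparison looks like this (table name invented for 
illustration):

```scala
spark.sql("SELECT * FROM events").collect()            // ~5 seconds
spark.sql("SELECT * FROM events LIMIT 5000").collect() // ~17 seconds
```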


---




[GitHub] spark issue #22686: [WIP][SPARK-25700][SQL] Partially revert append mode sup...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22686
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3848/
Test PASSed.


---




[GitHub] spark issue #22686: [WIP][SPARK-25700][SQL] Partially revert append mode sup...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22686
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22687: [SPARK-25702][SQL] Push down filters with `Not` operator...

2018-10-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22687
  
Won't such predicates be simplified by the `BooleanSimplification` rule?


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16677
  
@sujith71955 For `executeTake`, optimizing it requires collecting statistics 
of the RDD. `executeTake` incrementally scans partitions. Ideally, it should 
scan just a few partitions to return `n` rows, and the remaining partitions 
can be skipped and never materialized. So going back to the beginning, IMHO, 
if we are going to collect the statistics, we will materialize all 
partitions, which seems to run counter to `executeTake`'s optimization.
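
For reference, a simplified sketch of the incremental-scan idea behind 
`executeTake` (illustrative only, not the actual implementation):

```scala
// Scan a growing number of partitions per round until n rows are collected,
// so later partitions may never be materialized at all.
def incrementalTake[T](partitions: Seq[() => Iterator[T]], n: Int): Seq[T] = {
  val buf = scala.collection.mutable.ArrayBuffer.empty[T]
  var scanned = 0
  var batch = 1
  while (buf.length < n && scanned < partitions.length) {
    val upTo = math.min(scanned + batch, partitions.length)
    partitions.slice(scanned, upTo).foreach(p => buf ++= p().take(n - buf.length))
    scanned = upTo
    batch *= 4 // widen each round, similar in spirit to Spark's scale-up factor
  }
  buf.take(n).toSeq
}
```

Collecting full statistics first would force every partition to be computed 
before the first row is returned, which is the tension described above.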


---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22685
  
**[Test build #97194 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97194/testReport)** for PR 22685 at commit [`52ed24e`](https://github.com/apache/spark/commit/52ed24ea472f49bc1bf98c36348a0fe19532bc28).


---




[GitHub] spark issue #22687: [SPARK-25702][SQL] Push down filters with `Not` operator...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22687
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3847/
Test PASSed.


---




[GitHub] spark issue #22687: [SPARK-25702][SQL] Push down filters with `Not` operator...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22687
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22674: [SPARK-25680][SQL] SQL execution listener shouldn...

2018-10-10 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/22674#discussion_r224013755
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3356,21 +3356,11 @@ class Dataset[T] private[sql](
* user-registered callback functions.
*/
  private def withAction[U](name: String, qe: QueryExecution)(action: SparkPlan => U) = {
-    try {
-      qe.executedPlan.foreach { plan =>
-        plan.resetMetrics()
-      }
-      val start = System.nanoTime()
-      val result = SQLExecution.withNewExecutionId(sparkSession, qe) {
-        action(qe.executedPlan)
-      }
-      val end = System.nanoTime()
-      sparkSession.listenerManager.onSuccess(name, qe, end - start)
-      result
-    } catch {
-      case e: Exception =>
-        sparkSession.listenerManager.onFailure(name, qe, e)
-        throw e
+    qe.executedPlan.foreach { plan =>
--- End diff --

can't `executedPlan` throw an exception? I thought it could if the original 
Spark plan failed?
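
A sketch of the control-flow difference in question (simplified shapes, not 
the actual `withAction` code):

```scala
// Old shape: planning ran inside the try, so a failure while computing the
// executed plan was reported to listeners via onFailure before rethrowing.
def withActionOld[U](notifyFailure: Exception => Unit)(body: => U): U =
  try body catch { case e: Exception => notifyFailure(e); throw e }

// New shape: the plan is forced before any handler, so an exception thrown
// during query planning would bypass the listener notification entirely.
def withActionNew[U](plan: => Unit)(body: => U): U = {
  plan
  body
}
```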


---




[GitHub] spark issue #22687: [SPARK-25702][SQL] Push down filters with `Not` operator...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22687
  
**[Test build #97193 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97193/testReport)** for PR 22687 at commit [`0f43db6`](https://github.com/apache/spark/commit/0f43db656c3567ab7f8b711c8f0b27b16caa4bf7).


---




[GitHub] spark pull request #22687: [SPARK-25702][SQL] Push down filters with `Not` o...

2018-10-10 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/22687

[SPARK-25702][SQL] Push down filters with `Not` operator in Parquet

## What changes were proposed in this pull request?

Currently, in `ParquetFilters`, predicates inside a `Not` operator are 
considered unable to be partially pushed down.
However, push down is still possible in the following cases:
1. `Not(Or(left, right))` can be converted to `And(Not(left), Not(right))`
2. `Not(Not(pred))` can be converted to `pred`

Both cases should be quite trivial, since such `Not` operators should 
already be rewritten by the optimization rule `BooleanSimplification`.
But I think it is good to handle these cases in the Parquet data source 
module as well.
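
As a sketch, the rewrite amounts to the following (using the 
`org.apache.spark.sql.sources` filter classes; `rewriteNot` is an 
illustrative helper, not the PR's actual code):

```scala
import org.apache.spark.sql.sources._

def rewriteNot(f: Filter): Filter = f match {
  case Not(Or(l, r)) => And(rewriteNot(Not(l)), rewriteNot(Not(r))) // De Morgan
  case Not(Not(p))   => rewriteNot(p)                               // double negation
  case other         => other
}
```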

## How was this patch tested?

New unit test.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark parquetNotFilters

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22687


commit 0f43db656c3567ab7f8b711c8f0b27b16caa4bf7
Author: Gengliang Wang 
Date:   2018-10-10T09:40:45Z

push down more parquet filters




---




[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22686
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3846/
Test PASSed.


---




[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22686
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22674: [SPARK-25680][SQL] SQL execution listener shouldn...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22674#discussion_r224012183
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala 
---
@@ -75,95 +76,74 @@ trait QueryExecutionListener {
  */
 @Experimental
 @InterfaceStability.Evolving
-class ExecutionListenerManager private extends Logging {
-
-  private[sql] def this(conf: SparkConf) = {
-    this()
+// The `session` is used to indicate which session carries this listener manager, and we only
+// catch SQL executions which are launched by the same session.
+// The `loadExtensions` flag is used to indicate whether we should load the pre-defined,
+// user-specified listeners during construction. We should not do it when cloning this listener
+// manager, as we will copy all listeners to the cloned listener manager.
+class ExecutionListenerManager private[sql](session: SparkSession, loadExtensions: Boolean)
+  extends SparkListener with Logging {
+
+  private[this] val listeners = new CopyOnWriteArrayList[QueryExecutionListener]
+
+  if (loadExtensions) {
+    val conf = session.sparkContext.conf
     conf.get(QUERY_EXECUTION_LISTENERS).foreach { classNames =>
       Utils.loadExtensions(classOf[QueryExecutionListener], classNames, conf).foreach(register)
     }
   }
 
+  session.sparkContext.listenerBus.addToSharedQueue(this)
+
   /**
    * Registers the specified [[QueryExecutionListener]].
    */
   @DeveloperApi
-  def register(listener: QueryExecutionListener): Unit = writeLock {
-    listeners += listener
+  def register(listener: QueryExecutionListener): Unit = {
+    listeners.add(listener)
   }
 
   /**
    * Unregisters the specified [[QueryExecutionListener]].
    */
   @DeveloperApi
-  def unregister(listener: QueryExecutionListener): Unit = writeLock {
-    listeners -= listener
+  def unregister(listener: QueryExecutionListener): Unit = {
+    listeners.remove(listener)
   }
 
   /**
    * Removes all the registered [[QueryExecutionListener]].
    */
   @DeveloperApi
-  def clear(): Unit = writeLock {
+  def clear(): Unit = {
     listeners.clear()
   }
 
   /**
    * Get an identical copy of this listener manager.
    */
-  @DeveloperApi
-  override def clone(): ExecutionListenerManager = writeLock {
-    val newListenerManager = new ExecutionListenerManager
-    listeners.foreach(newListenerManager.register)
+  private[sql] def clone(session: SparkSession): ExecutionListenerManager = {
+    val newListenerManager = new ExecutionListenerManager(session, loadExtensions = false)
+    listeners.iterator().asScala.foreach(newListenerManager.register)
     newListenerManager
   }
 
-  private[sql] def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
-    readLock {
-      withErrorHandling { listener =>
-        listener.onSuccess(funcName, qe, duration)
+  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
+    case e: SparkListenerSQLExecutionEnd if shouldCatchEvent(e) =>
+      val funcName = e.executionName.get
+      e.executionFailure match {
+        case Some(ex) =>
+          listeners.iterator().asScala.foreach(_.onFailure(funcName, e.qe, ex))
+        case _ =>
+          listeners.iterator().asScala.foreach(_.onSuccess(funcName, e.qe, e.duration))
       }
-    }
-  }
 
-  private[sql] def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
-    readLock {
-      withErrorHandling { listener =>
-        listener.onFailure(funcName, qe, exception)
-      }
-    }
+    case _ => // Ignore
   }
 
-  private[this] val listeners = ListBuffer.empty[QueryExecutionListener]
-
-  /** A lock to prevent updating the list of listeners while we are traversing through them. */
-  private[this] val lock = new ReentrantReadWriteLock()
-
-  private def withErrorHandling(f: QueryExecutionListener => Unit): Unit = {
-    for (listener <- listeners) {
-      try {
-        f(listener)
-      } catch {
-        case NonFatal(e) => logWarning("Error executing query execution listener", e)
-      }
-    }
-  }
-
-  /** Acquires a read lock on the cache for the duration of `f`. */
-  private def readLock[A](f: => A): A = {
-    val rl = lock.readLock()
-    rl.lock()
-    try f finally {
-      rl.unlock()
-    }
-  }
-
-  /** Acquires a write lock on the cache for the duration of `f`. */
-  private def writeLock[A](f: => A): A = {
-    val wl = lock.writeLock()
-    wl.lock()
-    try 
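
For context, the user-facing API this refactoring reroutes through the 
listener bus looks like this in use (a minimal example; the listener body is 
illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

val spark = SparkSession.builder().master("local[*]").getOrCreate()
spark.listenerManager.register(new QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName succeeded in ${durationNs / 1e6} ms")
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
})
spark.range(10).count() // goes through withAction and notifies the listener
```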

[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22686
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97191/
Test FAILed.


---




[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22686
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22686
  
**[Test build #97191 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97191/testReport)** for PR 22686 at commit [`54e59d5`](https://github.com/apache/spark/commit/54e59d53c0d91288536644819d8ead80e13475ea).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22686
  
**[Test build #97192 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97192/testReport)** for PR 22686 at commit [`d8de827`](https://github.com/apache/spark/commit/d8de82755590ef44f1f8e1ff0d8c30ecf2ef66f5).


---




[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22686
  
**[Test build #97191 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97191/testReport)** for PR 22686 at commit [`54e59d5`](https://github.com/apache/spark/commit/54e59d53c0d91288536644819d8ead80e13475ea).


---




[GitHub] spark issue #22685: [WIP][SQL][MINOR][Refactor] Refactor on sql/core

2018-10-10 Thread sadhen
Github user sadhen commented on the issue:

https://github.com/apache/spark/pull/22685
  
@HyukjinKwon OK, most of the changes are related to parens. I will consider 
resetting them for backporting friendliness.


---




[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22686
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3845/
Test PASSed.


---




[GitHub] spark issue #22686: [SPARK-25700][SQL] Partially revert append mode support ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22686
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22674: [SPARK-25680][SQL] SQL execution listener shouldn...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22674#discussion_r224008348
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -71,14 +72,35 @@ object SQLExecution {
   val callSite = sc.getCallSite()
 
   withSQLConfPropagated(sparkSession) {
-      sc.listenerBus.post(SparkListenerSQLExecutionStart(
-        executionId, callSite.shortForm, callSite.longForm, queryExecution.toString,
-        SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan), System.currentTimeMillis()))
+      var ex: Option[Exception] = None
+      val startTime = System.currentTimeMillis()
       try {
+        sc.listenerBus.post(SparkListenerSQLExecutionStart(
+          executionId = executionId,
+          description = callSite.shortForm,
+          details = callSite.longForm,
+          physicalPlanDescription = queryExecution.toString,
+          // `queryExecution.executedPlan` triggers query planning. If it fails, the exception
+          // will be caught and reported in the `SparkListenerSQLExecutionEnd`
+          sparkPlanInfo = SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan),
+          time = startTime))
         body
+      } catch {
+        case e: Exception =>
+          ex = Some(e)
+          throw e
       } finally {
-        sc.listenerBus.post(SparkListenerSQLExecutionEnd(
-          executionId, System.currentTimeMillis()))
+        val endTime = System.currentTimeMillis()
+        val event = SparkListenerSQLExecutionEnd(executionId, endTime)
+        // Currently only `Dataset.withAction` and `DataFrameWriter.runCommand` specify the `name`
+        // parameter. The `ExecutionListenerManager` only watches SQL executions with name. We
+        // can specify the execution name in more places in the future, so that
+        // `QueryExecutionListener` can track more cases.
+        event.executionName = name
+        event.duration = endTime - startTime
--- End diff --

ah good catch!


---




[GitHub] spark pull request #22677: [SPARK-25683][Core] Make AsyncEventQueue.lastRepo...

2018-10-10 Thread shivusondur
Github user shivusondur commented on a diff in the pull request:

https://github.com/apache/spark/pull/22677#discussion_r224007729
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/AsyncEventQueue.scala ---
@@ -159,6 +159,11 @@ private class AsyncEventQueue(
 
 val droppedCount = droppedEventsCounter.get
 if (droppedCount > 0) {
+      // If first time the dropEvent occurs, then record current time as the
+      // lastReportTimestamp.
+      if (lastReportTimestamp == 0) {
+        lastReportTimestamp = System.currentTimeMillis()
--- End diff --

Yes i will do


---




[GitHub] spark pull request #22674: [SPARK-25680][SQL] SQL execution listener shouldn...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22674#discussion_r224007732
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3356,21 +3356,11 @@ class Dataset[T] private[sql](
* user-registered callback functions.
*/
  private def withAction[U](name: String, qe: QueryExecution)(action: SparkPlan => U) = {
-    try {
-      qe.executedPlan.foreach { plan =>
-        plan.resetMetrics()
-      }
-      val start = System.nanoTime()
-      val result = SQLExecution.withNewExecutionId(sparkSession, qe) {
-        action(qe.executedPlan)
-      }
-      val end = System.nanoTime()
-      sparkSession.listenerManager.onSuccess(name, qe, end - start)
-      result
-    } catch {
-      case e: Exception =>
-        sparkSession.listenerManager.onFailure(name, qe, e)
-        throw e
+    qe.executedPlan.foreach { plan =>
--- End diff --

I don't think `resetMetrics` can throw exception...


---




[GitHub] spark pull request #22686: [SPARK-25700][SQL] Partially revert append mode s...

2018-10-10 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request:

https://github.com/apache/spark/pull/22686

[SPARK-25700][SQL] Partially revert append mode support in Data Source V2

## What changes were proposed in this pull request?

This PR partially reverts 
https://github.com/apache/spark/commit/5fef6e3513d6023a837c427d183006d153c7102b.

That commit happened to create a read support in the write path, which ended 
up reading the schema from the read support in the write path.

See also https://github.com/apache/spark/pull/22009#discussion_r223982672

## How was this patch tested?

Manually reproduced the issue.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HyukjinKwon/spark append-revert

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22686


commit 54e59d53c0d91288536644819d8ead80e13475ea
Author: hyukjinkwon 
Date:   2018-10-10T08:57:53Z

Partially revert append mode support in Datasource V2




---




[GitHub] spark pull request #22674: [SPARK-25680][SQL] SQL execution listener shouldn...

2018-10-10 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/22674#discussion_r224006828
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/util/QueryExecutionListener.scala 
---
@@ -75,95 +76,74 @@ trait QueryExecutionListener {
  */
 @Experimental
 @InterfaceStability.Evolving
-class ExecutionListenerManager private extends Logging {
-
-  private[sql] def this(conf: SparkConf) = {
-    this()
+// The `session` is used to indicate which session carries this listener manager, and we only
+// catch SQL executions which are launched by the same session.
+// The `loadExtensions` flag is used to indicate whether we should load the pre-defined,
+// user-specified listeners during construction. We should not do it when cloning this listener
+// manager, as we will copy all listeners to the cloned listener manager.
+class ExecutionListenerManager private[sql](session: SparkSession, loadExtensions: Boolean)
+  extends SparkListener with Logging {
+
+  private[this] val listeners = new CopyOnWriteArrayList[QueryExecutionListener]
+
+  if (loadExtensions) {
+    val conf = session.sparkContext.conf
     conf.get(QUERY_EXECUTION_LISTENERS).foreach { classNames =>
       Utils.loadExtensions(classOf[QueryExecutionListener], classNames, conf).foreach(register)
     }
   }
 
+  session.sparkContext.listenerBus.addToSharedQueue(this)
+
   /**
    * Registers the specified [[QueryExecutionListener]].
    */
   @DeveloperApi
-  def register(listener: QueryExecutionListener): Unit = writeLock {
-    listeners += listener
+  def register(listener: QueryExecutionListener): Unit = {
+    listeners.add(listener)
   }
 
   /**
    * Unregisters the specified [[QueryExecutionListener]].
    */
   @DeveloperApi
-  def unregister(listener: QueryExecutionListener): Unit = writeLock {
-    listeners -= listener
+  def unregister(listener: QueryExecutionListener): Unit = {
+    listeners.remove(listener)
   }
 
   /**
    * Removes all the registered [[QueryExecutionListener]].
    */
   @DeveloperApi
-  def clear(): Unit = writeLock {
+  def clear(): Unit = {
     listeners.clear()
   }
 
   /**
    * Get an identical copy of this listener manager.
    */
-  @DeveloperApi
-  override def clone(): ExecutionListenerManager = writeLock {
-    val newListenerManager = new ExecutionListenerManager
-    listeners.foreach(newListenerManager.register)
+  private[sql] def clone(session: SparkSession): ExecutionListenerManager = {
+    val newListenerManager = new ExecutionListenerManager(session, loadExtensions = false)
+    listeners.iterator().asScala.foreach(newListenerManager.register)
     newListenerManager
   }
 
-  private[sql] def onSuccess(funcName: String, qe: QueryExecution, duration: Long): Unit = {
-    readLock {
-      withErrorHandling { listener =>
-        listener.onSuccess(funcName, qe, duration)
+  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
+    case e: SparkListenerSQLExecutionEnd if shouldCatchEvent(e) =>
+      val funcName = e.executionName.get
+      e.executionFailure match {
+        case Some(ex) =>
+          listeners.iterator().asScala.foreach(_.onFailure(funcName, e.qe, ex))
+        case _ =>
+          listeners.iterator().asScala.foreach(_.onSuccess(funcName, e.qe, e.duration))
       }
-    }
-  }
 
-  private[sql] def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = {
-    readLock {
-      withErrorHandling { listener =>
-        listener.onFailure(funcName, qe, exception)
-      }
-    }
+    case _ => // Ignore
   }
 
-  private[this] val listeners = ListBuffer.empty[QueryExecutionListener]
-
-  /** A lock to prevent updating the list of listeners while we are traversing through them. */
-  private[this] val lock = new ReentrantReadWriteLock()
-
-  private def withErrorHandling(f: QueryExecutionListener => Unit): Unit = {
-    for (listener <- listeners) {
-      try {
-        f(listener)
-      } catch {
-        case NonFatal(e) => logWarning("Error executing query execution listener", e)
-      }
-    }
-  }
-
-  /** Acquires a read lock on the cache for the duration of `f`. */
-  private def readLock[A](f: => A): A = {
-    val rl = lock.readLock()
-    rl.lock()
-    try f finally {
-      rl.unlock()
-    }
-  }
-
-  /** Acquires a write lock on the cache for the duration of `f`. */
-  private def writeLock[A](f: => A): A = {
-    val wl = lock.writeLock()
-    wl.lock()
-    try f 

[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22684#discussion_r224005596
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilterSuite.scala
 ---
@@ -382,5 +382,40 @@ class OrcFilterSuite extends OrcTest with SharedSQLContext {
        ))
      )).get.toString
    }
+
+    // Can not remove unsupported `StringContains` predicate since it is under `Or` operator.
+    assert(OrcFilters.createFilter(schema, Array(
+      Or(
+        LessThan("a", 10),
+        And(
+          StringContains("b", "prefix"),
+          GreaterThan("a", 1)
+        )
+      )
+    )).isEmpty)
+
+    // Safely remove unsupported `StringContains` predicate and push down `LessThan`
+    assertResult("leaf-0 = (LESS_THAN a 10), expr = leaf-0") {
+      OrcFilters.createFilter(schema, Array(
+        And(
+          LessThan("a", 10),
+          StringContains("b", "prefix")
+        )
+      )).get.toString
+    }
+
+    // Safely remove unsupported `StringContains` predicate and push down `LessThan`
--- End diff --

There is another predicate pushed down. Shall we mention it too?
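
For reference, the asymmetry the test exercises, as a hedged sketch (not the 
actual `OrcFilters` code): a conjunct may be dropped because the scan then 
returns a superset of the rows and Spark re-applies the full predicate, while 
dropping a disjunct would wrongly discard rows.

```scala
import org.apache.spark.sql.sources._

def pushable(f: Filter, supported: Filter => Boolean): Option[Filter] = f match {
  case And(l, r) =>
    (pushable(l, supported), pushable(r, supported)) match {
      case (Some(a), Some(b)) => Some(And(a, b))
      case (a, b)             => a.orElse(b) // pushing one side alone is safe
    }
  case Or(l, r) =>
    // both sides must be pushable, otherwise push nothing
    for (a <- pushable(l, supported); b <- pushable(r, supported)) yield Or(a, b)
  case other => Some(other).filter(supported)
}
```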


---




[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22009#discussion_r224004348
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
@@ -169,15 +174,16 @@ object DataSourceV2Relation {
      options: Map[String, String],
      tableIdent: Option[TableIdentifier] = None,
      userSpecifiedSchema: Option[StructType] = None): DataSourceV2Relation = {
-    val reader = source.createReader(options, userSpecifiedSchema)
+    val readSupport = source.createReadSupport(options, userSpecifiedSchema)
--- End diff --

This is kind of a behavior change. Now when we append data to a data 
source, the data source must be readable in order to provide a schema, which 
will be used to validate the input data schema.

I don't have a strong feeling. Data source v2 is marked as evolving, so a 
necessary behavior change is fine. cc @rdblue 


---




[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...

2018-10-10 Thread sujith71955
Github user sujith71955 commented on the issue:

https://github.com/apache/spark/pull/16677
  
@viirya Are we also looking to optimize the `CollectLimitExec` part? I saw 
that in `SparkPlan` we have an `executeTake()` method which basically 
interpolates the number of partitions and processes the limit query. If the 
driver analyzed some statistics about the data, then I think we could 
optimize even this algorithm, right?


---




[GitHub] spark pull request #22674: [SPARK-25680][SQL] SQL execution listener shouldn...

2018-10-10 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/22674#discussion_r224000145
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -3356,21 +3356,11 @@ class Dataset[T] private[sql](
* user-registered callback functions.
*/
  private def withAction[U](name: String, qe: QueryExecution)(action: SparkPlan => U) = {
-    try {
-      qe.executedPlan.foreach { plan =>
-        plan.resetMetrics()
-      }
-      val start = System.nanoTime()
-      val result = SQLExecution.withNewExecutionId(sparkSession, qe) {
-        action(qe.executedPlan)
-      }
-      val end = System.nanoTime()
-      sparkSession.listenerManager.onSuccess(name, qe, end - start)
-      result
-    } catch {
-      case e: Exception =>
-        sparkSession.listenerManager.onFailure(name, qe, e)
-        throw e
+    qe.executedPlan.foreach { plan =>
--- End diff --

can this throw an exception? Imagine `df.count()` threw an exception, and 
then you ran it again. Won't this be a behavior change in that case?


---




[GitHub] spark pull request #22674: [SPARK-25680][SQL] SQL execution listener shouldn...

2018-10-10 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/22674#discussion_r224000809
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala ---
@@ -71,14 +72,35 @@ object SQLExecution {
   val callSite = sc.getCallSite()
 
   withSQLConfPropagated(sparkSession) {
-      sc.listenerBus.post(SparkListenerSQLExecutionStart(
-        executionId, callSite.shortForm, callSite.longForm, queryExecution.toString,
-        SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan), System.currentTimeMillis()))
+      var ex: Option[Exception] = None
+      val startTime = System.currentTimeMillis()
       try {
+        sc.listenerBus.post(SparkListenerSQLExecutionStart(
+          executionId = executionId,
+          description = callSite.shortForm,
+          details = callSite.longForm,
+          physicalPlanDescription = queryExecution.toString,
+          // `queryExecution.executedPlan` triggers query planning. If it fails, the exception
+          // will be caught and reported in the `SparkListenerSQLExecutionEnd`
+          sparkPlanInfo = SparkPlanInfo.fromSparkPlan(queryExecution.executedPlan),
+          time = startTime))
         body
+      } catch {
+        case e: Exception =>
+          ex = Some(e)
+          throw e
       } finally {
-        sc.listenerBus.post(SparkListenerSQLExecutionEnd(
-          executionId, System.currentTimeMillis()))
+        val endTime = System.currentTimeMillis()
+        val event = SparkListenerSQLExecutionEnd(executionId, endTime)
+        // Currently only `Dataset.withAction` and `DataFrameWriter.runCommand` specify the `name`
+        // parameter. The `ExecutionListenerManager` only watches SQL executions with name. We
+        // can specify the execution name in more places in the future, so that
+        // `QueryExecutionListener` can track more cases.
+        event.executionName = name
+        event.duration = endTime - startTime
--- End diff --

duration used to be reported in nanos. Now it's millis. I would still 
report it as nanos if possible.
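
A minimal sketch of keeping nanosecond durations while the end event still 
carries a wall-clock timestamp (illustrative, not the PR's code):

```scala
val startTimeMs = System.currentTimeMillis() // event timestamp, in millis
val startNs = System.nanoTime()              // monotonic base for the duration
val result = (1 to 1000000).sum              // stand-in for `body`
val durationNs = System.nanoTime() - startNs // report this, not endTime - startTime
```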


---




[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22682
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22682
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97190/
Test PASSed.


---




[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22682
  
**[Test build #97190 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97190/testReport)** for PR 22682 at commit [`dafbd2e`](https://github.com/apache/spark/commit/dafbd2e26166fae67c761b747f400023e0ec1abf).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor sql/core

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22685
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor sql/core

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22685
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97189/
Test FAILed.


---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor sql/core

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22685
  
**[Test build #97189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97189/testReport)** for PR 22685 at commit [`2d0debb`](https://github.com/apache/spark/commit/2d0debb0abd401b2fe68fe0b934a4a6939f7cb27).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor sql/core

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22685
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22682
  
**[Test build #97190 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97190/testReport)** for PR 22682 at commit [`dafbd2e`](https://github.com/apache/spark/commit/dafbd2e26166fae67c761b747f400023e0ec1abf).


---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor sql/core

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22685
  
**[Test build #97189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97189/testReport)** for PR 22685 at commit [`2d0debb`](https://github.com/apache/spark/commit/2d0debb0abd401b2fe68fe0b934a4a6939f7cb27).


---




[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22682
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3844/
Test PASSed.


---




[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22682
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22685: [SQL][MINOR][Refactor] Refactor sql/core

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22685
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #22685: [SQL][MINOR][Refactor] Refactor sql/core

2018-10-10 Thread sadhen
GitHub user sadhen opened a pull request:

https://github.com/apache/spark/pull/22685

[SQL][MINOR][Refactor] Refactor sql/core

## What changes were proposed in this pull request?
Only minor changes to Scala syntax.

## How was this patch tested?
Existing Tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sadhen/spark minor/refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22685


commit 2d0debb0abd401b2fe68fe0b934a4a6939f7cb27
Author: Darcy Shen 
Date:   2018-10-10T08:58:22Z

refactor sql/core according to Intellij




---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-10-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22010
  
If this is not yet in 2.4 it shouldn’t be merged now.

On Wed, Oct 10, 2018 at 10:57 AM Holden Karau wrote:

> Open question: is this suitable for branch-2.4 since it predates the
> branch cut or not? (I know we've gone back and forth on how we do that).



---




[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...

2018-10-10 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/22010
  
Open question: is this suitable for branch-2.4 since it predates the branch 
cut or not? (I know we've gone back and forth on how we do that).


---




[GitHub] spark issue #22598: [SPARK-25501][SS] Add kafka delegation token support.

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22598
  
**[Test build #97188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97188/testReport)** for PR 22598 at commit [`fffd139`](https://github.com/apache/spark/commit/fffd139a5cb62fe1f329cf639a9212ff1f4b751c).


---




[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22684
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3843/
Test PASSed.


---




[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22684
  
**[Test build #97187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97187/testReport)** for PR 22684 at commit [`28322b2`](https://github.com/apache/spark/commit/28322b2efacd5fd1787c0ca1fe64e5db4a5b25fc).


---




[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22684
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22009#discussion_r223988838
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
@@ -169,15 +174,16 @@ object DataSourceV2Relation {
      options: Map[String, String],
      tableIdent: Option[TableIdentifier] = None,
      userSpecifiedSchema: Option[StructType] = None): DataSourceV2Relation = {
-    val reader = source.createReader(options, userSpecifiedSchema)
+    val readSupport = source.createReadSupport(options, userSpecifiedSchema)
--- End diff --

This looks like a regression compared to 2.3. Data Source V2 is under heavy 
development, so I understand, but this is quite crucial. From a cursory look, 
this was introduced in 
https://github.com/apache/spark/commit/5fef6e3513d6023a837c427d183006d153c7102b

I would suggest partially reverting this commit.


---




[GitHub] spark issue #22684: [SPARK-25699][SQL] Partially push down conjunctive predi...

2018-10-10 Thread gengliangwang
Github user gengliangwang commented on the issue:

https://github.com/apache/spark/pull/22684
  
@dbtsai @gatorsmile 


---




[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-10 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22598#discussion_r223987644
  
--- Diff: 
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/TokenUtilSuite.scala
 ---
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import org.apache.kafka.clients.CommonClientConfigs
+import org.apache.kafka.common.config.SaslConfigs
+import org.scalatest.BeforeAndAfterEach
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.internal.config._
+
+class TokenUtilSuite extends SparkFunSuite with BeforeAndAfterEach {
+  private val bootStrapServers = "127.0.0.1:0"
+  private val plainSecurityProtocol = "SASL_PLAINTEXT"
+  private val sslSecurityProtocol = "SASL_SSL"
+  private val trustStoreLocation = "/path/to/trustStore"
+  private val trustStorePassword = "secret"
+  private val keytab = "/path/to/keytab"
+  private val kerberosServiceName = "kafka"
+  private val principal = "u...@domain.com"
+
+  private var sparkConf: SparkConf = null
+
+  override def beforeEach(): Unit = {
+super.beforeEach()
+sparkConf = new SparkConf()
+  }
+
+  test("createAdminClientProperties without bootstrap servers should throw 
exception") {
+val thrown = intercept[IllegalArgumentException] {
+  TokenUtil.createAdminClientProperties(sparkConf)
+}
+assert(thrown.getMessage contains
+  "Tried to obtain kafka delegation token but bootstrap servers not 
configured.")
+  }
+
+  test("createAdminClientProperties without SSL protocol should not take 
over truststore config") {
+sparkConf.set(KAFKA_BOOTSTRAP_SERVERS, bootStrapServers)
+sparkConf.set(KAFKA_SECURITY_PROTOCOL, plainSecurityProtocol)
+sparkConf.set(KAFKA_TRUSTSTORE_LOCATION, trustStoreLocation)
+sparkConf.set(KAFKA_TRUSTSTORE_PASSWORD, trustStoreLocation)
+
+val adminClientProperties = 
TokenUtil.createAdminClientProperties(sparkConf)
+
+
assert(adminClientProperties.get(CommonClientConfigs.BOOTSTRAP_SERVERS_CONFIG)
+  .equals(bootStrapServers))
--- End diff --

No particular reason, changed.


---




[GitHub] spark pull request #22598: [SPARK-25501][SS] Add kafka delegation token supp...

2018-10-10 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request:

https://github.com/apache/spark/pull/22598#discussion_r223987456
  
--- Diff: 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSecurityHelper.scala
 ---
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import org.apache.hadoop.security.UserGroupInformation
+import org.apache.hadoop.security.token.{Token, TokenIdentifier}
+import org.apache.kafka.common.security.scram.ScramLoginModule
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config._
+
+private[kafka010] object KafkaSecurityHelper extends Logging {
+  def getKeytabJaasParams(sparkConf: SparkConf): Option[String] = {
+val keytab = sparkConf.get(KEYTAB)
+if (keytab.isDefined) {
+  val serviceName = sparkConf.get(KAFKA_KERBEROS_SERVICE_NAME)
+  require(serviceName.nonEmpty, "Kerberos service name must be 
defined")
+  val principal = sparkConf.get(PRINCIPAL)
+  require(principal.nonEmpty, "Principal must be defined")
+
+  val params =
+s"""
+|${getKrb5LoginModuleName} required
+| useKeyTab=true
+| serviceName="${serviceName.get}"
+| keyTab="${keytab.get}"
+| principal="${principal.get}";
+""".stripMargin.replace("\n", "")
+  logDebug(s"Krb JAAS params: $params")
+  Some(params)
+} else {
+  None
+}
+  }
+
+  private def getKrb5LoginModuleName(): String = {
--- End diff --

Added.
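For context, a sketch of how the generated JAAS string would typically be 
consumed; the call site is an assumption (`sparkConf` is presumed in scope), 
though `SaslConfigs.SASL_JAAS_CONFIG` is the standard Kafka client property:

    import java.{util => ju}
    import org.apache.kafka.common.config.SaslConfigs

    val kafkaParams = new ju.HashMap[String, Object]()
    KafkaSecurityHelper.getKeytabJaasParams(sparkConf).foreach { jaasParams =>
      // e.g. "com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true ..."
      kafkaParams.put(SaslConfigs.SASL_JAAS_CONFIG, jaasParams)
    }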


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22684: [SPARK-25699][SQL] Partially push down conjunctiv...

2018-10-10 Thread gengliangwang
GitHub user gengliangwang opened a pull request:

https://github.com/apache/spark/pull/22684

[SPARK-25699][SQL] Partially push down conjunctive predicates in Orc

## What changes were proposed in this pull request?

Inspired by https://github.com/apache/spark/pull/22574 .
We can partially push down top level conjunctive predicates to Orc.
The code changes cover the Orc data source in both the SQL and Hive modules.

## How was this patch tested?

New unit test.
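For readers unfamiliar with partial pushdown, a minimal illustrative sketch of 
the idea on a toy predicate tree (not the PR's code, which works on Spark's 
`Filter` and Orc `SearchArgument` types): for a top-level `And`, pushing only 
the convertible conjunct is safe because `And(a, b)` implies both `a` and `b`.

    sealed trait Pred
    case class And(left: Pred, right: Pred) extends Pred
    case class Leaf(name: String, pushable: Boolean) extends Pred

    // Returns the largest sub-predicate that can be pushed to the source.
    def pushablePart(p: Pred): Option[Pred] = p match {
      case And(l, r) =>
        (pushablePart(l), pushablePart(r)) match {
          case (Some(pl), Some(pr)) => Some(And(pl, pr))
          case (some @ Some(_), None) => some   // push only the left conjunct
          case (None, some @ Some(_)) => some   // push only the right conjunct
          case _ => None
        }
      case leaf @ Leaf(_, true) => Some(leaf)
      case _ => None
    }

The conjuncts that are not pushed are still evaluated by Spark after the scan, 
so correctness does not depend on the source honoring the pushed filter.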


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/gengliangwang/spark pushOrcFilters

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22684


commit e2dee098b2efb80238e38068d2ad8681504e03e0
Author: Gengliang Wang 
Date:   2018-10-10T07:00:13Z

push more predicates

commit 798ed5317f953c0aaaf6c5e92126b26f15fe66c0
Author: Gengliang Wang 
Date:   2018-10-10T08:34:34Z

handle hive orc

commit 28322b2efacd5fd1787c0ca1fe64e5db4a5b25fc
Author: Gengliang Wang 
Date:   2018-10-10T08:43:29Z

add jira number in test case




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r223983702
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -189,23 +192,34 @@ class QueryExecution(val sparkSession: SparkSession, 
val logical: LogicalPlan) {
   """.stripMargin.trim
   }
 
+  private def writeOrError(writer: Writer)(f: Writer => Unit): Unit =
+try f(writer) catch { case e: AnalysisException => 
writer.write(e.toString) }
--- End diff --

Please use multiple lines here.
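For illustration, a multi-line formatting along the lines requested (a sketch, 
not necessarily the author's final change):

    private def writeOrError(writer: Writer)(f: Writer => Unit): Unit = {
      try {
        f(writer)
      } catch {
        case e: AnalysisException => writer.write(e.toString)
      }
    }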


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r223983537
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
@@ -167,6 +172,58 @@ package object util {
 builder.toString()
   }
 
+  /**
+   * The performance overhead of creating and logging strings for wide 
schemas can be large. To
+   * limit the impact, we bound the number of fields to include by 
default. This can be overridden
+   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv or by 
settings the SQL config
+   * `spark.sql.debug.maxToStringFields`.
+   */
+  private[spark] def maxNumToStringFields: Int = {
+val legacyLimit = if (SparkEnv.get != null) {
+  SparkEnv.get.conf.get(config.MAX_TO_STRING_FIELDS)
+} else {
+  config.MAX_TO_STRING_FIELDS.defaultValue.get
+}
+val sqlConfLimit = SQLConf.get.maxToStringFields
+
+Math.max(sqlConfLimit, legacyLimit)
+  }
+
+  /** Whether we have warned about plan string truncation yet. */
+  private val truncationWarningPrinted = new AtomicBoolean(false)
+
+  /**
+   * Format a sequence with semantics similar to calling .mkString(). Any 
elements beyond
+   * maxNumToStringFields will be dropped and replaced by a "... N more 
fields" placeholder.
+   *
+   * @return the trimmed and formatted string.
+   */
+  def truncatedString[T](
+  seq: Seq[T],
+  start: String,
+  sep: String,
+  end: String,
+  maxFields: Option[Int]): String = {
+val maxNumFields = maxFields.getOrElse(maxNumToStringFields)
--- End diff --

You should document this behavior.
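For reference, a self-contained sketch of the documented semantics with an 
explicit limit (assumed behavior, not the exact Spark implementation):

    def truncatedString[T](
        seq: Seq[T], start: String, sep: String, end: String, maxNumFields: Int): String = {
      if (seq.length > maxNumFields) {
        // Keep maxNumFields - 1 elements and collapse the rest into a placeholder.
        val kept = math.max(0, maxNumFields - 1)
        seq.take(kept).mkString(start, sep, sep + s"... ${seq.length - kept} more fields" + end)
      } else {
        seq.mkString(start, sep, end)
      }
    }

    truncatedString(1 to 5, "[", ", ", "]", 3)  // "[1, 2, ... 3 more fields]"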


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22009#discussion_r223982672
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala
 ---
@@ -169,15 +174,16 @@ object DataSourceV2Relation {
   options: Map[String, String],
   tableIdent: Option[TableIdentifier] = None,
   userSpecifiedSchema: Option[StructType] = None): 
DataSourceV2Relation = {
-val reader = source.createReader(options, userSpecifiedSchema)
+val readSupport = source.createReadSupport(options, 
userSpecifiedSchema)
--- End diff --

This looks not directly related to this PR, but I think this is a good place 
to ask: why do we create a ReadSupport in the write path?


https://github.com/apache/spark/blob/e06da95cd9423f55cdb154a2778b0bddf7be984c/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L249

Retrieving the physical schema of the underlying storage is potentially 
expensive. Even worse, it looks odd that the write path requires the read 
side's schema at all. Which schema should we expect here in the write path?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r223982046
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala ---
@@ -167,6 +172,58 @@ package object util {
 builder.toString()
   }
 
+  /**
+   * The performance overhead of creating and logging strings for wide 
schemas can be large. To
+   * limit the impact, we bound the number of fields to include by 
default. This can be overridden
+   * by setting the 'spark.debug.maxToStringFields' conf in SparkEnv or by 
setting the SQL config
+   * `spark.sql.debug.maxToStringFields`.
+   */
+  private[spark] def maxNumToStringFields: Int = {
+val legacyLimit = if (SparkEnv.get != null) {
--- End diff --

Just for context, why do you want to retain the legacy behavior? It is 
probably fine to break it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r223980665
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala 
---
@@ -455,21 +457,37 @@ abstract class TreeNode[BaseType <: 
TreeNode[BaseType]] extends Product {
   }.mkString(", ")
 
   /** ONE line description of this node. */
-  def simpleString: String = s"$nodeName $argString".trim
+  def simpleString(maxFields: Option[Int]): String = {
--- End diff --

Please document the `maxFields` parameter. I am especially interested in 
what `None` represents here.
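One possible Scaladoc wording, assuming `None` does mean "fall back to the 
global limit" (the author should confirm):

    /**
     * ONE line description of this node.
     *
     * @param maxFields maximum number of fields to include in the description;
     *   `None` means the globally configured limit
     *   (`spark.sql.debug.maxToStringFields`) is used instead.
     */
    def simpleString(maxFields: Option[Int]): String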


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r223979931
  
--- Diff: 
external/avro/src/main/scala/org/apache/spark/sql/avro/CatalystDataToAvro.scala 
---
@@ -52,7 +52,7 @@ case class CatalystDataToAvro(child: Expression) extends 
UnaryExpression {
 out.toByteArray
   }
 
-  override def simpleString: String = {
+  override def simpleString(maxFields: Option[Int]): String = {
 s"to_avro(${child.sql}, ${child.dataType.simpleString})"
--- End diff --

Should we also pass the maxFields to `child.dataType`? For example, 
`StructType` fields are truncated.
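For illustration, the truncation-aware variant hinted at might read as follows, 
assuming `DataType.simpleString` gains a matching `maxFields` overload (that 
overload is an assumption, not part of this PR):

    override def simpleString(maxFields: Option[Int]): String = {
      // Propagate the limit so wide StructTypes are truncated consistently.
      s"to_avro(${child.sql}, ${child.dataType.simpleString(maxFields)})"
    }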


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-10-10 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r223979392
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -633,4 +633,14 @@ package object config {
   .stringConf
   .toSequence
   .createWithDefault(Nil)
+
+  private[spark] val MAX_TO_STRING_FIELDS =
+ConfigBuilder("spark.debug.maxToStringFields")
+  .internal()
+  .doc("Maximum number of fields of sequence-like entries that can be 
converted to strings " +
--- End diff --

What is a sequence like entry?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22429: [SPARK-25440][SQL] Dumping query execution info to a fil...

2018-10-10 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/22429
  
@boy-uber the thing you are suggesting is a pretty big undertaking and 
beyond the scope of this PR.

If you are going to add structured plans to the explain output, you 
probably also want some guarantees about stability over multiple spark versions 
and you probably also want to be able to reconstruct the plan. Neither is easy. 
If you want to have this in Spark, then I suggest sending a proposal to the dev 
list.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22678: [SPARK-25685][BUILD] Allow running tests in Jenki...

2018-10-10 Thread LantaoJin
Github user LantaoJin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22678#discussion_r223966833
  
--- Diff: dev/run-tests-jenkins.py ---
@@ -176,7 +177,8 @@ def main():
 build_display_name = os.environ["BUILD_DISPLAY_NAME"]
 build_url = os.environ["BUILD_URL"]
 
-commit_url = "https://github.com/apache/spark/commit/" + ghprb_actual_commit
+project_url = os.getenv("SPARK_PROJECT_URL", "https://github.com/apache/spark")
+commit_url = project_url + "/commit/" + ghprb_actual_commit
--- End diff --

How about adding documentation at 
https://spark.apache.org/docs/latest/building-spark.html?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update...

2018-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22682#discussion_r223966211
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1890,6 +1890,10 @@ working with timestamps in `pandas_udf`s to get the 
best performance, see
 
 # Migration Guide
 
+## Upgrading From Spark SQL 2.4 to 3.0
+
  - In PySpark, when creating a `SparkSession` with 
`SparkSession.builder.getOrCreate()`, if there is an existing `SparkContext`, 
the builder tried to update the `SparkConf` of the existing `SparkContext` 
with configurations specified to the builder; but the `SparkContext` is 
shared by all `SparkSession`s, so we should not update it. Since 3.0, the 
builder no longer updates the configurations. If you want to update them, you 
need to do so prior to creating the `SparkSession`.
--- End diff --

let's also mention that this is already the behavior for the Spark Java/Scala 
APIs.
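
A sketch of the resulting behavior in Scala API terms (illustrative; "k1" is a 
made-up key):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SparkSession

    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("demo"))
    val spark = SparkSession.builder().config("k1", "v1").getOrCreate()
    spark.conf.get("k1")       // "v1": visible on the new session
    sc.getConf.contains("k1")  // false: the shared SparkContext's conf is untouched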


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22682
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22682
  
**[Test build #97186 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97186/testReport)**
 for PR 22682 at commit 
[`3bf42af`](https://github.com/apache/spark/commit/3bf42af5695729597e1f49b11f50f2fb450ead7f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22682
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97186/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22683
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22575: [SPARK-24630][SS][WIP] Support SQLStreaming in Spark

2018-10-10 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/spark/pull/22575
  
Is this still a WIP?
Using an isStreaming tag in DDL to mark whether a table is streaming is 
brilliant; it stays compatible with batch SQL queries.
If possible, I think it is better not to introduce a STREAM keyword in DML. 
Maybe we can use table properties (like `isStreaming`) of the tables 
participating in a query to decide whether to generate a StreamingRelation or 
a batch relation. What do you think?
SQLStreaming is an important part of SS in my view, as it makes SS more 
complete and usable. Thanks for your work!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22683: [SPARK-25696] The storage memory displayed on spa...

2018-10-10 Thread httfighter
GitHub user httfighter opened a pull request:

https://github.com/apache/spark/pull/22683

[SPARK-25696] The storage memory displayed on the Spark Application UI is 
incorrect.

## What changes were proposed in this pull request?
Change the base of the unit conversion in the formatBytes function to 1024.

## How was this patch tested?
 manual tests

Please review http://spark.apache.org/contributing.html before opening a 
pull request.
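
For illustration only, the intended base-1024 arithmetic looks like this (the 
real change is in the web UI's JavaScript formatBytes helper; this Scala 
version and its unit labels are just a stand-in):

    def formatBytes(bytes: Double): String = {
      val units = Seq("B", "KB", "MB", "GB", "TB")
      var value = bytes
      var i = 0
      while (value >= 1024 && i < units.size - 1) {
        value /= 1024  // divide by 1024, not 1000
        i += 1
      }
      f"$value%.1f ${units(i)}"
    }

    formatBytes(512d * 1024 * 1024)  // "512.0 MB"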


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/httfighter/spark SPARK-25696

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22683.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22683


commit 9e45697296039e55e85dd204788e287c9c60fceb
Author: 韩田田00222924 
Date:   2018-10-10T06:47:36Z

[SPARK-25696] The storage memory displayed on spark Application UI is 
incorrect.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97183/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21669
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21669: [SPARK-23257][K8S] Kerberos Support for Spark on K8S

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21669
  
**[Test build #97183 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97183/testReport)**
 for PR 21669 at commit 
[`a958920`](https://github.com/apache/spark/commit/a9589203ef9262414c474febadaf9e809d28ac8c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22545: [SPARK-25525][SQL][PYSPARK] Do not update conf fo...

2018-10-10 Thread ueshin
Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/22545#discussion_r223953269
  
--- Diff: python/pyspark/sql/session.py ---
@@ -156,7 +156,7 @@ def getOrCreate(self):
 default.
 
 >>> s1 = SparkSession.builder.config("k1", "v1").getOrCreate()
->>> s1.conf.get("k1") == s1.sparkContext.getConf().get("k1") 
== "v1"
+>>> s1.conf.get("k1") == "v1"
--- End diff --

Submitted a pr to update the migration guide #22682.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22682
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3842/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22682
  
**[Test build #97186 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97186/testReport)**
 for PR 22682 at commit 
[`3bf42af`](https://github.com/apache/spark/commit/3bf42af5695729597e1f49b11f50f2fb450ead7f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22682
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update...

2018-10-10 Thread ueshin
GitHub user ueshin opened a pull request:

https://github.com/apache/spark/pull/22682

[SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the migration guide.

## What changes were proposed in this pull request?

This is a follow-up pr of #18536 and #22545 to update the migration guide.

## How was this patch tested?

Build and check the doc locally.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ueshin/apache-spark 
issues/SPARK-20946_25525/migration_guide

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22682.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22682


commit 3bf42af5695729597e1f49b11f50f2fb450ead7f
Author: Takuya UESHIN 
Date:   2018-10-10T06:29:20Z

Update the migration guide.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22682: [SPARK-20946][SPARK-25525][SQL][FOLLOW-UP] Update the mi...

2018-10-10 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/22682
  
cc @gatorsmile 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22575: [SPARK-24630][SS][WIP] Support SQLStreaming in Spark

2018-10-10 Thread WangTaoTheTonic
Github user WangTaoTheTonic commented on the issue:

https://github.com/apache/spark/pull/22575
  
ok to test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22466
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22466
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97182/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22466: [SPARK-25464][SQL] Create Database to the location,only ...

2018-10-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22466
  
**[Test build #97182 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97182/testReport)**
 for PR 22466 at commit 
[`5d95a09`](https://github.com/apache/spark/commit/5d95a09122436217a8a6e891cfb96babfe49fb2b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


