[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20926
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20926
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88688/
Test PASSed.


---




[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20926
  
**[Test build #88688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88688/testReport)**
 for PR 20926 at commit 
[`851a5ef`](https://github.com/apache/spark/commit/851a5efa87a9f10843ec9d45437d6a5d94cc0816).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20812: [SPARK-23669] Executors fetch jars and name the jars wit...

2018-03-28 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20812
  
@jinxing64 , I think using same-named jars that contain different classes is not a good practice. Ideally, different UDFs should be packaged in different jars with distinct names/versions; that makes them easier for users to manage, and same-named jars can easily cause classpath issues.

Since you already have a workaround for this outside of Spark, I would suggest not fixing it, as this is a fairly user-specific issue.
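For illustration only, a minimal sketch of the suggested practice, with made-up jar paths, class name, and UDF name (none of these come from this PR): each application registers the versioned UDF jar it was built against, so two different builds never collide under a single jar name on the executor classpath.

```scala
import org.apache.spark.sql.SparkSession

object VersionedUdfJarExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("versioned-udf-jar-example").getOrCreate()

    // Hypothetical versioned jar: the file name encodes the version, so it is
    // unambiguous which classes end up on the executor classpath.
    spark.sparkContext.addJar("hdfs:///libs/my-udfs-1.1.0.jar")

    // Hypothetical Hive-style UDF class shipped in that jar.
    spark.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.udf.MyUdf'")
    spark.sql("SELECT my_udf('some input')").show()

    spark.stop()
  }
}
```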


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1823/
Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20860
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20860
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1822/
Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #88693 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88693/testReport)**
 for PR 19222 at commit 
[`b69cb64`](https://github.com/apache/spark/commit/b69cb6430d71fe6ce7a39f9d6a13bdcfa8704ccf).


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20860
  
**[Test build #88692 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88692/testReport)**
 for PR 20860 at commit 
[`2ea9b7a`](https://github.com/apache/spark/commit/2ea9b7a58279d0e5d7cdfad8d67ab9227983be1a).


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-28 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20860
  
LGTM. I'm also playing around with the isolated Hive classloader these days.


---




[GitHub] spark issue #20860: [SPARK-23743][SQL] Changed a comparison logic from conta...

2018-03-28 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20860
  
Jenkins, retest this please.


---




[GitHub] spark issue #20876: [SPARK-23653][SQL] Capture sql statements user input and...

2018-03-28 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/20876
  
Hi @jerryshao @cloud-fan, could I get an update on this?


---




[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20920
  
LGTM.


---




[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/20920
  
LGTM


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-28 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r177958142
  
--- Diff: 
common/unsafe/src/main/java/org/apache/spark/unsafe/memory/ByteArrayMemoryBlock.java
 ---
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.unsafe.memory;
+
+import com.google.common.primitives.Ints;
+
+import org.apache.spark.unsafe.Platform;
+
+/**
+ * A consecutive block of memory with a byte array on Java heap.
+ */
+public final class ByteArrayMemoryBlock extends MemoryBlock {
+
+  private final byte[] array;
+
+  public ByteArrayMemoryBlock(byte[] obj, long offset, long size) {
+super(obj, offset, size);
+this.array = obj;
+assert(offset + size <= Platform.BYTE_ARRAY_OFFSET + obj.length) :
--- End diff --

Adding this assertion causes a new failure in [`UTF8StringSuite.writeToOutputStreamUnderflow()`](https://github.com/apache/spark/pull/19222/files#diff-321a62638d3ef7bbc9c35842967c868bR515).


---




[GitHub] spark issue #20928: Fix small typo in configuration doc

2018-03-28 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20928
  
It would be better to check the other docs as well, not only the configuration doc here.


---




[GitHub] spark issue #20931: [SPARK-23815][Core]Spark writer dynamic partition overwr...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20931
  
Can one of the admins verify this patch?


---




[GitHub] spark issue #20931: [SPARK-23815][Core]Spark writer dynamic partition overwr...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20931
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20931: [SPARK-23815][Core]Spark writer dynamic partition...

2018-03-28 Thread fangshil
GitHub user fangshil opened a pull request:

https://github.com/apache/spark/pull/20931

[SPARK-23815][Core]Spark writer dynamic partition overwrite mode may fail 
to write output on multi level partition


## What changes were proposed in this pull request?

Spark introduced a new writer mode in SPARK-20236 that overwrites only the related partitions. While using this feature in our production cluster, we found a bug when writing multi-level partitions on HDFS.

A simple test case to reproduce this issue:

val df = Seq(("1", "2", "3")).toDF("col1", "col2", "col3")
df.write.partitionBy("col1", "col2").mode("overwrite").save("/my/hdfs/location")

If HDFS location "/my/hdfs/location" does not exist, there will be no output.

This seems to be caused by the job commit change that SPARK-20236 made in HadoopMapReduceCommitProtocol.

During job commit, the output has been written into the staging dir /my/hdfs/location/.spark-staging.xxx/col1=1/col2=2, and the code then calls fs.rename to rename /my/hdfs/location/.spark-staging.xxx/col1=1/col2=2 to /my/hdfs/location/col1=1/col2=2. However, in our case the operation fails on HDFS because /my/hdfs/location/col1=1 does not exist: HDFS rename cannot create more than one level of missing parent directories.

This does not happen in the unit test added with SPARK-20236, because it runs against the local file system.

We are proposing a fix: when cleaning the current partition dir /my/hdfs/location/col1=1/col2=2 before the rename op, if the delete op fails (because /my/hdfs/location/col1=1/col2=2 may not exist), we call mkdirs to create the parent dir /my/hdfs/location/col1=1 (if it does not exist) so that the following rename op can succeed.
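A minimal sketch of the proposed behavior, written directly against the Hadoop FileSystem API rather than the actual HadoopMapReduceCommitProtocol code (the method and variable names below are illustrative only):

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Sketch for a single dynamic partition, e.g.
//   stagingPartition = /my/hdfs/location/.spark-staging.xxx/col1=1/col2=2
//   finalPartition   = /my/hdfs/location/col1=1/col2=2
def commitOnePartition(fs: FileSystem, stagingPartition: Path, finalPartition: Path): Unit = {
  // Clean up any existing data for this partition before renaming the staging dir into place.
  val deleted = fs.delete(finalPartition, true)
  if (!deleted && !fs.exists(finalPartition.getParent)) {
    // The delete was a no-op because the partition (and its parent) does not exist yet.
    // HDFS rename does not create missing parent directories for multi-level partitions,
    // so create the parent explicitly before the rename.
    fs.mkdirs(finalPartition.getParent)
  }
  fs.rename(stagingPartition, finalPartition)
}
```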





## How was this patch tested?

We have tested this patch on our production cluster and it fixed the problem.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fangshil/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20931.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20931


commit da63c17d7ae7fbf04cc474d946d61a098b3e1ade
Author: Fangshi Li 
Date:   2018-03-28T04:25:54Z

Spark writer dynamic partition overwrite mode may fail to write output on 
multi level partition




---




[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20920
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20920
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1821/
Test PASSed.


---




[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20920
  
**[Test build #88691 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88691/testReport)**
 for PR 20920 at commit 
[`35ecbf9`](https://github.com/apache/spark/commit/35ecbf983b675b7fa5643c4c395995e4dca2647e).


---




[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20920
  
retest this please


---




[GitHub] spark pull request #20922: Roll forward "[SPARK-23096][SS] Migrate rate sour...

2018-03-28 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20922#discussion_r177953871
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateStreamProvider.scala
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.sources
+
+import java.util.Optional
+
+import org.apache.spark.network.util.JavaUtils
+import org.apache.spark.sql.AnalysisException
+import 
org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader
+import org.apache.spark.sql.sources.DataSourceRegister
+import org.apache.spark.sql.sources.v2._
+import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousReader, 
MicroBatchReader}
+import org.apache.spark.sql.types._
+
+/**
+ *  A source that generates increment long values with timestamps. Each 
generated row has two
+ *  columns: a timestamp column for the generated time and an auto 
increment long column starting
+ *  with 0L.
+ *
+ *  This source supports the following options:
+ *  - `rowsPerSecond` (e.g. 100, default: 1): How many rows should be 
generated per second.
+ *  - `rampUpTime` (e.g. 5s, default: 0s): How long to ramp up before the 
generating speed
+ *becomes `rowsPerSecond`. Using finer granularities than seconds will 
be truncated to integer
+ *seconds.
+ *  - `numPartitions` (e.g. 10, default: Spark's default parallelism): The 
partition number for the
+ *generated rows. The source will try its best to reach 
`rowsPerSecond`, but the query may
+ *be resource constrained, and `numPartitions` can be tweaked to help 
reach the desired speed.
+ */
+class RateStreamProvider extends DataSourceV2
+  with MicroBatchReadSupport with ContinuousReadSupport with 
DataSourceRegister {
+  import RateStreamProvider._
+
+  override def createMicroBatchReader(
+  schema: Optional[StructType],
+  checkpointLocation: String,
+  options: DataSourceOptions): MicroBatchReader = {
--- End diff --

Thanks for the explanation @jose-torres . This seems like a fairly common usage scenario: I also see that the socket source and console sink require a SparkSession, as does my customized Hive streaming sink (https://github.com/jerryshao/spark-hive-streaming-sink/blob/7b3afcee280d2e70ffb12dde24184726b618829d/core/src/main/scala/com/hortonworks/spark/hive/HiveSourceProvider.scala#L46). If we added that parameter back, things might be much easier.

What's your opinion, @cloud-fan?
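For context, a minimal sketch of the fallback pattern sources rely on today when the API does not hand a session in, using only the public `SparkSession.getActiveSession` / `getDefaultSession` accessors (the provider class name is made up):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: how a source/sink provider can obtain the session today
// when the createMicroBatchReader signature does not pass one in.
class MyStreamingSourceProvider {
  private def sparkSession: SparkSession = {
    SparkSession.getActiveSession
      .orElse(SparkSession.getDefaultSession)
      .getOrElse(throw new IllegalStateException("No active or default SparkSession found"))
  }

  // Example use: derive a default partition count from the running context.
  def defaultNumPartitions: Int = sparkSession.sparkContext.defaultParallelism
}
```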


---




[GitHub] spark issue #20930: [SPARK-23811][Core] Same tasks' FetchFailed event comes ...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20930
  
**[Test build #88690 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88690/testReport)**
 for PR 20930 at commit 
[`2907075`](https://github.com/apache/spark/commit/2907075b43eac26c7efbe4aca5f2c037bb5934c2).


---




[GitHub] spark issue #20930: [SPARK-23811][Core] Same tasks' FetchFailed event comes ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...

2018-03-28 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19222#discussion_r177953620
  
--- Diff: 
common/unsafe/src/test/java/org/apache/spark/unsafe/types/UTF8StringSuite.java 
---
@@ -515,7 +518,8 @@ public void writeToOutputStreamUnderflow() throws IOException {
 final byte[] test = "01234567".getBytes(StandardCharsets.UTF_8);
 
 for (int i = 1; i <= Platform.BYTE_ARRAY_OFFSET; ++i) {
-  UTF8String.fromAddress(test, Platform.BYTE_ARRAY_OFFSET - i, test.length + i)
+  new UTF8String(
+new ByteArrayMemoryBlock(test, Platform.BYTE_ARRAY_OFFSET - i, test.length + i))
--- End diff --

I thought this is what you said 
[here](https://github.com/apache/spark/pull/19222#discussion_r176986304).


---




[GitHub] spark issue #20930: [SPARK-23811][Core] Same tasks' FetchFailed event comes ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20930
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1820/
Test PASSed.


---




[GitHub] spark issue #20930: [SPARK-23811][Core] Same tasks' FetchFailed event comes ...

2018-03-28 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/20930
  
The scenario can be reproduced by the test case below, added in `DAGSchedulerSuite`:
```scala
  /**
   * This tests the case where the origin task succeeds after its speculative task got a
   * FetchFailed earlier.
   */
  test("[SPARK-23811] Fetch failed task should kill other attempt") {
    // Create 3 RDDs with shuffle dependencies on each other: rddA <--- rddB <--- rddC
    val rddA = new MyRDD(sc, 2, Nil)
    val shuffleDepA = new ShuffleDependency(rddA, new HashPartitioner(2))
    val shuffleIdA = shuffleDepA.shuffleId

    val rddB = new MyRDD(sc, 2, List(shuffleDepA), tracker = mapOutputTracker)
    val shuffleDepB = new ShuffleDependency(rddB, new HashPartitioner(2))

    val rddC = new MyRDD(sc, 2, List(shuffleDepB), tracker = mapOutputTracker)

    submit(rddC, Array(0, 1))

    // Complete both tasks in rddA.
    assert(taskSets(0).stageId === 0 && taskSets(0).stageAttemptId === 0)
    complete(taskSets(0), Seq(
      (Success, makeMapStatus("hostA", 2)),
      (Success, makeMapStatus("hostB", 2))))

    // The first task succeeds.
    runEvent(makeCompletionEvent(
      taskSets(1).tasks(0), Success, makeMapStatus("hostB", 2)))

    // The second task's speculative attempt fails first, but the task itself is still running.
    // This may be caused by ExecutorLost.
    runEvent(makeCompletionEvent(
      taskSets(1).tasks(1),
      FetchFailed(makeBlockManagerId("hostA"), shuffleIdA, 0, 0, "ignored"),
      null))
    // Check the currently missing partition.
    assert(mapOutputTracker.findMissingPartitions(shuffleDepB.shuffleId).get.size === 1)
    val missingPartition = mapOutputTracker.findMissingPartitions(shuffleDepB.shuffleId).get(0)

    // The second result task itself succeeds soon after.
    runEvent(makeCompletionEvent(
      taskSets(1).tasks(1), Success, makeMapStatus("hostB", 2)))
    // No missing partitions here; this will cause the child stage to never succeed.
    assert(mapOutputTracker.findMissingPartitions(shuffleDepB.shuffleId).get.size === 0)
  }
```


---




[GitHub] spark pull request #20930: [SPARK-23811][Core] Same tasks' FetchFailed event...

2018-03-28 Thread xuanyuanking
GitHub user xuanyuanking opened a pull request:

https://github.com/apache/spark/pull/20930

[SPARK-23811][Core] Same tasks' FetchFailed event comes before Success will 
cause child stage never succeed

## What changes were proposed in this pull request?

This is a bug caused by the abnormal scenario described below:

1. ShuffleMapTask 1.0 is running and will fetch data from ExecutorA.
2. ExecutorA is lost, which triggers `mapOutputTracker.removeOutputsOnExecutor(execId)`, and the shuffleStatus changes.
3. The speculative ShuffleMapTask 1.1 starts and gets a FetchFailed immediately.
4. ShuffleMapTask 1 is the last task of its stage, so the stage will never succeed because the DAGScheduler can find no missing task to resubmit.

I attached the detailed screenshots in the JIRA comments.

## How was this patch tested?

Add a new UT in `TaskSetManagerSuite`


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xuanyuanking/spark SPARK-23811

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20930.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20930


commit 2907075b43eac26c7efbe4aca5f2c037bb5934c2
Author: Yuanjian Li 
Date:   2018-03-29T04:50:16Z

[SPARK-23811][Core] Same tasks' FetchFailed event comes before Success will 
cause child stage never succeed




---




[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88686/
Test PASSed.


---




[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20797
  
**[Test build #88686 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88686/testReport)**
 for PR 20797 at commit 
[`4493909`](https://github.com/apache/spark/commit/4493909f3b66c74e488e57ffb6e89fc048a81a8d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88687/
Test PASSed.


---




[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20753
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20753
  
**[Test build #88687 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88687/testReport)**
 for PR 20753 at commit 
[`09cdf5e`](https://github.com/apache/spark/commit/09cdf5e9920a4f896fd34fc361cf6c4382fd09e5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19222
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1819/
Test PASSed.


---




[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20920
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20920
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88685/
Test FAILed.


---




[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20920
  
**[Test build #88685 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88685/testReport)**
 for PR 20920 at commit 
[`35ecbf9`](https://github.com/apache/spark/commit/35ecbf983b675b7fa5643c4c395995e4dca2647e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19222
  
**[Test build #88689 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88689/testReport)**
 for PR 19222 at commit 
[`59fd393`](https://github.com/apache/spark/commit/59fd393cb4e378550f90aaa5f5ceb2c9e3d85fef).


---




[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20926
  
**[Test build #88688 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88688/testReport)**
 for PR 20926 at commit 
[`851a5ef`](https://github.com/apache/spark/commit/851a5efa87a9f10843ec9d45437d6a5d94cc0816).


---




[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20926
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88683/
Test FAILed.


---




[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20926
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20926
  
**[Test build #88683 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88683/testReport)**
 for PR 20926 at commit 
[`d0988f7`](https://github.com/apache/spark/commit/d0988f7378152b576844c4ae11b1761fa9a3bde2).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class TestSparkSessionSuite extends SparkFunSuite with 
SharedSparkSession `


---




[GitHub] spark pull request #20922: Roll forward "[SPARK-23096][SS] Migrate rate sour...

2018-03-28 Thread jose-torres
Github user jose-torres commented on a diff in the pull request:

https://github.com/apache/spark/pull/20922#discussion_r177943116
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateStreamProvider.scala
 ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.sources
+
+import java.util.Optional
+
+import org.apache.spark.network.util.JavaUtils
+import org.apache.spark.sql.AnalysisException
+import 
org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader
+import org.apache.spark.sql.sources.DataSourceRegister
+import org.apache.spark.sql.sources.v2._
+import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousReader, 
MicroBatchReader}
+import org.apache.spark.sql.types._
+
+/**
+ *  A source that generates increment long values with timestamps. Each 
generated row has two
+ *  columns: a timestamp column for the generated time and an auto 
increment long column starting
+ *  with 0L.
+ *
+ *  This source supports the following options:
+ *  - `rowsPerSecond` (e.g. 100, default: 1): How many rows should be 
generated per second.
+ *  - `rampUpTime` (e.g. 5s, default: 0s): How long to ramp up before the 
generating speed
+ *becomes `rowsPerSecond`. Using finer granularities than seconds will 
be truncated to integer
+ *seconds.
+ *  - `numPartitions` (e.g. 10, default: Spark's default parallelism): The 
partition number for the
+ *generated rows. The source will try its best to reach 
`rowsPerSecond`, but the query may
+ *be resource constrained, and `numPartitions` can be tweaked to help 
reach the desired speed.
+ */
+class RateStreamProvider extends DataSourceV2
+  with MicroBatchReadSupport with ContinuousReadSupport with 
DataSourceRegister {
+  import RateStreamProvider._
+
+  override def createMicroBatchReader(
+  schema: Optional[StructType],
+  checkpointLocation: String,
+  options: DataSourceOptions): MicroBatchReader = {
--- End diff --

I agree that there's a mismatch here.

The reason it doesn't currently have this parameter is that one of the 
DataSourceV2 design goals 
(https://docs.google.com/document/d/1n_vUVbF4KD3gxTmkNEon5qdQ-Z8qU5Frf6WMQZ6jJVM/edit#heading=h.mi1fbff5f8f9)
 was to avoid API dependencies on upper level APIs like SparkSession. (IIRC 
Wenchen and I discussed SparkSession specifically in the design stage.) In this 
story, SparkSession.get{Active/Default}Session is just a way to keep our 
existing sources working rather than an encouraged development practice.

I agree that there's a mismatch which could be worth some discussion, but I 
think it's out of scope for this PR.
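For readers less familiar with this source, a small usage sketch of the rate source and the options documented in the scaladoc quoted above (option values here are arbitrary examples):

```scala
import org.apache.spark.sql.SparkSession

object RateSourceExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rate-source-example").getOrCreate()

    // The generated schema is (timestamp TIMESTAMP, value LONG).
    val rates = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "100") // rows generated per second
      .option("rampUpTime", "5s")     // ramp-up period before reaching rowsPerSecond
      .option("numPartitions", "2")   // partitions for the generated rows
      .load()

    val query = rates.writeStream
      .format("console")
      .start()
    query.awaitTermination()
  }
}
```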


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r177939445
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -32,141 +30,133 @@
  */
 public final class UnsafeArrayWriter extends UnsafeWriter {
 
-  private BufferHolder holder;
-
-  // The offset of the global buffer where we start to write this array.
-  private int startingOffset;
-
   // The number of elements in this array
   private int numElements;
 
+  // The element size in this array
+  private int elementSize;
+
   private int headerInBytes;
 
   private void assertIndexIsValid(int index) {
 assert index >= 0 : "index (" + index + ") should >= 0";
 assert index < numElements : "index (" + index + ") should < " + 
numElements;
   }
 
-  public void initialize(BufferHolder holder, int numElements, int 
elementSize) {
+  public UnsafeArrayWriter(UnsafeWriter writer, int elementSize) {
+super(writer.getBufferHolder());
+this.elementSize = elementSize;
+  }
+
+  public void initialize(int numElements) {
 // We need 8 bytes to store numElements in header
 this.numElements = numElements;
 this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
-this.holder = holder;
-this.startingOffset = holder.cursor;
+this.startingOffset = cursor();
 
 // Grows the global buffer ahead for header and fixed size data.
 int fixedPartInBytes =
   ByteArrayMethods.roundNumberOfBytesToNearestWord(elementSize * 
numElements);
 holder.grow(headerInBytes + fixedPartInBytes);
 
 // Write numElements and clear out null bits to header
-Platform.putLong(holder.buffer, startingOffset, numElements);
+Platform.putLong(buffer(), startingOffset, numElements);
 for (int i = 8; i < headerInBytes; i += 8) {
-  Platform.putLong(holder.buffer, startingOffset + i, 0L);
+  Platform.putLong(buffer(), startingOffset + i, 0L);
 }
 
 // fill 0 into reminder part of 8-bytes alignment in unsafe array
 for (int i = elementSize * numElements; i < fixedPartInBytes; i++) {
-  Platform.putByte(holder.buffer, startingOffset + headerInBytes + i, 
(byte) 0);
+  Platform.putByte(buffer(), startingOffset + headerInBytes + i, 
(byte) 0);
 }
-holder.cursor += (headerInBytes + fixedPartInBytes);
+incrementCursor(headerInBytes + fixedPartInBytes);
   }
 
-  private void zeroOutPaddingBytes(int numBytes) {
-if ((numBytes & 0x07) > 0) {
-  Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 
3), 0L);
-}
+  protected long getOffset(int ordinal, int elementSize) {
+return getElementOffset(ordinal, elementSize);
   }
 
   private long getElementOffset(int ordinal, int elementSize) {
 return startingOffset + headerInBytes + ordinal * elementSize;
   }
 
-  public void setOffsetAndSize(int ordinal, int currentCursor, int size) {
+  @Override
+  public void setOffsetAndSizeFromMark(int ordinal, int mark) {
 assertIndexIsValid(ordinal);
-final long relativeOffset = currentCursor - startingOffset;
-final long offsetAndSize = (relativeOffset << 32) | (long)size;
-
-write(ordinal, offsetAndSize);
+_setOffsetAndSizeFromMark(ordinal, mark);
   }
 
   private void setNullBit(int ordinal) {
 assertIndexIsValid(ordinal);
-BitSetMethods.set(holder.buffer, startingOffset + 8, ordinal);
+BitSetMethods.set(buffer(), startingOffset + 8, ordinal);
   }
 
   public void setNull1Bytes(int ordinal) {
 setNullBit(ordinal);
 // put zero into the corresponding field when set null
-Platform.putByte(holder.buffer, getElementOffset(ordinal, 1), (byte)0);
+Platform.putByte(buffer(), getElementOffset(ordinal, 1), (byte)0);
   }
 
   public void setNull2Bytes(int ordinal) {
 setNullBit(ordinal);
 // put zero into the corresponding field when set null
-Platform.putShort(holder.buffer, getElementOffset(ordinal, 2), 
(short)0);
+Platform.putShort(buffer(), getElementOffset(ordinal, 2), (short)0);
   }
 
   public void setNull4Bytes(int ordinal) {
 setNullBit(ordinal);
 // put zero into the corresponding field when set null
-Platform.putInt(holder.buffer, getElementOffset(ordinal, 4), 0);
+Platform.putInt(buffer(), getElementOffset(ordinal, 4), 0);
   }
 
   public void setNull8Bytes(int ordinal) {
 setNullBit(ordinal);
 // put zero into the corresponding field when set null
-Platform.putLong(holder.buffer, getEleme

[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r177940602
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java
 ---
@@ -31,24 +31,24 @@
  * for each incoming record, we should call `reset` of BufferHolder 
instance before write the record
  * and reuse the data buffer.
  *
- * Generally we should call `UnsafeRow.setTotalSize` and pass in 
`BufferHolder.totalSize` to update
+ * Generally we should call `UnsafeRowWriter.setTotalSize` using 
`BufferHolder.totalSize` to update
--- End diff --

Not sure whether this description should stay here or be moved to `UnsafeRowWriter`.


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r177941093
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java
 ---
@@ -17,17 +17,86 @@
 package org.apache.spark.sql.catalyst.expressions.codegen;
 
 import org.apache.spark.sql.types.Decimal;
+import org.apache.spark.unsafe.Platform;
+import org.apache.spark.unsafe.array.ByteArrayMethods;
 import org.apache.spark.unsafe.types.CalendarInterval;
 import org.apache.spark.unsafe.types.UTF8String;
 
 /**
  * Base class for writing Unsafe* structures.
  */
 public abstract class UnsafeWriter {
+  // Keep internal buffer holder
+  protected final BufferHolder holder;
+
+  // The offset of the global buffer where we start to write this 
structure.
+  protected int startingOffset;
+
+  protected UnsafeWriter(BufferHolder holder) {
+this.holder = holder;
+  }
+
+  /**
+   * Accessor methods are delegated from BufferHolder class
+   */
+  public final BufferHolder getBufferHolder() {
+return holder;
+  }
+
+  public final byte[] buffer() {
+return holder.buffer();
+  }
+
+  public final void reset() {
+holder.reset();
+  }
+
+  public final int totalSize() {
+return holder.totalSize();
+  }
+
+  public final void grow(int neededSize) {
+holder.grow(neededSize);
+  }
+
+  public final int cursor() {
+return holder.getCursor();
+  }
+
+  public final void incrementCursor(int val) {
+holder.incrementCursor(val);
+  }
+
+  public abstract void setOffsetAndSizeFromMark(int ordinal, int mark);
--- End diff --

`Mark` is an ambiguous term. It is not clear what it means here. 
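For what it's worth, the base class quoted above suggests that a mark is simply a cursor position remembered before a variable-length value is written; a sketch of the implied calling pattern (the helper name `writeVariableLengthField` and the `writeNestedValue` callback are made-up placeholders, and this assumes the `UnsafeWriter` from this PR's branch):

```scala
import org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter

// _setOffsetAndSizeFromMark(ordinal, mark) expands to
// setOffsetAndSize(ordinal, mark, cursor() - mark), i.e. it records
// offset = mark - startingOffset and size = cursor() - mark.
def writeVariableLengthField(
    writer: UnsafeWriter,
    ordinal: Int)(writeNestedValue: UnsafeWriter => Unit): Unit = {
  val mark = writer.cursor()                      // where the nested value starts
  writeNestedValue(writer)                        // write the bytes, advancing the cursor
  writer.setOffsetAndSizeFromMark(ordinal, mark)  // record the (offset, size) pair
}
```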


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r177941414
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java
 ---
@@ -17,17 +17,86 @@
 package org.apache.spark.sql.catalyst.expressions.codegen;
 
 import org.apache.spark.sql.types.Decimal;
+import org.apache.spark.unsafe.Platform;
+import org.apache.spark.unsafe.array.ByteArrayMethods;
 import org.apache.spark.unsafe.types.CalendarInterval;
 import org.apache.spark.unsafe.types.UTF8String;
 
 /**
  * Base class for writing Unsafe* structures.
  */
 public abstract class UnsafeWriter {
+  // Keep internal buffer holder
+  protected final BufferHolder holder;
+
+  // The offset of the global buffer where we start to write this 
structure.
+  protected int startingOffset;
+
+  protected UnsafeWriter(BufferHolder holder) {
+this.holder = holder;
+  }
+
+  /**
+   * Accessor methods are delegated from BufferHolder class
+   */
+  public final BufferHolder getBufferHolder() {
+return holder;
+  }
+
+  public final byte[] buffer() {
+return holder.buffer();
+  }
+
+  public final void reset() {
+holder.reset();
+  }
+
+  public final int totalSize() {
+return holder.totalSize();
+  }
+
+  public final void grow(int neededSize) {
+holder.grow(neededSize);
+  }
+
+  public final int cursor() {
+return holder.getCursor();
+  }
+
+  public final void incrementCursor(int val) {
+holder.incrementCursor(val);
+  }
+
+  public abstract void setOffsetAndSizeFromMark(int ordinal, int mark);
--- End diff --

Btw, why do we have both `_setOffsetAndSizeFromMark` and `setOffsetAndSizeFromMark`? It seems `setOffsetAndSizeFromMark` just calls `_setOffsetAndSizeFromMark`.


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r177939830
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java
 ---
@@ -17,17 +17,86 @@
 package org.apache.spark.sql.catalyst.expressions.codegen;
 
 import org.apache.spark.sql.types.Decimal;
+import org.apache.spark.unsafe.Platform;
+import org.apache.spark.unsafe.array.ByteArrayMethods;
 import org.apache.spark.unsafe.types.CalendarInterval;
 import org.apache.spark.unsafe.types.UTF8String;
 
 /**
  * Base class for writing Unsafe* structures.
  */
 public abstract class UnsafeWriter {
+  // Keep internal buffer holder
+  protected final BufferHolder holder;
+
+  // The offset of the global buffer where we start to write this 
structure.
+  protected int startingOffset;
+
+  protected UnsafeWriter(BufferHolder holder) {
+this.holder = holder;
+  }
+
+  /**
+   * Accessor methods are delegated from BufferHolder class
+   */
+  public final BufferHolder getBufferHolder() {
+return holder;
+  }
+
+  public final byte[] buffer() {
+return holder.buffer();
+  }
+
+  public final void reset() {
+holder.reset();
+  }
+
+  public final int totalSize() {
+return holder.totalSize();
+  }
+
+  public final void grow(int neededSize) {
+holder.grow(neededSize);
+  }
+
+  public final int cursor() {
+return holder.getCursor();
+  }
+
+  public final void incrementCursor(int val) {
+holder.incrementCursor(val);
+  }
+
+  public abstract void setOffsetAndSizeFromMark(int ordinal, int mark);
+
+  protected void _setOffsetAndSizeFromMark(int ordinal, int mark) {
+setOffsetAndSize(ordinal, mark, cursor() - mark);
+  }
+
+  protected void setOffsetAndSize(int ordinal, int size) {
+setOffsetAndSize(ordinal, cursor(), size);
+  }
+
+  protected void setOffsetAndSize(int ordinal, int currentCursor, int 
size) {
+final long relativeOffset = currentCursor - startingOffset;
+final long offsetAndSize = (relativeOffset << 32) | (long)size;
+
+write(ordinal, offsetAndSize);
+  }
+
+  protected final void zeroOutPaddingBytes(int numBytes) {
+if ((numBytes & 0x07) > 0) {
+  Platform.putLong(buffer(), cursor() + ((numBytes >> 3) << 3), 0L);
+}
+  }
+
+  protected abstract long getOffset(int ordinal, int elementSize);
--- End diff --

Can this just be `getOffset(int ordinal)`? One reason is that only `UnsafeArrayWriter` has `elementSize`; another is that `elementSize` is already provided when constructing `UnsafeArrayWriter`.


---




[GitHub] spark issue #20929: [SPARK-23772][SQL][WIP] Provide an option to ignore colu...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20929
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20929: [SPARK-23772][SQL][WIP] Provide an option to ignore colu...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20929
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88682/
Test FAILed.


---




[GitHub] spark issue #20929: [SPARK-23772][SQL][WIP] Provide an option to ignore colu...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20929
  
**[Test build #88682 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88682/testReport)**
 for PR 20929 at commit 
[`876da84`](https://github.com/apache/spark/commit/876da84a7da9dbdc408e153b9e3dc17776a0c9db).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20926
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20926
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88684/
Test FAILed.


---




[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20926
  
**[Test build #88684 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88684/testReport)**
 for PR 20926 at commit 
[`7be16a9`](https://github.com/apache/spark/commit/7be16a93da1efb86c69aa74f2c352ccbb66e5d4a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #20850: [SPARK-23713][SQL] Cleanup UnsafeWriter and Buffe...

2018-03-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20850#discussion_r177939433
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java
 ---
@@ -32,141 +30,133 @@
  */
 public final class UnsafeArrayWriter extends UnsafeWriter {
 
-  private BufferHolder holder;
-
-  // The offset of the global buffer where we start to write this array.
-  private int startingOffset;
-
   // The number of elements in this array
   private int numElements;
 
+  // The element size in this array
+  private int elementSize;
+
   private int headerInBytes;
 
   private void assertIndexIsValid(int index) {
 assert index >= 0 : "index (" + index + ") should >= 0";
 assert index < numElements : "index (" + index + ") should < " + 
numElements;
   }
 
-  public void initialize(BufferHolder holder, int numElements, int 
elementSize) {
+  public UnsafeArrayWriter(UnsafeWriter writer, int elementSize) {
+super(writer.getBufferHolder());
+this.elementSize = elementSize;
+  }
+
+  public void initialize(int numElements) {
 // We need 8 bytes to store numElements in header
 this.numElements = numElements;
 this.headerInBytes = calculateHeaderPortionInBytes(numElements);
 
-this.holder = holder;
-this.startingOffset = holder.cursor;
+this.startingOffset = cursor();
 
 // Grows the global buffer ahead for header and fixed size data.
 int fixedPartInBytes =
   ByteArrayMethods.roundNumberOfBytesToNearestWord(elementSize * 
numElements);
 holder.grow(headerInBytes + fixedPartInBytes);
 
 // Write numElements and clear out null bits to header
-Platform.putLong(holder.buffer, startingOffset, numElements);
+Platform.putLong(buffer(), startingOffset, numElements);
 for (int i = 8; i < headerInBytes; i += 8) {
-  Platform.putLong(holder.buffer, startingOffset + i, 0L);
+  Platform.putLong(buffer(), startingOffset + i, 0L);
 }
 
 // fill 0 into reminder part of 8-bytes alignment in unsafe array
 for (int i = elementSize * numElements; i < fixedPartInBytes; i++) {
-  Platform.putByte(holder.buffer, startingOffset + headerInBytes + i, 
(byte) 0);
+  Platform.putByte(buffer(), startingOffset + headerInBytes + i, 
(byte) 0);
 }
-holder.cursor += (headerInBytes + fixedPartInBytes);
+incrementCursor(headerInBytes + fixedPartInBytes);
   }
 
-  private void zeroOutPaddingBytes(int numBytes) {
-if ((numBytes & 0x07) > 0) {
-  Platform.putLong(holder.buffer, holder.cursor + ((numBytes >> 3) << 
3), 0L);
-}
+  protected long getOffset(int ordinal, int elementSize) {
+return getElementOffset(ordinal, elementSize);
   }
 
   private long getElementOffset(int ordinal, int elementSize) {
--- End diff --

Isn't `elementSize` a given parameter when constructing `UnsafeArrayWriter` 
now?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20893
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20893
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88679/
Test PASSed.


---




[GitHub] spark issue #20893: [SPARK-23785][LAUNCHER] LauncherBackend doesn't check st...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20893
  
**[Test build #88679 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88679/testReport)**
 for PR 20893 at commit 
[`4ca8a32`](https://github.com/apache/spark/commit/4ca8a32e2a518f3c7ccecd406a8b03eac06f860b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20784: [SPARK-23639][SQL]Obtain token before init metastore cli...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20784
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20784: [SPARK-23639][SQL]Obtain token before init metastore cli...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88680/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20784: [SPARK-23639][SQL]Obtain token before init metastore cli...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20784
  
**[Test build #88680 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88680/testReport)**
 for PR 20784 at commit 
[`cd8056c`](https://github.com/apache/spark/commit/cd8056c3ad40afc08ac251a7ce502626fb9dd3c4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20924: [SPARK-23806] Broadcast.unpersist can cause fatal except...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20924
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20924: [SPARK-23806] Broadcast.unpersist can cause fatal except...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20924
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88678/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20924: [SPARK-23806] Broadcast.unpersist can cause fatal except...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20924
  
**[Test build #88678 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88678/testReport)**
 for PR 20924 at commit 
[`54cab78`](https://github.com/apache/spark/commit/54cab78296c7e09777ba9989e9be620928801a51).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20891: [SPARK-23782][CORE][UI] SHS should list only application...

2018-03-28 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20891
  
@mgaido91 what is the behavior in Hadoop, for example in the YARN RM UI: does it 
list apps run by other users even when the current user doesn't have 
permission to view them?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20753
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1818/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20753
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20797
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1817/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20922: Roll forward "[SPARK-23096][SS] Migrate rate sour...

2018-03-28 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20922#discussion_r177933081
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateStreamProvider.scala ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.sources
+
+import java.util.Optional
+
+import org.apache.spark.network.util.JavaUtils
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader
+import org.apache.spark.sql.sources.DataSourceRegister
+import org.apache.spark.sql.sources.v2._
+import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousReader, MicroBatchReader}
+import org.apache.spark.sql.types._
+
+/**
+ *  A source that generates increment long values with timestamps. Each generated row has two
+ *  columns: a timestamp column for the generated time and an auto increment long column starting
+ *  with 0L.
+ *
+ *  This source supports the following options:
+ *  - `rowsPerSecond` (e.g. 100, default: 1): How many rows should be generated per second.
+ *  - `rampUpTime` (e.g. 5s, default: 0s): How long to ramp up before the generating speed
+ *    becomes `rowsPerSecond`. Using finer granularities than seconds will be truncated to integer
+ *    seconds.
+ *  - `numPartitions` (e.g. 10, default: Spark's default parallelism): The partition number for the
+ *    generated rows. The source will try its best to reach `rowsPerSecond`, but the query may
+ *    be resource constrained, and `numPartitions` can be tweaked to help reach the desired speed.
+ */
+class RateStreamProvider extends DataSourceV2
+  with MicroBatchReadSupport with ContinuousReadSupport with DataSourceRegister {
+  import RateStreamProvider._
+
+  override def createMicroBatchReader(
+      schema: Optional[StructType],
+      checkpointLocation: String,
+      options: DataSourceOptions): MicroBatchReader = {
--- End diff --

What do you think @jose-torres @tdas @gatorsmile ?
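
For reference, the options documented in the quoted scaladoc can be exercised with a minimal sketch like the one below (illustrative option values only, not part of this patch):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class RateSourceSketch {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
        .master("local[2]")
        .appName("rate-source-sketch")
        .getOrCreate();

    // The rate source produces two columns: `timestamp` and an increasing `value`.
    Dataset<Row> rate = spark.readStream()
        .format("rate")
        .option("rowsPerSecond", "100")   // rows generated per second
        .option("rampUpTime", "5s")       // ramp-up period before reaching rowsPerSecond
        .option("numPartitions", "2")     // partitions for the generated rows
        .load();

    // Print a few micro-batches to the console, then exit.
    rate.writeStream()
        .format("console")
        .start()
        .awaitTermination(10_000);

    spark.stop();
  }
}
```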


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20920
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1816/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20922: Roll forward "[SPARK-23096][SS] Migrate rate sour...

2018-03-28 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/20922#discussion_r177932994
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/RateStreamProvider.scala ---
@@ -0,0 +1,125 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.sources
+
+import java.util.Optional
+
+import org.apache.spark.network.util.JavaUtils
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.execution.streaming.continuous.RateStreamContinuousReader
+import org.apache.spark.sql.sources.DataSourceRegister
+import org.apache.spark.sql.sources.v2._
+import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousReader, MicroBatchReader}
+import org.apache.spark.sql.types._
+
+/**
+ *  A source that generates increment long values with timestamps. Each generated row has two
+ *  columns: a timestamp column for the generated time and an auto increment long column starting
+ *  with 0L.
+ *
+ *  This source supports the following options:
+ *  - `rowsPerSecond` (e.g. 100, default: 1): How many rows should be generated per second.
+ *  - `rampUpTime` (e.g. 5s, default: 0s): How long to ramp up before the generating speed
+ *    becomes `rowsPerSecond`. Using finer granularities than seconds will be truncated to integer
+ *    seconds.
+ *  - `numPartitions` (e.g. 10, default: Spark's default parallelism): The partition number for the
+ *    generated rows. The source will try its best to reach `rowsPerSecond`, but the query may
+ *    be resource constrained, and `numPartitions` can be tweaked to help reach the desired speed.
+ */
+class RateStreamProvider extends DataSourceV2
+  with MicroBatchReadSupport with ContinuousReadSupport with DataSourceRegister {
+  import RateStreamProvider._
+
+  override def createMicroBatchReader(
+      schema: Optional[StructType],
+      checkpointLocation: String,
+      options: DataSourceOptions): MicroBatchReader = {
--- End diff --

Here, if `MicroBatchReadSupport` could pass in a `SparkSession` parameter, like 
`StreamSourceProvider#createSource` does with its `sqlContext`, then there would be no 
need to get the session from a thread-local or the default session, and the UT 
wouldn't need to call `setDefaultSession`.

That's what I had in mind when I did this refactoring work.
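
Roughly something along these lines; this is a hypothetical sketch only, the interface name and shape below are made up to illustrate the idea and are not an existing API:

```java
import java.util.Optional;

import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.reader.streaming.MicroBatchReader;
import org.apache.spark.sql.types.StructType;

// Hypothetical variant of MicroBatchReadSupport that receives the session
// explicitly, mirroring StreamSourceProvider#createSource(sqlContext, ...),
// so implementations would not need to look up a thread-local or default
// SparkSession.
public interface MicroBatchReadSupportWithSession extends DataSourceV2 {
  MicroBatchReader createMicroBatchReader(
      SparkSession session,
      Optional<StructType> schema,
      String checkpointLocation,
      DataSourceOptions options);
}
```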


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20920
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20920
  
**[Test build #88685 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88685/testReport)**
 for PR 20920 at commit 
[`35ecbf9`](https://github.com/apache/spark/commit/35ecbf983b675b7fa5643c4c395995e4dca2647e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20797
  
**[Test build #88686 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88686/testReport)**
 for PR 20797 at commit 
[`4493909`](https://github.com/apache/spark/commit/4493909f3b66c74e488e57ffb6e89fc048a81a8d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20753
  
**[Test build #88687 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88687/testReport)**
 for PR 20753 at commit 
[`09cdf5e`](https://github.com/apache/spark/commit/09cdf5e9920a4f896fd34fc361cf6c4382fd09e5).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20920: [SPARK-23040][CORE][FOLLOW-UP] Avoid double wrap result ...

2018-03-28 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/20920
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20797: [SPARK-23583][SQL] Invoke should support interpreted exe...

2018-03-28 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20797
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20753: [SPARK-23582][SQL] StaticInvoke should support interpret...

2018-03-28 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20753
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20922: Roll forward "[SPARK-23096][SS] Migrate rate source to V...

2018-03-28 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/20922
  
Thanks for the help @jose-torres .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20928: Fix small typo in configuration doc

2018-03-28 Thread dsakuma
Github user dsakuma commented on the issue:

https://github.com/apache/spark/pull/20928
  
@HyukjinKwon Great idea! I've found and fixed some other issues using a 
spell checker.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20818: [SPARK-23675][WEB-UI]Title add spark logo, use sp...

2018-03-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20818


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20925
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20925
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88675/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20925: [SPARK-22941][core] Do not exit JVM when submit fails wi...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20925
  
**[Test build #88675 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88675/testReport)**
 for PR 20925 at commit 
[`466f84a`](https://github.com/apache/spark/commit/466f84a558dcfe9b6944dcc3a62a8cdadf871d02).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  logInfo(s\"Failed to load main class $childMainClass.\")`
  * `  error(s\"Cannot load main class from JAR 
$primaryResource\")`
  * `  error(\"No main class set in JAR; please specify one with 
--class\")`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20818: [SPARK-23675][WEB-UI]Title add spark logo, use spark log...

2018-03-28 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/20818
  
Merged to master


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20926
  
**[Test build #88684 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88684/testReport)**
 for PR 20926 at commit 
[`7be16a9`](https://github.com/apache/spark/commit/7be16a93da1efb86c69aa74f2c352ccbb66e5d4a).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20927: [SPARK-23809][SQL] Active SparkSession should be set by ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20927
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20927: [SPARK-23809][SQL] Active SparkSession should be set by ...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20927
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88681/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20928: Fix small typo in configuration doc

2018-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20928
  
that's fine, but would you mind taking a look for other typos while we are here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20927: [SPARK-23809][SQL] Active SparkSession should be set by ...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20927
  
**[Test build #88681 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88681/testReport)**
 for PR 20927 at commit 
[`8f3cbf3`](https://github.com/apache/spark/commit/8f3cbf3399420a14f5ebe74b99b2739437fe3647).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20926: [SPARK-23808][SQL] Set default Spark session in test-onl...

2018-03-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20926
  
**[Test build #88683 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88683/testReport)**
 for PR 20926 at commit 
[`d0988f7`](https://github.com/apache/spark/commit/d0988f7378152b576844c4ae11b1761fa9a3bde2).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20929: [SPARK-23772][SQL][WIP] Provide an option to ignore colu...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20929
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #20929: [SPARK-23772][SQL][WIP] Provide an option to ignore colu...

2018-03-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20929
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1815/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20929: [SPARK-23772][SQL][WIP] Provide an option to igno...

2018-03-28 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/20929#discussion_r177925982
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
 ---
@@ -624,6 +624,42 @@ class FileStreamSourceSuite extends 
FileStreamSourceTest {
 }
   }
 
+  test("SPARK-23772 Ignore column of all null values or empty array during 
JSON schema inference") {
--- End diff --

@mengxr Does this test match the intention you described in the JIRA? (I just 
want to confirm before I brush up the code.)
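
For reference, a minimal sketch of the behavior the test exercises; the read option name below is hypothetical for this WIP and is used only for illustration:

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonNullColumnInferenceSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[1]")
        .appName("json-null-inference-sketch")
        .getOrCreate();

    // Field `b` is always null and `c` is always an empty array.
    Dataset<String> json = spark.createDataset(
        Arrays.asList(
            "{\"a\": 1, \"b\": null, \"c\": []}",
            "{\"a\": 2, \"b\": null, \"c\": []}"),
        Encoders.STRING());

    // With the proposed behavior enabled, the inferred schema would contain
    // only column `a`; the option name is hypothetical here.
    Dataset<Row> df = spark.read()
        .option("dropFieldIfAllNull", "true")
        .json(json);
    df.printSchema();

    spark.stop();
  }
}
```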


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


