Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22334
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22333
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95683/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22334
**[Test build #95689 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95689/testReport)**
for PR 22334 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22333
Merged build finished. Test PASSed.
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22334
retest this please
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22333
**[Test build #95683 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95683/testReport)**
for PR 22333 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21669
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21669
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95682/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21669
**[Test build #95682 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95682/testReport)**
for PR 21669 at commit
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/22112
@tgravescs yes, you are right about the problem here. Instead of asking
executors to remove old committed shuffle data, I prefer #6648, which just
writes new shuffle data with a different file
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22334
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22334
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95687/
Test FAILed.
---
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22324#discussion_r215111327
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/FileBasedDataSourceSuite.scala ---
@@ -473,6 +476,27 @@ class FileBasedDataSourceSuite extends
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22334
**[Test build #95687 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95687/testReport)**
for PR 22334 at commit
Github user hindog commented on the issue:
https://github.com/apache/spark/pull/17174
I believe another performance impact related to this may be attributed to
the `cast` operator failing to match during filter-pushdown, meaning that the
filter on the timestamp will NOT get pushed
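A hedged sketch of the effect described above (not code from the PR; the file path and data are made up, and whether the cast actually blocks pushdown depends on the Spark version and data source): comparing the timestamp column against a typed literal leaves the predicate in a shape the pushdown rules can match, while wrapping the column in a cast changes the expression the matcher sees.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

val spark = SparkSession.builder().master("local[*]").appName("pushdown-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: two timestamped rows written to a temp Parquet path.
val path = java.nio.file.Files.createTempDirectory("events").toString
Seq(Timestamp.valueOf("2018-01-01 00:00:00"), Timestamp.valueOf("2018-06-01 00:00:00"))
  .toDF("ts").write.mode("overwrite").parquet(path)

val df = spark.read.parquet(path)

// Filter on the bare column: eligible for pushdown to the Parquet reader.
val pushed = df.filter(col("ts") >= lit(Timestamp.valueOf("2018-03-01 00:00:00")))

// Filter through a cast of the column: the matcher sees Cast(ts) rather than
// the bare attribute, so the filter may stay in Spark instead of being pushed.
val notPushed = df.filter(col("ts").cast("string") >= "2018-03-01 00:00:00")

// Compare the PushedFilters sections of the two physical plans.
pushed.explain()
notPushed.explain()
```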
Github user fangshil commented on the issue:
https://github.com/apache/spark/pull/21310
To summarize our discussion in this PR:
spark-avro is now merged into Spark as a built-in data source. The upstream
community is not merging the AvroEncoder to support Avro types in Dataset,
Github user fangshil closed the pull request at:
https://github.com/apache/spark/pull/21310
---
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22324
ping @srowen @HyukjinKwon
---
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/21638
Ideally the last test should have 50 partitions? Is it because we really
need the test data to be at least 50 bytes? Ideally a multiple of 50, I guess.
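A hedged sketch of the behavior in question (the exact partition counts depend on the CombineFileInputFormat split math this PR touches, so none are asserted here): `sc.binaryFiles` groups small files into splits of roughly totalBytes / minPartitions bytes, so with tiny test data the partitions you actually get can fall short of `minPartitions`.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))

// 50 one-byte files: total bytes equals the requested minPartitions,
// the boundary case the comment above is asking about.
val dir = Files.createTempDirectory("binary-files-test")
(1 to 50).foreach { i =>
  Files.write(dir.resolve(s"f$i.bin"), Array[Byte](i.toByte))
}

val rdd = sc.binaryFiles(dir.toString, minPartitions = 50)
// May be at or below 50 for tiny files, depending on the split size math.
println(s"partitions = ${rdd.getNumPartitions}")
sc.stop()
```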
---
Github user bomeng commented on the issue:
https://github.com/apache/spark/pull/21638
Here is the test code, not sure whether it is right or not ---
```
test("Number of partitions") {
  sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
```
Github user wangyum commented on a diff in the pull request:
https://github.com/apache/spark/pull/22320#discussion_r215106921
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala
---
@@ -56,7 +56,7 @@ case class
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22138
**[Test build #95688 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95688/testReport)**
for PR 22138 at commit
Github user tigerquoll commented on the issue:
https://github.com/apache/spark/pull/21306
So Kudu range partitions support arbitrarily sized partition intervals, like
the example below, where the first and last range partitions are six months in
size, but the middle partition is one
Github user tigerquoll commented on the issue:
https://github.com/apache/spark/pull/21306
Sure,
I am looking at the point of view of supporting Kudu. Check out
https://kudu.apache.org/docs/schema_design.html#partitioning for some of the
details. In particular
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/22332
I also can't find a strong reason to add a new API to `Dataset`... btw,
to add a new API there, you'd better discuss it in JIRA before making a PR,
I think. cc: @rxin @cloud-fan @HyukjinKwon
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22334
**[Test build #95687 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95687/testReport)**
for PR 22334 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22334
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22334
Merged build finished. Test PASSed.
---
Github user zsxwing commented on the issue:
https://github.com/apache/spark/pull/22334
retest this please
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22334
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95684/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22334
Merged build finished. Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22334
**[Test build #95684 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95684/testReport)**
for PR 22334 at commit
Github user rdblue commented on the issue:
https://github.com/apache/spark/pull/21308
@tigerquoll, what we come up with needs to work across a variety of data
sources, including those like JDBC that can delete at a lower granularity than
partition.
For Hive tables, the
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22298
Kubernetes integration test status success
URL:
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2848/
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22298
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22298
Merged build finished. Test PASSed.
---
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/22282#discussion_r215092933
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala
---
@@ -88,7 +92,30 @@ private[kafka010] abstract
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22234
Did we introduce any behavior change in
https://github.com/apache/spark/pull/21273? Does this PR resolve it?
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22298
Kubernetes integration test starting
URL:
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2848/
---
Github user tigerquoll commented on the issue:
https://github.com/apache/spark/pull/21308
I am assuming this API was intended to support the "drop partition"
use-case. I'm arguing that adding and deleting partitions deal with a concept
that is slightly higher-level than just a
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22298
**[Test build #95686 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95686/testReport)**
for PR 22298 at commit
Github user ifilonenko commented on the issue:
https://github.com/apache/spark/pull/22298
@felixcheung @holdenk I have moved the PySpark example files to a more
appropriate location. Any other comments before merge?
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22171
@vinodkc Could you answer the question from @cloud-fan ?
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22218
**[Test build #4331 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4331/testReport)**
for PR 22218 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22313
**[Test build #95685 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95685/testReport)**
for PR 22313 at commit
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22313
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22313
Merged build finished. Test PASSed.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22313
Retest this please.
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22313
At this time, R failure.
```
DONE
===
Had test warnings or failures; see logs.
```
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22313
Merged build finished. Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22313
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95680/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22313
**[Test build #95680 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95680/testReport)**
for PR 22313 at commit
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/22138
@zsxwing
If it means code freeze for 2.4 is just around the corner then sure! We can
focus on blockers for releasing 2.4, and revisit this again. Let me reflect
@gaborgsomogyi review
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22333
Oh, I assumed that it's already dockerized. Sorry, never mind about that
@shaneknapp . And, thanks!
---
Github user dbtsai commented on the issue:
https://github.com/apache/spark/pull/21756
add @jerryshao for more feedback. Thanks.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22334
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22334
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22334
**[Test build #95684 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95684/testReport)**
for PR 22334 at commit
GitHub user zsxwing opened a pull request:
https://github.com/apache/spark/pull/22334
[SPARK-25336][SS] Revert SPARK-24863 and SPARK-24748
## What changes were proposed in this pull request?
Revert SPARK-24863 and SPARK-24748 as per the discussion in #21721. We will
revisit
Github user huaxingao commented on the issue:
https://github.com/apache/spark/pull/20442
Any more comments? @MLnick @jkbradley
---
Github user wmellouli commented on the issue:
https://github.com/apache/spark/pull/22332
@mgaido91 Thank you for your suggestion, I updated the PR name, description
and sources with a new version using a parameter `atPosition` instead of a flag
`atTheEnd`. Let me know what you think
Github user shaneknapp commented on the issue:
https://github.com/apache/spark/pull/22333
Moving any part of the Spark build infrastructure to use Docker is a big
project and is not happening in the next few months.
---
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
yeah you would have to be able to handle network partitioning somehow. I
don't know how difficult it is but its definitely work we may not want to do
here. I was trying to clarify and make
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/22112#discussion_r215070653
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
---
@@ -1513,37 +1513,34 @@ private[spark] class DAGScheduler(
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22333
**[Test build #95683 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95683/testReport)**
for PR 22333 at commit
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22333
Hi, @shaneknapp and @srowen .
Can we build and use the zinc-installed docker images in our build system?
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22333
Merged build finished. Test PASSed.
---
Github user ifilonenko commented on the issue:
https://github.com/apache/spark/pull/22145
this PR is waiting on @shaneknapp to migrate to ubuntu and have R setup in
the node responsible for distribution building. This was planning on being done
right after the 2.4 cut.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21669
Merged build finished. Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/21669
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22333
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21669
Kubernetes integration test status success
URL:
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2844/
---
GitHub user dongjoon-hyun opened a pull request:
https://github.com/apache/spark/pull/22333
[SPARK-25335][BUILD] Skip Zinc downloading if it's installed in the system
## What changes were proposed in this pull request?
Zinc is 23.5MB.
```
$ curl -LO
```
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21669
Kubernetes integration test starting
URL:
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2844/
---
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/22112
> So in order to fix that we would need a way to tell the executors to
remove that older committed shuffle data
@tgravescs It is also hard to implement such a robust solution for
Github user ifilonenko commented on the issue:
https://github.com/apache/spark/pull/21669
This PR has been tested and passed on a local cluster with an integration
test that will be merged in a follow-up PR. It passes all three configuration
options. It is now in a state that is
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/21669
**[Test build #95682 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95682/testReport)**
for PR 21669 at commit
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22192
**[Test build #95681 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95681/testReport)**
for PR 22192 at commit
Github user bersprockets commented on the issue:
https://github.com/apache/spark/pull/22192
retest this please.
---
Github user ifilonenko commented on a diff in the pull request:
https://github.com/apache/spark/pull/21669#discussion_r215057079
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -164,7 +164,15 @@ private[spark] class SparkSubmit extends Logging {
Github user ankuriitg commented on a diff in the pull request:
https://github.com/apache/spark/pull/22209#discussion_r215056633
--- Diff:
core/src/test/scala/org/apache/spark/status/AppStatusListenerSuite.scala ---
@@ -1190,6 +1190,61 @@ class AppStatusListenerSuite extends
Github user liyinan926 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22323#discussion_r215040572
--- Diff:
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/LocalDirsFeatureStep.scala
---
@@ -22,6 +22,7 @@ import
Github user liyinan926 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22323#discussion_r215041417
--- Diff:
resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/Config.scala
---
@@ -225,6 +225,15 @@ private[spark] object
Github user liyinan926 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22323#discussion_r215055594
--- Diff: docs/running-on-kubernetes.md ---
@@ -215,6 +215,19 @@
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai
Github user liyinan926 commented on a diff in the pull request:
https://github.com/apache/spark/pull/22323#discussion_r215040130
--- Diff: docs/running-on-kubernetes.md ---
@@ -215,6 +215,19 @@
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkpointpvc.options.clai
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215038606
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215037240
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/image/ImageFileFormat.scala ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215039097
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala
---
@@ -567,6 +567,7 @@ object DataSource extends
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215036263
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215037968
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/image/ImageFileFormatSuite.scala
---
@@ -0,0 +1,119 @@
+/*
+ * Licensed to the
Github user jaceklaskowski commented on a diff in the pull request:
https://github.com/apache/spark/pull/22328#discussion_r215036643
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/image/ImageDataSource.scala ---
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache
Github user dhruve commented on a diff in the pull request:
https://github.com/apache/spark/pull/22288#discussion_r215036162
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -414,9 +425,54 @@ private[spark] class TaskSchedulerImpl(
Github user jaceklaskowski commented on the issue:
https://github.com/apache/spark/pull/22332
Why not `select($"*", newColumnHere)` or `select(newColumnHere, $"*")`?
Somehow I don't think the use case merits overloading `withColumn`.
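A minimal sketch of the alternative suggested above: controlling where the new column lands with `select` instead of a positional `withColumn` overload (the column names here are made up for illustration).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder().master("local[*]").appName("select-sketch").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "name")

// New column first: select(newColumnHere, $"*")
val first = df.select(lit(0).as("flag"), $"*")

// New column last, which is what withColumn already does: select($"*", newColumnHere)
val last = df.select($"*", lit(0).as("flag"))

println(first.columns.mkString(", "))  // flag, id, name
println(last.columns.mkString(", "))   // id, name, flag
```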
---
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22179
And, @wangyum . Please add `[SPARK-25258]` to the PR title like
`[SPARK-25258][SPARK-23131][SPARK-25176]`. SPARK-23131 is the one you created
for this PR.
Also, the PR description
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21638#discussion_r215030825
--- Diff:
core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -47,7 +47,7 @@ private[spark] abstract class
Github user dongjoon-hyun commented on the issue:
https://github.com/apache/spark/pull/22179
Although this will give us a different Kryo version (not Hive, ORC), the
newly added test cases show the benefit clearly. Also, I checked two new test
cases with/without this PR. It looks
Github user tgravescs commented on the issue:
https://github.com/apache/spark/pull/22112
ok for anyone else trying, I was able to reproduce this consistently with
the following code, adding in more repartitions. I have blacklisting, dynamic
allocation, and external shuffle service
Github user rxin commented on the issue:
https://github.com/apache/spark/pull/21721
Given the uncertainty about how this works across batch, streaming, and CP,
and given we are still flushing out the main APIs, I think we should revert
this, and revisit when the main APIs are done.
Github user bomeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/21638#discussion_r215022562
--- Diff:
core/src/main/scala/org/apache/spark/input/PortableDataStream.scala ---
@@ -47,7 +47,7 @@ private[spark] abstract class StreamFileInputFormat[T]
Github user huaxingao commented on a diff in the pull request:
https://github.com/apache/spark/pull/22295#discussion_r215022091
--- Diff: python/pyspark/sql/session.py ---
@@ -252,6 +252,16 @@ def newSession(self):
"""
return self.__class__(self._sc,