Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20648
> allows the length of tokens to be shorter than its schema, putting nulls
(or NA) into missing fields
Actually I also recalled this is a valid case for csv, and I remember that
we did this
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r170279504
--- Diff:
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala
---
@@ -64,6 +69,41 @@ class KafkaRDDSuite extends Spar
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r170278317
--- Diff:
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala
---
@@ -162,17 +162,22 @@ private[kafka010] class Kafk
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r170278931
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -71,25 +69,62 @@ class CachedKafkaConsumer
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r170278685
--- Diff:
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/mocks/MockScheduler.scala
---
@@ -0,0 +1,100 @@
+/*
+ * Licensed to
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r170279950
--- Diff:
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala
---
@@ -64,6 +69,41 @@ class KafkaRDDSuite extends Spar
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r170278078
--- Diff:
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala
---
@@ -64,6 +69,41 @@ class KafkaRDDSuite extends Spar
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r170277915
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaRDD.scala
---
@@ -172,57 +187,138 @@ private[spark] class KafkaRDD[K,
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r170279150
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaRDD.scala
---
@@ -87,47 +89,60 @@ private[spark] class KafkaRDD[K, V](
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20664#discussion_r170279656
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with
SharedSparkContext {
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20553
IIUC the `spark.kubernetes.executor.cores` here is just a special case of
`spark.executor.cores` for the k8s backend; you would still have to handle float
values if you're to read the value of `sp
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20553
also cc @cloud-fan @jerryshao
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-
Github user jiangxb1987 commented on the issue:
https://github.com/apache/spark/pull/20553
How do we plan to support dynamic allocation with k8s? Should we read
`spark.executor.cores` or `spark.kubernetes.executor.cores` ?
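
A minimal sketch of one possible answer to the question above, not what the PR implements. It assumes the k8s-specific key takes precedence over the generic one, and that either value may be fractional. The helper name `resolve_executor_cores`, the precedence order, and the default of 1.0 are all hypothetical.

```python
from typing import Mapping

# Both key names come from the discussion; the lookup order is an assumption.
K8S_KEY = "spark.kubernetes.executor.cores"
GENERIC_KEY = "spark.executor.cores"


def resolve_executor_cores(conf: Mapping[str, str], default: float = 1.0) -> float:
    """Return the executor core count, preferring the k8s-specific key.

    Values are parsed as floats so that fractional requests such as "0.5"
    are accepted. The default is a placeholder for this sketch.
    """
    raw = conf.get(K8S_KEY, conf.get(GENERIC_KEY))
    if raw is None:
        return default
    cores = float(raw)  # raises ValueError on non-numeric input
    if cores <= 0:
        raise ValueError(f"executor cores must be positive, got {raw!r}")
    return cores


if __name__ == "__main__":
    conf = {GENERIC_KEY: "2", K8S_KEY: "0.5"}
    print(resolve_executor_cores(conf))
```

Under this assumed precedence, a component such as dynamic allocation that needs a whole number of task slots would still have to decide how to round a fractional value. The sketch leaves that open.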
---
Github user ala commented on a diff in the pull request:
https://github.com/apache/spark/pull/20664#discussion_r170277224
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with
SharedSparkContext {
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
+1 for disallowing it anyway if it was Wenchen's opinion too. Please go
ahead. Will help double check anyway.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
Yup, +1 for starting this by disallowing, but to my knowledge R's
read.csv allows the length of tokens to be shorter than its schema, putting
nulls (or NA) into missing fields, as a valid c
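
A minimal sketch of the padding behaviour described in the comment above, written in plain Python rather than against Spark's CSV parser. The function name `pad_tokens` is hypothetical, and rejecting rows that are longer than the schema is an assumption made only for this illustration.

```python
from typing import List, Optional, Sequence


def pad_tokens(tokens: Sequence[str], schema: Sequence[str]) -> List[Optional[str]]:
    """Pad a short row with None so that it lines up with the schema."""
    if len(tokens) > len(schema):
        # Assumption of this sketch: too many tokens is treated as an error.
        raise ValueError(
            f"row has {len(tokens)} tokens but schema has {len(schema)} fields"
        )
    missing = len(schema) - len(tokens)
    return list(tokens) + [None] * missing


if __name__ == "__main__":
    schema = ["id", "name", "city"]
    row = pad_tokens("1,alice".split(","), schema)
    print(dict(zip(schema, row)))  # {'id': '1', 'name': 'alice', 'city': None}
```

The missing trailing field is filled with `None`, which corresponds to the "nulls (or NA)" in the comment.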
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20664#discussion_r170271696
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with
SharedSparkContext {
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
@HyukjinKwon According to the documentation of `DataFrameReader.csv`, the
behavior of the CSV reader isn't consistent with what is documented.
```
`PERMISSIVE` : sets other fields to `null` when it meets a corr
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20662
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87630/
Test PASSed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20662
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20662
**[Test build #87630 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87630/testReport)**
for PR 20662 at commit
[`bab27e6`](https://github.com/apache/spark/commit/b
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20664
**[Test build #87632 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87632/testReport)**
for PR 20664 at commit
[`6d67dfc`](https://github.com/apache/spark/commit/6d
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20664
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1019/
Tes
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20664
Merged build finished. Test PASSed.
Github user ala commented on the issue:
https://github.com/apache/spark/pull/20664
@hvanhovell
GitHub user ala opened a pull request:
https://github.com/apache/spark/pull/20664
[SPARK-23496][CORE] Locality of coalesced partitions can be severely skewed
by the order of input partitions
## What changes were proposed in this pull request?
The algorithm in `DefaultPartit
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
Yup, it's unsupported in JSON but CSV supports it. Do you mean to disallow
it in CSV too, or simply clean up the JSON code path?
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/20649#discussion_r170265519
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/StructTypeSuite.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
I'll close this PR and create another PR to refactor the JSON parser and
related code. Thanks @cloud-fan and @HyukjinKwon.
Github user viirya closed the pull request at:
https://github.com/apache/spark/pull/20648
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/20648
According to an offline discussion with @cloud-fan, partial results are not
supported at all now. We should refactor the code to clean it up and reduce
confusion.
---
---
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20648
How about we start this by disallowing partial results altogether,
documenting the behaviour and matching the behaviour to R's `read.csv(...)` in
case of CSV (in terms of which case is an error
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20649#discussion_r170256168
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -297,7 +300,9 @@ case class StructType(fields: Array[StructField])
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20649#discussion_r170256000
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -271,7 +271,9 @@ case class StructType(fields: Array[StructField])
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20649#discussion_r170256107
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -284,7 +286,8 @@ case class StructType(fields: Array[StructField])
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20649#discussion_r170256030
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala ---
@@ -284,7 +286,8 @@ case class StructType(fields: Array[StructField])
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/20649#discussion_r170256385
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/StructTypeSuite.scala ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundatio
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20663
Merged build finished. Test PASSed.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/20653
LGTM
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20663
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1018/
Tes
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20663
**[Test build #87631 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87631/testReport)**
for PR 20663 at commit
[`d246df2`](https://github.com/apache/spark/commit/d2
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/20663
cc @vanzin
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/20663
[SPARK-23475][UI][FOLLOWUP] Refactor AllStagesPage in order to avoid
redundant code
## What changes were proposed in this pull request?
As suggested in #20651, the code is very redundant
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20662
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20662
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87628/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20662
**[Test build #87628 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87628/testReport)**
for PR 20662 at commit
[`619a371`](https://github.com/apache/spark/commit/6
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20611
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20611
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87629/
Test FAILed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20611
**[Test build #87629 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87629/testReport)**
for PR 20611 at commit
[`22c71dd`](https://github.com/apache/spark/commit/2
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20648
I think we do have an intention to return partial results, but there is no
strict definition for it, and there seems to be no public documentation, so
it's kind of a new feature.
Since this is a n
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20624#discussion_r170226448
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
---
@@ -387,6 +390,101 @@ case class CatalogStatistics(
}
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20624#discussion_r170226982
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
---
@@ -387,6 +390,101 @@ case class CatalogStatistics(
}
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20662
**[Test build #87630 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87630/testReport)**
for PR 20662 at commit
[`bab27e6`](https://github.com/apache/spark/commit/ba
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20662
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1017/
Tes
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20662
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20611
**[Test build #87629 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87629/testReport)**
for PR 20611 at commit
[`22c71dd`](https://github.com/apache/spark/commit/22
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20647#discussion_r170208631
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
---
@@ -77,31 +79,32 @@ class MicroBatchExecution(
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20647#discussion_r170208349
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala
---
@@ -23,11 +23,11 @@ import org.apache.spar
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20662
**[Test build #87628 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87628/testReport)**
for PR 20662 at commit
[`619a371`](https://github.com/apache/spark/commit/61
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20662
Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20662
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1016/
Tes
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/20651
Thanks @vanzin, I created the PR for the backport.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/20662
cc @vanzin
GitHub user mgaido91 opened a pull request:
https://github.com/apache/spark/pull/20662
[SPARK-23475][UI][BACKPORT-2.3] Show also skipped stages
## What changes were proposed in this pull request?
SPARK-20648 introduced the status `SKIPPED` for the stages. On the UI,
previou
Github user lonehacker commented on the issue:
https://github.com/apache/spark/pull/10942
Thanks @cloud-fan . I couldn't find a JIRA that tracks this feature, could
you help with that? This feature is very important for our use case, so it
would be great to get any info on when this w
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20658
**[Test build #87627 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87627/testReport)**
for PR 20658 at commit
[`d7e03cd`](https://github.com/apache/spark/commit/d
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20658
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87627/
Test FAILed.
---
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20658
Merged build finished. Test FAILed.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20647
Merged build finished. Test PASSed.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20658
**[Test build #87627 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87627/testReport)**
for PR 20658 at commit
[`d7e03cd`](https://github.com/apache/spark/commit/d7
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20647
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87624/
Test PASSed.
---
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20647
**[Test build #87624 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87624/testReport)**
for PR 20647 at commit
[`dbee281`](https://github.com/apache/spark/commit/d
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20658
ok to test
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/20647#discussion_r170185948
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
---
@@ -77,31 +79,32 @@ class MicroBatchExecution(