date:20200113

[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366189514
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -39,27 +40,49 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param requiredSchema The schema of the data that should be output for each 
row. This should be a
  *   subset of the columns in dataSchema.
  * @param options Configuration options for a CSV parser.
+ * @param filters The pushdown filters that should be applied to converted 
values.
  */
 class UnivocityParser(
 dataSchema: StructType,
 requiredSchema: StructType,
-val options: CSVOptions) extends Logging {
+val options: CSVOptions,
+filters: Seq[Filter]) extends Logging {
   require(requiredSchema.toSet.subsetOf(dataSchema.toSet),
 s"requiredSchema (${requiredSchema.catalogString}) should be the subset of 
" +
   s"dataSchema (${dataSchema.catalogString}).")
 
+  def this(dataSchema: StructType, requiredSchema: StructType, options: 
CSVOptions) = {
+this(dataSchema, requiredSchema, options, Seq.empty)
+  }
   def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
 
   // A `ValueConverter` is responsible for converting the given value to a 
desired type.
   private type ValueConverter = String => Any
 
+  private val csvFilters = new CSVFilters(
+filters,
+dataSchema,
+requiredSchema,
+options.columnPruning)
+
+  private[sql] val parsedSchema = csvFilters.readSchema
+
+  // Mapping of field indexes of `parsedSchema` to indexes of `requiredSchema`.
+  // It returns -1 if `requiredSchema` doesn't contain a field from 
`parsedSchema`.
 
 Review comment:
   If `requiredSchema` always contains filter references, it is significant 
assumption, and it can simplify this implementation slightly. Is it just 
specific of current implementation or `requiredSchema` could contain **really 
required** column in the future? because filter columns are not actually 
required if the filter is applied only once in a datasource.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] 
Respect filter in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574045415
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-574045429
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116675/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-574045429
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116675/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-574045420
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect 
filter in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574045422
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21468/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] 
Respect filter in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574045422
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21468/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect 
filter in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574045415
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26805: [SPARK-15616][SQL] Add 
optimizer rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-574045420
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-13 Thread GitBox

SparkQA commented on issue #26805: [SPARK-15616][SQL] Add optimizer rule 
PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-574044885
 
 
   **[Test build #116675 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116675/testReport)**
 for PR 26805 at commit 
[`3de6491`](https://github.com/apache/spark/commit/3de6491597f2d1b8a011df707d643fba5b14fce8).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

SparkQA commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter 
in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574044887
 
 
   **[Test build #116689 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116689/testReport)**
 for PR 27198 at commit 
[`cc658c9`](https://github.com/apache/spark/commit/cc658c99b4633fe49b14990b80be672b11f62ced).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer rule PruneHiveTablePartitions

2020-01-13 Thread GitBox

SparkQA removed a comment on issue #26805: [SPARK-15616][SQL] Add optimizer 
rule PruneHiveTablePartitions
URL: https://github.com/apache/spark/pull/26805#issuecomment-573980935
 
 
   **[Test build #116675 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116675/testReport)**
 for PR 26805 at commit 
[`3de6491`](https://github.com/apache/spark/commit/3de6491597f2d1b8a011df707d643fba5b14fce8).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on issue #26102: [SPARK-29448][SQL] Support the `INTERVAL` type by Parquet datasource

2020-01-13 Thread GitBox

MaxGekk commented on issue #26102: [SPARK-29448][SQL] Support the `INTERVAL` 
type by Parquet datasource
URL: https://github.com/apache/spark/pull/26102#issuecomment-574044689
 
 
   @cloud-fan Should I close this PR?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26868: [SPARK-29665][SQL] refine the TableProvider interface

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26868: [SPARK-29665][SQL] refine the 
TableProvider interface
URL: https://github.com/apache/spark/pull/26868#issuecomment-574044182
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26868: [SPARK-29665][SQL] refine the TableProvider interface

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26868: [SPARK-29665][SQL] refine the 
TableProvider interface
URL: https://github.com/apache/spark/pull/26868#issuecomment-574044191
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116674/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26868: [SPARK-29665][SQL] refine the TableProvider interface

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26868: [SPARK-29665][SQL] refine the 
TableProvider interface
URL: https://github.com/apache/spark/pull/26868#issuecomment-574044191
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116674/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26868: [SPARK-29665][SQL] refine the TableProvider interface

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26868: [SPARK-29665][SQL] refine the 
TableProvider interface
URL: https://github.com/apache/spark/pull/26868#issuecomment-574044182
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26868: [SPARK-29665][SQL] refine the TableProvider interface

2020-01-13 Thread GitBox

SparkQA commented on issue #26868: [SPARK-29665][SQL] refine the TableProvider 
interface
URL: https://github.com/apache/spark/pull/26868#issuecomment-574043654
 
 
   **[Test build #116674 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116674/testReport)**
 for PR 26868 at commit 
[`ffb0b29`](https://github.com/apache/spark/commit/ffb0b2984a74f173c42723c3ae567668667e3fd7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `  implicit class PartitionTypeHelper(colNames: Seq[String]) `
 * `trait SimpleTableProvider extends TableProvider `
 * `class NoopDataSource extends SimpleTableProvider with 
DataSourceRegister `
 * `class RateStreamProvider extends SimpleTableProvider with 
DataSourceRegister `
 * `class TextSocketSourceProvider extends SimpleTableProvider with 
DataSourceRegister with Logging `


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #26868: [SPARK-29665][SQL] refine the TableProvider interface

2020-01-13 Thread GitBox

SparkQA removed a comment on issue #26868: [SPARK-29665][SQL] refine the 
TableProvider interface
URL: https://github.com/apache/spark/pull/26868#issuecomment-573980923
 
 
   **[Test build #116674 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116674/testReport)**
 for PR 26868 at commit 
[`ffb0b29`](https://github.com/apache/spark/commit/ffb0b2984a74f173c42723c3ae567668667e3fd7).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate 
Avro option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574040718
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

SparkQA removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate Avro 
option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574037384
 
 
   **[Test build #116687 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116687/testReport)**
 for PR 27194 at commit 
[`bfbaf2b`](https://github.com/apache/spark/commit/bfbaf2b4eb17cfc0457a8c710003fc59138bace5).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate 
Avro option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574040733
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116687/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

SparkQA commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option 
`ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574040611
 
 
   **[Test build #116687 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116687/testReport)**
 for PR 27194 at commit 
[`bfbaf2b`](https://github.com/apache/spark/commit/bfbaf2b4eb17cfc0457a8c710003fc59138bace5).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro 
option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574040733
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116687/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro 
option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574040718
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] beliefer commented on a change in pull request #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

beliefer commented on a change in pull request #27198: 
[SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of 
AggregateExpression
URL: https://github.com/apache/spark/pull/27198#discussion_r366183077
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/interfaces.scala
 ##
 @@ -149,10 +149,24 @@ case class AggregateExpression(
   case PartialMerge => "merge_"
   case Final | Complete => ""
 }
-prefix + aggregateFunction.toAggString(isDistinct)
+val aggFuncStr = prefix + aggregateFunction.toAggString(isDistinct)
+mode match {
+  case Partial | Complete if filter.isDefined =>
 
 Review comment:
   I mean the mode is `PartialMerge`.
   If we only need to show filter in physical plans after rewrite, this is OK.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mickjermsurawong-stripe commented on issue #27153: [SPARK-20384][SQL] Support value class in schema of Dataset (without breaking existing current projection)

2020-01-13 Thread GitBox

mickjermsurawong-stripe commented on issue #27153: [SPARK-20384][SQL] Support 
value class in schema of Dataset (without breaking existing current projection)
URL: https://github.com/apache/spark/pull/27153#issuecomment-574038632
 
 
   Thanks @mt40!
   Also added more tests on filtering on dataframe c23705d to illustrate the 
expected schemas more explicitly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] mickjermsurawong-stripe commented on a change in pull request #27153: [SPARK-20384][SQL] Support value class in schema of Dataset (without breaking existing current projection)

2020-01-13 Thread GitBox

mickjermsurawong-stripe commented on a change in pull request #27153: 
[SPARK-20384][SQL] Support value class in schema of Dataset (without breaking 
existing current projection)
URL: https://github.com/apache/spark/pull/27153#discussion_r366181241
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala
 ##
 @@ -679,6 +679,33 @@ class DataFrameSuite extends QueryTest with 
SharedSparkSession {
 assert(df.schema.map(_.name) === Seq("key", "valueRenamed", "newCol"))
   }
 
+  test("Value class filter") {
+val df = spark.sparkContext
+  .parallelize(Seq(StringWrapper("a"), StringWrapper("b"), 
StringWrapper("c")))
+  .toDF()
+val filtered = df.where("s = \"a\"")
+checkAnswer(filtered, 
spark.sparkContext.parallelize(Seq(StringWrapper("a"))).toDF)
+  }
+
+  test("Array value class filter") {
+val ab = ArrayStringWrapper(Seq(StringWrapper("a"), StringWrapper("b")))
+val cd = ArrayStringWrapper(Seq(StringWrapper("c"), StringWrapper("d")))
+
+val df = spark.sparkContext.parallelize(Seq(ab, cd)).toDF
+val filtered = df.where(array_contains(col("wrappers.s"), "b"))
+checkAnswer(filtered, spark.sparkContext.parallelize(Seq(ab)).toDF)
+  }
+
+  test("Nested value class filter") {
+val a = ContainerStringWrapper(StringWrapper("a"))
+val b = ContainerStringWrapper(StringWrapper("b"))
+
+val df = spark.sparkContext.parallelize(Seq(a, b)).toDF
+// flat value class, `s` field is not in schema
+val filtered = df.where("wrapper = \"a\"")
+checkAnswer(filtered, spark.sparkContext.parallelize(Seq(a)).toDF)
 
 Review comment:
   Before this change we never support nested value class: 
   - Filter with `wrapper` would break with
   ```
   org.apache.spark.sql.AnalysisException: cannot resolve '(`wrapper` = 'a')' 
due to data type mismatch: differing types in '(`wrapper` = 'a')' 
(struct and string).; line 1 pos 0;
   ```
   - Filter with `wrapper.s` would break with:
   ```
   java.lang.ClassCastException: java.lang.String cannot be cast to 
org.apache.spark.sql.test.SQLTestData$StringWrapper
   ```
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27153: [SPARK-20384][SQL] Support value class in schema of Dataset (without breaking existing current projection)

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27153: [SPARK-20384][SQL] Support 
value class in schema of Dataset (without breaking existing current projection)
URL: https://github.com/apache/spark/pull/27153#issuecomment-574037786
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21467/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate 
Avro option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574037807
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21466/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro 
option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574037801
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27153: [SPARK-20384][SQL] Support value class in schema of Dataset (without breaking existing current projection)

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27153: [SPARK-20384][SQL] Support value class 
in schema of Dataset (without breaking existing current projection)
URL: https://github.com/apache/spark/pull/27153#issuecomment-57403
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27153: [SPARK-20384][SQL] Support value class in schema of Dataset (without breaking existing current projection)

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27153: [SPARK-20384][SQL] Support 
value class in schema of Dataset (without breaking existing current projection)
URL: https://github.com/apache/spark/pull/27153#issuecomment-57403
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27153: [SPARK-20384][SQL] Support value class in schema of Dataset (without breaking existing current projection)

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27153: [SPARK-20384][SQL] Support value class 
in schema of Dataset (without breaking existing current projection)
URL: https://github.com/apache/spark/pull/27153#issuecomment-574037786
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21467/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro 
option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574037807
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21466/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27194: [SPARK-30505][DOCS] Deprecate 
Avro option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574037801
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27153: [SPARK-20384][SQL] Support value class in schema of Dataset (without breaking existing current projection)

2020-01-13 Thread GitBox

SparkQA commented on issue #27153: [SPARK-20384][SQL] Support value class in 
schema of Dataset (without breaking existing current projection)
URL: https://github.com/apache/spark/pull/27153#issuecomment-574037400
 
 
   **[Test build #116688 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116688/testReport)**
 for PR 27153 at commit 
[`c23705d`](https://github.com/apache/spark/commit/c23705d9715efa16d73491191143c9f802bdff51).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

SparkQA commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option 
`ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574037384
 
 
   **[Test build #116687 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116687/testReport)**
 for PR 27194 at commit 
[`bfbaf2b`](https://github.com/apache/spark/commit/bfbaf2b4eb17cfc0457a8c710003fc59138bace5).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366180228
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -39,27 +40,49 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param requiredSchema The schema of the data that should be output for each 
row. This should be a
  *   subset of the columns in dataSchema.
  * @param options Configuration options for a CSV parser.
+ * @param filters The pushdown filters that should be applied to converted 
values.
  */
 class UnivocityParser(
 dataSchema: StructType,
 requiredSchema: StructType,
-val options: CSVOptions) extends Logging {
+val options: CSVOptions,
+filters: Seq[Filter]) extends Logging {
   require(requiredSchema.toSet.subsetOf(dataSchema.toSet),
 s"requiredSchema (${requiredSchema.catalogString}) should be the subset of 
" +
   s"dataSchema (${dataSchema.catalogString}).")
 
+  def this(dataSchema: StructType, requiredSchema: StructType, options: 
CSVOptions) = {
+this(dataSchema, requiredSchema, options, Seq.empty)
+  }
   def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
 
   // A `ValueConverter` is responsible for converting the given value to a 
desired type.
   private type ValueConverter = String => Any
 
+  private val csvFilters = new CSVFilters(
+filters,
+dataSchema,
+requiredSchema,
+options.columnPruning)
+
+  private[sql] val parsedSchema = csvFilters.readSchema
+
+  // Mapping of field indexes of `parsedSchema` to indexes of `requiredSchema`.
+  // It returns -1 if `requiredSchema` doesn't contain a field from 
`parsedSchema`.
 
 Review comment:
   It should read the column referenced by filters from the source because we 
filter it one more time within Spark side - that referenced column should be 
read from the source.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366180369
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -39,27 +40,49 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param requiredSchema The schema of the data that should be output for each 
row. This should be a
  *   subset of the columns in dataSchema.
  * @param options Configuration options for a CSV parser.
+ * @param filters The pushdown filters that should be applied to converted 
values.
  */
 class UnivocityParser(
 dataSchema: StructType,
 requiredSchema: StructType,
-val options: CSVOptions) extends Logging {
+val options: CSVOptions,
+filters: Seq[Filter]) extends Logging {
   require(requiredSchema.toSet.subsetOf(dataSchema.toSet),
 s"requiredSchema (${requiredSchema.catalogString}) should be the subset of 
" +
   s"dataSchema (${dataSchema.catalogString}).")
 
+  def this(dataSchema: StructType, requiredSchema: StructType, options: 
CSVOptions) = {
+this(dataSchema, requiredSchema, options, Seq.empty)
+  }
   def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
 
   // A `ValueConverter` is responsible for converting the given value to a 
desired type.
   private type ValueConverter = String => Any
 
+  private val csvFilters = new CSVFilters(
+filters,
+dataSchema,
+requiredSchema,
+options.columnPruning)
+
+  private[sql] val parsedSchema = csvFilters.readSchema
 
 Review comment:
   You can just make it public and remove `private[sql]`


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366180228
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -39,27 +40,49 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param requiredSchema The schema of the data that should be output for each 
row. This should be a
  *   subset of the columns in dataSchema.
  * @param options Configuration options for a CSV parser.
+ * @param filters The pushdown filters that should be applied to converted 
values.
  */
 class UnivocityParser(
 dataSchema: StructType,
 requiredSchema: StructType,
-val options: CSVOptions) extends Logging {
+val options: CSVOptions,
+filters: Seq[Filter]) extends Logging {
   require(requiredSchema.toSet.subsetOf(dataSchema.toSet),
 s"requiredSchema (${requiredSchema.catalogString}) should be the subset of 
" +
   s"dataSchema (${dataSchema.catalogString}).")
 
+  def this(dataSchema: StructType, requiredSchema: StructType, options: 
CSVOptions) = {
+this(dataSchema, requiredSchema, options, Seq.empty)
+  }
   def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
 
   // A `ValueConverter` is responsible for converting the given value to a 
desired type.
   private type ValueConverter = String => Any
 
+  private val csvFilters = new CSVFilters(
+filters,
+dataSchema,
+requiredSchema,
+options.columnPruning)
+
+  private[sql] val parsedSchema = csvFilters.readSchema
+
+  // Mapping of field indexes of `parsedSchema` to indexes of `requiredSchema`.
+  // It returns -1 if `requiredSchema` doesn't contain a field from 
`parsedSchema`.
 
 Review comment:
   It should read the column referenced by filters from the source because we 
filter it one more time within Spark side.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR edited a comment on issue #23859: [SPARK-26956][SS] remove streaming output mode from data source v2 APIs

2020-01-13 Thread GitBox

HeartSaVioR edited a comment on issue #23859: [SPARK-26956][SS] remove 
streaming output mode from data source v2 APIs
URL: https://github.com/apache/spark/pull/23859#issuecomment-574034418
 
 
   Btw, does the concern based on the real world workload? Because I cannot 
imagine "complete mode" works with decent amount of traffic, especially you're 
running the query for long time. "complete mode" means you cannot evict any 
state regardless of watermark, which won't make sense except you have finite 
set of group key (if then the cardinality of group keys will define the overall 
size of state).
   
   > If my assumption is right aren't we going back to Dstream behaviour of 
applying window transformation over the batch interval?
   
   That's why "state" comes into play in structured streaming. The state 
retains the values across micro-batches, "windows" in case of window 
transformations.
   
   In fact, as previous comments in this PR stated already, the only mode works 
without any tweak in production is append mode. In update mode you can tweak 
with custom sink to make it correctly upsert with the output, but there's no 
API to define "group keys" in existing sinks.
   (Even we have defined the group keys for sink side, it would be odd if we 
cannot guarantee the keys are same being used to group the data and do the 
stateful operation. It's semantically incorrect.)
   
   Btw, the streaming output mode is all about how to emit output for the 
stateful operation. If you don't do any stateful operation, output mode is 
no-op.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] 
Respect filter in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574034248
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116684/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HeartSaVioR commented on issue #23859: [SPARK-26956][SS] remove streaming output mode from data source v2 APIs

2020-01-13 Thread GitBox

HeartSaVioR commented on issue #23859: [SPARK-26956][SS] remove streaming 
output mode from data source v2 APIs
URL: https://github.com/apache/spark/pull/23859#issuecomment-574034418
 
 
   Btw, does the concern based on the real world workload? Because I cannot 
imagine "complete mode" works with decent amount of traffic, especially you're 
running the query for long time. "complete mode" means you cannot evict any 
state regardless of watermark, which won't make sense except you have finite 
set of group key (if then the cardinality of group keys will define the overall 
size of state).
   
   > If my assumption is right aren't we going back to Dstream behaviour of 
applying window transformation over the batch interval?
   
   That's why "state" comes into play in structured streaming. The state 
retains the values across micro-batches, "windows" in case of window 
transformations.
   
   In fact, as previous comments in this PR stated already, the only mode works 
without any tweak in production is append mode. In update mode you can tweak 
with custom sink to make it correctly upsert with the output, but there's no 
API to define "group keys" in existing sinks.
   
   Btw, the streaming output mode is all about how to emit output for the 
stateful operation. If you don't do any stateful operation, output mode is 
no-op.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

SparkQA removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect 
filter in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574010302
 
 
   **[Test build #116684 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116684/testReport)**
 for PR 27198 at commit 
[`e54839b`](https://github.com/apache/spark/commit/e54839bab74b1b98048b82e1f3122883f390).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27198: [SPARK-27986][SQL][FOLLOWUP] 
Respect filter in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574034245
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] gengliangwang commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

gengliangwang commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro 
option `ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574034462
 
 
   @MaxGekk  I see. I didn't know that the option `pathGlobFilter` is 
documented in DataFrameReader.scala.
   
   I think reverting the description of the option pathGlobFilter in 
`sql-data-sources-avro.md` makes sense.
   
   Still, we need to document the options in SPARK-30506. I will find someone 
else or do it myself.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect 
filter in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574034248
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116684/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect 
filter in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574034245
 
 
   Merged build finished. Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter in sql/toString of AggregateExpression

2020-01-13 Thread GitBox

SparkQA commented on issue #27198: [SPARK-27986][SQL][FOLLOWUP] Respect filter 
in sql/toString of AggregateExpression
URL: https://github.com/apache/spark/pull/27198#issuecomment-574034177
 
 
   **[Test build #116684 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116684/testReport)**
 for PR 27198 at commit 
[`e54839b`](https://github.com/apache/spark/commit/e54839bab74b1b98048b82e1f3122883f390).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366177956
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -39,27 +40,49 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param requiredSchema The schema of the data that should be output for each 
row. This should be a
  *   subset of the columns in dataSchema.
  * @param options Configuration options for a CSV parser.
+ * @param filters The pushdown filters that should be applied to converted 
values.
  */
 class UnivocityParser(
 dataSchema: StructType,
 requiredSchema: StructType,
-val options: CSVOptions) extends Logging {
+val options: CSVOptions,
+filters: Seq[Filter]) extends Logging {
   require(requiredSchema.toSet.subsetOf(dataSchema.toSet),
 s"requiredSchema (${requiredSchema.catalogString}) should be the subset of 
" +
   s"dataSchema (${dataSchema.catalogString}).")
 
+  def this(dataSchema: StructType, requiredSchema: StructType, options: 
CSVOptions) = {
+this(dataSchema, requiredSchema, options, Seq.empty)
+  }
   def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
 
   // A `ValueConverter` is responsible for converting the given value to a 
desired type.
   private type ValueConverter = String => Any
 
+  private val csvFilters = new CSVFilters(
+filters,
+dataSchema,
+requiredSchema,
+options.columnPruning)
+
+  private[sql] val parsedSchema = csvFilters.readSchema
 
 Review comment:
   Just for the context, originally I made it private but had to make it more 
open because it is used in other places in `sql`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] maropu commented on issue #27166: [SPARK-30482][SQL][CORE][TESTS] Add sub-class of `AppenderSkeleton` reusable in tests

2020-01-13 Thread GitBox

maropu commented on issue #27166: [SPARK-30482][SQL][CORE][TESTS] Add sub-class 
of `AppenderSkeleton` reusable in tests
URL: https://github.com/apache/spark/pull/27166#issuecomment-574033880
 
 
   Merged to master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] maropu closed pull request #27166: [SPARK-30482][SQL][CORE][TESTS] Add sub-class of `AppenderSkeleton` reusable in tests

2020-01-13 Thread GitBox

maropu closed pull request #27166: [SPARK-30482][SQL][CORE][TESTS] Add 
sub-class of `AppenderSkeleton` reusable in tests
URL: https://github.com/apache/spark/pull/27166
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] maropu commented on issue #27166: [SPARK-30482][SQL][CORE][TESTS] Add sub-class of `AppenderSkeleton` reusable in tests

2020-01-13 Thread GitBox

maropu commented on issue #27166: [SPARK-30482][SQL][CORE][TESTS] Add sub-class 
of `AppenderSkeleton` reusable in tests
URL: https://github.com/apache/spark/pull/27166#issuecomment-574033453
 
 
   yea, I will, thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366177325
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -39,27 +40,49 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param requiredSchema The schema of the data that should be output for each 
row. This should be a
  *   subset of the columns in dataSchema.
  * @param options Configuration options for a CSV parser.
+ * @param filters The pushdown filters that should be applied to converted 
values.
  */
 class UnivocityParser(
 dataSchema: StructType,
 requiredSchema: StructType,
-val options: CSVOptions) extends Logging {
+val options: CSVOptions,
+filters: Seq[Filter]) extends Logging {
   require(requiredSchema.toSet.subsetOf(dataSchema.toSet),
 s"requiredSchema (${requiredSchema.catalogString}) should be the subset of 
" +
   s"dataSchema (${dataSchema.catalogString}).")
 
+  def this(dataSchema: StructType, requiredSchema: StructType, options: 
CSVOptions) = {
+this(dataSchema, requiredSchema, options, Seq.empty)
+  }
   def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
 
   // A `ValueConverter` is responsible for converting the given value to a 
desired type.
   private type ValueConverter = String => Any
 
+  private val csvFilters = new CSVFilters(
+filters,
+dataSchema,
+requiredSchema,
+options.columnPruning)
+
+  private[sql] val parsedSchema = csvFilters.readSchema
 
 Review comment:
   Do you propose just `private`?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26957: [SPARK-30314] Add identifier 
and catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574032554
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116673/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26961: [SPARK-29708][SQL] Correct aggregated values when grouping sets are duplicated

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26961: [SPARK-29708][SQL] Correct 
aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/26961#issuecomment-574032924
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21465/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26957: [SPARK-30314] Add identifier 
and catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574032548
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26961: [SPARK-29708][SQL] Correct aggregated values when grouping sets are duplicated

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26961: [SPARK-29708][SQL] Correct 
aggregated values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/26961#issuecomment-574032919
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366177325
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -39,27 +40,49 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param requiredSchema The schema of the data that should be output for each 
row. This should be a
  *   subset of the columns in dataSchema.
  * @param options Configuration options for a CSV parser.
+ * @param filters The pushdown filters that should be applied to converted 
values.
  */
 class UnivocityParser(
 dataSchema: StructType,
 requiredSchema: StructType,
-val options: CSVOptions) extends Logging {
+val options: CSVOptions,
+filters: Seq[Filter]) extends Logging {
   require(requiredSchema.toSet.subsetOf(dataSchema.toSet),
 s"requiredSchema (${requiredSchema.catalogString}) should be the subset of 
" +
   s"dataSchema (${dataSchema.catalogString}).")
 
+  def this(dataSchema: StructType, requiredSchema: StructType, options: 
CSVOptions) = {
+this(dataSchema, requiredSchema, options, Seq.empty)
+  }
   def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
 
   // A `ValueConverter` is responsible for converting the given value to a 
desired type.
   private type ValueConverter = String => Any
 
+  private val csvFilters = new CSVFilters(
+filters,
+dataSchema,
+requiredSchema,
+options.columnPruning)
+
+  private[sql] val parsedSchema = csvFilters.readSchema
 
 Review comment:
   Do you propose just `private`? or public?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26961: [SPARK-29708][SQL] Correct aggregated values when grouping sets are duplicated

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26961: [SPARK-29708][SQL] Correct aggregated 
values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/26961#issuecomment-574032924
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/21465/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26961: [SPARK-29708][SQL] Correct aggregated values when grouping sets are duplicated

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26961: [SPARK-29708][SQL] Correct aggregated 
values when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/26961#issuecomment-574032919
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26961: [SPARK-29708][SQL] Correct aggregated values when grouping sets are duplicated

2020-01-13 Thread GitBox

SparkQA commented on issue #26961: [SPARK-29708][SQL] Correct aggregated values 
when grouping sets are duplicated
URL: https://github.com/apache/spark/pull/26961#issuecomment-574032515
 
 
   **[Test build #116686 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116686/testReport)**
 for PR 26961 at commit 
[`cdcc4d0`](https://github.com/apache/spark/commit/cdcc4d017645df5e8cfd9775b60444b443e3bf82).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26957: [SPARK-30314] Add identifier and 
catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574032554
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116673/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26957: [SPARK-30314] Add identifier and 
catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574032548
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366176665
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -204,15 +234,15 @@ class UnivocityParser(
* Parses a single CSV string and turns it into either one resulting row or 
no row (if the
* the record is malformed).
*/
-  def parse(input: String): InternalRow = doParse(input)
+  def parse(input: String): Seq[InternalRow] = doParse(input)
 
   private val getToken = if (options.columnPruning) {
 (tokens: Array[String], index: Int) => tokens(index)
   } else {
 (tokens: Array[String], index: Int) => tokens(tokenIndexArr(index))
   }
 
-  private def convert(tokens: Array[String]): InternalRow = {
+  private def convert(tokens: Array[String]): Seq[InternalRow] = {
 
 Review comment:
   Yup, It should be fine to do it separately


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

SparkQA removed a comment on issue #26957: [SPARK-30314] Add identifier and 
catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-573975813
 
 
   **[Test build #116673 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116673/testReport)**
 for PR 26957 at commit 
[`42bf872`](https://github.com/apache/spark/commit/42bf8722f3634c0975db89af8dfcd3373a12e5f7).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366176019
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -204,15 +234,15 @@ class UnivocityParser(
* Parses a single CSV string and turns it into either one resulting row or 
no row (if the
* the record is malformed).
*/
-  def parse(input: String): InternalRow = doParse(input)
+  def parse(input: String): Seq[InternalRow] = doParse(input)
 
   private val getToken = if (options.columnPruning) {
 (tokens: Array[String], index: Int) => tokens(index)
   } else {
 (tokens: Array[String], index: Int) => tokens(tokenIndexArr(index))
   }
 
-  private def convert(tokens: Array[String]): InternalRow = {
+  private def convert(tokens: Array[String]): Seq[InternalRow] = {
 
 Review comment:
   The required change in `FailureSafeParser` is one liner. I think we should 
use `Option` here instead of `Seq` ...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

SparkQA commented on issue #26957: [SPARK-30314] Add identifier and catalog 
information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574032046
 
 
   **[Test build #116673 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116673/testReport)**
 for PR 26957 at commit 
[`42bf872`](https://github.com/apache/spark/commit/42bf8722f3634c0975db89af8dfcd3373a12e5f7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366176306
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/CSVFilters.scala
 ##
 @@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.csv
+
+import scala.util.Try
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources
+import org.apache.spark.sql.types.{BooleanType, StructType}
+
+/**
+ * An instance of the class compiles filters to predicates and allows to
+ * apply the predicates to an internal row with partially initialized values
+ * converted from parsed CSV fields.
+ *
+ * @param filters The filters pushed down to CSV datasource.
+ * @param dataSchema The full schema with all fields in CSV files.
+ * @param requiredSchema The schema with only fields requested by the upper 
layer.
+ * @param columnPruning true if CSV parser can read sub-set of columns 
otherwise false.
+ */
+class CSVFilters(
+filters: Seq[sources.Filter],
+dataSchema: StructType,
+requiredSchema: StructType,
+columnPruning: Boolean) {
+  require(checkFilters(), "All filters must be applicable to the data schema.")
 
 Review comment:
   Initially the function was slightly complex, see 
https://github.com/apache/spark/pull/26973/commits/f0aa0a88bfa0c87007f8781ba7fac8f9cd3057ba#diff-44a98c4a53980cb04e57f0489b257a37L126
 . That's why I extracted it to a separate method.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26975: [SPARK-30325][CORE] markPartitionCompleted cause task status inconsistent

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26975: [SPARK-30325][CORE] 
markPartitionCompleted cause task status inconsistent
URL: https://github.com/apache/spark/pull/26975#issuecomment-574031548
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

HyukjinKwon commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366176019
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -204,15 +234,15 @@ class UnivocityParser(
* Parses a single CSV string and turns it into either one resulting row or 
no row (if the
* the record is malformed).
*/
-  def parse(input: String): InternalRow = doParse(input)
+  def parse(input: String): Seq[InternalRow] = doParse(input)
 
   private val getToken = if (options.columnPruning) {
 (tokens: Array[String], index: Int) => tokens(index)
   } else {
 (tokens: Array[String], index: Int) => tokens(tokenIndexArr(index))
   }
 
-  private def convert(tokens: Array[String]): InternalRow = {
+  private def convert(tokens: Array[String]): Seq[InternalRow] = {
 
 Review comment:
   The required change in `FailureSafeParser` is one liner. I think we should 
use `Option` here instead of `Seq` ...


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26975: [SPARK-30325][CORE] markPartitionCompleted cause task status inconsistent

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26975: [SPARK-30325][CORE] 
markPartitionCompleted cause task status inconsistent
URL: https://github.com/apache/spark/pull/26975#issuecomment-574031555
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116679/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26975: [SPARK-30325][CORE] markPartitionCompleted cause task status inconsistent

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26975: [SPARK-30325][CORE] 
markPartitionCompleted cause task status inconsistent
URL: https://github.com/apache/spark/pull/26975#issuecomment-574031548
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26975: [SPARK-30325][CORE] markPartitionCompleted cause task status inconsistent

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26975: [SPARK-30325][CORE] 
markPartitionCompleted cause task status inconsistent
URL: https://github.com/apache/spark/pull/26975#issuecomment-574031555
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116679/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #26975: [SPARK-30325][CORE] markPartitionCompleted cause task status inconsistent

2020-01-13 Thread GitBox

SparkQA removed a comment on issue #26975: [SPARK-30325][CORE] 
markPartitionCompleted cause task status inconsistent
URL: https://github.com/apache/spark/pull/26975#issuecomment-573986167
 
 
   **[Test build #116679 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116679/testReport)**
 for PR 26975 at commit 
[`92e30c3`](https://github.com/apache/spark/commit/92e30c3d4ae52a8ce78bc10328446c5149a38e55).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26975: [SPARK-30325][CORE] markPartitionCompleted cause task status inconsistent

2020-01-13 Thread GitBox

SparkQA commented on issue #26975: [SPARK-30325][CORE] markPartitionCompleted 
cause task status inconsistent
URL: https://github.com/apache/spark/pull/26975#issuecomment-574030980
 
 
   **[Test build #116679 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116679/testReport)**
 for PR 26975 at commit 
[`92e30c3`](https://github.com/apache/spark/commit/92e30c3d4ae52a8ce78bc10328446c5149a38e55).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366175302
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -204,15 +234,15 @@ class UnivocityParser(
* Parses a single CSV string and turns it into either one resulting row or 
no row (if the
* the record is malformed).
*/
-  def parse(input: String): InternalRow = doParse(input)
+  def parse(input: String): Seq[InternalRow] = doParse(input)
 
   private val getToken = if (options.columnPruning) {
 (tokens: Array[String], index: Int) => tokens(index)
   } else {
 (tokens: Array[String], index: Int) => tokens(tokenIndexArr(index))
   }
 
-  private def convert(tokens: Array[String]): InternalRow = {
+  private def convert(tokens: Array[String]): Seq[InternalRow] = {
 
 Review comment:
   We can do that but modification of `FailureSafeParser` is slightly 
orthogonal to the purpose of the PR, and it is not necessary for this changes. 
Can we do that separately? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26957: [SPARK-30314] Add identifier 
and catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574030333
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26957: [SPARK-30314] Add identifier 
and catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574030343
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116672/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26957: [SPARK-30314] Add identifier and 
catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574030333
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #26957: [SPARK-30314] Add identifier and 
catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574030343
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116672/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox

MaxGekk commented on a change in pull request #26973: [SPARK-30323][SQL] 
Support filters pushdown in CSV datasource
URL: https://github.com/apache/spark/pull/26973#discussion_r366174594
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/csv/UnivocityParser.scala
 ##
 @@ -39,27 +40,49 @@ import org.apache.spark.unsafe.types.UTF8String
  * @param requiredSchema The schema of the data that should be output for each 
row. This should be a
  *   subset of the columns in dataSchema.
  * @param options Configuration options for a CSV parser.
+ * @param filters The pushdown filters that should be applied to converted 
values.
  */
 class UnivocityParser(
 dataSchema: StructType,
 requiredSchema: StructType,
-val options: CSVOptions) extends Logging {
+val options: CSVOptions,
+filters: Seq[Filter]) extends Logging {
   require(requiredSchema.toSet.subsetOf(dataSchema.toSet),
 s"requiredSchema (${requiredSchema.catalogString}) should be the subset of 
" +
   s"dataSchema (${dataSchema.catalogString}).")
 
+  def this(dataSchema: StructType, requiredSchema: StructType, options: 
CSVOptions) = {
+this(dataSchema, requiredSchema, options, Seq.empty)
+  }
   def this(schema: StructType, options: CSVOptions) = this(schema, schema, 
options)
 
   // A `ValueConverter` is responsible for converting the given value to a 
desired type.
   private type ValueConverter = String => Any
 
+  private val csvFilters = new CSVFilters(
+filters,
+dataSchema,
+requiredSchema,
+options.columnPruning)
+
+  private[sql] val parsedSchema = csvFilters.readSchema
+
+  // Mapping of field indexes of `parsedSchema` to indexes of `requiredSchema`.
+  // It returns -1 if `requiredSchema` doesn't contain a field from 
`parsedSchema`.
 
 Review comment:
   What should the upper layer do with the column if a datasource already 
applied filters? As far as I know filters are applied only once in DSv2, 
@cloud-fan or I am wrong?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

SparkQA removed a comment on issue #26957: [SPARK-30314] Add identifier and 
catalog information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-573970634
 
 
   **[Test build #116672 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116672/testReport)**
 for PR 26957 at commit 
[`c80a155`](https://github.com/apache/spark/commit/c80a155610472029eb1e2adf988aba3d0cf54206).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #26957: [SPARK-30314] Add identifier and catalog information to DataSourceV2Relation

2020-01-13 Thread GitBox

SparkQA commented on issue #26957: [SPARK-30314] Add identifier and catalog 
information to DataSourceV2Relation
URL: https://github.com/apache/spark/pull/26957#issuecomment-574029847
 
 
   **[Test build #116672 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116672/testReport)**
 for PR 26957 at commit 
[`c80a155`](https://github.com/apache/spark/commit/c80a155610472029eb1e2adf988aba3d0cf54206).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply 
compaction of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-574028851
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116680/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #27164: [SPARK-30479][SQL] Apply 
compaction of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-574028846
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of 
event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-574028851
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116680/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-13 Thread GitBox

AmplabJenkins commented on issue #27164: [SPARK-30479][SQL] Apply compaction of 
event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-574028846
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-13 Thread GitBox

SparkQA removed a comment on issue #27164: [SPARK-30479][SQL] Apply compaction 
of event log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-573987598
 
 
   **[Test build #116680 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116680/testReport)**
 for PR 27164 at commit 
[`01ece29`](https://github.com/apache/spark/commit/01ece298b6be6cd6cb40c89b62a391c7fb47a3e4).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on issue #27166: [SPARK-30482][SQL][CORE][TESTS] Add sub-class of `AppenderSkeleton` reusable in tests

2020-01-13 Thread GitBox

MaxGekk commented on issue #27166: [SPARK-30482][SQL][CORE][TESTS] Add 
sub-class of `AppenderSkeleton` reusable in tests
URL: https://github.com/apache/spark/pull/27166#issuecomment-574028201
 
 
   @HyukjinKwon 's fix helped, @maropu this can be merged.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event log to SQL events

2020-01-13 Thread GitBox

SparkQA commented on issue #27164: [SPARK-30479][SQL] Apply compaction of event 
log to SQL events
URL: https://github.com/apache/spark/pull/27164#issuecomment-574028326
 
 
   **[Test build #116680 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116680/testReport)**
 for PR 27164 at commit 
[`01ece29`](https://github.com/apache/spark/commit/01ece298b6be6cd6cb40c89b62a391c7fb47a3e4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression al

2020-01-13 Thread GitBox

maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When 
one or more DISTINCT aggregate expressions operate on the same field, the 
DISTINCT aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#discussion_r366171505
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ##
 @@ -118,6 +118,52 @@ import org.apache.spark.sql.types.IntegerType
  *   LocalTableScan [...]
  * }}}
  *
+ * Third example: single distinct aggregate function with filter clauses (in 
sql):
+ * {{{
+ *   SELECT
+ * COUNT(DISTINCT cat1) FILTER (WHERE id > 1) as cat1_cnt1,
+ * COUNT(DISTINCT cat1) as cat1_cnt2,
+ * SUM(value) AS total
+ *  FROM
+ *data
+ *  GROUP BY
+ *key
+ * }}}
+ *
+ * This translates to the following (pseudo) logical plan:
+ * {{{
+ * Aggregate(
+ *key = ['key]
+ *functions = [COUNT(DISTINCT 'cat1) with FILTER('id > 1),
+ * COUNT(DISTINCT 'cat1),
+ * sum('value)]
+ *output = ['key, 'cat1_cnt1, 'cat1_cnt2, 'total])
+ *   LocalTableScan [...]
+ * }}}
+ *
+ * This rule rewrites this logical plan to the following (pseudo) logical plan:
+ * {{{
+ *   Aggregate(
+ *  key = ['key]
+ *  functions = [count(if (('gid = 1)) '_gen_distinct_1 else null),
+ *   count(if (('gid = 2)) '_gen_distinct_2 else null),
+ *   first(if (('gid = 0)) 'total else null) ignore nulls]
+ *  output = ['key, 'cat1_cnt, 'cat1_cnt2, 'total])
+ * Aggregate(
+ *key = ['key, '_gen_distinct_1, '_gen_distinct_2, 'gid]
+ *functions = [sum('value)]
+ *output = ['key, '_gen_distinct_1, '_gen_distinct_2, 'gid, 'total])
+ *   Expand(
+ *   projections = [('key, null, null, 0, 'value),
+ * ('key, '_gen_distinct_1, null, 1, null),
+ * ('key, null, '_gen_distinct_2, 2, null)]
+ *   output = ['key, '_gen_distinct_1, '_gen_distinct_2, 'gid, 'value])
+ * Expand(
 
 Review comment:
   Probably, this is related to [the 
comment](https://github.com/apache/spark/pull/27058#discussion_r366139379). If 
we avoid the recursive call, I think we can have a chance to merge them.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When one or more DISTINCT aggregate expressions operate on the same field, the DISTINCT aggregate expression al

2020-01-13 Thread GitBox

maropu commented on a change in pull request #27058: [SPARK-30395][SQL] When 
one or more DISTINCT aggregate expressions operate on the same field, the 
DISTINCT aggregate expression allows the use of the FILTER clause
URL: https://github.com/apache/spark/pull/27058#discussion_r366171011
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/RewriteDistinctAggregates.scala
 ##
 @@ -316,6 +362,86 @@ object RewriteDistinctAggregates extends 
Rule[LogicalPlan] {
 }.asInstanceOf[NamedExpression]
   }
   Aggregate(groupByAttrs, patchedAggExpressions, firstAggregate)
+} else if (distinctAggGroups.size == 1) {
+  val (distinctAggExpressions, regularAggExpressions) = 
aggExpressions.partition(_.isDistinct)
+  if (distinctAggExpressions.exists(_.filter.isDefined)) {
+val regularAggExprs = regularAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val regularFunChildren = regularAggExprs
+  .flatMap(_.aggregateFunction.children.filter(!_.foldable))
+val regularFilterAttrs = regularAggExprs.flatMap(_.filterAttributes)
+val regularAggChildren = (regularFunChildren ++ 
regularFilterAttrs).distinct
+val regularAggChildAttrMap = 
regularAggChildren.map(expressionAttributePair)
+val regularAggChildAttrLookup = regularAggChildAttrMap.toMap
+val regularOperatorMap = regularAggExprs.map {
+  case ae @ AggregateExpression(af, _, _, filter, _) =>
+val newChildren = af.children.map(c => 
regularAggChildAttrLookup.getOrElse(c, c))
+val raf = 
af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
+val filterOpt = filter.map(_.transform {
+  case a: Attribute => regularAggChildAttrLookup.getOrElse(a, a)
+})
+val aggExpr = ae.copy(aggregateFunction = raf, filter = filterOpt)
+(ae, aggExpr)
+}
+val distinctAggExprs = distinctAggExpressions.filter(e => 
e.children.exists(!_.foldable))
+val rewriteDistinctOperatorMap = distinctAggExprs.map {
+  case ae @ AggregateExpression(af, _, _, filter, _) =>
+// Why do we need to construct the phantom id ?
+// First, In order to reduce costs, it is better to handle the 
filter clause locally.
+// e.g. COUNT (DISTINCT a) FILTER (WHERE id > 1), evaluate 
expression
+// If(id > 1) 'a else null first, and use the result as output.
+// Second, If more than one DISTINCT aggregate expression uses the 
same column,
+// We need to construct the phantom attributes so as the output 
not lost.
+// e.g. SUM (DISTINCT a), COUNT (DISTINCT a) FILTER (WHERE id > 1) 
will output
+// attribute '_gen_distinct-1 and attribute '_gen_distinct-2 
instead of two 'a.
+// Note: We just need to illusion the expression with filter 
clause.
+// The illusionary mechanism may result in multiple distinct 
aggregations uses
+// different column, so we still need to call `rewrite`.
+val phantomId = NamedExpression.newExprId.id
+val unfoldableChildren = af.children.filter(!_.foldable)
+val exprAttrs = unfoldableChildren.map { e =>
+  (e, AttributeReference(s"_gen_distinct_$phantomId", e.dataType, 
nullable = true)())
+}
+val exprAttrLookup = exprAttrs.toMap
+val newChildren = af.children.map(c => exprAttrLookup.getOrElse(c, 
c))
+val raf = 
af.withNewChildren(newChildren).asInstanceOf[AggregateFunction]
+val aggExpr = ae.copy(aggregateFunction = raf, filter = None)
+// Expand projection
+val projection = unfoldableChildren.map {
+  case e if filter.isDefined => If(filter.get, e, nullify(e))
+  case e => e
+}
+(projection, exprAttrs, (ae, aggExpr))
+}
+val rewriteDistinctAttrMap = rewriteDistinctOperatorMap.flatMap(_._2)
+val distinctAggChildAttrs = rewriteDistinctAttrMap.map(_._2)
+val allAggAttrs = regularAggChildAttrMap.map(_._2) ++ 
distinctAggChildAttrs
+// Construct the aggregate input projection.
+val rewriteDistinctProjections = 
rewriteDistinctOperatorMap.flatMap(_._1)
+val rewriteAggProjections =
+  Seq((a.groupingExpressions ++ regularAggChildren ++ 
rewriteDistinctProjections))
+val groupByMap = a.groupingExpressions.collect {
+  case ne: NamedExpression => ne -> ne.toAttribute
+  case e => e -> AttributeReference(e.sql, e.dataType, e.nullable)()
+}
+val groupByAttrs = groupByMap.map(_._2)
+// Construct the expand operator.
+val expand = Expand(rewriteAggProjections, groupByAttrs ++ 
allAggAttrs, a.child)
+val rewriteAggExprLookup =
+  (rewriteDistinctOperatorMap.map(_._3) ++

[GitHub] [spark] viirya commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-13 Thread GitBox

viirya commented on a change in pull request #27187: [SPARK-30497][SQL]  
migrate DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#discussion_r366167726
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -900,6 +902,18 @@ class Analyzer(
 }
   case _ => u
 }
+
+  case u @ UnresolvedTableOrView(identifier) =>
+expandRelationName(identifier) match {
+  case SessionCatalogAndIdentifier(catalog, ident) =>
+CatalogV2Util.loadTable(catalog, ident) match {
+  // We don't support view in v2 catalog yet. Here we treat v1 
view as table and use
+  // `ResolvedTable` to carry it.
+  case Some(table) => ResolvedTable(catalog.asTableCatalog, ident, 
table)
 
 Review comment:
   Do you mean to add a case?
   
   ```
   CatalogV2Util.loadTable(catalog, ident) match {
 case Some(v1Table: V1Table) if v1Table.v1Table.tableType == 
CatalogTableType.VIEW =>
 ...
   }
   ```
   
   Otherwise I'm not sure why `case Some(table) =>` is for v1 view.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] viirya commented on a change in pull request #27187: [SPARK-30497][SQL] migrate DESCRIBE TABLE to the new framework

2020-01-13 Thread GitBox

viirya commented on a change in pull request #27187: [SPARK-30497][SQL]  
migrate DESCRIBE TABLE to the new framework
URL: https://github.com/apache/spark/pull/27187#discussion_r366169592
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ##
 @@ -759,6 +759,8 @@ class Analyzer(
   u.failAnalysis(s"${ident.quoted} is a temp view not table.")
 }
 u
+  case u @ UnresolvedTableOrView(ident) =>
+lookupTempView(ident).getOrElse(u)
 
 Review comment:
   Looks like it is very similar to `UnresolvedRelation`?
   
   `UnresolvedRelation` can also be resolve to view by `ResolveTempViews`, can 
be resolved to v2 relation by `ResolveTables`, can be resolved to v1 relation 
by `ResolveRelations`.
   
   Seems we have a few similar ways to do resolving?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized

2020-01-13 Thread GitBox

cloud-fan commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality 
wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-574024661
 
 
   sounds good, can you open a new PR for your new idea? then we can review and 
leave comments.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] MaxGekk commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option `ignoreExtension` in sql-data-sources-avro.md

2020-01-13 Thread GitBox

MaxGekk commented on issue #27194: [SPARK-30505][DOCS] Deprecate Avro option 
`ignoreExtension` in sql-data-sources-avro.md
URL: https://github.com/apache/spark/pull/27194#issuecomment-574024291
 
 
   I described the global option because it has been already described in each 
load method - json, text, parquet ... in DataFrameReader, for example 
https://github.com/apache/spark/blob/f8d59572b014e5254b0c574b26e101c2e4157bdd/sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala#L410
   
   > Are you interested in writing the new doc?
   
   No, I am not.
   
   If you think, description of the option `pathGlobFilter ` is useless in 
`sql-data-sources-avro.md`. Let me know, I will revert it from the PR but I 
would leave it as in this PR.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on issue #23859: [SPARK-26956][SS] remove streaming output mode from data source v2 APIs

2020-01-13 Thread GitBox

cloud-fan commented on issue #23859: [SPARK-26956][SS] remove streaming output 
mode from data source v2 APIs
URL: https://github.com/apache/spark/pull/23859#issuecomment-574023315
 
 
   the "new data" mentioned in this PR means the new result of running the 
query with the entire input, not a single micro-batch. This is the semantic of 
complete mode AFAIK.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on issue #26201: [SPARK-29543][SS][UI] Init structured streaming ui

2020-01-13 Thread GitBox

AmplabJenkins removed a comment on issue #26201: [SPARK-29543][SS][UI] Init 
structured streaming ui
URL: https://github.com/apache/spark/pull/26201#issuecomment-574020999
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116681/
   Test FAILed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 1194 matches

Mail list logo