[jira] [Commented] (SPARK-44265) Built-in XML data source support
[ https://issues.apache.org/jira/browse/SPARK-44265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845650#comment-17845650 ]

HiuFung Kwok commented on SPARK-44265:
--------------------------------------

[~gurwls223] Hi, I have double-checked all the remaining tickets: they are all duplicates, and the functionality already exists on the master branch of Spark. Perhaps we can go ahead, mark all sub-tasks as duplicates, and mark this umbrella task as resolved?

> Built-in XML data source support
> --------------------------------
>
>                 Key: SPARK-44265
>                 URL: https://issues.apache.org/jira/browse/SPARK-44265
>             Project: Spark
>          Issue Type: Umbrella
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Sandip Agarwala
>            Priority: Critical
>              Labels: pull-request-available
>
> XML is a widely used data format. An external spark-xml package
> ([https://github.com/databricks/spark-xml]) is available to read and write
> XML data in Spark. Making spark-xml built-in will provide a better user
> experience for Spark SQL and Structured Streaming. The proposal is to
> inline the code from the spark-xml package.
>
> Here is the link to the
> [SPIP|https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
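As a rough illustration of what the built-in data source enables, here is a sketch in Scala. It assumes a running SparkSession named `spark` and a hypothetical input file `books.xml`; the `rowTag` and `rootTag` options follow the conventions of the original spark-xml package.

```scala
// Sketch only: `spark` is an existing SparkSession, `books.xml` is a
// hypothetical sample file. `rowTag` selects the XML element that maps
// to one DataFrame row, as in the spark-xml package.
val df = spark.read
  .format("xml")
  .option("rowTag", "book")
  .load("books.xml")

df.printSchema()

// Writing is symmetric: `rootTag` wraps all rows, `rowTag` wraps each row.
df.write
  .format("xml")
  .option("rootTag", "books")
  .option("rowTag", "book")
  .save("books_out")
```

With the source built in, this works without adding the external spark-xml dependency to the classpath.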
[jira] [Updated] (SPARK-19426) Add support for custom coalescers on Data
[ https://issues.apache.org/jira/browse/SPARK-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-19426:
-----------------------------------
    Labels: pull-request-available  (was: )

> Add support for custom coalescers on Data
> -----------------------------------------
>
>                 Key: SPARK-19426
>                 URL: https://issues.apache.org/jira/browse/SPARK-19426
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: Marius van Niekerk
>            Priority: Minor
>              Labels: pull-request-available
>
> This is a continuation of SPARK-14042 now that the Dataset APIs have
> stabilized in Spark 2+.
> Provide the same PartitionCoalescer support that exists in the RDD API.
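For context, the RDD-side hook the ticket wants mirrored on Datasets already exists: `RDD.coalesce` accepts an optional `PartitionCoalescer`. The sketch below is illustrative only; a real coalescer would group parent partitions by locality or size rather than packing everything into one group.

```scala
import org.apache.spark.rdd.{PartitionCoalescer, PartitionGroup, RDD}

// Illustrative only: a trivial coalescer that packs all parent
// partitions into a single PartitionGroup.
class SingleGroupCoalescer extends PartitionCoalescer {
  override def coalesce(maxPartitions: Int, parent: RDD[_]): Array[PartitionGroup] = {
    val group = new PartitionGroup()
    parent.partitions.foreach(group.partitions += _)
    Array(group)
  }
}

// The existing RDD API the ticket asks to expose for Datasets:
// rdd.coalesce(1, shuffle = false, Some(new SingleGroupCoalescer))
```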
[jira] [Assigned] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs
[ https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-48240:
------------------------------------
    Assignee: BingKun Pan

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---------------------------------------------------
>
>                 Key: SPARK-48240
>                 URL: https://issues.apache.org/jira/browse/SPARK-48240
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 4.0.0
>            Reporter: BingKun Pan
>            Assignee: BingKun Pan
>            Priority: Minor
>              Labels: pull-request-available
[jira] [Resolved] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs
[ https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-48240.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 46535
[https://github.com/apache/spark/pull/46535]

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---------------------------------------------------
>
>                 Key: SPARK-48240
>                 URL: https://issues.apache.org/jira/browse/SPARK-48240
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 4.0.0
>            Reporter: BingKun Pan
>            Assignee: BingKun Pan
>            Priority: Minor
>              Labels: pull-request-available
>             Fix For: 4.0.0
[jira] [Updated] (SPARK-48243) Support push down char and varchar predicates to Hive Metastore
[ https://issues.apache.org/jira/browse/SPARK-48243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48243:
-----------------------------------
    Labels: pull-request-available  (was: )

> Support push down char and varchar predicates to Hive Metastore
> ---------------------------------------------------------------
>
>                 Key: SPARK-48243
>                 URL: https://issues.apache.org/jira/browse/SPARK-48243
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.5.1
>            Reporter: Wechar
>            Priority: Major
>              Labels: pull-request-available
>
> Hive Metastore has supported {{char}} and {{varchar}} types in partition
> filters since [HIVE-26661|https://issues.apache.org/jira/browse/HIVE-26661],
> so we can support them on the Spark side.
[jira] [Created] (SPARK-48243) Support push down char and varchar predicates to Hive Metastore
Wechar created SPARK-48243:
-------------------------------

             Summary: Support push down char and varchar predicates to Hive Metastore
                 Key: SPARK-48243
                 URL: https://issues.apache.org/jira/browse/SPARK-48243
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.5.1
            Reporter: Wechar

Hive Metastore has supported {{char}} and {{varchar}} types in partition filters since [HIVE-26661|https://issues.apache.org/jira/browse/HIVE-26661], so we can support them on the Spark side.
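To make the improvement concrete, here is a hedged sketch of the kind of query it affects. The table and column names (`sales`, `region`) are hypothetical; the point is that with a HIVE-26661-capable metastore, the partition predicate could be evaluated inside the metastore rather than by listing all partitions client-side.

```scala
// Hypothetical example: a Hive table partitioned by a char(2) column.
// Without pushdown, Spark must fetch all partitions and filter locally;
// with pushdown, the metastore prunes partitions by `region = 'US'`.
spark.sql("""
  CREATE TABLE IF NOT EXISTS sales (amount DOUBLE)
  PARTITIONED BY (region CHAR(2))
  STORED AS PARQUET
""")

spark.sql("SELECT sum(amount) FROM sales WHERE region = 'US'").show()
```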
[jira] [Updated] (SPARK-48242) Upgrade extra-enforcer-rules to 1.8.0
[ https://issues.apache.org/jira/browse/SPARK-48242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48242:
-----------------------------------
    Labels: pull-request-available  (was: )

> Upgrade extra-enforcer-rules to 1.8.0
> -------------------------------------
>
>                 Key: SPARK-48242
>                 URL: https://issues.apache.org/jira/browse/SPARK-48242
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 4.0.0
>            Reporter: BingKun Pan
>            Priority: Minor
>              Labels: pull-request-available
[jira] [Created] (SPARK-48242) Upgrade extra-enforcer-rules to 1.8.0
BingKun Pan created SPARK-48242:
-----------------------------------

             Summary: Upgrade extra-enforcer-rules to 1.8.0
                 Key: SPARK-48242
                 URL: https://issues.apache.org/jira/browse/SPARK-48242
             Project: Spark
          Issue Type: Improvement
          Components: Build
    Affects Versions: 4.0.0
            Reporter: BingKun Pan
[jira] [Updated] (SPARK-48241) CSV parsing failure with char/varchar type columns
[ https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48241:
-----------------------------------
    Labels: pull-request-available  (was: )

> CSV parsing failure with char/varchar type columns
> --------------------------------------------------
>
>                 Key: SPARK-48241
>                 URL: https://issues.apache.org/jira/browse/SPARK-48241
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.1
>            Reporter: Jiayi Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
> A CSV table containing char and varchar columns results in the following
> error when selecting from the table:
> {code:java}
> java.lang.IllegalArgumentException: requirement failed: requiredSchema
> (struct) should be the subset of dataSchema (struct).
> 	at scala.Predef$.require(Predef.scala:281)
> 	at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
> 	at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
> 	at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
> 	at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
> 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
> 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
> 	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code}
> The error occurs because the StringType columns in the dataSchema and the
> requiredSchema of UnivocityParser are inconsistent: the StringType
> StructField in the dataSchema carries char/varchar metadata that is missing
> from the requiredSchema. We need to retain the metadata when resolving the
> schema.
[jira] [Created] (SPARK-48241) CSV parsing failure with char/varchar type columns
Jiayi Liu created SPARK-48241:
---------------------------------

             Summary: CSV parsing failure with char/varchar type columns
                 Key: SPARK-48241
                 URL: https://issues.apache.org/jira/browse/SPARK-48241
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.5.1
            Reporter: Jiayi Liu
             Fix For: 4.0.0

A CSV table containing char and varchar columns results in the following error when selecting from the table:
{code:java}
java.lang.IllegalArgumentException: requirement failed: requiredSchema
(struct) should be the subset of dataSchema (struct).
	at scala.Predef$.require(Predef.scala:281)
	at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
	at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
	at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
	at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125){code}
The error occurs because the StringType columns in the dataSchema and the requiredSchema of UnivocityParser are inconsistent: the StringType StructField in the dataSchema carries char/varchar metadata that is missing from the requiredSchema. We need to retain the metadata when resolving the schema.
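A minimal reproduction along the lines of the report might look like the following sketch. The table name and values are hypothetical; it assumes a SparkSession `spark` on an affected version (e.g. 3.5.1).

```scala
// Hypothetical reproduction sketch: a CSV table with char/varchar
// columns. Char/varchar are stored as StringType with extra metadata
// on the StructField; if that metadata is dropped from the required
// schema, the UnivocityParser subset check fails on read.
spark.sql("""
  CREATE TABLE csv_vc (c CHAR(5), v VARCHAR(10))
  USING CSV
""")
spark.sql("INSERT INTO csv_vc VALUES ('a', 'b')")

// On affected versions this read is reported to throw
// java.lang.IllegalArgumentException: requirement failed: requiredSchema ...
spark.sql("SELECT * FROM csv_vc").show()
```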
[jira] [Updated] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs
[ https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-48240:
-----------------------------------
    Labels: pull-request-available  (was: )

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---------------------------------------------------
>
>                 Key: SPARK-48240
>                 URL: https://issues.apache.org/jira/browse/SPARK-48240
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 4.0.0
>            Reporter: BingKun Pan
>            Priority: Minor
>              Labels: pull-request-available
[jira] [Updated] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs
[ https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

BingKun Pan updated SPARK-48240:
--------------------------------
    Summary: Replace `Local[..]` with `"Local[...]"` in the docs  (was: Replace `Local[..]` with `"Local[...]"` in the doc)

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---------------------------------------------------
>
>                 Key: SPARK-48240
>                 URL: https://issues.apache.org/jira/browse/SPARK-48240
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>    Affects Versions: 4.0.0
>            Reporter: BingKun Pan
>            Priority: Minor