[jira] [Commented] (SPARK-44265) Built-in XML data source support

2024-05-11 Thread HiuFung Kwok (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845650#comment-17845650
 ] 

HiuFung Kwok commented on SPARK-44265:
--

[~gurwls223] Hi, I have double-checked all the remaining tickets. They are all
duplicates, and the functionality already exists on the master branch of Spark.

Perhaps we can go ahead and mark all sub-tasks as duplicates and mark this
umbrella task as resolved?

> Built-in XML data source support
> 
>
> Key: SPARK-44265
> URL: https://issues.apache.org/jira/browse/SPARK-44265
> Project: Spark
> Issue Type: Umbrella
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Sandip Agarwala
> Priority: Critical
> Labels: pull-request-available
>
> XML is a widely used data format. An external spark-xml package
> ([https://github.com/databricks/spark-xml]) is available to read and write
> XML data in Spark. Making spark-xml built-in will provide a better user
> experience for Spark SQL and Structured Streaming. The proposal is to inline
> the code from the spark-xml package.
>
> Here is the link to the
> [SPIP|https://docs.google.com/document/d/1ZaOBT4-YFtN58UCx2cdFhlsKbie1ugAn-Fgz_Dddz-Q/edit].



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19426) Add support for custom coalescers on Data

2024-05-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-19426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-19426:
---
Labels: pull-request-available  (was: )

> Add support for custom coalescers on Data
> -
>
> Key: SPARK-19426
> URL: https://issues.apache.org/jira/browse/SPARK-19426
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Marius van Niekerk
> Priority: Minor
> Labels: pull-request-available
>
> This is a continuation of SPARK-14042 now that the Dataset APIs have
> stabilized in Spark 2+.
> Provide the same PartitionCoalescer support that exists in the RDD API.






[jira] [Assigned] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs

2024-05-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48240:


Assignee: BingKun Pan

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---
>
> Key: SPARK-48240
> URL: https://issues.apache.org/jira/browse/SPARK-48240
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs

2024-05-11 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48240.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46535
[https://github.com/apache/spark/pull/46535]

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---
>
> Key: SPARK-48240
> URL: https://issues.apache.org/jira/browse/SPARK-48240
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Assignee: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48243) Support push down char and varchar predicates to Hive Metastore

2024-05-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48243:
---
Labels: pull-request-available  (was: )

> Support push down char and varchar predicates to Hive Metastore
> ---
>
> Key: SPARK-48243
> URL: https://issues.apache.org/jira/browse/SPARK-48243
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Wechar
> Priority: Major
> Labels: pull-request-available
>
> Hive Metastore has supported {{char}} and {{varchar}} types in partition
> filters since [HIVE-26661|https://issues.apache.org/jira/browse/HIVE-26661],
> so we can support them on the Spark side.






[jira] [Created] (SPARK-48243) Support push down char and varchar predicates to Hive Metastore

2024-05-11 Thread Wechar (Jira)
Wechar created SPARK-48243:
--

Summary: Support push down char and varchar predicates to Hive Metastore
Key: SPARK-48243
URL: https://issues.apache.org/jira/browse/SPARK-48243
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.5.1
Reporter: Wechar


Hive Metastore has supported {{char}} and {{varchar}} types in partition
filters since [HIVE-26661|https://issues.apache.org/jira/browse/HIVE-26661], so
we can support them on the Spark side.






[jira] [Updated] (SPARK-48242) Upgrade extra-enforcer-rules to 1.8.0

2024-05-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48242:
---
Labels: pull-request-available  (was: )

> Upgrade extra-enforcer-rules to 1.8.0
> -
>
> Key: SPARK-48242
> URL: https://issues.apache.org/jira/browse/SPARK-48242
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
>







[jira] [Created] (SPARK-48242) Upgrade extra-enforcer-rules to 1.8.0

2024-05-11 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-48242:
---

Summary: Upgrade extra-enforcer-rules to 1.8.0
Key: SPARK-48242
URL: https://issues.apache.org/jira/browse/SPARK-48242
Project: Spark
Issue Type: Improvement
Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Updated] (SPARK-48241) CSV parsing failure with char/varchar type columns

2024-05-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48241:
---
Labels: pull-request-available  (was: )

> CSV parsing failure with char/varchar type columns
> --
>
> Key: SPARK-48241
> URL: https://issues.apache.org/jira/browse/SPARK-48241
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Jiayi Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Selecting from a CSV table that contains char and varchar columns results in
> the following error:
> {code:java}
> java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct) should be the subset of dataSchema (struct).
>     at scala.Predef$.require(Predef.scala:281)
>     at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
>     at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
>     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
>     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
> {code}
> The error occurs because the StringType columns in the dataSchema and
> requiredSchema of UnivocityParser are inconsistent: the StringType
> StructField in the dataSchema carries metadata that is missing from the
> requiredSchema. We need to retain the metadata when resolving the schema.
>






[jira] [Created] (SPARK-48241) CSV parsing failure with char/varchar type columns

2024-05-11 Thread Jiayi Liu (Jira)
Jiayi Liu created SPARK-48241:
-

Summary: CSV parsing failure with char/varchar type columns
Key: SPARK-48241
URL: https://issues.apache.org/jira/browse/SPARK-48241
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.5.1
Reporter: Jiayi Liu
Fix For: 4.0.0


Selecting from a CSV table that contains char and varchar columns results in
the following error:
{code:java}
java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct) should be the subset of dataSchema (struct).
    at scala.Predef$.require(Predef.scala:281)
    at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
    at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
{code}
The error occurs because the StringType columns in the dataSchema and
requiredSchema of UnivocityParser are inconsistent: the StringType StructField
in the dataSchema carries metadata that is missing from the requiredSchema. We
need to retain the metadata when resolving the schema.
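
To illustrate the mismatch described above, here is a minimal Python sketch.
It is not Spark code. It assumes that the failing requirement amounts to a
set-subset check over schema fields and that field equality includes metadata.
The Field class, the is_subset helper, and the metadata key are hypothetical
stand-ins chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Simplified, hypothetical stand-in for a schema field (not Spark's StructField)."""
    name: str
    data_type: str
    # Metadata takes part in equality and hashing, like the other attributes.
    metadata: frozenset = frozenset()


def is_subset(required, data):
    """Return True if every required field equals some field in the data schema."""
    return set(required) <= set(data)


# Assumed metadata key; the real key used by Spark may differ.
CHAR_KEY = "__CHAR_VARCHAR_TYPE_STRING"

data_schema = [
    Field("id", "int"),
    Field("c", "string", frozenset({(CHAR_KEY, "char(5)")})),
]

# Same column name and type, but the metadata was dropped during resolution.
required_dropped = [Field("c", "string")]
# Same column with its metadata retained.
required_kept = [data_schema[1]]

subset_without_metadata = is_subset(required_dropped, data_schema)
subset_with_metadata = is_subset(required_kept, data_schema)

print(subset_without_metadata)  # False: the fields differ only in metadata
print(subset_with_metadata)     # True: metadata retained, so the check passes
```

Under these assumptions, the check fails even though the column names and
types agree, which is why retaining the metadata makes the two schemas
consistent again.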
 






[jira] [Updated] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs

2024-05-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-48240:
---
Labels: pull-request-available  (was: )

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---
>
> Key: SPARK-48240
> URL: https://issues.apache.org/jira/browse/SPARK-48240
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Priority: Minor
> Labels: pull-request-available
>







[jira] [Updated] (SPARK-48240) Replace `Local[..]` with `"Local[...]"` in the docs

2024-05-11 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-48240:

Summary: Replace `Local[..]` with `"Local[...]"` in the docs  (was: Replace 
`Local[..]` with `"Local[...]"` in the doc)

> Replace `Local[..]` with `"Local[...]"` in the docs
> ---
>
> Key: SPARK-48240
> URL: https://issues.apache.org/jira/browse/SPARK-48240
> Project: Spark
> Issue Type: Improvement
> Components: Documentation
> Affects Versions: 4.0.0
> Reporter: BingKun Pan
> Priority: Minor
>



