[jira] [Created] (SPARK-46405) Issue with CSV schema inference and malformed records

2023-12-14 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-46405: --- Summary: Issue with CSV schema inference and malformed records Key: SPARK-46405 URL: https://issues.apache.org/jira/browse/SPARK-46405 Project: Spark Issue

[jira] [Updated] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns

2023-11-06 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-45815: Component/s: Structured Streaming > Provide an interface for Streaming sources to add _metadata

[jira] [Updated] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns

2023-11-06 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-45815: Description: Currently, only the native V1 file-based streaming source can read the `_metadata`

[jira] [Created] (SPARK-45815) Provide an interface for Streaming sources to add _metadata columns

2023-11-06 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-45815: --- Summary: Provide an interface for Streaming sources to add _metadata columns Key: SPARK-45815 URL: https://issues.apache.org/jira/browse/SPARK-45815 Project: Spark

[jira] [Created] (SPARK-45035) Support ignoreCorruptFiles for multiline CSV

2023-08-31 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-45035: --- Summary: Support ignoreCorruptFiles for multiline CSV Key: SPARK-45035 URL: https://issues.apache.org/jira/browse/SPARK-45035 Project: Spark Issue Type:

[jira] [Created] (SPARK-43177) Add deprecation warning for input_file_name()

2023-04-18 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-43177: --- Summary: Add deprecation warning for input_file_name() Key: SPARK-43177 URL: https://issues.apache.org/jira/browse/SPARK-43177 Project: Spark Issue Type:

[jira] [Created] (SPARK-41151) Keep built-in file _metadata column nullable value consistent

2022-11-15 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-41151: --- Summary: Keep built-in file _metadata column nullable value consistent Key: SPARK-41151 URL: https://issues.apache.org/jira/browse/SPARK-41151 Project: Spark

[jira] [Updated] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-41143: Description: Parser can parse: _FUNC{_}_{_} ( key0 => value0 ) (was: Parser can parse: _FUNC_ (

[jira] [Updated] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-41143: Description: The parser can parse: {code:java} _FUNC_ ( key0 => value0 ){code} was:Parser can

[jira] [Updated] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-41143: Description: Parser can parse: _{_}FUNC_{_} ( key0 => value0 ) (was: Parser can parse:

[jira] [Created] (SPARK-41143) Add named arguments function syntax support and trait

2022-11-14 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-41143: --- Summary: Add named arguments function syntax support and trait Key: SPARK-41143 URL: https://issues.apache.org/jira/browse/SPARK-41143 Project: Spark Issue

[jira] [Created] (SPARK-41142) Support named arguments functions

2022-11-14 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-41142: --- Summary: Support named arguments functions Key: SPARK-41142 URL: https://issues.apache.org/jira/browse/SPARK-41142 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-40460) Streaming metrics is zero when select _metadata

2022-09-19 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606678#comment-17606678 ] Yaohua Zhao commented on SPARK-40460: - [~kabhwan] You are right! Updated > Streaming metrics is

[jira] [Updated] (SPARK-40460) Streaming metrics is zero when select _metadata

2022-09-19 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-40460: Affects Version/s: 3.4.0 > Streaming metrics is zero when select _metadata >

[jira] [Updated] (SPARK-40460) Streaming metrics is zero when select _metadata

2022-09-19 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-40460: Affects Version/s: 3.3.1 3.3.2 (was: 3.2.0)

[jira] [Updated] (SPARK-40460) Streaming metrics is zero when select _metadata

2022-09-15 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-40460: Description: Streaming metrics report all 0 (`processedRowsPerSecond`, etc) when selecting

[jira] [Created] (SPARK-40460) Streaming metrics is zero when select _metadata

2022-09-15 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-40460: --- Summary: Streaming metrics is zero when select _metadata Key: SPARK-40460 URL: https://issues.apache.org/jira/browse/SPARK-40460 Project: Spark Issue Type:

[jira] [Updated] (SPARK-39768) Strip any CRLF character if lineSep is not set in CSV data source

2022-07-13 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-39768: Summary: Strip any CRLF character if lineSep is not set in CSV data source (was: Strip any CLRF

[jira] [Updated] (SPARK-39768) Strip any CRLF character if lineSep is not set in CSV data source

2022-07-13 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-39768: Description: If `lineSep` is not set, the line separator is automatically detected. To be safe,

[jira] [Commented] (SPARK-39768) Strip any CLRF character if lineSep is not set in CSV data source

2022-07-13 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-39768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566417#comment-17566417 ] Yaohua Zhao commented on SPARK-39768: - cc @[~hyukjin.kwon] > Strip any CLRF character if lineSep

[jira] [Created] (SPARK-39768) Strip any CLRF character if lineSep is not set in CSV data source

2022-07-13 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-39768: --- Summary: Strip any CLRF character if lineSep is not set in CSV data source Key: SPARK-39768 URL: https://issues.apache.org/jira/browse/SPARK-39768 Project: Spark

[jira] [Created] (SPARK-39689) Support 2-chars lineSep in CSV datasource

2022-07-05 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-39689: --- Summary: Support 2-chars lineSep in CSV datasource Key: SPARK-39689 URL: https://issues.apache.org/jira/browse/SPARK-39689 Project: Spark Issue Type:

[jira] [Created] (SPARK-39404) Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame

2022-06-07 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-39404: --- Summary: Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame Key: SPARK-39404 URL: https://issues.apache.org/jira/browse/SPARK-39404

[jira] [Created] (SPARK-39014) Respect ignoreMissingFiles from Data Source options in InMemoryFileIndex

2022-04-25 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-39014: --- Summary: Respect ignoreMissingFiles from Data Source options in InMemoryFileIndex Key: SPARK-39014 URL: https://issues.apache.org/jira/browse/SPARK-39014 Project:

[jira] [Created] (SPARK-38767) Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options

2022-04-01 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-38767: --- Summary: Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options Key: SPARK-38767 URL: https://issues.apache.org/jira/browse/SPARK-38767 Project:

[jira] [Updated] (SPARK-38323) Support the hidden file metadata in Streaming

2022-02-24 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-38323: Description: Currently, querying the hidden file metadata struct `_metadata` will fail with

[jira] [Created] (SPARK-38323) Support the hidden file metadata in Streaming

2022-02-24 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-38323: --- Summary: Support the hidden file metadata in Streaming Key: SPARK-38323 URL: https://issues.apache.org/jira/browse/SPARK-38323 Project: Spark Issue Type:

[jira] [Updated] (SPARK-38314) Fail to read parquet files after writing the hidden file metadata in

2022-02-24 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-38314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao updated SPARK-38314: Description: Selecting and then writing df containing hidden file metadata column `_metadata`

[jira] [Created] (SPARK-38314) Fail to read parquet files after writing the hidden file metadata in

2022-02-24 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-38314: --- Summary: Fail to read parquet files after writing the hidden file metadata in Key: SPARK-38314 URL: https://issues.apache.org/jira/browse/SPARK-38314 Project: Spark

[jira] [Resolved] (SPARK-37767) Follow-up Improvements of Hidden File Metadata Support for Spark SQL

2022-02-10 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao resolved SPARK-37767. - Resolution: Fixed > Follow-up Improvements of Hidden File Metadata Support for Spark SQL >

[jira] [Created] (SPARK-38159) Minor refactor of MetadataAttribute unapply method

2022-02-09 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-38159: --- Summary: Minor refactor of MetadataAttribute unapply method Key: SPARK-38159 URL: https://issues.apache.org/jira/browse/SPARK-38159 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-37770) Performance improvements for ColumnVector `putByteArray`

2022-02-09 Thread Yaohua Zhao (Jira)
[ https://issues.apache.org/jira/browse/SPARK-37770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yaohua Zhao resolved SPARK-37770. - Resolution: Fixed > Performance improvements for ColumnVector `putByteArray` >

[jira] [Created] (SPARK-37896) ConstantColumnVector: a column vector with same values

2022-01-13 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-37896: --- Summary: ConstantColumnVector: a column vector with same values Key: SPARK-37896 URL: https://issues.apache.org/jira/browse/SPARK-37896 Project: Spark Issue

[jira] [Created] (SPARK-37770) Performance improvements for ColumnVector `putByteArray`

2021-12-28 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-37770: --- Summary: Performance improvements for ColumnVector `putByteArray` Key: SPARK-37770 URL: https://issues.apache.org/jira/browse/SPARK-37770 Project: Spark Issue

[jira] [Created] (SPARK-37769) Filter on the metadata struct

2021-12-28 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-37769: --- Summary: Filter on the metadata struct Key: SPARK-37769 URL: https://issues.apache.org/jira/browse/SPARK-37769 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-37768) Schema pruning for the metadata struct

2021-12-28 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-37768: --- Summary: Schema pruning for the metadata struct Key: SPARK-37768 URL: https://issues.apache.org/jira/browse/SPARK-37768 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-37767) Follow-up Improvements of Hidden File Metadata Support for Spark SQL

2021-12-28 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-37767: --- Summary: Follow-up Improvements of Hidden File Metadata Support for Spark SQL Key: SPARK-37767 URL: https://issues.apache.org/jira/browse/SPARK-37767 Project: Spark

[jira] [Created] (SPARK-37273) Hidden File Metadata Support for Spark SQL

2021-11-10 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-37273: --- Summary: Hidden File Metadata Support for Spark SQL Key: SPARK-37273 URL: https://issues.apache.org/jira/browse/SPARK-37273 Project: Spark Issue Type: