Yaohua Zhao created SPARK-46405:
---
Summary: Issue with CSV schema inference and malformed records
Key: SPARK-46405
URL: https://issues.apache.org/jira/browse/SPARK-46405
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-45815:
Component/s: Structured Streaming
> Provide an interface for Streaming sources to add _metadata
[
https://issues.apache.org/jira/browse/SPARK-45815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-45815:
Description:
Currently, only the native V1 file-based streaming source can read the
`_metadata`
Yaohua Zhao created SPARK-45815:
---
Summary: Provide an interface for Streaming sources to add _metadata columns
Key: SPARK-45815
URL: https://issues.apache.org/jira/browse/SPARK-45815
Project: Spark
Yaohua Zhao created SPARK-45035:
---
Summary: Support ignoreCorruptFiles for multiline CSV
Key: SPARK-45035
URL: https://issues.apache.org/jira/browse/SPARK-45035
Project: Spark
Issue Type:
Yaohua Zhao created SPARK-43177:
---
Summary: Add deprecation warning for input_file_name()
Key: SPARK-43177
URL: https://issues.apache.org/jira/browse/SPARK-43177
Project: Spark
Issue Type:
Yaohua Zhao created SPARK-41151:
---
Summary: Keep built-in file _metadata column nullable value consistent
Key: SPARK-41151
URL: https://issues.apache.org/jira/browse/SPARK-41151
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-41143:
Description: Parser can parse: _FUNC_ ( key0 => value0 ) (was:
Parser can parse: _FUNC_ (
[
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-41143:
Description:
The parser can parse:
{code:java}
_FUNC_ ( key0 => value0 ){code}
was:Parser can
[
https://issues.apache.org/jira/browse/SPARK-41143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-41143:
Description: Parser can parse: _FUNC_ ( key0 => value0 ) (was:
Parser can parse:
Yaohua Zhao created SPARK-41143:
---
Summary: Add named arguments function syntax support and trait
Key: SPARK-41143
URL: https://issues.apache.org/jira/browse/SPARK-41143
Project: Spark
Issue
Yaohua Zhao created SPARK-41142:
---
Summary: Support named arguments functions
Key: SPARK-41142
URL: https://issues.apache.org/jira/browse/SPARK-41142
Project: Spark
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17606678#comment-17606678
]
Yaohua Zhao commented on SPARK-40460:
-
[~kabhwan] You are right! Updated
> Streaming metrics is
[
https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-40460:
Affects Version/s: 3.4.0
> Streaming metrics is zero when select _metadata
>
[
https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-40460:
Affects Version/s: 3.3.1, 3.3.2 (was: 3.2.0)
[
https://issues.apache.org/jira/browse/SPARK-40460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-40460:
Description: Streaming metrics report all 0 (`processedRowsPerSecond`, etc)
when selecting
Yaohua Zhao created SPARK-40460:
---
Summary: Streaming metrics is zero when select _metadata
Key: SPARK-40460
URL: https://issues.apache.org/jira/browse/SPARK-40460
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-39768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-39768:
Summary: Strip any CRLF character if lineSep is not set in CSV data source
(was: Strip any CLRF
[
https://issues.apache.org/jira/browse/SPARK-39768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-39768:
Description: If `lineSep` is not set, the line separator is automatically
detected. To be safe,
[
https://issues.apache.org/jira/browse/SPARK-39768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566417#comment-17566417
]
Yaohua Zhao commented on SPARK-39768:
-
cc @[~hyukjin.kwon]
> Strip any CLRF character if lineSep
Yaohua Zhao created SPARK-39768:
---
Summary: Strip any CLRF character if lineSep is not set in CSV data source
Key: SPARK-39768
URL: https://issues.apache.org/jira/browse/SPARK-39768
Project: Spark
Yaohua Zhao created SPARK-39689:
---
Summary: Support 2-chars lineSep in CSV datasource
Key: SPARK-39689
URL: https://issues.apache.org/jira/browse/SPARK-39689
Project: Spark
Issue Type:
Yaohua Zhao created SPARK-39404:
---
Summary: Unable to query _metadata in streaming if getBatch returns multiple logical nodes in the DataFrame
Key: SPARK-39404
URL: https://issues.apache.org/jira/browse/SPARK-39404
Yaohua Zhao created SPARK-39014:
---
Summary: Respect ignoreMissingFiles from Data Source options in InMemoryFileIndex
Key: SPARK-39014
URL: https://issues.apache.org/jira/browse/SPARK-39014
Project:
Yaohua Zhao created SPARK-38767:
---
Summary: Support ignoreCorruptFiles and ignoreMissingFiles in Data Source options
Key: SPARK-38767
URL: https://issues.apache.org/jira/browse/SPARK-38767
Project:
[
https://issues.apache.org/jira/browse/SPARK-38323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-38323:
Description:
Currently, querying the hidden file metadata struct `_metadata` will fail with
Yaohua Zhao created SPARK-38323:
---
Summary: Support the hidden file metadata in Streaming
Key: SPARK-38323
URL: https://issues.apache.org/jira/browse/SPARK-38323
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-38314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao updated SPARK-38314:
Description:
Selecting and then writing df containing hidden file metadata column
`_metadata`
Yaohua Zhao created SPARK-38314:
---
Summary: Fail to read parquet files after writing the hidden file metadata in
Key: SPARK-38314
URL: https://issues.apache.org/jira/browse/SPARK-38314
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-37767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao resolved SPARK-37767.
-
Resolution: Fixed
> Follow-up Improvements of Hidden File Metadata Support for Spark SQL
>
Yaohua Zhao created SPARK-38159:
---
Summary: Minor refactor of MetadataAttribute unapply method
Key: SPARK-38159
URL: https://issues.apache.org/jira/browse/SPARK-38159
Project: Spark
Issue Type:
[
https://issues.apache.org/jira/browse/SPARK-37770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaohua Zhao resolved SPARK-37770.
-
Resolution: Fixed
> Performance improvements for ColumnVector `putByteArray`
>
Yaohua Zhao created SPARK-37896:
---
Summary: ConstantColumnVector: a column vector with same values
Key: SPARK-37896
URL: https://issues.apache.org/jira/browse/SPARK-37896
Project: Spark
Issue
Yaohua Zhao created SPARK-37770:
---
Summary: Performance improvements for ColumnVector `putByteArray`
Key: SPARK-37770
URL: https://issues.apache.org/jira/browse/SPARK-37770
Project: Spark
Issue
Yaohua Zhao created SPARK-37769:
---
Summary: Filter on the metadata struct
Key: SPARK-37769
URL: https://issues.apache.org/jira/browse/SPARK-37769
Project: Spark
Issue Type: Sub-task
Yaohua Zhao created SPARK-37768:
---
Summary: Schema pruning for the metadata struct
Key: SPARK-37768
URL: https://issues.apache.org/jira/browse/SPARK-37768
Project: Spark
Issue Type: Sub-task
Yaohua Zhao created SPARK-37767:
---
Summary: Follow-up Improvements of Hidden File Metadata Support for Spark SQL
Key: SPARK-37767
URL: https://issues.apache.org/jira/browse/SPARK-37767
Project: Spark
Yaohua Zhao created SPARK-37273:
---
Summary: Hidden File Metadata Support for Spark SQL
Key: SPARK-37273
URL: https://issues.apache.org/jira/browse/SPARK-37273
Project: Spark
Issue Type: