[jira] [Updated] (SPARK-36528) Implement lazy decoding for the vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-36528:
-
Parent: (was: SPARK-35743)
Issue Type: Bug  (was: Sub-task)

> Implement lazy decoding for the vectorized Parquet reader
> -
>
>                 Key: SPARK-36528
>                 URL: https://issues.apache.org/jira/browse/SPARK-36528
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Priority: Major
>
> Currently Spark first decodes values (e.g., RLE/bit-packed, PLAIN) into a 
> column vector and then operates on the decoded data. However, it may be more 
> efficient to operate directly on the encoded data, for instance, performing a 
> filter or aggregation on RLE-encoded data, or performing comparisons over 
> dictionary-encoded string data. This could also potentially work with encodings 
> in the Parquet v2 format, such as DELTA_BYTE_ARRAY.
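
To illustrate why operating on RLE-encoded data can be cheaper, here is a minimal Python sketch. It is not Spark or Parquet code: the `(value, run_length)` tuple representation and the function names `decode`, `sum_on_runs`, and `count_matching_on_runs` are assumptions made up for this example. It contrasts the eager path (expand every run, then work per row) with a lazy path that does one unit of work per run.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical in-memory form of RLE runs: (value, run_length).
Run = Tuple[int, int]


def decode(runs: Iterable[Run]) -> List[int]:
    """Eager path: expand every run into individual values."""
    out: List[int] = []
    for value, length in runs:
        out.extend([value] * length)
    return out


def sum_on_runs(runs: Iterable[Run]) -> int:
    """Lazy path: one multiplication per run, no expansion."""
    return sum(value * length for value, length in runs)


def count_matching_on_runs(runs: Iterable[Run],
                           predicate: Callable[[int], bool]) -> int:
    """Lazy path: evaluate the predicate once per run, not once per row."""
    return sum(length for value, length in runs if predicate(value))


runs = [(7, 1000), (3, 500), (7, 250)]

# Both paths agree; the lazy one touches 3 runs instead of 1750 rows.
assert sum_on_runs(runs) == sum(decode(runs))
assert count_matching_on_runs(runs, lambda v: v > 5) == \
    sum(1 for v in decode(runs) if v > 5)
```

The saving scales with the average run length: long runs mean few predicate evaluations or multiplications, while data with runs of length one gains nothing over decoding first.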



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36528) Implement lazy decoding for the vectorized Parquet reader

2023-01-06 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-36528:
-
Issue Type: New Feature  (was: Bug)

> Implement lazy decoding for the vectorized Parquet reader
> -
>
>                 Key: SPARK-36528
>                 URL: https://issues.apache.org/jira/browse/SPARK-36528
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Priority: Major
>
> Currently Spark first decodes values (e.g., RLE/bit-packed, PLAIN) into a 
> column vector and then operates on the decoded data. However, it may be more 
> efficient to operate directly on the encoded data, for instance, performing a 
> filter or aggregation on RLE-encoded data, or performing comparisons over 
> dictionary-encoded string data. This could also potentially work with encodings 
> in the Parquet v2 format, such as DELTA_BYTE_ARRAY.






[jira] [Updated] (SPARK-36528) Implement lazy decoding for the vectorized Parquet reader

2021-08-26 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-36528:
-
Description: Currently Spark first decodes values (e.g., RLE/bit-packed, PLAIN) 
into a column vector and then operates on the decoded data. However, it may be 
more efficient to operate directly on the encoded data, for instance, performing 
a filter or aggregation on RLE-encoded data, or performing comparisons over 
dictionary-encoded string data. This could also potentially work with encodings 
in the Parquet v2 format, such as DELTA_BYTE_ARRAY.  (was: Currently Spark first 
decodes values (e.g., RLE/bit-packed, PLAIN) into a column vector and then 
operates on the decoded data. However, it may be more efficient to operate 
directly on the encoded data (e.g., when the data uses RLE encoding). This could 
also potentially work with encodings in the Parquet v2 format, such as 
DELTA_BYTE_ARRAY.)

> Implement lazy decoding for the vectorized Parquet reader
> -
>
>                 Key: SPARK-36528
>                 URL: https://issues.apache.org/jira/browse/SPARK-36528
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Priority: Major
>
> Currently Spark first decodes values (e.g., RLE/bit-packed, PLAIN) into a 
> column vector and then operates on the decoded data. However, it may be more 
> efficient to operate directly on the encoded data, for instance, performing a 
> filter or aggregation on RLE-encoded data, or performing comparisons over 
> dictionary-encoded string data. This could also potentially work with encodings 
> in the Parquet v2 format, such as DELTA_BYTE_ARRAY.
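
The dictionary-encoded string case mentioned in the description can be sketched in the same spirit. This is an illustration only, not the Spark reader: the names `filter_eager` and `filter_on_ids`, and the plain-list layout of the dictionary and the id column, are assumptions for the example. The idea is to compare the target string against each distinct dictionary entry once, then filter rows using integer ids only.

```python
from typing import List, Sequence


def filter_eager(dictionary: Sequence[str], ids: Sequence[int],
                 target: str) -> List[int]:
    """Eager path: materialize every string, then compare row by row."""
    decoded = [dictionary[i] for i in ids]
    return [row for row, s in enumerate(decoded) if s == target]


def filter_on_ids(dictionary: Sequence[str], ids: Sequence[int],
                  target: str) -> List[int]:
    """Lazy path: string comparison once per dictionary entry,
    then cheap integer membership checks per row."""
    matching = {i for i, s in enumerate(dictionary) if s == target}
    if not matching:
        # No dictionary entry matches, so no row can match.
        return []
    return [row for row, i in enumerate(ids) if i in matching]


dictionary = ["apple", "banana", "cherry"]  # distinct values in the page
ids = [0, 2, 1, 1, 0, 2]                    # one dictionary id per row

assert filter_on_ids(dictionary, ids, "banana") == [2, 3]
assert filter_on_ids(dictionary, ids, "banana") == \
    filter_eager(dictionary, ids, "banana")
```

When the target is absent from the dictionary, the lazy path can skip the row scan entirely, which is one reason deferring the decode may pay off for selective filters.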



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36528) Implement lazy decoding for the vectorized Parquet reader

2021-08-16 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-36528:
-
Description: Currently Spark first decodes values (e.g., RLE/bit-packed, PLAIN) 
into a column vector and then operates on the decoded data. However, it may be 
more efficient to operate directly on the encoded data (e.g., when the data 
uses RLE encoding). This could also potentially work with encodings in the 
Parquet v2 format, such as DELTA_BYTE_ARRAY.  (was: Currently Spark first 
decodes values (e.g., RLE/bit-packed, PLAIN) into a column vector and then 
operates on the decoded data. However, it may be more efficient to operate 
directly on the encoded data (e.g., when the data uses RLE encoding).)

> Implement lazy decoding for the vectorized Parquet reader
> -
>
>                 Key: SPARK-36528
>                 URL: https://issues.apache.org/jira/browse/SPARK-36528
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: Chao Sun
>            Priority: Major
>
> Currently Spark first decodes values (e.g., RLE/bit-packed, PLAIN) into a 
> column vector and then operates on the decoded data. However, it may be more 
> efficient to operate directly on the encoded data (e.g., when the data uses 
> RLE encoding). This could also potentially work with encodings in the Parquet 
> v2 format, such as DELTA_BYTE_ARRAY.


