JingsongLi opened a new issue, #735:
URL: https://github.com/apache/incubator-paimon/issues/735

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/incubator-paimon/issues) and found nothing similar.
   
   
   ### Motivation
   
   In the data warehouse world, it is very common to query only one or a few fields of a complex type such as a map, or to pack many subfields into such a column. These access patterns can significantly hurt query performance because:
   
   1. They waste a lot of IO. For example, if a field is of type Map and contains dozens of subfields, the entire column must be read even when only a single key is needed, and an engine such as Spark then traverses the whole map to look up the target key.
   2. Vectorized reads cannot be used when reading nested-type columns.
   3. Filter pushdown cannot be applied when reading nested columns.
   
   We should introduce a materialized column feature in Flink Table Store, which transparently solves the above problems for arbitrary columnar storage formats (not just Parquet).
   
   ### Solution
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!

