Noemi Pap-Takacs created IMPALA-12867:
-----------------------------------------

             Summary: Filter files to OPTIMIZE based on file size
                 Key: IMPALA-12867
                 URL: https://issues.apache.org/jira/browse/IMPALA-12867
             Project: IMPALA
          Issue Type: Sub-task
            Reporter: Noemi Pap-Takacs
            Assignee: Noemi Pap-Takacs


{{'OPTIMIZE TABLE <table_name>'}} rewrites all files of the table regardless of 
size and type, even if the table does not contain any small files or delete files.
With the {{'FILE_SIZE_THRESHOLD'}} option, the user should be able to specify a 
file size limit so that only small files are rewritten.
Syntax:
{code:sql}
OPTIMIZE TABLE <table_name> (FILE_SIZE_THRESHOLD=100);
{code}
The threshold value is a file size in MB. Data files larger than the 
given limit will only be rewritten if they are referenced from delete deltas.
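
The selection rule described above can be illustrated with a minimal Python sketch. This is not Impala's implementation: the {{DataFile}} type, the {{select_files_to_rewrite}} function, the 1 MB = 1024 * 1024 bytes conversion, and the treatment of a file whose size exactly equals the limit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: the issue only says "MBs"; 1024 * 1024 bytes is a guess.
MB = 1024 * 1024


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a table's data file metadata."""
    path: str
    size_bytes: int
    has_deletes: bool  # True if delete files reference this data file


def select_files_to_rewrite(files: List[DataFile],
                            threshold_mb: Optional[int] = None) -> List[DataFile]:
    """Return the data files that OPTIMIZE would rewrite under the described rule."""
    if threshold_mb is None:
        # No threshold given: the whole table is rewritten.
        return list(files)
    limit = threshold_mb * MB
    # Small files are selected; larger files only if deletes reference them.
    # Assumption: a file exactly at the limit is not treated as small.
    return [f for f in files if f.size_bytes < limit or f.has_deletes]


files = [
    DataFile("small.parquet", 10 * MB, False),
    DataFile("large.parquet", 200 * MB, False),
    DataFile("large_with_deletes.parquet", 200 * MB, True),
]
selected = [f.path for f in select_files_to_rewrite(files, threshold_mb=100)]
```

With a threshold of 100, the sketch selects the small file and the large file that has deletes, and leaves the other large file untouched.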

Note that if {{'FILE_SIZE_THRESHOLD'}} is set, only the selected files will 
be rewritten according to the latest schema and partition spec. Therefore, the 
data files left intact might still have an older schema or partition layout. Use 
{{'OPTIMIZE TABLE table_name'}} to rewrite the entire table according to the 
latest schema and partition layout.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
