[jira] [Commented] (IMPALA-7936) Enable better control over Parquet writing
[ https://issues.apache.org/jira/browse/IMPALA-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891141#comment-16891141 ]

Zoltán Borók-Nagy commented on IMPALA-7936:
-------------------------------------------

Normally the users won't need to set these query options, but I think it would be better to have these documented anyway.

> Enable better control over Parquet writing
> ------------------------------------------
>
> Key: IMPALA-7936
> URL: https://issues.apache.org/jira/browse/IMPALA-7936
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Fix For: Impala 3.3.0
>
>
> With the introduction of the Parquet page indexes it became desirable to have
> more control over how Impala writes Parquet files.
> These configuration options (probably implemented as query options) would be:
> * enable/disable Parquet page index writing (currently we can do it with a
> command-line argument)
> * set page-size limits based on row count
> * -Set truncation length for statistics about string values- (current
> truncation length is 64, it is unlikely to have user data that needs longer
> truncation than that)
> They'd enable writing more complete tests for page filtering. They'd be also
> useful for fine-tuning the page index for better performance.

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
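As a rough illustration of why a row-count limit per page helps with testing page filtering: a small limit forces a column's rows into many small pages, so the page index gets many entries to filter on. The sketch below is a simplified model only, not Impala's writer. The function name `split_rows_into_pages` is hypothetical, and the model assumes the row-count limit is the only thing that closes a page (a real writer would also apply byte-size limits).

```python
from typing import List, Optional, Tuple


def split_rows_into_pages(
    num_rows: int, row_count_limit: Optional[int] = None
) -> List[Tuple[int, int]]:
    """Return (first_row_index, row_count) for each page of one column.

    Hypothetical model: with no limit, every row lands in a single page;
    with a limit, a new page starts whenever the current one is full.
    """
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    if row_count_limit is not None and row_count_limit <= 0:
        raise ValueError("row_count_limit must be positive when set")
    if num_rows == 0:
        return []
    if row_count_limit is None:
        return [(0, num_rows)]

    pages = []
    first_row = 0
    while first_row < num_rows:
        # The last page may hold fewer rows than the limit.
        count = min(row_count_limit, num_rows - first_row)
        pages.append((first_row, count))
        first_row += count
    return pages


if __name__ == "__main__":
    # Ten rows with a limit of four rows per page.
    print(split_rows_into_pages(10, 4))
    # Ten rows with no limit (the default in this model).
    print(split_rows_into_pages(10))
```

Under these assumptions, a test that writes ten rows with a limit of four would expect three pages, which gives a page-filtering test something to skip.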
[jira] [Commented] (IMPALA-7936) Enable better control over Parquet writing
[ https://issues.apache.org/jira/browse/IMPALA-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890554#comment-16890554 ]

Alex Rodoni commented on IMPALA-7936:
-------------------------------------

[~boroknagyz] . [~lv] The PARQUET_READ_PAGE_INDEX query option is not documented. Does it need to be documented along with these 2 new query options? Or is it undocumented for a reason?
[jira] [Commented] (IMPALA-7936) Enable better control over Parquet writing
[ https://issues.apache.org/jira/browse/IMPALA-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845262#comment-16845262 ]

ASF subversion and git services commented on IMPALA-7936:
---------------------------------------------------------

Commit 95be560e0c43a5f8ea7b00f3ba9f83ec3f734ca2 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=95be560 ]

IMPALA-7936: Enable better control over Parquet writing

This commit adds two new query options to Impala. One is to disable/enable
Parquet page index writing (by default it is enabled), the other is to set
a row count limit per Parquet page (by default there is no row count limit).

It removes the old command-line flag that controlled the enablement of
page index writing. Since page index writing is the default since
IMPALA-5843, I moved the tests from the "custom cluster" test suite to
the "query test" test suite. This way the tests run faster because we
don't need to restart the Impala daemons.

Testing:
Added new test cases to test the effect of the query options.

Change-Id: Ib9ec8b16036e1fd35886e887809be8eca52a6982
Reviewed-on: http://gerrit.cloudera.org:8080/13361
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins