[jira] [Commented] (IMPALA-7936) Enable better control over Parquet writing

2019-07-23 Thread JIRA


[ 
https://issues.apache.org/jira/browse/IMPALA-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891141#comment-16891141
 ] 

Zoltán Borók-Nagy commented on IMPALA-7936:
---

Users normally won't need to set these query options, but I think they should 
be documented anyway.

> Enable better control over Parquet writing
> --
>
> Key: IMPALA-7936
> URL: https://issues.apache.org/jira/browse/IMPALA-7936
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 3.3.0
>
>
> With the introduction of the Parquet page indexes it became desirable to have 
> more control over how Impala writes Parquet files.
> These configuration options (probably implemented as query options) would be:
>  * enable/disable Parquet page index writing (currently we can do it with a 
> command-line argument)
>  * set page-size limits based on row count
>  * -Set truncation length for statistics about string values-   (current 
> truncation length is 64, it is unlikely to have user data that needs longer 
> truncation than that)
> They'd enable writing more complete tests for page filtering. They'd also be 
> useful for fine-tuning the page index for better performance.
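
To illustrate the "page-size limits based on row count" item in the description above, here is a minimal Python sketch of how a per-page row count limit would split a column chunk into pages. It is not Impala's writer: the function name and parameters are hypothetical, and a real Parquet writer would also apply byte-size limits, which this sketch ignores.

```python
def split_into_pages(num_rows, row_count_limit):
    """Split num_rows into half-open [start, end) row ranges, one per page.

    Hypothetical helper for illustration only: it models just the row count
    limit and ignores any byte-size limit a real Parquet writer applies.
    """
    if row_count_limit <= 0:
        raise ValueError("row_count_limit must be positive")
    pages = []
    start = 0
    while start < num_rows:
        # Close the page once it holds row_count_limit rows (or rows run out).
        end = min(start + row_count_limit, num_rows)
        pages.append((start, end))
        start = end
    return pages


# Ten rows with a limit of four rows per page yield three pages.
print(split_into_pages(10, 4))
```

Smaller pages give the page index finer-grained min/max ranges to filter on, at the cost of more pages per column chunk, which is why a tunable limit helps when testing page filtering.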



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7936) Enable better control over Parquet writing

2019-07-22 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890554#comment-16890554
 ] 

Alex Rodoni commented on IMPALA-7936:
-

[~boroknagyz], [~lv]: The PARQUET_READ_PAGE_INDEX query option is not 
documented. Does it need to be documented along with these two new query 
options, or is it undocumented for a reason?






[jira] [Commented] (IMPALA-7936) Enable better control over Parquet writing

2019-05-21 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845262#comment-16845262
 ] 

ASF subversion and git services commented on IMPALA-7936:
-

Commit 95be560e0c43a5f8ea7b00f3ba9f83ec3f734ca2 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=95be560 ]

IMPALA-7936: Enable better control over Parquet writing

This commit adds two new query options to Impala. One enables or
disables Parquet page index writing (enabled by default); the other
sets a row count limit per Parquet page (no limit by default).

It also removes the old command-line flag that controlled whether
the page index was written.

Because page index writing has been the default since IMPALA-5843, I
moved the tests from the "custom cluster" test suite to the "query test"
test suite. The tests now run faster because the Impala daemons no
longer need to be restarted.

Testing:
Added new test cases to test the effect of the query options.

Change-Id: Ib9ec8b16036e1fd35886e887809be8eca52a6982
Reviewed-on: http://gerrit.cloudera.org:8080/13361
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
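
As a rough illustration of how the two query options from the commit message above might be applied to a session before an INSERT, here is a minimal Python sketch that only builds the SET statements. The option names PARQUET_WRITE_PAGE_INDEX and PARQUET_PAGE_ROW_COUNT_LIMIT are assumptions (the commit message does not spell them out, so check them against the Impala documentation), and the helper name and its parameters are hypothetical.

```python
def parquet_write_options(write_page_index=True, page_row_count_limit=None):
    """Return SET statements for the two Parquet writer query options.

    Assumption: the options are named PARQUET_WRITE_PAGE_INDEX and
    PARQUET_PAGE_ROW_COUNT_LIMIT; verify against your Impala version.
    """
    flag = "true" if write_page_index else "false"
    statements = [f"SET PARQUET_WRITE_PAGE_INDEX={flag};"]
    # Leaving the limit unset mirrors the default described in the commit
    # message: no row count limit per page.
    if page_row_count_limit is not None:
        if page_row_count_limit <= 0:
            raise ValueError("page_row_count_limit must be positive")
        statements.append(
            f"SET PARQUET_PAGE_ROW_COUNT_LIMIT={page_row_count_limit};"
        )
    return statements


# Statements to issue in a session (for example via impala-shell) before
# writing a Parquet table.
for stmt in parquet_write_options(write_page_index=True, page_row_count_limit=1000):
    print(stmt)
```

Because these are query options and not startup flags, they can be changed per session without restarting the Impala daemons, which is the property the moved tests rely on.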



