[ 
https://issues.apache.org/jira/browse/HIVE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596843#comment-14596843
 ] 

Gopal V commented on HIVE-11043:
--------------------------------

bq. 3) ... In which case we will end up using BI as default even though there 
are only small number of files.
bq. 5) Should we make this independently configurable? Instead of using the 
cache max size.

The max cache size is a safety limit for huge clusters; it is not a 
configuration requirement.

If you need to change the behaviour explicitly, the right config to change is 
the split strategy itself: set it to ETL or BI, whichever you prefer.
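
To make that precedence concrete, here is a rough sketch rather than the 
actual patch. The function name, its parameters, and the exact thresholds are 
hypothetical; the only thing taken from this thread is the ordering: an 
explicit ETL/BI setting wins, and the adaptive choice (assumed here to look at 
average file size and number of files, with the file-count limit derived from 
the cache max size) applies only when nothing is forced.

```python
def choose_split_strategy(configured, num_files, avg_file_size,
                          max_split_size, file_count_limit):
    """Pick an ORC split strategy name ("ETL" or "BI").

    Hypothetical illustration only, not the Hive implementation.
    `file_count_limit` stands in for a safety limit such as the cache
    max size; it is an assumption of this sketch.
    """
    # An explicit setting always wins over the heuristics.
    if configured in ("ETL", "BI"):
        return configured

    # Adaptive case: large files, or few enough files, are assumed to
    # justify the cost of reading file footers (ETL).
    if avg_file_size > max_split_size or num_files <= file_count_limit:
        return "ETL"

    # Many small files: skip footer reads and split per file (BI).
    return "BI"


MB = 1 << 20

# Explicit override: the heuristics are not consulted at all.
forced = choose_split_strategy("BI", 5, 1024 * MB, 256 * MB, 10000)

# Adaptive: a handful of small files stays on ETL under these assumptions.
few_files = choose_split_strategy("HYBRID", 5, 1 * MB, 256 * MB, 10000)

# Adaptive: a very large number of small files falls back to BI.
many_files = choose_split_strategy("HYBRID", 50000, 1 * MB, 256 * MB, 10000)
```

With this shape the limit only guards the adaptive path, so nobody has to tune 
it just to get a particular strategy.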

> ORC split strategies should adapt based on number of files
> ----------------------------------------------------------
>
>                 Key: HIVE-11043
>                 URL: https://issues.apache.org/jira/browse/HIVE-11043
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Gopal V
>             Fix For: 2.0.0
>
>         Attachments: HIVE-11043.1.patch
>
>
> ORC split strategies added in HIVE-10114 chose strategies based on average 
> file size. It would be beneficial to choose a different strategy based on 
> number of files as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
