[ 
https://issues.apache.org/jira/browse/HIVE-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889579#action_12889579
 ] 

Joydeep Sen Sarma commented on HIVE-1468:
-----------------------------------------

yes - it does make sense to differentiate result data from intermediate. if 
anything - there's probably a good argument to be made that we don't need a 
separate option for intermediate compression. it should default to whatever 
policy is being applied for map-reduce intermediate traffic. (that would be a 
better default than either true or false - that way admins have one less option 
to get right).
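
a minimal sketch of the defaulting idea above, assuming the hive option is kept as an optional override and that map-reduce intermediate traffic is governed by the hadoop property mapred.compress.map.output. the function name and the dict-based config are hypothetical, for illustration only; this is not hive's actual code path.

```python
# Hypothetical illustration only; not Hive's implementation.
HIVE_INTERMEDIATE = "hive.exec.compress.intermediate"
MR_MAP_OUTPUT = "mapred.compress.map.output"  # assumed map-reduce policy key


def intermediate_compression(conf):
    """Decide whether intermediate data should be compressed.

    An explicit Hive setting wins. If it is unset, fall back to the
    map-reduce map-output policy. If neither is set, do not compress.
    """
    for key in (HIVE_INTERMEDIATE, MR_MAP_OUTPUT):
        value = conf.get(key)
        if value is not None:
            return str(value).strip().lower() == "true"
    return False


if __name__ == "__main__":
    # Admin configured only the map-reduce policy; Hive follows it.
    print(intermediate_compression({MR_MAP_OUTPUT: "true"}))
```

with this shape an admin who sets only the map-reduce policy gets consistent behavior for hive intermediate data without touching a second option.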

interestingly - result data also needs minimal replication. the client is 
single threaded and cannot exploit multiple replicas for bandwidth purposes. 
also - the data is temporary in nature and doesn't need reliability.



> intermediate data produced for select queries ignores 
> hive.exec.compress.intermediate
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-1468
>                 URL: https://issues.apache.org/jira/browse/HIVE-1468
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>
> > set hive.exec.compress.intermediate=false;
> > explain extended select xxx from yyy;
>     ...
>             File Output Operator
>               compressed: true
>               GlobalTableId: 0
> looks like only intermediate locations identified during splitting mr 
> tasks follow this directive. this should be fixed because this forces clients 
> to always decompress output data (even if the config setting is altered).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
