[ https://issues.apache.org/jira/browse/HIVE-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889579#action_12889579 ]
Joydeep Sen Sarma commented on HIVE-1468:
-----------------------------------------

Yes, it does make sense to differentiate result data from intermediate data. If anything, there is probably a good argument that we don't need a separate option for intermediate compression: it should default to whatever policy is being applied to map-reduce intermediate traffic. (That would be a better default than either true or false, since admins would have one less option to get right.)

Interestingly, result data also needs only minimal replication. The client is single-threaded and cannot exploit multiple replicas for bandwidth purposes. The data is also temporary in nature and doesn't need reliability.

> intermediate data produced for select queries ignores hive.exec.compress.intermediate
> -------------------------------------------------------------------------------------
>
>                 Key: HIVE-1468
>                 URL: https://issues.apache.org/jira/browse/HIVE-1468
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Joydeep Sen Sarma
>
>
> > set hive.exec.compress.intermediate=false;
> > explain extended select xxx from yyy;
> ...
> File Output Operator
> compressed: true
> GlobalTableId: 0
>
> Looks like only the intermediate locations identified while splitting MR
> tasks follow this directive. This should be fixed because it forces clients
> to always decompress output data (even if the config setting is altered).
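
The defaulting rule suggested in the comment can be sketched roughly as follows. This is a minimal illustration, not Hive source code: the function `resolve_compression` and the dict-based configuration are hypothetical, and the choice to govern result data with `hive.exec.compress.output` is an assumption that the thread does not confirm. `mapred.compress.map.output` stands in for "whatever policy is being applied to map-reduce intermediate traffic".

```python
# Hypothetical model of the proposed defaulting rule; not Hive source code.
# resolve_compression and the plain-dict config are illustrative assumptions.

MR_INTERMEDIATE_KEY = "mapred.compress.map.output"      # Hadoop map-output compression
HIVE_INTERMEDIATE_KEY = "hive.exec.compress.intermediate"
HIVE_OUTPUT_KEY = "hive.exec.compress.output"           # assumed to govern result data


def _as_bool(value):
    """Interpret a config string such as 'true' or 'false' as a boolean."""
    return str(value).strip().lower() == "true"


def resolve_compression(conf, is_final_result):
    """Return True if a file sink should write compressed data.

    conf            -- dict mapping property names to string values
    is_final_result -- True for data fetched by the client,
                       False for data passed between map-reduce jobs
    """
    if is_final_result:
        # Result data is kept separate from the intermediate setting
        # (assumption: it follows the output-compression property).
        return _as_bool(conf.get(HIVE_OUTPUT_KEY, "false"))
    # Intermediate data: an explicit Hive setting wins if present.
    if HIVE_INTERMEDIATE_KEY in conf:
        return _as_bool(conf[HIVE_INTERMEDIATE_KEY])
    # Otherwise inherit the map-reduce intermediate policy.
    return _as_bool(conf.get(MR_INTERMEDIATE_KEY, "false"))


if __name__ == "__main__":
    conf = {MR_INTERMEDIATE_KEY: "true"}
    print("intermediate compressed:", resolve_compression(conf, False))
    print("result compressed:", resolve_compression(conf, True))
```

Under this sketch, an admin who has already configured map-output compression gets matching behavior for Hive intermediate data without setting a second option, while the result data read by the client is decided independently.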