optimize split sizes automatically taking into account amount of nature of map tasks
------------------------------------------------------------------------------------
Key: HIVE-1516
URL: https://issues.apache.org/jira/browse/HIVE-1516
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Joydeep Sen Sarma

Two immediate cases come to mind:
- pure filter jobs (i.e. no map-side sort required)
- full aggregate computations only (like count(1))

In these cases the amount of data to be sorted is zero or negligible, so mapper parallelism (and therefore split size) should be dictated by the size of the cluster. There is no point in running 10000 mappers on a 500-node cluster for a pure filter job.
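
To make the proposal concrete, here is a rough sketch of how a split size could be derived from cluster capacity for a job with no map-side sort. It is not Hive's implementation. The function names, the 64 MiB floor, the `waves` parameter and the figure of 2 map slots per node are all assumptions chosen for illustration.

```python
MIB = 1024 * 1024


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding on large byte counts."""
    return (a + b - 1) // b


def suggest_split_size(total_input_bytes, map_slots, min_split_bytes=64 * MIB, waves=1):
    """Suggest a split size for a sort-free job (hypothetical helper).

    The target is roughly `waves` rounds of mappers across the cluster's
    map slots, never going below `min_split_bytes` (an assumed floor).
    """
    if total_input_bytes <= 0 or map_slots <= 0 or waves <= 0:
        raise ValueError("input size, map slots and waves must be positive")
    target_mappers = map_slots * waves
    return max(ceil_div(total_input_bytes, target_mappers), min_split_bytes)


def mapper_count(total_input_bytes, split_bytes):
    """Number of mappers that a given split size would produce."""
    return ceil_div(total_input_bytes, split_bytes)


# The case from the description: input that would yield 10000 mappers
# at 64 MiB per split, on a 500-node cluster.
# Assumption: 2 map slots per node, i.e. 1000 slots in total.
total_bytes = 10000 * 64 * MIB
split = suggest_split_size(total_bytes, map_slots=500 * 2)
mappers = mapper_count(total_bytes, split)
```

Under these assumptions the suggested split is 640 MiB and the job runs 1000 mappers in a single wave instead of 10000. A value computed this way could then be applied through a split-size setting such as `mapred.min.split.size` or `mapred.max.split.size`, depending on the input format in use.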