optimize split sizes automatically, taking into account the nature of the map
tasks
------------------------------------------------------------------------------------

                 Key: HIVE-1516
                 URL: https://issues.apache.org/jira/browse/HIVE-1516
             Project: Hadoop Hive
          Issue Type: Improvement
          Components: Query Processor
            Reporter: Joydeep Sen Sarma


Two immediate cases come to mind:
- pure filter jobs (i.e. no map-side sort required)
- full aggregate computations only (like count(1))

In these cases the amount of data to be sorted is zero or negligible, so
mapper parallelism (and therefore split size) should be dictated by the size
of the cluster rather than by a fixed split size. There's no point running
10000 mappers on a 500 node cluster for a pure filter job.
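
A rough sketch of the sizing rule described above, in Python rather than
Hive's own code. The function name, the "waves" knob, the 8 map slots per
node and the 256 MB starting split size are all assumptions made up for
illustration; none of them come from Hive or from this issue. The idea is
simply to aim for about one mapper per available map slot and clamp the
resulting split size to sane bounds.

```python
MB = 1024 * 1024


def suggest_split_size(total_input_bytes, map_slots,
                       min_split_bytes, max_split_bytes, waves=1):
    """Hypothetical helper: pick a split size so that the number of mappers
    roughly matches cluster capacity (map_slots * waves).

    Only meant for jobs with little or no map-side sort, e.g. pure filters
    or full aggregates like count(1).
    """
    if total_input_bytes <= 0 or map_slots <= 0 or waves <= 0:
        raise ValueError("input size, map slots and waves must be positive")

    target_mappers = map_slots * waves
    # Ceiling division: each mapper gets an equal share of the input.
    split = -(-total_input_bytes // target_mappers)
    # Clamp to the configured bounds.
    return max(min_split_bytes, min(split, max_split_bytes))


# Example numbers (assumed): 10000 splits of 256 MB on a 500 node cluster
# with 8 map slots per node.
total_input = 10000 * 256 * MB
slots = 500 * 8
split_size = suggest_split_size(total_input, slots, 64 * MB, 1024 * MB)
mappers = -(-total_input // split_size)
print(split_size // MB, "MB per split,", mappers, "mappers")
```

With these assumed numbers the job would run 4000 mappers on 640 MB splits
instead of 10000 mappers on 256 MB splits, i.e. a single wave across the
cluster.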
