Some optimization thoughts for Hive
-----------------------------------

                 Key: HIVE-477
                 URL: https://issues.apache.org/jira/browse/HIVE-477
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: He Yongqiang


Before we start working on HIVE-461, I have been doing some profiling of Hive.
Here are some thoughts for improvements:

minor:
1) Add a new HiveText to replace Text. It can avoid a byte copy when
initializing LazyString. I have a draft version, and it shows a ~1%
performance gain.
2) Change StructObjectInspector's
    {noformat}
     public List<Object> getStructFieldsDataAsList(Object data);
    {noformat}
to
    {noformat}
     public Object[] getStructFieldsDataAsArray(Object data);
    {noformat}

In my profiling this showed some performance gain, but in actual execution it
did not. Either way, returning a Java array will reduce the GC burden of
collecting ArrayList objects.
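
As a rough illustration of item 2, here is a minimal sketch of the two accessor
styles. SimpleStructInspector is a hypothetical stand-in, not Hive's actual
StructObjectInspector, and it assumes the row is already backed by an Object[];
a real inspector may store its fields differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not Hive's StructObjectInspector.
// Assumption: a row ("data") is stored as an Object[] of field values.
final class SimpleStructInspector {

    // Current style: allocates a new ArrayList (and copies the fields) per call.
    static List<Object> getStructFieldsDataAsList(Object data) {
        Object[] fields = (Object[]) data;
        List<Object> out = new ArrayList<>(fields.length);
        for (Object field : fields) {
            out.add(field);
        }
        return out;
    }

    // Proposed style: hands back the backing array, so no per-row wrapper
    // object is created. Callers must treat the result as read-only.
    static Object[] getStructFieldsDataAsArray(Object data) {
        return (Object[]) data;
    }
}
```

The array version creates no garbage per row in this sketch, which is the GC
effect described above. The trade-off is that the caller sees the internal
array and must not modify it.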

not so minor:
3) Move FileSinkOperator's Writer into a separate thread, with a
producer-consumer array as the bridge between the operator thread and the
writer thread.
4) The operator stack is rather deep. To reduce instruction cache misses and
improve data cache efficiency, I suggest letting Hive's operators process an
array of rows at a time instead of only one row at a time.
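
Items 3 and 4 could be combined roughly as in the sketch below. All names here
(BatchingSink, processBatch, the in-memory "written" list) are hypothetical and
are not part of Hive. The sketch uses java.util.concurrent.ArrayBlockingQueue
as the bridge between the operator thread and the writer thread, and it hands
over an array of rows per call instead of a single row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch; none of these names are Hive classes.
final class BatchingSink {
    // Sentinel batch (compared by identity) that tells the writer to stop.
    private static final Object[][] END = new Object[0][];

    // Bounded queue: the operator thread blocks when the writer falls behind.
    private final BlockingQueue<Object[][]> queue;
    // Stands in for the real file writer; only touched by the writer thread.
    private final List<Object[]> written = new ArrayList<>();
    private final Thread writer;

    BatchingSink(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
        writer = new Thread(() -> {
            try {
                while (true) {
                    Object[][] batch = queue.take();
                    if (batch == END) {
                        return;
                    }
                    for (Object[] row : batch) {
                        written.add(row); // a real sink would serialize and write here
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sink-writer");
        writer.start();
    }

    // Called from the operator thread with a whole batch of rows.
    void processBatch(Object[][] rows) {
        put(rows);
    }

    // Signals end of input, waits for the writer, and returns what it wrote.
    List<Object[]> close() {
        put(END);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while closing", e);
        }
        return written;
    }

    private void put(Object[][] batch) {
        try {
            queue.put(batch);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while queueing", e);
        }
    }
}
```

An operator would call processBatch(...) once per batch and close() at the end
of input. Whether this helps in practice depends on how expensive the writer is
relative to the operators, so it would need the same kind of profiling as the
items above.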

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
