[ https://issues.apache.org/jira/browse/PIG-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784468#action_12784468 ]

Chao Wang commented on PIG-1098:
--------------------------------

Ideally, we should have a better structure for methods such as advance(), 
advanceCG(), getKey(), getCGKey(), getValue() and getCGValue() in 
ColumnGroup.java. The only difference in the new *CG* methods is that they 
skip the "if (atEnd())" check. This gives some performance gain at the cost of 
slightly reduced code readability.
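
To make the trade-off concrete, here is a minimal sketch of the checked and 
unchecked method pairs described above. It is not the actual ColumnGroup code: 
the class SketchScanner, its fields, the return types and the helper 
countKeys() are all hypothetical, and only the method names advance(), 
advanceCG(), getKey(), getCGKey() and atEnd() come from the patch. The idea is 
that a caller that has already tested atEnd() once per row can use the *CG* 
variants and avoid repeating the same check in every call.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for a scanner; NOT Zebra's actual ColumnGroup code.
class SketchScanner {
    private final String[] keys; // assumed in-memory data, for illustration only
    private int pos = 0;

    SketchScanner(String[] keys) {
        this.keys = keys;
    }

    boolean atEnd() {
        return pos >= keys.length;
    }

    // Checked variant: safe to call from anywhere.
    String getKey() {
        if (atEnd()) {
            throw new NoSuchElementException("scanner is at end");
        }
        return getCGKey();
    }

    // Unchecked variant: the caller must already know atEnd() is false.
    String getCGKey() {
        return keys[pos];
    }

    // Checked variant: does nothing once the end has been reached.
    boolean advance() {
        if (atEnd()) {
            return false;
        }
        return advanceCG();
    }

    // Unchecked variant: moves the cursor without testing atEnd().
    boolean advanceCG() {
        pos++;
        return true;
    }

    // Hypothetical caller: one atEnd() test per row instead of one per call.
    static int countKeys(SketchScanner s) {
        int n = 0;
        while (!s.atEnd()) {
            s.getCGKey();
            s.advanceCG();
            n++;
        }
        return n;
    }
}
```

The readability cost is visible even in this small sketch: each operation 
exists twice, and the unchecked variants are only correct if every caller 
honours the "not at end" precondition.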

Considering that this is the first cut of the performance improvements and 
that all of the above changes are inside the ColumnGroup class, which is 
package private, these are Zebra's internal implementation details and we can 
safely improve them in the future. Overall, +1.







> [zebra] Zebra Performance Optimizations
> ---------------------------------------
>
>                 Key: PIG-1098
>                 URL: https://issues.apache.org/jira/browse/PIG-1098
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Yan Zhou
>            Assignee: Yan Zhou
>            Priority: Minor
>             Fix For: 0.6.0, 0.7.0
>
>         Attachments: PIG-1098.patch
>
>
> Many in-core performance optimization opportunities exist in Zebra, such as 
> removing redundant precautionary checks, using better collection types to 
> reduce levels of indirection to in-memory objects, and ordering input splits 
> by descending rather than ascending size. Observed improvements in the 
> wall-clock time of some Pig LOAD queries are around 10%.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
