[ https://issues.apache.org/jira/browse/PIG-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784468#action_12784468 ]
Chao Wang commented on PIG-1098:
--------------------------------
Ideally, there should be a better structure for methods such as advance(),
advanceCG(), getKey(), getCGKey(), getValue(), and getCGValue() in
ColumnGroup.java. The only difference in the new *CG* methods is that they skip
the "if (atEnd())" check. This gives some performance gain while degrading code
readability a bit.
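
To make the trade-off concrete, here is a minimal sketch of the checked/unchecked
pairing, assuming a simplified in-memory scanner. The class CGScannerSketch, its
fields, and the exception type are hypothetical and are not Zebra's actual code;
only the method names and the "if (atEnd())" guard come from the patch under
review.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in for a scanner in ColumnGroup.java (not Zebra's real code).
final class CGScannerSketch {
    private final String[] keys;
    private final String[] values;
    private int pos = 0;

    CGScannerSketch(String[] keys, String[] values) {
        this.keys = keys;
        this.values = values;
    }

    boolean atEnd() {
        return pos >= keys.length;
    }

    // Checked variants: guard with atEnd(), then delegate to the unchecked ones.
    String getKey() {
        if (atEnd()) throw new NoSuchElementException("scanner is at end");
        return getCGKey();
    }

    String getValue() {
        if (atEnd()) throw new NoSuchElementException("scanner is at end");
        return getCGValue();
    }

    boolean advance() {
        if (atEnd()) return false;
        return advanceCG();
    }

    // Unchecked *CG* variants: the caller must already know that !atEnd().
    String getCGKey() {
        return keys[pos];
    }

    String getCGValue() {
        return values[pos];
    }

    boolean advanceCG() {
        pos++;
        return true;
    }

    public static void main(String[] args) {
        CGScannerSketch s = new CGScannerSketch(
                new String[] {"k1", "k2"}, new String[] {"v1", "v2"});
        // Hot loop: a single atEnd() test per row instead of one per call.
        while (!s.atEnd()) {
            System.out.println(s.getCGKey() + "=" + s.getCGValue());
            s.advanceCG();
        }
    }
}
```

Having each checked method delegate to its *CG* counterpart is one possible way
to keep the guard in a single place; whether that fits the real ColumnGroup code
is an open question.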
Considering that this is the first cut at performance improvement, and that all
of the above changes are inside the ColumnGroup class, which is package private,
these are Zebra's internal implementation details and we can safely improve
them in the future. Overall, +1.
> [zebra] Zebra Performance Optimizations
> ---------------------------------------
>
> Key: PIG-1098
> URL: https://issues.apache.org/jira/browse/PIG-1098
> Project: Pig
> Issue Type: Improvement
> Reporter: Yan Zhou
> Assignee: Yan Zhou
> Priority: Minor
> Fix For: 0.6.0, 0.7.0
>
> Attachments: PIG-1098.patch
>
>
> Many in-core performance optimization opportunities exist in Zebra, such as
> removing redundant precautionary checks, using better collection types to
> reduce levels of indirection to in-memory objects, and ordering input splits
> by descending rather than ascending size. Observed wall-clock improvements
> for some Pig LOAD queries are around 10%.
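
The split-ordering change mentioned in the description can be illustrated with a
small sketch. It assumes splits are represented only by their lengths in bytes;
the class SplitOrderSketch and the method largestFirst are hypothetical names,
and the real patch may order its split objects differently.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration of ordering input splits from largest to smallest.
final class SplitOrderSketch {
    // Returns a copy of the split lengths sorted in descending order.
    static long[] largestFirst(long[] splitLengths) {
        return Arrays.stream(splitLengths)
                .boxed()
                .sorted(Comparator.reverseOrder())
                .mapToLong(Long::longValue)
                .toArray();
    }

    public static void main(String[] args) {
        long[] ordered = largestFirst(new long[] {64L, 512L, 128L});
        System.out.println(Arrays.toString(ordered)); // prints [512, 128, 64]
    }
}
```

A plausible motivation, which the issue does not state, is that starting the
largest splits first avoids having a long task begin last and stretch the job's
wall-clock time.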