PERFORMANCE: Misc. optimizations including optimization in Tuple serialization, 
set up of PigMapReduce & PigCombiner, accessing index in POLocalRearrange
---------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-628
                 URL: https://issues.apache.org/jira/browse/PIG-628
             Project: Pig
          Issue Type: Improvement
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: types_branch


- Currently DefaultTuple.write() writes a null/not-null marker that is 
redundant. Nulls are already handled by PigNullableWritable for keys and by 
NullableTuple for values, and nested null tuples inside a tuple are written out 
as nulls in DataReaderWriter.writeDatum. So the null/not-null marker in 
DefaultTuple can be dropped.
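
A minimal sketch of the saving, not the actual DefaultTuple code. The class 
name, the int-only fields and the one-byte marker are assumptions made for 
illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a tuple serializer; not Pig's DefaultTuple.
class TupleWriteSketch {
    // Assumed one-byte "not null" marker.
    static final byte NOT_NULL_MARKER = 0x01;

    // Serializes the field count followed by each field, optionally
    // preceded by the marker byte.
    static byte[] write(int[] fields, boolean withMarker) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            if (withMarker) {
                out.writeByte(NOT_NULL_MARKER);
            }
            out.writeInt(fields.length);
            for (int field : fields) {
                out.writeInt(field);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

In this sketch, dropping the marker saves one byte and one write call per 
serialized tuple.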

- In PigMapReduce and PigCombiner the roots and leaves of the plans are 
recalculated in each reduce() call. Instead, these can be computed once in 
configure().
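
A rough sketch of the pattern, assuming a plan object whose roots do not change 
between calls. PlanSketch and ReduceSketch are hypothetical names, not Pig 
classes:

```java
import java.util.Collections;
import java.util.List;

// Hypothetical plan that counts how often its roots are computed.
class PlanSketch {
    int rootComputations = 0;

    List<String> getRoots() {
        rootComputations++;
        return Collections.singletonList("root-operator");
    }
}

// Hypothetical reducer: roots are computed once in configure()
// and only read in reduce().
class ReduceSketch {
    private final PlanSketch plan;
    private List<String> roots;
    int processed = 0;

    ReduceSketch(PlanSketch plan) {
        this.plan = plan;
    }

    void configure() {
        roots = plan.getRoots();
    }

    void reduce(String key) {
        // Uses the cached roots instead of asking the plan again.
        for (String root : roots) {
            processed++;
        }
    }
}
```

With this shape, the cost of computing roots (and likewise leaves) is paid once 
per task rather than once per key.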

- Each call of POLocalRearrange.getNext() creates a new lroutput tuple whose 
first position is filled with the index, the second with the key and the third 
with the value. This can be optimized by keeping a tuple member in 
POLocalRearrange that is reused in each getNext() call. Further, the first 
position of this tuple can be pre-filled with the index in the setIndex() 
method of POLocalRearrange at script compile time.
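
A minimal sketch of the reuse, with an Object[] standing in for the Pig tuple 
and simplified signatures. LocalRearrangeSketch is a hypothetical class, not 
POLocalRearrange itself:

```java
// Hypothetical stand-in for POLocalRearrange.
class LocalRearrangeSketch {
    // Reused output: [0] = index, [1] = key, [2] = value.
    private final Object[] lrOutput = new Object[3];

    // Called once when the plan is built; pre-fills the index slot.
    void setIndex(byte index) {
        lrOutput[0] = index;
    }

    // Called per record; only the key and value slots are overwritten.
    Object[] getNext(Object key, Object value) {
        lrOutput[1] = key;
        lrOutput[2] = value;
        return lrOutput;
    }
}
```

Reusing the output object is only safe if the caller consumes or copies it 
before the next getNext() call, since the same instance is overwritten each 
time.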

- In POCombinerPackage, the metadata structures used to figure out which parts 
of the value are present in the key can be set up in the setKeyInfo() method at 
compile time. This is possible because POCombinerPackage is currently used only 
with a "group by", so there is only one input (index = 0) and the metadata need 
not be looked up at run time based on the input index.
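
A rough sketch of resolving the metadata once for input 0. The map layout 
(value position to key position) and the rebuildValue() helper are assumptions 
for illustration and may not match the real POCombinerPackage structures:

```java
import java.util.Map;

// Hypothetical stand-in for POCombinerPackage.
class CombinerPackageSketch {
    // Assumed layout: value position -> position of that field in the key.
    private Map<Integer, Integer> keyLookup;

    // keyInfo is keyed by input index; with a single "group by" input
    // only index 0 exists, so it is resolved once here.
    void setKeyInfo(Map<Integer, Map<Integer, Integer>> keyInfo) {
        keyLookup = keyInfo.get(0);
    }

    // Rebuilds the full value from the key and the fields that were
    // not part of the key, without any per-record index lookup.
    Object[] rebuildValue(Object[] key, Object[] partialValue, int arity) {
        Object[] full = new Object[arity];
        int next = 0;
        for (int i = 0; i < arity; i++) {
            Integer keyPos = keyLookup.get(i);
            full[i] = (keyPos != null) ? key[keyPos] : partialValue[next++];
        }
        return full;
    }
}
```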

