Hi,

I was reading through the PTFOperator and related code and was wondering if
there is an opportunity to optimize this function in
WindowingTableFunction.java:

  public void execute(PTFPartitionIterator<Object> pItr, PTFPartition outP) throws HiveException {

This method iterates over the input partition once to compute the output
columns. This causes a full read of the input partition.

It then iterates over the input partition again to append the newly computed
values. This causes another read of the input partition and a write to the
output partition.

I was wondering if it may be more efficient to append to the output
partition as soon as window expressions have been computed. This will avoid
one scan of the input partition.
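
To make the idea concrete, here is a rough sketch of the two shapes. It does
not use any Hive classes: SinglePassSketch, twoPass, onePass and rowsRead are
hypothetical names made up for illustration, a List<Long> stands in for the
input partition, and a running sum stands in for a window expression. It also
assumes each row's value depends only on rows already seen; a window that
looks at following rows could not be appended this early.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: hypothetical stand-ins, not Hive's PTF classes.
public class SinglePassSketch {

    // Counts input-row reads so the cost of each approach is visible.
    static int rowsRead = 0;

    // Current shape as described above: compute the new column in one scan,
    // then scan the input again to append the computed values.
    static List<long[]> twoPass(List<Long> input) {
        List<Long> computed = new ArrayList<>();
        long running = 0;
        for (Long v : input) {                    // first full read of the input
            rowsRead++;
            running += v;
            computed.add(running);
        }
        List<long[]> out = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {  // second full read of the input
            rowsRead++;
            out.add(new long[] {input.get(i), computed.get(i)});
        }
        return out;
    }

    // Proposed shape: append each output row as soon as its value is known.
    static List<long[]> onePass(List<Long> input) {
        List<long[]> out = new ArrayList<>();
        long running = 0;
        for (Long v : input) {                    // only read of the input
            rowsRead++;
            running += v;
            out.add(new long[] {v, running});
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> partition = Arrays.asList(1L, 2L, 3L);
        rowsRead = 0;
        twoPass(partition);
        System.out.println("two-pass reads: " + rowsRead);
        rowsRead = 0;
        onePass(partition);
        System.out.println("one-pass reads: " + rowsRead);
    }
}
```

Both methods produce the same output rows; only the number of input reads
differs.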

FYI - I've been looking at Hive 0.13 code mostly but a glance at trunk
suggests this logic is the same there.

Thanks,

Siva
