I made a ticket here https://issues.apache.org/jira/browse/PIG-2424 and
would like to give it a shot, but I wanted to get opinions on the
implementation first. Extending the grammar is pretty simple; the question is
how to actually implement the operation. Right now, it looks like there is

protected boolean[] isToBeFlattenedArray;

in

src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java

There is a comment there explaining that the List<Boolean> is converted to
an array because it is cheaper, which is why I am asking this question.

Ignoring the overhead of objects, it seems like the cleanest way to do this
would be to refactor it to be

protected Flattener[] isToBeFlattened;

If an entry is null, there is no flattening; if it isn't, you use the given
flattener. This adds a lot more overhead than the current approach, but is a
lot more extensible, as it would make adding new flatteners really easy.
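
As a rough sketch of this first option (plain Java, not Pig's actual classes):
Flattener, INNER, OUTER and expand are all hypothetical names, a List<Object>
stands in for a bag, and the "outer" behavior of turning an empty bag into a
single null is an assumption about what the new operation would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical strategy interface; nothing like this exists in Pig today.
interface Flattener {
    // Returns the values one bag-valued column contributes to the output rows.
    List<Object> flatten(List<Object> bag);
}

final class FlattenerSketch {
    // Regular flatten: an empty (or, in this sketch, null) bag contributes no
    // values, so the whole record disappears from the output.
    static final Flattener INNER =
        bag -> bag == null ? Collections.<Object>emptyList() : bag;

    // Assumed outer flatten: an empty or null bag contributes a single null.
    static final Flattener OUTER =
        bag -> (bag == null || bag.isEmpty()) ? Collections.<Object>singletonList(null) : bag;

    // A null entry in flatteners means "leave this column alone".
    static List<List<Object>> expand(List<Object> fields, Flattener[] flatteners) {
        List<List<Object>> rows = new ArrayList<>();
        rows.add(new ArrayList<>());
        for (int i = 0; i < fields.size(); i++) {
            List<Object> values;
            if (flatteners[i] == null) {
                values = Collections.singletonList(fields.get(i));
            } else {
                @SuppressWarnings("unchecked")
                List<Object> bag = (List<Object>) fields.get(i);
                values = flatteners[i].flatten(bag);
            }
            // Cross every row built so far with the values of this column.
            List<List<Object>> next = new ArrayList<>();
            for (List<Object> prefix : rows) {
                for (Object v : values) {
                    List<Object> row = new ArrayList<>(prefix);
                    row.add(v);
                    next.add(row);
                }
            }
            rows = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Object> record = Arrays.<Object>asList("a", new ArrayList<Object>());
        System.out.println(expand(record, new Flattener[] {null, INNER}));
        System.out.println(expand(record, new Flattener[] {null, OUTER}));
    }
}
```

The cost is one interface call per flattened column per record, which is the
overhead being weighed against the boolean[] version.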

On the other hand, we could add a second array:
protected boolean[] isToBeOuterFlattenedArray;

and basically treat it as another operator. This might be annoying since
there is logic around whether or not a column needs to be flattened, and
it'd have to be duplicated in two places. It'd be much cheaper, though.
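
A minimal sketch of where that duplication would show up, again in plain Java
rather than Pig's real code. TwoArraySketch, needsFlattening and valuesFor are
hypothetical names, and the outer behavior (an empty bag yields a single null)
is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class TwoArraySketch {
    final boolean[] isToBeFlattenedArray;      // same name as the existing field
    final boolean[] isToBeOuterFlattenedArray; // the proposed second array

    TwoArraySketch(boolean[] inner, boolean[] outer) {
        this.isToBeFlattenedArray = inner;
        this.isToBeOuterFlattenedArray = outer;
    }

    // Every place that currently asks "is column i flattened?" would have to
    // consult both arrays from now on.
    boolean needsFlattening(int i) {
        return isToBeFlattenedArray[i] || isToBeOuterFlattenedArray[i];
    }

    // Values that column i contributes; only meaningful when needsFlattening(i).
    List<Object> valuesFor(int i, List<Object> bag) {
        boolean empty = bag == null || bag.isEmpty();
        if (empty && isToBeOuterFlattenedArray[i]) {
            return Collections.<Object>singletonList(null);
        }
        return empty ? Collections.<Object>emptyList() : bag;
    }

    public static void main(String[] args) {
        TwoArraySketch s = new TwoArraySketch(
            new boolean[] {false, true, false},
            new boolean[] {false, false, true});
        List<Object> emptyBag = new ArrayList<>();
        System.out.println(s.valuesFor(1, emptyBag).size()); // inner column
        System.out.println(s.valuesFor(2, emptyBag).size()); // outer column
    }
}
```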

A third alternative might be
protected byte[] isToBeFlattened;

And then there could be a translator from the byte value to the proper
flattener. Cheaper than an array of objects and more extensible than
booleans, but at this point we're just creating our own virtual method
lookup and so on.
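
Sketched in plain Java, the byte-code variant could look like the following.
The constants NONE, INNER and OUTER, the class ByteCodeSketch and its flatten
method are hypothetical, and the switch plays the role of the translator; the
outer behavior is the same assumption as before (an empty bag yields a single
null).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ByteCodeSketch {
    // Hypothetical per-column codes stored in the byte[].
    static final byte NONE = 0, INNER = 1, OUTER = 2;

    // Hand-written dispatch from a code to the matching flatten behavior.
    static List<Object> flatten(byte code, List<Object> bag) {
        boolean empty = bag == null || bag.isEmpty();
        switch (code) {
            case INNER:
                return empty ? Collections.<Object>emptyList() : bag;
            case OUTER:
                return empty ? Collections.<Object>singletonList(null) : bag;
            default:
                throw new IllegalArgumentException("not a flatten code: " + code);
        }
    }

    public static void main(String[] args) {
        byte[] isToBeFlattened = {NONE, OUTER};
        List<Object> emptyBag = new ArrayList<>();
        // Column 1 is outer-flattened, so the empty bag still yields one value.
        System.out.println(flatten(isToBeFlattened[1], emptyBag).size());
    }
}
```

Adding a new kind of flatten means a new constant plus a new case, so the
per-column state stays one byte but the dispatch has to be maintained by hand.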

Or perhaps some much more brilliant fourth suggestion :)

Thoughts?
Jon
