I made a ticket here: https://issues.apache.org/jira/browse/PIG-2424 and would like to give it a shot, but I wanted to get opinions on the implementation first. Extending the grammar is pretty simple; the question is how to actually implement the operation.

Right now, it looks like there is

    protected boolean[] isToBeFlattenedArray;

in src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POForEach.java, and there is a comment explaining that the List<Boolean> is converted to an array because it is cheaper, which is why I am asking this question.

Ignoring the overhead of objects, it seems like the cleanest way to do this would be to refactor it to be

    protected Flattener[] isToBeFlattened;

If it's null, there is no flattening, and if it isn't, you use the given flattener. This adds a lot more overhead than the current approach, but it is a lot more extensible, as it would make adding new flatteners really easy.

On the other hand, we could add a second array:

    protected boolean[] isToBeOuterFlattenedArray;

and basically treat it as another operator. This might be annoying, since there is logic around whether or not a column needs to be flattened, and that logic would have to be duplicated in two places. It would be much cheaper, though.

A third alternative might be

    protected byte[] isToBeFlattened;

with a translator from the byte value to the proper flattener. That is cheaper and more extensible, but at this point we're just creating our own virtual lookup and so on.

Or perhaps some much more brilliant fourth suggestion :)

Thoughts?

Jon
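
P.S. To make the third alternative a bit more concrete, here is a rough sketch of what a byte code per column plus a translator could look like. None of these names exist in Pig: FlattenSketch, the Flattener interface, the NONE/FLATTEN/OUTER_FLATTEN codes, and the BY_CODE table are all hypothetical, and plain Java lists stand in for bags. The "outer" behavior (an empty bag yields a single null instead of nothing) is my assumption about what the new operation would do, not something that is settled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: none of these names exist in Pig.
class FlattenSketch {
    // Hypothetical per-column codes, as would be stored in a byte[].
    static final byte NONE = 0;
    static final byte FLATTEN = 1;
    static final byte OUTER_FLATTEN = 2;

    // Hypothetical strategy interface (the "Flattener" of the first alternative).
    interface Flattener {
        List<Object> flatten(List<Object> bag);
    }

    // Plain flatten: an empty bag contributes no values.
    static final Flattener INNER = bag -> bag;

    // Assumed "outer" semantics: an empty bag contributes a single null.
    static final Flattener OUTER = bag ->
        bag.isEmpty() ? Collections.<Object>singletonList(null) : bag;

    // Translator from byte code to flattener; null means "do not flatten".
    static final Flattener[] BY_CODE = { null, INNER, OUTER };

    // Applies one column's code to one bag-valued field.
    static List<Object> apply(byte code, List<Object> bag) {
        Flattener f = BY_CODE[code];
        if (f == null) {
            // Not flattened: the bag itself stays as a single value.
            return Collections.<Object>singletonList(bag);
        }
        return f.flatten(bag);
    }

    public static void main(String[] args) {
        List<Object> empty = new ArrayList<>();
        List<Object> two = Arrays.<Object>asList("a", "b");
        System.out.println(apply(NONE, two).size());
        System.out.println(apply(FLATTEN, two).size());
        System.out.println(apply(FLATTEN, empty).size());
        System.out.println(apply(OUTER_FLATTEN, empty).size());
    }
}
```

The array of bytes stays as cheap as the boolean[] is today, and adding a new kind of flatten means adding one code and one table entry. The cost is the extra indirection through the table on every field, which is exactly the hand-rolled virtual lookup I was worried about above.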
