[ https://issues.apache.org/jira/browse/PIG-2824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447047#comment-13447047 ]
Dmitriy V. Ryaboy commented on PIG-2824:
----------------------------------------

Jie, that's a good catch and a nice perf improvement, but the solution seems a bit heavyweight. What if we instead modified POLoad to automatically perform this check, and be aware of expected schemas?

> Pushing checking number of fields into LoadFunc
> -----------------------------------------------
>
>                 Key: PIG-2824
>                 URL: https://issues.apache.org/jira/browse/PIG-2824
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.9.0, 0.10.0
>            Reporter: Jie Li
>         Attachments: 2824.patch, 2824.png
>
>
> As described in PIG-1188, if users define a schema (w or w/o types), we need
> to check the number of fields after loading data, so if there are less fields
> we need to pad null fields, and if there are more fields we need to throw
> them away.
> For schema with types, Pig used to insert a Foreach after the loader for type
> casting which also checks #fields. For schema without types there was no such
> Foreach, thus PIG-1188 inserted one just for checking #fields. Unfortunately,
> Foreach is too expensive for such checking, and ideally we can push it into
> the loader.
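
For readers unfamiliar with the check being discussed, here is a minimal sketch of the pad-or-truncate rule from the issue description: fewer fields than the schema expects are padded with nulls, extra fields are thrown away. It uses plain java.util lists rather than Pig's own tuple or loader classes, and the names FieldCountAdjuster and adjust are hypothetical, chosen only for illustration. It is not taken from 2824.patch and does not show how LoadFunc or POLoad would actually implement this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: illustrates the field-count rule only.
public class FieldCountAdjuster {

    // Returns a new list with exactly `expected` fields:
    // missing trailing fields become null, surplus fields are dropped.
    public static List<Object> adjust(List<?> fields, int expected) {
        List<Object> out = new ArrayList<>(expected);
        for (int i = 0; i < expected; i++) {
            out.add(i < fields.size() ? fields.get(i) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        // Schema declares 3 fields, record has 2: pad with null.
        System.out.println(adjust(Arrays.asList("a", "b"), 3));           // prints [a, b, null]
        // Schema declares 3 fields, record has 4: drop the extra one.
        System.out.println(adjust(Arrays.asList("a", "b", "c", "d"), 3)); // prints [a, b, c]
    }
}
```

Doing this per record inside the loader, instead of in a separate Foreach operator after it, is the optimization the issue proposes; the sketch only shows the rule itself, not where it would live in the plan.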