Submitted a patch for this: https://issues.apache.org/jira/browse/PIG-3359

On Wed, Jun 19, 2013 at 5:36 PM, Rohini Palaniswamy <rohini.adi...@gmail.com> wrote:

> Jon is right. I am trying to ensure that each line is mostly parsed only
> once in https://issues.apache.org/jira/browse/PIG-3204. I have a few issues
> with other commands in pig script like fs, shell, cd, illustrate, error
> messages not showing line numbers properly, etc. which I have not got to
> solving yet. Register and import command file localization should be done
> only once with my patch as the parsing happens only once (unless you have
> fs or sh commands). Will check it out to be doubly sure.
>
> Regards,
> Rohini
>
>
> On Wed, Jun 19, 2013 at 1:03 PM, Jonathan Coveney <jcove...@gmail.com> wrote:
>
> > This sounds excellent! Would love to see this in trunk.
> >
> > As far as #3, this is probably because pig does essentially reparse
> > everything with each new line. I know there is a ticket where Rohini
> > dealt with this in some cases where HDFS was being hit multiple times
> > because of load statements getting reparsed, but I'm not sure if remote
> > imports are fixed by that patch as well.
> >
> >
> > 2013/6/19 Jonathan Packer <jpac...@mortardata.com>
> >
> > > Hi, I'm an engineer at Mortar Data. I was working on some features to
> > > improve macros that I'd like to contribute (we're hoping to build a
> > > library of reusable pig macros implementing common algorithms), but I
> > > wanted to check in here first to see if anyone has concerns about the
> > > changes I'd be making.
> > >
> > > The changes I've implemented are:
> > >
> > >    1. Macro files can register jars and udfs (avoiding namespace
> > >    conflicts is the user's responsibility)
> > >    2. Macro files can be redundantly imported (the extra import
> > >    statements will be ignored). The use case is pigscript A imports
> > >    macro files A and B, but macro file A also imports B. Pig will emit
> > >    a warning, but not fail as it currently does.
> > >    3. Registers and imports from S3 aren't repeatedly downloaded as a
> > >    pigscript is parsed. I'm not sure why it was doing this in the first
> > >    place, but it looked like a query was being assembled line-by-line
> > >    and every time it would re-download jars etc.
> > >
> > > I was working on our fork of 0.9.2 with modifications, so please let me
> > > know if any of these have already been fixed in the latest version.
> > >
> > > Thanks,
> > > Jonathan Packer
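
For readers skimming the thread, items 2 and 3 of the proposal above (ignore redundant imports with a warning, and download each remote jar or macro file only once even if the script is reparsed) can be illustrated with a small sketch. This is not the code from the PIG-3359 patch and not Pig's actual Java implementation; `ImportTracker`, `should_import`, `localize`, and the `fetch` callback are hypothetical names chosen only for illustration, and Python is used purely for brevity.

```python
import warnings


class ImportTracker:
    """Hypothetical helper: remembers macro files and remote resources already handled."""

    def __init__(self, fetch):
        # fetch is an assumed caller-supplied function that downloads a URI
        # and returns the path of the local copy.
        self._fetch = fetch
        self._imported = set()      # macro files already imported
        self._local_copies = {}     # URI -> local path of the downloaded copy

    def should_import(self, path):
        """Return True the first time a macro file is seen, False (with a warning) afterwards."""
        if path in self._imported:
            warnings.warn(f"Ignoring redundant import of {path}")
            return False
        self._imported.add(path)
        return True

    def localize(self, uri):
        """Download a registered jar or imported file at most once, however often it is requested."""
        if uri not in self._local_copies:
            self._local_copies[uri] = self._fetch(uri)
        return self._local_copies[uri]
```

With a tracker like this kept for the lifetime of a script, a second import of the same macro file becomes a warning instead of a failure, and a reparse that hits the same register or import statement reuses the earlier download rather than fetching it again.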