[jira] [Created] (PIG-3313) pig job hang if the job tracker is bounced during execution
Chenjie Yu created PIG-3313:
----------------------------

Summary: pig job hang if the job tracker is bounced during execution
Key: PIG-3313
URL: https://issues.apache.org/jira/browse/PIG-3313
Project: Pig
Issue Type: Bug
Reporter: Chenjie Yu

When running a Pig job through PigRunner, if the job tracker is bounced after the MapReduce job is submitted and does not come back up soon, the Pig job hangs. The reason is that Pig keeps all the JobControl objects, which run as non-daemon threads that keep connecting to the job tracker. If the job tracker is down, Pig fails, but those JobControl threads keep running and nothing stops them.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
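A process only exits once all of its non-daemon threads have finished, which is why the leftover JobControl threads described above keep the Pig process alive. The behavior is easy to reproduce in any threaded runtime; here is a minimal Python sketch (an illustration of the daemon-thread concept, not Pig's actual Java code) showing that a thread marked as a daemon does not block shutdown, while a non-daemon one would:

```python
import threading
import time

def poll_forever(stop_event):
    # Stands in for a JobControl thread that keeps polling the job tracker.
    while not stop_event.is_set():
        time.sleep(0.01)

stop = threading.Event()

# daemon=True means the interpreter will not wait for this thread on exit.
# A non-daemon thread (the default) would hang the process until someone
# explicitly stops it, which is the bug described above.
worker = threading.Thread(target=poll_forever, args=(stop,), daemon=True)
worker.start()

print(worker.daemon)      # True: the process can exit even if the loop never stops
stop.set()                # still polite to shut the thread down explicitly
worker.join(timeout=1)
print(worker.is_alive())  # False once the event is set and the thread exits
```

The fix direction suggested by the report is either to mark such threads as daemons or to track and explicitly stop them on failure.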
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (22 issues)
Subscriber: pigdaily

Key      Summary
PIG-3297 Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
         https://issues.apache.org/jira/browse/PIG-3297
PIG-3295 Casting from bytearray failing after Union (even when each field is from a single Loader)
         https://issues.apache.org/jira/browse/PIG-3295
PIG-3291 TestExampleGenerator fails on Windows because of lack of file name escaping
         https://issues.apache.org/jira/browse/PIG-3291
PIG-3285 Jobs using HBaseStorage fail to ship dependency jars
         https://issues.apache.org/jira/browse/PIG-3285
PIG-3258 Patch to allow MultiStorage to use more than one index to generate output tree
         https://issues.apache.org/jira/browse/PIG-3258
PIG-3257 Add unique identifier UDF
         https://issues.apache.org/jira/browse/PIG-3257
PIG-3247 Piggybank functions to mimic OVER clause in SQL
         https://issues.apache.org/jira/browse/PIG-3247
PIG-3210 Pig fails to start when it cannot write log to log files
         https://issues.apache.org/jira/browse/PIG-3210
PIG-3199 Expose LogicalPlan via PigServer API
         https://issues.apache.org/jira/browse/PIG-3199
PIG-3166 Update eclipse .classpath according to ivy library.properties
         https://issues.apache.org/jira/browse/PIG-3166
PIG-3123 Simplify Logical Plans By Removing Unneccessary Identity Projections
         https://issues.apache.org/jira/browse/PIG-3123
PIG-3088 Add a builtin udf which removes prefixes
         https://issues.apache.org/jira/browse/PIG-3088
PIG-3069 Native Windows Compatibility for Pig E2E Tests and Harness
         https://issues.apache.org/jira/browse/PIG-3069
PIG-3026 Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
         https://issues.apache.org/jira/browse/PIG-3026
PIG-3025 TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
         https://issues.apache.org/jira/browse/PIG-3025
PIG-3024 TestEmptyInputDir unit test - hadoop version detection logic is brittle
         https://issues.apache.org/jira/browse/PIG-3024
PIG-3015 Rewrite of AvroStorage
         https://issues.apache.org/jira/browse/PIG-3015
PIG-2959 Add a pig.cmd for Pig to run under Windows
         https://issues.apache.org/jira/browse/PIG-2959
PIG-2955 Fix bunch of Pig e2e tests on Windows
         https://issues.apache.org/jira/browse/PIG-2955
PIG-2248 Pig parser does not detect when a macro name masks a UDF name
         https://issues.apache.org/jira/browse/PIG-2248
PIG-2244 Macros cannot be passed relation names
         https://issues.apache.org/jira/browse/PIG-2244
PIG-1914 Support load/store JSON data in Pig
         https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650111#comment-13650111 ]

Daniel Dai commented on PIG-3307:
---------------------------------

That's what I am expecting. Would love to see some performance data with this approach.

> Refactor physical operators to remove methods parameters that are always null
> ------------------------------------------------------------------------------
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
> Issue Type: Improvement
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
> The physical operators are sometimes overly complex. I'm trying to clean up
> some unnecessary code. In particular, there is an array of getNext(*T* v)
> methods where the value v does not seem to have any importance and is just
> used to pick the correct method. I have started a refactoring towards a more
> readable getNext*T*().
Re: A major addition to Pig. Working with spatial data
Nick: the only issue is that the way types are implemented in Pig doesn't allow us to easily "plug in" types externally. Adding support for that would be cool, but a fair bit of work.

2013/5/6 Nick Dimiduk

> I'm no lawyer, but I see no reason why this cannot be an external
> extension to Pig. It would behave the same way PostGIS is an external
> extension to Postgres. Any Apache issues would be toward general-purpose
> enhancements, not specific to your project.
>
> Good on you!
> -n
>
> On Mon, May 6, 2013 at 10:12 AM, Ahmed Eldawy wrote:
>
> > I contacted the Solr developers to see how JTS can be included in an
> > Apache project. See
> > http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
> > As far as I understand, they did not include it in the main Solr project;
> > rather, they created a separate project (Spatial4j) which is still
> > licensed under the Apache license and refers to JTS. Users will have to
> > download the JTS libraries separately to make it run. That's pretty much
> > the same plan that Jonathan mentioned. We will still have the overhead of
> > serializing/deserializing the shapes each time a function is called.
> > Also, we will have to use the ugly bytearray data type for spatial data
> > instead of creating its own data type (e.g., Geometry).
> > I think using Spatial4j instead of JTS will not be sufficient for our
> > case, as we need to provide access to all spatial functions of JTS such
> > as Union, Intersection, Difference, etc. This way we can claim conformity
> > with the OGC standards, which gives visibility and appreciation from the
> > spatial community.
> > I also think this means I will not add any issues to JIRA, as it is now
> > a separate project. I'm planning to host it on GitHub and have all the
> > issues there.
> > Let me know if you have any suggestions or comments.
> >
> > Thanks
> > Ahmed
> >
> > Best regards,
> > Ahmed Eldawy
> >
> > On Mon, May 6, 2013 at 9:53 AM, Jonathan Coveney wrote:
> >
> > > You can give them all the same label or tag and filter on that later on.
> > >
> > > 2013/5/6 Ahmed Eldawy
> > >
> > > > Thanks all for taking the time to respond. Daniel, I didn't know
> > > > that Solr uses JTS. This is a good finding and we can definitely ask
> > > > them to see if there is a workaround we can do. Jonathan, I thought
> > > > of the same idea of serializing/deserializing a bytearray each time
> > > > a UDF is called. The deserialization part is good for letting Pig
> > > > auto-detect spatial types if not set explicitly in the schema. What
> > > > is the best way to start this? I want to add an initial set of JIRA
> > > > issues and start working on them, but I also need to keep the work
> > > > grouped in some sense just for organization.
> > > >
> > > > Thanks
> > > > Ahmed
> > > >
> > > > Best regards,
> > > > Ahmed Eldawy
> > > >
> > > > On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney wrote:
> > > >
> > > > > I agree that this is cool, and if other projects are using JTS it
> > > > > is worth talking to them to see how. I also agree that licensing
> > > > > is very frustrating.
> > > > >
> > > > > In the short term, however, while it is annoying to have to manage
> > > > > the serialization and deserialization yourself, you can have the
> > > > > geometry type be passed around as a bytearray type. Your UDFs will
> > > > > have to know this and treat it accordingly, but if you did this
> > > > > then all of the tools could be in an external project on GitHub
> > > > > instead of a branch in Pig. Then, if we can get the licensing
> > > > > done, we could add the Geometry type to Pig. Adding types,
> > > > > honestly, is kind of tedious but not super difficult, so once the
> > > > > rest is done, that shouldn't be too difficult.
> > > > >
> > > > > 2013/5/4 Russell Jurney
> > > > >
> > > > > > If a way could be found, this would be an awesome addition to Pig.
> > > > > >
> > > > > > Russell Jurney http://datasyndrome.com
> > > > > >
> > > > > > On May 3, 2013, at 4:09 PM, Daniel Dai wrote:
> > > > > >
> > > > > > > I am not sure how other Apache projects are dealing with it.
> > > > > > > It seems Solr also has some connector to JTS?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Daniel
> > > > > > >
> > > > > > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Thanks Alan for your interest. It's too bad that an open
> > > > > > >> source licensing issue is holding me back from doing some
> > > > > > >> open source work. I understand the issue and your workarounds
> > > > > > >> make sense. However, as I mentioned in the beginning, I don't
> > > > > > >> want to have my own branch of Pig because it
Re: A major addition to Pig. Working with spatial data
I'm no lawyer, but I see no reason why this cannot be an external extension to Pig. It would behave the same way PostGIS is an external extension to Postgres. Any Apache issues would be toward general-purpose enhancements, not specific to your project.

Good on you!
-n
[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null
[ https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650021#comment-13650021 ]

Julien Le Dem commented on PIG-3307:
------------------------------------

[~daijy] This is removing parameters that were not used. I have not tested performance, but I think it could only improve performance. (See the latest patch, PIG-3307_2.patch.)

> Refactor physical operators to remove methods parameters that are always null
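The pattern PIG-3307 removes, a getNext(v) whose argument exists only to select a return type, can be contrasted with explicitly named methods. A Python sketch of the idea (an illustration of the refactoring direction, not Pig's actual operator code):

```python
# Before: one get_next(v) whose argument's VALUE is ignored; only its type
# matters, so callers pass a dummy object just to pick an overload.
class OldStyleOperator:
    def get_next(self, v):
        if isinstance(v, int):
            return self._next_int()
        if isinstance(v, str):
            return self._next_string()
        raise TypeError(type(v))

    def _next_int(self):
        return 42

    def _next_string(self):
        return "tuple"

# After: one explicitly named method per type. There is no dummy argument to
# allocate or dispatch on, and call sites say what they mean.
class NewStyleOperator:
    def get_next_integer(self):
        return 42

    def get_next_string(self):
        return "tuple"

old, new = OldStyleOperator(), NewStyleOperator()
print(old.get_next(0) == new.get_next_integer())  # True: same result, clearer API
```

Dropping the unused argument also explains why the change should not hurt performance: it removes work (dummy allocations and type dispatch) without adding any.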
[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars
[ https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649958#comment-13649958 ]

Daniel Dai commented on PIG-3285:
---------------------------------

If HBase adds another method that only adds hbase.jar/guava.jar/protobuf.jar etc. (but not the inputformat/outputformat jars), Pig can switch to using that. But for now, TableMapReduceUtil.addDependencyJars(job) is not what we want.

> Jobs using HBaseStorage fail to ship dependency jars
> ----------------------------------------------------
>
> Key: PIG-3285
> URL: https://issues.apache.org/jira/browse/PIG-3285
> Project: Pig
> Issue Type: Bug
> Reporter: Nick Dimiduk
> Assignee: Nick Dimiduk
> Fix For: 0.11.1
>
> Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch,
> 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies.
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>     at org.apache.hadoop.hbase.io.HbaseObjectWritable.(HbaseObjectWritable.java:266)
>     at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>     at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>     at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>     at $Proxy7.getProtocolVersion(Unknown Source)
>     at org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>     at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}
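The workaround above is to pass every HBase dependency via `-Dpig.additional.jars`, which takes a path-separator-joined list of jar paths. A small sketch of building that flag (a hypothetical helper, not part of Pig; the jar paths are made up for illustration):

```python
def additional_jars_flag(jar_paths, sep=":"):
    """Build a -Dpig.additional.jars value from a list of jar paths.

    Illustrative helper only: Pig expects the paths joined with the
    platform path separator (':' on Unix).
    """
    bad = [p for p in jar_paths if not p.endswith(".jar")]
    if bad:
        raise ValueError("not jar files: %s" % bad)
    return "-Dpig.additional.jars=" + sep.join(jar_paths)

# Hypothetical paths standing in for HBase and its transitive dependencies.
flag = additional_jars_flag([
    "/opt/hbase/hbase-0.94.jar",
    "/opt/hbase/lib/guava-11.0.2.jar",
    "/opt/hbase/lib/protobuf-java-2.4.0a.jar",
])
print(flag)
```

The point of the JIRA is to make this step unnecessary by having HBaseStorage ship its own dependency jars.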
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649924#comment-13649924 ]

Rohini Palaniswamy commented on PIG-1824:
-----------------------------------------

To use a Jython install, the Lib dir must be in the Jython search path:
* via the environment variables JYTHON_HOME=jy_home or JYTHON_PATH=jy_home/Lib:..., or
* jython-standalone.jar should be in the classpath

> Support import modules in Jython UDF
> ------------------------------------
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0, 0.9.0
> Reporter: Richard Ding
> Assignee: Woody Anderson
> Fix For: 0.10.0
>
> Attachments: 1824a.patch, 1824b.patch, 1824c.patch, 1824d.patch,
> 1824_final.patch, 1824.patch, 1824x.patch,
> TEST-org.apache.pig.test.TestGrunt.txt,
> TEST-org.apache.pig.test.TestScriptLanguage.txt,
> TEST-org.apache.pig.test.TestScriptUDF.txt
>
> Currently, a Jython UDF script doesn't support the Jython import statement, as in
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the
> backend? Or should we add a ship clause to let the user explicitly specify the
> module to ship?
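Apart from the @outputSchema decorator, the resplit UDF quoted in the issue description is plain Python, so it can be sanity-checked under CPython by stubbing the decorator out. A sketch (the stub is an assumption for local testing; in Pig the decorator is provided by the Jython UDF machinery):

```python
import re

# Minimal stand-in for Pig's @outputSchema decorator so the same function
# body runs under plain CPython. It only records the schema string.
def outputSchema(schema):
    def wrap(fn):
        fn.output_schema = schema
        return fn
    return wrap

@outputSchema("word:chararray")
def resplit(content, regex, index):
    # Split `content` on `regex` and return the piece at `index`.
    return re.compile(regex).split(content)[index]

print(resplit("one,two,three", ",", 1))   # -> two
print(resplit.output_schema)              # -> word:chararray
```

This also shows why `import re` matters: without module shipping, the import fails on the backend even though the UDF logic itself is trivial.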
[jira] [Commented] (PIG-1824) Support import modules in Jython UDF
[ https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649920#comment-13649920 ]

Rohini Palaniswamy commented on PIG-1824:
-----------------------------------------

You need to have the jython/Lib directory in the classpath. We bundle it with our deployment. Otherwise you need to have jython-standalone.jar instead of jython.jar, as in Pig 0.11.

> Support import modules in Jython UDF
[jira] [Resolved] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar
[ https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem resolved PIG-3311.
--------------------------------
Resolution: Fixed
Fix Version/s: 0.12

> add pig-withouthadoop-h2 to mvn-jar
> -----------------------------------
>
> Key: PIG-3311
> URL: https://issues.apache.org/jira/browse/PIG-3311
> Project: Pig
> Issue Type: Improvement
> Components: build
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Fix For: 0.12
>
> Attachments: PIG-3311.patch
>
> mvn-jar currently creates pig-version.jar and pig-version-h2.jar.
> I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar,
> which are needed to run Pig from the command line.
> This will allow a dual-version package.
[jira] [Commented] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar
[ https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649907#comment-13649907 ]

Julien Le Dem commented on PIG-3311:
------------------------------------

Committed to TRUNK.

> add pig-withouthadoop-h2 to mvn-jar
[jira] [Updated] (PIG-3293) Casting fails after Union from two data sources&loaders
[ https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-3293:
------------------------------
Attachment: pig-3293-test-only-v01.patch

bq. Must be the "caster" in D's POCast is null. Can you attach MyLoader?

Attaching a test case using
{noformat}
public class PigStorageWithStatistics extends PigStorage {
{noformat}
from org.apache.pig.test. Even though both PigStorage and PigStorageWithStatistics return Utf8StorageConverter, the test case fails with "Cannot determine how to convert the bytearray to string."

Note that I created PIG-3295 for dealing with the case where casting fails even when the union comes from the same loader. Figuring out whether the loaders were the same was easy by calling 'equals' on the FuncSpec instances. I don't know how to achieve this easily for comparing casters.

> Casting fails after Union from two data sources&loaders
> --------------------------------------------------------
>
> Key: PIG-3293
> URL: https://issues.apache.org/jira/browse/PIG-3293
> Project: Pig
> Issue Type: Bug
> Reporter: Koji Noguchi
> Priority: Minor
> Attachments: pig-3293-test-only-v01.patch
>
> Script similar to
> {noformat}
> A = load 'data1' using MyLoader() as (a:bytearray);
> B = load 'data2' as (a:bytearray);
> C = union onschema A,B;
> D = foreach C generate (chararray)a;
> store D into './out';
> {noformat}
> fails with
> java.lang.Exception: org.apache.pig.backend.executionengine.ExecException:
> ERROR 1075: Received a bytearray from the UDF. Cannot determine how to
> convert the bytearray to string.
> Both MyLoader and PigStorage use the default Utf8StorageConverter.
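The open question in the comment is how to decide that two casters are "the same" when they don't implement an equality contract the way FuncSpec does. One plausible heuristic (an assumption sketched here in Python, not what Pig does today) is to treat two casters as equivalent when they are instances of the same class:

```python
class Utf8StorageConverter:
    """Stand-in for Pig's default LoadCaster; the real class lives in
    org.apache.pig.builtin. Used here only to illustrate the comparison."""

def same_caster(caster_a, caster_b):
    # FuncSpec instances can be compared with equals(); caster objects carry
    # no such contract, so compare by concrete class instead. Two loaders
    # that both hand back the default converter would then be treated as
    # cast-compatible after a union.
    return type(caster_a) is type(caster_b)

print(same_caster(Utf8StorageConverter(), Utf8StorageConverter()))  # True
```

This would miss casters whose behavior depends on constructor arguments, which is presumably why the comment calls the problem non-trivial.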
Re: A major addition to Pig. Working with spatial data
I contacted the Solr developers to see how JTS can be included in an Apache project. See
http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
As far as I understand, they did not include it in the main Solr project; rather, they created a separate project (Spatial4j) which is still licensed under the Apache license and refers to JTS. Users will have to download the JTS libraries separately to make it run. That's pretty much the same plan that Jonathan mentioned. We will still have the overhead of serializing/deserializing the shapes each time a function is called. Also, we will have to use the ugly bytearray data type for spatial data instead of creating its own data type (e.g., Geometry).

I think using Spatial4j instead of JTS will not be sufficient for our case, as we need to provide access to all spatial functions of JTS such as Union, Intersection, Difference, etc. This way we can claim conformity with the OGC standards, which gives visibility and appreciation from the spatial community.

I also think this means I will not add any issues to JIRA, as it is now a separate project. I'm planning to host it on GitHub and have all the issues there. Let me know if you have any suggestions or comments.

Thanks
Ahmed

Best regards,
Ahmed Eldawy

On Mon, May 6, 2013 at 9:53 AM, Jonathan Coveney wrote:

> You can give them all the same label or tag and filter on that later on.
>
> 2013/5/6 Ahmed Eldawy
>
> > Thanks all for taking the time to respond. Daniel, I didn't know that
> > Solr uses JTS. This is a good finding and we can definitely ask them to
> > see if there is a workaround we can do. Jonathan, I thought of the same
> > idea of serializing/deserializing a bytearray each time a UDF is called.
> > The deserialization part is good for letting Pig auto-detect spatial
> > types if not set explicitly in the schema. What is the best way to start
> > this? I want to add an initial set of JIRA issues and start working on
> > them, but I also need to keep the work grouped in some sense just for
> > organization.
> >
> > Thanks
> > Ahmed
> >
> > Best regards,
> > Ahmed Eldawy
> >
> > On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney wrote:
> >
> > > I agree that this is cool, and if other projects are using JTS it is
> > > worth talking to them to see how. I also agree that licensing is very
> > > frustrating.
> > >
> > > In the short term, however, while it is annoying to have to manage the
> > > serialization and deserialization yourself, you can have the geometry
> > > type be passed around as a bytearray type. Your UDFs will have to know
> > > this and treat it accordingly, but if you did this then all of the
> > > tools could be in an external project on GitHub instead of a branch in
> > > Pig. Then, if we can get the licensing done, we could add the Geometry
> > > type to Pig. Adding types, honestly, is kind of tedious but not super
> > > difficult, so once the rest is done, that shouldn't be too difficult.
> > >
> > > 2013/5/4 Russell Jurney
> > >
> > > > If a way could be found, this would be an awesome addition to Pig.
> > > >
> > > > Russell Jurney http://datasyndrome.com
> > > >
> > > > On May 3, 2013, at 4:09 PM, Daniel Dai wrote:
> > > >
> > > > > I am not sure how other Apache projects are dealing with it.
> > > > > It seems Solr also has some connector to JTS?
> > > > >
> > > > > Thanks,
> > > > > Daniel
> > > > >
> > > > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy <aseld...@gmail.com> wrote:
> > > > >
> > > > >> Thanks Alan for your interest. It's too bad that an open source
> > > > >> licensing issue is holding me back from doing some open source
> > > > >> work. I understand the issue and your workarounds make sense.
> > > > >> However, as I mentioned in the beginning, I don't want to have my
> > > > >> own branch of Pig because it makes my extension less portable.
> > > > >> I'll think of another way to do it. I'll ask Vivid Solutions if
> > > > >> they can double-license their code, although I think the answer
> > > > >> will be no. I'll also think of a way to ship my extension as a
> > > > >> set of jar files without the need to change the core of Pig. This
> > > > >> way, it can be easily ported to newer versions of Pig.
> > > > >>
> > > > >> On Thu, May 2, 2013 at 12:33 PM, Alan Gates <ga...@hortonworks.com> wrote:
> > > > >>
> > > > >>> I know this is frustrating, but the different licenses do have
> > > > >>> different requirements that make it so that Apache can't ship
> > > > >>> GPL code. A legal explanation is at
> > > > >>> http://www.apache.org/licenses/GPL-compatibility.html
> > > > >>> For additional info on the LGPL-specific questions see
> > > > >>> http://www.apache.org/legal/3party.html
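The "pass geometry around as a bytearray" plan discussed in this thread implies a serialize/deserialize round-trip at every UDF boundary. A toy Python sketch of that round-trip for a 2-D point (real code would use a standard encoding such as WKB via JTS or shapely; this fixed little-endian two-double layout and the coordinates are illustration only):

```python
import struct

# Pack a point into bytes on the way into Pig's bytearray field, and unpack
# it again inside each UDF. "<2d" = little-endian, two 8-byte doubles.
POINT_FMT = "<2d"

def point_to_bytes(x, y):
    return struct.pack(POINT_FMT, x, y)

def bytes_to_point(raw):
    return struct.unpack(POINT_FMT, raw)

blob = point_to_bytes(-93.2650, 44.9778)   # hypothetical coordinates
x, y = bytes_to_point(blob)
print((x, y))
```

The overhead Ahmed mentions is exactly this pack/unpack pair, paid on every function call, which is what a first-class Geometry type in Pig would avoid.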
[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars
[ https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649875#comment-13649875 ]

Nick Dimiduk commented on PIG-3285:
-----------------------------------

I don't want this to become a game of dependency tracking across projects. That said, HBase doesn't add dependencies all that often, so maybe it doesn't matter in practice.

> Jobs using HBaseStorage fail to ship dependency jars
[jira] [Commented] (PIG-2315) Make as clause work in generate
[ https://issues.apache.org/jira/browse/PIG-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649851#comment-13649851 ]

Ruslan Al-Fakikh commented on PIG-2315:
---------------------------------------

Daniel, I think that deprecating/removing the "cast in the as clause" is easier, because it is not working anyway. I guess the "()" should stay. I also suggest making this issue a duplicate of PIG-2216 instead of PIG-2216 being a duplicate of this issue. The description of PIG-2216 seems to explain everything and does not cause confusion.

> Make as clause work in generate
> -------------------------------
>
>                 Key: PIG-2315
>                 URL: https://issues.apache.org/jira/browse/PIG-2315
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Olga Natkovich
>            Assignee: Gianmarco De Francisci Morales
>             Fix For: 0.12
>
> Currently, the following syntax is supported but ignored, causing confusion for users:
> A1 = foreach A1 generate a as a:chararray;
> After this statement, a just retains its previous type.
Re: Pig package supporting both hadoop 1 and 2
Thanks Rohini,
I was thinking of using "hadoop version" to decide. Do you think there's a difference?

Julien

On Mon, May 6, 2013 at 7:06 AM, Rohini Palaniswamy wrote:

> Hi Julien,
> We use a perl script internally instead of the bin/pig shell script, which has some Y! deployment stuff. That is why changes have not been done to bin/pig to support both versions already. We have two lib directories - one for hadoop20, the other for hadoop23 (hadoop2) - and choose the one to put on the classpath based on the presence of hadoop-core.jar.
>
>     my $hadoopClasspath = `$hadoop_cmd classpath`;
>     my $jarDir = "$pigJarRoot/lib-hadoop23";
>     if ($hadoopClasspath =~ m/hadoop-core-/) {
>         $jarDir = "$pigJarRoot/lib-hadoop20";
>     }
>
> Regards,
> Rohini
>
> On Fri, May 3, 2013 at 3:40 PM, Julien Le Dem wrote:
>
> > Hi Pig developers,
> > I'm looking into having a Pig package that works both for Hadoop 1.0 and Hadoop 2.0. That means having both pig*.jar and pig*-h2.jar in the package and choosing the right one dynamically. In particular I created this JIRA as a first step:
> > https://issues.apache.org/jira/browse/PIG-3311
> > I'm curious to know how others do it. Yahoo! for example? (Rohini?)
> > Thanks,
> > Julien
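[Editor's note] The two detection strategies in this thread - parsing the first line of `hadoop version` output (Julien's idea) and grepping `hadoop classpath` for the monolithic hadoop-core jar (Rohini's perl script) - can be sketched side by side. The method names are made up; only the lib directory names and the hadoop-core heuristic come from Rohini's example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the two Hadoop-version detection approaches from this thread.
// Nothing here is Pig's actual API; it is a standalone illustration.
public class HadoopLibChooser {

    /** Picks a lib dir from the first line of `hadoop version` output,
     *  e.g. "Hadoop 1.0.4" or "Hadoop 2.0.3-alpha". */
    public static String fromVersionLine(String line) {
        Matcher m = Pattern.compile("Hadoop\\s+(\\d+)\\.").matcher(line);
        if (m.find() && Integer.parseInt(m.group(1)) >= 2) {
            return "lib-hadoop23"; // Hadoop 2.x (a.k.a. hadoop23)
        }
        return "lib-hadoop20";     // Hadoop 1.x / 0.20
    }

    /** Picks a lib dir from `hadoop classpath` output, as Rohini's perl
     *  does: Hadoop 1 ships one hadoop-core-*.jar, Hadoop 2 does not. */
    public static String fromClasspath(String classpath) {
        return classpath.contains("hadoop-core-") ? "lib-hadoop20"
                                                  : "lib-hadoop23";
    }
}
```

The classpath check is the more robust of the two for exotic distributions, since it keys on what is actually on the classpath rather than on a version string format.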
Re: CHANGES.txt in trunk
Cool, just wanted to make sure. I agree this is a good idea.

Alan.

On May 5, 2013, at 7:06 PM, Rohini Palaniswamy wrote:

> Alan,
> I meant relocating only - moving jiras from the 0.12 section to the 0.11.x releases section :).
>
> Regards,
> Rohini
>
> On Fri, May 3, 2013 at 3:08 PM, Alan Gates wrote:
>
> > What do you mean by remove? They should still be in the file. They may need to be relocated under the 0.11 section. But the trunk CHANGES file should include all changes that are on trunk.
> >
> > Alan.
> >
> > On May 3, 2013, at 1:34 PM, Rohini Palaniswamy wrote:
> >
> > > Hi,
> > > I see a lot of patches that went into 0.11 are under trunk in the CHANGES.txt. Should we sync the file with the CHANGES.txt in branch-0.11 and remove from trunk those jiras that went into 0.11? What is the usual process of updating CHANGES.txt when a jira is checked into both a branch and trunk?
> > >
> > > Regards,
> > > Rohini
Re: A major addition to Pig. Working with spatial data
You can give them all the same label or tag and filter on that later on.

2013/5/6 Ahmed Eldawy

> Thanks all for taking the time to respond. Daniel, I didn't know that Solr uses JTS. This is a good finding and we can definitely ask them to see if there is a workaround we can do. Jonathan, I thought of the same idea of serializing/deserializing a bytearray each time a UDF is called. The deserialization part is good for letting Pig auto-detect spatial types if not set explicitly in the schema. What is the best way to start this? I want to add an initial set of JIRA issues and start working on them, but I also need to keep the work grouped in some sense just for organization.
>
> Thanks
> Ahmed
>
> Best regards,
> Ahmed Eldawy
>
> On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney wrote:
>
> > I agree that this is cool, and if other projects are using JTS it is worth talking to them to see how. I also agree that licensing is very frustrating.
> >
> > In the short term, however, while it is annoying to have to manage the serialization and deserialization yourself, you can have the geometry type be passed around as a bytearray type. Your UDFs will have to know this and treat it accordingly, but if you did this then all of the tools could be in an external project on github instead of a branch in Pig. Then, if we can get the licensing done, we could add the Geometry type to Pig. Adding types, honestly, is kind of tedious but not super difficult, so once the rest is done, that shouldn't be too difficult.
> >
> > 2013/5/4 Russell Jurney
> >
> > > If a way could be found, this would be an awesome addition to Pig.
> > >
> > > Russell Jurney http://datasyndrome.com
> > >
> > > On May 3, 2013, at 4:09 PM, Daniel Dai wrote:
> > >
> > > > I am not sure how other Apache projects deal with it. Seems Solr also has some connector to JTS?
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy wrote:
> > > >
> > > > > Thanks Alan for your interest. It's too bad that an open source licensing issue is holding me back from doing some open source work. I understand the issue and your workarounds make sense. However, as I mentioned in the beginning, I don't want to have my own branch of Pig because it makes my extension less portable. I'll think of another way to do it. I'll ask Vivid Solutions if they can dual-license their code, although I think the answer will be no. I'll also think of a way to ship my extension as a set of jar files without the need to change the core of Pig. This way, it can be easily ported to newer versions of Pig.
> > > > >
> > > > > Thanks
> > > > > Ahmed
> > > > >
> > > > > Best regards,
> > > > > Ahmed Eldawy
> > > > >
> > > > > On Thu, May 2, 2013 at 12:33 PM, Alan Gates wrote:
> > > > >
> > > > > > I know this is frustrating, but the different licenses do have different requirements that make it so that Apache can't ship GPL code. A legal explanation is at http://www.apache.org/licenses/GPL-compatibility.html For additional info on the LGPL-specific questions see http://www.apache.org/legal/3party.html
> > > > > >
> > > > > > As far as pulling it in via ivy, the issue isn't so much where the code lives as much as what code we are requiring to make Pig work. If something that is [L]GPL is required for Pig, it violates the Apache rules outlined above. It also would be a show stopper for a lot of companies that redistribute Pig and that are allergic to GPL software.
> > > > > >
> > > > > > So, as I said before, if you wanted to continue with that library and they are not willing to relicense it, then it would have to be bolted on after Apache Pig is built. Nothing stops you from doing this by downloading Apache Pig, adding this library and your code, and redistributing, though it wouldn't then be open to all Pig users.
> > > > > >
> > > > > > Alan.
> > > > > >
> > > > > > On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:
> > > > > >
> > > > > > > Thanks for your response. I was never good at differentiating all those open source licenses. I mean, what is the point of making open source licenses if they block me from using a library in an open source project? Anyway, I'm not going into a debate here. Just one question: if we use JTS as a library (jar file) without adding the code to Pig, is it still a violation? We'll use ivy, for example, to download the jar file when compiling.
> > > > > > >
> > > > > > > On May 1, 2013 7:50 PM, "Alan Gates" wrote:
> > > > > > >
> > > > > > > > Passing on the technical details for a mom
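[Editor's note] Jonathan's suggestion - passing geometries around as Pig bytearrays and having each UDF serialize/deserialize them - can be sketched without pulling in JTS at all. The Point type and the 16-byte layout below are hypothetical stand-ins; a real implementation would encode JTS geometries (e.g. via WKBWriter/WKBReader). Only the round-trip idea itself comes from the thread.

```java
import java.nio.ByteBuffer;

// Sketch of the bytearray approach: each UDF decodes the geometry from
// bytes on input and re-encodes it on output, so Pig itself only ever
// sees a bytearray and needs no new Geometry type.
public class GeometryBytes {

    /** Minimal stand-in for a geometry type such as a JTS Point. */
    public static class Point {
        public final double x, y;
        public Point(double x, double y) { this.x = x; this.y = y; }
    }

    /** Encode a point as 16 bytes: two big-endian doubles. */
    public static byte[] toBytes(Point p) {
        return ByteBuffer.allocate(16).putDouble(p.x).putDouble(p.y).array();
    }

    /** Decode the layout produced by toBytes. A spatial UDF would do this
     *  at the top of exec() before operating on the geometry. */
    public static Point fromBytes(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        return new Point(buf.getDouble(), buf.getDouble());
    }
}
```

The cost of this design is a decode/encode pair per UDF call, which is exactly the overhead Ahmed and Jonathan acknowledge; the benefit is that the whole toolkit can live outside the Pig core.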
Re: Pig package supporting both hadoop 1 and 2
Hi Julien,
We use a perl script internally instead of the bin/pig shell script, which has some Y! deployment stuff. That is why changes have not been done to bin/pig to support both versions already. We have two lib directories - one for hadoop20, the other for hadoop23 (hadoop2) - and choose the one to put on the classpath based on the presence of hadoop-core.jar.

    my $hadoopClasspath = `$hadoop_cmd classpath`;
    my $jarDir = "$pigJarRoot/lib-hadoop23";
    if ($hadoopClasspath =~ m/hadoop-core-/) {
        $jarDir = "$pigJarRoot/lib-hadoop20";
    }

Regards,
Rohini

On Fri, May 3, 2013 at 3:40 PM, Julien Le Dem wrote:

> Hi Pig developers,
> I'm looking into having a Pig package that works both for Hadoop 1.0 and Hadoop 2.0. That means having both pig*.jar and pig*-h2.jar in the package and choosing the right one dynamically. In particular I created this JIRA as a first step:
> https://issues.apache.org/jira/browse/PIG-3311
> I'm curious to know how others do it. Yahoo! for example? (Rohini?)
> Thanks,
> Julien