[jira] [Created] (PIG-3313) pig job hang if the job tracker is bounced during execution

2013-05-06 Thread Chenjie Yu (JIRA)
Chenjie Yu created PIG-3313:
---

 Summary: pig job hang if the job tracker is bounced during 
execution
 Key: PIG-3313
 URL: https://issues.apache.org/jira/browse/PIG-3313
 Project: Pig
  Issue Type: Bug
Reporter: Chenjie Yu


When running a Pig job through PigRunner, if the job tracker is bounced after 
the MapReduce job is submitted and does not come back up soon, the Pig job 
hangs.
The reason is that Pig keeps all the JobControl objects, which run as 
non-daemon threads that keep connecting to the job tracker. If the job tracker 
is down, Pig fails, but the JobControl threads keep running and nothing stops 
them.
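The hang can be reproduced in miniature with any non-daemon polling thread. The sketch below uses invented names (not Pig's actual classes) to show why marking such poller threads as daemon would let the JVM exit:

```java
// Minimal sketch of the failure mode: a non-daemon thread that polls forever
// (as Pig's JobControl threads do against a dead job tracker) keeps the JVM
// alive after the main program has given up. All names here are illustrative.
public class JobControlHangSketch {

    // Build the poller the way a fix might: as a daemon thread, so it
    // cannot block JVM shutdown on its own.
    static Thread daemonPoller() {
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(100); // stand-in for polling the job tracker
                } catch (InterruptedException e) {
                    return; // also allow an explicit stop
                }
            }
        });
        t.setDaemon(true); // key property: JVM may exit while this runs
        return t;
    }

    public static void main(String[] args) {
        Thread poller = daemonPoller();
        poller.start();
        // With setDaemon(true) the program exits here; with the default
        // (non-daemon) setting it would hang exactly as described above.
    }
}
```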

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] Subscription: PIG patch available

2013-05-06 Thread jira
Issue Subscription
Filter: PIG patch available (22 issues)

Subscriber: pigdaily

Key Summary
PIG-3297 Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc
https://issues.apache.org/jira/browse/PIG-3297
PIG-3295 Casting from bytearray failing after Union (even when each field is from a single Loader)
https://issues.apache.org/jira/browse/PIG-3295
PIG-3291 TestExampleGenerator fails on Windows because of lack of file name escaping
https://issues.apache.org/jira/browse/PIG-3291
PIG-3285 Jobs using HBaseStorage fail to ship dependency jars
https://issues.apache.org/jira/browse/PIG-3285
PIG-3258 Patch to allow MultiStorage to use more than one index to generate output tree
https://issues.apache.org/jira/browse/PIG-3258
PIG-3257 Add unique identifier UDF
https://issues.apache.org/jira/browse/PIG-3257
PIG-3247 Piggybank functions to mimic OVER clause in SQL
https://issues.apache.org/jira/browse/PIG-3247
PIG-3210 Pig fails to start when it cannot write log to log files
https://issues.apache.org/jira/browse/PIG-3210
PIG-3199 Expose LogicalPlan via PigServer API
https://issues.apache.org/jira/browse/PIG-3199
PIG-3166 Update eclipse .classpath according to ivy library.properties
https://issues.apache.org/jira/browse/PIG-3166
PIG-3123 Simplify Logical Plans By Removing Unneccessary Identity Projections
https://issues.apache.org/jira/browse/PIG-3123
PIG-3088 Add a builtin udf which removes prefixes
https://issues.apache.org/jira/browse/PIG-3088
PIG-3069 Native Windows Compatibility for Pig E2E Tests and Harness
https://issues.apache.org/jira/browse/PIG-3069
PIG-3026 Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
https://issues.apache.org/jira/browse/PIG-3026
PIG-3025 TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
https://issues.apache.org/jira/browse/PIG-3025
PIG-3024 TestEmptyInputDir unit test - hadoop version detection logic is brittle
https://issues.apache.org/jira/browse/PIG-3024
PIG-3015 Rewrite of AvroStorage
https://issues.apache.org/jira/browse/PIG-3015
PIG-2959 Add a pig.cmd for Pig to run under Windows
https://issues.apache.org/jira/browse/PIG-2959
PIG-2955 Fix bunch of Pig e2e tests on Windows
https://issues.apache.org/jira/browse/PIG-2955
PIG-2248 Pig parser does not detect when a macro name masks a UDF name
https://issues.apache.org/jira/browse/PIG-2248
PIG-2244 Macros cannot be passed relation names
https://issues.apache.org/jira/browse/PIG-2244
PIG-1914 Support load/store JSON data in Pig
https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-06 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650111#comment-13650111
 ] 

Daniel Dai commented on PIG-3307:
-

That's what I am expecting. Would love to see some performance data with this 
approach.

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to clean up 
> some unnecessary code.
> In particular, there is a family of getNext(*T* v) methods where the value v 
> does not seem to have any importance and is just used to pick the correct 
> method.
> I have started a refactoring toward a more readable getNext*T*().
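The getNext(*T* v) pattern above can be illustrated with a simplified before/after. Class and method names here are invented for illustration, not the actual Pig operator API:

```java
// Before: the argument exists only to select an overload; its value is never read.
abstract class PhysicalOpBefore {
    abstract Object getNext(Integer ignored);
    abstract Object getNext(String ignored);
}

// After: the result type is encoded in the method name, so the dummy argument
// disappears -- this is the getNextT() shape the refactoring moves toward.
abstract class PhysicalOpAfter {
    abstract Object getNextInteger();
    abstract Object getNextString();
}

// Tiny concrete operator to show the call sites no longer need placeholder values.
class DemoOp extends PhysicalOpAfter {
    Object getNextInteger() { return 42; }
    Object getNextString()  { return "row"; }
}
```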



Re: A major addition to Pig. Working with spatial data

2013-05-06 Thread Jonathan Coveney
Nick: the only issue is that the way types are implemented in Pig doesn't
allow us to easily "plug in" types externally. Adding support for that
would be cool, but a fair bit of work.
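The bytearray workaround discussed in this thread amounts to serializing each geometry to bytes at every UDF boundary. Below is a toy round trip, with a hypothetical Point standing in for a JTS Geometry; none of these names are real Pig or JTS APIs:

```java
import java.nio.ByteBuffer;

// Toy model of passing a spatial value through Pig as a bytearray: a UDF must
// deserialize on entry and reserialize on exit, paying that cost on every call.
public class GeometryBytes {

    record Point(double x, double y) {} // stand-in for a real Geometry type

    // Encode a point as 16 bytes (two doubles) -- a hypothetical wire format.
    static byte[] serialize(Point p) {
        return ByteBuffer.allocate(16).putDouble(p.x()).putDouble(p.y()).array();
    }

    static Point deserialize(byte[] bytes) {
        ByteBuffer b = ByteBuffer.wrap(bytes);
        return new Point(b.getDouble(), b.getDouble());
    }

    public static void main(String[] args) {
        Point p = new Point(3.0, 4.0);
        // Every UDF call would repeat this round trip on its inputs/outputs.
        Point q = deserialize(serialize(p));
        System.out.println(p.equals(q)); // value survives the round trip
    }
}
```

A native Geometry type in Pig would remove this per-call overhead, which is why the thread keeps coming back to the licensing question.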



Re: A major addition to Pig. Working with spatial data

2013-05-06 Thread Nick Dimiduk
I'm no lawyer, but I see no reason why this cannot be an external
extension to Pig. It would behave the same way PostGIS is an external
extension to Postgres. Any Apache issues would be toward general
purpose enhancements, not specific to your project.

Good on you!
-n


[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650021#comment-13650021
 ] 

Julien Le Dem commented on PIG-3307:


[~daijy] This is removing parameters that were not used. I have not tested 
performance but I think it could only improve performance.
(see latest patch PIG-3307_2.patch)

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
>  Issue Type: Improvement
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to clean up 
> some unnecessary code.
> In particular, there is a family of getNext(*T* v) methods where the value v 
> does not seem to have any importance and is just used to pick the correct 
> method.
> I have started a refactoring toward a more readable getNext*T*().



[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-05-06 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649958#comment-13649958
 ] 

Daniel Dai commented on PIG-3285:
-

If HBase adds another method that ships only hbase.jar/guava.jar/protobuf.jar 
etc. (but not the inputformat/outputformat jars), Pig can switch to using it. 
But for now, TableMapReduceUtil.addDependencyJars(job) is not what we want.

> Jobs using HBaseStorage fail to ship dependency jars
> 
>
> Key: PIG-3285
> URL: https://issues.apache.org/jira/browse/PIG-3285
> Project: Pig
>  Issue Type: Bug
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 0.11.1
>
> Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 
> 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig
>
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.(HbaseObjectWritable.java:266)
>   at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2013-05-06 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649924#comment-13649924
 ] 

Rohini Palaniswamy commented on PIG-1824:
-

To use a Jython install, the Lib dir must be on the Jython search path:
 * via the environment variable JYTHON_HOME=jy_home or JYTHON_PATH=jy_home/Lib:..., or
 * jython-standalone.jar should be on the classpath

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.10.0
>
> Attachments: 1824a.patch, 1824b.patch, 1824c.patch, 1824d.patch, 
> 1824_final.patch, 1824.patch, 1824x.patch, 
> TEST-org.apache.pig.test.TestGrunt.txt, 
> TEST-org.apache.pig.test.TestScriptLanguage.txt, 
> TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> Currently, Jython UDF script doesn't support Jython import statement as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let the user explicitly specify 
> the module to ship? 



[jira] [Commented] (PIG-1824) Support import modules in Jython UDF

2013-05-06 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649920#comment-13649920
 ] 

Rohini Palaniswamy commented on PIG-1824:
-

You need to have the jython/Lib directory on the classpath. We bundle it with 
our deployment. Otherwise, you need jython-standalone.jar instead of 
jython.jar, as in Pig 0.11. 

> Support import modules in Jython UDF
> 
>
> Key: PIG-1824
> URL: https://issues.apache.org/jira/browse/PIG-1824
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.8.0, 0.9.0
>Reporter: Richard Ding
>Assignee: Woody Anderson
> Fix For: 0.10.0
>
> Attachments: 1824a.patch, 1824b.patch, 1824c.patch, 1824d.patch, 
> 1824_final.patch, 1824.patch, 1824x.patch, 
> TEST-org.apache.pig.test.TestGrunt.txt, 
> TEST-org.apache.pig.test.TestScriptLanguage.txt, 
> TEST-org.apache.pig.test.TestScriptUDF.txt
>
>
> Currently, Jython UDF script doesn't support Jython import statement as in 
> the following example:
> {code}
> #!/usr/bin/python
> import re
> @outputSchema("word:chararray")
> def resplit(content, regex, index):
>     return re.compile(regex).split(content)[index]
> {code}
> Can Pig automatically locate the Jython module file and ship it to the 
> backend? Or should we add a ship clause to let the user explicitly specify 
> the module to ship? 



[jira] [Resolved] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-06 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem resolved PIG-3311.


   Resolution: Fixed
Fix Version/s: 0.12

> add pig-withouthadoop-h2 to mvn-jar
> ---
>
> Key: PIG-3311
> URL: https://issues.apache.org/jira/browse/PIG-3311
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Fix For: 0.12
>
> Attachments: PIG-3311.patch
>
>
> mvn-jar currently creates pig-version.jar and pig-version-h2.jar
> I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
> that are needed to run pig from the command line.
> This will allow a dual-version package.



[jira] [Commented] (PIG-3311) add pig-withouthadoop-h2 to mvn-jar

2013-05-06 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649907#comment-13649907
 ] 

Julien Le Dem commented on PIG-3311:


Committed to TRUNK

> add pig-withouthadoop-h2 to mvn-jar
> ---
>
> Key: PIG-3311
> URL: https://issues.apache.org/jira/browse/PIG-3311
> Project: Pig
>  Issue Type: Improvement
>  Components: build
>Reporter: Julien Le Dem
>Assignee: Julien Le Dem
> Attachments: PIG-3311.patch
>
>
> mvn-jar currently creates pig-version.jar and pig-version-h2.jar
> I'm adding pig-version-withouthadoop.jar and pig-version-withouthadoop-h2.jar 
> that are needed to run pig from the command line.
> This will allow a dual-version package.



[jira] [Updated] (PIG-3293) Casting fails after Union from two data sources&loaders

2013-05-06 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-3293:
--

Attachment: pig-3293-test-only-v01.patch

bq. Must be the "caster" in D's POCast is null. Can you attach MyLoader?

Attaching a test case using  
{noformat}
   public class PigStorageWithStatistics extends PigStorage {
{noformat}
from org.apache.pig.test.  Even though both PigStorage and 
PigStorageWithStatistics return Utf8StorageConverter, the test case fails with 
"Cannot determine how to convert the bytearray to string."

Note that I created PIG-3295 for dealing with the case where casting fails 
even when the union comes from the same loader.

Figuring out whether the loaders were the same was easy by calling 'equals' on 
the FuncSpec instances.  I don't know how to achieve this easily when comparing 
casters.
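One conceivable heuristic for the caster comparison mentioned above is comparing the casters' concrete classes. This is only a sketch of the idea (sameCaster is an invented helper, not a Pig API), and it would treat any two Utf8StorageConverter instances as equal:

```java
// Sketch: treat two casters as "the same" when they are instances of the same
// concrete class. A real fix would need a semantic equals() on casters, since
// two instances of the same class could still be configured differently.
public class CasterCompare {

    static boolean sameCaster(Object a, Object b) {
        return a != null && b != null && a.getClass().equals(b.getClass());
    }

    public static void main(String[] args) {
        System.out.println(sameCaster("x", "y")); // true: both java.lang.String
        System.out.println(sameCaster("x", 1));   // false: different classes
    }
}
```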


> Casting fails after Union from two data sources&loaders
> ---
>
> Key: PIG-3293
> URL: https://issues.apache.org/jira/browse/PIG-3293
> Project: Pig
>  Issue Type: Bug
>Reporter: Koji Noguchi
>Priority: Minor
> Attachments: pig-3293-test-only-v01.patch
>
>
> Script similar to 
> {noformat}
> A = load 'data1' using MyLoader() as (a:bytearray);
> B = load 'data2' as (a:bytearray);
> C = union onschema A,B;
> D = foreach C generate (chararray)a;
> Store D into './out';
> {noformat}
> fails with 
>java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: 
> ERROR 1075: Received a bytearray from the UDF. Cannot determine how to 
> convert the bytearray to string.
> Both MyLoader and PigStorage use the default Utf8StorageConverter.



Re: A major addition to Pig. Working with spatial data

2013-05-06 Thread Ahmed Eldawy
I contacted solr developers to see how JTS can be included in an Apache
project. See
http://mail-archives.apache.org/mod_mbox/lucene-dev/201305.mbox/raw/%3C1367815102914-4060969.post%40n3.nabble.com%3E/
As far as I understand, they did not include it in the main Solr project;
rather, they created a separate project (Spatial4j) which is still
licensed under the Apache license and refers to JTS. Users have to
download the JTS libraries separately to make it run. That's pretty much the
same plan that Jonathan mentioned. We will still have the overhead of
serializing/deserializing the shapes each time a function is called. Also,
we will have to use the ugly bytearray data type for spatial data instead
of creating its own data type (e.g., Geometry).
I think using Spatial4j instead of JTS will not be sufficient for our case,
as we need to provide access to all spatial functions of JTS, such as
Union, Intersection, Difference, etc. This way we can claim conformity
with OGC standards, which gives visibility and appreciation in the spatial
community.
I also think this means I will not add any issues to JIRA, as it is now
a separate project. I'm planning to host it on GitHub and have all the
issues there.
Let me know if you have any suggestions or comments.

Thanks
Ahmed


Best regards,
Ahmed Eldawy


On Mon, May 6, 2013 at 9:53 AM, Jonathan Coveney  wrote:

> You can give them all the same label or tag and filter on that later on.
>
>
> 2013/5/6 Ahmed Eldawy 
>
> > Thanks all for taking the time to respond. Danial, I didn't know that
> Solr
> > uses JTS. This is a good finding and we can definitely ask them to see if
> > there is a work around we can do. Jonathan, I thought of the same idea of
> > serializing/deserializing a bytearray each time a UDF is called. The
> > deserialization part is good for letting Pig auto detect spatial types if
> > not set explicitly in the schema. What is the best way to start this? I
> > want to add an initial set of JIRA issues and start working on them but I
> > also need to keep the work grouped in some sense just for organization.
> >
> > Thanks
> > Ahmed
> >
> > Best regards,
> > Ahmed Eldawy
> >
> >
> > On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney 
> > wrote:
> >
> > > I agree that this is cool, and if other projects are using JTS it is
> > worth
> > > talking them to see how. I also agree that licensing is very
> frustrating.
> > >
> > > In the short term, however, while it is annoying to have to manage the
> > > serialization and deserialization yourself, you can have the geometry
> > type
> > > be passed around as a bytearray type. Your UDF's will have to know this
> > and
> > > treat it accordingly, but if you did this then all of the tools could
> be
> > in
> > > an external project on github instead of a branch in Pig. Then, if we
> can
> > > get the licensing done, we could add the Geometry type to Pig. Adding
> > > types, honestly, is kind of tedious but not super difficult, so once
> the
> > > rest is done, that shouldn't be too difficult.
> > >
> > >
> > > 2013/5/4 Russell Jurney 
> > >
> > > > If a way could be found, this would be an awesome addition to Pig.
> > > >
> > > > Russell Jurney http://datasyndrome.com
> > > >
> > > > On May 3, 2013, at 4:09 PM, Daniel Dai 
> wrote:
> > > >
> > > > > I am not sure how other Apache projects dealing with it? Seems Solr
> > > also
> > > > > has some connector to JTS?
> > > > >
> > > > > Thanks,
> > > > > Daniel
> > > > >
> > > > >
> > > > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy 
> > > > wrote:
> > > > >
> > > > >> Thanks Alan for your interest. It's too bad that an open source
> > > > licensing
> > > > >> issue is holding me back from doing some open source work. I
> > > understand
> > > > the
> > > > >> issue and your workarounds make sense. However, as I mentioned in
> > the
> > > > >> beginning, I don't want to have my own branch of Pig because it
> > makes
> > > my
> > > > >> extension less portable. I'll think of another way to do it. I'll
> > ask
> > > > vivid
> > > > >> solutions if they can double license their code although I think
> the
> > > > answer
> > > > >> will be no. I'll also think of a way to ship my extension as a set
> > of
> > > > jar
> > > > >> files without the need to change the core of Pig. This way, it can
> > be
> > > > >> easily ported to newer versions of Pig.
> > > > >>
> > > > >> Thanks
> > > > >> Ahmed
> > > > >>
> > > > >> Best regards,
> > > > >> Ahmed Eldawy
> > > > >>
> > > > >>
> > > > >> On Thu, May 2, 2013 at 12:33 PM, Alan Gates <
> ga...@hortonworks.com>
> > > > wrote:
> > > > >>
> > > > >>> I know this is frustrating, but the different licenses do have
> > > > different
> > > > >>> requirements that make it so that Apache can't ship GPL code.  A
> > > legal
> > > > >>> explanation is at
> > > > http://www.apache.org/licenses/GPL-compatibility.htmlFor additional
> > info
> > > > on the LGPL specific questions see
> > > > >>> http://www.apache.org/legal/3party.html

[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-05-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649875#comment-13649875
 ] 

Nick Dimiduk commented on PIG-3285:
---

I don't want this to become a game of dependency tracking across projects. That 
said, HBase doesn't add dependencies all that often, so maybe it doesn't matter 
in practice.

> Jobs using HBaseStorage fail to ship dependency jars
> 
>
> Key: PIG-3285
> URL: https://issues.apache.org/jira/browse/PIG-3285
> Project: Pig
>  Issue Type: Bug
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 0.11.1
>
> Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 
> 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig
>
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.(HbaseObjectWritable.java:266)
>   at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2315) Make as clause work in generate

2013-05-06 Thread Ruslan Al-Fakikh (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13649851#comment-13649851
 ] 

Ruslan Al-Fakikh commented on PIG-2315:
---

Daniel, I think that deprecating/removing the "cast in the as clause" is 
easier, because it is not working anyway. I guess the "()" should stay.

I also have a suggestion to make this issue a duplicate of PIG-2216 instead of 
PIG-2216 being a duplicate of this issue. It seems that the description of 
PIG-2216 explains just everything and does not cause confusion.

> Make as clause work in generate
> ---
>
> Key: PIG-2315
> URL: https://issues.apache.org/jira/browse/PIG-2315
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
>Assignee: Gianmarco De Francisci Morales
> Fix For: 0.12
>
>
> Currently, the following syntax is supported but ignored, causing confusion 
> among users:
> A1 = foreach A1 generate a as a:chararray ;
> After this statement, a just retains its previous type.



Re: Pig package supporting both hadoop 1 and 2

2013-05-06 Thread Julien Le Dem
Thanks Rohini,
I was thinking of using "hadoop version" to decide.
Do you think there's a difference?
Julien


On Mon, May 6, 2013 at 7:06 AM, Rohini Palaniswamy
wrote:

> Hi Julien,
>We use a perl script internally instead of bin/pig shell script which
> has some Y! deployment stuff. That is why changes have not been done to
> bin/pig to support both versions already. We have two lib directories - one
> for hadoop20, other for hadoop23(hadoop2) and choose the one to have in
> classpath based on presence of hadoop-core.jar.
>
> my $hadoopClasspath = `$hadoop_cmd classpath`;
> my $jarDir = "$pigJarRoot/lib-hadoop23";
> if ($hadoopClasspath =~ m/hadoop-core-/) {
>     $jarDir = "$pigJarRoot/lib-hadoop20";
> }
>
> Regards,
> Rohini
>
>
>
> On Fri, May 3, 2013 at 3:40 PM, Julien Le Dem  wrote:
>
> > Hi Pig developers,
> > I'm looking into having a Pig package that works both for Hadoop 1.0 and
> > Hadoop 2.0
> > That means have both pig*.jar and pig*-h2.jar in the package and choosing
> > the right one dynamically.
> > In particular I created this JIRA as a first step:
> > https://issues.apache.org/jira/browse/PIG-3311
> > I'm curious to know how others do it.
> > Yahoo! for example? (Rohini?)
> > Thanks,
> > Julien
> >
> >
>

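For the record, Julien's `hadoop version` idea can be sketched as below. The mapping mirrors the two lib directories Rohini describes; the exact format of the version line is an assumption:

```python
# Sketch: map the first line of `hadoop version` output (e.g. "Hadoop 1.0.4",
# "Hadoop 2.0.3-alpha", "Hadoop 0.23.7") onto the two Pig lib directories
# Rohini describes. The version-line format is an assumption.
import re

def pig_lib_dir(version_line):
    m = re.match(r"Hadoop (\d+)\.(\d+)", version_line)
    if m is None:
        raise ValueError("unrecognized hadoop version: %r" % version_line)
    major, minor = int(m.group(1)), int(m.group(2))
    # 0.20.x and 1.x share the old mapred API; 0.23.x and 2.x the new one.
    if major == 1 or (major == 0 and minor <= 20):
        return "lib-hadoop20"
    return "lib-hadoop23"
```

In bin/pig this would hang off the first line of `hadoop version` output, making essentially the same decision Rohini's perl snippet derives from `hadoop classpath`.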

Re: CHANGES.txt in trunk

2013-05-06 Thread Alan Gates
Cool, just wanted to make sure.  I agree this is a good idea.

Alan.

On May 5, 2013, at 7:06 PM, Rohini Palaniswamy wrote:

> Alan,
>  I meant relocating only - Moving jiras from 0.12 to 0.11.x releases
> section :).
> 
> Regards,
> Rohini
> 
> 
> On Fri, May 3, 2013 at 3:08 PM, Alan Gates  wrote:
> 
>> What do you mean by remove?  They should still be in the file.  They may need
>> to be relocated under the 0.11 section.  But the trunk CHANGES file should
>> include all changes that are on trunk.
>> 
>> Alan.
>> 
>> On May 3, 2013, at 1:34 PM, Rohini Palaniswamy wrote:
>> 
>>> Hi,
>>>  I see a lot of patches that went into 0.11 are under trunk in the
>>> CHANGES.txt. Should we sync the file with the CHANGES.txt in branch-0.11
>>> and remove those jiras from trunk that went into 0.11? What is the usual
>>> process of updating CHANGES.txt when a jira is checked both into a branch
>>> and also trunk?
>>> 
>>> Regards,
>>> Rohini
>> 
>> 



Re: A major addition to Pig. Working with spatial data

2013-05-06 Thread Jonathan Coveney
You can give them all the same label or tag and filter on that later on.


2013/5/6 Ahmed Eldawy 

> Thanks all for taking the time to respond. Daniel, I didn't know that Solr
> uses JTS. This is a good finding and we can definitely ask them to see if
> there is a work around we can do. Jonathan, I thought of the same idea of
> serializing/deserializing a bytearray each time a UDF is called. The
> deserialization part is good for letting Pig auto detect spatial types if
> not set explicitly in the schema. What is the best way to start this? I
> want to add an initial set of JIRA issues and start working on them but I
> also need to keep the work grouped in some sense just for organization.
>
> Thanks
> Ahmed
>
> Best regards,
> Ahmed Eldawy
>
>
> On Sat, May 4, 2013 at 4:47 PM, Jonathan Coveney 
> wrote:
>
> > I agree that this is cool, and if other projects are using JTS it is
> worth
> > talking to them to see how. I also agree that licensing is very frustrating.
> >
> > In the short term, however, while it is annoying to have to manage the
> > serialization and deserialization yourself, you can have the geometry
> type
> > be passed around as a bytearray type. Your UDF's will have to know this
> and
> > treat it accordingly, but if you did this then all of the tools could be
> in
> > an external project on github instead of a branch in Pig. Then, if we can
> > get the licensing done, we could add the Geometry type to Pig. Adding
> > types, honestly, is kind of tedious but not super difficult, so once the
> > rest is done, that shouldn't be too difficult.
> >
> >
> > 2013/5/4 Russell Jurney 
> >
> > > If a way could be found, this would be an awesome addition to Pig.
> > >
> > > Russell Jurney http://datasyndrome.com
> > >
> > > On May 3, 2013, at 4:09 PM, Daniel Dai  wrote:
> > >
> > > > I am not sure how other Apache projects are dealing with it? Seems Solr
> > also
> > > > has some connector to JTS?
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > > >
> > > > On Thu, May 2, 2013 at 11:59 AM, Ahmed Eldawy 
> > > wrote:
> > > >
> > > >> Thanks Alan for your interest. It's too bad that an open source
> > > licensing
> > > >> issue is holding me back from doing some open source work. I
> > understand
> > > the
> > > >> issue and your workarounds make sense. However, as I mentioned in
> the
> > > >> beginning, I don't want to have my own branch of Pig because it
> makes
> > my
> > > >> extension less portable. I'll think of another way to do it. I'll
> ask
> > > Vivid
> > > >> Solutions if they can dual-license their code, although I think the
> > > answer
> > > >> will be no. I'll also think of a way to ship my extension as a set
> of
> > > jar
> > > >> files without the need to change the core of Pig. This way, it can
> be
> > > >> easily ported to newer versions of Pig.
> > > >>
> > > >> Thanks
> > > >> Ahmed
> > > >>
> > > >> Best regards,
> > > >> Ahmed Eldawy
> > > >>
> > > >>
> > > >> On Thu, May 2, 2013 at 12:33 PM, Alan Gates 
> > > wrote:
> > > >>
> > > >>> I know this is frustrating, but the different licenses do have
> > > different
> > > >>> requirements that make it so that Apache can't ship GPL code.  A
> > legal
> > > >>> explanation is at
> > > >>> http://www.apache.org/licenses/GPL-compatibility.html  For additional
> > > >>> info on the LGPL specific questions see
> > > >>> http://www.apache.org/legal/3party.html
> > > >>>
> > > >>> As far as pulling it in via ivy, the issue isn't so much where the
> > code
> > > >>> lives as much as what code we are requiring to make Pig work.  If
> > > >> something
> > > >>> that is [L]GPL is required for Pig it violates Apache rules as
> > outlined
> > > >>> above.  It also would be a show stopper for a lot of companies that
> > > >>> redistribute Pig and that are allergic to GPL software.
> > > >>>
> > > >>> So, as I said before, if you wanted to continue with that library
> and
> > > >> they
> > > >>> are not willing to relicense it then it would have to be bolted on
> > > after
> > > >>> Apache Pig is built.  Nothing stops you from doing this by
> > downloading
> > > >>> Apache Pig, adding this library and your code, and redistributing,
> > > though
> > > >>> it wouldn't then be open to all Pig users.
> > > >>>
> > > >>> Alan.
> > > >>>
> > > >>> On May 1, 2013, at 6:08 PM, Ahmed Eldawy wrote:
> > > >>>
> > >  Thanks for your response. I was never good at differentiating all
> > > those
> > >  open source licenses. I mean what is the point making open source
> > > >>> licenses
> > >  if it blocks me from using a library in an open source project.
> Any
> > > >> way,
> > >  I'm not going into debate here. Just one question, if we use JTS
> as
> > a
> > >  library (jar file) without adding the code in Pig, is it still a
> > > >>> violation?
> > >  We'll use ivy, for example, to download the jar file when
> compiling.
> > >  On May 1, 2013 7:50 PM, "Alan Gates" 
> wrote:
> > > 
> > > > Passing on the technical details for a mom
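
As a footnote to the thread: the bytearray approach Jonathan and Ahmed discuss above can be sketched as below. The fixed two-double point layout is purely an assumption for illustration; a real implementation would encode full geometries (e.g. via WKB):

```python
# Sketch of the bytearray workaround from the thread: the UDFs agree on a
# byte layout for geometry (here just an x,y point packed as two doubles)
# and round-trip it themselves, so Pig core only ever sees a bytearray.
import struct

POINT = struct.Struct(">dd")  # big-endian x, y; layout is an assumption

def point_to_bytes(x, y):
    return POINT.pack(x, y)

def point_from_bytes(data):
    return POINT.unpack(data)
```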

Re: Pig package supporting both hadoop 1 and 2

2013-05-06 Thread Rohini Palaniswamy
Hi Julien,
   We use a perl script internally, which has some Y! deployment stuff,
instead of the bin/pig shell script. That is why changes have not been made to
bin/pig to support both versions already. We have two lib directories, one
for hadoop20 and the other for hadoop23 (hadoop2), and choose the one to put
on the classpath based on the presence of hadoop-core.jar.

my $hadoopClasspath = `$hadoop_cmd classpath`;
my $jarDir = "$pigJarRoot/lib-hadoop23";
if ($hadoopClasspath =~ m/hadoop-core-/) {
    $jarDir = "$pigJarRoot/lib-hadoop20";
}

Regards,
Rohini



On Fri, May 3, 2013 at 3:40 PM, Julien Le Dem  wrote:

> Hi Pig developers,
> I'm looking into having a Pig package that works both for Hadoop 1.0 and
> Hadoop 2.0
> That means have both pig*.jar and pig*-h2.jar in the package and choosing
> the right one dynamically.
> In particular I created this JIRA as a first step:
> https://issues.apache.org/jira/browse/PIG-3311
> I'm curious to know how others do it.
> Yahoo! for example? (Rohini?)
> Thanks,
> Julien
>
>