Re: Unit test classpath trouble

2013-05-10 Thread Konstantin Boudnik
As you know, we've been testing Pig 0.11 against Hadoop 2.0.4-alpha as part of
Bigtop's validation for the latest Hadoop release, and it worked OK. Bigtop
doesn't run unit tests though, so it seems like a build issue to me.

Cos

On Sat, May 11, 2013 at 09:19AM, Andrew Purtell wrote:
> I've tried that, thanks. I did a bit more investigation and it seems the
> issue is recent Hadoop 2 releases. Has anyone tried running Pig unit tests
> using a more recent Hadoop release than 2.0.0-alpha? Maybe my trouble is a
> simple thing that someone with more experience with Pig internals would see
> right away? Cluster testing seems ok. It's just unit tests that fail. But
> that is concerning.
> 
> I'm trying HEAD of branch-0.11.
> 
> My Java is version "1.6.0_43" Java(TM) SE Runtime Environment (build
> 1.6.0_43-b01) Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed
> mode). OS is Ubuntu 13.04 (GNU/Linux 3.8.0-19-generic x86_64).
> 
> With defaults and only -Dhadoopversion=23 on the Ant command line, it seems
> ok.
> 
> With build.properties of:
> 
> hadoopversion=23
> hadoop-common.version=2.0.4-alpha
> hadoop-hdfs.version=2.0.4-alpha
> hadoop-mapreduce.version=2.0.4-alpha
> 
> 
> or defined on the Ant command line, I'll see unit test failures like:
> 
> Testcase: testAccumWithDistinct took 0.868 sec
> Caused an ERROR
> org/apache/hadoop/mapred/ResourceMgrDelegate
> java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/ResourceMgrDelegate
> at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:112)
> at org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:94)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:81)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:74)
> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:482)
> at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:461)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:152)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
> at org.apache.pig.PigServer.storeEx(PigServer.java:931)
> at org.apache.pig.PigServer.store(PigServer.java:898)
> at org.apache.pig.PigServer.openIterator(PigServer.java:811)
> at org.apache.pig.test.TestAccumulator.testAccumWithDistinct(TestAccumulator.java:424)
> 
> That suggests a cause but I've not started spelunking code with the hope
> this is something simple that someone has already encountered.
> 
> 
> On Sat, May 11, 2013 at 1:31 AM, Johnny Zhang  wrote:
> 
> > Hi, Andrew:
> > Does something like "-Dhadoopversion=23" help ? eg. ant clean test
> > -Dhadoopversion=23 -Dtest.junit.output.format=xml
> >
> > Johnny
> >
> >
> > On Fri, May 10, 2013 at 3:39 AM, Andrew Purtell 
> > wrote:
> >
> > > Please pardon the basic question. I'm building Pig 0.11.2-SNAPSHOT
> > against
> > > Hadoop 2.0.4. 'ant package' and full cluster tests work fine, but I'm not
> > > having much luck with running the unit tests, 'ant test-core' or 'ant
> > > test'. The problem looks to be a MR app classpath issue.
> > >
> > > Sometimes: java.lang.NoClassDefFoundError:
> > > org/apache/hadoop/yarn/client/YarnClientImpl
> > >
> > > Sometimes: java.lang.NoClassDefFoundError:
> > > org/apache/hadoop/mapred/ResourceMgrDelegate
> > >
> > > A few Google searches have turned up no useful pointers. Maybe there is
> > > something simple I am missing? How do you set up for running unit tests
> > on
> > > your dev boxes?
> > >
> > > --
> > > Best regards,
> > >
> > >- Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > > (via Tom White)
> > >
> >
> 
> 
> 
> -- 
> Best regards,
> 
>- Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)




Re: Unit test classpath trouble

2013-05-10 Thread Johnny Zhang
Hi, Andrew:
I just set up a job to run the unit tests against Hadoop 2.0.4-alpha. I will
investigate the failures and reply to this thread.

Thanks,
Johnny Zhang


On Fri, May 10, 2013 at 6:19 PM, Andrew Purtell  wrote:

> I've tried that, thanks. I did a bit more investigation and it seems the
> issue is recent Hadoop 2 releases. Has anyone tried running Pig unit tests
> using a more recent Hadoop release than 2.0.0-alpha? Maybe my trouble is a
> simple thing that someone with more experience with Pig internals would see
> right away? Cluster testing seems ok. It's just unit tests that fail. But
> that is concerning.
>
> I'm trying HEAD of branch-0.11.
>
> My Java is version "1.6.0_43" Java(TM) SE Runtime Environment (build
> 1.6.0_43-b01) Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed
> mode). OS is Ubuntu 13.04 (GNU/Linux 3.8.0-19-generic x86_64).
>
> With defaults and only -Dhadoopversion=23 on the Ant command line, it seems
> ok.
>
> With build.properties of:
>
> hadoopversion=23
> hadoop-common.version=2.0.4-alpha
> hadoop-hdfs.version=2.0.4-alpha
> hadoop-mapreduce.version=2.0.4-alpha
>
>
> or defined on the Ant command line, I'll see unit test failures like:
>
> Testcase: testAccumWithDistinct took 0.868 sec
> Caused an ERROR
> org/apache/hadoop/mapred/ResourceMgrDelegate
> java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/ResourceMgrDelegate
> at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:112)
> at org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
> at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:94)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:81)
> at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:74)
> at org.apache.hadoop.mapred.JobClient.init(JobClient.java:482)
> at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:461)
> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:152)
> at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
> at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
> at org.apache.pig.PigServer.storeEx(PigServer.java:931)
> at org.apache.pig.PigServer.store(PigServer.java:898)
> at org.apache.pig.PigServer.openIterator(PigServer.java:811)
> at org.apache.pig.test.TestAccumulator.testAccumWithDistinct(TestAccumulator.java:424)
>
> That suggests a cause but I've not started spelunking code with the hope
> this is something simple that someone has already encountered.
>
>
> On Sat, May 11, 2013 at 1:31 AM, Johnny Zhang 
> wrote:
>
> > Hi, Andrew:
> > Does something like "-Dhadoopversion=23" help ? eg. ant clean test
> > -Dhadoopversion=23 -Dtest.junit.output.format=xml
> >
> > Johnny
> >
> >
> > On Fri, May 10, 2013 at 3:39 AM, Andrew Purtell 
> > wrote:
> >
> > > Please pardon the basic question. I'm building Pig 0.11.2-SNAPSHOT
> > against
> > > Hadoop 2.0.4. 'ant package' and full cluster tests work fine, but I'm
> not
> > > having much luck with running the unit tests, 'ant test-core' or 'ant
> > > test'. The problem looks to be a MR app classpath issue.
> > >
> > > Sometimes: java.lang.NoClassDefFoundError:
> > > org/apache/hadoop/yarn/client/YarnClientImpl
> > >
> > > Sometimes: java.lang.NoClassDefFoundError:
> > > org/apache/hadoop/mapred/ResourceMgrDelegate
> > >
> > > A few Google searches have turned up no useful pointers. Maybe there is
> > > something simple I am missing? How do you set up for running unit tests
> > on
> > > your dev boxes?
> > >
> > > --
> > > Best regards,
> > >
> > >- Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
> > > (via Tom White)
> > >
> >
>
>
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>


Re: Unit test classpath trouble

2013-05-10 Thread Andrew Purtell
I've tried that, thanks. I did a bit more investigation and it seems the
issue is recent Hadoop 2 releases. Has anyone tried running Pig unit tests
using a more recent Hadoop release than 2.0.0-alpha? Maybe my trouble is a
simple thing that someone with more experience with Pig internals would see
right away? Cluster testing seems ok. It's just unit tests that fail. But
that is concerning.

I'm trying HEAD of branch-0.11.

My Java is version "1.6.0_43" Java(TM) SE Runtime Environment (build
1.6.0_43-b01) Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed
mode). OS is Ubuntu 13.04 (GNU/Linux 3.8.0-19-generic x86_64).

With defaults and only -Dhadoopversion=23 on the Ant command line, it seems
ok.

With build.properties of:

hadoopversion=23
hadoop-common.version=2.0.4-alpha
hadoop-hdfs.version=2.0.4-alpha
hadoop-mapreduce.version=2.0.4-alpha


or defined on the Ant command line, I'll see unit test failures like:

Testcase: testAccumWithDistinct took 0.868 sec
Caused an ERROR
org/apache/hadoop/mapred/ResourceMgrDelegate
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/ResourceMgrDelegate
at org.apache.hadoop.mapred.YARNRunner.<init>(YARNRunner.java:112)
at org.apache.hadoop.mapred.YarnClientProtocolProvider.create(YarnClientProtocolProvider.java:34)
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:94)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:81)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:74)
at org.apache.hadoop.mapred.JobClient.init(JobClient.java:482)
at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:461)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:152)
at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1249)
at org.apache.pig.PigServer.storeEx(PigServer.java:931)
at org.apache.pig.PigServer.store(PigServer.java:898)
at org.apache.pig.PigServer.openIterator(PigServer.java:811)
at org.apache.pig.test.TestAccumulator.testAccumWithDistinct(TestAccumulator.java:424)

That suggests a cause but I've not started spelunking code with the hope
this is something simple that someone has already encountered.
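
One way to narrow down a NoClassDefFoundError like the one above is to check which jars on the test classpath actually contain the missing class. The following is a minimal sketch, not part of Pig's build: the helper names `class_entry` and `find_class_in_jars` are hypothetical, and it assumes the resolved Hadoop jars sit somewhere under a local directory.

```python
import zipfile
from pathlib import Path


def class_entry(class_name):
    """Convert 'a.b.C' or 'a/b/C' to the jar entry name 'a/b/C.class'."""
    return class_name.replace(".", "/") + ".class"


def find_class_in_jars(jar_dir, class_name):
    """Return the sorted names of jars under jar_dir that contain class_name."""
    entry = class_entry(class_name)
    hits = []
    for jar in Path(jar_dir).rglob("*.jar"):
        try:
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip truncated or corrupt downloads rather than abort the scan.
            continue
    return sorted(hits)
```

For example, `find_class_in_jars("build/ivy/lib", "org.apache.hadoop.mapred.ResourceMgrDelegate")` would list candidate jars; the `build/ivy/lib` path is an assumption about where the build places resolved dependencies. An empty result would suggest that the jar carrying the class was never resolved for the chosen Hadoop version.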


On Sat, May 11, 2013 at 1:31 AM, Johnny Zhang  wrote:

> Hi, Andrew:
> Does something like "-Dhadoopversion=23" help ? eg. ant clean test
> -Dhadoopversion=23 -Dtest.junit.output.format=xml
>
> Johnny
>
>
> On Fri, May 10, 2013 at 3:39 AM, Andrew Purtell 
> wrote:
>
> > Please pardon the basic question. I'm building Pig 0.11.2-SNAPSHOT
> against
> > Hadoop 2.0.4. 'ant package' and full cluster tests work fine, but I'm not
> > having much luck with running the unit tests, 'ant test-core' or 'ant
> > test'. The problem looks to be a MR app classpath issue.
> >
> > Sometimes: java.lang.NoClassDefFoundError:
> > org/apache/hadoop/yarn/client/YarnClientImpl
> >
> > Sometimes: java.lang.NoClassDefFoundError:
> > org/apache/hadoop/mapred/ResourceMgrDelegate
> >
> > A few Google searches have turned up no useful pointers. Maybe there is
> > something simple I am missing? How do you set up for running unit tests
> on
> > your dev boxes?
> >
> > --
> > Best regards,
> >
> >- Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


[jira] Subscription: PIG patch available

2013-05-10 Thread jira
Issue Subscription
Filter: PIG patch available (22 issues)

Subscriber: pigdaily

Key         Summary
PIG-3317    disable optimizations via pig properties
            https://issues.apache.org/jira/browse/PIG-3317
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3291    TestExampleGenerator fails on Windows because of lack of file name escaping
            https://issues.apache.org/jira/browse/PIG-3291
PIG-3285    Jobs using HBaseStorage fail to ship dependency jars
            https://issues.apache.org/jira/browse/PIG-3285
PIG-3258    Patch to allow MultiStorage to use more than one index to generate output tree
            https://issues.apache.org/jira/browse/PIG-3258
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3247    Piggybank functions to mimic OVER clause in SQL
            https://issues.apache.org/jira/browse/PIG-3247
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3025    TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
            https://issues.apache.org/jira/browse/PIG-3025
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2248    Pig parser does not detect when a macro name masks a UDF name
            https://issues.apache.org/jira/browse/PIG-2248
PIG-2244    Macros cannot be passed relation names
            https://issues.apache.org/jira/browse/PIG-2244
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-3316) Pig failed to interpret DateTime values in some special cases

2013-05-10 Thread Johnny Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13655069#comment-13655069
 ] 

Johnny Zhang commented on PIG-3316:
---

it is a little bit late :) but the unit tests with patch pass for me

> Pig failed to interpret DateTime values in some special cases
> -
>
> Key: PIG-3316
> URL: https://issues.apache.org/jira/browse/PIG-3316
> Project: Pig
> Issue Type: Bug
> Components: data, impl
> Affects Versions: 0.11
> Environment: 1970-01-01
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.11
>
> Attachments: PIG-3316.patch
>
>
> For the query
> A = load 'date.txt' as ( f1:int, f2:datetime );
> dump A;
> with input data
> 1,1970-01-01
> 2,1970-01
> pig generates the following output
> (1,1970-01-01T00:00:00.000-01:00)
> (2,1970-01-01T00:00:00.000-01:00)
> which seemingly incorrectly interprets the day or month part as time zone.
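
The misreading described above can be illustrated with a small sketch. This is not Pig's parser and not the attached patch: the regular expressions and the names `NAIVE_OFFSET`, `OFFSET_AFTER_TIME`, `naive_offset`, and `offset_after_time` are hypothetical, chosen only to show how a pattern that accepts a trailing offset such as `-01` can consume the day or month of a date-only value.

```python
import re

# Hypothetical suffix pattern: "Z" or a signed offset such as -01, -0100, -01:00.
NAIVE_OFFSET = re.compile(r"(Z|[+-]\d{2}(:?\d{2})?)$")

# Stricter variant: only accept an offset when a time part ("T...") precedes it.
OFFSET_AFTER_TIME = re.compile(r"T[0-9:.]*(Z|[+-]\d{2}(:?\d{2})?)$")


def naive_offset(text):
    """Return whatever the naive pattern takes to be the time zone, or None."""
    m = NAIVE_OFFSET.search(text)
    return m.group(1) if m else None


def offset_after_time(text):
    """Return the time zone only if it follows a time part, else None."""
    m = OFFSET_AFTER_TIME.search(text)
    return m.group(1) if m else None
```

With the naive pattern, both `1970-01-01` and `1970-01` appear to end in a `-01` offset, which is consistent with the `-01:00` zone in the reported output. The stricter variant finds no offset in either value but still recognizes one in `1970-01-01T00:00:00.000-01:00`.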

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (PIG-3279) Support nested RANK

2013-05-10 Thread Johnny Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Johnny Zhang reassigned PIG-3279:
-

Assignee: Johnny Zhang

> Support nested RANK
> ---
>
> Key: PIG-3279
> URL: https://issues.apache.org/jira/browse/PIG-3279
> Project: Pig
> Issue Type: Improvement
> Reporter: Gianmarco De Francisci Morales
> Assignee: Johnny Zhang
> Attachments: PIG-3279-1.patch.txt
>
>




[jira] [Commented] (PIG-3307) Refactor physical operators to remove methods parameters that are always null

2013-05-10 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654887#comment-13654887
 ] 

Julien Le Dem commented on PIG-3307:


[~daijy] Also, it most likely won't make any difference performance-wise.

> Refactor physical operators to remove methods parameters that are always null
> -
>
> Key: PIG-3307
> URL: https://issues.apache.org/jira/browse/PIG-3307
> Project: Pig
> Issue Type: Improvement
> Reporter: Julien Le Dem
> Assignee: Julien Le Dem
> Attachments: PIG-3307_0.patch, PIG-3307_1.patch, PIG-3307_2.patch
>
>
> The physical operators are sometimes overly complex. I'm trying to clean up 
> some unnecessary code.
> In particular, there is an array of getNext(*T* v) methods where the value v 
> does not seem to have any importance and is just used to pick the correct method.
> I have started a refactoring for a more readable getNext*T*().



[jira] [Commented] (PIG-3279) Support nested RANK

2013-05-10 Thread Johnny Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654834#comment-13654834
 ] 

Johnny Zhang commented on PIG-3279:
---

The exception seems to happen between 'global rearrange' and 'POPackage'; 
still looking at it.

> Support nested RANK
> ---
>
> Key: PIG-3279
> URL: https://issues.apache.org/jira/browse/PIG-3279
> Project: Pig
> Issue Type: Improvement
> Reporter: Gianmarco De Francisci Morales
> Attachments: PIG-3279-1.patch.txt
>
>




[jira] [Commented] (PIG-3316) Pig failed to interpret DateTime values in some special cases

2013-05-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654814#comment-13654814
 ] 

Xuefu Zhang commented on PIG-3316:
--

Thanks, Santhosh.

Patch committed to trunk.

> Pig failed to interpret DateTime values in some special cases
> -
>
> Key: PIG-3316
> URL: https://issues.apache.org/jira/browse/PIG-3316
> Project: Pig
> Issue Type: Bug
> Components: data, impl
> Affects Versions: 0.11
> Environment: 1970-01-01
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.11
>
> Attachments: PIG-3316.patch
>
>
> For the query
> A = load 'date.txt' as ( f1:int, f2:datetime );
> dump A;
> with input data
> 1,1970-01-01
> 2,1970-01
> pig generates the following output
> (1,1970-01-01T00:00:00.000-01:00)
> (2,1970-01-01T00:00:00.000-01:00)
> which seemingly incorrectly interprets the day or month part as time zone.



[jira] [Updated] (PIG-3316) Pig failed to interpret DateTime values in some special cases

2013-05-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-3316:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Pig failed to interpret DateTime values in some special cases
> -
>
> Key: PIG-3316
> URL: https://issues.apache.org/jira/browse/PIG-3316
> Project: Pig
> Issue Type: Bug
> Components: data, impl
> Affects Versions: 0.11
> Environment: 1970-01-01
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.11
>
> Attachments: PIG-3316.patch
>
>
> For the query
> A = load 'date.txt' as ( f1:int, f2:datetime );
> dump A;
> with input data
> 1,1970-01-01
> 2,1970-01
> pig generates the following output
> (1,1970-01-01T00:00:00.000-01:00)
> (2,1970-01-01T00:00:00.000-01:00)
> which seemingly incorrectly interprets the day or month part as time zone.



[jira] [Commented] (PIG-3297) Avro files with stringType set to String cannot be read by the AvroStorage LoadFunc

2013-05-10 Thread Michael Moss (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13654700#comment-13654700
 ] 

Michael Moss commented on PIG-3297:
---

Niels, I've run into this also (and a similar issue with Hive), and it seems 
that it might be brought on not by the code you patched, but perhaps by the 
avro-1.x.y.jar file itself.

We are serializing strings as avro.java.string and everything was working fine 
on our HDP1.2 (Hortonworks) cluster, but when I upgraded the avro jar that pig 
uses from avro-1.5.3 to avro-1.7.4, I got this exception.

I'm also having this issue on the latest version of CDH4.2 (with Impala1.0) in 
both pig and hive, and the culprit there seems to be the avro-1.7.x.jar that 
they use.

I'm just starting to dig into finding out why, but was hoping you or someone 
here might have some insight.

Thanks.

> Avro files with stringType set to String cannot be read by the AvroStorage 
> LoadFunc
> ---
>
> Key: PIG-3297
> URL: https://issues.apache.org/jira/browse/PIG-3297
> Project: Pig
> Issue Type: Bug
> Components: piggybank
> Affects Versions: 0.11.1
> Reporter: Niels Basjes
> Attachments: PIG-3297-1.patch, test_record.avro
>
>
> When an Avro file is created there exists the option to set the "String Type" 
> to a different class than the default Utf8.
> A very common situation is that the "String Type" is set to the default 
> String class.
> When trying to read such an Avro file in Pig using the AvroStorage LoadFunc 
> from the included piggybank this gives the following Exception:
> Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.avro.util.Utf8
> at 
> org.apache.pig.piggybank.storage.avro.PigAvroDatumReader.readString(PigAvroDatumReader.java:154)
> at 
> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:150)



Re: Review Request: disable optimizations via pig properties

2013-05-10 Thread Travis Crawford

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11032/#review20427
---


Julien raised a good question asking if "set" in the script works because query 
parsing may not have happened yet. He's correct - I did not explicitly test 
that and it doesn't work. Taking a look at how to proceed. It would be ideal if 
individual scripts can disable optimizations.

- Travis Crawford


On May 9, 2013, 9:03 p.m., Travis Crawford wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/11032/
> ---
> 
> (Updated May 9, 2013, 9:03 p.m.)
> 
> 
> Review request for pig, Julien Le Dem, Bill Graham, and Feng Peng.
> 
> 
> Description
> ---
> 
> Update pig to allow disabling optimizations via pig properties. Currently 
> optimizations must be disabled via command-line options. Pig properties can 
> be set in pig.properties, "set" commands in scripts themselves, and 
> command-line -D options.
> 
> The use-case is, for scripts that require certain optimizations to be 
> disabled, allowing the script itself to disable the optimization. Currently 
> whatever runs the script needs to specially handle disabling the optimization 
> for that specific query.
> 
> 
> This addresses bug PIG-3317.
> https://issues.apache.org/jira/browse/PIG-3317
> 
> 
> Diffs
> -
> 
>   src/docs/src/documentation/content/xdocs/perf.xml 108ae7e 
>   src/org/apache/pig/Main.java f97ed9f 
>   src/org/apache/pig/PigConstants.java ea77e97 
>   src/org/apache/pig/newplan/logical/optimizer/LogicalPlanOptimizer.java 
> d26f381 
> 
> Diff: https://reviews.apache.org/r/11032/diff/
> 
> 
> Testing
> ---
> 
> Manually tested on a fully-distributed cluster.
> 
> THIS FAILS:
> PIG_CONF_DIR=/etc/pig/conf ./bin/pig -c query.pig
> 
> THIS WORKS:
> PIG_CONF_DIR=/etc/pig/conf ./bin/pig 
> -Dpig.optimizer.rules.disabled=ColumnMapKeyPrune -c query.pig
> 
> Notice how "-Dpig.optimizer.rules.disabled=ColumnMapKeyPrune" specifies a pig 
> property, which could be in pig.properties, or the script itself.
> 
> 
> Failure message:
> 
> Pig Stack Trace
> ---
> ERROR 2229: Couldn't find matching uid -1 for project (Name: Project Type: 
> bytearray Uid: 97550 Input: 0 Column: 1)
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
> explain alias null
>   at org.apache.pig.PigServer.explain(PigServer.java:1057)
>   at 
> org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:419)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:351)
>   at org.apache.pig.tools.grunt.Grunt.checkScript(Grunt.java:98)
>   at org.apache.pig.Main.run(Main.java:607)
>   at org.apache.pig.Main.main(Main.java:152)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: 
> Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
>   at 
> org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:122)
>   at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:281)
>   at org.apache.pig.PigServer.compilePp(PigServer.java:1380)
>   at org.apache.pig.PigServer.explain(PigServer.java:1042)
>   ... 10 more
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2229: 
> Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 
> 97550 Input: 0 Column: 1)
>   at 
> org.apache.pig.newplan.logical.optimizer.ProjectionPatcher$ProjectionRewriter.visit(ProjectionPatcher.java:91)
>   at 
> org.apache.pig.newplan.logical.expression.ProjectExpression.accept(ProjectExpression.java:207)
>   at 
> org.apache.pig.newplan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:64)
>   at 
> org.apache.pig.newplan.DepthFirstWalker.walk(DepthFirstWalker.java:53)
>   at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
>   at 
> org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:142)
>   at 
> org.apache.pig.newplan.logical.relational.LOInnerLoad.accept(LOInnerLoad.java:128)
>   at 
> org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
>   at 
> org.apache.pig.newplan.logical.optimizer.AllExpressionVisitor.visit(AllExpressionVisitor.java:124)
> 
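
The idea in the review request, that a property such as `pig.optimizer.rules.disabled` names the optimizer rules to skip, can be sketched as follows. This is an illustration only and not the code in the patch: the property name comes from the request, while the comma-separated format, the helper names `disabled_rules` and `active_rules`, and the example rule list are assumptions.

```python
PROPERTY = "pig.optimizer.rules.disabled"  # property name taken from the review request


def disabled_rules(properties):
    """Parse the property into a set of rule names (comma-separated is an assumption)."""
    raw = properties.get(PROPERTY, "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def active_rules(all_rules, properties):
    """Return the rules that remain enabled, preserving their original order."""
    skip = disabled_rules(properties)
    return [rule for rule in all_rules if rule not in skip]
```

Calling `active_rules(["MergeFilter", "ColumnMapKeyPrune"], {PROPERTY: "ColumnMapKeyPrune"})` would leave only `MergeFilter`. Whichever source supplies the property (pig.properties, a -D option, or a "set" command), the value has to be available before the optimizer runs.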

Re: JavaScript UDFs

2013-05-10 Thread Julien Le Dem
Yes, that would be great.
I contributed the JavaScript UDFs. Let me know if you find bugs.


On Thu, May 9, 2013 at 7:58 PM, Russell Jurney  wrote:
> That would be so cool!
>
> Russell Jurney http://datasyndrome.com
>
> On May 9, 2013, at 7:30 PM, Ruan Pethiyagoda  wrote:
>
>> At Hack Reactor, we are setting up a computationally heavy MapReduce job
>> across a 1,000 machine cluster. It is an algorithmic tree traversal
>> expected to run over several hours, comprising over 10 quintillion
>> computations, and occupying at least a few petabytes of storage across
>> HDFS.
>>
>> We plan to run the job using a Java implementation of our functions. The
>> original, however, is in JavaScript, and I noticed that JavaScript UDFs are
>> still considered experimental for want of additional testing. If our
>> project could be of any use in testing edge cases or proving out the
>> JavaScript UDF functionality in Pig, we would be more than happy to help.
>>
>> Cheers,
>>
>> RP


[jira] [Updated] (PIG-3316) Pig failed to interpret DateTime values in some special cases

2013-05-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-3316:
-

Status: Patch Available  (was: In Progress)

> Pig failed to interpret DateTime values in some special cases
> -
>
> Key: PIG-3316
> URL: https://issues.apache.org/jira/browse/PIG-3316
> Project: Pig
> Issue Type: Bug
> Components: data, impl
> Affects Versions: 0.11
> Environment: 1970-01-01
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.11
>
> Attachments: PIG-3316.patch
>
>
> For the query
> A = load 'date.txt' as ( f1:int, f2:datetime );
> dump A;
> with input data
> 1,1970-01-01
> 2,1970-01
> pig generates the following output
> (1,1970-01-01T00:00:00.000-01:00)
> (2,1970-01-01T00:00:00.000-01:00)
> which seemingly incorrectly interprets the day or month part as time zone.



[jira] [Updated] (PIG-3316) Pig failed to interpret DateTime values in some special cases

2013-05-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-3316:
-

Status: In Progress  (was: Patch Available)

> Pig failed to interpret DateTime values in some special cases
> -
>
> Key: PIG-3316
> URL: https://issues.apache.org/jira/browse/PIG-3316
> Project: Pig
> Issue Type: Bug
> Components: data, impl
> Affects Versions: 0.11
> Environment: 1970-01-01
> Reporter: Xuefu Zhang
> Assignee: Xuefu Zhang
> Fix For: 0.11
>
> Attachments: PIG-3316.patch
>
>
> For the query
> A = load 'date.txt' as ( f1:int, f2:datetime );
> dump A;
> with input data
> 1,1970-01-01
> 2,1970-01
> pig generates the following output
> (1,1970-01-01T00:00:00.000-01:00)
> (2,1970-01-01T00:00:00.000-01:00)
> which seemingly incorrectly interprets the day or month part as time zone.



Re: Unit test classpath trouble

2013-05-10 Thread Johnny Zhang
Hi, Andrew:
Does something like "-Dhadoopversion=23" help? E.g., ant clean test
-Dhadoopversion=23 -Dtest.junit.output.format=xml

Johnny


On Fri, May 10, 2013 at 3:39 AM, Andrew Purtell  wrote:

> Please pardon the basic question. I'm building Pig 0.11.2-SNAPSHOT against
> Hadoop 2.0.4. 'ant package' and full cluster tests work fine, but I'm not
> having much luck with running the unit tests, 'ant test-core' or 'ant
> test'. The problem looks to be a MR app classpath issue.
>
> Sometimes: java.lang.NoClassDefFoundError:
> org/apache/hadoop/yarn/client/YarnClientImpl
>
> Sometimes: java.lang.NoClassDefFoundError:
> org/apache/hadoop/mapred/ResourceMgrDelegate
>
> A few Google searches have turned up no useful pointers. Maybe there is
> something simple I am missing? How do you set up for running unit tests on
> your dev boxes?
>
> --
> Best regards,
>
>- Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>


Unit test classpath trouble

2013-05-10 Thread Andrew Purtell
Please pardon the basic question. I'm building Pig 0.11.2-SNAPSHOT against
Hadoop 2.0.4. 'ant package' and full cluster tests work fine, but I'm not
having much luck with running the unit tests, 'ant test-core' or 'ant
test'. The problem looks to be a MR app classpath issue.

Sometimes: java.lang.NoClassDefFoundError:
org/apache/hadoop/yarn/client/YarnClientImpl

Sometimes: java.lang.NoClassDefFoundError:
org/apache/hadoop/mapred/ResourceMgrDelegate

A few Google searches have turned up no useful pointers. Maybe there is
something simple I am missing? How do you set up for running unit tests on
your dev boxes?

-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)