[jira] [Commented] (PIG-3224) Reservoir sampling

2013-04-23 Thread Vicki Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640042#comment-13640042
 ] 

Vicki Fu commented on PIG-3224:
---

Hi Gianmarco,
Please correct me if my understanding is wrong.

The algorithm for reservoir sampling (the non-big-data version) should be:
Assume that you have enough memory to store k elements. Store the first k 
elements in memory in an array. When you receive the nth element (where n > k), 
generate a random integer r between 1 and n, inclusive. If r > k, discard the 
nth element. Otherwise, replace the rth element in the array with the nth 
element. This approach ensures that at any stage your array contains k elements 
that are selected uniformly at random from the input elements received so far.
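
The steps described above can be sketched in Python as follows. This is only an illustration of the single-machine algorithm as stated in this comment, not code from Pig; the function name `reservoir_sample` and its parameters are hypothetical.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Return up to k items chosen uniformly at random from an iterable.

    Hypothetical helper for illustration; not part of Pig.
    """
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill phase: keep the first k elements as they arrive.
            reservoir.append(item)
        else:
            # Draw r uniformly from 1..n (both ends inclusive).
            r = rng.randint(1, n)
            if r <= k:
                # Replace the r-th slot (1-based) with the new element.
                reservoir[r - 1] = item
            # If r > k, the n-th element is discarded.
    return reservoir

sample = reservoir_sample(range(1000), 10)
```

If the input holds fewer than k elements, the sketch simply returns all of them.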

When we need to consider big data, the input data M is split into N blocks on 
different nodes, and we can run the algorithm above on each block in parallel, 
so it should be the same. 
Then it will keep each element will evenly 
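
One point worth checking in the parallel case: if the blocks have different sizes, the per-block reservoirs cannot simply be concatenated or sampled from with equal weight, because elements of small blocks would be over-represented. A minimal sketch of one way to combine two per-block reservoirs is shown below, assuming each block reports how many elements it saw. The function name `merge_reservoirs` and its parameters are hypothetical and are not taken from Pig or from the issue.

```python
import random

def merge_reservoirs(sample_a, n_a, sample_b, n_b, k, rng=random):
    """Merge two per-block reservoirs into one uniform k-sample of the union.

    sample_a / sample_b are reservoirs from blocks that held n_a / n_b
    elements; each must contain min(k, block size) elements.
    Hypothetical helper for illustration only.
    """
    a, b = list(sample_a), list(sample_b)
    # Shuffle so that pop() returns a uniformly random remaining element.
    rng.shuffle(a)
    rng.shuffle(b)
    merged = []
    while len(merged) < k and (n_a + n_b) > 0:
        # Choose the source block in proportion to how many of its
        # original elements have not been accounted for yet.
        if rng.random() * (n_a + n_b) < n_a:
            merged.append(a.pop())
            n_a -= 1
        else:
            merged.append(b.pop())
            n_b -= 1
    return merged
```

More than two blocks can be handled by merging pairwise and passing the combined count (n_a + n_b) along with each merged reservoir.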



reference: http://en.wikipedia.org/wiki/Reservoir_sampling

> Reservoir sampling
> --
>
> Key: PIG-3224
> URL: https://issues.apache.org/jira/browse/PIG-3224
> Project: Pig
>  Issue Type: New Feature
>Reporter: Gianmarco De Francisci Morales
>  Labels: gsoc2013
>
> Implement a reservoir sampling option, or make it the default ( 
> http://en.wikipedia.org/wiki/Reservoir_sampling ) in Pig's SAMPLE operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3221) Bootstrap sampling

2013-04-23 Thread Vicki Fu (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640029#comment-13640029
 ] 

Vicki Fu commented on PIG-3221:
---

Thank you Gianmarco.
The output of the sampling is k sets of resampled data. For small data, with a 
matrix as the input, this could be run in R as follows:
--- R code like the following makes it easy ---
A <- matrix(seq(1,100),10,10)
k <- 10 # 10 bootstrap replicate set
replicate(k, apply(A, 2, sample, replace = TRUE))
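
For readers who do not use R, here is a rough Python equivalent of the snippet above. It mirrors what the R code does (each column is resampled independently, with replacement, k times); the function name `bootstrap_columns` is hypothetical, and this is a sketch rather than a proposal for the Pig implementation.

```python
import random

def bootstrap_columns(matrix, k, rng=random):
    """Return k replicates; each resamples every column with replacement.

    matrix is a list of rows. Hypothetical helper for illustration only.
    """
    n_rows = len(matrix)
    columns = list(zip(*matrix))  # column-wise view of the row-major input
    replicates = []
    for _ in range(k):
        resampled_cols = [
            [rng.choice(col) for _ in range(n_rows)] for col in columns
        ]
        # Transpose back so each replicate has the same shape as the input.
        replicates.append([list(row) for row in zip(*resampled_cols)])
    return replicates

# 10 x 10 matrix holding 1..100, filled column by column like R's matrix()
A = [[col * 10 + row + 1 for col in range(10)] for row in range(10)]
reps = bootstrap_columns(A, k=10)  # 10 bootstrap replicate sets
```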

Yes, you are right, the statistics results can be collected by a UDF.
My plan is to implement bootstrap, reservoir, and stratified sampling, in that 
order, in this project.
Please correct me if my understanding is not right.
Thanks
Vicky


> Bootstrap sampling
> --
>
> Key: PIG-3221
> URL: https://issues.apache.org/jira/browse/PIG-3221
> Project: Pig
>  Issue Type: New Feature
>Reporter: Gianmarco De Francisci Morales
>  Labels: gsoc2013
>
> Implement a bootstrap sampling option ( 
> http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE 
> operator.



[jira] [Commented] (PIG-3290) TestLogicalPlanBuilder.testQuery85 fail in trunk

2013-04-23 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639977#comment-13639977
 ] 

Cheolsoo Park commented on PIG-3290:


Actually, I was totally wrong. I should have run the following script:
{code}
a = load '1.txt';
b = group a by ($0, $1);
c = foreach b generate group.($0, $1);
{code}
In this case, "group.($0, $1)" is named as "group__", which seems fine. Sorry 
that I misunderstood your patch.

+1.

> TestLogicalPlanBuilder.testQuery85 fail in trunk
> 
>
> Key: PIG-3290
> URL: https://issues.apache.org/jira/browse/PIG-3290
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.2
>Reporter: Johnny Zhang
>Assignee: Daniel Dai
> Attachments: PIG-3290-1.patch
>
>
> I can reproduce it locally as well, the exception is
> {noformat}
> junit.framework.AssertionFailedError: 
> org.apache.pig.impl.plan.PlanValidationException: ERROR 1108: 
>  Duplicate schema alias: group
>   at 
> org.apache.pig.test.TestLogicalPlanBuilder.buildPlan(TestLogicalPlanBuilder.java:2211)
>   at 
> org.apache.pig.test.TestLogicalPlanBuilder.testQuery85(TestLogicalPlanBuilder.java:1011)
> {noformat}



[jira] [Commented] (PIG-3290) TestLogicalPlanBuilder.testQuery85 fail in trunk

2013-04-23 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639966#comment-13639966
 ] 

Cheolsoo Park commented on PIG-3290:


Thanks for fixing it, Daniel! Overall looks good to me.

I have a minor question regarding the following code:
{code}
if (subAlias == null) {
    subAlias = "";
}
alias = alias + "_" + subAlias;
{code}
Is it ever possible for subAlias to be null in the dereference expression? I 
actually tried something like the following:
{code}
a = load '1.txt';
b = group a by ($0, $1);
c = foreach b generate group.$0, group.$1, COUNT(a.gpa);
{code}
But this gives me an NPE before I hit these lines of code. So I suppose 
subAlias doesn't need a null check.

However, if there is a case where subAlias is null, won't your code cause an 
alias conflict because multiple columns will be named as "alias_"?

So my questions are:
* Can we get rid of these lines of code altogether?
* If not, shouldn't we append something unique per column when subAlias is null?

Please correct me if I am wrong.
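
To make the concern concrete, here is a small Python sketch that mimics the quoted snippet. It is not Pig code; `dereference_alias` and `find_duplicates` are hypothetical names used only to show that two columns with a null sub-alias would end up with the same name.

```python
def dereference_alias(alias, sub_alias):
    """Mimic the quoted snippet: a null (None) sub-alias becomes ""."""
    if sub_alias is None:
        sub_alias = ""
    return alias + "_" + sub_alias

def find_duplicates(names):
    """Return the sorted names that occur more than once."""
    seen, dupes = set(), set()
    for name in names:
        if name in seen:
            dupes.add(name)
        else:
            seen.add(name)
    return sorted(dupes)

# Two dereferenced columns whose sub-alias is missing collide on "group_".
names = [dereference_alias("group", s) for s in (None, None)]
print(find_duplicates(names))
```

With distinct non-null sub-aliases the generated names differ, so the conflict would arise only in the null case, if that case can occur at all.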




[jira] [Commented] (PIG-3088) Add a builtin udf which removes prefixes

2013-04-23 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639925#comment-13639925
 ] 

Prashant Kommireddi commented on PIG-3088:
--

I agree this might be a special case, but it is extremely common in our 
production scripts.
A lot of our jobs run off data exported from db tables, and a lot of these 
have common fields (Id, CreatedBy, CreatedDate ...). I just wanted to highlight 
that one needs to be careful in making this the default behavior.



> Add a builtin udf which removes prefixes
> 
>
> Key: PIG-3088
> URL: https://issues.apache.org/jira/browse/PIG-3088
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jonathan Coveney
>Assignee: Jonathan Coveney
> Fix For: 0.12
>
> Attachments: PIG-3088-0.patch
>
>
> This is something that I always hear people complaining about. Note that this 
> depends on the FlattenOutput annotation.
> This UDF supports the following.
> {code}
> a = load 'a' as (x1, y1, z1);
> b = load 'a' as (x2, y2, z2);
> c = join a by x1, b by x2;
> describe c;
> --c: {a::x1: bytearray,a::y1: bytearray,a::z1: bytearray,b::x2: 
> bytearray,b::y2: bytearray,b::z2: bytearray}
> d = foreach c generate RemovePrefix(*);
> describe d;
> --d: {x1: bytearray,y1: bytearray,z1: bytearray,x2: bytearray,y2: 
> bytearray,z2: bytearray}
> {code}



[jira] Subscription: PIG patch available

2013-04-23 Thread jira
Issue Subscription
Filter: PIG patch available (29 issues)

Subscriber: pigdaily

Key       Summary
PIG-3291  TestExampleGenerator fails on Windows because of lack of file name escaping
          https://issues.apache.org/jira/browse/PIG-3291
PIG-3286  TestPigContext.testImportList fails in trunk
          https://issues.apache.org/jira/browse/PIG-3286
PIG-3285  Jobs using HBaseStorage fail to ship dependency jars
          https://issues.apache.org/jira/browse/PIG-3285
PIG-3281  Pig version in pig.pom is incorrect in branch-0.11
          https://issues.apache.org/jira/browse/PIG-3281
PIG-3258  Patch to allow MultiStorage to use more than one index to generate output tree
          https://issues.apache.org/jira/browse/PIG-3258
PIG-3257  Add unique identifier UDF
          https://issues.apache.org/jira/browse/PIG-3257
PIG-3247  Piggybank functions to mimic OVER clause in SQL
          https://issues.apache.org/jira/browse/PIG-3247
PIG-3223  AvroStorage does not handle comma separated input paths
          https://issues.apache.org/jira/browse/PIG-3223
PIG-3210  Pig fails to start when it cannot write log to log files
          https://issues.apache.org/jira/browse/PIG-3210
PIG-3199  Expose LogicalPlan via PigServer API
          https://issues.apache.org/jira/browse/PIG-3199
PIG-3169  Remove intermediate data after a job finishes
          https://issues.apache.org/jira/browse/PIG-3169
PIG-3166  Update eclipse .classpath according to ivy library.properties
          https://issues.apache.org/jira/browse/PIG-3166
PIG-3123  Simplify Logical Plans By Removing Unneccessary Identity Projections
          https://issues.apache.org/jira/browse/PIG-3123
PIG-3105  Fix TestJobSubmission unit test failure.
          https://issues.apache.org/jira/browse/PIG-3105
PIG-3088  Add a builtin udf which removes prefixes
          https://issues.apache.org/jira/browse/PIG-3088
PIG-3069  Native Windows Compatibility for Pig E2E Tests and Harness
          https://issues.apache.org/jira/browse/PIG-3069
PIG-3028  testGrunt dev test needs some command filters to run correctly without cygwin
          https://issues.apache.org/jira/browse/PIG-3028
PIG-3026  Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
          https://issues.apache.org/jira/browse/PIG-3026
PIG-3025  TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification
          https://issues.apache.org/jira/browse/PIG-3025
PIG-3024  TestEmptyInputDir unit test - hadoop version detection logic is brittle
          https://issues.apache.org/jira/browse/PIG-3024
PIG-3015  Rewrite of AvroStorage
          https://issues.apache.org/jira/browse/PIG-3015
PIG-3010  Allow UDF's to flatten themselves
          https://issues.apache.org/jira/browse/PIG-3010
PIG-2970  Nested foreach getting incorrect schema when having unrelated inner query
          https://issues.apache.org/jira/browse/PIG-2970
PIG-2959  Add a pig.cmd for Pig to run under Windows
          https://issues.apache.org/jira/browse/PIG-2959
PIG-2955  Fix bunch of Pig e2e tests on Windows
          https://issues.apache.org/jira/browse/PIG-2955
PIG-2873  Converting bin/pig shell script to python
          https://issues.apache.org/jira/browse/PIG-2873
PIG-2641  Create toJSON function for all complex types: tuples, bags and maps
          https://issues.apache.org/jira/browse/PIG-2641
PIG-2248  Pig parser does not detect when a macro name masks a UDF name
          https://issues.apache.org/jira/browse/PIG-2248
PIG-1914  Support load/store JSON data in Pig
          https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Updated] (PIG-3290) TestLogicalPlanBuilder.testQuery85 fail in trunk

2013-04-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3290:


Attachment: PIG-3290-1.patch

Oops, my mistake. Attaching the patch.

> TestLogicalPlanBuilder.testQuery85 fail in trunk
> 
>
> Key: PIG-3290
> URL: https://issues.apache.org/jira/browse/PIG-3290
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.11.2
>Reporter: Johnny Zhang
> Attachments: PIG-3290-1.patch
>
>
> I can reproduce it locally as well, the exception is
> {noformat}
> junit.framework.AssertionFailedError: 
> org.apache.pig.impl.plan.PlanValidationException: ERROR 1108: 
>  Duplicate schema alias: group
>   at 
> org.apache.pig.test.TestLogicalPlanBuilder.buildPlan(TestLogicalPlanBuilder.java:2211)
>   at 
> org.apache.pig.test.TestLogicalPlanBuilder.testQuery85(TestLogicalPlanBuilder.java:1011)
> {noformat}



[jira] [Updated] (PIG-3290) TestLogicalPlanBuilder.testQuery85 fail in trunk

2013-04-23 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-3290:


Assignee: Daniel Dai




[jira] [Commented] (PIG-3088) Add a builtin udf which removes prefixes

2013-04-23 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639872#comment-13639872
 ] 

Russell Jurney commented on PIG-3088:
-

Prashant: that special case is the only time the prefix should be retained. 
Otherwise it is pollution, not helpful.




Re: Pig-GSoC2013

2013-04-23 Thread Daniel Dai
Yes, I think it is a bundle. I know there is already one student interested
in it. Make sure you talk to Gianmarco (g...@gdfm.me) before you apply.

Thanks,
Daniel


On Fri, Apr 19, 2013 at 10:14 AM, Sadari Jayawardena <
sjayawardena...@gmail.com> wrote:

> I am a final year undergraduate in Computer Science & Engineering. I have
> good experience in Java programming and am interested in mathematics and
> statistics. Therefore I think implementing sampling algorithms for Apache
> Pig would be interesting to me.
>
> I would like to know whether all three sampling algorithms have to
> be implemented as one project. Could someone provide me with more details
> regarding this?
>
> Thanks in advance.
>


Re: Build failed in Jenkins: Pig-trunk #1463

2013-04-23 Thread Cheolsoo Park
PIG-3290 is tracking this failure:
https://issues.apache.org/jira/browse/PIG-3290

In addition, PIG-3286 is tracking another failing unit test:
https://issues.apache.org/jira/browse/PIG-3286


On Mon, Apr 22, 2013 at 3:32 PM, Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See 
>
> Changes:
>
> [daijy] PIG-2767: Pig creates wrong schema after dereferencing nested
> tuple fields
>
> --
> [...truncated 38216 lines...]
> [junit] at
> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
> [junit] at
> sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:159)
> [junit] at
> sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
> [junit] at
> org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
> [junit] at java.lang.Thread.run(Thread.java:662)
> [junit]
> [junit] 439752 [IPC Server listener on 56499] INFO
>  org.apache.hadoop.ipc.Server  - Stopping IPC Server listener on 56499
> [junit] 439753
> [org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@baf589] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Exiting DataXceiveServer
> [junit] 439756
> [org.apache.hadoop.hdfs.server.datanode.DataBlockScanner@7ec028] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataBlockScanner  - Exiting
> DataBlockScanner thread.
> [junit] 440083 [DataNode:
> [build/test/data/dfs/data/data1,build/test/data/dfs/data/data2]] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Scheduling block
> blk_-6177855010699382237_1189 file
> build/test/data/dfs/data/data1/current/blk_-6177855010699382237 for deletion
> [junit] 440084 [DataNode:
> [build/test/data/dfs/data/data1,build/test/data/dfs/data/data2]] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Scheduling block
> blk_-2088787686098981270_1184 file
> build/test/data/dfs/data/data2/current/blk_-2088787686098981270 for deletion
> [junit] 440084 [DataNode:
> [build/test/data/dfs/data/data1,build/test/data/dfs/data/data2]] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Scheduling block
> blk_1448845170710403685_1186 file
> build/test/data/dfs/data/data2/current/blk_1448845170710403685 for deletion
> [junit] 440084 [Thread-307] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Deleted block
> blk_-6177855010699382237_1189 at file
> build/test/data/dfs/data/data1/current/blk_-6177855010699382237
> [junit] 440084 [DataNode:
> [build/test/data/dfs/data/data1,build/test/data/dfs/data/data2]] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Scheduling block
> blk_1505983587429670804_1190 file
> build/test/data/dfs/data/data2/current/blk_1505983587429670804 for deletion
> [junit] 440084 [Thread-256] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Deleted block
> blk_-2088787686098981270_1184 at file
> build/test/data/dfs/data/data2/current/blk_-2088787686098981270
> [junit] 440084 [DataNode:
> [build/test/data/dfs/data/data1,build/test/data/dfs/data/data2]] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Scheduling block
> blk_1969082855003938608_1191 file
> build/test/data/dfs/data/data1/current/blk_1969082855003938608 for deletion
> [junit] 440084 [Thread-256] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Deleted block
> blk_1448845170710403685_1186 at file
> build/test/data/dfs/data/data2/current/blk_1448845170710403685
> [junit] 440084 [DataNode:
> [build/test/data/dfs/data/data1,build/test/data/dfs/data/data2]] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Scheduling block
> blk_2600316783254142406_1192 file
> build/test/data/dfs/data/data2/current/blk_2600316783254142406 for deletion
> [junit] 440084 [Thread-307] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Deleted block
> blk_1969082855003938608_1191 at file
> build/test/data/dfs/data/data1/current/blk_1969082855003938608
> [junit] 440084 [Thread-256] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Deleted block
> blk_1505983587429670804_1190 at file
> build/test/data/dfs/data/data2/current/blk_1505983587429670804
> [junit] 440084 [Thread-256] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Deleted block
> blk_2600316783254142406_1192 at file
> build/test/data/dfs/data/data2/current/blk_2600316783254142406
> [junit] 440084 [DataNode:
> [build/test/data/dfs/data/data1,build/test/data/dfs/data/data2]] INFO
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Scheduling block
> blk_3621221455178812140_1185 file
> build/test/data/dfs/data/data1/current/blk_3621221455178812140 for deletion
> [junit] 440084 [DataNode:
> [build/test/data/dfs/data/data1,build/test/data/dfs/data/data2]] WARN
>  org.apache.hadoop.hdfs.server.datanode.DataNode  - Unexpected error trying
> to dele

Build failed in Jenkins: Pig-trunk #1464

2013-04-23 Thread Apache Jenkins Server
See 

Changes:

[gates] PIG-3027 pigTest unit test needs a newline filter for comparisons of 
golden multi-line

--
[...truncated 38867 lines...]
[junit] 465861 [main] WARN  org.apache.hadoop.metrics2.util.MBeans  - 
Hadoop:service=DataNode,name=FSDatasetState-UndefinedStorageId197802805
[junit] javax.management.InstanceNotFoundException: 
Hadoop:service=DataNode,name=FSDatasetState-UndefinedStorageId197802805
[junit] at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
[junit] at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
[junit] at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
[junit] at 
com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:506)
[junit] at 
org.apache.hadoop.metrics2.util.MBeans.unregister(MBeans.java:71)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.FSDataset.shutdown(FSDataset.java:1934)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.DataNode.shutdown(DataNode.java:788)
[junit] at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdownDataNodes(MiniDFSCluster.java:566)
[junit] at 
org.apache.hadoop.hdfs.MiniDFSCluster.shutdown(MiniDFSCluster.java:550)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsClusters(MiniGenericCluster.java:87)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutdownMiniDfsAndMrClusters(MiniGenericCluster.java:77)
[junit] at 
org.apache.pig.test.MiniGenericCluster.shutDown(MiniGenericCluster.java:68)
[junit] at 
org.apache.pig.test.TestStore.oneTimeTearDown(TestStore.java:138)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit] at java.lang.reflect.Method.invoke(Method.java:597)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
[junit] at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:33)
[junit] at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
[junit] at 
junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:38)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:420)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:911)
[junit] at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:768)
[junit] 465862 [main] WARN  
org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService  - 
AsyncDiskService has already shut down.
[junit] Shutting down DataNode 2
[junit] 465863 [main] INFO  org.mortbay.log  - Stopped 
SelectChannelConnector@localhost:0
[junit] 465964 [main] INFO  org.apache.hadoop.ipc.Server  - Stopping server 
on 54622
[junit] 465964 [IPC Server handler 0 on 54622] INFO  
org.apache.hadoop.ipc.Server  - IPC Server handler 0 on 54622: exiting
[junit] 465964 [IPC Server listener on 54622] INFO  
org.apache.hadoop.ipc.Server  - Stopping IPC Server listener on 54622
[junit] 465964 [IPC Server handler 2 on 54622] INFO  
org.apache.hadoop.ipc.Server  - IPC Server handler 2 on 54622: exiting
[junit] 465964 [main] INFO  
org.apache.hadoop.ipc.metrics.RpcInstrumentation  - shut down
[junit] 465964 [IPC Server Responder] INFO  org.apache.hadoop.ipc.Server  - 
Stopping IPC Server Responder
[junit] 465964 [IPC Server handler 1 on 54622] INFO  
org.apache.hadoop.ipc.Server  - IPC Server handler 1 on 54622: exiting
[junit] 465965 
[org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@ac97cc] WARN  
org.apache.hadoop.hdfs.server.datanode.DataNode  - 
DatanodeRegistration(127.0.0.1:52565, 
storageID=DS-191453938-67.195.138.20-52565-1366755910696, infoPort=53716, 
ipcPort=54622):DataXceiveServer:java.nio.channels.AsynchronousCloseException
[junit] at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
[junit] at 
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:159)
[junit] at 
sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
[junit] at 
org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:131)
[junit] at java

[jira] [Updated] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-23 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated PIG-3285:
--

Assignee: Nick Dimiduk
  Status: Patch Available  (was: Open)

> Jobs using HBaseStorage fail to ship dependency jars
> 
>
> Key: PIG-3285
> URL: https://issues.apache.org/jira/browse/PIG-3285
> Project: Pig
>  Issue Type: Bug
>Reporter: Nick Dimiduk
>Assignee: Nick Dimiduk
> Fix For: 0.11.1
>
> Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 
> 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig
>
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.(HbaseObjectWritable.java:266)
>   at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}



[jira] [Updated] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-23 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated PIG-3285:
--

Attachment: 0001-PIG-3285-Add-HBase-dependency-jars.patch

Updated patch smoothes over hbase-0.94 versions.

> Jobs using HBaseStorage fail to ship dependency jars
> 
>
> Key: PIG-3285
> URL: https://issues.apache.org/jira/browse/PIG-3285
> Project: Pig
>  Issue Type: Bug
>Reporter: Nick Dimiduk
> Fix For: 0.11.1
>
> Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 
> 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 1.txt, 2.pig
>
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.(HbaseObjectWritable.java:266)
>   at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}



[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-23 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639532#comment-13639532
 ] 

Nick Dimiduk commented on PIG-3285:
---

Ah, right. This bug was fixed in HBASE-8146, which will go out with 
hbase-0.94.7. I'll update the patch to smooth over both versions.

> Jobs using HBaseStorage fail to ship dependency jars
> 
>
> Key: PIG-3285
> URL: https://issues.apache.org/jira/browse/PIG-3285
> Project: Pig
>  Issue Type: Bug
>Reporter: Nick Dimiduk
> Fix For: 0.11.1
>
> Attachments: 0001-PIG-3285-Add-HBase-dependency-jars.patch, 1.pig, 
> 1.txt, 2.pig
>
>
> Launching a job consuming {{HBaseStorage}} fails out of the box. The user 
> must specify {{-Dpig.additional.jars}} for HBase and all of its dependencies. 
> Exceptions look something like this:
> {noformat}
> 2013-04-19 18:58:39,360 FATAL org.apache.hadoop.mapred.Child: Error running 
> child : java.lang.NoClassDefFoundError: com/google/protobuf/Message
>   at 
> org.apache.hadoop.hbase.io.HbaseObjectWritable.(HbaseObjectWritable.java:266)
>   at org.apache.hadoop.hbase.ipc.Invocation.write(Invocation.java:139)
>   at 
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.sendParam(HBaseClient.java:612)
>   at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:975)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:84)
>   at $Proxy7.getProtocolVersion(Unknown Source)
>   at 
> org.apache.hadoop.hbase.ipc.WritableRpcEngine.getProxy(WritableRpcEngine.java:136)
>   at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:208)
> {noformat}



[jira] [Updated] (PIG-3025) TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification

2013-04-23 Thread David Wannemacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Wannemacher updated PIG-3025:
---

Attachment: PIG-3025.trunk.patch

Ported to trunk and tested on Windows.

> TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script 
> needs simplification
> --
>
> Key: PIG-3025
> URL: https://issues.apache.org/jira/browse/PIG-3025
> Project: Pig
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 0.10.0
>Reporter: John Gordon
>Assignee: John Gordon
> Fix For: 0.12
>
> Attachments: PIG-3025.branch-0.10.1-2.patch, 
> PIG-3025.branch-0.10.1.patch, PIG-3025.trunk.patch
>
>
> The "SimpleEchoStreamingCommand" string, which is an inline perl script, is 
> unnecessarily complicated by escaping nested quote characters on the 
> command-line.  As a result, it ends up unstable across shell implementations 
> and operating systems.
> Considering that perl has qq and can print unquoted values, this seems like 
> it is not needed.



[jira] [Updated] (PIG-3025) TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script needs simplification

2013-04-23 Thread David Wannemacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Wannemacher updated PIG-3025:
---

Affects Version/s: 0.12
   Status: Patch Available  (was: Open)

> TestPruneColumn unit test - SimpleEchoStreamingCommand perl inline script 
> needs simplification
> --
>
> Key: PIG-3025
> URL: https://issues.apache.org/jira/browse/PIG-3025
> Project: Pig
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 0.10.0, 0.12
>Reporter: John Gordon
>Assignee: John Gordon
> Fix For: 0.12
>
> Attachments: PIG-3025.branch-0.10.1-2.patch, 
> PIG-3025.branch-0.10.1.patch, PIG-3025.trunk.patch
>
>
> The "SimpleEchoStreamingCommand" string, which is an inline perl script, is 
> unnecessarily complicated by escaping nested quote characters on the 
> command-line.  As a result, it ends up unstable across shell implementations 
> and operating systems.
> Considering that perl has qq and can print unquoted values, this seems like 
> it is not needed.



[jira] [Updated] (PIG-3291) TestExampleGenerator fails on Windows because of lack of file name escaping

2013-04-23 Thread David Wannemacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Wannemacher updated PIG-3291:
---

Attachment: PIG-3291.trunk.patch

Fixes the problem on Windows; also tested on Linux.

> TestExampleGenerator fails on Windows because of lack of file name escaping
> ---
>
> Key: PIG-3291
> URL: https://issues.apache.org/jira/browse/PIG-3291
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12
> Environment: Windows
>Reporter: David Wannemacher
> Fix For: 0.12
>
> Attachments: PIG-3291.trunk.patch
>
>
> On Windows, all tests fail with an exception like this:
> Testcase: testFilterGroupCountStore took 0.022 sec
>   Caused an ERROR
> Error during parsing.   Unexpected character 'S'
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
> parsing.   Unexpected character 'S'
>   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1669)
>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1607)
>   at org.apache.pig.PigServer.registerQuery(PigServer.java:563)
>   at org.apache.pig.PigServer.registerQuery(PigServer.java:576)
>   at 
> org.apache.pig.test.TestExampleGenerator.testFilterGroupCountStore(TestExampleGenerator.java:394)
> Caused by: Failed to parse:   Unexpected character 'S'
>   at 
> org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:235)
>   at 
> org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:174)
>   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1660)
> Looks like a change in https://issues.apache.org/jira/browse/PIG-2170 caused 
> the file names to stop being escaped properly.



[jira] [Updated] (PIG-3291) TestExampleGenerator fails on Windows because of lack of file name escaping

2013-04-23 Thread David Wannemacher (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Wannemacher updated PIG-3291:
---

Status: Patch Available  (was: Open)




[jira] [Created] (PIG-3291) TestExampleGenerator fails on Windows because of lack of file name escaping

2013-04-23 Thread David Wannemacher (JIRA)
David Wannemacher created PIG-3291:
--

 Summary: TestExampleGenerator fails on Windows because of lack of 
file name escaping
 Key: PIG-3291
 URL: https://issues.apache.org/jira/browse/PIG-3291
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.12
 Environment: Windows
Reporter: David Wannemacher
 Fix For: 0.12


On Windows, all tests fail with an exception like this:
Testcase: testFilterGroupCountStore took 0.022 sec
Caused an ERROR
Error during parsing.   Unexpected character 'S'
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
parsing.   Unexpected character 'S'
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1669)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1607)
at org.apache.pig.PigServer.registerQuery(PigServer.java:563)
at org.apache.pig.PigServer.registerQuery(PigServer.java:576)
at 
org.apache.pig.test.TestExampleGenerator.testFilterGroupCountStore(TestExampleGenerator.java:394)
Caused by: Failed to parse:   Unexpected character 'S'
at 
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:235)
at 
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:174)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1660)

Looks like a change in https://issues.apache.org/jira/browse/PIG-2170 caused 
the file names to stop being escaped properly.
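
A minimal sketch of the kind of escaping meant here, assuming the test builds its Pig Latin statement by concatenating a local path into a single-quoted string. The names PigPathEscaper, escapeForPigString and loadStatement are hypothetical and are not taken from the patch. The idea is that a backslash inside a quoted Pig string starts an escape sequence, so a Windows path such as C:\Users\... may need its backslashes doubled before it is embedded in a query.

```java
final class PigPathEscaper {

    // Hypothetical helper: double every backslash so that a Windows path
    // survives being placed inside a single-quoted Pig Latin string.
    static String escapeForPigString(String path) {
        // String.replace works on literal text, not regular expressions.
        return path.replace("\\", "\\\\");
    }

    // Builds a LOAD statement with the escaped path (illustration only).
    static String loadStatement(String alias, String path) {
        return alias + " = load '" + escapeForPigString(path) + "';";
    }

    public static void main(String[] args) {
        // The path below is an invented example, not one from the test suite.
        System.out.println(loadStatement("A", "C:\\Users\\test\\data.txt"));
    }
}
```

Converting the path to forward slashes before building the query would be another option; the sketch above only shows the escaping variant.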



[jira] [Commented] (PIG-3169) Remove intermediate data after a job finishes

2013-04-23 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639336#comment-13639336
 ] 

Cheolsoo Park commented on PIG-3169:


Let's do option #2! I think it's safe to remove 
{{getTemporaryPath(ElementDescriptor, PigContext)}} in 0.12. Do you mind 
updating the patch by removing this method?

Btw, you have unused imports in MapReduceLauncher.java and FileLocalizer.java:
{code}
> +import java.util.Set;
> +import org.apache.pig.backend.datastorage.ElementDescriptor;

> +import java.util.HashSet;
> +import java.util.Set;
{code}
I am running unit tests with PIG-3169.5.patch now. I will let you know if I see 
any failures.

Thanks!

> Remove intermediate data after a job finishes
> -
>
> Key: PIG-3169
> URL: https://issues.apache.org/jira/browse/PIG-3169
> Project: Pig
>  Issue Type: Improvement
>Reporter: Mark Wagner
>Assignee: Mark Wagner
>Priority: Minor
> Fix For: 0.12
>
> Attachments: PIG-3169.1.patch, PIG-3169.2.patch, PIG-3169.3.patch, 
> PIG-3169.4.patch, PIG-3169.5.patch, PIG-3169-hotfix.patch
>
>
> When using Grunt, intermediate data and distributed cache files are left in 
> 'pig.temp.dir' until the session is closed. It would be nice to clean up files 
> as soon as they are no longer needed.



[jira] [Commented] (PIG-3285) Jobs using HBaseStorage fail to ship dependency jars

2013-04-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639328#comment-13639328
 ] 

Daniel Dai commented on PIG-3285:
-

TableMapReduceUtil.addDependencyJars(job) works for HBase 0.95+. In HBase 0.94, 
a bug prevents it from adding hbase.jar. A side effect of 
addDependencyJars(job) is that it also adds hadoop.jar and pig.jar to tmpjars, 
both of which Pig already takes care of, so I am not sure whether we would 
ship those jars twice if we do this. I would actually prefer a version of 
TableMapReduceUtil.addDependencyJars that adds only 
hbase.jar/guava.jar/protobuf.jar, plus any additional dependencies as HBase 
evolves, but not hadoop.jar or pig.jar.
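
One way to picture the preference above is as a filter over the comma-separated jar list, rather than a change inside HBase itself. This is only a sketch under assumptions: the class DependencyJarFilter, the method withoutAlreadyShipped, the jar paths, and the idea of matching jars by file-name prefix are all hypothetical and are not part of Pig, HBase, or the attached patch.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DependencyJarFilter {

    // Hypothetical helper: takes a comma-separated jar list (the format used
    // by the "tmpjars" setting) and drops duplicates plus any jar whose file
    // name starts with one of the given prefixes (e.g. "hadoop", "pig").
    static String withoutAlreadyShipped(String tmpJars, List<String> shippedPrefixes) {
        Set<String> kept = new LinkedHashSet<>(); // keeps order, removes duplicates
        for (String jar : tmpJars.split(",")) {
            String trimmed = jar.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String fileName = trimmed.substring(trimmed.lastIndexOf('/') + 1);
            boolean alreadyShipped = false;
            for (String prefix : shippedPrefixes) {
                if (fileName.startsWith(prefix)) {
                    alreadyShipped = true;
                    break;
                }
            }
            if (!alreadyShipped) {
                kept.add(trimmed);
            }
        }
        return String.join(",", kept);
    }

    public static void main(String[] args) {
        // Invented jar names, for illustration only.
        String jars = "file:/lib/hbase-0.94.7.jar,file:/lib/hadoop-core-1.0.4.jar,"
                + "file:/lib/pig-0.11.1.jar,file:/lib/guava-11.0.2.jar";
        System.out.println(withoutAlreadyShipped(jars, Arrays.asList("hadoop", "pig")));
    }
}
```

Prefix matching on file names is a crude heuristic; a real change would more likely decide by the class each jar was located for.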




[jira] [Updated] (PIG-3027) pigTest unit test needs a newline filter for comparisons of golden multi-line

2013-04-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-3027:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks John.

> pigTest unit test needs a newline filter for comparisons of golden multi-line
> -
>
> Key: PIG-3027
> URL: https://issues.apache.org/jira/browse/PIG-3027
> Project: Pig
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 0.10.0
>Reporter: John Gordon
>Assignee: John Gordon
> Fix For: 0.12
>
> Attachments: PIG-3027.trunk.1.patch
>
>
> pigTest leverages assertOutput throughout for text file comparisons to golden 
> checked-in baselines.  This method doesn't take into account line ending 
> differences across platforms.
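
A newline filter of the kind described could look roughly like the following sketch. It assumes the comparison works on whole strings; LineEndingNormalizer, normalize and sameIgnoringLineEndings are hypothetical names and are not the code from PIG-3027.trunk.1.patch.

```java
final class LineEndingNormalizer {

    // Convert Windows (\r\n) and old Mac (\r) line endings to \n.
    static String normalize(String text) {
        // Replace \r\n first so that a pair does not become two newlines.
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Compare expected (golden) text and actual output, ignoring only
    // differences in line-ending style.
    static boolean sameIgnoringLineEndings(String expected, String actual) {
        return normalize(expected).equals(normalize(actual));
    }

    public static void main(String[] args) {
        String golden = "a\nb\n";      // baseline checked in with Unix endings
        String actual = "a\r\nb\r\n";  // output produced on Windows
        System.out.println(sameIgnoringLineEndings(golden, actual));
    }
}
```

Normalizing both sides, rather than only the actual output, keeps the comparison independent of how the golden file was checked out.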



[jira] [Created] (PIG-3290) TestLogicalPlanBuilder.testQuery85 fail in trunk

2013-04-23 Thread Johnny Zhang (JIRA)
Johnny Zhang created PIG-3290:
-

 Summary: TestLogicalPlanBuilder.testQuery85 fail in trunk
 Key: PIG-3290
 URL: https://issues.apache.org/jira/browse/PIG-3290
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.2
Reporter: Johnny Zhang


I can reproduce it locally as well. The exception is:
{noformat}
junit.framework.AssertionFailedError: 
org.apache.pig.impl.plan.PlanValidationException: ERROR 1108: 
 Duplicate schema alias: group
at 
org.apache.pig.test.TestLogicalPlanBuilder.buildPlan(TestLogicalPlanBuilder.java:2211)
at 
org.apache.pig.test.TestLogicalPlanBuilder.testQuery85(TestLogicalPlanBuilder.java:1011)
{noformat}
