[jira] Subscription: PIG patch available

2015-04-02 Thread jira
Issue Subscription
Filter: PIG patch available (28 issues)

Subscriber: pigdaily

Key       Summary
PIG-4493  Pig on Tez gives wrong results if Union is followed by Split
          https://issues.apache.org/jira/browse/PIG-4493
PIG-4491  Streaming Python Bytearray Bugs
          https://issues.apache.org/jira/browse/PIG-4491
PIG-4490  MIN/MAX builtin UDFs return wrong results when accumulating for strings
          https://issues.apache.org/jira/browse/PIG-4490
PIG-4481  e2e tests ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and StreamingPerformance_4 produce different result on Windows
          https://issues.apache.org/jira/browse/PIG-4481
PIG-4468  Pig's jackson version conflicts with that of hadoop 2.6.0
          https://issues.apache.org/jira/browse/PIG-4468
PIG-4455  Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
          https://issues.apache.org/jira/browse/PIG-4455
PIG-4452  Embedded SQL using "SQL" instead of "sql" fails with string index out of range: -1 error
          https://issues.apache.org/jira/browse/PIG-4452
PIG-4422  Implement visitMergeJoin in SparkCompiler
          https://issues.apache.org/jira/browse/PIG-4422
PIG-4417  Pig's register command should support automatic fetching of jars from repo.
          https://issues.apache.org/jira/browse/PIG-4417
PIG-4377  Skewed outer join produce wrong result in some cases
          https://issues.apache.org/jira/browse/PIG-4377
PIG-4341  Add CMX support to pig.tmpfilecompression.codec
          https://issues.apache.org/jira/browse/PIG-4341
PIG-4323  PackageConverter hanging in Spark
          https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
          https://issues.apache.org/jira/browse/PIG-4251
PIG-4193  Make collected group work with Spark
          https://issues.apache.org/jira/browse/PIG-4193
PIG-4111  Make Pig compiles with avro-1.7.7
          https://issues.apache.org/jira/browse/PIG-4111
PIG-4004  Upgrade the Pigmix queries from the (old) mapred API to mapreduce
          https://issues.apache.org/jira/browse/PIG-4004
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
          https://issues.apache.org/jira/browse/PIG-3866
PIG-3851  Upgrade jline to 2.11
          https://issues.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
          https://issues.apache.org/jira/browse/PIG-3635
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587
PIG-3294  Allow Pig use Hive UDFs
          https://issues.apache.org/jira/browse/PIG-3294

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384


Build failed in Jenkins: Pig-trunk-commit #2084

2015-04-02 Thread Apache Jenkins Server
See 

Changes:

[rohini] PIG-4487: Pig on Tez gives wrong success message on failure in case of 
multiple outputs (rohini)
PIG-4483: Pig on Tez output statistics shows storing to same directory twice 
for union (rohini)

--
[...truncated 2925 lines...]
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer.schema...
  [javadoc] Loading source files for package org.apache.pig.impl.logicalLayer.validators...
  [javadoc] Loading source files for package org.apache.pig.impl.plan...
  [javadoc] Loading source files for package org.apache.pig.impl.plan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.impl.streaming...
  [javadoc] Loading source files for package org.apache.pig.impl.util...
  [javadoc] Loading source files for package org.apache.pig.impl.util.avro...
  [javadoc] Loading source files for package org.apache.pig.impl.util.orc...
  [javadoc] Loading source files for package org.apache.pig.newplan...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.expression...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.optimizer...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.relational...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.rules...
  [javadoc] Loading source files for package org.apache.pig.newplan.logical.visitor...
  [javadoc] Loading source files for package org.apache.pig.newplan.optimizer...
  [javadoc] Loading source files for package org.apache.pig.parser...
  [javadoc] Loading source files for package org.apache.pig.pen...
  [javadoc] Loading source files for package org.apache.pig.pen.util...
  [javadoc] Loading source files for package org.apache.pig.scripting...
  [javadoc] Loading source files for package org.apache.pig.scripting.groovy...
  [javadoc] Loading source files for package org.apache.pig.scripting.jruby...
  [javadoc] Loading source files for package org.apache.pig.scripting.js...
  [javadoc] Loading source files for package org.apache.pig.scripting.jython...
  [javadoc] Loading source files for package org.apache.pig.scripting.streaming.python...
  [javadoc] Loading source files for package org.apache.pig.tools...
  [javadoc] Loading source files for package org.apache.pig.tools.cmdline...
  [javadoc] Loading source files for package org.apache.pig.tools.counters...
  [javadoc] Loading source files for package org.apache.pig.tools.grunt...
  [javadoc] Loading source files for package org.apache.pig.tools.parameters...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats.mapreduce...
  [javadoc] Loading source files for package org.apache.pig.tools.pigstats.tez...
  [javadoc] Loading source files for package org.apache.pig.tools.streams...
  [javadoc] Loading source files for package org.apache.pig.tools.timer...
  [javadoc] Loading source files for package org.apache.pig.validator...
  [javadoc] Constructing Javadoc information...
  [javadoc] /home/jenkins/.ivy2/cache/org.apache.hbase/hbase-common/jars/hbase-common-0.96.0-hadoop2.jar(org/apache/hadoop/hbase/io/ImmutableBytesWritable.class): warning: Cannot find annotation method 'value()' in type 'SuppressWarnings': class file for edu.umd.cs.findbugs.annotations.SuppressWarnings not found
  [javadoc] /home/jenkins/.ivy2/cache/org.apache.hbase/hbase-common/jars/hbase-common-0.96.0-hadoop2.jar(org/apache/hadoop/hbase/io/ImmutableBytesWritable.class): warning: Cannot find annotation method 'justification()' in type 'SuppressWarnings'
  [javadoc] Standard Doclet version 1.7.0_65
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
:96:
 warning - Tag @see:illegal character: "123" in "{@link 
EvalFunc#getSchemaType()}"
  [javadoc] 
:96:
 warning - Tag @see:illegal character: "64" in "{@link 
EvalFunc#getSchemaType()}"
  [javadoc] 
:295:
 warning - Tag @link: reference not found: FuncUtils
  [javadoc] 
:96:
 warning - Tag @see: reference not found: {@link EvalFunc#getSchemaType()}
  [javadoc] 
:90:
 warning - @return tag has no arguments.
  [javadoc] 


[jira] [Updated] (PIG-4496) Fix CBZip2InputStream to close underlying stream

2015-04-02 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated PIG-4496:
---
Description: CBZip2InputStream doesn't close the underlying 
FSDataInputStream when it is itself closed. However, users such as 
BZip2LineRecordReader and XMLLoader assume CBZip2InputStream will do so. This 
leaks resources and may cause reading the next split to fail, depending on 
the FileSystem implementation.  (was: CBZip2InputStream doesn't close the 
underlying FSDataInputStream when it is itself closed. Users such as 
BZip2LineRecordReader assume CBZip2InputStream will do so. This leaks 
resources and may cause reading the next split to fail, depending on the 
FileSystem implementation.)

> Fix CBZip2InputStream to close underlying stream
> 
>
> Key: PIG-4496
> URL: https://issues.apache.org/jira/browse/PIG-4496
> Project: Pig
> Issue Type: Bug
> Components: tools
> Affects Versions: 0.12.0, 0.15.0
> Reporter: Peter Slawski
>
> CBZip2InputStream doesn't close the underlying FSDataInputStream when it is 
> itself closed. However, users such as BZip2LineRecordReader and XMLLoader 
> assume CBZip2InputStream will do so. This leaks resources and may cause 
> reading the next split to fail, depending on the FileSystem implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (PIG-4496) Fix CBZip2InputStream to close underlying stream

2015-04-02 Thread Peter Slawski (JIRA)
Peter Slawski created PIG-4496:
--

 Summary: Fix CBZip2InputStream to close underlying stream
 Key: PIG-4496
 URL: https://issues.apache.org/jira/browse/PIG-4496
 Project: Pig
  Issue Type: Bug
  Components: tools
Affects Versions: 0.12.0, 0.15.0
Reporter: Peter Slawski


CBZip2InputStream doesn't close the underlying FSDataInputStream when it is 
itself closed. Users such as BZip2LineRecordReader assume CBZip2InputStream 
will do so. This leaks resources and may cause reading the next split to fail, 
depending on the FileSystem implementation.
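
A minimal sketch of the behavior this issue asks for, not the actual Pig patch: a wrapping stream whose close() also closes the stream it reads from. DecodingStream, TrackedStream and CloseDemo are hypothetical names used only for illustration; they stand in for CBZip2InputStream, FSDataInputStream and a caller such as BZip2LineRecordReader.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a decoding wrapper such as CBZip2InputStream.
final class DecodingStream extends InputStream {
    private final InputStream underlying;

    DecodingStream(InputStream underlying) {
        this.underlying = underlying;
    }

    @Override
    public int read() throws IOException {
        // A real decoder would decompress here; this sketch passes bytes through.
        return underlying.read();
    }

    @Override
    public void close() throws IOException {
        // Propagate close() so callers holding only the wrapper do not leak the source.
        underlying.close();
    }
}

// Hypothetical source stream that records whether close() was called on it.
final class TrackedStream extends ByteArrayInputStream {
    boolean closed = false;

    TrackedStream(byte[] data) {
        super(data);
    }

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

final class CloseDemo {
    // Returns true if closing the wrapper also closed the wrapped stream.
    static boolean wrapperClosesSource(byte[] data) {
        TrackedStream source = new TrackedStream(data);
        try (DecodingStream in = new DecodingStream(source)) {
            in.read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.closed;
    }
}
```

A caller that only ever closes the wrapper, as the record readers named above are said to do, releases the source stream as well under this design.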





[jira] [Commented] (PIG-3294) Allow Pig use Hive UDFs

2015-04-02 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393792#comment-14393792
 ] 

Daniel Dai commented on PIG-3294:
-

bq. The checking in of Hive code is ugly. We need to make sure that gets 
removed before a release so we don't end up forking.
Sure, the duplication is ugly. I already added a comment in the code: "Will 
remove once we switch to use Hive 1.2.0". I expect to remove those classes in 
the next release. However, in this release, I don't want to create a dependency 
on Hive 1.2.0. That will complicate the release process. Those classes are 
simple enough and are unlikely to cause trouble for a short while.

bq. In POForEach you are visiting the physical plan at run time to determine if 
we need the last record
Yes, I can cache the flag and not do this in the backend. Will update patch 
shortly.

bq. HiveUtils.java: much of this code to convert Hive types to Pig types must 
already be in HCat. Is it not possible to re-use that?
Sure, we can consolidate this code. Since Pig doesn't depend on HCat, but HCat 
depends on Pig, I guess we should rework HCat to use the same code to do the 
type conversion.

> Allow Pig use Hive UDFs
> ---
>
> Key: PIG-3294
> URL: https://issues.apache.org/jira/browse/PIG-3294
> Project: Pig
> Issue Type: New Feature
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Labels: gsoc2013, java
> Fix For: 0.15.0
>
> Attachments: PIG-3294-1.patch, PIG-3294-2.patch, PIG-3294-3.patch, 
> PIG-3294-4.patch, PIG-3294-before-refactory.patch
>
>
> It would be nice if Pig provided some interoperability with Hive. We can wrap 
> Hive UDFs in Pig so we can use Hive UDFs in Pig.
> This is a candidate project for Google Summer of Code 2013. More information 
> about the program can be found at 
> https://cwiki.apache.org/confluence/display/PIG/GSoc2013





[jira] [Updated] (PIG-4493) Pig on Tez gives wrong results if Union is followed by Split

2015-04-02 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4493:

Description: POSplit subplans were not being cloned when the plans were 
cloned for pushing up in UnionOptimizer. This caused the input plans to be 
detached for one of them, which then processed and produced 0 records.

> Pig on Tez gives wrong results if Union is followed by Split
> 
>
> Key: PIG-4493
> URL: https://issues.apache.org/jira/browse/PIG-4493
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4493-1.patch, PIG-4493-2.patch
>
>
> POSplit subplans were not being cloned when the plans were cloned for pushing 
> up in UnionOptimizer. This caused the input plans to be detached for one of 
> them, which then processed and produced 0 records.
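
To illustrate the kind of bug described above, here is a small sketch of the difference between copying a plan node shallowly and cloning its nested sub-plans as well. PlanNode is a hypothetical class invented for this example; it is not Pig's POSplit or PhysicalPlan API, and the actual patch may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plan node that owns a list of nested sub-plans.
final class PlanNode {
    final String name;
    final List<PlanNode> subPlans = new ArrayList<>();

    PlanNode(String name) {
        this.name = name;
    }

    // Shallow copy: the copy shares the same sub-plan objects as the original,
    // so rewiring a sub-plan through one copy also affects the other.
    PlanNode shallowCopy() {
        PlanNode copy = new PlanNode(name);
        copy.subPlans.addAll(subPlans);
        return copy;
    }

    // Deep copy: every nested sub-plan is cloned recursively,
    // so each copy can be attached and detached independently.
    PlanNode deepCopy() {
        PlanNode copy = new PlanNode(name);
        for (PlanNode sub : subPlans) {
            copy.subPlans.add(sub.deepCopy());
        }
        return copy;
    }
}
```

With the shallow copy, two cloned plans end up pointing at one shared sub-plan, which matches the symptom in the description where one side loses its input.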





[jira] [Updated] (PIG-4493) Pig on Tez gives wrong results if Union is followed by Split

2015-04-02 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4493:

Attachment: PIG-4493-2.patch

Updated patch to fix TestTezCompiler test failure. Also added clone to POFilter 
just in case, even though there were no test failures, as the same operators in 
the expression were being reused in the different plans. 







[jira] [Commented] (PIG-4495) Better multi-query planning in case of union and multiple edges

2015-04-02 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393142#comment-14393142
 ] 

Rohini Palaniswamy commented on PIG-4495:
-

[~daijy],
Would it be OK to do this in MultiQueryOptimizer itself, by checking whether 
the union optimizer is turned on and the successor vertex is a union, or should 
we write another optimizer that runs after UnionOptimizer? It is easier to do 
in MultiQueryOptimizer and would be less error-prone. 

> Better multi-query planning in case of union and multiple edges
> ---
>
> Key: PIG-4495
> URL: https://issues.apache.org/jira/browse/PIG-4495
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Affects Versions: 0.14.0
> Reporter: Rohini Palaniswamy
> Fix For: 0.15.0
>
>
> Details in 
> https://issues.apache.org/jira/browse/TEZ-1190?focusedCommentId=14393033&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14393033
> People split the data, perform some foreach transformations or filters, union 
> the results, and then do some operation like a group by or a join with other 
> data. In those cases the plan creates multiple edges from the same Split, so 
> we do not merge them; instead we write the data out to another dummy vertex 
> to avoid multiple edges, which adds overhead and hurts performance. Vertex 
> groups accept multiple edges from the same vertex. So if the multiple edges 
> end up in a vertex group (and not in a vertex, which is the case in a self 
> join), we can avoid the dummy vertex.





[jira] [Created] (PIG-4495) Better multi-query planning in case of union and multiple edges

2015-04-02 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-4495:
---

 Summary: Better multi-query planning in case of union and multiple 
edges
 Key: PIG-4495
 URL: https://issues.apache.org/jira/browse/PIG-4495
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.14.0
Reporter: Rohini Palaniswamy


Details in 
https://issues.apache.org/jira/browse/TEZ-1190?focusedCommentId=14393033&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14393033

People split the data, perform some foreach transformations or filters, union 
the results, and then do some operation like a group by or a join with other 
data. In those cases the plan creates multiple edges from the same Split, so we 
do not merge them; instead we write the data out to another dummy vertex to 
avoid multiple edges, which adds overhead and hurts performance. Vertex groups 
accept multiple edges from the same vertex. So if the multiple edges end up in 
a vertex group (and not in a vertex, which is the case in a self join), we can 
avoid the dummy vertex.





[jira] [Commented] (PIG-3294) Allow Pig use Hive UDFs

2015-04-02 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393057#comment-14393057
 ] 

Alan Gates commented on PIG-3294:
-

The checking in of Hive code is ugly.  We need to make sure that gets removed 
before a release so we don't end up forking.

In POForEach you are visiting the physical plan at run time to determine if we 
need the last record.  Could this not be done at compile time to save time at 
runtime?

HiveUtils.java: much of this code to convert Hive types to Pig types must 
already be in HCat.  Is it not possible to re-use that?






[jira] [Comment Edited] (PIG-4490) MIN/MAX builtin UDFs return wrong results when accumulating for strings

2015-04-02 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393006#comment-14393006
 ] 

Rohini Palaniswamy edited comment on PIG-4490 at 4/2/15 5:34 PM:
-

This is bad. Can you please add one testcase in TestBuiltin that covers both 
UDFs for the accumulator case?


was (Author: rohini):
This is bad. Can you please add a testcase in TestBuiltin for these two UDFs 
for the case of accumulator?

> MIN/MAX builtin UDFs return wrong results when accumulating for strings
> ---
>
> Key: PIG-4490
> URL: https://issues.apache.org/jira/browse/PIG-4490
> Project: Pig
> Issue Type: Bug
> Components: internal-udfs
> Affects Versions: 0.12.0, 0.13.0, 0.14.0
> Reporter: xplenty
> Attachments: fix-min-max.patch
>
>
> When using MIN/MAX UDFs with strings in a job that uses the accumulator 
> interface, the results are wrong: the UDF won't return the correct MIN/MAX 
> value.
> This is caused by a reversed greater-than/less-than (<>) sign in the 
> accumulate() function of both the StringMin and StringMax UDFs.
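
As a rough illustration of the comparison direction at stake, the sketch below keeps a running maximum and minimum of strings across several accumulate() calls. StringExtremes is a hypothetical class written for this example; it does not implement Pig's Accumulator interface and is not the code from fix-min-max.patch.

```java
import java.util.List;

// Hypothetical accumulator that tracks the running max and min of strings
// across batches, one accumulate() call per batch.
final class StringExtremes {
    private String max = null;
    private String min = null;

    void accumulate(List<String> batch) {
        for (String s : batch) {
            if (s == null) {
                continue; // nulls are skipped in this sketch
            }
            // s replaces the maximum only when it compares GREATER than it.
            if (max == null || s.compareTo(max) > 0) {
                max = s;
            }
            // s replaces the minimum only when it compares LESS than it.
            if (min == null || s.compareTo(min) < 0) {
                min = s;
            }
        }
    }

    String getMax() {
        return max;
    }

    String getMin() {
        return min;
    }
}
```

Swapping the two comparison operators would make getMax() return the smallest string seen and getMin() the largest, which is the kind of wrong result the report describes.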





[jira] [Commented] (PIG-4490) MIN/MAX builtin UDFs return wrong results when accumulating for strings

2015-04-02 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14393006#comment-14393006
 ] 

Rohini Palaniswamy commented on PIG-4490:
-

This is bad. Can you please add a testcase in TestBuiltin for these two UDFs 
for the case of accumulator?




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (PIG-4483) Pig on Tez output statistics shows storing to same directory twice for union

2015-04-02 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4483:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the review Daniel.

> Pig on Tez output statistics shows storing to same directory twice for union
> 
>
> Key: PIG-4483
> URL: https://issues.apache.org/jira/browse/PIG-4483
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.14.0
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4483-1.patch
>
>
> For the below script
> A = LOAD 'data1';
> B = LOAD 'data2';
> C = UNION A, B;
> STORE C into 'data3';
> Output message is shown as below due to vertex group and storing from 
> separate vertices.
> Successfully stored 10 records (xxx bytes) in: "data3"
> Successfully stored 20 records (yyy bytes) in: "data3"
> Even though it is correct, it can be confusing for users, who have to sum 
> it up before comparing to the Pig on MR output message. OutputStats with the 
> same filename should be combined and shown as
> Successfully stored 30 records (xxx bytes) in: "data3"
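
The combining step described above can be sketched as summing record counts per output location. OutputSummary and its combine method are hypothetical names for this example only; they are not Pig's OutputStats API and not the committed PIG-4483 patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class OutputSummary {
    // Sums record counts per output location, keeping first-seen order.
    // locations[i] and records[i] describe one per-vertex store result.
    static Map<String, Long> combine(String[] locations, long[] records) {
        if (locations.length != records.length) {
            throw new IllegalArgumentException("locations and records must have the same length");
        }
        Map<String, Long> totals = new LinkedHashMap<>();
        for (int i = 0; i < locations.length; i++) {
            // merge() inserts the count for a new location or adds to the existing total.
            totals.merge(locations[i], records[i], Long::sum);
        }
        return totals;
    }
}
```

For the script above, the two per-vertex results for "data3" (10 and 20 records) would collapse into a single entry of 30 records.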


