[jira] Subscription: PIG patch available

2016-03-10 Thread jira
Issue Subscription
Filter: PIG patch available (30 issues)

Subscriber: pigdaily

Key         Summary
PIG-4796    Authenticate with Kerberos using a keytab file
            https://issues.apache.org/jira/browse/PIG-4796
PIG-4788    the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
            https://issues.apache.org/jira/browse/PIG-4788
PIG-4745    DataBag should protect content of passed list of tuples
            https://issues.apache.org/jira/browse/PIG-4745
PIG-4734    TOMAP schema inferring breaks some scripts in type checking for bincond
            https://issues.apache.org/jira/browse/PIG-4734
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4641    Print the instance of Object without using toString()
            https://issues.apache.org/jira/browse/PIG-4641
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4581    thread safe issue in NodeIdGenerator
            https://issues.apache.org/jira/browse/PIG-4581
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4526    Make setting up the build environment easier
            https://issues.apache.org/jira/browse/PIG-4526
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3906    ant site errors out
            https://issues.apache.org/jira/browse/PIG-3906
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3851    Upgrade jline to 2.11
            https://issues.apache.org/jira/browse/PIG-3851
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384


[jira] [Assigned] (PIG-4837) TestNativeMapReduce test fix

2016-03-10 Thread Xianda Ke (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianda Ke reassigned PIG-4837:
--

Assignee: Xianda Ke

> TestNativeMapReduce test fix
> 
>
>         Key: PIG-4837
>         URL: https://issues.apache.org/jira/browse/PIG-4837
>     Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>    Reporter: liyunzhang_intel
>    Assignee: Xianda Ke
>     Fix For: spark-branch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4834) Left Outer Skewed Join produces incorrect results

2016-03-10 Thread Nathan Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189920#comment-15189920
 ] 

Nathan Smith commented on PIG-4834:
---

The head of 0.15.1 branch works as expected. Thanks.

> Left Outer Skewed Join produces incorrect results
> -
>
>              Key: PIG-4834
>              URL: https://issues.apache.org/jira/browse/PIG-4834
>          Project: Pig
>       Issue Type: Bug
> Affects Versions: 0.15.0
>      Environment: HDP 2.3.2
>                   Pig 0.15.0.2.3.2.0-2950
>                   5 node cluster (2 name, 3 data)
>         Reporter: Nathan Smith
>      Attachments: non-skewed-version.png, skewed-version.png
>
>
> I've been working on a Pig script to join some datasets recently and I think 
> I found a bug in Left Outer Join using "skewed". In an attempt to speed up 
> what seemed to be some joins on skewed data I used the 'skewed' keyword, but 
> the skewed version produced a different number of results. The dataflow is 
> quite large, but I've isolated the jobs where the results start to differ.
> Non-skewed version:
> * 36 map tasks
> * 5 reduce tasks
> * shortest reducer: 46sec
> * longest reducer: 7min, 9sec
> * input records: 16,903,866
> * output records: 16,891,935
> {code}
> out = JOIN leftrel BY prevrel::f1 LEFT OUTER, rightrel BY f1;
> {code}
> Skewed version:
> * 36 map tasks
> * 5 reduce tasks
> * shortest reducer: 1min, 34sec
> * longest reducer: 2min, 15sec
> * input records: 16,903,866
> * output records: 7,916,768
> {code}
> out = JOIN leftrel BY prevrel::f1 LEFT OUTER, rightrel BY f1 USING 'skewed';
> {code}
> The two scripts are identical except that each join in the second one adds 
> {{USING 'skewed'}}. My understanding is that using "skewed" should produce 
> the same results; the only difference is a preliminary scan that determines 
> the best reducer distribution scheme.
> See attached for screenshots of the counters page for both versions.
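
The expectation stated in the report (same rows with or without 'skewed') can be illustrated with a small model. The following is only a sketch in Python, not Pig's implementation: it assumes a simplified skewed-join scheme in which left rows of a hot key are spread over all reducers and the matching right rows are replicated to each of them. The function names, the `hot_keys` parameter, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def left_outer_join(left, right, key=0, right_width=2):
    """Reference left outer join: every left row appears at least once."""
    by_key = defaultdict(list)
    for r in right:
        if r[key] is not None:          # null keys never match in a join
            by_key[r[key]].append(r)
    out = []
    for l in left:
        matches = by_key.get(l[key]) if l[key] is not None else None
        if matches:
            out.extend(l + r for r in matches)
        else:
            out.append(l + (None,) * right_width)   # pad unmatched left row
    return out

def skewed_left_outer_join(left, right, hot_keys, reducers=3, key=0, right_width=2):
    """Simplified model: split hot-key left rows, replicate hot-key right rows."""
    left_parts = [[] for _ in range(reducers)]
    right_parts = [[] for _ in range(reducers)]
    next_slot = 0
    for l in left:
        if l[key] in hot_keys:
            left_parts[next_slot % reducers].append(l)   # round-robin the hot key
            next_slot += 1
        else:
            left_parts[hash(l[key]) % reducers].append(l)
    for r in right:
        if r[key] in hot_keys:
            for part in right_parts:                     # replicate to every reducer
                part.append(r)
        else:
            right_parts[hash(r[key]) % reducers].append(r)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(left_outer_join(lp, rp, key, right_width))
    return out

# Hypothetical sample data: key 1 is the skewed key.
LEFT = [(1, "a"), (1, "b"), (1, "c"), (2, "d"), (3, "e"), (None, "f")]
RIGHT = [(1, "x"), (1, "y"), (2, "z"), (None, "w")]

reference = left_outer_join(LEFT, RIGHT)
skewed = skewed_left_outer_join(LEFT, RIGHT, hot_keys={1})
assert Counter(reference) == Counter(skewed)   # same multiset of rows
assert len(skewed) >= len(LEFT)                # a left outer join never drops left rows
```

Under this model the partitioning strategy changes only where rows are joined, not which rows come out, so a drop in output records such as the one reported would point to a defect rather than to expected behavior.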





[jira] [Created] (PIG-4840) Do not turn off UnionOptimizer for unsupported storefuncs in case of no vertex groups

2016-03-10 Thread Rohini Palaniswamy (JIRA)
Rohini Palaniswamy created PIG-4840:
---

     Summary: Do not turn off UnionOptimizer for unsupported storefuncs in case of no vertex groups
         Key: PIG-4840
         URL: https://issues.apache.org/jira/browse/PIG-4840
     Project: Pig
  Issue Type: Improvement
    Reporter: Rohini Palaniswamy
    Assignee: Rohini Palaniswamy
     Fix For: 0.16.0


We turn off UnionOptimizer for unsupported storefuncs because writing from two 
vertices may overwrite data. However, when there is only one unique union 
member, we do not create vertex groups and instead merge the union operators 
into the Split vertex, so UnionOptimizer can stay on in that case.
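
The rule described above can be written down as a small decision function. This is a minimal sketch in Python of the reasoning only; the function name and both parameters are hypothetical and do not correspond to Pig's actual classes or configuration.

```python
def can_use_union_optimizer(storefunc_supported: bool, unique_union_members: int) -> bool:
    """Hypothetical helper modelling the rule in this issue, not Pig's API."""
    if storefunc_supported:
        # Supported storefuncs can be written from several vertices safely.
        return True
    # Unsupported storefunc: two writing vertices may overwrite each other.
    # With a single unique union member no vertex group is created, so
    # only one vertex writes and the optimization stays safe.
    return unique_union_members == 1

print(can_use_union_optimizer(False, 1))
print(can_use_union_optimizer(False, 2))
```

The second argument stands in for "number of unique union members"; how that count would be obtained from the real plan is outside this sketch.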





[jira] [Commented] (PIG-4834) Left Outer Skewed Join produces incorrect results

2016-03-10 Thread Nathan Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189588#comment-15189588
 ] 

Nathan Smith commented on PIG-4834:
---

Thanks for noting that, I'll try a newer version when I get the chance.






[jira] [Updated] (PIG-4836) Fix TestEvalPipeline test failure

2016-03-10 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-4836:
-
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to Spark branch. Thanks, Pallavi!

> Fix TestEvalPipeline test failure
> -
>
>         Key: PIG-4836
>         URL: https://issues.apache.org/jira/browse/PIG-4836
>     Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>    Reporter: Pallavi Rao
>    Assignee: Pallavi Rao
>      Labels: spork
>     Fix For: spark-branch
>
> Attachments: PIG-4836.patch
>
>
> There are two test failures:
> testMapUDF
> testLimit 
> testLimit will get fixed by PIG-4832. This JIRA will only fix testMapUDF.





[jira] [Commented] (PIG-4836) Fix TestEvalPipeline test failure

2016-03-10 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189445#comment-15189445
 ] 

Mohit Sabharwal commented on PIG-4836:
--

[~xuefuz], please commit when you get a chance.






[jira] [Commented] (PIG-4836) Fix TestEvalPipeline test failure

2016-03-10 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189224#comment-15189224
 ] 

Mohit Sabharwal commented on PIG-4836:
--

+1 
Thanks, [~pallavi.rao]. 







[jira] [Created] (PIG-4839) MultiQueryOptimizerSpark doesn't remove all redundant nodes in spark plan

2016-03-10 Thread liyunzhang_intel (JIRA)
liyunzhang_intel created PIG-4839:
-

     Summary: MultiQueryOptimizerSpark doesn't remove all redundant nodes in spark plan
         Key: PIG-4839
         URL: https://issues.apache.org/jira/browse/PIG-4839
     Project: Pig
  Issue Type: Sub-task
    Reporter: liyunzhang_intel
    Assignee: liyunzhang_intel


TestMultiQueryBasic#testMultiQueryWithFJ_2
{code}
a = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
b = load './passwd' using PigStorage(':') as (uname:chararray, passwd:chararray, uid:int, gid:int);
c = filter a by uid > 5;
store c into './multiQueryFJ.output';
d = filter b by gid > 10;
store d into './multiQueryFJ.output.2';
e = join c by gid, d by gid using 'repl';
store e into './multiQueryFJ.output.3';
{code}

The spark plan:
{code}
before multiquery optimization:
scope-57->scope-60 scope-66
scope-60
scope-61->scope-64 scope-68
scope-64
scope-66
scope-68->scope-66
#--
# Spark Plan 
#--

Spark node scope-61
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1814908586:org.apache.pig.impl.io.InterStorage) - scope-62
|
|---d: Filter[bag] - scope-36
|   |
|   Greater Than[boolean] - scope-39
|   |
|   |---Project[int][3] - scope-37
|   |
|   |---Constant(10) - scope-38
|
|---b: New For Each(false,false,false,false)[bag] - scope-35
|   |
|   Cast[chararray] - scope-24
|   |
|   |---Project[bytearray][0] - scope-23
|   |
|   Cast[chararray] - scope-27
|   |
|   |---Project[bytearray][1] - scope-26
|   |
|   Cast[int] - scope-30
|   |
|   |---Project[bytearray][2] - scope-29
|   |
|   Cast[int] - scope-33
|   |
|   |---Project[bytearray][3] - scope-32
|
|---b: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-22

Spark node scope-64
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.2:org.apache.pig.builtin.PigStorage) - scope-43
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1814908586:org.apache.pig.impl.io.InterStorage) - scope-63

Spark node scope-68
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1233897062:org.apache.pig.impl.io.InterStorage) - scope-69
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1814908586:org.apache.pig.impl.io.InterStorage) - scope-67

Spark node scope-66
e: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.3:org.apache.pig.builtin.PigStorage) - scope-56
|
|---e: FRJoin[tuple] - scope-50
|   |
|   Project[int][3] - scope-48
|   |
|   Project[int][3] - scope-49
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage) - scope-65

Spark node scope-57
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage) - scope-58
|
|---c: Filter[bag] - scope-14
|   |
|   Greater Than[boolean] - scope-17
|   |
|   |---Project[int][2] - scope-15
|   |
|   |---Constant(5) - scope-16
|
|---a: New For Each(false,false,false,false)[bag] - scope-13
|   |
|   Cast[chararray] - scope-2
|   |
|   |---Project[bytearray][0] - scope-1
|   |
|   Cast[chararray] - scope-5
|   |
|   |---Project[bytearray][1] - scope-4
|   |
|   Cast[int] - scope-8
|   |
|   |---Project[bytearray][2] - scope-7
|   |
|   Cast[int] - scope-11
|   |
|   |---Project[bytearray][3] - scope-10
|
|---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0

Spark node scope-60
c: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output:org.apache.pig.builtin.PigStorage) - scope-21
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage) - scope-59
{code}
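
The adjacency lines at the top of the dump above (a node, then `->` and its successors) are enough to reason about which nodes are merge candidates. The following Python sketch parses that listing and flags leaf nodes that have exactly one predecessor. The selection rule is an assumed heuristic for illustration only, not the logic of MultiQueryOptimizerSpark, and `parse_plan_edges` is a hypothetical helper.

```python
from collections import defaultdict

def parse_plan_edges(text):
    """Parse 'node->succ1 succ2' lines into {node: [successors]}."""
    graph = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        node, _, successors = line.partition("->")
        graph[node.strip()] = successors.split()
    return graph

# Edge listing copied from the plan before multiquery optimization.
BEFORE = """
scope-57->scope-60 scope-66
scope-60
scope-61->scope-64 scope-68
scope-64
scope-66
scope-68->scope-66
"""

graph = parse_plan_edges(BEFORE)

# Invert the edges to find each node's predecessors.
predecessors = defaultdict(list)
for node, successors in graph.items():
    for succ in successors:
        predecessors[succ].append(node)

# Assumed heuristic: a leaf with a single predecessor only re-reads that
# predecessor's temp output, so it could be folded into it.
merge_candidates = sorted(
    node for node, successors in graph.items()
    if not successors and len(predecessors[node]) == 1
)

print(len(graph), merge_candidates)
```

On this listing the heuristic picks scope-60 and scope-64, the two nodes that only load a temp file and store it to the final output.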

After Spark multiquery optimization, the 6 Spark nodes are reduced to 4.
scope-60 should have been merged into scope-57, but it was not.
{code}
scope-57->scope-60 scope-66 
scope-60
scope-61->scope-66
scope-66
#--
# Spark Plan 
#--

Spark node scope-61
Split - scope-70
|   |
|   d: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.2:org.apache.pig.builtin.PigStorage) - scope-43
|   |
|   Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1233897062:org.apache.pig.impl.io.InterStorage) - scope-69
|
|---d: Filter[bag] - scope-36
|