[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (30 issues)

Subscriber: pigdaily

Key         Summary
PIG-4796    Authenticate with Kerberos using a keytab file
            https://issues.apache.org/jira/browse/PIG-4796
PIG-4788    the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
            https://issues.apache.org/jira/browse/PIG-4788
PIG-4745    DataBag should protect content of passed list of tuples
            https://issues.apache.org/jira/browse/PIG-4745
PIG-4734    TOMAP schema inferring breaks some scripts in type checking for bincond
            https://issues.apache.org/jira/browse/PIG-4734
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4641    Print the instance of Object without using toString()
            https://issues.apache.org/jira/browse/PIG-4641
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4581    thread safe issue in NodeIdGenerator
            https://issues.apache.org/jira/browse/PIG-4581
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4526    Make setting up the build environment easier
            https://issues.apache.org/jira/browse/PIG-4526
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3906    ant site errors out
            https://issues.apache.org/jira/browse/PIG-3906
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3851    Upgrade jline to 2.11
            https://issues.apache.org/jira/browse/PIG-3851
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384
[jira] [Assigned] (PIG-4837) TestNativeMapReduce test fix
[ https://issues.apache.org/jira/browse/PIG-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xianda Ke reassigned PIG-4837:

    Assignee: Xianda Ke

> TestNativeMapReduce test fix
>
> Key: PIG-4837
> URL: https://issues.apache.org/jira/browse/PIG-4837
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: liyunzhang_intel
> Assignee: Xianda Ke
> Fix For: spark-branch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PIG-4834) Left Outer Skewed Join produces incorrect results
[ https://issues.apache.org/jira/browse/PIG-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189920#comment-15189920 ]

Nathan Smith commented on PIG-4834:

The head of 0.15.1 branch works as expected. Thanks.

> Left Outer Skewed Join produces incorrect results
>
> Key: PIG-4834
> URL: https://issues.apache.org/jira/browse/PIG-4834
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.15.0
> Environment: HDP 2.3.2
>              Pig 0.15.0.2.3.2.0-2950
>              5 node cluster (2 name, 3 data)
> Reporter: Nathan Smith
> Attachments: non-skewed-version.png, skewed-version.png
>
> I've been working on a Pig script to join some datasets recently and I think
> I found a bug in Left Outer Join using "skewed". In an attempt to speed up
> what seemed to be some joins on skewed data I used the 'skewed' keyword, but
> the skewed version produced a different number of results. The dataflow is
> quite large, but I've isolated the jobs where the results start to differ.
>
> Non-skewed version:
> * 36 map tasks
> * 5 reduce tasks
> * shortest reducer: 46sec
> * longest reducer: 7min, 9sec
> * input records: 16,903,866
> * output records: 16,891,935
> {code}
> out = JOIN leftrel BY prevrel::f1 LEFT OUTER, rightrel BY f1;
> {code}
>
> Skewed version:
> * 36 map tasks
> * 5 reduce tasks
> * shortest reducer: 1min, 34sec
> * longest reducer: 2min, 15sec
> * input records: 16,903,866
> * output records: 7,916,768
> {code}
> out = JOIN leftrel BY prevrel::f1 LEFT OUTER, rightrel BY f1 USING 'skewed';
> {code}
>
> The two scripts are identical except for each join has {{USING 'skewed'}}. My
> understanding is that using "skewed" should produce the same results, except
> that it does a preliminary scan to determine the best reducer distribution
> scheme.
>
> See attached for screenshots of the counters page for both versions.
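The expectation stated in the report, that the skewed and non-skewed joins return the same rows, can be illustrated with a small Python sketch. This is not Pig's implementation: `left_outer_join` and `partitioned_join` are hypothetical helpers, the sample relations are made up, and the round-robin split is only a loose stand-in for spreading left-side rows over several reducers while every reducer sees the matching right-side rows.

```python
from collections import defaultdict

def left_outer_join(left, right, key_index=0):
    """Reference left outer join over lists of tuples (hypothetical helper)."""
    index = defaultdict(list)
    for row in right:
        index[row[key_index]].append(row)
    width = len(right[0]) if right else 0
    out = []
    for row in left:
        matches = index.get(row[key_index])
        if matches:
            # One output row per matching right-side row.
            out.extend(row + m for m in matches)
        else:
            # Unmatched left rows are kept and padded with nulls.
            out.append(row + (None,) * width)
    return out

def partitioned_join(left, right, parts=3):
    """Split left rows over `parts` workers; each worker sees the full right side."""
    out = []
    for p in range(parts):
        out.extend(left_outer_join(left[p::parts], right))
    return out

# Made-up sample data: key 1 is "skewed" on the left side.
left = [(1, "a"), (1, "b"), (2, "c"), (3, "d")]
right = [(1, "x"), (1, "y")]

joined = left_outer_join(left, right)
split_joined = partitioned_join(left, right)

# A left outer join never returns fewer rows than its left input,
# and how the left rows are distributed must not change the result.
assert len(joined) >= len(left)
assert sorted(split_joined, key=repr) == sorted(joined, key=repr)
```

Under these assumptions, a join strategy that changes only how rows are distributed should change reducer timings but not the output record count.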
[jira] [Created] (PIG-4840) Do not turn off UnionOptimizer for unsupported storefuncs in case of no vertex groups
Rohini Palaniswamy created PIG-4840:

    Summary: Do not turn off UnionOptimizer for unsupported storefuncs in case of no vertex groups
    Key: PIG-4840
    URL: https://issues.apache.org/jira/browse/PIG-4840
    Project: Pig
    Issue Type: Improvement
    Reporter: Rohini Palaniswamy
    Assignee: Rohini Palaniswamy
    Fix For: 0.16.0

We turn off UnionOptimizer for unsupported storefuncs, as writing from two vertices may overwrite data. But when there is only one unique union member, we don't create vertex groups and instead merge the union operators into the Split vertex, so we can turn it on in that case.
[jira] [Commented] (PIG-4834) Left Outer Skewed Join produces incorrect results
[ https://issues.apache.org/jira/browse/PIG-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189588#comment-15189588 ]

Nathan Smith commented on PIG-4834:

Thanks for noting that, I'll try a newer version when I get the chance.
[jira] [Updated] (PIG-4836) Fix TestEvalPipeline test failure
[ https://issues.apache.org/jira/browse/PIG-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated PIG-4836:

    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Committed to Spark branch. Thanks, Pallavi!

> Fix TestEvalPipeline test failure
>
> Key: PIG-4836
> URL: https://issues.apache.org/jira/browse/PIG-4836
> Project: Pig
> Issue Type: Sub-task
> Components: spark
> Reporter: Pallavi Rao
> Assignee: Pallavi Rao
> Labels: spork
> Fix For: spark-branch
> Attachments: PIG-4836.patch
>
> There are two test failures:
> testMapUDF
> testLimit
> testLimit will get fixed by PIG-4832. This JIRA will only fix testMapUDF.
[jira] [Commented] (PIG-4836) Fix TestEvalPipeline test failure
[ https://issues.apache.org/jira/browse/PIG-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189445#comment-15189445 ]

Mohit Sabharwal commented on PIG-4836:

[~xuefuz], please commit when you get a chance.
[jira] [Commented] (PIG-4836) Fix TestEvalPipeline test failure
[ https://issues.apache.org/jira/browse/PIG-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189224#comment-15189224 ]

Mohit Sabharwal commented on PIG-4836:

+1 Thanks, [~pallavi.rao].
[jira] [Created] (PIG-4839) MultiQueryOptimizerSpark doesn't remove all redudant nodes in spark plan
liyunzhang_intel created PIG-4839:

    Summary: MultiQueryOptimizerSpark doesn't remove all redudant nodes in spark plan
    Key: PIG-4839
    URL: https://issues.apache.org/jira/browse/PIG-4839
    Project: Pig
    Issue Type: Sub-task
    Reporter: liyunzhang_intel
    Assignee: liyunzhang_intel

TestMultiQueryBasic#testMultiQueryWithFJ_2
{code}
a = load './passwd' using PigStorage(':') as (uname:chararray,passwd:chararray, uid:int, gid:int);
b = load './passwd' using PigStorage(':') as (uname:chararray,passwd:chararray, uid:int, gid:int);
c = filter a by uid > 5;
store c into './multiQueryFJ.output';
d = filter b by gid > 10;
store d into './multiQueryFJ.output.2';
e = join c by gid, d by gid using 'repl';
store e into './multiQueryFJ.output.3';
{code}

The spark plan:
{code}
before multiquery optimization:
scope-57->scope-60 scope-66
scope-60
scope-61->scope-64 scope-68
scope-64
scope-66
scope-68->scope-66
#--
# Spark Plan
#--

Spark node scope-61
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1814908586:org.apache.pig.impl.io.InterStorage) - scope-62
|
|---d: Filter[bag] - scope-36
    |   |
    |   Greater Than[boolean] - scope-39
    |   |
    |   |---Project[int][3] - scope-37
    |   |
    |   |---Constant(10) - scope-38
    |
    |---b: New For Each(false,false,false,false)[bag] - scope-35
        |   |
        |   Cast[chararray] - scope-24
        |   |
        |   |---Project[bytearray][0] - scope-23
        |   |
        |   Cast[chararray] - scope-27
        |   |
        |   |---Project[bytearray][1] - scope-26
        |   |
        |   Cast[int] - scope-30
        |   |
        |   |---Project[bytearray][2] - scope-29
        |   |
        |   Cast[int] - scope-33
        |   |
        |   |---Project[bytearray][3] - scope-32
        |
        |---b: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-22

Spark node scope-64
d: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.2:org.apache.pig.builtin.PigStorage) - scope-43
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1814908586:org.apache.pig.impl.io.InterStorage) - scope-63

Spark node scope-68
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1233897062:org.apache.pig.impl.io.InterStorage) - scope-69
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1814908586:org.apache.pig.impl.io.InterStorage) - scope-67

Spark node scope-66
e: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.3:org.apache.pig.builtin.PigStorage) - scope-56
|
|---e: FRJoin[tuple] - scope-50
    |   |
    |   Project[int][3] - scope-48
    |   |
    |   Project[int][3] - scope-49
    |
    |---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage) - scope-65

Spark node scope-57
Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage) - scope-58
|
|---c: Filter[bag] - scope-14
    |   |
    |   Greater Than[boolean] - scope-17
    |   |
    |   |---Project[int][2] - scope-15
    |   |
    |   |---Constant(5) - scope-16
    |
    |---a: New For Each(false,false,false,false)[bag] - scope-13
        |   |
        |   Cast[chararray] - scope-2
        |   |
        |   |---Project[bytearray][0] - scope-1
        |   |
        |   Cast[chararray] - scope-5
        |   |
        |   |---Project[bytearray][1] - scope-4
        |   |
        |   Cast[int] - scope-8
        |   |
        |   |---Project[bytearray][2] - scope-7
        |   |
        |   Cast[int] - scope-11
        |   |
        |   |---Project[bytearray][3] - scope-10
        |
        |---a: Load(hdfs://zly1.sh.intel.com:8020/user/root/passwd:PigStorage(':')) - scope-0

Spark node scope-60
c: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output:org.apache.pig.builtin.PigStorage) - scope-21
|
|---Load(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp929915440:org.apache.pig.impl.io.InterStorage) - scope-59
{code}

After spark multiquery optimization, the 6 spark nodes are reduced to 4. scope-60 should be combined with scope-57, but it is not.
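The node counts quoted above (6 before, 4 after) can be checked mechanically from the two edge listings in this report. The following is a minimal Python sketch, not part of Pig. It assumes each listing line has the form `node->successor successor ...`, which is a reading of the printed output rather than a documented format. `parse_plan_edges` and `single_parent_leaves` are hypothetical helper names, and treating a leaf with exactly one predecessor as a merge candidate is an assumption drawn from this report (scope-64 was folded into scope-61), not the optimizer's actual rule.

```python
def parse_plan_edges(listing):
    """Parse lines like 'scope-57->scope-60 scope-66' into {node: [successors]}."""
    graph = {}
    for line in listing.strip().splitlines():
        node, _, successors = line.strip().partition("->")
        graph[node] = successors.split()
    return graph

def single_parent_leaves(graph):
    """Return leaf nodes that have exactly one predecessor (assumed merge candidates)."""
    preds = {}
    for node, succs in graph.items():
        for s in succs:
            preds.setdefault(s, []).append(node)
    return sorted(n for n, succs in graph.items()
                  if not succs and len(preds.get(n, [])) == 1)

# Edge listings copied from the report, one node per line (assumed layout).
before = """
scope-57->scope-60 scope-66
scope-60
scope-61->scope-64 scope-68
scope-64
scope-66
scope-68->scope-66
"""
after = """
scope-57->scope-60 scope-66
scope-60
scope-61->scope-66
scope-66
"""

g_before = parse_plan_edges(before)
g_after = parse_plan_edges(after)

print(len(g_before), len(g_after))
print(single_parent_leaves(g_after))
```

Under that assumption, the only remaining candidate in the optimized plan is scope-60, which matches the node this report says should have been combined with scope-57.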
{code}
scope-57->scope-60 scope-66
scope-60
scope-61->scope-66
scope-66
#--
# Spark Plan
#--

Spark node scope-61
Split - scope-70
|   |
|   d: Store(hdfs://zly1.sh.intel.com:8020/user/root/multiQueryFJ.output.2:org.apache.pig.builtin.PigStorage) - scope-43
|   |
|   Store(hdfs://zly1.sh.intel.com:8020/tmp/temp-1338833908/tmp-1233897062:org.apache.pig.impl.io.InterStorage) - scope-69
|
|---d: Filter[bag] - scope-36
    |