[jira] [Updated] (PIG-4481) e2e tests ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and StreamingPerformance_4 produce different result on Windows
[ https://issues.apache.org/jira/browse/PIG-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4481:
    Attachment: PIG-4481-3.patch

> e2e tests ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and
> StreamingPerformance_4 produce different result on Windows
>
> Key: PIG-4481
> URL: https://issues.apache.org/jira/browse/PIG-4481
> Project: Pig
> Issue Type: Bug
> Components: e2e harness
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Labels: windows
> Fix For: 0.15.0
>
> Attachments: PIG-4481-1.patch, PIG-4481-2.patch, PIG-4481-3.patch
>
>
> ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and
> StreamingPerformance_4 produce the wrong result on Windows. Since the test
> result is compared with that of an old version of Pig, which also produces
> the wrong result, the tests still pass.
> The cause of the issue is parameter passing under Windows: some parameters
> of the executable are not passed correctly. StreamingPerformance_3 and
> StreamingPerformance_4 require a simple quoting change and a command line
> change. However, I didn't find a proper way to fix ComputeSpec_1 and
> ComputeSpec_2, so I changed those tests slightly to work around the issue
> (without changing the intention of the tests).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
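The description traces the failures to arguments that do not survive Windows command-line parsing, and says StreamingPerformance_3/4 only needed a quoting change. The patch itself is not shown here, so the following is just a sketch of the general technique, assuming the streamed executable splits its command line with the Microsoft C runtime rules (the ones `CommandLineToArgvW` implements): wrap the argument in double quotes, escape embedded quotes, and double any backslashes that precede a quote. `WindowsArgQuoter` is a hypothetical helper, not part of Pig or the e2e harness.

```java
/**
 * Hypothetical helper: quotes arguments so that a Windows program parsing its
 * command line with the Microsoft C runtime rules receives each one unchanged.
 */
final class WindowsArgQuoter {
    private WindowsArgQuoter() {}

    static String quote(String arg) {
        // Non-empty arguments without whitespace or double quotes need no quoting.
        boolean needsQuotes = arg.isEmpty() || arg.chars().anyMatch(
                c -> c == ' ' || c == '\t' || c == '\n' || c == 0x0B || c == '"');
        if (!needsQuotes) {
            return arg;
        }
        StringBuilder sb = new StringBuilder("\"");
        int i = 0;
        while (true) {
            int backslashes = 0;
            while (i < arg.length() && arg.charAt(i) == '\\') {
                backslashes++;
                i++;
            }
            if (i == arg.length()) {
                // Backslashes right before the closing quote must be doubled.
                appendBackslashes(sb, backslashes * 2);
                break;
            }
            char c = arg.charAt(i);
            if (c == '"') {
                // Double the preceding backslashes and escape the quote itself.
                appendBackslashes(sb, backslashes * 2 + 1);
            } else {
                // Backslashes not followed by a quote are taken literally.
                appendBackslashes(sb, backslashes);
            }
            sb.append(c);
            i++;
        }
        return sb.append('"').toString();
    }

    /** Builds a single command line from several arguments. */
    static String join(String... args) {
        StringBuilder sb = new StringBuilder();
        for (String a : args) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(quote(a));
        }
        return sb.toString();
    }

    private static void appendBackslashes(StringBuilder sb, int count) {
        for (int k = 0; k < count; k++) {
            sb.append('\\');
        }
    }
}
```

Arguments that need no quoting are passed through untouched, which keeps command lines that already work on Windows byte-for-byte identical.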
[jira] [Updated] (PIG-4481) e2e tests ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and StreamingPerformance_4 produce different result on Windows
[ https://issues.apache.org/jira/browse/PIG-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4481:
    Attachment: PIG-4481-2.patch
[jira] [Updated] (PIG-4481) e2e tests ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and StreamingPerformance_4 produce different result on Windows
[ https://issues.apache.org/jira/browse/PIG-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4481:
    Description: ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and StreamingPerformance_4 produce the wrong result on Windows. Since the test result is compared with that of an old version of Pig, which also produces the wrong result, the tests still pass.
The cause of the issue is parameter passing under Windows: some parameters of the executable are not passed correctly. StreamingPerformance_3 and StreamingPerformance_4 require a simple quoting change and a command line change. However, I didn't find a proper way to fix ComputeSpec_1 and ComputeSpec_2, so I changed those tests slightly to work around the issue (without changing the intention of the tests).

  was: ComputeSpec_1, ComputeSpec_2 and StreamingPerformance_3 produce the wrong result on Windows. Since the test result is compared with that of an old version of Pig, which also produces the wrong result, the tests still pass.
The cause of the issue is parameter passing under Windows: some parameters of the executable are not passed correctly. StreamingPerformance_3 requires a simple quoting change. However, I didn't find a proper way to fix ComputeSpec_1 and ComputeSpec_2, so I changed those tests slightly to work around the issue (without changing the intention of the tests).

    Summary: e2e tests ComputeSpec_1, ComputeSpec_2, StreamingPerformance_3 and StreamingPerformance_4 produce different result on Windows (was: e2e tests ComputeSpec_1, ComputeSpec_2 and StreamingPerformance_3 produce different result on Windows)
[jira] [Created] (PIG-4482) Pig pushes matches operator to HCatLoader causing script to fail
Rohini Palaniswamy created PIG-4482:

    Summary: Pig pushes matches operator to HCatLoader causing script to fail
    Key: PIG-4482
    URL: https://issues.apache.org/jira/browse/PIG-4482
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.12.0
    Reporter: Rohini Palaniswamy

HCatLoader fails because it cannot understand the matches operator. Even if we don't push the filter down, specifying a regular expression on a partition key will be bad for performance, as it will scan the whole table. We need to check whether HCat can support basic wildcard regular expressions and translate them to a LIKE clause in the database query.

{code}
java.io.IOException: MetaException(message:Error parsing partition filter; lexer error: null; exception NoViableAltException(11@[]))
    at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:95)
    at org.apache.hcatalog.mapreduce.HCatInputFormat.setInput(HCatInputFormat.java:59)
    at org.apache.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:121)
{code}
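The last sentence of the description suggests translating simple wildcard patterns into a LIKE clause. A conservative sketch of that translation follows, assuming only literals, `.` and `.*` are translated; anything else (character classes, alternation, other quantifiers, or a literal `%`/`_` that would need LIKE escaping) makes the helper return `null`, meaning "do not push this filter down". A Pig `matches` expression has to match the entire value, as a LIKE pattern does, so no extra anchoring is needed (the sketch ignores the detail that regex `.` does not match line terminators by default). `RegexToLike` is a hypothetical name; neither HCatalog nor Pig is claimed to provide it.

```java
/** Hypothetical translator from a tiny regex subset to a SQL LIKE pattern. */
final class RegexToLike {
    private RegexToLike() {}

    // Regex metacharacters this sketch refuses to translate.
    private static final String META = "\\[](){}?+*^$|";

    /**
     * Returns an equivalent LIKE pattern, or null when the regex uses anything
     * beyond literal characters, "." and ".*".
     */
    static String toLike(String regex) {
        StringBuilder like = new StringBuilder();
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '.') {
                if (i + 1 < regex.length() && regex.charAt(i + 1) == '*') {
                    like.append('%');  // ".*" matches any run of characters
                    i++;
                } else {
                    like.append('_');  // "." matches exactly one character
                }
            } else if (c == '%' || c == '_' || META.indexOf(c) >= 0) {
                return null;  // would need escaping, or has no LIKE equivalent
            } else {
                like.append(c);
            }
        }
        return like.toString();
    }
}
```

Returning `null` rather than an approximate pattern keeps the pushdown safe: an untranslatable filter simply stays in Pig.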
[jira] [Updated] (PIG-4434) Improve auto-parallelism for tez
[ https://issues.apache.org/jira/browse/PIG-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4434:
    Attachment: PIG-4434-2.patch

Several updates to the original patch. This patch also depends on some Tez fixes; I will link the Tez JIRA once it is created.

> Improve auto-parallelism for tez
>
> Key: PIG-4434
> URL: https://issues.apache.org/jira/browse/PIG-4434
> Project: Pig
> Issue Type: Improvement
> Components: tez
> Reporter: Daniel Dai
> Assignee: Daniel Dai
> Fix For: 0.15.0
>
> Attachments: PIG-4434-1.patch, PIG-4434-2.patch
>
>
> Tez auto-parallelism currently has some limitations:
> 1. ShuffleVertexManager can only decrease parallelism, not increase it.
> 2. Pig currently overestimates parallelism at the frontend, so
> ShuffleVertexManager may start with an initial parallelism far larger than
> actually needed, which is costly.
> Instead, we can gradually adjust the initial vertex parallelism at runtime
> once upstream vertices finish.
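The proposal is to size a vertex from what its upstream vertices actually produced, once they finish, and to allow the number to go up as well as down. The arithmetic behind such an estimate can be sketched as one task per fixed number of input bytes, clamped to a maximum. In MapReduce mode Pig sizes reducers with `pig.exec.reducers.bytes.per.reducer` and `pig.exec.reducers.max`; this sketch simply takes equivalent values as parameters and is not the logic in the PIG-4434 patch. `RuntimeParallelism` is a hypothetical name.

```java
import java.util.List;

/** Hypothetical runtime parallelism estimate from finished upstream output. */
final class RuntimeParallelism {
    private RuntimeParallelism() {}

    /**
     * One task per bytesPerTask of upstream output, clamped to [1, maxTasks].
     * Unlike a decrease-only manager, the result may exceed any initial guess.
     */
    static int estimate(List<Long> upstreamOutputBytes, long bytesPerTask, int maxTasks) {
        if (bytesPerTask <= 0 || maxTasks <= 0) {
            throw new IllegalArgumentException("bytesPerTask and maxTasks must be positive");
        }
        long total = 0;
        for (long bytes : upstreamOutputBytes) {
            total += Math.max(0, bytes);  // treat unknown sizes (-1) as 0
        }
        long tasks = (total + bytesPerTask - 1) / bytesPerTask;  // ceiling division
        return (int) Math.max(1, Math.min(tasks, maxTasks));
    }
}
```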
[jira] [Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.
[ https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378313#comment-14378313 ]

Ratandeep Ratti commented on PIG-4417:

Some of the reasons I can think of for this feature:
* Dependencies (UDF jars) become much more declarative with this change. Instead of copying the jar to the gateway and registering it in the Pig script, all the user has to do is add the Ivy coordinates to the register command. This saves an annoying step, and the annoyance is compounded if the UDF has other dependencies.
* Platforms like Oozie can benefit greatly from this. Instead of shipping a large zip of Pig UDFs along with the Pig script, users could upload a minimal zip and let the Pig script download the dependencies from the internal/external repository. The most commonly used UDFs/jars will be cached automatically (in the Ivy cache), so Ivy will resolve them from the local cache instead of every user bundling the UDF jar into his/her zip.
* With Ivy coordinates for UDF jars, we know exactly which version of a UDF jar is used in a Pig script.

> Pig's register command should support automatic fetching of jars from repo.
>
> Key: PIG-4417
> URL: https://issues.apache.org/jira/browse/PIG-4417
> Project: Pig
> Issue Type: Improvement
> Reporter: Akshay Rai
> Assignee: Akshay Rai
>
> Currently Pig's register command takes a local path to a dependency jar.
> This clutters the local file system, as users may forget to remove the jar
> later.
> It would be nice if Pig supported a Gradle-like notation to download the jar
> from a repository.
> Ex: At the top of the Pig script a user could add
> register '::';
> It should be backward compatible and should support a local file path if so
> desired.
> RB: https://reviews.apache.org/r/31662/
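The ticket asks for a Gradle-like coordinate in `register` while still accepting local paths. Two decisions are involved: telling coordinates apart from file paths, and locating the jar in a repository. A sketch of both, assuming a `group:module:version` form (the ticket's own example is elided to `'::'`) and the standard Maven 2 repository layout. `JarCoordinate` is hypothetical and is not the `DownloadResolver` from the patch under review.

```java
import java.util.Optional;

/** Hypothetical parsed form of a "group:module:version" jar coordinate. */
final class JarCoordinate {
    final String group;
    final String module;
    final String version;

    private JarCoordinate(String group, String module, String version) {
        this.group = group;
        this.module = module;
        this.version = version;
    }

    /**
     * Parses "group:module:version"; returns empty for anything else so the
     * caller can keep treating the argument as a local file path.
     */
    static Optional<JarCoordinate> parse(String s) {
        if (s.contains("/") || s.contains("\\")) {
            return Optional.empty();  // contains a path separator: a file path
        }
        String[] parts = s.split(":", -1);  // -1 keeps empty trailing parts
        if (parts.length != 3) {
            return Optional.empty();
        }
        for (String p : parts) {
            if (p.isEmpty()) {
                return Optional.empty();
            }
        }
        return Optional.of(new JarCoordinate(parts[0], parts[1], parts[2]));
    }

    /** Relative path of the jar in a Maven 2 layout repository. */
    String repositoryPath() {
        return group.replace('.', '/') + "/" + module + "/" + version
                + "/" + module + "-" + version + ".jar";
    }
}
```

Falling back to "treat it as a path" whenever parsing fails is what keeps the existing `register '/path/to/udfs.jar'` form working unchanged.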
[jira] [Commented] (PIG-4458) Support UDFs in a FOREACH Before a Merge Join
[ https://issues.apache.org/jira/browse/PIG-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378251#comment-14378251 ]

William Watson commented on PIG-4458:

No problem, thanks for merging it down.

> Support UDFs in a FOREACH Before a Merge Join
>
> Key: PIG-4458
> URL: https://issues.apache.org/jira/browse/PIG-4458
> Project: Pig
> Issue Type: New Feature
> Reporter: William Watson
> Assignee: William Watson
> Fix For: 0.15.0
>
> Attachments: PIG-4458.04.remove-merge-join-udf-restriction.patch,
> PIG-4458.05.remove-merge-join-udf-restriction.patch
>
>
> Right now, the MapSideMergeValidator outright rejects any foreach that has a
> UDF in it:
> {code}
> private boolean isAcceptableForEachOp(Operator lo) throws LogicalToPhysicalTranslatorException {
>     if (lo instanceof LOForEach) {
>         OperatorPlan innerPlan = ((LOForEach) lo).getInnerPlan();
>         validateMapSideMerge(innerPlan.getSinks(), innerPlan);
>         return !containsUDFs((LOForEach) lo);
>     } else {
>         return false;
>     }
> }
> {code}
> There is a TODO for this later on in that same class (inside containsUDFs):
> {code}
> // TODO (dvryaboy): in the future we could relax this rule by tracing what fields
> // are being passed into the UDF, and only refusing if the UDF is working on the
> // join key. Transforms of other fields should be ok.
> {code}
> We should implement the TODO and relax this requirement, or just remove it altogether.
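The TODO suggests tracing which fields a UDF touches instead of rejecting every FOREACH that contains one. Below is a sketch of one reading of that rule on a deliberately simplified model: each FOREACH output column records whether a UDF computes it, and the check fails only when a join-key column is UDF-computed, since that is the case that could break the sorted order a merge join relies on. `MergeJoinUdfCheck` and `Column` are hypothetical stand-ins, not Pig's `LOForEach` or expression-plan classes, and the committed patch may relax the rule differently.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified version of a relaxed merge-join FOREACH check. */
final class MergeJoinUdfCheck {
    private MergeJoinUdfCheck() {}

    /** One FOREACH output column: its name and whether a UDF computes it. */
    static final class Column {
        final String name;
        final boolean computedByUdf;

        Column(String name, boolean computedByUdf) {
            this.name = name;
            this.computedByUdf = computedByUdf;
        }
    }

    /**
     * Accepts the FOREACH unless a join-key column is produced by a UDF;
     * UDFs applied only to non-key columns leave the key order untouched.
     */
    static boolean isAcceptable(List<Column> columns, Set<String> joinKeys) {
        for (Column c : columns) {
            if (c.computedByUdf && joinKeys.contains(c.name)) {
                return false;
            }
        }
        return true;
    }
}
```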
[jira] [Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.
[ https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378244#comment-14378244 ]

Daniel Dai commented on PIG-4417:

[~akshayrai09], I'm trying to understand why you need this. It sounds like you can simply download the jar and register it in Pig. Do other languages use a similar syntax?
[jira] [Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.
[ https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378240#comment-14378240 ]

Ratandeep Ratti commented on PIG-4417:

Thanks [~alangates] for the quick feedback. [~akshayrai09], please update the ticket with the latest reviewed patch.
[jira] [Commented] (PIG-4417) Pig's register command should support automatic fetching of jars from repo.
[ https://issues.apache.org/jira/browse/PIG-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378198#comment-14378198 ]

Alan Gates commented on PIG-4417:

A couple of comments:
# Review Board is great for reviewing the patch, but to be official it has to be attached here too.
# Why is the DownloadResolver all static? Why not make it an object with a single method? This is just a style gripe and not a blocker for checking in the code.
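Alan's second point, replacing an all-static resolver with an object that exposes a single method, can be illustrated with a small sketch. `JarResolver` and `RegisterHandler` are hypothetical names, and the coordinate check is a simplified assumption. The benefit shown is that callers receive a resolver instance, so a test can pass in a stub instead of touching a real repository.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical single-method resolver: coordinates in, local jar path out. */
interface JarResolver {
    Path resolve(String coordinates);
}

/** Hypothetical caller that is handed a resolver instead of calling static methods. */
final class RegisterHandler {
    private final JarResolver resolver;

    RegisterHandler(JarResolver resolver) {
        this.resolver = resolver;
    }

    /** Local paths are used as-is; "a:b:c"-style arguments go through the resolver. */
    Path jarFor(String arg) {
        boolean looksLikeCoordinates =
                arg.contains(":") && !arg.contains("/") && !arg.contains("\\");
        return looksLikeCoordinates ? resolver.resolve(arg) : Paths.get(arg);
    }
}
```

Because `JarResolver` has one abstract method, a lambda is enough for a test double, and a production implementation can hold its own settings (repository URL, cache directory) as fields rather than static state.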
[jira] [Updated] (PIG-4457) Error is thrown by JobStats.getOutputSize() when storing to a MySql table
[ https://issues.apache.org/jira/browse/PIG-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4457:
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks for the review, Daniel.

> Error is thrown by JobStats.getOutputSize() when storing to a MySql table
>
> Key: PIG-4457
> URL: https://issues.apache.org/jira/browse/PIG-4457
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.14.0
> Reporter: Kunal Kumar
> Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4457-1.patch
>
>
> Here is an example of the stack trace printed to the console output. This is
> actually a warning message and does not make the job fail. The data is
> stored in the MySQL table, but I have no idea why Pig is looking to store
> output on HDFS. I am using Pig along with Tez.
> using output size reader: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader
> unable to find the output file
> java.io.FileNotFoundException: File hdfs://pts0021.persistent.co.in:9000/user/shareinsights/filtered_stock_data does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:647)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:101)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:705)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:701)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:701)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.FileBasedOutputSizeReader.getOutputSize(FileBasedOutputSizeReader.java:81)
>     at org.apache.pig.tools.pigstats.JobStats.getOutputSize(JobStats.java:351)
>     at org.apache.pig.tools.pigstats.tez.TezVertexStats.addOutputStatistics(TezVertexStats.java:270)
>     at org.apache.pig.tools.pigstats.tez.TezVertexStats.accumulateStats(TezVertexStats.java:188)
>     at org.apache.pig.tools.pigstats.tez.TezDAGStats.accumulateStats(TezDAGStats.java:209)
>     at org.apache.pig.tools.pigstats.tez.TezPigScriptStats.accumulateStats(TezPigScriptStats.java:180)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezJob.run(TezJob.java:194)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher$1.run(TezLauncher.java:167)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
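The trace shows the statistics code asking the file-based size reader to list a location that exists only as a MySQL table name, so HDFS reports `FileNotFoundException`. The committed patch is not reproduced here. As a sketch of one defensive approach, assuming output-size statistics are best-effort, a size reader can report `-1` ("unknown") when the location is not a file or directory, instead of surfacing the exception. The example uses `java.nio.file` on the local file system in place of Hadoop's `FileSystem` API, and `OutputSizes` is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical best-effort output size reader for file-based store locations. */
final class OutputSizes {
    private OutputSizes() {}

    /**
     * Returns the total size in bytes of the regular files at or under location,
     * or -1 when the location does not exist (e.g. the store target was a
     * database table) or cannot be read.
     */
    static long outputSize(Path location) {
        if (!Files.exists(location)) {
            return -1;
        }
        long total = 0;
        try (Stream<Path> paths = Files.walk(location)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    total += Files.size(p);
                }
            }
        } catch (IOException | UncheckedIOException e) {
            return -1;  // size is only a statistic: report "unknown" instead of propagating
        }
        return total;
    }
}
```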
[jira] [Updated] (PIG-4475) Keys in AvroMapWrapper are not proper Pig types
[ https://issues.apache.org/jira/browse/PIG-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4475:
    Resolution: Fixed
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

+1. Patch committed to trunk. Thanks Ratandeep, Anthony!

> Keys in AvroMapWrapper are not proper Pig types
>
> Key: PIG-4475
> URL: https://issues.apache.org/jira/browse/PIG-4475
> Project: Pig
> Issue Type: Bug
> Reporter: Ratandeep Ratti
> Assignee: Ratandeep Ratti
> Fix For: 0.15.0
>
> Attachments: PIG-4475.patch, PIG-4475_1.patch
>
>
> AvroMapWrapper could contain utf8 keys, which are not supported by Pig. Pig
> expects keys to be of type String.
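Avro can hand back map keys as `org.apache.avro.util.Utf8`, which implements `CharSequence` but is not a `java.lang.String`, while Pig expects `String` keys. A minimal sketch of the conversion, assuming an eager copy is acceptable (the actual change to `AvroMapWrapper` may convert keys lazily instead); `MapKeys` is a hypothetical helper written against `CharSequence` so it needs no Avro dependency.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that normalizes map keys to java.lang.String. */
final class MapKeys {
    private MapKeys() {}

    /** Copies source, converting every CharSequence key (such as Avro's Utf8) with toString(). */
    static <V> Map<String, V> withStringKeys(Map<? extends CharSequence, V> source) {
        Map<String, V> result = new LinkedHashMap<>();
        for (Map.Entry<? extends CharSequence, V> e : source.entrySet()) {
            result.put(e.getKey().toString(), e.getValue());
        }
        return result;
    }
}
```

Without such a conversion, a lookup like `get("a")` misses on the raw map, because `String.equals` returns false for any argument that is not itself a `String`.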
Feedback on PIG-4417
Hi Folks,

We'd appreciate it if we could get more feedback on this.

Ticket: https://issues.apache.org/jira/browse/PIG-4417
RB: https://reviews.apache.org/r/31662/

Best,
R
[jira] [Updated] (PIG-4458) Support UDFs in a FOREACH Before a Merge Join
[ https://issues.apache.org/jira/browse/PIG-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4458:
    Resolution: Fixed
    Fix Version/s: 0.15.0
    Assignee: William Watson
    Hadoop Flags: Reviewed
    Status: Resolved (was: Patch Available)

Sorry I missed this. The new patch looks good. Patch committed to trunk. Thanks William!
[jira] [Commented] (PIG-4457) Error is thrown by JobStats.getOutputSize() when storing to a MySql table
[ https://issues.apache.org/jira/browse/PIG-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378112#comment-14378112 ]

Daniel Dai commented on PIG-4457:

+1
[jira] [Updated] (PIG-4475) Keys in AvroMapWrapper are not proper Pig types
[ https://issues.apache.org/jira/browse/PIG-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ratandeep Ratti updated PIG-4475:
    Attachment: PIG-4475_1.patch

Addressing Anthony's comments.
[jira] [Commented] (PIG-4458) Support UDFs in a FOREACH Before a Merge Join
[ https://issues.apache.org/jira/browse/PIG-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377780#comment-14377780 ]

William Watson commented on PIG-4458:

Anything else I should do to get this merged down? Thanks!
[jira] [Commented] (PIG-4343) Tez auto parallelism fails at query compile time
[ https://issues.apache.org/jira/browse/PIG-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377594#comment-14377594 ]

Rohini Palaniswamy commented on PIG-4343:

This could be a dupe of PIG-4474.

> Tez auto parallelism fails at query compile time
>
> Key: PIG-4343
> URL: https://issues.apache.org/jira/browse/PIG-4343
> Project: Pig
> Issue Type: Bug
> Components: tez
> Affects Versions: 0.14.0
> Reporter: Cheolsoo Park
>
> I was running some legacy MR jobs in Tez mode to do perf benchmarks. But when
> {{pig.tez.auto.parallelism}} is enabled (it is by default), Pig fails with the
> following error:
> {code}
> org.apache.pig.impl.plan.VisitorException: ERROR 0: java.io.IOException: Cannot estimate parallelism for scope-892, effective parallelism for predecessor scope-892 is -1
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.ParallelismSetter.visitTezOp(ParallelismSetter.java:189)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:232)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.TezOperator.visit(TezOperator.java:49)
>     at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:70)
>     at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.processLoadAndParallelism(TezLauncher.java:429)
>     at org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher.launchPig(TezLauncher.java:143)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:301)
>     at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
>     at org.apache.pig.LipstickPigServer.launchPlan(LipstickPigServer.java:151)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
>     at org.apache.pig.PigServer.execute(PigServer.java:1364)
>     at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
>     at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
>     at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
>     at com.netflix.lipstick.Main.run(Main.java:496)
>     at com.netflix.lipstick.Main.main(Main.java:171)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.io.IOException: Cannot estimate parallelism for scope-892, effective parallelism for predecessor scope-892 is -1
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.TezOperDependencyParallelismEstimator.estimateParallelism(TezOperDependencyParallelismEstimator.java:116)
>     at org.apache.pig.backend.hadoop.executionengine.tez.plan.optimizer.ParallelismSetter.visitTezOp(ParallelismSetter.java:134)
>     ... 24 more
> {code}
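The trace shows the dependency-based estimator giving up because a predecessor's effective parallelism is still `-1`. Without claiming this is how PIG-4343 or PIG-4474 was resolved, one alternative can be sketched: treat `-1` as "unknown" and return no estimate, so the caller can leave the decision to runtime, in the spirit of PIG-4434. The scaling rule (sum of predecessor parallelism times a factor) and all names are assumptions, not Pig's `TezOperDependencyParallelismEstimator`.

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical estimator that defers instead of failing on unknown input. */
final class DependencyParallelism {
    private DependencyParallelism() {}

    /** Sentinel value meaning a predecessor's parallelism is not known yet. */
    static final int UNKNOWN = -1;

    /**
     * Scales the summed predecessor parallelism by factor (at least 1 task).
     * Returns empty instead of throwing when any predecessor is UNKNOWN.
     */
    static OptionalInt estimate(List<Integer> predecessorParallelism, double factor) {
        long sum = 0;
        for (int p : predecessorParallelism) {
            if (p == UNKNOWN) {
                return OptionalInt.empty();  // let the runtime decide later
            }
            sum += p;
        }
        return OptionalInt.of((int) Math.max(1, Math.ceil(sum * factor)));
    }
}
```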
[jira] [Updated] (PIG-4457) Error is thrown by JobStats.getOutputSize() when storing to a MySql table
[ https://issues.apache.org/jira/browse/PIG-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4457:
    Attachment: PIG-4457-1.patch
[jira] [Updated] (PIG-4457) Error is thrown by JobStats.getOutputSize() when storing to a MySql table
[ https://issues.apache.org/jira/browse/PIG-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-4457:
Fix Version/s: 0.15.0
Affects Version/s: 0.14.0
Status: Patch Available (was: Reopened)
[jira] [Reopened] (PIG-4457) Error is thrown by JobStats.getOutputSize() when storing to a MySql table
[ https://issues.apache.org/jira/browse/PIG-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy reopened PIG-4457:
Assignee: Rohini Palaniswamy

Reopening to have the following setting added to pig-default.properties:

pig.stats.output.size.reader.unsupported=org.apache.hcatalog.pig.HCatStorer,org.apache.hive.hcatalog.pig.HCatStorer,org.apache.pig.piggybank.storage.DBStorage
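The reopen note proposes listing store functions in pig.stats.output.size.reader.unsupported so that Pig stops asking a file-based reader for the size of outputs that do not live on a filesystem, such as a DBStorage table. The Python sketch below only illustrates that idea under stated assumptions; it is not Pig's Java implementation. The function names (parse_unsupported, file_based_output_size, get_output_size), the use of the local filesystem as a stand-in for HDFS, and the -1 "unknown size" sentinel are all hypothetical.

```python
import os

# Hypothetical copy of the value proposed for pig-default.properties:
# a comma-separated list of StoreFunc class names.
DEFAULT_UNSUPPORTED = (
    "org.apache.hcatalog.pig.HCatStorer,"
    "org.apache.hive.hcatalog.pig.HCatStorer,"
    "org.apache.pig.piggybank.storage.DBStorage"
)

UNKNOWN_SIZE = -1  # hypothetical sentinel meaning "size not available"


def parse_unsupported(value):
    """Split the property value into a set of class names, ignoring blanks."""
    return {name.strip() for name in value.split(",") if name.strip()}


def file_based_output_size(location):
    """Sum file sizes under a local path (a stand-in for an HDFS listStatus)."""
    if not os.path.exists(location):
        # A non-filesystem location (e.g. a table name) resolves to a path
        # that does not exist, as in the FileNotFoundException above.
        raise FileNotFoundError(location)
    if os.path.isfile(location):
        return os.path.getsize(location)
    total = 0
    for root, _dirs, files in os.walk(location):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def get_output_size(store_func, location, unsupported=DEFAULT_UNSUPPORTED):
    """Return the output size in bytes, or UNKNOWN_SIZE if it cannot be read."""
    if store_func in parse_unsupported(unsupported):
        # Listed store functions skip the filesystem lookup entirely,
        # so no warning is printed for them.
        return UNKNOWN_SIZE
    try:
        return file_based_output_size(location)
    except FileNotFoundError:
        # Like the report: a warning only, the job itself is not failed.
        print(f"WARN: unable to find the output file {location}")
        return UNKNOWN_SIZE
```

With DBStorage on the list, get_output_size("org.apache.pig.piggybank.storage.DBStorage", "filtered_stock_data") returns the sentinel without touching the filesystem, while a store function that is not listed still gets the file-based lookup and, at worst, a warning.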
[jira] [Commented] (PIG-4439) Getting exception java.lang.VerifyError: class org.apache.tez.dag.api.records.DAGProtos$DAGPlan overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFiel
[ https://issues.apache.org/jira/browse/PIG-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377420#comment-14377420 ]

Kunal Kumar commented on PIG-4439:

Thanks Daniel. I used the Tez jars that come with the Pig 0.14 distribution and it is working fine now. Earlier I was building the jars from the tez-0.5.2 source.

> Getting exception java.lang.VerifyError: class org.apache.tez.dag.api.records.DAGProtos$DAGPlan overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet, while trying to run Tez-0.5.2 on pig-0.14
>
> Key: PIG-4439
> URL: https://issues.apache.org/jira/browse/PIG-4439
> Project: Pig
> Issue Type: Bug
> Reporter: Kunal Kumar
>
> Exception in thread "main" java.lang.VerifyError: class org.apache.tez.dag.api.records.DAGProtos$DAGPlan overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
>     at java.lang.ClassLoader.defineClass1(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>     at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>     at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     at java.lang.Class.getDeclaredMethods0(Native Method)
>     at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>     at java.lang.Class.getMethod0(Class.java:2774)
>     at java.lang.Class.getMethod(Class.java:1663)
>     at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>     at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
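A VerifyError of the form "overrides final method getUnknownFields" is commonly associated with protobuf-generated classes being loaded against a protobuf-java runtime from a different release, which is consistent with the reporter's fix of switching to the Tez jars bundled with Pig 0.14. As a hedged diagnostic aid (not a Pig or Tez tool), the Python sketch below scans a classpath string for artifacts that appear with more than one version. It assumes jars follow the common artifact-version.jar naming scheme, for example protobuf-java-2.5.0.jar; the function name find_version_conflicts is hypothetical.

```python
import os
import re
from collections import defaultdict

# Matches "<artifact>-<version>.jar" where the version starts with a digit,
# e.g. "protobuf-java-2.5.0.jar" -> artifact "protobuf-java", version "2.5.0".
JAR_PATTERN = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_conflicts(classpath, sep=os.pathsep):
    """Return {artifact: versions sorted as strings} for artifacts with >1 version."""
    versions = defaultdict(set)
    for entry in classpath.split(sep):
        match = JAR_PATTERN.match(os.path.basename(entry.strip()))
        if match:  # non-jar entries such as config directories are skipped
            versions[match.group("artifact")].add(match.group("version"))
    return {art: sorted(vers) for art, vers in versions.items() if len(vers) > 1}
```

On a Unix-like client one might feed it the output of the hadoop classpath command. Wildcard entries such as .../lib/* are not expanded by this sketch, so the jars in those directories would have to be listed explicitly before the check means anything.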
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (26 issues)
Subscriber: pigdaily

Key       Summary
PIG-4481  e2e tests ComputeSpec_1, ComputeSpec_2 and StreamingPerformance_3 produce different result on Windows
          https://issues.apache.org/jira/browse/PIG-4481
PIG-4476  Fix logging in AvroStorage* classes and SchemaTuple class
          https://issues.apache.org/jira/browse/PIG-4476
PIG-4475  Keys in AvroMapWrapper are not proper Pig types
          https://issues.apache.org/jira/browse/PIG-4475
PIG-4458  Support UDFs in a FOREACH Before a Merge Join
          https://issues.apache.org/jira/browse/PIG-4458
PIG-4455  Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
          https://issues.apache.org/jira/browse/PIG-4455
PIG-4452  Embedded SQL using "SQL" instead of "sql" fails with string index out of range: -1 error
          https://issues.apache.org/jira/browse/PIG-4452
PIG-4422  Implement visitMergeJoin in SparkCompiler
          https://issues.apache.org/jira/browse/PIG-4422
PIG-4377  Skewed outer join produce wrong result in some cases
          https://issues.apache.org/jira/browse/PIG-4377
PIG-4341  Add CMX support to pig.tmpfilecompression.codec
          https://issues.apache.org/jira/browse/PIG-4341
PIG-4323  PackageConverter hanging in Spark
          https://issues.apache.org/jira/browse/PIG-4323
PIG-4313  StackOverflowError in LIMIT operation on Spark
          https://issues.apache.org/jira/browse/PIG-4313
PIG-4251  Pig on Storm
          https://issues.apache.org/jira/browse/PIG-4251
PIG-4193  Make collected group work with Spark
          https://issues.apache.org/jira/browse/PIG-4193
PIG-4111  Make Pig compiles with avro-1.7.7
          https://issues.apache.org/jira/browse/PIG-4111
PIG-4004  Upgrade the Pigmix queries from the (old) mapred API to mapreduce
          https://issues.apache.org/jira/browse/PIG-4004
PIG-4002  Disable combiner when map-side aggregation is used
          https://issues.apache.org/jira/browse/PIG-4002
PIG-3952  PigStorage accepts '-tagSplit' to return full split information
          https://issues.apache.org/jira/browse/PIG-3952
PIG-3911  Define unique fields with @OutputSchema
          https://issues.apache.org/jira/browse/PIG-3911
PIG-3877  Getting Geo Latitude/Longitude from Address Lines
          https://issues.apache.org/jira/browse/PIG-3877
PIG-3873  Geo distance calculation using Haversine
          https://issues.apache.org/jira/browse/PIG-3873
PIG-3866  Create ThreadLocal classloader per PigContext
          https://issues.apache.org/jira/browse/PIG-3866
PIG-3851  Upgrade jline to 2.11
          https://issues.apache.org/jira/browse/PIG-3851
PIG-3668  COR built-in function when atleast one of the coefficient values is NaN
          https://issues.apache.org/jira/browse/PIG-3668
PIG-3635  Fix e2e tests for Hadoop 2.X on Windows
          https://issues.apache.org/jira/browse/PIG-3635
PIG-3587  add functionality for rolling over dates
          https://issues.apache.org/jira/browse/PIG-3587
PIG-3294  Allow Pig use Hive UDFs
          https://issues.apache.org/jira/browse/PIG-3294

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328&filterId=12322384