[jira] [Commented] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614964#comment-13614964 ]

Harsh J commented on PIG-3261:
------------------------------

I agree on both points from my own experience, but others have probably seen even more users than I have. I'm not very active on the user lists either, but I have been a long-time subscriber, and searching shows that PIG_CLASSPATH has only ever been used for adding UDFs and libraries. Hence the users with the other intention (i.e. those who want to override what Pig auto-discovers) wouldn't mind this behavior change either.

Of course, this data set does not represent all of the users :)

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3261
>                 URL: https://issues.apache.org/jira/browse/PIG-3261
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.10.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: PIG-3261.patch, PIG-3261.patch
>
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> fi
> {code}
> This means that anything added to CLASSPATH until that point will never be
> able to get overridden by a user set environment, which is wrong behavior.
> Hadoop libs for example are added to CLASSPATH, before this extension is
> called in bin/pig.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions
[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614940#comment-13614940 ]

Prashant Kommireddi commented on PIG-3259:
------------------------------------------

{quote}
By counting the number of times exception has so far been thrown by .valueOf()
{quote}

I see what you mean. That could be an approach, though the heuristic for determining the threshold could be tricky.

{quote}
I wonder if there are good libraries that we can use for the sanity checks, as the decimal check seems bit more complicated
{quote}

I will try to find out whether any such libraries are available. There's a method to check for Double in the javadoc you pointed to before, but it could be more expensive than we want:
http://docs.oracle.com/javase/6/docs/api/java/lang/Double.html#valueOf%28java.lang.String%29

> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.12
>
>         Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric
> (1234abcd), the code calls Double.valueOf(String) regardless before finally
> returning null. Any script that inadvertently (user's mistake or not) tries
> to cast a non-numeric column to int or long would result in many wasteful
> calls.
> We can avoid this by handling only the cases where we find the input to be a
> decimal number (1234.56), and returning null otherwise even before trying
> Double.valueOf(String).
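The pre-check proposed in the issue description above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in the patch: the class `NumericSanityCheck` and the methods `looksLikeDecimal` and `toLong` are hypothetical names that do not exist in Pig, and the accepted grammar (optional sign, digits, at most one decimal point, no exponent) is deliberately narrower than what `Double.valueOf(String)` accepts.

```java
// Hypothetical sketch; none of these names come from the Pig code base.
class NumericSanityCheck {

    // Returns true only for plain decimals such as "1234", "-12.5" or ".5".
    // Exponents, hex literals, "NaN" and "Infinity" are rejected on purpose.
    public static boolean looksLikeDecimal(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int i = 0;
        char first = s.charAt(0);
        if (first == '+' || first == '-') {
            i = 1; // skip a leading sign
        }
        boolean seenDigit = false;
        boolean seenPoint = false;
        for (; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '0' && c <= '9') {
                seenDigit = true;
            } else if (c == '.' && !seenPoint) {
                seenPoint = true;
            } else {
                return false; // any other character means "not a decimal"
            }
        }
        return seenDigit;
    }

    // Tries Long.valueOf first; falls back to Double.valueOf only when the
    // input passes the cheap check, and returns null otherwise.
    public static Long toLong(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException nfe) {
            if (!looksLikeDecimal(s)) {
                return null; // e.g. "1234abcd": skip Double.valueOf entirely
            }
            return Double.valueOf(s).longValue();
        }
    }

    public static void main(String[] args) {
        System.out.println(toLong("42"));       // prints 42
        System.out.println(toLong("1234.56"));  // prints 1234
        System.out.println(toLong("1234abcd")); // prints null
    }
}
```

A hand-written scan is used here instead of a library call only to keep the sketch dependency-free; whether a library check would be cheaper is the open question in the comment above.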
[jira] [Commented] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614934#comment-13614934 ]

Prashant Kommireddi commented on PIG-3261:
------------------------------------------

{quote}
IIRC the reason was to not have them step over the shipped library jars unintentionally with a simple HADOOP_CLASSPATH being set
{quote}

Based on that, I feel that keeping it simple and not having a toggle is better, for the following reasons:
# Pig does not have an env file like Hadoop does for specifying CLASSPATH. Most likely this would be set by the user; it would be intentional and not picked up from any of Pig's env files.
# Having a toggle for this seems like an additional step towards the same purpose.

What do you think [~qwertymaniac]? It would be nice to have some others weigh in on this. I am leaning more towards your initial patch, though I am not opposed to the latest patch either.

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3261
>                 URL: https://issues.apache.org/jira/browse/PIG-3261
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.10.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: PIG-3261.patch, PIG-3261.patch
>
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> fi
> {code}
> This means that anything added to CLASSPATH until that point will never be
> able to get overridden by a user set environment, which is wrong behavior.
> Hadoop libs for example are added to CLASSPATH, before this extension is
> called in bin/pig.
[jira] [Updated] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated PIG-3261:
-------------------------

    Status: Patch Available  (was: Open)

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3261
>                 URL: https://issues.apache.org/jira/browse/PIG-3261
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.10.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: PIG-3261.patch, PIG-3261.patch
>
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> fi
> {code}
> This means that anything added to CLASSPATH until that point will never be
> able to get overridden by a user set environment, which is wrong behavior.
> Hadoop libs for example are added to CLASSPATH, before this extension is
> called in bin/pig.
[jira] [Updated] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated PIG-3261:
-------------------------

    Attachment: PIG-3261.patch

Patch revised. Added an env-opt toggle, PIG_USER_CLASSPATH_FIRST, that preserves today's behavior if unset (default).

Testing:

Export: {{export PIG_CLASSPATH=Foo}}

Default behavior:

{code}
bash -x bin/pig
…
CLASSPATH=/Users/harshchouraria/Work/installs/pig/conf:/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/tools.jar:Foo
…
{code}

Set toggle: {{export PIG_USER_CLASSPATH_FIRST=true}}

{code}
bash -x bin/pig
…
CLASSPATH=Foo:/Users/harshchouraria/Work/installs/pig/conf:/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/tools.jar
…
{code}

Disable toggle: {{export PIG_USER_CLASSPATH_FIRST=}}

{code}
bash -x bin/pig
…
CLASSPATH=/Users/harshchouraria/Work/installs/pig/conf:/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/tools.jar:Foo
…
{code}

Unset toggle: {{unset PIG_USER_CLASSPATH_FIRST}}

{code}
bash -x bin/pig
…
CLASSPATH=/Users/harshchouraria/Work/installs/pig/conf:/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/lib/tools.jar:Foo
…
{code}

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3261
>                 URL: https://issues.apache.org/jira/browse/PIG-3261
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.10.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: PIG-3261.patch, PIG-3261.patch
>
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> fi
> {code}
> This means that anything added to CLASSPATH until that point will never be
> able to get overridden by a user set environment, which is wrong behavior.
> Hadoop libs for example are added to CLASSPATH, before this extension is
> called in bin/pig.
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (34 issues)

Subscriber: pigdaily

Key         Summary
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3247    Piggybank functions to mimic OVER clause in SQL
            https://issues.apache.org/jira/browse/PIG-3247
PIG-3238    Pig current releases lack a UDF Stuff(). This UDF deletes a specified length of characters and inserts another set of characters at a specified starting point.
            https://issues.apache.org/jira/browse/PIG-3238
PIG-3237    Pig current releases lack a UDF MakeSet(). This UDF returns a set value (a string containing substrings separated by "," characters) consisting of the strings that have the corresponding bit in the first argument
            https://issues.apache.org/jira/browse/PIG-3237
PIG-3223    AvroStorage does not handle comma separated input paths
            https://issues.apache.org/jira/browse/PIG-3223
PIG-3215    [piggybank] Add LTSVLoader to load LTSV (Labeled Tab-separated Values) files
            https://issues.apache.org/jira/browse/PIG-3215
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3198    Let users use any function from PigType -> PigType as if it were builtlin
            https://issues.apache.org/jira/browse/PIG-3198
PIG-3193    Fix "ant docs" warnings
            https://issues.apache.org/jira/browse/PIG-3193
PIG-3190    Add LuceneTokenizer and SnowballTokenizer to Pig - useful text tokenization
            https://issues.apache.org/jira/browse/PIG-3190
PIG-3183    rm or rmf commands should respect globbing/regex of path
            https://issues.apache.org/jira/browse/PIG-3183
PIG-3173    Partition filter push down does not happen partition keys condition include a AND and OR construct
            https://issues.apache.org/jira/browse/PIG-3173
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3164    Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.
            https://issues.apache.org/jira/browse/PIG-3164
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3122    Operators should not implicitly become reserved keywords
            https://issues.apache.org/jira/browse/PIG-3122
PIG-3114    Duplicated macro name error when using pigunit
            https://issues.apache.org/jira/browse/PIG-3114
PIG-3105    Fix TestJobSubmission unit test failure.
            https://issues.apache.org/jira/browse/PIG-3105
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3069    Native Windows Compatibility for Pig E2E Tests and Harness
            https://issues.apache.org/jira/browse/PIG-3069
PIG-3028    testGrunt dev test needs some command filters to run correctly without cygwin
            https://issues.apache.org/jira/browse/PIG-3028
PIG-3027    pigTest unit test needs a newline filter for comparisons of golden multi-line
            https://issues.apache.org/jira/browse/PIG-3027
PIG-3026    Pig checked-in baseline comparisons need a pre-filter to address OS-specific newline differences
            https://issues.apache.org/jira/browse/PIG-3026
PIG-3024    TestEmptyInputDir unit test - hadoop version detection logic is brittle
            https://issues.apache.org/jira/browse/PIG-3024
PIG-3015    Rewrite of AvroStorage
            https://issues.apache.org/jira/browse/PIG-3015
PIG-3010    Allow UDF's to flatten themselves
            https://issues.apache.org/jira/browse/PIG-3010
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2643    Use bytecode generation to make a performance replacement for InvokeForLong, InvokeForString, etc
            https://issues.apache.org/jira/browse/PIG-2643
PIG-2641    Create toJSON function for all complex types: tuples, bags and maps
            https://issues.apache.org/jira/browse/PIG-2641
PIG-2591    Unit tests should not write to /tmp but respect java.io.tmpdir
            https://issues.apache.org/jira/browse/PIG-2591
PIG-2244    Macros cannot be passed relation names
            https://issues.apache.org/jira/browse/PIG-2244
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-3049) Cannot sort on a bag in nested foreach
[ https://issues.apache.org/jira/browse/PIG-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614749#comment-13614749 ]

Johnny Zhang commented on PIG-3049:
-----------------------------------

[~daijy], thanks for the comments, and sorry about the late reply. I think it has the same root cause as PIG-2265; I left comments there to explain our findings so far. I don't have a patch ready yet, but yes, I am still looking for a fix.

> Cannot sort on a bag in nested foreach
> --------------------------------------
>
>                 Key: PIG-3049
>                 URL: https://issues.apache.org/jira/browse/PIG-3049
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.12
>            Reporter: Jonathan Coveney
>            Assignee: Johnny Zhang
>             Fix For: 0.12
>
>
> The following script fails.
> {code}
> a = load 'words_and_numbers' as (word:chararray, number:int);
> b = foreach (group a by number) {
>   a_bag = a.word;
>   ord = order a_bag by word;
>   generate group, ord;
> }
> dump b;
> {code}
> On this data:
> {code}
> $ cat words_and_numbers
>
> hey 1
> hey 2
> you 3
> you 4
> I 5
> could 6
> {code}
> it throws the following error:
> {code}
> java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.pig.data.Tuple
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:469)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:308)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:160)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:384)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:340)
>   at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:333)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:283)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
>   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:210)
> {code}
> Is this a supported feature of Pig? Seems reasonable, just seems like
> something weird is going on.
[jira] [Commented] (PIG-2265) Test case TestSecondarySort failure
[ https://issues.apache.org/jira/browse/PIG-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614734#comment-13614734 ]

Johnny Zhang commented on PIG-2265:
-----------------------------------

Currently the test is disabled in trunk. I enabled it and can reproduce the issue. I think it has the same root cause as PIG-3049. [~cheolsoo] helped me debug this issue a while back and explained the idea to me. The reason seems to be that when secondary sort is enabled, the code needs to tell POProject.java to process the secondary sort key properly, to avoid casting the content of the tuple to a Tuple at POProject.java line 481:
{code}
res.result = (Tuple)ret;
{code}
The fix should be something like changing POProject.java line 422 from
{code}
ret = inpValue.get(columns.get(0));
{code}
to
{code}
if (secondarySort) {
    ret = inpValue;
} else {
    ret = inpValue.get(columns.get(0));
}
{code}
It is not clear to me whether this is the right guess, and I don't yet know how to get the boolean value secondarySort in POProject.java.

> Test case TestSecondarySort failure
> -----------------------------------
>
>                 Key: PIG-2265
>                 URL: https://issues.apache.org/jira/browse/PIG-2265
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.0
>            Reporter: Shengjun Xin
>
> Error message:
> Testcase: testNestedSortEndToEnd3 took 53.076 sec
>         Caused an ERROR
> Unable to open iterator for alias E. Backend error : org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias E. Backend error : org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple
>         at org.apache.pig.PigServer.openIterator(PigServer.java:742)
>         at org.apache.pig.test.TestSecondarySort.testNestedSortEndToEnd3(TestSecondarySort.java:550)
> Caused by: java.lang.ClassCastException: org.apache.pig.data.DataByteArray cannot be cast to org.apache.pig.data.Tuple
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:392)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:357)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:236)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:231)
>         at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
[jira] [Commented] (PIG-3221) Bootstrap sampling
[ https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614618#comment-13614618 ]

Gianmarco De Francisci Morales commented on PIG-3221:
-----------------------------------------------------

Here is an example:
http://hortonworks.com/blog/bootstrap-sampling-with-apache-pig

> Bootstrap sampling
> ------------------
>
>                 Key: PIG-3221
>                 URL: https://issues.apache.org/jira/browse/PIG-3221
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Gianmarco De Francisci Morales
>              Labels: gsoc2013
>
> Implement a bootstrap sampling option (
> http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE
> operator.
[jira] [Commented] (PIG-3225) Stratified sampling
[ https://issues.apache.org/jira/browse/PIG-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614617#comment-13614617 ]

Gianmarco De Francisci Morales commented on PIG-3225:
-----------------------------------------------------

Hi Dishara,

Happy to see your interest.
While we haven't discussed it in detail with the rest of the committers, my personal view is that this project should be combined with the one on bootstrap sampling (PIG-3221) to be worthy of GSoC.

Regarding the sampling, this part of the project requires designing and changing the parser to recognize a new part of the syntax for the SAMPLE operator (to specify the strata), and implementing the logical and physical operators connected to it.

> Stratified sampling
> -------------------
>
>                 Key: PIG-3225
>                 URL: https://issues.apache.org/jira/browse/PIG-3225
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Gianmarco De Francisci Morales
>              Labels: gsoc2013
>
> Implement a stratified sampling option (
> http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.
Re: Put a "Google summer of code 2013" cwiki page
I have another idea for a GSoC project: running the unit tests in parallel. I think several people mentioned this at the last Pig meetup. The objective is to enable us to run the whole unit test suite before committing any patch. The fix should include two parts:
(1) unit tests must not interfere with each other (e.g. move the test dir from /tmp to build/test/tmp so that one test doesn't delete another test's dir)
(2) we need to make sure Pig is thread safe

Johnny


On Fri, Mar 22, 2013 at 10:04 AM, Dmitriy Ryaboy wrote:

> This is a little different than how we've done such things before, but how
> about a project to get Pig to run on Spark (aka, Spork)? The Twitter pig
> folks have some code we'd love to share that got us half-way there, it was
> looking pretty promising (if anyone is curious, it's the "spork" branch on
> my github fork of pig: https://github.com/dvryaboy/pig )
>
> D
>
> On Thu, Mar 21, 2013 at 2:05 PM, Prasanth J wrote:
>
> > One more idea for GSoC project.
> >
> > YSmart uses correlation between multiple MR jobs to reduce the number of
> > MR jobs generated. I remember Dmitriy bringing this up early. The
> > techniques specified in this paper (Input, Job Flow, Transit correlations)
> > has been patched into Hive. If Pig doesn't use these optimizations then I
> > think it will be good to have them in Pig as well.
> >
> > Here is the link to the paper
> > http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
> >
> > I think this can be a good candidate project for GSoC.
> >
> > Thanks
> > -- Prasanth
> >
> > On Mar 21, 2013, at 3:51 PM, Olga Natkovich wrote:
> >
> > > +1 on that
> > >
> > >
> > > From: Russell Jurney
> > > To: "dev@pig.apache.org"
> > > Sent: Thursday, March 21, 2013 11:54 AM
> > > Subject: Re: Put a "Google summer of code 2013" cwiki page
> > >
> > > Make Grunt use Antlr - high priority one for me. Once Grunt uses Antlr,
> > > macros will flourish.
> > >
> > >
> > > On Wed, Mar 20, 2013 at 6:25 PM, Daniel Dai wrote:
> > >
> > >> https://cwiki.apache.org/confluence/display/PIG/GSoc2013
> > >>
> > >> Feel free to add more project which could fit in the timeline of a
> > >> student summer project.
> > >>
> > >> I remember there are several projects we discussed in our last meetup:
> > >> * Allow Pig use Hive UDFs, Alan, do we have a ticket for that?
> > >> * A general framework for Pig performance test, Rohini, do we have a
> > >> ticket?
> > >>
> > >> Thanks,
> > >> Daniel
> > >>
> > >
> > > --
> > > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
[jira] [Updated] (PIG-2244) Macros cannot be passed relation names
[ https://issues.apache.org/jira/browse/PIG-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang updated PIG-2244:
------------------------------

    Status: Patch Available  (was: Open)

[~alangates], I fixed the ANTLR grammar file so that it is able to expand a quoted relation in a macro definition to a relation in the .expanded file. I added another test case, testQuotedRelation(), to verify this. I ran all the test cases in TestMacroExpansion and they pass for me.

> Macros cannot be passed relation names
> --------------------------------------
>
>                 Key: PIG-2244
>                 URL: https://issues.apache.org/jira/browse/PIG-2244
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9.0
>            Reporter: Alan Gates
>            Priority: Minor
>         Attachments: PIG-2244.patch.txt
>
>
> If an alias is passed quoted, it gets expanded as if it were an alias in the
> macro, which leads to a very strange error message.
[jira] [Assigned] (PIG-2244) Macros cannot be passed relation names
[ https://issues.apache.org/jira/browse/PIG-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang reassigned PIG-2244:
---------------------------------

    Assignee: Johnny Zhang

> Macros cannot be passed relation names
> --------------------------------------
>
>                 Key: PIG-2244
>                 URL: https://issues.apache.org/jira/browse/PIG-2244
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9.0
>            Reporter: Alan Gates
>            Assignee: Johnny Zhang
>            Priority: Minor
>         Attachments: PIG-2244.patch.txt
>
>
> If an alias is passed quoted, it gets expanded as if it were an alias in the
> macro, which leads to a very strange error message.
[jira] [Updated] (PIG-2244) Macros cannot be passed relation names
[ https://issues.apache.org/jira/browse/PIG-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Johnny Zhang updated PIG-2244:
------------------------------

    Attachment: PIG-2244.patch.txt

> Macros cannot be passed relation names
> --------------------------------------
>
>                 Key: PIG-2244
>                 URL: https://issues.apache.org/jira/browse/PIG-2244
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9.0
>            Reporter: Alan Gates
>            Priority: Minor
>         Attachments: PIG-2244.patch.txt
>
>
> If an alias is passed quoted, it gets expanded as if it were an alias in the
> macro, which leads to a very strange error message.
[jira] [Updated] (PIG-3247) Piggybank functions to mimic OVER clause in SQL
[ https://issues.apache.org/jira/browse/PIG-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-3247:
---------------------------

    Attachment: Over.2.patch

A new version of the patch that fixes an error in the percent_rank calculation and adds the ability to specify the return type of the Over function.

> Piggybank functions to mimic OVER clause in SQL
> -----------------------------------------------
>
>                 Key: PIG-3247
>                 URL: https://issues.apache.org/jira/browse/PIG-3247
>             Project: Pig
>          Issue Type: New Feature
>          Components: piggybank
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>             Fix For: 0.12
>
>         Attachments: Over.2.patch, Over.patch
>
>
> In order to test Hive I have written some UDFs to mimic the behavior of SQL's
> OVER clause. I thought they would be useful to share.
[jira] [Commented] (PIG-3259) Optimize byte to Long/Integer conversions
[ https://issues.apache.org/jira/browse/PIG-3259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614137#comment-13614137 ]

Thejas M Nair commented on PIG-3259:
------------------------------------

bq. How do we determine the number of non-numbers without making calls to sanityCheck..()?

By counting the number of times an exception has so far been thrown by .valueOf(). Once a threshold has been crossed, we can introduce the sanity check for each new value. This will put a limit on the worst ('incorrect') case performance without degrading the 'correct' case performance by much.

I wonder if there are good libraries that we can use for the sanity checks, as the decimal check seems a bit more complicated.

> Optimize byte to Long/Integer conversions
> -----------------------------------------
>
>                 Key: PIG-3259
>                 URL: https://issues.apache.org/jira/browse/PIG-3259
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.11, 0.11.1
>            Reporter: Prashant Kommireddi
>            Assignee: Prashant Kommireddi
>             Fix For: 0.12
>
>         Attachments: byteToLong.xlsx
>
>
> These conversions could perform better. If the input is not numeric
> (1234abcd), the code calls Double.valueOf(String) regardless before finally
> returning null. Any script that inadvertently (user's mistake or not) tries
> to cast a non-numeric column to int or long would result in many wasteful
> calls.
> We can avoid this by handling only the cases where we find the input to be a
> decimal number (1234.56), and returning null otherwise even before trying
> Double.valueOf(String).
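The threshold idea in the comment above might look roughly like this. It is a sketch under assumptions rather than anything taken from the patch: `AdaptiveLongCaster`, its `threshold` parameter and the regex used as the sanity check are all hypothetical, and the thread does not settle on a threshold value or on how the check should be implemented.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; not part of Pig.
class AdaptiveLongCaster {

    // Stand-in sanity check: optional sign, digits, at most one decimal point.
    private static final Pattern DECIMAL =
            Pattern.compile("[+-]?(\\d+\\.?\\d*|\\.\\d+)");

    private final int threshold; // assumed tuning knob, no value is given in the thread
    private int failures = 0;    // conversions that ended in an exception so far

    AdaptiveLongCaster(int threshold) {
        this.threshold = threshold;
    }

    public Long cast(String s) {
        if (s == null) {
            return null;
        }
        // Only pay for the sanity check once the input has proven to be dirty.
        if (failures >= threshold && !DECIMAL.matcher(s).matches()) {
            return null;
        }
        try {
            return Long.valueOf(s);
        } catch (NumberFormatException notALong) {
            // fall through and try the decimal form
        }
        try {
            return Double.valueOf(s).longValue();
        } catch (NumberFormatException notANumber) {
            failures++; // count the wasted exception
            return null;
        }
    }

    public int failureCount() {
        return failures;
    }

    public static void main(String[] args) {
        AdaptiveLongCaster caster = new AdaptiveLongCaster(2);
        System.out.println(caster.cast("abc"));      // prints null (1st exception)
        System.out.println(caster.cast("x1"));       // prints null (2nd exception)
        System.out.println(caster.cast("1234abcd")); // prints null, rejected by the check
        System.out.println(caster.cast("1234.56"));  // prints 1234
        System.out.println(caster.failureCount());   // prints 2
    }
}
```

With clean input the counter never reaches the threshold, so the 'correct' case pays nothing extra; with dirty input the number of thrown exceptions is capped at the threshold.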
[jira] [Updated] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated PIG-3261:
-------------------------

    Status: Open  (was: Patch Available)

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3261
>                 URL: https://issues.apache.org/jira/browse/PIG-3261
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.10.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: PIG-3261.patch
>
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> fi
> {code}
> This means that anything added to CLASSPATH until that point will never be
> able to get overridden by a user set environment, which is wrong behavior.
> Hadoop libs for example are added to CLASSPATH, before this extension is
> called in bin/pig.
[jira] [Commented] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613705#comment-13613705 ]

Harsh J commented on PIG-3261:
------------------------------

IIRC the reason was to not have them step over the shipped library jars unintentionally with a simple HADOOP_CLASSPATH being set.

I guess we can add a toggle instead of changing the behavior; that would be safer. I'll update the patch.

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
>                 Key: PIG-3261
>                 URL: https://issues.apache.org/jira/browse/PIG-3261
>             Project: Pig
>          Issue Type: Bug
>          Components: grunt
>    Affects Versions: 0.10.0
>            Reporter: Harsh J
>            Assignee: Harsh J
>         Attachments: PIG-3261.patch
>
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> fi
> {code}
> This means that anything added to CLASSPATH until that point will never be
> able to get overridden by a user set environment, which is wrong behavior.
> Hadoop libs for example are added to CLASSPATH, before this extension is
> called in bin/pig.
[jira] [Commented] (PIG-2988) start deploying pigunit maven artifact part of Pig release process
[ https://issues.apache.org/jira/browse/PIG-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613645#comment-13613645 ]

Ioan Eugen Stan commented on PIG-2988:
--------------------------------------

Great! I was just about to report this issue.

> start deploying pigunit maven artifact part of Pig release process
> ------------------------------------------------------------------
>
> Key: PIG-2988
> URL: https://issues.apache.org/jira/browse/PIG-2988
> Project: Pig
> Issue Type: New Feature
> Components: build
> Affects Versions: 0.11, 0.10.1
> Reporter: Johnny Zhang
> Assignee: Nick White
> Fix For: 0.12, 0.11.1
>
> Attachments: PIG-2988.0-branch11.patch, PIG-2988.0.patch
>
>
> Right now the Pig project doesn't publish the pigunit Maven artifact, so things like
> {noformat}
> <dependency>
>   <groupId>org.apache.pig</groupId>
>   <artifactId>pigunit</artifactId>
>   <version>0.10.0</version>
> </dependency>
> {noformat}
> don't work. Can we start deploying pigunit Maven artifacts as part of the release process? Thanks.
[jira] [Updated] (PIG-3199) Expose LogicalPlan via PigServer API
[ https://issues.apache.org/jira/browse/PIG-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-3199:
-------------------------------------
    Patch Info: (was: Patch Available)

> Expose LogicalPlan via PigServer API
> ------------------------------------
>
> Key: PIG-3199
> URL: https://issues.apache.org/jira/browse/PIG-3199
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.10.0
> Reporter: Prashant Kommireddi
> Assignee: Prashant Kommireddi
> Fix For: 0.12
>
> Attachments: PIG-3199.patch
>
>
> LogicalPlan could be exposed to the user so that one can perform validations based on it. For example, one could get Load/Store paths or other operators and perform checks such as whether I/O paths are valid.
[jira] [Commented] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613616#comment-13613616 ]

Prashant Kommireddi commented on PIG-3261:
------------------------------------------

I am actually happy with this patch. Looking through Hadoop JIRAs, documentation, and comments in the bin/hadoop script, I could not clearly comprehend the reason for the existence of the property HADOOP_USER_CLASSPATH_FIRST. I just want to make sure we don't miss it here if there's a legitimate reason; otherwise, PIG_CLASSPATH is generally set when a user has certain custom jar/classpath requirements. Like you said, I don't think a user would set PIG_CLASSPATH but want the default CLASSPATH to take precedence.

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
> Key: PIG-3261
> URL: https://issues.apache.org/jira/browse/PIG-3261
> Project: Pig
> Issue Type: Bug
> Components: grunt
> Affects Versions: 0.10.0
> Reporter: Harsh J
> Assignee: Harsh J
> Attachments: PIG-3261.patch
>
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> {code}
> This means that anything added to CLASSPATH until that point will never be able to get overridden by a user set environment, which is wrong behavior. Hadoop libs for example are added to CLASSPATH, before this extension is called in bin/pig.
[jira] [Commented] (PIG-3261) User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
[ https://issues.apache.org/jira/browse/PIG-3261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13613608#comment-13613608 ]

Harsh J commented on PIG-3261:
------------------------------

We can make it configurable and document that, I guess; but it's kinda odd to have to do two toggles to get an override done. In most cases of an override requirement, users are aware of the overriding, so the secondary toggle seems a tad unnecessary. If you prefer that strongly, I'll send in another patch - let me know :)

An alternate fix would be to simply do the PIG_CLASSPATH addition before anything else is added to CLASSPATH, but this kind of position-in-code fix is harder to maintain over time.

> User set PIG_CLASSPATH entries must be prepended to the CLASSPATH, not appended
> -------------------------------------------------------------------------------
>
> Key: PIG-3261
> URL: https://issues.apache.org/jira/browse/PIG-3261
> Project: Pig
> Issue Type: Bug
> Components: grunt
> Affects Versions: 0.10.0
> Reporter: Harsh J
> Assignee: Harsh J
> Attachments: PIG-3261.patch
>
>
> Currently we are doing this wrong:
> {code}
> if [ "$PIG_CLASSPATH" != "" ]; then
>     CLASSPATH=${CLASSPATH}:${PIG_CLASSPATH}
> {code}
> This means that anything added to CLASSPATH until that point will never be able to get overridden by a user set environment, which is wrong behavior. Hadoop libs for example are added to CLASSPATH, before this extension is called in bin/pig.
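For readers following the thread, the unconditional prepend that the issue title asks for might be sketched as below. This is only an illustration under assumptions, not the attached PIG-3261.patch: the function name prepend_pig_classpath is hypothetical and the jar paths are placeholders.

```shell
#!/bin/sh
# Hypothetical helper (not taken from bin/pig): place user-set PIG_CLASSPATH
# entries ahead of whatever has already been collected in CLASSPATH, so the
# JVM resolves the user's jars first.
prepend_pig_classpath() {
    if [ -n "${PIG_CLASSPATH:-}" ]; then
        if [ -n "${CLASSPATH:-}" ]; then
            CLASSPATH="${PIG_CLASSPATH}:${CLASSPATH}"
        else
            # Avoid a trailing ':' when nothing has been collected yet.
            CLASSPATH="${PIG_CLASSPATH}"
        fi
    fi
}

# Placeholder values, purely for demonstration.
CLASSPATH="/opt/hadoop/lib/hadoop-core.jar"
PIG_CLASSPATH="/home/user/udfs/my-udfs.jar"
prepend_pig_classpath
echo "$CLASSPATH"
```

Because the swap happens at the point where PIG_CLASSPATH is merged, it does not depend on where in the script the other entries get added, which is the maintenance concern raised about the position-in-code alternative.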