Re: Pig and Storm

2013-07-24 Thread Pradeep Gollakota
I've added a wiki page for a "Pig on Storm Proposal" at
https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal

I've included a primer on Storm (and Trident) as well as some of the
challenges I foresee. Please read through my proposal and let me know your
thoughts.


On Wed, Jul 24, 2013 at 2:36 PM, Alan Gates  wrote:

> This sounds exciting.  The next question is how do you plan to do it?
>  Would a physical plan be translated to a Storm job (or jobs)?  Would it
> need a different physical plan?  Or would you just have the connection at
> the language layer and all the planning separate?  Do you envision needing
> extensions/changes to the language to support Storm?  Feel free to add a
> page to Pig's wiki with your thoughts on an approach.
>
> Alan.
>
> On Jul 23, 2013, at 9:52 AM, Pradeep Gollakota wrote:
>
> > Hi Pig Developers,
> >
> > I wanted to reach out to you all and ask for your opinion on something.
> >
> > As a Pig user, I have come to love Pig as a framework. Pig provides a great
> > set of abstractions that make working with large datasets easy. Currently
> > Pig is only backed by Hadoop. However, with the new rise of Twitter Storm
> > as a distributed real time processing engine, Pig users are missing out on
> > a great opportunity to be able to work with Pig in Storm. As a user of Pig,
> > Hadoop and Storm, and keeping with the Pig philosophy of "Pigs live
> > anywhere," I'd like to get your thoughts on starting the implementation of
> > a Pig backend for Storm.
> >
> > Thanks
> > Pradeep
>
>


[jira] Subscription: PIG patch available

2013-07-24 Thread jira
Issue Subscription
Filter: PIG patch available (15 issues)

Subscriber: pigdaily

Key         Summary
PIG-3389    "Set job.name" does not work with dump command
            https://issues.apache.org/jira/browse/PIG-3389
PIG-3374    CASE and IN fail when expression includes dereferencing operator
            https://issues.apache.org/jira/browse/PIG-3374
PIG-3359    Register Statements and Param Substitution in Macros
            https://issues.apache.org/jira/browse/PIG-3359
PIG-3346    New property that controls the number of combined splits
            https://issues.apache.org/jira/browse/PIG-3346
PIG-        Fix remaining Windows core unit test failures
            https://issues.apache.org/jira/browse/PIG-
PIG-3295    Casting from bytearray failing after Union (even when each field is from a single Loader)
            https://issues.apache.org/jira/browse/PIG-3295
PIG-3292    Logical plan invalid state: duplicate uid in schema during self-join to get cross product
            https://issues.apache.org/jira/browse/PIG-3292
PIG-3257    Add unique identifier UDF
            https://issues.apache.org/jira/browse/PIG-3257
PIG-3210    Pig fails to start when it cannot write log to log files
            https://issues.apache.org/jira/browse/PIG-3210
PIG-3199    Expose LogicalPlan via PigServer API
            https://issues.apache.org/jira/browse/PIG-3199
PIG-3166    Update eclipse .classpath according to ivy library.properties
            https://issues.apache.org/jira/browse/PIG-3166
PIG-3123    Simplify Logical Plans By Removing Unneccessary Identity Projections
            https://issues.apache.org/jira/browse/PIG-3123
PIG-3088    Add a builtin udf which removes prefixes
            https://issues.apache.org/jira/browse/PIG-3088
PIG-3021    Split results missing records when there is null values in the column comparison
            https://issues.apache.org/jira/browse/PIG-3021
PIG-1914    Support load/store JSON data in Pig
            https://issues.apache.org/jira/browse/PIG-1914

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Updated] (PIG-3114) Duplicated macro name error when using pigunit

2013-07-24 Thread Sajid Raza (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sajid Raza updated PIG-3114:


Attachment: PatchedPigTest.java

Temporary workaround in end-user code to get PigTest to work.

> Duplicated macro name error when using pigunit
> --
>
> Key: PIG-3114
> URL: https://issues.apache.org/jira/browse/PIG-3114
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.11
> Reporter: Chetan Nadgire
> Assignee: Chetan Nadgire
> Fix For: 0.12
>
> Attachments: PatchedPigTest.java, PIG-3114.patch, PIG-3114.patch
>
>
> I'm using PigUnit to test a Pig script within which a macro is defined.
> Pig runs fine on the cluster, but I get a parsing error with PigUnit.
> So I tried a very basic Pig script with a macro and got a similar error:
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing.  null. Reason: Duplicated macro name 'my_macro_1'
>   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
>   at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
>   at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:988)
>   at org.apache.pig.pigunit.pig.GruntParser.processPig(GruntParser.java:61)
>   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
>   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
>   at org.apache.pig.pigunit.pig.PigServer.registerScript(PigServer.java:56)
>   at org.apache.pig.pigunit.PigTest.registerScript(PigTest.java:160)
>   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:231)
>   at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:261)
>   at FirstPigTest.MyPigTest.testTop2Queries(MyPigTest.java:32)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at junit.framework.TestCase.runTest(TestCase.java:176)
>   at junit.framework.TestCase.runBare(TestCase.java:141)
>   at junit.framework.TestResult$1.protect(TestResult.java:122)
>   at junit.framework.TestResult.runProtected(TestResult.java:142)
>   at junit.framework.TestResult.run(TestResult.java:125)
>   at junit.framework.TestCase.run(TestCase.java:129)
>   at junit.framework.TestSuite.runTest(TestSuite.java:255)
>   at junit.framework.TestSuite.run(TestSuite.java:250)
>   at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:84)
>   at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
>   at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
>   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
>   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
>   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
>   at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
> Caused by: Failed to parse:  null. Reason: Duplicated macro name 'my_macro_1'
>   at org.apache.pig.parser.QueryParserDriver.makeMacroDef(QueryParserDriver.java:406)
>   at org.apache.pig.parser.QueryParserDriver.expandMacro(QueryParserDriver.java:277)
>   at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:178)
>   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
>   ... 30 more
>  
> Pig script which is failing :
> {code:title=test.pig|borderStyle=solid}
> DEFINE my_macro_1 (QUERY, A) RETURNS C {
> $C = ORDER $QUERY BY total DESC, $A;
> } ;
> data =  LOAD 'input' AS (query:CHARARRAY);
> queries_group = GROUP data BY query;
> queries_count = FOREACH queries_group GENERATE group AS query, COUNT(data) AS total;
> queries_ordered = my_macro_1(queries_count, query);
> queries_limit = LIMIT queries_ordered 2;
> STORE queries_limit INTO 'output';
> {code}
> If I remove the macro, PigUnit works fine. Even just defining the macro without
> using it results in a parsing error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3114) Duplicated macro name error when using pigunit

2013-07-24 Thread Sajid Raza (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13719060#comment-13719060
 ] 

Sajid Raza commented on PIG-3114:
-

Attached the workaround to this JIRA.

> Duplicated macro name error when using pigunit
> --
>
> Key: PIG-3114
> URL: https://issues.apache.org/jira/browse/PIG-3114
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.11
> Reporter: Chetan Nadgire
> Assignee: Chetan Nadgire
> Fix For: 0.12
>
> Attachments: PatchedPigTest.java, PIG-3114.patch, PIG-3114.patch
>
>



[jira] [Commented] (PIG-3389) "Set job.name" does not work with dump command

2013-07-24 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718904#comment-13718904
 ] 

Alan Gates commented on PIG-3389:
-

+1

> "Set job.name" does not work with dump command
> --
>
> Key: PIG-3389
> URL: https://issues.apache.org/jira/browse/PIG-3389
> Project: Pig
> Issue Type: Bug
> Components: grunt
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Priority: Minor
> Fix For: 0.12
>
> Attachments: PIG-3389.patch
>
>
> The "job.name" property can be used to overwrite the default job name in Pig, 
> but the dump command does not honor it.
> To reproduce the issue, run the following commands in Grunt shell in MR mode:
> {code}
> SET job.name 'FOO';
> a = LOAD '/foo';
> DUMP a;
> {code}
> You will see the job name is not 'FOO' in the JT UI. However, using store 
> instead of dump sets the job name correctly.



[jira] [Updated] (PIG-2248) Pig parser does not detect when a macro name masks a UDF name

2013-07-24 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-2248:


Status: Open  (was: Patch Available)

Canceling the patch, as discussion of the best approach is still ongoing.

> Pig parser does not detect when a macro name masks a UDF name
> -
>
> Key: PIG-2248
> URL: https://issues.apache.org/jira/browse/PIG-2248
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.9.0
> Reporter: Alan Gates
> Assignee: Johnny Zhang
> Priority: Minor
> Attachments: PIG-2248.patch.txt, PIG-2248.patch.txt, PIG-2248.patch.txt, PIG-2248.patch.txt
>
>
> Pig accepts a macro like:
> {code}
> define COUNT(in_relation, min_gpa) returns c {
>     b = filter $in_relation by gpa >= $min_gpa;
>     $c = foreach b generate age, name;
> }
> {code}
> This should produce a warning that it is masking a UDF.
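
The check requested above could be sketched roughly as follows. This is only an illustration in plain Java, not Pig's parser code: the class name, the method names, and the short list of built-in names are assumptions made for the example, and exact (case-sensitive) name matching is assumed as well.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public final class MacroMaskCheckSketch {
    // Illustrative subset of Pig built-in function names; NOT the parser's real registry.
    private static final Set<String> BUILTIN_NAMES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("COUNT", "SUM", "AVG", "MIN", "MAX")));

    private MacroMaskCheckSketch() {}

    /** True when the macro name exactly matches a known built-in name (assumed rule). */
    public static boolean masksBuiltin(String macroName) {
        return BUILTIN_NAMES.contains(macroName);
    }

    /** Returns a warning message for a masking macro, or null when there is no collision. */
    public static String warningFor(String macroName) {
        return masksBuiltin(macroName)
                ? "Macro '" + macroName + "' masks a built-in UDF of the same name"
                : null;
    }

    public static void main(String[] args) {
        System.out.println(warningFor("COUNT"));      // a warning message
        System.out.println(warningFor("my_macro_1")); // null: no collision
    }
}
```

A real fix would presumably consult whatever function registry the parser already uses rather than a hard-coded set.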



[jira] [Commented] (PIG-3182) Pig currently lacks functions to trim the whitespace only on one side

2013-07-24 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718879#comment-13718879
 ] 

Kousuke Saruta commented on PIG-3182:
-

Thank you for updating the title and description of this JIRA, Cheolsoo!

> Pig currently lacks functions to trim the whitespace only on one side
> -
>
> Key: PIG-3182
> URL: https://issues.apache.org/jira/browse/PIG-3182
> Project: Pig
> Issue Type: New Feature
> Components: internal-udfs
> Reporter: Padma Ravindran
> Assignee: Kousuke Saruta
> Priority: Minor
> Labels: patch
> Fix For: 0.12
>
> Attachments: LTrim.java.patch, PIG-3182.patch, PIG-3182.patch, PIG-3182.patch
>
>
> Pig currently lacks a function to trim whitespace only on the left-hand
> side of a given word:
> ltrim(' lorem ') = 'lorem '



Re: Pig and Storm

2013-07-24 Thread Alan Gates
This sounds exciting.  The next question is how do you plan to do it?  Would a 
physical plan be translated to a Storm job (or jobs)?  Would it need a 
different physical plan?  Or would you just have the connection at the language 
layer and all the planning separate?  Do you envision needing 
extensions/changes to the language to support Storm?  Feel free to add a page 
to Pig's wiki with your thoughts on an approach.

Alan.

On Jul 23, 2013, at 9:52 AM, Pradeep Gollakota wrote:

> Hi Pig Developers,
> 
> I wanted to reach out to you all and ask for your opinion on something.
> 
> As a Pig user, I have come to love Pig as a framework. Pig provides a great
> set of abstractions that make working with large datasets easy. Currently
> Pig is only backed by Hadoop. However, with the new rise of Twitter Storm
> as a distributed real time processing engine, Pig users are missing out on
> a great opportunity to be able to work with Pig in Storm. As a user of Pig,
> Hadoop and Storm, and keeping with the Pig philosophy of "Pigs live
> anywhere," I'd like to get your thoughts on starting the implementation of
> a Pig backend for Storm.
> 
> Thanks
> Pradeep



[jira] [Created] (PIG-3393) STARTSWITH udf doesn't override outputSchema method

2013-07-24 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3393:
--

Summary: STARTSWITH udf doesn't override outputSchema method
Key: PIG-3393
URL: https://issues.apache.org/jira/browse/PIG-3393
Project: Pig
Issue Type: Bug
Components: internal-udfs
Reporter: Cheolsoo Park
Assignee: Sriram Krishnan
Fix For: 0.12


It appears that the wrong patch was committed in PIG-2879. Looking at the code in
trunk, the review comments in that JIRA were not addressed before the commit:
# The outputSchema() method should be overridden.
# Exceptions should be handled better.



[jira] [Updated] (PIG-3163) Pig current releases lack a UDF endsWith. This UDF tests if a given string ends with the specified suffix.

2013-07-24 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3163:
---

Release Note: Pig now includes an ENDSWITH built-in UDF that checks for the
presence of a given suffix in a chararray.

> Pig current releases lack a UDF endsWith. This UDF tests if a given string
> ends with the specified suffix.
> -
>
> Key: PIG-3163
> URL: https://issues.apache.org/jira/browse/PIG-3163
> Project: Pig
> Issue Type: New Feature
> Components: piggybank
> Affects Versions: 0.10.0
> Reporter: Anuroopa George
> Assignee: Sriram Krishnan
> Fix For: 0.12
>
> Attachments: pig-3163.patch
>
>
> Current Pig releases lack an endsWith UDF. This UDF tests whether a given
> string ends with the specified suffix. It returns true if the character
> sequence represented by the suffix argument is a suffix of the character
> sequence represented by the given string, and false otherwise. It also
> returns true if the given suffix is an empty string or is equal to the
> given string.
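
The semantics described above match those of java.lang.String.endsWith. A minimal plain-Java sketch follows; it is not the committed UDF, and the class name and the null-in/null-out behaviour are assumptions made for illustration.

```java
public final class EndsWithSketch {
    private EndsWithSketch() {}

    /**
     * Returns whether value ends with suffix.
     * Assumption: a null argument yields null rather than an exception.
     */
    public static Boolean endsWith(String value, String suffix) {
        if (value == null || suffix == null) {
            return null;
        }
        // String.endsWith already returns true for an empty suffix
        // and for a suffix equal to the whole string.
        return value.endsWith(suffix);
    }

    public static void main(String[] args) {
        System.out.println(endsWith("foobar", "bar"));    // true
        System.out.println(endsWith("foobar", ""));       // true: empty suffix
        System.out.println(endsWith("foobar", "foobar")); // true: equal strings
        System.out.println(endsWith("foobar", "foo"));    // false
    }
}
```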



[jira] [Updated] (PIG-3392) Document STARTSWITH and ENDSWITH UDFs

2013-07-24 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3392:
---

Description: PIG-2879 and PIG-3163 added new built-in udfs "STARTSWITH" and 
"ENDSWITH", but documentation is missing.  (was: PIG-3163 added a new built-in 
udf "ENDSWITH", but documentation is missing.)
Summary: Document STARTSWITH and ENDSWITH UDFs  (was: Document ENDSWITH 
UDF)

> Document STARTSWITH and ENDSWITH UDFs
> -
>
> Key: PIG-3392
> URL: https://issues.apache.org/jira/browse/PIG-3392
> Project: Pig
> Issue Type: Improvement
> Components: documentation
> Reporter: Cheolsoo Park
> Assignee: Sriram Krishnan
> Fix For: 0.12
>
>
> PIG-2879 and PIG-3163 added new built-in udfs "STARTSWITH" and "ENDSWITH", 
> but documentation is missing.



[jira] [Created] (PIG-3392) Document ENDSWITH UDF

2013-07-24 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created PIG-3392:
--

Summary: Document ENDSWITH UDF
Key: PIG-3392
URL: https://issues.apache.org/jira/browse/PIG-3392
Project: Pig
Issue Type: Improvement
Components: documentation
Reporter: Cheolsoo Park
Assignee: Sriram Krishnan
Fix For: 0.12


PIG-3163 added a new built-in udf "ENDSWITH", but documentation is missing.



[jira] [Updated] (PIG-3182) Pig currently lacks functions to trim the whitespace only on one side

2013-07-24 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3182:
---

Resolution: Fixed
Release Note: LTRIM and RTRIM are new built-in UDFs that trim the
whitespace of a string on the left and right hand side respectively.  (was:
Patch available)
Status: Resolved  (was: Patch Available)

Committed to trunk.

> Pig currently lacks functions to trim the whitespace only on one side
> -
>
> Key: PIG-3182
> URL: https://issues.apache.org/jira/browse/PIG-3182
> Project: Pig
> Issue Type: New Feature
> Components: internal-udfs
> Reporter: Padma Ravindran
> Assignee: Kousuke Saruta
> Priority: Minor
> Labels: patch
> Fix For: 0.12
>
> Attachments: LTrim.java.patch, PIG-3182.patch, PIG-3182.patch, PIG-3182.patch
>
>
> Pig currently lacks a function to trim whitespace only on the left-hand
> side of a given word:
> ltrim(' lorem ') = 'lorem '
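
The one-sided trimming described here can be sketched in plain Java as below. This is an illustration only, not the committed LTRIM/RTRIM code: the class and method names are hypothetical, and treating "whitespace" as Character.isWhitespace and returning null for null input are assumptions.

```java
public final class TrimSketch {
    private TrimSketch() {}

    /** Removes leading whitespace only; trailing whitespace is kept. */
    public static String ltrim(String s) {
        if (s == null) {
            return null; // assumption: null in, null out
        }
        int start = 0;
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        return s.substring(start);
    }

    /** Removes trailing whitespace only; leading whitespace is kept. */
    public static String rtrim(String s) {
        if (s == null) {
            return null; // assumption: null in, null out
        }
        int end = s.length();
        while (end > 0 && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + ltrim(" lorem ") + "]"); // [lorem ]
        System.out.println("[" + rtrim(" lorem ") + "]"); // [ lorem]
    }
}
```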



[jira] [Updated] (PIG-3182) Pig currently lacks functions to trim the whitespace only on one side

2013-07-24 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3182:
---

Component/s: internal-udfs
Summary: Pig currently lacks functions to trim the whitespace only on 
one side  (was: Pig currently lacks function to trim the whitespace only on the 
left hand side)

+1. Thank you Kousuke!

Since you added both RTrim and LTrim, I updated the title of the JIRA accordingly.

> Pig currently lacks functions to trim the whitespace only on one side
> -
>
> Key: PIG-3182
> URL: https://issues.apache.org/jira/browse/PIG-3182
> Project: Pig
> Issue Type: New Feature
> Components: internal-udfs
> Reporter: Padma Ravindran
> Assignee: Kousuke Saruta
> Priority: Minor
> Labels: patch
> Fix For: 0.12
>
> Attachments: LTrim.java.patch, PIG-3182.patch, PIG-3182.patch, PIG-3182.patch
>
>
> Pig currently lacks a function to trim whitespace only on the left-hand
> side of a given word:
> ltrim(' lorem ') = 'lorem '



[jira] [Commented] (PIG-3114) Duplicated macro name error when using pigunit

2013-07-24 Thread Ruslan Al-Fakikh (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13718179#comment-13718179
 ] 

Ruslan Al-Fakikh commented on PIG-3114:
---

Sajid Raza:
Can you please paste the code for your workaround?

> Duplicated macro name error when using pigunit
> --
>
> Key: PIG-3114
> URL: https://issues.apache.org/jira/browse/PIG-3114
> Project: Pig
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.11
> Reporter: Chetan Nadgire
> Assignee: Chetan Nadgire
> Fix For: 0.12
>
> Attachments: PIG-3114.patch, PIG-3114.patch
>
>
