[jira] [Commented] (PIG-1622) DEFINE streaming options are ill defined and not properly documented

2011-04-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023245#comment-13023245
 ] 

Xuefu Zhang commented on PIG-1622:
--

Based on discussion with Olga, we need to require that each command option be 
specified at most once. Multiple occurrences of the same option will result in a 
semantic error when the script is parsed. A patch enforcing this restriction will be 
provided soon.
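
To make the proposed rule concrete, here is a minimal sketch of an "at most once" check. It is written in Python purely for illustration and is not the patch mentioned above; the clause list, the function name, and the regex-based scanning are all assumptions, and the real check would live in Pig's parser.

```python
import re

# Hypothetical clause keywords for illustration; the real grammar lives in Pig's parser.
CLAUSES = ("INPUT", "OUTPUT", "SHIP", "CACHE", "STDERR")

def find_duplicate_options(define_stmt):
    """Return clause keywords that occur more than once, in the order the repeats are found."""
    # Drop quoted paths and backquoted commands so their contents are not scanned.
    stripped = re.sub(r"'[^']*'|`[^`]*`", "", define_stmt)
    pattern = r"\b(" + "|".join(CLAUSES) + r")\s*\("
    seen, dups = set(), []
    for match in re.finditer(pattern, stripped, re.IGNORECASE):
        name = match.group(1).upper()
        if name in seen and name not in dups:
            dups.append(name)
        seen.add(name)
    return dups
```

A parser following the proposal would raise a semantic error whenever this list is non-empty. For the script quoted in the issue, the sketch reports INPUT, OUTPUT, SHIP and CACHE as repeated.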

 DEFINE streaming options are ill defined and not properly documented
 

 Key: PIG-1622
 URL: https://issues.apache.org/jira/browse/PIG-1622
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Corinne Chandel
Priority: Minor
 Fix For: 0.9.0


 According to the documentation 
 (http://hadoop.apache.org/pig/docs/r0.7.0/piglatin_ref2.html#DEFINE) the 
 syntax for DEFINE when used to define a streaming command is:
 DEFINE cmd INPUT(stdin|path) OUTPUT(stdout|stderr|path) SHIP(path [, path, ...]) CACHE (path [, path, ...])
 However, the actual parser accepts something pretty different.  Consider the 
 following script:
 {code}
 define strm `wc -l` INPUT(stdin) 
 CACHE('/Users/gates/.vimrc#myvim') 
 OUTPUT(stdin)
 INPUT('/tmp/fred') 
 OUTPUT('/tmp/bob')
 SHIP('/Users/gates/.bashrc') 
 SHIP('/Users/gates/.vimrc') 
 CACHE('/Users/gates/.bashrc#mybash')
 stderr('/tmp/errors' limit 10);
 A = load '/Users/gates/test/data/studenttab10';
 B = stream A through strm;
 dump B;
 {code}
 The above actually parses.  I see several issues here:
 # What do multiple INPUT and OUTPUT statements mean in the context of 
 streaming?  These should not be allowed.
 # The documentation implies an order (INPUT, OUTPUT, SHIP, CACHE) that is not 
 enforced by the parser.  We should either enforce the order in the parser or 
 update the documentation.  Most likely the latter to avoid breaking existing 
 scripts.
 # Why are multiple SHIP and CACHE clauses allowed when each can take multiple 
 paths?  It seems we should only allow one of each.
 # The error clause is completely different from what is given in the 
 documentation.  I suspect this is a documentation error and the grammar 
 supported by the parser here is what we want.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2005) Discrepancy in the way dry run handles semicolon in macro definition

2011-04-22 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023253#comment-13023253
 ] 

Richard Ding commented on PIG-2005:
---

Patch-test result:

{code}
 [exec] +1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
{code}

Unit tests pass.

 Discrepancy in the way dry run handles semicolon in macro definition
 

 Key: PIG-2005
 URL: https://issues.apache.org/jira/browse/PIG-2005
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.9.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-2005_1.patch


 Macro definition requires a semicolon to mark the end. For example:
 {code}
 define mymacro(x) returns y {... ...};
 {code}
 When invoked through the command line, however, macro definitions without a 
 trailing semicolon also work, except in the case of dryrun. The discrepancy 
 arises because GruntParser automatically appends a semicolon to a Pig 
 statement when one is absent at the end. The dryrun GruntParser should do the same.
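
 The behavior described above (append a semicolon only when one is missing) can be sketched as follows. This is an illustrative Python sketch, not GruntParser's actual code; the function name is hypothetical.

```python
def ensure_trailing_semicolon(statement):
    """Append ';' when the statement does not already end with one."""
    trimmed = statement.rstrip()
    # Leave empty input and already-terminated statements untouched.
    if not trimmed or trimmed.endswith(";"):
        return trimmed
    return trimmed + ";"
```

 Applying the same normalization on both the regular and the dryrun path would make the two code paths treat an unterminated macro definition identically.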



[jira] [Resolved] (PIG-2005) Discrepancy in the way dry run handles semicolon in macro definition

2011-04-22 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-2005.
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to trunk and 0.9 branch.



[jira] [Updated] (PIG-2003) Using keyward as alias doesn't either emit an error or produce a logical plan.

2011-04-22 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-2003:
-

Attachment: PIG-2003.patch

 Using keyward as alias doesn't either emit an error or produce a logical plan.
 --

 Key: PIG-2003
 URL: https://issues.apache.org/jira/browse/PIG-2003
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.9.0

 Attachments: PIG-2003.patch


 The following is the symptom:
 grunt> ship = load 'x';
 grunt> describe ship;
 2011-04-19 13:52:52,809 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1005: No plan for ship to describe
 The correct behavior is to give an error when the keyword is used as an alias.



[jira] [Updated] (PIG-2006) Regression: NPE when Pig processes an empty script file

2011-04-22 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated PIG-2006:
-

Attachment: PIG-2006.patch

 Regression: NPE when Pig processes an empty script file
 ---

 Key: PIG-2006
 URL: https://issues.apache.org/jira/browse/PIG-2006
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 0.9.0

 Attachments: PIG-2006.patch


 If a Pig script file is empty and supplied as input to Pig (using the -f 
 option), an NPE is thrown. Stack trace:
 java.lang.NullPointerException
 at java.util.regex.Matcher.getTextLength(Matcher.java:1140)
 at java.util.regex.Matcher.reset(Matcher.java:291)
 at java.util.regex.Matcher.<init>(Matcher.java:211)
 at java.util.regex.Pattern.matcher(Pattern.java:888)
 at org.apache.pig.scripting.ScriptEngine$SupportedScriptLang.accepts(ScriptEngine.java:89)
 at org.apache.pig.scripting.ScriptEngine.getSupportedScriptLang(ScriptEngine.java:163)
 at org.apache.pig.Main.determineScriptType(Main.java:892)
 at org.apache.pig.Main.run(Main.java:378)
 at org.apache.pig.Main.main(Main.java:108)
 This seems related to the Jython support in 0.9.
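
 The trace shows the exception being raised inside Pattern.matcher, which fails when handed a null input; presumably an empty file yields no first line to match against. The sketch below shows the kind of guard that avoids this. It is an illustrative Python analogue, not the attached patch: the function name mirrors the trace, but the shebang pattern and the idea that only the first line is inspected are assumptions.

```python
import re

# Hypothetical pattern for illustration only; Pig's real matching rules are
# not shown in this thread.
SHEBANG_PYTHON = re.compile(r"^#!.*python")

def accepts(first_line):
    """Return True when the first line of a script looks like a Python shebang.

    An empty file has no first line, so callers may pass None; treat that as
    "not accepted" instead of handing None to the regex engine.
    """
    if first_line is None:
        return False
    return SHEBANG_PYTHON.search(first_line) is not None
```

 With a guard like this, an empty script simply falls through to the default (Pig Latin) handling rather than failing during script-type detection.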



[jira] [Commented] (PIG-2006) Regression: NPE when Pig processes an empty script file

2011-04-22 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023281#comment-13023281
 ] 

Richard Ding commented on PIG-2006:
---

+1



[jira] [Resolved] (PIG-1910) incorrect schema shown when project-star is used with other projections

2011-04-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1910.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch committed to both trunk and 0.9 branch.

 incorrect schema shown when project-star is used with other projections
 ---

 Key: PIG-1910
 URL: https://issues.apache.org/jira/browse/PIG-1910
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.9.0

 Attachments: PIG-1910-1.patch, PIG-1910-2.patch, PIG-1910-3.patch, 
 PIG-1910-4.patch, PIG-1910-5.patch


 {code}
 grunt> l = load 'x';
 grunt> f = foreach l generate $1 as a, *, $2 as b;
 grunt> describe f;
 f: {a: bytearray,(null),b: bytearray}
 -- The tuple returned by * is automatically flattened, so this schema is not
 -- correct. It is more accurate to return a null schema.
 {code}



[jira] [Updated] (PIG-2011) Speed up TestTypedMap.java

2011-04-22 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2011:
---

Attachment: PIG_2011.patch

The attached patch switches TestTypedMap to LOCAL mode. This cuts the total 
run time from 7 min 10 secs to 2 min 33 secs:

IN MR:

Testcase: testSimpleLoad took 25.21 sec
Testcase: testSimpleMapKeyLookup took 23.7 sec
Testcase: testSimpleMapCast took 23.432 sec
Testcase: testComplexLoad took 23.539 sec
Testcase: testComplexCast took 19.355 sec
Testcase: testComplexCast2 took 18.456 sec
Testcase: testUnTypedMap took 18.373 sec
Testcase: testOrderBy took 171.013 sec

Total time: 7 min 10 secs

IN LOCAL:

Testcase: testSimpleLoad took 8.758 sec
Testcase: testSimpleMapKeyLookup took 8.947 sec
Testcase: testSimpleMapCast took 8.781 sec
Testcase: testComplexLoad took 8.072 sec
Testcase: testComplexCast took 7.843 sec
Testcase: testComplexCast2 took 7.787 sec
Testcase: testUnTypedMap took 8.595 sec
Testcase: testOrderBy took 47.002 sec

Total time: 2 min 33 secs
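
For anyone who wants the ratios rather than the raw numbers, the arithmetic can be checked with a few lines of Python. The timings are copied from the two runs above; the variable and function names are just for illustration.

```python
# Per-testcase timings in seconds, copied from the two runs above.
MR = {
    "testSimpleLoad": 25.21, "testSimpleMapKeyLookup": 23.7,
    "testSimpleMapCast": 23.432, "testComplexLoad": 23.539,
    "testComplexCast": 19.355, "testComplexCast2": 18.456,
    "testUnTypedMap": 18.373, "testOrderBy": 171.013,
}
LOCAL = {
    "testSimpleLoad": 8.758, "testSimpleMapKeyLookup": 8.947,
    "testSimpleMapCast": 8.781, "testComplexLoad": 8.072,
    "testComplexCast": 7.843, "testComplexCast2": 7.787,
    "testUnTypedMap": 8.595, "testOrderBy": 47.002,
}

def speedup(before, after):
    """Ratio of the old duration to the new one."""
    return before / after

mr_tests = sum(MR.values())        # time spent inside test cases, MR mode
local_tests = sum(LOCAL.values())  # time spent inside test cases, LOCAL mode
mr_wall = 7 * 60 + 10              # reported total time, in seconds
local_wall = 2 * 60 + 33

print("testcase speedup:   %.2fx" % speedup(mr_tests, local_tests))
print("wall-clock speedup: %.2fx" % speedup(mr_wall, local_wall))
```

The test cases themselves add up to about 323 sec in MR and about 106 sec in LOCAL, roughly a 3x improvement; the reported totals, which include time outside the test cases, improve by about 2.8x.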

 Speed up TestTypedMap.java 
 ---

 Key: PIG-2011
 URL: https://issues.apache.org/jira/browse/PIG-2011
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.10

 Attachments: PIG_2011.patch


 TestTypedMap uses Mapreduce mode and takes 7 minutes.
 This can be improved.



Notes on differences between Local and MR mode in Pig

2011-04-22 Thread Dmitriy Ryaboy
I jotted down some notes on the internal differences I could find between
LOCAL and MAPREDUCE mode in the Pig code base.

As discussed in the contributor meeting, we should audit our tests and
switch all the ones that don't touch on one of these conditions to
Local mode, as it will significantly speed up the test suite. See
PIG-2011 for an example.

Differences:

- no distributed cache support.
  See JobControlCompiler's private static String addSingleFileToDistributedCache.
  This also means no FRJoin, no MergeJoin, no MergeCoGroup, and no UDFs that
  rely on DistCache.

- outputCommitter's cleanupJob does not get called in local mode.
  TestStore.testSetStoreSchema() tests for a workaround, so don't mess with
  the TestStoreSchema stuff.

- anything that checks MRCompiler.hasTooManyInputFiles in MR mode. This
  func gets called by FRJoin and by aggregateScalarFiles, which in turn gets
  called by MapReduceLauncher.compile.
  It gets called from aggregateScalarFiles for every Store in the plan in a
  map-only job. In local, it always returns false.
  In MR, it returns true if:
  - nativeMR operator, and optimisticFileConcatenation is on
  - input is an hdfs file, and num splits (after potential combination), or
    look at num mappers, and the resulting number > threshold.
  If there's a test of this behavior, it has to stay in MR mode.

- parallelism of the final Order by is set to 1 in Local, but can be dynamically
  determined in MR mode.
  (perhaps we should not do this, and instead do things serially for each
  requested parallel task in local?)

- OpLimitOptimizer does not apply in LOCAL.

- the parser (QueryParser.jjt) always sets parallelism to 1 in local mode,
  so anything that tests parallelism has to test it in MR mode.

- same in LogicalPlanBuilder.

- PigServer.capacity() is supposed to return available space, but does not work
  in local mode.
  This is only called in TestMapReduce (ever!). I think we can just toss the
  method and the test.

- Map tests appear to insist on MR. Not necessary? (see PIG-2011)

- any sort of classpath / register machinations should be tested in MR.
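
One way to use the list above during the audit is to tag each test with the features it exercises and flag the ones that hit an MR-only condition. The Python sketch below is only an illustration of that idea; the tag names and the function are hypothetical and nothing like this exists in the Pig code base.

```python
# Hypothetical feature tags, one per condition in the list above.
MR_ONLY_FEATURES = frozenset({
    "distributed_cache",         # FRJoin, MergeJoin, MergeCoGroup, DistCache UDFs
    "output_committer_cleanup",  # cleanupJob is not called in local mode
    "too_many_input_files",      # MRCompiler.hasTooManyInputFiles
    "order_by_parallelism",      # final Order by parallelism is 1 in local
    "limit_optimizer",           # OpLimitOptimizer does not apply in local
    "parallelism",               # parser / LogicalPlanBuilder force 1 in local
    "pigserver_capacity",        # PigServer.capacity() does not work in local
    "classpath_or_register",     # classpath / register machinations
})

def must_stay_in_mr(test_features):
    """Return the sorted MR-only features a test touches; an empty list means Local is fine."""
    return sorted(MR_ONLY_FEATURES.intersection(test_features))
```

A test whose result is empty is a candidate for Local mode; anything else should stay in MR until the listed condition is handled.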

I also have a note that says "SimplePigStats??" but I don't recall what it
refers to. Perhaps PigStats counters are messed up in local mode?

Cheers

D


[jira] [Commented] (PIG-2011) Speed up TestTypedMap.java

2011-04-22 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023304#comment-13023304
 ] 

Daniel Dai commented on PIG-2011:
-

+1. I agree we can convert a large number of tests to local mode, except for 
some order by/skewed join tests, which will behave differently in local mode.



[jira] [Commented] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-04-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023305#comment-13023305
 ] 

Xuefu Zhang commented on PIG-2007:
--

Test-patch run: (release warnings are ignored as there are no new files 
introduced.)

 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or modified tests.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] -1 release audit.  The applied patch generated 568 release audit warnings (more than the trunk's current 566 warnings).


 Parsing error when map key referred directly from udf in nested foreach 
 

 Key: PIG-2007
 URL: https://issues.apache.org/jira/browse/PIG-2007
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Anitha Raju
Assignee: Xuefu Zhang
 Fix For: 0.9.0

 Attachments: PIG-2007.patch


 The script below (Script1) fails with a parsing error when executed with version 0.9.
 {code}
  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
 line 2, column 15 mismatched input '{' expecting GENERATE
 {code}
 Script1
 {code}
 register myudf.jar;
 A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
 B1 = foreach A {
     C = test.TOMAP('key1',$1)#'key1';
     generate C as C;
 }
 {code}
 The above happens when, in a nested foreach, I refer to a map key directly 
 from a UDF result.
 The same works if one executes it without the nested foreach.
 {code}
 register myudf.jar;
 A = load 'test.txt' using PigStorage() as (a:int,b:chararray);
 B1 = foreach A generate test.TOMAP('key1',$1)#'key1';
 dump B1;
 {code}
 Script1 works well with 0.8.



[jira] [Resolved] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-04-22 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang resolved PIG-2007.
--

Resolution: Fixed



[jira] [Commented] (PIG-2007) Parsing error when map key referred directly from udf in nested foreach

2011-04-22 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023308#comment-13023308
 ] 

Xuefu Zhang commented on PIG-2007:
--

Patch is committed to trunk.



Notes from the contributor meeting

2011-04-22 Thread Olga Natkovich
Hi,

Attendees:

Dmitriy Ryaboy
Alan Gates
Ashutosh Chauhan
Daniel Dai
Xuefu Zhang
Richard Ding
Olga Natkovich

Topics discussed:


(1) Improving Pig testing:

    a. Short term

       i.  Making tests run significantly faster. Dmitriy said he would work on
           transitioning the tests to local mode. Hopefully that will reduce
           the run time from 10 hours to about 3.

       ii. Getting test-patch automation back on. I took an action item to
           follow up on this.

    b. Longer term

       i.  Moving beyond unit testing. Alan suggested that once the recently
           open-sourced e2e harness is ready to be used (3-6 months), we would
           move most of the e2e tests we currently run as unit tests into the
           e2e harness and leave only true unit tests in JUnit. This will
           reduce unit test run time to something under an hour and will allow
           us to run the e2e tests on real data and real clusters, making the
           testing more realistic.

       ii. Figuring out a way to make UDF testing easier. I don't think we had
           many good ideas on how to do this. Needs further discussion.

(2) Discussion on release management. The main goal is to maintain stability
    for production systems while allowing changes to be released quickly. We
    came up with the following proposal:

    a. Make major releases time (not feature) based, and release every 3
       months.

    b. Make sure that branches are kept stable post release by only allowing
       P1 changes (failures with no reasonable workaround, or silent failures).

    c. Develop disruptive features (example: parser changes) on separate
       branches, and only fold them in once the code is completed and
       stabilized.

(3) Discussion on revamping the UDF interface

    a. Making the interface simpler: no need to implement 3 different versions.

    b. Making it more intuitive:

       i.   No need to wrap input parameters into tuples
       ii.  No need for parameter casting
       iii. Simplify schema management
       iv.  Simplify overloading

    c. This will need to coexist with the current approach for a significant
       amount of time (6-12 months) to let users transition.

(4) Status of Piggybank

    a. Not much progress so far. Dmitriy is struggling with the build process.


Other attendees - please, feel free to add.

Olga






[jira] [Updated] (PIG-2011) Speed up TestTypedMap.java

2011-04-22 Thread Dmitriy V. Ryaboy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitriy V. Ryaboy updated PIG-2011:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to trunk (0.10).



[jira] [Assigned] (PIG-1989) complex type casting should return null on casting failure

2011-04-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1989:
---

Assignee: Daniel Dai

 complex type casting should return null on casting failure 
 ---

 Key: PIG-1989
 URL: https://issues.apache.org/jira/browse/PIG-1989
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Thejas M Nair
Assignee: Daniel Dai
 Fix For: 0.9.0


 When casting fails for complex objects, Pig currently returns the uncast 
 object. 
 It should return null instead. That is consistent with the behavior when 
 casting to basic types. 
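
 The requested semantics (a failed cast yields null rather than the original value) can be sketched as follows. This is an illustrative Python sketch, not Pig's cast implementation; the function name and the use of Python callables as stand-ins for Pig's casts are assumptions.

```python
def cast_or_null(value, caster):
    """Apply caster to value; return None when the cast fails.

    Returning None (rather than the uncast value) keeps the result type
    predictable: callers see either a value of the target type or null.
    """
    try:
        return caster(value)
    except (TypeError, ValueError):
        return None
```

 For example, cast_or_null("12", int) gives 12, while cast_or_null("abc", int) and cast_or_null(5, dict) both give None instead of handing back the original object.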



[jira] [Resolved] (PIG-1945) document Dynamic Invokers for udfs

2011-04-22 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1945.
--

  Resolution: Fixed
Release Note: 
This feature is documented in Pig 080 (pig latin ref 1) and Pig 090 (built in 
functions).
Closing jira.

 document Dynamic Invokers for udfs
 --

 Key: PIG-1945
 URL: https://issues.apache.org/jira/browse/PIG-1945
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Corinne Chandel
 Fix For: 0.9.0


 The Dynamic Invokers feature is not documented in official documentation (in 
 pig 0.8). It should be part of the udf documentation, or the page should at 
 least point to the javadoc pages of this family of udfs that help in invoking 
 java functions.
 Release notes are present in this jira - 
 https://issues.apache.org/jira/browse/PIG-1551 .



[jira] [Resolved] (PIG-1960) Pig CookBook documentation Map key should be quoted

2011-04-22 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1960.
--

Resolution: Fixed

Docs updated. See PIG-1772 and patch pig-1772-beta2-1.patch

 Pig CookBook documentation Map key should be quoted
 -

 Key: PIG-1960
 URL: https://issues.apache.org/jira/browse/PIG-1960
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Corinne Chandel
 Fix For: 0.9.0


 Two places in the cookbook refer to a map key, which should be quoted:
 B = foreach A generate in#k1 as k1, in#k2 as k2;
 ==> B = foreach A generate in#'k1' as k1, in#'k2' as k2;
 B = foreach A generate CONCAT(in#k1, in#k2);
 ==> B = foreach A generate CONCAT(in#'k1', in#'k2');



[jira] [Resolved] (PIG-1983) Clarify requiredFieldList in LoadPushDown.pushProjection is read only

2011-04-22 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1983.
--

Resolution: Fixed

Docs updated. See PIG-1772 and patch pig-1772-beta2-1.patch

 Clarify requiredFieldList in LoadPushDown.pushProjection is read only
 -

 Key: PIG-1983
 URL: https://issues.apache.org/jira/browse/PIG-1983
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.9.0
Reporter: Daniel Dai
Assignee: Corinne Chandel
Priority: Minor
 Fix For: 0.9.0


 In the Pig UDF manual, under LoadPushDown.pushProjection(), add a clarification 
 that requiredFieldList is read only and cannot be changed by the LoadFunc.



[jira] [Resolved] (PIG-1796) need to document what is supported in nested foreach

2011-04-22 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1796.
--

Resolution: Fixed

Docs updated. See PIG-1772 and patch pig-1772-beta2-1.patch

 need to document what is supported in nested foreach
 

 Key: PIG-1796
 URL: https://issues.apache.org/jira/browse/PIG-1796
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Olga Natkovich
Assignee: Corinne Chandel
 Fix For: 0.9.0






[jira] [Resolved] (PIG-1968) Need to document embeding in Java

2011-04-22 Thread Corinne Chandel (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corinne Chandel resolved PIG-1968.
--

Resolution: Fixed

Docs updated. See PIG-1772 and patch pig-1772-beta2-1.patch

 Need to document embeding in Java
 -

 Key: PIG-1968
 URL: https://issues.apache.org/jira/browse/PIG-1968
 Project: Pig
  Issue Type: Bug
  Components: documentation
Reporter: Olga Natkovich
Assignee: Corinne Chandel
 Fix For: 0.9.0


 We have a small snippet in the setup, but we should now provide the 
 information in the control structure section.



[jira] [Assigned] (PIG-1826) Unexpected data type -1 found in stream error

2011-04-22 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned PIG-1826:
---

Assignee: Daniel Dai

 Unexpected data type -1 found in stream error
 -

 Key: PIG-1826
 URL: https://issues.apache.org/jira/browse/PIG-1826
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
 Environment: This is pig 0.8.0 on a linux box
Reporter: Jonathan Coveney
Assignee: Daniel Dai
 Fix For: 0.9.0

 Attachments: PIG-1826.tar.gz, numgraph.java


 When running the attached UDF I get the error in the title. Inserting printlns 
 extensively shows that the script is functioning properly and returning a 
 DataBag, but for whatever reason, Pig does not detect it as such.
