[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847097#comment-17847097 ]

Daniel Dai commented on PIG-5453:
---------------------------------

+1

> FLATTEN shifting fields incorrectly
> -----------------------------------
>
>                 Key: PIG-5453
>                 URL: https://issues.apache.org/jira/browse/PIG-5453
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>             Fix For: 0.19.0
>
>         Attachments: pig-5453-v01.patch, pig-5453-v02.patch
>
>
> Follow-up from PIG-5201 and PIG-5452.
> When the flattened tuple has fewer or more fields than specified, the fields
> after it shift incorrectly.
> Input
> {noformat}
> A (a,b,c)
> B (a,b,c)
> C (a,b,c)
> Y (a,b)
> Z (a,b,c,d,e,f)
> E
> {noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B;
> {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (E,,,,E)
> {noformat}
> E is correct; it was fixed as part of PIG-5201 and PIG-5452.
> Y has shifted a4 (Y) to the left incorrectly. It should have been (Y,a,b,,Y).
> Z has dropped a4 (Z) and overwritten it with the contents of FLATTEN(a2). It
> should have been (Z,a,b,c,Z).


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
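The expected behavior described in the issue (pad a short flattened tuple with nulls, truncate a long one to the declared width, so that fields generated after the FLATTEN never shift) can be sketched outside Pig. This Python model is purely illustrative and is not Pig's implementation; the function names are made up:

```python
# Sketch (not Pig's code) of the FLATTEN semantics this issue asks for:
# the flattened tuple is padded with nulls or truncated to the declared
# width, so the fields generated after it (a4 here) never shift.

def flatten_fixed(tup, width):
    """Pad a short tuple with None, or truncate a long one, to exactly `width` fields."""
    fields = list(tup)[:width]
    fields += [None] * (width - len(fields))
    return fields

def generate(a1, a2, width=3):
    # mirrors: GENERATE a1, FLATTEN(a2) as (b1,b2,b3), a1 as a4
    return tuple([a1] + flatten_fixed(a2, width) + [a1])

print(generate('Y', ('a', 'b')))                      # ('Y', 'a', 'b', None, 'Y')
print(generate('Z', ('a', 'b', 'c', 'd', 'e', 'f')))  # ('Z', 'a', 'b', 'c', 'Z')
```

These two calls reproduce the corrected rows (Y,a,b,,Y) and (Z,a,b,c,Z) from the issue, with None standing in for Pig's null.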
[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838865#comment-17838865 ]

Daniel Dai commented on PIG-5453:
---------------------------------

+1

> FLATTEN shifting fields incorrectly
> -----------------------------------
>
>                 Key: PIG-5453
>                 URL: https://issues.apache.org/jira/browse/PIG-5453
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5453-v01.patch
>
>
> Follow-up from PIG-5201 and PIG-5452.
> When the flattened tuple has fewer or more fields than specified, the fields
> after it shift incorrectly.
> Input
> {noformat}
> A (a,b,c)
> B (a,b,c)
> C (a,b,c)
> Y (a,b)
> Z (a,b,c,d,e,f)
> E
> {noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B;
> {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (E,,,,E)
> {noformat}
> E is correct; it was fixed as part of PIG-5201 and PIG-5452.
> Y has shifted a4 (Y) to the left incorrectly. It should have been (Y,a,b,,Y).
> Z has dropped a4 (Z) and overwritten it with the contents of FLATTEN(a2). It
> should have been (Z,a,b,c,Z).


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (PIG-5406) TestJoinLocal imports org.python.google.common.collect.Lists instead of org.google.common.collect.Lists
[ https://issues.apache.org/jira/browse/PIG-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581065#comment-17581065 ]

Daniel Dai commented on PIG-5406:
---------------------------------

+1

> TestJoinLocal imports org.python.google.common.collect.Lists instead of
> org.google.common.collect.Lists
> ------------------------------------------------------------------------
>
>                 Key: PIG-5406
>                 URL: https://issues.apache.org/jira/browse/PIG-5406
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.15.0, 0.16.0, 0.17.0
>            Reporter: James Z.M. Gao
>            Assignee: Rohini Palaniswamy
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: PIG-5406-v1.patch
>
>
> [PIG-4366|https://github.com/apache/pig/commit/81abb6bd0adb6e101898d67b3c2a9e35e11ce993]
> brought PIG-2861 back.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (PIG-5404) FLATTEN infers wrong datatype
[ https://issues.apache.org/jira/browse/PIG-5404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214281#comment-17214281 ]

Daniel Dai commented on PIG-5404:
---------------------------------

+1

> FLATTEN infers wrong datatype
> -----------------------------
>
>                 Key: PIG-5404
>                 URL: https://issues.apache.org/jira/browse/PIG-5404
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.17.0
>            Reporter: Bruno Pusztahazi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>              Labels: datatypes, flatten
>         Attachments: pig-5404-v01.patch
>
>
> In version 0.12 (checked out branch-0.12) the following code works as
> expected.
> With the following input file test.csv:
> {code:java}
> John_5,18,4.0F
> Mary_6,19,3.8F
> Bill_7,20,3.9F
> Joe_8,18,3.8F
> {code}
> {code:java}
> A = LOAD 'test.csv' USING PigStorage (',') AS (name:chararray,age:int,gpr:float);
> B = FOREACH A GENERATE FLATTEN(STRSPLIT(name,'_')) as (name1:chararray,name2:chararray),age,gpr;
> DESCRIBE B;
> {code}
> It produces the following output:
> {code:java}
> B: {name1: chararray,name2: chararray,age: int,gpr: float}
> {code}
> This is the expected output, as the result of the FLATTEN is declared as
> chararrays.
> When using version 0.17 (checked out branch-0.17), the code produces:
> {code:java}
> B: {name1: bytearray,name2: bytearray,age: int,gpr: float}
> {code}
> This shows that FLATTEN somehow inferred the wrong data types (bytearray
> instead of chararray).
> Using explicit casting as a workaround on 0.17:
> {code:java}
> B1 = FOREACH B GENERATE (chararray)name1,(chararray)name2,age,gpr;
> DESCRIBE B1;
> {code}
> produces
> {code:java}
> B1: {name1: chararray,name2: chararray,age: int,gpr: float}
> {code}
> this time with the expected data types.
> The explained plan shows some strange cast operators that are not actually
> used (or at least their data types are wrong):
> {code:java}
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> B: (Name: LOStore Schema: name1#121:chararray,name2#122:chararray,age#105:int,gpr#106:float)
> |
> |---B: (Name: LOForEach Schema: name1#121:chararray,name2#122:chararray,age#105:int,gpr#106:float)
>     |   |
>     |   (Name: LOGenerate[false,false,false,false] Schema: name1#121:chararray,name2#122:chararray,age#105:int,gpr#106:float)ColumnPrune:OutputUids=[121, 105, 122, 106]ColumnPrune:InputUids=[121, 105, 122, 106]
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 121)
>     |   |   |
>     |   |   |---name1:(Name: Project Type: bytearray Uid: 121 Input: 0 Column: 0)
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 122)
>     |   |   |
>     |   |   |---name2:(Name: Project Type: bytearray Uid: 122 Input: 1 Column: 0)
>     |   |   |
>     |   |   age:(Name: Project Type: int Uid: 105 Input: 2 Column: 0)
>     |   |   |
>     |   |   gpr:(Name: Project Type: float Uid: 106 Input: 3 Column: 0)
>     |   |
>     |   |---(Name: LOInnerLoad[0] Schema: name1#121:bytearray)
>     |   |
>     |   |---(Name: LOInnerLoad[1] Schema: name2#122:bytearray)
>     |   |
>     |   |---(Name: LOInnerLoad[2] Schema: age#105:int)
>     |   |
>     |   |---(Name: LOInnerLoad[3] Schema: gpr#106:float)
>     |
>     |---B: (Name: LOForEach Schema: name1#135:bytearray,name2#136:bytearray,age#105:int,gpr#106:float)
>         |   |
>         |   (Name: LOGenerate[true,false,false] Schema: name1#135:bytearray,name2#136:bytearray,age#105:int,gpr#106:float)
>         |   |   |
>         |   |   (Name: UserFunc(org.apache.pig.builtin.STRSPLIT) Type: tuple Uid: 132)
>         |   |   |
>         |   |   |---(Name: Cast Type: chararray Uid: 104)
>         |   |   |   |
>         |   |   |   |---name:(Name: Project Type: bytearray Uid: 104 Input: 0 Column: (*))
>         |   |   |
>         |   |   |---(Name: Constant Type: chararray Uid: 131)
>         |   |   |
>         |   |   (Name: Cast Type: int Uid: 105)
>         |   |   |
>         |   |   |---age:(Name: Project Type: bytearray Uid: 105 Input: 1 Column: (*))
>         |   |   |
>         |   |   (Name: Cast Type: float Uid: 106)
>         |   |   |
>         |   |   |---gpr:(Name: Project Type: bytearray Uid: 106 Input: 2 Column: (*))
>         |   |
>         |   |---(Name: LOInnerLoad[0] Schema: name#104:bytearray)
>         |   |
>         |   |---(Name: LOInnerLoad[1] Schema: age#105:bytearray)
>         |   |
>         |   |---(Name: LOInnerLoad[2] Schema: gpr#106:bytearray)
>         |
>         |---A: (Name: LOLoad Schema: name#104:bytearray,age#105:bytearray,gpr#106:bytearray)RequiredFields:null
> {code}


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (PIG-5243) describe with typecast on as-clause shows the types before the typecasting
[ https://issues.apache.org/jira/browse/PIG-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214280#comment-17214280 ]

Daniel Dai commented on PIG-5243:
---------------------------------

+1

> describe with typecast on as-clause shows the types before the typecasting
> --------------------------------------------------------------------------
>
>                 Key: PIG-5243
>                 URL: https://issues.apache.org/jira/browse/PIG-5243
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-5243-v01.patch
>
>
> For code like
> {code}
> a = load 'test.txt' as (mytuple:tuple (), gpa:float);
> b = foreach a generate mytuple as (mytuple2:(name:int, age:double));
> store b into '/tmp/deleteme';
> {code}
> {{describe b}} shows
> {noformat}
> b: {mytuple2: (name: bytearray,age: bytearray)}
> {noformat}
> Execution-wise it is fine, since an extra foreach typecasts the relation
> above.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (PIG-5372) SAMPLE/RANDOM(udf) before skewed join failing with NPE
[ https://issues.apache.org/jira/browse/PIG-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732373#comment-16732373 ]

Daniel Dai commented on PIG-5372:
---------------------------------

Wow, that's back in 2010 :). I think SkewedPartitioner.setConf is passing conf to MapRedUtil.loadPartitionFileFromLocalCache via PigMapReduce.sJobConf. This is no longer necessary, since MapRedUtil.loadPartitionFileFromLocalCache takes a mapConf parameter (added in a later patch). We can change MapRedUtil.loadPartitionFileFromLocalCache to retrieve fs.file.impl/fs.hdfs.impl from mapConf; then we no longer need to overwrite PigMapReduce.sJobConf in SkewedPartitioner.setConf.

> SAMPLE/RANDOM(udf) before skewed join failing with NPE
> ------------------------------------------------------
>
>                 Key: PIG-5372
>                 URL: https://issues.apache.org/jira/browse/PIG-5372
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5372-v1.patch
>
>
> A sample short script like the one below
> {code}
> A = LOAD 'input.txt' AS (a1:int, a2:chararray, a3:int);
> B = LOAD 'input.txt' AS (b1:int, b2:chararray, b3:int);
> A2 = FOREACH A generate *, RANDOM() as randnum;
> D = join A2 by a1, B by b1 using 'skewed' parallel 2;
> store D into '$output';
> {code}
> fails with an NPE:
> {noformat}
> 2018-12-12 16:06:04,860 [Dispatcher thread: Central] INFO  org.apache.tez.dag.history.HistoryEventHandler - [HISTORY][DAG:dag_1544648742542_0001_1][Event:TASK_FINISHED]: vertexName=scope-55, taskId=task_1544648742542_0001_1_02_00, startTime=1544648745036, finishTime=1544648764857, timeTaken=19821, status=KILLED, successfulAttemptID=null, diagnostics=TaskAttempt 0 failed, info=[Error: Failure while running task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{int}(false) - scope-29 -> scope-58 Operator Key: scope-29): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:131)
>         at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:420)
>         at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:282)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:367)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:408)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
> {noformat}
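Daniel's suggestion above amounts to a standard refactor: stop reading configuration out of a mutable static (PigMapReduce.sJobConf, which SkewedPartitioner had to overwrite) and instead pass it as an explicit parameter (mapConf). A minimal sketch of that pattern, in Python with made-up names rather than Pig's actual code:

```python
# Illustration only: the refactor pattern behind the comment, not Pig's code.

GLOBAL_CONF = {}  # stands in for the mutable static PigMapReduce.sJobConf


def load_partition_file_via_global():
    # Fragile: behavior depends on whoever overwrote GLOBAL_CONF last,
    # which is why SkewedPartitioner.setConf had to set it first.
    return GLOBAL_CONF.get("fs.file.impl", "<unset>")


def load_partition_file(conf):
    # Robust: the configuration travels with the call (like the mapConf
    # parameter), so no caller needs to mutate shared state beforehand.
    return conf.get("fs.file.impl", "<unset>")


map_conf = {"fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem"}
print(load_partition_file(map_conf))
```

With the explicit-parameter version, the fs.file.impl/fs.hdfs.impl lookup no longer depends on the static being set, which is the change proposed for MapRedUtil.loadPartitionFileFromLocalCache.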
[jira] [Commented] (PIG-5368) Braces without escaping in regexes throws error in recent perl versions
[ https://issues.apache.org/jira/browse/PIG-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715575#comment-16715575 ]

Daniel Dai commented on PIG-5368:
---------------------------------

Also committed PIG-5368-1.addendum.patch.

> Braces without escaping in regexes throws error in recent perl versions
> -----------------------------------------------------------------------
>
>                 Key: PIG-5368
>                 URL: https://issues.apache.org/jira/browse/PIG-5368
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>             Fix For: 0.18.0
>
>         Attachments: PIG-5368-1.addendum.patch, PIG-5368-1.patch
>
>
> |In Perl v5.22, using a literal { in a regular expression was deprecated, and
> will emit a warning if it isn't escaped: \{. In v5.26, this won't just warn,
> it'll cause a syntax error.|
> Example:
> [https://github.com/apache/pig/blob/e766b6bf29e610b6312f8447fc008bed6beb4090/test/e2e/pig/tests/cmdline.conf#L47]
> {code}
> $ perl -e 'print "It matches\n" if "Hello World" =~ /World{abc}/'
> Unescaped left brace in regex is illegal here in regex; marked by <-- HERE in m/World{ <-- HERE abc}/ at -e line 1.
> $ perl -e 'print "It matches\n" if "Hello World" =~ /World\{abc}/'
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (PIG-5370) Union onschema + columnprune dropping used fields
[ https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705272#comment-16705272 ]

Daniel Dai commented on PIG-5370:
---------------------------------

+1

> Union onschema + columnprune dropping used fields
> -------------------------------------------------
>
>                 Key: PIG-5370
>                 URL: https://issues.apache.org/jira/browse/PIG-5370
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5370-v1.patch, pig-5370-v2.patch
>
>
> After PIG-5312, the query below started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
>     A_FOREACH = FOREACH A GENERATE a2,a3;
>     GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1 a 3
> 2 b 4
> 2 c 5
> 1 a 6
> 2 b 7
> 1 c 8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})      <-- ONLY 1 field
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
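For illustration, the invariant UNION ONSCHEMA must keep even after column pruning can be modeled in a few lines: every row from every input is aligned to the combined schema by field name, with nulls for fields an input lacks, and no used field is ever dropped. This is a hypothetical Python model (function and variable names are made up), not Pig's implementation:

```python
# Sketch of the UNION ONSCHEMA guarantee that the bug violated:
# rows are aligned to the output schema by field name; a field an input
# lacks becomes null (None here), but a field the input has is never dropped.

def union_onschema(out_schema, *relations):
    """Each relation is (schema, rows), where rows are tuples matching that schema."""
    result = []
    for schema, rows in relations:
        index = {name: i for i, name in enumerate(schema)}
        for row in rows:
            result.append(tuple(row[index[f]] if f in index else None
                                for f in out_schema))
    return result

rows = union_onschema(["x", "y"],
                      (["x"], [(1,)]),          # input missing field y -> null
                      (["x", "y"], [(2, 3)]))   # full-width input kept intact
print(rows)  # [(1, None), (2, 3)]
```

In the bug, the pruner effectively shrank one branch's schema so its rows came out one field wide, which this by-name alignment rules out.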
[jira] [Commented] (PIG-5370) Union onschema + columnprune dropping used fields
[ https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702822#comment-16702822 ]

Daniel Dai commented on PIG-5370:
---------------------------------

+1, sounds good to me.

> Union onschema + columnprune dropping used fields
> -------------------------------------------------
>
>                 Key: PIG-5370
>                 URL: https://issues.apache.org/jira/browse/PIG-5370
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5370-v1.patch
>
>
> After PIG-5312, the query below started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
>     A_FOREACH = FOREACH A GENERATE a2,a3;
>     GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1 a 3
> 2 b 4
> 2 c 5
> 1 a 6
> 2 b 7
> 1 c 8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})      <-- ONLY 1 field
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (PIG-5368) Braces without escaping in regexes throws error in recent perl versions
[ https://issues.apache.org/jira/browse/PIG-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-5368.
-----------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 0.18.0

Patch committed to trunk. Thanks Laszlo!

> Braces without escaping in regexes throws error in recent perl versions
> -----------------------------------------------------------------------
>
>                 Key: PIG-5368
>                 URL: https://issues.apache.org/jira/browse/PIG-5368
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>             Fix For: 0.18.0
>
>         Attachments: PIG-5368-1.patch
>
>
> |In Perl v5.22, using a literal { in a regular expression was deprecated, and
> will emit a warning if it isn't escaped: \{. In v5.26, this won't just warn,
> it'll cause a syntax error.|
> Example:
> [https://github.com/apache/pig/blob/e766b6bf29e610b6312f8447fc008bed6beb4090/test/e2e/pig/tests/cmdline.conf#L47]
> {code}
> $ perl -e 'print "It matches\n" if "Hello World" =~ /World{abc}/'
> Unescaped left brace in regex is illegal here in regex; marked by <-- HERE in m/World{ <-- HERE abc}/ at -e line 1.
> $ perl -e 'print "It matches\n" if "Hello World" =~ /World\{abc}/'
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (PIG-5366) Enable PigStreamingDepend to load from current directory in newer Perl versions
[ https://issues.apache.org/jira/browse/PIG-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-5366:
----------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 0.18.0
           Status: Resolved  (was: Patch Available)

+1. Patch committed to trunk. Thanks [~abstractdog]!

> Enable PigStreamingDepend to load from current directory in newer Perl
> versions
> ----------------------------------------------------------------------
>
>                 Key: PIG-5366
>                 URL: https://issues.apache.org/jira/browse/PIG-5366
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>             Fix For: 0.18.0
>
>         Attachments: PIG-5366_1.patch
>
>
> A Perl-related issue found while testing streaming. In newer Perl versions
> (>= 5.26), the current directory (".") is not included in @INC, so
> PigStreamingDepend may fail during "use PigStreamingModule;". A possible
> solution is to let this module add the current directory for itself, to make
> it more independent of the environment (the current Perl version).
> The test case was:
> {code}
> define CMD `perl PigStreamingDepend.pl - sio_5_1 sio_5_2` input(stdin) output('sio_5_1', 'sio_5_2') ship('./libexec/PigStreamingDepend.pl', './libexec/PigStreamingModule.pm');
> A = load '/user/hrt_qa/tests/data/singlefile/studenttab10k';
> B = stream A through CMD;
> store B into '/user/hrt_qa/out/hrtqa-1539851229-streaming.conf-StreamingIO/StreamingIO_5.out';
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
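The fix described above is for the script to extend its own module search path instead of relying on the interpreter's default. In Perl that is `use lib '.';` (or a `BEGIN` block pushing "." onto @INC) before `use PigStreamingModule;`. For an executable illustration, here is the same pattern in Python, offered only as an analogue of the Perl change, not as part of the actual patch:

```python
# Analogue of the Perl fix in Python: newer interpreters may not search the
# current directory for modules, so the script adds it to its own search
# path up front rather than depending on the environment.
import sys

if "." not in sys.path:
    sys.path.insert(0, ".")  # modules in the current directory are now importable

# after this, an `import some_local_module` next to the script would resolve
```

The design point is the same in both languages: the dependency on "." is made explicit inside the script that needs it, so the behavior no longer changes with the interpreter version.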
[jira] [Commented] (PIG-5366) Enable PigStreamingDepend to load from current directory in newer Perl versions
[ https://issues.apache.org/jira/browse/PIG-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657989#comment-16657989 ]

Daniel Dai commented on PIG-5366:
---------------------------------

[~abstractdog], assigned to you.

> Enable PigStreamingDepend to load from current directory in newer Perl
> versions
> ----------------------------------------------------------------------
>
>                 Key: PIG-5366
>                 URL: https://issues.apache.org/jira/browse/PIG-5366
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>         Attachments: PIG-5366_1.patch
>
>
> A Perl-related issue found while testing streaming. In newer Perl versions
> (>= 5.26), the current directory (".") is not included in @INC, so
> PigStreamingDepend may fail during "use PigStreamingModule;". A possible
> solution is to let this module add the current directory for itself, to make
> it more independent of the environment (the current Perl version).
> The test case was:
> {code}
> define CMD `perl PigStreamingDepend.pl - sio_5_1 sio_5_2` input(stdin) output('sio_5_1', 'sio_5_2') ship('./libexec/PigStreamingDepend.pl', './libexec/PigStreamingModule.pm');
> A = load '/user/hrt_qa/tests/data/singlefile/studenttab10k';
> B = stream A through CMD;
> store B into '/user/hrt_qa/out/hrtqa-1539851229-streaming.conf-StreamingIO/StreamingIO_5.out';
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (PIG-5366) Enable PigStreamingDepend to load from current directory in newer Perl versions
[ https://issues.apache.org/jira/browse/PIG-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-5366:
-------------------------------
    Assignee: Laszlo Bodor

> Enable PigStreamingDepend to load from current directory in newer Perl
> versions
> ----------------------------------------------------------------------
>
>                 Key: PIG-5366
>                 URL: https://issues.apache.org/jira/browse/PIG-5366
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>         Attachments: PIG-5366_1.patch
>
>
> A Perl-related issue found while testing streaming. In newer Perl versions
> (>= 5.26), the current directory (".") is not included in @INC, so
> PigStreamingDepend may fail during "use PigStreamingModule;". A possible
> solution is to let this module add the current directory for itself, to make
> it more independent of the environment (the current Perl version).
> The test case was:
> {code}
> define CMD `perl PigStreamingDepend.pl - sio_5_1 sio_5_2` input(stdin) output('sio_5_1', 'sio_5_2') ship('./libexec/PigStreamingDepend.pl', './libexec/PigStreamingModule.pm');
> A = load '/user/hrt_qa/tests/data/singlefile/studenttab10k';
> B = stream A through CMD;
> store B into '/user/hrt_qa/out/hrtqa-1539851229-streaming.conf-StreamingIO/StreamingIO_5.out';
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (PIG-4373) Implement PIG-3861 in Tez
[ https://issues.apache.org/jira/browse/PIG-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4373:
----------------------------
    Status: Patch Available  (was: Open)

> Implement PIG-3861 in Tez
> -------------------------
>
>                 Key: PIG-4373
>                 URL: https://issues.apache.org/jira/browse/PIG-4373
>             Project: Pig
>          Issue Type: Improvement
>          Components: tez
>    Affects Versions: 0.14.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Daniel Dai
>            Priority: Major
>              Labels: MissingFeature
>             Fix For: 0.18.0
>
>         Attachments: PIG-4373_1.patch
>


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (PIG-4373) Implement PIG-3861 in Tez
[ https://issues.apache.org/jira/browse/PIG-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-4373:
-------------------------------
    Assignee: Daniel Dai  (was: Rohini Palaniswamy)

> Implement PIG-3861 in Tez
> -------------------------
>
>                 Key: PIG-4373
>                 URL: https://issues.apache.org/jira/browse/PIG-4373
>             Project: Pig
>          Issue Type: Improvement
>          Components: tez
>    Affects Versions: 0.14.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Daniel Dai
>            Priority: Major
>              Labels: MissingFeature
>             Fix For: 0.18.0
>


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (PIG-5329) cwiki training links
[ https://issues.apache.org/jira/browse/PIG-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-5329.
-----------------------------
       Resolution: Fixed
         Assignee: Daniel Dai
    Fix Version/s: site

Updated, thanks!

> cwiki training links
> --------------------
>
>                 Key: PIG-5329
>                 URL: https://issues.apache.org/jira/browse/PIG-5329
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Csaba Skrabak
>            Assignee: Daniel Dai
>            Priority: Trivial
>             Fix For: site
>
>
> Every single link on the page
> [https://cwiki.apache.org/confluence/display/PIG/Pig+Training]
> is broken.
> Google finds better training courses, e.g.:
> [https://hortonworks.com/tutorial/beginners-guide-to-apache-pig/]
> [https://www.tutorialspoint.com/apache_pig/index.htm]
> [https://cognitiveclass.ai/courses/introduction-to-pig/]


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Comment Edited] (PIG-4608) FOREACH ... UPDATE
[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336593#comment-16336593 ]

Daniel Dai edited comment on PIG-4608 at 1/24/18 1:00 AM:
----------------------------------------------------------

bq. a = FOREACH b UPDATE q AS q:int -- This should be illegal, right? If the type is changed, an explicit modify of the value should occur

This should be valid; the AS clause has the capacity to change types. The UPDATE clause is evaluated before the AS clause, so

a = FOREACH b UPDATE q WITH (int)q AS q:chararray;

will result in a chararray q.

bq. flattening a tuple into existing fields - does this make sense

This makes sense; it is symmetric to the AS clause.

I didn't see UPDATE/DROP in a single statement in the example -- are we not going to support both in the same statement? I would actually prefer them in the same statement, as I feel users usually think about adjusting all columns at the same time. How about APPEND? Actually, when I think about DROP/APPEND, I feel we have to have INSERT as well to close the loop. But if adding INSERT, other syntax might be more appropriate, such as:

a = FOREACH b generate .., UPDATE a10 WITH 1 as new_a10, ..a20, 2 as a_20_plus_half, ..a30, a32.., UPDATE a40 WITH 2 as new_a40, 1 as a41;

Here:
Update: a10, a40 using the UPDATE clause
Insert: a_20_plus_half
Drop: a31
Append: a41

In the original use case, it can be written as:

intermediate = foreach i generate .., 3 as f3, .., 6 as f6, .., 48 as f48, ..;

The idea is to make the ".." syntax more flexible: skip the prefix/suffix when it can be inferred. It is probably more natural to add support for INSERT with this, thus making the syntax complete. How does that sound?


was (Author: daijy):
bq. a = FOREACH b UPDATE q AS q:int -- This should be illegal, right? If the type is changed, an explicit modify of the value should occur

This should be valid; the AS clause has the capacity to change types. The UPDATE clause is evaluated before the AS clause, so

a = FOREACH b UPDATE q WITH (int)q AS q:chararray;

will result in a chararray q.

bq. flattening a tuple into existing fields - does this make sense

This makes sense; it is symmetric to the AS clause.

I didn't see UPDATE/DROP in a single statement in the example -- are we not going to support both in the same statement? I would actually prefer them in the same statement, as I feel users usually think about adjusting all columns at the same time. How about APPEND? Actually, when I think about DROP/APPEND, I feel we have to have INSERT as well to close the loop. But if adding INSERT, other syntax might be more appropriate, such as:

a = FOREACH b generate .., UPDATE a10 WITH 1 as new_a10, ..a20, 2 as a_20_plus_half, ..a30, a32.., UPDATE a40 WITH 2 as new_a40, 1 as a41;

Here:
Update: a10, a40 using the UPDATE clause
Insert: a_20_plus_half
Drop: a31
Append: a41

In the original use case, it can be written as:

intermediate = foreach i generate .., 3 as f3, .., 6 as f6, .. 48 as f48, ..;

The idea is to make the ".." syntax more flexible: skip the prefix/suffix when it can be inferred. It is probably more natural to add support for INSERT with this, thus making the syntax complete. How does that sound?

> FOREACH ... UPDATE
> ------------------
>
>                 Key: PIG-4608
>                 URL: https://issues.apache.org/jira/browse/PIG-4608
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Haley Thrapp
>            Priority: Major
>
> I would like to propose a new command in Pig, FOREACH...UPDATE.
> Syntactically, it would look much like FOREACH ... GENERATE.
> Example:
> Input data:
> (1,2,3)
> (2,3,4)
> (3,4,5)
> -- Load the data
> three_numbers = LOAD 'input_data'
> USING PigStorage()
> AS (f1:int, f2:int, f3:int);
> -- Sum up the row
> updated = FOREACH three_numbers UPDATE
> 5 as f1,
> f1+f2 as new_sum
> ;
> Dump updated;
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
> Fields to update must be specified by alias. Any fields in the UPDATE that do
> not match an existing field will be appended to the end of the tuple.
> This command is particularly desirable in scripts that deal with a large
> number of fields (in the 20-200 range). Often, we need to make modifications
> to only a few fields. The FOREACH ... UPDATE statement allows the developer
> to focus on the actual logical changes instead of having to list all of the
> fields that are also being passed through.
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe
> this can be done with changes to the parser and the creation of a new
> LOUpdate. No physical plan changes should be needed because we will leverage
> what LOGenerate does.


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (PIG-4608) FOREACH ... UPDATE
[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336593#comment-16336593 ] Daniel Dai commented on PIG-4608: - bq. a = FOREACH b UPDATE q AS q:int – This should be illegal, right? If the type is changed, an explicit modify of the value should occur This should be valid, AS clause has the capacity to change types. UPDATE clause is evaluated before AS clause, so a = FOREACH b UPDATE q WITH (int)q AS q:chararray; Will result a chararray q. bq. flattening a tuple into existing fields - does this make sense This makes sense, it is a symmetry to the AS clause I didn't see UPDATE/DROP in a single statement in the example, are we not going to support both in the same statement? I actually prefer those in the same statement, as I feel users usually think about adjusting all columns in the same time. How about APPEND? Actually when I think about DROP/APPEND, I feel we have to have INSERT as well to close the loop. But if adding INSERT, other syntax might be more proper, such as: a = FOREACH b generate .., UPDATE a10 WITH 1 as new_a10, ..a20, 2 as a_20_plus_half, ..a30, a32.., UPDATE a40 WITH 2 as new_a40, 1 as a41; Here: Update: a10, a40 using UPDATE clause Insert: a_20_plus_half Drop: a31 Append: a41 In the original use case, it can be written as: intermediate = foreach i generate .., 3 as f3, .., 6 as f6, .. 48 as f48, ..; The idea is to make ".." syntax more flexible, skip prefix/suffix if can be inferred. Probably more natural to add support for INSERT with this, thus make the syntax complete. How's that sound? > FOREACH ... UPDATE > -- > > Key: PIG-4608 > URL: https://issues.apache.org/jira/browse/PIG-4608 > Project: Pig > Issue Type: New Feature >Reporter: Haley Thrapp >Priority: Major > > I would like to propose a new command in Pig, FOREACH...UPDATE. > Syntactically, it would look much like FOREACH … GENERATE. 
> Example: > Input data: > (1,2,3) > (2,3,4) > (3,4,5) > -- Load the data > three_numbers = LOAD 'input_data' > USING PigStorage() > AS (f1:int, f2:int, f3:int); > -- Sum up the row > updated = FOREACH three_numbers UPDATE > 5 as f1, > f1+f2 as new_sum > ; > Dump updated; > (5,2,3,3) > (5,3,4,5) > (5,4,5,7) > Fields to update must be specified by alias. Any fields in the UPDATE that do > not match an existing field will be appended to the end of the tuple. > This command is particularly desirable in scripts that deal with a large > number of fields (in the 20-200 range). Often, we only need to make > modifications to a few fields. The FOREACH ... UPDATE statement allows the > developer to focus on the actual logical changes instead of having to list > all of the fields that are also being passed through. > My team has prototyped this with changes to FOREACH ... GENERATE. We believe > this can be done with changes to the parser and the creation of a new > LOUpdate. No physical plan changes should be needed because we will leverage > what LOGenerate does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
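The proposed semantics in the example above (update matched aliases in place, append unmatched ones) can be simulated outside Pig. This is a minimal Python sketch; `foreach_update` is a hypothetical helper based only on this ticket's example, not Pig's implementation:

```python
# Hypothetical simulation of the proposed FOREACH ... UPDATE semantics.
def foreach_update(row, schema, updates):
    out, out_schema = list(row), list(schema)
    env = dict(zip(schema, row))  # expressions see the *input* row
    for alias, expr in updates:
        value = expr(env)
        if alias in out_schema:
            out[out_schema.index(alias)] = value  # matched alias: update in place
        else:
            out_schema.append(alias)              # unmatched alias: append
            out.append(value)
    return tuple(out)

schema = ["f1", "f2", "f3"]
updates = [("f1", lambda e: 5),
           ("new_sum", lambda e: e["f1"] + e["f2"])]
for row in [(1, 2, 3), (2, 3, 4), (3, 4, 5)]:
    print(foreach_update(row, schema, updates))
# (5, 2, 3, 3)
# (5, 3, 4, 5)
# (5, 4, 5, 7)
```

Note the sketch evaluates expressions against the input row, which is what makes `new_sum` come out as 3 for the first row even though `f1` was updated to 5.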
[jira] [Commented] (PIG-4608) FOREACH ... UPDATE
[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329539#comment-16329539 ] Daniel Dai commented on PIG-4608: - "add" (or append?)/"update xxx as"/"drop" syntax sounds good to me. We also want to make sure it works with positional references ($0, $1, etc.). You might take a look at PIG-3122 for keyword conflicts, if applicable. > FOREACH ... UPDATE > -- > > Key: PIG-4608 > URL: https://issues.apache.org/jira/browse/PIG-4608 > Project: Pig > Issue Type: New Feature >Reporter: Haley Thrapp >Priority: Major > > I would like to propose a new command in Pig, FOREACH...UPDATE. > Syntactically, it would look much like FOREACH … GENERATE. > Example: > Input data: > (1,2,3) > (2,3,4) > (3,4,5) > -- Load the data > three_numbers = LOAD 'input_data' > USING PigStorage() > AS (f1:int, f2:int, f3:int); > -- Sum up the row > updated = FOREACH three_numbers UPDATE > 5 as f1, > f1+f2 as new_sum > ; > Dump updated; > (5,2,3,3) > (5,3,4,5) > (5,4,5,7) > Fields to update must be specified by alias. Any fields in the UPDATE that do > not match an existing field will be appended to the end of the tuple. > This command is particularly desirable in scripts that deal with a large > number of fields (in the 20-200 range). Often, we only need to make > modifications to a few fields. The FOREACH ... UPDATE statement allows the > developer to focus on the actual logical changes instead of having to list > all of the fields that are also being passed through. > My team has prototyped this with changes to FOREACH ... GENERATE. We believe > this can be done with changes to the parser and the creation of a new > LOUpdate. No physical plan changes should be needed because we will leverage > what LOGenerate does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PIG-5293) Suspicious code as missing `this' for a member
[ https://issues.apache.org/jira/browse/PIG-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-5293. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 0.18.0 Patch committed to trunk. Thanks [~lifove]! > Suspicious code as missing `this' for a member > -- > > Key: PIG-5293 > URL: https://issues.apache.org/jira/browse/PIG-5293 > Project: Pig > Issue Type: Bug >Reporter: JC >Assignee: JC > Fix For: 0.18.0 > > > Hi > In a recent github mirror, I've found suspicious code. > Branch: trunk > Path: src/org/apache/pig/pen/util/ExampleTuple.java > {code:java} > ... > 39 Tuple t = null; > ... > 110 @Override > 111 public void reference(Tuple t) { > 112 t.reference(t); > 113 } > {code} > In Line 112, `t.reference' should be `this.t.reference'? This might be just a > trivial thing, as the class name is ExampleTuple. But I wanted to report just > in case. > Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
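The shadowing pattern behind PIG-5293 is easy to reproduce in any language: a method parameter named like a field hides the field, so the delegate is never touched. A minimal Python analog for illustration only — `Recorder` and `ExampleTupleLike` are hypothetical stand-ins, not Pig code:

```python
# A stub standing in for Tuple: it records what it was last asked to reference.
class Recorder:
    def __init__(self):
        self.referenced = None
    def reference(self, other):
        self.referenced = other

class ExampleTupleLike:
    def __init__(self, delegate):
        self.t = delegate            # analog of the field `Tuple t`
    def reference_buggy(self, t):
        t.reference(t)               # as written in the report: parameter shadows the field
    def reference_fixed(self, t):
        self.t.reference(t)          # what the report suggests was intended

inner, arg = Recorder(), Recorder()
et = ExampleTupleLike(inner)
et.reference_buggy(arg)
print(inner.referenced)              # None: the field's delegate was never updated
et.reference_fixed(arg)
print(inner.referenced is arg)       # True
```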
[jira] [Commented] (PIG-5191) Pig HBase 2.0.0 support
[ https://issues.apache.org/jira/browse/PIG-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137884#comment-16137884 ] Daniel Dai commented on PIG-5191: - Looks good. Can you also check whether bin/pig registers all dependent hbase jars automatically with hbase2? i.e., no need to manually register jars when using HBaseStorage. We have done this for hbase1. > Pig HBase 2.0.0 support > --- > > Key: PIG-5191 > URL: https://issues.apache.org/jira/browse/PIG-5191 > Project: Pig > Issue Type: Improvement >Reporter: Nandor Kollar >Assignee: Nandor Kollar > Fix For: 0.18.0 > > Attachments: PIG-5191_1.patch > > > Pig doesn't support HBase 2.0.0. Since the new HBase API introduces several > API changes, we should find a way to support both the 1.x and 2.x HBase APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (PIG-5293) Suspicious code as missing `this' for a member
[ https://issues.apache.org/jira/browse/PIG-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5293: --- Assignee: JC > Suspicious code as missing `this' for a member > -- > > Key: PIG-5293 > URL: https://issues.apache.org/jira/browse/PIG-5293 > Project: Pig > Issue Type: Bug >Reporter: JC >Assignee: JC > > Hi > In a recent github mirror, I've found suspicious code. > Branch: trunk > Path: src/org/apache/pig/pen/util/ExampleTuple.java > {code:java} > ... > 39 Tuple t = null; > ... > 110 @Override > 111 public void reference(Tuple t) { > 112 t.reference(t); > 113 } > {code} > In Line 112, `t.reference' should be `this.t.reference'? This might be just a > trivial thing as the class name as ExampleTuple. But I wanted to report just > in case. > Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5293) Suspicious code as missing `this' for a member
[ https://issues.apache.org/jira/browse/PIG-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137880#comment-16137880 ] Daniel Dai commented on PIG-5293: - That sounds valid. Can you upload a patch? > Suspicious code as missing `this' for a member > -- > > Key: PIG-5293 > URL: https://issues.apache.org/jira/browse/PIG-5293 > Project: Pig > Issue Type: Bug >Reporter: JC > > Hi > In a recent github mirror, I've found suspicious code. > Branch: trunk > Path: src/org/apache/pig/pen/util/ExampleTuple.java > {code:java} > ... > 39 Tuple t = null; > ... > 110 @Override > 111 public void reference(Tuple t) { > 112 t.reference(t); > 113 } > {code} > In Line 112, `t.reference' should be `this.t.reference'? This might be just a > trivial thing as the class name as ExampleTuple. But I wanted to report just > in case. > Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5289) update .eclipse.templates/.classpath with latest jars
[ https://issues.apache.org/jira/browse/PIG-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137875#comment-16137875 ] Daniel Dai commented on PIG-5289: - I don't think .eclipse.templates is used in eclipse-files. We already use the ivy cache to generate the .classpath file in PIG-2282. > update .eclipse.templates/.classpath with latest jars > - > > Key: PIG-5289 > URL: https://issues.apache.org/jira/browse/PIG-5289 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: 0.17.0 >Reporter: Artem Ervits >Assignee: Artem Ervits > Fix For: trunk > > > The file still references hadoop 0.20, zk 3.3.3, etc. We have to fix it > sometime to work with newer versions of hadoop and add Tez and Spark. Instead > of having a hardcoded file that goes outdated as versions are > incremented, it would be better to have an ant target that generates the file > based on dependencies in build/ivy/lib -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5286) Run verify_pig in e2e with old version of Pig
[ https://issues.apache.org/jira/browse/PIG-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137870#comment-16137870 ] Daniel Dai commented on PIG-5286: - Sounds good to me. Running verify_pig on the old Pig is more natural. > Run verify_pig in e2e with old version of Pig > - > > Key: PIG-5286 > URL: https://issues.apache.org/jira/browse/PIG-5286 > Project: Pig > Issue Type: Bug >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.18.0 > > Attachments: PIG-5286-1.patch > > > Currently verify_pig runs a different but equivalent script from the testcase, yet > runs with the same version of Pig. We ran into an issue where a test passed when > a bug was introduced and benchmark files were not present. The newly > generated benchmarks were also wrong. We caught the failure when running again, > pointing to previously generated benchmarks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5271) StackOverflowError when compiling in Tez mode (with union and replicated join)
[ https://issues.apache.org/jira/browse/PIG-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137589#comment-16137589 ] Daniel Dai commented on PIG-5271: - Looks good to me. [~rohini], do you want a second look? > StackOverflowError when compiling in Tez mode (with union and replicated join) > -- > > Key: PIG-5271 > URL: https://issues.apache.org/jira/browse/PIG-5271 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5271-v01.patch, pig-5271-v02.patch > > > Sample script > {code} > a4 = LOAD 'studentnulltab10k' as (name, age:int, gpa:float); > a4_1 = filter a4 by gpa is null or gpa >= 3.9; > a4_2 = filter a4 by gpa < 1; > b4 = union a4_1, a4_2; > b4_1 = filter b4 by age < 30; > b4_2 = foreach b4 generate name, age, FLOOR(gpa) as gpa; > c4 = load 'voternulltab10k' as (name, age, registration, contributions); > d4 = join b4_2 by name, c4 by name using 'replicated'; > e4 = foreach d4 generate b4_2::name as name, b4_2::age as age, gpa, > registration, contributions; > f4 = order e4 by name, age DESC; > store f4 into 'tmp_table_4' ; > a5_1 = filter a4 by gpa is null or gpa <= 3.9; > a5_2 = filter a4 by gpa < 2; > b5 = union a5_1, a5_2; > d5 = join c4 by name, b5 by name using 'replicated'; > store d5 into 'tmp_table_5' ; > {code} > This script fails to compile with StackOverflowError. > {noformat} > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323) > Pig Stack Trace > --- > ERROR 2998: Unhandled internal error. 
null > java.lang.StackOverflowError > at java.lang.reflect.Constructor.newInstance(Constructor.java:415) > at java.lang.Class.newInstance(Class.java:442) > at org.apache.pig.impl.util.Utils.mergeCollection(Utils.java:490) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:101) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
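The trace above shows `DependencyOrderWalker.doAllPredecessors` recursing once per predecessor hop, so a sufficiently deep plan (here, effectively multiplied by the union feeding a replicated join) exhausts the JVM stack. A hedged Python sketch of the recursive shape and an iterative equivalent with an explicit stack — illustrative only, not the fix in the attached patches:

```python
def walk_recursive(node, preds, seen, order):
    # Shape of doAllPredecessors: visit all predecessors recursively,
    # then emit the node. Depth grows with the plan's longest chain.
    if node in seen:
        return
    seen.add(node)
    for p in preds.get(node, []):
        walk_recursive(p, preds, seen, order)
    order.append(node)

def walk_iterative(start, preds):
    # Same dependency order, but with an explicit stack: no depth limit.
    order, seen, stack = [], set(), [(start, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        elif node not in seen:
            seen.add(node)
            stack.append((node, True))       # emit after predecessors
            for p in preds.get(node, []):
                stack.append((p, False))
    return order

# Sanity check: both walks agree on a small chain.
small = {i: [i - 1] for i in range(1, 5)}
seen, order = set(), []
walk_recursive(4, small, seen, order)
print(order == walk_iterative(4, small))     # True

# A chain far deeper than the default recursion limit: the recursive walk
# would raise RecursionError; the iterative one completes.
deep = {i: [i - 1] for i in range(1, 50_000)}
print(walk_iterative(49_999, deep)[0])       # 0 (deepest predecessor first)
```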
[jira] [Comment Edited] (PIG-5272) BagToString Output Schema
[ https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137270#comment-16137270 ] Daniel Dai edited comment on PIG-5272 at 8/22/17 7:41 PM: -- Are you saying your data does not match your declared schema? If you are not sure about the bag inner schema, you shall leave it empty by just declaring it as \{()\}, which means this is a bag with unknown inner schema. I see BagToString does have an issue, it does not deal with unknown inner schema. If that's the issue you are trying to fix, you are welcome to submit a patch. was (Author: daijy): Are you saying your data does not match your declared schema? If you are not sure about the bag inner schema, you shall leave it empty by just declaring it as {()}, which means this is a bag with unknown inner schema. I see BagToString does have an issue, it does not deal with unknown inner schema. If that's the issue you are trying to fix, you are welcome to submit a patch. > BagToString Output Schema > - > > Key: PIG-5272 > URL: https://issues.apache.org/jira/browse/PIG-5272 > Project: Pig > Issue Type: Improvement >Reporter: Joshua Juen >Priority: Minor > > The output schema from BagToTuple is nonsensical causing problems using the > tuple later in the same script. > For example: Given a bag: { data:chararray }, calling BagToTuple yields the > schema: ( data:chararray ) > But, this makes no sense since if the above bag contains: {data1, data2, > data3} entries, the output tuple from BagToTuple will be: > (data1:chararray, data2:chararray, data3:chararray) != (data:chararray),the > declared output schema from the UDF. > Unfortunately, the schema of the tuple cannot be known during the initial > validation phase. Thus, I believe the output schema from the UDF should be > modified to be type tuple without the number of fields being fixed to the > number of columns in the input bag. 
> Under the current way, the elements in the tuple cannot be accessed in the > script after calling BagToTuple without getting an incompatible type error. > We have modified the UDF in our internal UDF jars to work around the issue. > Let me know if this sounds reasonable and I can generate the patch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
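The reported mismatch can be simulated directly: the runtime tuple has one field per bag entry, while the declared output schema is fixed at validation time. A minimal Python sketch, where `bag_to_tuple` is a hypothetical stand-in mimicking only the runtime flattening, not the real UDF:

```python
# Simulate BagToTuple's runtime behavior on a bag of single-field tuples.
def bag_to_tuple(bag):
    out = []
    for t in bag:
        out.extend(t)  # flatten each inner tuple's fields into one tuple
    return tuple(out)

bag = [("data1",), ("data2",), ("data3",)]  # bag schema: { data:chararray }
result = bag_to_tuple(bag)
declared = ("data",)                        # declared output: ( data:chararray )
print(result)                               # ('data1', 'data2', 'data3')
print(len(result) == len(declared))         # False: the schema mismatch in PIG-5272
```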
[jira] [Commented] (PIG-5272) BagToString Output Schema
[ https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137270#comment-16137270 ] Daniel Dai commented on PIG-5272: - Are you saying your data does not match your declared schema? If you are not sure about the bag inner schema, you shall leave it empty by just declaring it as {()}, which means this is a bag with unknown inner schema. I see BagToString does have an issue, it does not deal with unknown inner schema. If that's the issue you are trying to fix, you are welcome to submit a patch. > BagToString Output Schema > - > > Key: PIG-5272 > URL: https://issues.apache.org/jira/browse/PIG-5272 > Project: Pig > Issue Type: Improvement >Reporter: Joshua Juen >Priority: Minor > > The output schema from BagToTuple is nonsensical causing problems using the > tuple later in the same script. > For example: Given a bag: { data:chararray }, calling BagToTuple yields the > schema: ( data:chararray ) > But, this makes no sense since if the above bag contains: {data1, data2, > data3} entries, the output tuple from BagToTuple will be: > (data1:chararray, data2:chararray, data3:chararray) != (data:chararray),the > declared output schema from the UDF. > Unfortunately, the schema of the tuple cannot be known during the initial > validation phase. Thus, I believe the output schema from the UDF should be > modified to be type tuple without the number of fields being fixed to the > number of columns in the input bag. > Under the current way, the elements in the tuple cannot be accessed in the > script after calling BagToTuple without getting an incompatible type error. > We have modified the UDF in our internal UDF jars to work around the issue. > Let me know if this sounds reasonable and I can generate the patch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PIG-5268) Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage
[ https://issues.apache.org/jira/browse/PIG-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5268: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 0.18.0 Status: Resolved (was: Patch Available) +1. I don't think there's anything wrong with cleaning up code, especially for new contributors. Patch committed to trunk. Thanks Beluga! > Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage > > > Key: PIG-5268 > URL: https://issues.apache.org/jira/browse/PIG-5268 > Project: Pig > Issue Type: Improvement > Components: data >Affects Versions: 0.17.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Fix For: 0.18.0 > > Attachments: PIG-5268.1.patch, PIG-5268.2.patch > > > # Optimize for case where {{asCollection}} is empty > # Tidy up -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (PIG-5268) Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage
[ https://issues.apache.org/jira/browse/PIG-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5268: --- Assignee: BELUGA BEHR > Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage > > > Key: PIG-5268 > URL: https://issues.apache.org/jira/browse/PIG-5268 > Project: Pig > Issue Type: Improvement > Components: data >Affects Versions: 0.17.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: PIG-5268.1.patch, PIG-5268.2.patch > > > # Optimize for case where {{asCollection}} is empty > # Tidy up -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121076#comment-16121076 ] Daniel Dai commented on PIG-5201: - What's your idea for the column padding? > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.18.0 > > Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, > pig-5201-v02.patch, pig-5201-v03.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5256) Bytecode generation for POFilter and POForeach
[ https://issues.apache.org/jira/browse/PIG-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121068#comment-16121068 ] Daniel Dai commented on PIG-5256: - Haven't checked the code line by line, but the overall approach looks good to me. The patch flattens the expression tree and nested plan, so we can avoid virtual function calls and have a cleaner solution for the multi-time evaluation issue. This can be extended to flatten the whole operator tree in the future (whole-stage codegen). > Bytecode generation for POFilter and POForeach > -- > > Key: PIG-5256 > URL: https://issues.apache.org/jira/browse/PIG-5256 > Project: Pig > Issue Type: Sub-task > Components: impl >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.18.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120551#comment-16120551 ] Daniel Dai commented on PIG-5201: - It is equivalent to flattening a scalar. That sounds fine. How about columns? Shall we produce the same number of null columns according to the schema? It might be the same as PIG-2537. > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.18.0 > > Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, > pig-5201-v02.patch, pig-5201-v03.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
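The column-padding question above can be made concrete. A hedged Python sketch of one candidate behavior — pad short (or null) flattened tuples with nulls up to the declared schema width; what to do with extra fields (truncate, as sketched here, versus raising an error) is a separate design choice, and this is not necessarily what Pig implements:

```python
# Pad/trim a flattened value to the declared schema width.
def flatten_to_width(value, width):
    fields = list(value) if value is not None else []
    fields = fields[:width]                      # extra fields: truncated (one option)
    fields += [None] * (width - len(fields))     # missing fields: padded with null
    return tuple(fields)

print(flatten_to_width(("a", "b"), 3))           # ('a', 'b', None)
print(flatten_to_width(None, 3))                 # (None, None, None)
print(flatten_to_width(("a", "b", "c", "d"), 3)) # ('a', 'b', 'c')
```

Under this behavior, a row whose tuple is short keeps later fields (like the trailing `a4` in the related FLATTEN tickets) in their declared positions instead of shifting them left.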
[jira] [Updated] (PIG-5254) Hit Ctrl-D to quit grunt shell fail
[ https://issues.apache.org/jira/browse/PIG-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5254: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to both trunk and 0.17 branch. Thanks Weijun for contributing! > Hit Ctrl-D to quit grunt shell fail > --- > > Key: PIG-5254 > URL: https://issues.apache.org/jira/browse/PIG-5254 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.18.0, 0.17.1 >Reporter: Daniel Dai >Assignee: Weijun Qian > Fix For: 0.18.0, 0.17.1 > > Attachments: PIG-5254.patch > > > Exception: > {code} > java.lang.NullPointerException > at > org.apache.pig.tools.grunt.ConsoleReaderInputStream$ConsoleLineInputStream.read(ConsoleReaderInputStream.java:107) > at java.io.InputStream.read(InputStream.java:170) > at java.io.SequenceInputStream.read(SequenceInputStream.java:207) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.read1(BufferedReader.java:212) > at java.io.BufferedReader.read(BufferedReader.java:286) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.FillBuff(JavaCharStream.java:143) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.ReadByte(JavaCharStream.java:171) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.readChar(JavaCharStream.java:274) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.BeginToken(JavaCharStream.java:193) > at > org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3215) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1511) > at > 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:117) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205) > at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66) > at org.apache.pig.Main.run(Main.java:564) > at org.apache.pig.Main.main(Main.java:175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5282) Upgade to Java 8
[ https://issues.apache.org/jira/browse/PIG-5282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111280#comment-16111280 ] Daniel Dai commented on PIG-5282: - I don't have problem for that. Hive already move to JDK 8 and we can follow. > Upgade to Java 8 > > > Key: PIG-5282 > URL: https://issues.apache.org/jira/browse/PIG-5282 > Project: Pig > Issue Type: Improvement >Reporter: Nandor Kollar > Fix For: 0.18.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (PIG-5270) Typo in Pig Logging
[ https://issues.apache.org/jira/browse/PIG-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-5270. - Resolution: Fixed Hadoop Flags: Reviewed Committed to trunk. Thanks for fixing the typo. > Typo in Pig Logging > --- > > Key: PIG-5270 > URL: https://issues.apache.org/jira/browse/PIG-5270 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.13.0, 0.14.0, 0.15.0, 0.16.0, 0.17.0 > Environment: All >Reporter: Andrew Hutton >Assignee: Andrew Hutton >Priority: Minor > Labels: easyfix, patch > Fix For: 0.18.0 > > Attachments: PIG-5270.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > In the log output of the internalCopyAllGeneratedToDistributedCache() method > in pig/data/SchemaTupleFrontend.java the word "cache" is misspelled as > "cacche". According to another issue, this was already addressed and resolved > in 2013; however, the issue persists in the latest releases. > Here is a link to the previous issue: > https://issues.apache.org/jira/browse/PIG-3432 > I also issued a pull request to the Github mirror: > https://github.com/apache/pig/pull/30 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PIG-5270) Typo in Pig Logging
[ https://issues.apache.org/jira/browse/PIG-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5270: Fix Version/s: (was: trunk) (was: 0.17.1) (was: 0.16.1) (was: 0.17.0) (was: 0.15.1) (was: 0.16.0) (was: 0.15.0) (was: 0.14.0) (was: 0.13.0) > Typo in Pig Logging > --- > > Key: PIG-5270 > URL: https://issues.apache.org/jira/browse/PIG-5270 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.13.0, 0.14.0, 0.15.0, 0.16.0, 0.17.0 > Environment: All >Reporter: Andrew Hutton >Assignee: Andrew Hutton >Priority: Minor > Labels: easyfix, patch > Fix For: 0.18.0 > > Attachments: PIG-5270.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > In the log output of the internalCopyAllGeneratedToDistributedCache() method > in pig/data/SchemaTupleFrontend.java the word "cache" is misspelled as > "cacche". According to another issue, this was already addressed and resolved > in 2013, however the issue persists in the latest releases. > Here is a link to the previous issue: > https://issues.apache.org/jira/browse/PIG-3432 > I also issued a pull request to the Github mirror: > https://github.com/apache/pig/pull/30 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (PIG-5270) Typo in Pig Logging
[ https://issues.apache.org/jira/browse/PIG-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5270: --- Assignee: Andrew Hutton > Typo in Pig Logging > --- > > Key: PIG-5270 > URL: https://issues.apache.org/jira/browse/PIG-5270 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.13.0, 0.14.0, 0.15.0, 0.16.0, 0.17.0 > Environment: All >Reporter: Andrew Hutton >Assignee: Andrew Hutton >Priority: Minor > Labels: easyfix, patch > Fix For: 0.18.0 > > Attachments: PIG-5270.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > In the log output of the internalCopyAllGeneratedToDistributedCache() method > in pig/data/SchemaTupleFrontend.java the word "cache" is misspelled as > "cacche". According to another issue, this was already addressed and resolved > in 2013, however the issue persists in the latest releases. > Here is a link to the previous issue: > https://issues.apache.org/jira/browse/PIG-3432 > I also issued a pull request to the Github mirror: > https://github.com/apache/pig/pull/30 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-4767) Partition filter not pushed down when filter clause references variable from another load path
[ https://issues.apache.org/jira/browse/PIG-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084758#comment-16084758 ] Daniel Dai commented on PIG-4767: - That's right, PartitionFilterOptimizer and PredicatePushdownOptimizer do not push filters up. The problem PIG-1669 tries to solve does not exist. +1. > Partition filter not pushed down when filter clause references variable from > another load path > -- > > Key: PIG-4767 > URL: https://issues.apache.org/jira/browse/PIG-4767 > Project: Pig > Issue Type: Bug >Affects Versions: 0.15.0 >Reporter: Anthony Hsu >Assignee: Koji Noguchi > Fix For: 0.18.0 > > Attachments: pig-4767-v01.patch > > > To reproduce: > {noformat:title=test.pig} > a = load 'a.txt'; > a_group = group a all; > a_count = foreach a_group generate COUNT(a) as count; > b = load 'mytable' using org.apache.hcatalog.pig.HCatLoader(); > b = filter b by datepartition == '2015-09-01-00' and foo == a_count.count; > dump b; > {noformat} > The above query ends up reading all the table partitions. If you remove the > {{foo == a_count.count}} clause or replace {{a_count.count}} with a constant, > then partition filtering happens properly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5264) Remove deprecated keys from PigConfiguration
[ https://issues.apache.org/jira/browse/PIG-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084623#comment-16084623 ] Daniel Dai commented on PIG-5264: - +1 > Remove deprecated keys from PigConfiguration > > > Key: PIG-5264 > URL: https://issues.apache.org/jira/browse/PIG-5264 > Project: Pig > Issue Type: Improvement >Reporter: Nandor Kollar >Assignee: Nandor Kollar >Priority: Minor > Fix For: 0.18.0 > > Attachments: PIG-5264_1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > PigConfiguration includes several deprecated constants (like INSERT_ENABLED, > SCHEMA_TUPLE_SHOULD_ALLOW_FORCE, etc.). These should be removed, as they have all > been deprecated for multiple versions. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PIG-5254) Hit Ctrl-D to quit grunt shell fail
[ https://issues.apache.org/jira/browse/PIG-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5254: Fix Version/s: 0.17.1 > Hit Ctrl-D to quit grunt shell fail > --- > > Key: PIG-5254 > URL: https://issues.apache.org/jira/browse/PIG-5254 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.18.0, 0.17.1 > > > Exception: > {code} > java.lang.NullPointerException > at > org.apache.pig.tools.grunt.ConsoleReaderInputStream$ConsoleLineInputStream.read(ConsoleReaderInputStream.java:107) > at java.io.InputStream.read(InputStream.java:170) > at java.io.SequenceInputStream.read(SequenceInputStream.java:207) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.read1(BufferedReader.java:212) > at java.io.BufferedReader.read(BufferedReader.java:286) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.FillBuff(JavaCharStream.java:143) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.ReadByte(JavaCharStream.java:171) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.readChar(JavaCharStream.java:274) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.BeginToken(JavaCharStream.java:193) > at > org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3215) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1511) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:117) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205) > at 
org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66) > at org.apache.pig.Main.run(Main.java:564) > at org.apache.pig.Main.main(Main.java:175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (PIG-5254) Hit Ctrl-D to quit grunt shell fail
Daniel Dai created PIG-5254: --- Summary: Hit Ctrl-D to quit grunt shell fail Key: PIG-5254 URL: https://issues.apache.org/jira/browse/PIG-5254 Project: Pig Issue Type: Bug Components: impl Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.18.0 Exception: {code} java.lang.NullPointerException at org.apache.pig.tools.grunt.ConsoleReaderInputStream$ConsoleLineInputStream.read(ConsoleReaderInputStream.java:107) at java.io.InputStream.read(InputStream.java:170) at java.io.SequenceInputStream.read(SequenceInputStream.java:207) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.read1(BufferedReader.java:212) at java.io.BufferedReader.read(BufferedReader.java:286) at org.apache.pig.tools.pigscript.parser.JavaCharStream.FillBuff(JavaCharStream.java:143) at org.apache.pig.tools.pigscript.parser.JavaCharStream.ReadByte(JavaCharStream.java:171) at org.apache.pig.tools.pigscript.parser.JavaCharStream.readChar(JavaCharStream.java:274) at org.apache.pig.tools.pigscript.parser.JavaCharStream.BeginToken(JavaCharStream.java:193) at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3215) at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1511) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:117) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66) at org.apache.pig.Main.run(Main.java:564) at org.apache.pig.Main.main(Main.java:175) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5225) Several unit tests are not annotated with @Test
[ https://issues.apache.org/jira/browse/PIG-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032007#comment-16032007 ] Daniel Dai commented on PIG-5225: - The test was added even before my time. The test won't throw an exception; it will get a null result and a warning counter, as Rohini points out. However, the test name suggests it is testing a failed UDF. I don't think this is valid anymore, and it's fine to remove it. > Several unit tests are not annotated with @Test > --- > > Key: PIG-5225 > URL: https://issues.apache.org/jira/browse/PIG-5225 > Project: Pig > Issue Type: Bug >Reporter: Nandor Kollar >Assignee: Nandor Kollar > Fix For: 0.18.0 > > Attachments: PIG-5225.patch > > > Several test cases are not annotated with @Test. Since we use JUnit 4, these > test cases seem to be excluded. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
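The mechanics behind PIG-5225 are worth spelling out: JUnit 3 discovered tests by the test* naming convention, while JUnit 4 discovers them purely by the @Test annotation, so an un-annotated method silently never runs. A self-contained sketch of annotation-based discovery (using a stand-in annotation, since org.junit.Test is not on the classpath here):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class AnnotationDiscovery {
    // Stand-in for org.junit.Test; JUnit 4 runners discover tests the same way.
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Test {}

    public static class SomeTests {
        @Test public void testAnnotated() {}
        public void testForgotten() {} // JUnit 3 style name, but never run under JUnit 4
    }

    // Collect the methods a JUnit-4-style runner would actually execute.
    public static List<String> discovered(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Test.class)) {
                names.add(m.getName());
            }
        }
        return names;
    }
}
```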
[jira] [Resolved] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-5216. - Resolution: Fixed Hadoop Flags: Reviewed Also rebased after the Spark merge. Patch committed to trunk. Thanks Iris! > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.18.0 > > Attachments: PIG-5216-1.patch, PIG-5216-2.patch, PIG-5216-3.patch, > PIG-5216-4.patch > > > Add error handling for loaders in Pig, so that users can choose to allow > errors when loading data, and set error counts / rates > Ideas are based on the error handling for store funcs; see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5216: Attachment: PIG-5216-4.patch Found several issues when running unit tests: 1. In POLoad.setup, we also need to set the LoadFuncDecorator. 2. When serializing "pig.loads", set POLoad.parentPlan to null, as we don't want to serialize the whole physical plan. 3. In MRJobStats, we still refer to "pig.inputs". 4. Some formatting issues. Attaching PIG-5216-4.patch. > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.18.0 > > Attachments: PIG-5216-1.patch, PIG-5216-2.patch, PIG-5216-3.patch, > PIG-5216-4.patch > > > Add error handling for loaders in Pig, so that users can choose to allow > errors when loading data, and set error counts / rates > Ideas are based on the error handling for store funcs; see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5184) set command to view value of a variable
[ https://issues.apache.org/jira/browse/PIG-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028088#comment-16028088 ] Daniel Dai commented on PIG-5184: - Addressing Rohini's review comments. > set command to view value of a variable > --- > > Key: PIG-5184 > URL: https://issues.apache.org/jira/browse/PIG-5184 > Project: Pig > Issue Type: Improvement > Components: parser >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.18.0 > > Attachments: PIG-5184-1.patch, PIG-5184-2.patch > > > Currently, the set command can set the value of a variable, or show all variables > along with their values. I'd like to add another form which shows the value of a > particular variable. For example: > >set fs.defaultFS (show value of fs.defaultFS). > That will help us debug a Pig session in a cleaner way (as compared to showing > all variables). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5184) set command to view value of a variable
[ https://issues.apache.org/jira/browse/PIG-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5184: Attachment: PIG-5184-2.patch > set command to view value of a variable > --- > > Key: PIG-5184 > URL: https://issues.apache.org/jira/browse/PIG-5184 > Project: Pig > Issue Type: Improvement > Components: parser >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.18.0 > > Attachments: PIG-5184-1.patch, PIG-5184-2.patch > > > Currently, the set command can set the value of a variable, or show all variables > along with their values. I'd like to add another form which shows the value of a > particular variable. For example: > >set fs.defaultFS (show value of fs.defaultFS). > That will help us debug a Pig session in a cleaner way (as compared to showing > all variables). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4059) Pig on Spark
[ https://issues.apache.org/jira/browse/PIG-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027728#comment-16027728 ] Daniel Dai commented on PIG-4059: - +1. Didn't get a chance to review the patch, but we should not delay it further. > Pig on Spark > > > Key: PIG-4059 > URL: https://issues.apache.org/jira/browse/PIG-4059 > Project: Pig > Issue Type: New Feature > Components: spark >Reporter: Rohini Palaniswamy >Assignee: Praveen Rachabattuni > Labels: spork > Fix For: spark-branch > > Attachments: Pig-on-Spark-Design-Doc.pdf, Pig-on-Spark-Scope.pdf > > > Setting up your development environment: > 0. Download the Spark release package (currently Pig on Spark only supports Spark > 1.6). > 1. Check out the Pig Spark branch. > 2. Build Pig by running "ant jar" and "ant -Dhadoopversion=23 jar" for > hadoop-2.x versions. > 3. Configure these environment variables: > export HADOOP_USER_CLASSPATH_FIRST="true" > Now we support "local" and "yarn-client" modes; you can export the system variable > "SPARK_MASTER" like: > export SPARK_MASTER=local or export SPARK_MASTER="yarn-client" > 4. In local mode: ./pig -x spark_local xxx.pig > In yarn-client mode: > export SPARK_HOME=xx; > export SPARK_JAR=hdfs://example.com:8020/ (the hdfs location where > you upload the spark-assembly*.jar) > ./pig -x spark xxx.pig -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026631#comment-16026631 ] Daniel Dai commented on PIG-5201: - It is not a regression, so it is certainly fine to push out. Koji can still try but it should not be a release blocker. > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.17.0 > > Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, > pig-5201-v02.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4662) New optimizer rule: filter nulls before inner joins
[ https://issues.apache.org/jira/browse/PIG-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025387#comment-16025387 ] Daniel Dai commented on PIG-4662: - I don't think it would make a noticeable performance difference either way. I'd like to see a modular design rather than intermingling different concepts. Also, I don't feel it is hard to find the join key in the logical optimizer and add a filter on it. > New optimizer rule: filter nulls before inner joins > --- > > Key: PIG-4662 > URL: https://issues.apache.org/jira/browse/PIG-4662 > Project: Pig > Issue Type: Improvement >Reporter: Ido Hadanny >Assignee: Satish Subhashrao Saley >Priority: Minor > Labels: Performance > Fix For: 0.18.0 > > > As stated in the docs, rewriting an inner join and filtering nulls from > inputs can be a big performance gain: > http://pig.apache.org/docs/r0.14.0/perf.html#nulls > We would like to add an optimizer rule which detects inner joins, and filters > nulls in all inputs: > A = filter A by t is not null; > B = filter B by x is not null; > C = join A by t, B by x; > see also: > http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining -- This message was sent by Atlassian JIRA (v6.3.15#6346)
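The reasoning behind the safety of the PIG-4662 rewrite can be made concrete: under Pig's null semantics a null key matches nothing in an inner join, so pre-filtering null keys cannot change the result; it only shrinks the inputs. A minimal model of that semantics, with plain Java lists standing in for relations (illustrative only, not Pig's join implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class NullJoinModel {
    // Inner join on key equality; a null key never matches anything,
    // which is why filtering nulls up front is a pure optimization.
    public static List<String> innerJoin(List<Integer> left, List<Integer> right) {
        List<String> out = new ArrayList<>();
        for (Integer l : left) {
            if (l == null) continue; // null joins nothing; skipping == pre-filtering
            for (Integer r : right) {
                if (l.equals(r)) {   // equals(null) is false, so null rows on the right drop too
                    out.add(l + "|" + r);
                }
            }
        }
        return out;
    }
}
```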
[jira] [Commented] (PIG-5194) HiveUDF fails with Spark exec type
[ https://issues.apache.org/jira/browse/PIG-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025249#comment-16025249 ] Daniel Dai commented on PIG-5194: - +1 for the HiveUDAF change, thanks for catching this! > HiveUDF fails with Spark exec type > -- > > Key: PIG-5194 > URL: https://issues.apache.org/jira/browse/PIG-5194 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: Adam Szita >Assignee: Adam Szita > Fix For: spark-branch > > Attachments: PIG-5194.0.patch, PIG-5194.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5231) PigStorage with -schema may produce inconsistent outputs with more fields
[ https://issues.apache.org/jira/browse/PIG-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025155#comment-16025155 ] Daniel Dai commented on PIG-5231: - Vote for 3. We pick the first schema among the dirs in all LoadFuncs, such as OrcStorage and AvroStorage. I don't think we should make an exception for PigStorage. +1 for the patch. > PigStorage with -schema may produce inconsistent outputs with more fields > - > > Key: PIG-5231 > URL: https://issues.apache.org/jira/browse/PIG-5231 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5231-v01.patch > > > When multiple directories are passed to PigStorage(',','-schema'), pig will > {quote} > No attempt to merge conflicting schemas is made during loading. The first > schema encountered during a file system scan is used. > {quote} > For two directories input with schema > file1: (f1:chararray, f2:int) and > file2: (f1:chararray, f2:int, f3:int) > Pig will pick the first schema from file1 and only allow f1, f2 access. > However, output would still contain 3 fields for tuples from file2. This > later leads to completely corrupt outputs due to shifted fields resulting in > incorrect references. > (This may also happen when input itself contains the delimiter.) > If file2 schema is picked, this is already handled by filling the missing > fields with null. (PIG-3100) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024919#comment-16024919 ] Daniel Dai commented on PIG-5224: - bq. Well, if next LOForEach is not removing all the columns which are not used, then essentially those columns are being used and therefore ColumnPruner would not have tried to prune them in the first place? That's only if the user writes the "foreach" statement carefully. If he projects a column that is never used in the script, the column pruner will still think it is a column that should be removed. +1 for pig-5224-v2.patch. > Extra foreach from ColumnPrune preventing Accumulator usage > --- > > Key: PIG-5224 > URL: https://issues.apache.org/jira/browse/PIG-5224 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5224-v0-testonly.patch, pig-5224-v1.patch, > pig-5224-v2.patch > > > {code} > A = load 'input' as (id:int, fruit); > B = foreach A generate id; -- to enable columnprune > C = group B by id; > D = foreach C { > o = order B by id; > generate org.apache.pig.test.utils.AccumulatorBagCount(o); > } > STORE D into ... > {code} > Pig fails to use Accumulator interface for this UDF. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (PIG-3021) Split results missing records when there is null values in the column comparison
[ https://issues.apache.org/jira/browse/PIG-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-3021. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 0.17.0 +1 for PIG-3021-4.patch. Patch committed to trunk. Thanks Nian, Cheolsoo! [~jeffjee617], do you mind adding some documentation as well (in another Jira)? > Split results missing records when there is null values in the column > comparison > > > Key: PIG-3021 > URL: https://issues.apache.org/jira/browse/PIG-3021 > Project: Pig > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Chang Luo >Assignee: Nian Ji > Fix For: 0.17.0 > > Attachments: PIG-3021-2.patch, PIG-3021-3.patch, PIG-3021-4.patch, > PIG-3021.patch > > > Suppose a(x, y) > split a into b if x==y, c otherwise; > One would expect the union of b and c to be a. However, if x or y is null, > the record won't appear in either b or c. > To work around this, I have to change to the following: > split a into b if x is not null and y is not null and x==y, c otherwise; -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024320#comment-16024320 ] Daniel Dai commented on PIG-5224: - The inserted LOForEach removes all the columns which are not used in the script going forward. The next LOForEach is not necessarily doing that. I believe this is not for performance reasons (the performance gain from removing several columns might be debatable); it is to make the ColumnPruner simpler. > Extra foreach from ColumnPrune preventing Accumulator usage > --- > > Key: PIG-5224 > URL: https://issues.apache.org/jira/browse/PIG-5224 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5224-v0-testonly.patch, pig-5224-v1.patch > > > {code} > A = load 'input' as (id:int, fruit); > B = foreach A generate id; -- to enable columnprune > C = group B by id; > D = foreach C { > o = order B by id; > generate org.apache.pig.test.utils.AccumulatorBagCount(o); > } > STORE D into ... > {code} > Pig fails to use Accumulator interface for this UDF. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5235) Typecast with as-clause fails for tuple/bag with an empty schema
[ https://issues.apache.org/jira/browse/PIG-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024288#comment-16024288 ] Daniel Dai commented on PIG-5235: - +1 > Typecast with as-clause fails for tuple/bag with an empty schema > > > Key: PIG-5235 > URL: https://issues.apache.org/jira/browse/PIG-5235 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5235-v01.patch > > > Following script fails with trunk(0.17). > {code} > a = load 'test.txt' as (mytuple:tuple (), gpa:float); > b = foreach a generate mytuple as (mytuple2:(name:int, age:double)); > store b into '/tmp/deleteme'; > {code} > 2017-05-16 09:52:31,280 \[main] ERROR org.apache.pig.tools.grunt.Grunt - > ERROR 2999: Unexpected internal error. null > (This is a continuation from the as-clause fix at PIG-2315 and follow up jira > PIG-4933) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4924) Translate failures.maxpercent MR setting to Tez
[ https://issues.apache.org/jira/browse/PIG-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024195#comment-16024195 ] Daniel Dai commented on PIG-4924: - +1 > Translate failures.maxpercent MR setting to Tez > --- > > Key: PIG-4924 > URL: https://issues.apache.org/jira/browse/PIG-4924 > Project: Pig > Issue Type: Improvement >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.17.0 > > Attachments: PIG-4924-1.patch > > > TEZ-3271 adds support equivalent to mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. We need to translate that per vertex. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4662) New optimizer rule: filter nulls before inner joins
[ https://issues.apache.org/jira/browse/PIG-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024192#comment-16024192 ] Daniel Dai commented on PIG-4662: - I prefer to do it in the optimizer; it seems clearer. > New optimizer rule: filter nulls before inner joins > --- > > Key: PIG-4662 > URL: https://issues.apache.org/jira/browse/PIG-4662 > Project: Pig > Issue Type: Improvement >Reporter: Ido Hadanny >Assignee: Satish Subhashrao Saley >Priority: Minor > Labels: Performance > Fix For: 0.18.0 > > > As stated in the docs, rewriting an inner join and filtering nulls from > inputs can be a big performance gain: > http://pig.apache.org/docs/r0.14.0/perf.html#nulls > We would like to add an optimizer rule which detects inner joins, and filters > nulls in all inputs: > A = filter A by t is not null; > B = filter B by x is not null; > C = join A by t, B by x; > see also: > http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4914) Add testcase for join with special characters in chararray
[ https://issues.apache.org/jira/browse/PIG-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024182#comment-16024182 ] Daniel Dai commented on PIG-4914: - This is only for tuple join keys, right? String join keys with UTF-8 characters are already covered in PIG-4358. > Add testcase for join with special characters in chararray > -- > > Key: PIG-4914 > URL: https://issues.apache.org/jira/browse/PIG-4914 > Project: Pig > Issue Type: Improvement >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.18.0 > > > This jira is to add testcase for PIG-4821. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5185) Job name show "DefaultJobName" when running a Python script
[ https://issues.apache.org/jira/browse/PIG-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5185: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks Rohini for review! > Job name show "DefaultJobName" when running a Python script > --- > > Key: PIG-5185 > URL: https://issues.apache.org/jira/browse/PIG-5185 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.17.0 > > Attachments: PIG-5185-1.patch, PIG-5185-2.patch > > > Run a python script with Pig, Hadoop WebUI show "DefaultJobName" instead of > script name. We shall use script name, the same semantic for regular Pig > script. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5222) Fix Junit Deprecations
[ https://issues.apache.org/jira/browse/PIG-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5222: Attachment: PIG-5222-fixtest.patch TestEvalPipelineLocal.testFunctionInsideFunction, TestEvalPipelineLocal.testBagFunctionWithFlattening, and TestEvalPipelineLocal.testMapLookup failed with the patch. Attaching a fix. > Fix Junit Deprecations > -- > > Key: PIG-5222 > URL: https://issues.apache.org/jira/browse/PIG-5222 > Project: Pig > Issue Type: Improvement >Reporter: William Watson >Assignee: William Watson > Fix For: 0.17.0 > > Attachments: fix-junit-deprecations.patch, PIG-5222-fixtest.patch > > > junit.framework.Assert is deprecated in favor of org.junit.Assert. Warnings > pop up all over the tests -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5221) More fs.default.name deprecation warnings
[ https://issues.apache.org/jira/browse/PIG-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5221: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Switching the order of checking should be OK. Patch committed to trunk. Thanks William! > More fs.default.name deprecation warnings > - > > Key: PIG-5221 > URL: https://issues.apache.org/jira/browse/PIG-5221 > Project: Pig > Issue Type: Improvement >Reporter: William Watson >Assignee: William Watson > Fix For: 0.17.0 > > Attachments: remove-fs-default-name-deprecations.patch > > > There are more places in the code, especially in the tests, where we're still > using fs.default.name instead of fs.defaultFS and we get deprecation warnings > because of it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5222) Fix Junit Deprecations
[ https://issues.apache.org/jira/browse/PIG-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5222: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Patch committed to trunk. Thanks William! > Fix Junit Deprecations > -- > > Key: PIG-5222 > URL: https://issues.apache.org/jira/browse/PIG-5222 > Project: Pig > Issue Type: Improvement >Reporter: William Watson >Assignee: William Watson > Fix For: 0.17.0 > > Attachments: fix-junit-deprecations.patch > > > junit.framework.Assert is deprecated in favor of org.junit.Assert. Warnings > pop up all over the tests -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969841#comment-15969841 ] Daniel Dai commented on PIG-5224: - If this is a problem of an extra foreach after LOCogroup, why not add the same check in ColumnPruneVisitor.visit(LOCogroup cg) instead of addForEachIfNecessary? > Extra foreach from ColumnPrune preventing Accumulator usage > --- > > Key: PIG-5224 > URL: https://issues.apache.org/jira/browse/PIG-5224 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5224-v0-testonly.patch, pig-5224-v1.patch > > > {code} > A = load 'input' as (id:int, fruit); > B = foreach A generate id; -- to enable columnprune > C = group B by id; > D = foreach C { > o = order B by id; > generate org.apache.pig.test.utils.AccumulatorBagCount(o); > } > STORE D into ... > {code} > Pig fails to use Accumulator interface for this UDF. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5223: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Both tests pass. Thanks for the additional test. Patch committed to trunk. Thanks Jin! > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch, PIG-5223-2.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969265#comment-15969265 ] Daniel Dai commented on PIG-5223: - Both conditions should be met in the limit-by-variable case. The state mLimit==-1 with mlimitPlan==null should not happen. > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969250#comment-15969250 ] Daniel Dai commented on PIG-5223: - mlimitPlan is an expression to calculate the limit variable. That is, the number of limited rows is determined at runtime by evaluating expressionPlan (the physical-plan equivalent of mlimitPlan). If (mLimit == -1 && mlimitPlan != null), that means limiting by a variable, not a constant. It is basically the same as your (mLimit == -1) condition, but adds another condition for assurance. > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
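The guard being discussed can be captured in a small predicate. This is only an illustrative sketch under the assumption stated in the comment (the real check lives in Pig's operator/optimizer code, and the names below simply follow the comment):

```java
public class LimitGuard {
    // mLimit == -1 marks "no constant limit"; a non-null limit plan means the
    // limit is an expression evaluated at runtime, so a compile-time
    // limited-sort optimization must be disabled in that case.
    public static boolean isVariableLimit(long mLimit, Object mLimitPlan) {
        return mLimit == -1 && mLimitPlan != null;
    }
}
```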
[jira] [Comment Edited] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968670#comment-15968670 ] Daniel Dai edited comment on PIG-5223 at 4/14/17 5:54 AM: -- Yes, you can use the condition (mLimit == -1 && mlimitPlan != null), and if true, disable the optimization. It is possible to push the limitPlan to LimitedSortedDataBag, but it is not a one line change and we can do it in followup. was (Author: daijy): Yes, you can use the condition (mLimit == -1 && mlimitPlan != null), and if true, disable the optimization. It is possible to push the limitPlan to LimitedSortedDataBag, but it is not a one line change and we can do it in followup. Please upload the patch. > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968670#comment-15968670 ] Daniel Dai commented on PIG-5223: - Yes, you can use the condition (mLimit == -1 && mlimitPlan != null), and if true, disable the optimization. It is possible to push the limitPlan to LimitedSortedDataBag, but it is not a one line change and we can do it in followup. Please upload the patch. > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967121#comment-15967121 ] Daniel Dai commented on PIG-5211: - I must have made a mistake when rebasing the patch. Sure, go ahead, thanks Koji! > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch, > PIG-5211-4.patch, PIG-5211-5.patch, pig-5211-testfix-postcommit.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5217) Pig Streaming over python multiprocessing
[ https://issues.apache.org/jira/browse/PIG-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966570#comment-15966570 ] Daniel Dai commented on PIG-5217: - Are you able to give an example? Pig should run python as an external process, and I don't see a reason why Pig would disrupt multiprocessing. Do you have a sense of why? > Pig Streaming over python multiprocessing > - > > Key: PIG-5217 > URL: https://issues.apache.org/jira/browse/PIG-5217 > Project: Pig > Issue Type: Bug > Components: internal-udfs >Affects Versions: 0.15.0 > Environment: python 2.7,pig 0.15.0,multi-core processor >Reporter: bharatpattani > > python multiprocessing is not working with pig streaming. > Following are the steps for that: > 1. Create python script with "multiprocessing" which can utilise at least two > cores of the processor. > 2. Create a pig script which will call the python script mentioned in > the above step. > Please have a look at it and do the needful. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5219) IndexOutOfBoundsException when loading multiple directories with different schemas using OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5219: Fix Version/s: 0.17.0 > IndexOutOfBoundsException when loading multiple directories with different > schemas using OrcStorage > --- > > Key: PIG-5219 > URL: https://issues.apache.org/jira/browse/PIG-5219 > Project: Pig > Issue Type: Bug >Affects Versions: 0.16.0 > Environment: Pig Version: 0.16.0 > OS: EMR 5.3.1 >Reporter: Omer Tal >Assignee: Daniel Dai > Fix For: 0.17.0 > > > Scenario: > # Data set based on two hours in the same day. In hour 00 the ORC file has 4 > columns {a,b,c,d} and during hour 02 it changes to 5 columns {a,b,c,d,e} > # Loading ORC files with the same schema (hour 00): > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=00' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading ORC files with different schemas in the same directory: > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=02' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading the whole day (both hour 00 and 02): > {code} > x = load 's3://orc_files/dt=2017-03-21' using OrcStorage(); > dump x; > {code} > Result: > {code} > 37332 [PigTezLauncher-0] INFO > org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: > status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 > Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, > vertexName=scope-2, vertexId=vertex_1491991474861_0006_1_00, > diagnostics=[Task failed, taskId=task_1491991474861_0006_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1491991474861_0006_1_00_00_0:java.lang.IndexOutOfBoundsException: > Index: 
4, Size: 4 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > org.apache.pig.impl.util.hive.HiveUtils.convertHiveToPig(HiveUtils.java:97) > at org.apache.pig.builtin.OrcStorage.getNext(OrcStorage.java:381) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:119) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5219) IndexOutOfBoundsException when loading multiple directories with different schemas using OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966567#comment-15966567 ] Daniel Dai commented on PIG-5219: - Pig uses the schema of the first ORC file as the schema for the relation. In general, Pig doesn't know what the schema should be when it differs across ORC files. As a solution, Pig should not fail in the first place; it should generate null instead. The user can cast it to the right schema eventually: {code} x = load 's3://orc_files/dt=2017-03-21/hour=00' using OrcStorage() as (a0, a1, a2, a3); {code} > IndexOutOfBoundsException when loading multiple directories with different > schemas using OrcStorage > --- > > Key: PIG-5219 > URL: https://issues.apache.org/jira/browse/PIG-5219 > Project: Pig > Issue Type: Bug >Affects Versions: 0.16.0 > Environment: Pig Version: 0.16.0 > OS: EMR 5.3.1 >Reporter: Omer Tal >Assignee: Daniel Dai > Fix For: 0.17.0 > > > Scenario: > # Data set based on two hours in the same day. 
In hour 00 the ORC file has 4 > columns {a,b,c,d} and during hour 02 it changes to 5 columns {a,b,c,d,e} > # Loading ORC files with the same schema (hour 00): > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=00' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading ORC files with different schemas in the same directory: > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=02' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading the whole day (both hour 00 and 02): > {code} > x = load 's3://orc_files/dt=2017-03-21' using OrcStorage(); > dump x; > {code} > Result: > {code} > 37332 [PigTezLauncher-0] INFO > org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: > status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 > Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, > vertexName=scope-2, vertexId=vertex_1491991474861_0006_1_00, > diagnostics=[Task failed, taskId=task_1491991474861_0006_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1491991474861_0006_1_00_00_0:java.lang.IndexOutOfBoundsException: > Index: 4, Size: 4 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > org.apache.pig.impl.util.hive.HiveUtils.convertHiveToPig(HiveUtils.java:97) > at org.apache.pig.builtin.OrcStorage.getNext(OrcStorage.java:381) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:119) > at > 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at
[jira] [Assigned] (PIG-5219) IndexOutOfBoundsException when loading multiple directories with different schemas using OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5219: --- Assignee: Daniel Dai > IndexOutOfBoundsException when loading multiple directories with different > schemas using OrcStorage > --- > > Key: PIG-5219 > URL: https://issues.apache.org/jira/browse/PIG-5219 > Project: Pig > Issue Type: Bug >Affects Versions: 0.16.0 > Environment: Pig Version: 0.16.0 > OS: EMR 5.3.1 >Reporter: Omer Tal >Assignee: Daniel Dai > > Scenario: > # Data set based on two hours in the same day. In hour 00 the ORC file has 4 > columns {a,b,c,d} and during hour 02 it changes to 5 columns {a,b,c,d,e} > # Loading ORC files with the same schema (hour 00): > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=00' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading ORC files with different schemas in the same directory: > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=02' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading the whole day (both hour 00 and 02): > {code} > x = load 's3://orc_files/dt=2017-03-21' using OrcStorage(); > dump x; > {code} > Result: > {code} > 37332 [PigTezLauncher-0] INFO > org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: > status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 > Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, > vertexName=scope-2, vertexId=vertex_1491991474861_0006_1_00, > diagnostics=[Task failed, taskId=task_1491991474861_0006_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1491991474861_0006_1_00_00_0:java.lang.IndexOutOfBoundsException: > Index: 4, Size: 4 > 
at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > org.apache.pig.impl.util.hive.HiveUtils.convertHiveToPig(HiveUtils.java:97) > at org.apache.pig.builtin.OrcStorage.getNext(OrcStorage.java:381) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:119) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966502#comment-15966502 ] Daniel Dai commented on PIG-5216: - This is a very decent patch. Several comments: 1. I think you also intend to remove TestErrorHandlingStoreFunc.java. This can be done via "git rm xxx/TestErrorHandlingStoreFunc.java" before generating the patch. 2. {code} public static final String PIG_LOADS = "pig.inputs"; {code} We'd better change the constant to "pig.loads", since the content changes. People might suspect something is wrong if they find "pig.inputs" is different than expected. 3. {code} public static final String ERROR_HANDLER_COUNTER_GROUP = "storer_Error_Handler"; {code} Make it "Error_Handler". 4. Some documentation is needed; please refer to PIG-4719 and include something similar to the changes in src/docs/src/documentation/content/xdocs/udf.xml. > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.17.0 > > Attachments: PIG-5216-1.patch, PIG-5216-2.patch > > > Add Error Handling for Loaders in Pig, so that user can choose to allow > errors when load data, and set error numbers / rate > Ideas based on error handling on store func see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966292#comment-15966292 ] Daniel Dai commented on PIG-5211: - Also opened PIG-5220 to improve NestedLimitOptimizer. > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch, > PIG-5211-4.patch, PIG-5211-5.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (PIG-5220) Improve NestedLimitOptimizer to handle general limit push up
Daniel Dai created PIG-5220: --- Summary: Improve NestedLimitOptimizer to handle general limit push up Key: PIG-5220 URL: https://issues.apache.org/jira/browse/PIG-5220 Project: Pig Issue Type: Improvement Components: impl Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.17.0 Currently, NestedLimitOptimizer only handles the case where the limit comes right after the sort. In general, we should push the limit up recursively, similar to LimitOptimizer. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5211: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Patch committed to trunk. Thanks [~jins]! > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch, > PIG-5211-4.patch, PIG-5211-5.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5214) search any substring in the input string
[ https://issues.apache.org/jira/browse/PIG-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5214: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Patch committed to trunk. Thanks Yuxiang! > search any substring in the input string > > > Key: PIG-5214 > URL: https://issues.apache.org/jira/browse/PIG-5214 > Project: Pig > Issue Type: New Feature > Components: internal-udfs >Reporter: Yuxiang Wang >Assignee: Yuxiang Wang > Fix For: 0.17.0 > > Attachments: PIG-5214-1.patch, PIG-5214-2.patch > > > A new Pig UDF *STRING_SEARCH_ALL* that Implementing regex for searching > keyword(substring) in a line of string, and all matched substrings will be > stored as individual tuples in a bag, i.e.{code} output: ({(a),(b),(c)}){code} > Help us to find all regex matches, for example, we may use > *FLATTEN(STRING_SEARCH_ALL(string, regex))* to list all matches for an easier > view of output. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961987#comment-15961987 ] Daniel Dai commented on PIG-5216: - LoadFuncDecorator.java is not included in the patch. If you are using git, you need to use "git add" to add new files, commit your changes, and use "git show" to generate a patch that includes the new files. > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.17.0 > > Attachments: PIG-5216-1.patch > > > Add Error Handling for Loaders in Pig, so that user can choose to allow > errors when load data, and set error numbers / rate > Ideas based on error handling on store func see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961983#comment-15961983 ] Daniel Dai commented on PIG-5211: - The code changes look good now. We'd better add several more tests: 1. Improve TestOptimizeNestedLimit to translate the logical plan to a physical plan, then an MR plan; please refer to TestPlanGeneration.testStoreAlias for how to translate a query into a logical plan/physical plan/MR plan. 2. Add a test that runs a query with a nested limited sort, to make sure the result is correct; please refer to TestEvalPipelineLocal for how to run a query and compare results. 3. Add a test to TestSecondarySort to make sure the nested limited sort does not get optimized by SecondaryKeyOptimizer. > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch, > PIG-5211-4.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5216: --- Assignee: Iris Zeng > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.17.0 > > > Add Error Handling for Loaders in Pig, so that user can choose to allow > errors when load data, and set error numbers / rate > Ideas based on error handling on store func see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5216: Fix Version/s: 0.17.0 > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.17.0 > > > Add Error Handling for Loaders in Pig, so that user can choose to allow > errors when load data, and set error numbers / rate > Ideas based on error handling on store func see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961945#comment-15961945 ] Daniel Dai commented on PIG-5211: - Another comment: in OptimizeNestedLimitTransformer, when we iterate through innerPlan.getOperators(), we cannot assume the order of operators in the iterator, so the check (pred instanceof LOSort && op instanceof LOLimit) is not always valid. We can find the limit operator first, and then use currentPlan.getPredecessors(limit) to make sure the predecessor is an LOSort. > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961941#comment-15961941 ] Daniel Dai commented on PIG-5211: - There are still two instances of Java 1.8-only code; error messages when compiling with Java 1.7: {code} [javac] /Users/daijy/pig2/src/org/apache/pig/data/LimitedSortedDataBag.java:61: error: no suitable constructor found for PriorityQueue(Comparator) [javac] this.priorityQ = new PriorityQueue(getReversedComparator(mComp)); [javac] /Users/daijy/pig2/src/org/apache/pig/data/LimitedSortedDataBag.java:281: error: local variable comp is accessed from within inner class; needs to be declared final [javac] return -comp.compare(o1, o2); {code} > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
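Both compile errors have straightforward JDK 1.7-compatible workarounds. A hedged sketch, specialized to Integer for brevity (the actual LimitedSortedDataBag fields and generics may differ):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class Jdk7Compat {
    // JDK 1.7 has no PriorityQueue(Comparator) constructor; supply an
    // explicit initial capacity alongside the comparator instead.
    static PriorityQueue<Integer> newReversedQueue(Comparator<Integer> comp) {
        return new PriorityQueue<Integer>(11, reversed(comp));
    }

    // Comparator.reversed() is JDK 1.8+; on 1.7, hand-roll the reversal.
    // The captured local must be declared final to be usable inside the
    // anonymous inner class, which fixes the second compile error.
    static Comparator<Integer> reversed(final Comparator<Integer> comp) {
        return new Comparator<Integer>() {
            @Override
            public int compare(Integer o1, Integer o2) {
                return -comp.compare(o1, o2);
            }
        };
    }
}
```

With the reversed comparator, the queue's head is the largest element, which is what a bounded top-N structure needs to evict first.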
[jira] [Commented] (PIG-3021) Split results missing records when there is null values in the column comparison
[ https://issues.apache.org/jira/browse/PIG-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961565#comment-15961565 ] Daniel Dai commented on PIG-3021: - [~cheolsoo], do you remember why you changed POIsNull.java? If all tests pass without the change, I'd like to commit the patch as is. > Split results missing records when there is null values in the column > comparison > > > Key: PIG-3021 > URL: https://issues.apache.org/jira/browse/PIG-3021 > Project: Pig > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Chang Luo >Assignee: Nian Ji > Attachments: PIG-3021-2.patch, PIG-3021-3.patch, PIG-3021-4.patch, > PIG-3021.patch > > > Suppose a(x, y) > split a into b if x==y, c otherwise; > One will expect the union of b and c will be a. However, if x or y is null, > the record won't appear in either b or c. > To workaround this, I have to change to the following: > split a into b if x is not null and y is not null and x==y, c otherwise; -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960409#comment-15960409 ] Daniel Dai commented on PIG-5211: - Thanks for the patch, pretty good actually. Several comments: 1. LimitedSortedDataBag should not extend SortedDataBag, or even DefaultAbstractBag, since it does not use mContents and does not handle spill. I'd rather implement DataBag directly and implement all of its methods. It shouldn't be too hard since we don't need to deal with spill. 2. Comparator.reversed is only valid in JDK 1.8. We need to make sure Pig compiles under JDK 1.7 as well. 3. We need to add a test case that not only makes sure it uses a limited LOSort, but also makes sure it translates to the right physical plan, runs, and generates the right result. 4. I am fine with NestedLimitOptimizer only dealing with a limit right after a sort currently; we need to create a Jira to deal with operators in the middle though (push the limit all the way up, similar to LimitOptimizer). > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
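The space win behind the priority-queue idea can be made concrete with a bounded reverse-ordered heap. This is a standalone sketch over integers, not Pig's actual LimitedSortedDataBag code: it retains only the `limit` smallest elements in O(limit) memory instead of buffering and sorting the whole input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class LimitedSorter {
    // Keep only the `limit` smallest elements seen so far (limit >= 1).
    public static List<Integer> topN(Iterable<Integer> input, int limit) {
        // Reverse order so peek() is the largest retained element.
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(
                limit, Collections.<Integer>reverseOrder());
        for (Integer t : input) {
            if (heap.size() < limit) {
                heap.add(t);
            } else if (t.compareTo(heap.peek()) < 0) {
                heap.poll();   // evict the current largest
                heap.add(t);
            }
        }
        List<Integer> out = new ArrayList<Integer>(heap);
        Collections.sort(out); // emit in ascending order
        return out;
    }
}
```

The final sort touches at most `limit` elements, so the per-group cost is O(n log limit) time and O(limit) space, which is why no spill handling is needed.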
[jira] [Commented] (PIG-5214) search any substring in the input string
[ https://issues.apache.org/jira/browse/PIG-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960357#comment-15960357 ] Daniel Dai commented on PIG-5214: - Thanks for the patch. Left some comments on review board. > search any substring in the input string > > > Key: PIG-5214 > URL: https://issues.apache.org/jira/browse/PIG-5214 > Project: Pig > Issue Type: New Feature > Components: internal-udfs >Reporter: Yuxiang Wang >Assignee: Yuxiang Wang > Fix For: 0.17.0 > > Attachments: PIG-5214-1.patch > > > A new Pig UDF *STRING_SEARCH_ALL* that Implementing regex for searching > keyword(substring) in a line of string, and all matched substrings will be > stored as individual tuples in a bag, i.e.{code} output: ({(a),(b),(c)}){code} > Help us to find all regex matches, for example, we may use > *FLATTEN(STRING_SEARCH_ALL(string, regex))* to list all matches for an easier > view of output. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5210: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Patch looks good. Committed to trunk. Thanks Lili! > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PIG-5210-new.patch, screenshot MR plan.png, screenshot > Tez Plan.png > > > For pig script, users need to use {{pig -e "explain -script test.pig"}} to > print out MR/Tez Plan. But for Python script, it is a hard thing for PIG to > explain the plan automatically. This option can help to print out MR/Tez > plan automatically before implementing MapReduce. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5210: Description: For pig script, users need to use {{pig -e "explain -script test.pig"}} to print out MR/Tez Plan. But for Python script, it is a hard thing for PIG to explain the plan automatically. This option can help to print out MR/Tez plan automatically before implementing MapReduce. was: h5. Adding an option to print out MR/Tez plan before launching For pig script, users need to use {{pig -e "explain -script test.pig"}} to print out MR/Tez Plan. But for Python script, it is a hard thing for PIG to explain the plan automatically. This option can help to print out MR/Tez plan automatically before implement of MapReduce. Steps: - Get clone of 0.17.0 version PIG by git pull - Set up Eclipse - Import Pig src to Eclipse, and set pig.print.exec.plan "true" in file _JobControlCompiler.java_,_TezJobCompiler.java_ before Mapreduce starts - Check for compiling {{ant}} - After building successful, Start remote debugger in Eclipse {{export PIG_OPTS="- agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"}} Or start to run pig only in terminal {{unset PIG_OPTS}} - Test cases: For MR engine {{-x local test.pig}}; For Tez engine {{-x tez_local test.pig}} - Get the plan and test results as expected > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PrintPlan.patch, screenshot-change.png, screenshot MR > plan.png, screenshot Tez Plan.png, test.pig > > > For pig script, users need to use {{pig -e "explain -script test.pig"}} to > print out MR/Tez Plan. But for Python script, it is a hard thing for PIG to > explain the plan automatically. 
This option can help to print out MR/Tez > plan automatically before implement of MapReduce. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954743#comment-15954743 ] Daniel Dai commented on PIG-5211: - Looks pretty good so far. NestedLimitOptimizer needs fine tuning: the existence of both LOLimit and LOSort is not enough; we must make sure LOLimit comes right after LOSort. Alternatively, you can follow LimitOptimizer and push LOLimit all the way up, which is more sophisticated (I am not insisting on this, though). Also, SecondaryKeyOptimizer does not currently recognize a limited nested sort, so it is possible for SecondaryKeyOptimizer to turn the limited sort into an MR/Tez secondary sort, losing the limit. We should therefore disable that optimization in SecondaryKeyOptimizer when the nested sort is a limited sort. You can use the following script as a test case in which SecondaryKeyOptimizer gets involved: {code} a = load 'studenttab10k' as (name:chararray, age:int, gpa:double); b = group a by name; c = foreach b { c1 = order a by age; c2 = limit c1 5; generate c2; } explain c; {code} > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch > > > Currently in a FOREACH clause, if both LIMIT and ORDER BY are present, Pig > stores all elements and sorts them. It should use a priority queue to be more > space-efficient. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
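The space saving behind the priority-queue suggestion can be sketched outside Pig. The following Python model is illustrative only (not Pig's actual implementation): instead of buffering and sorting the whole bag for {{order ... limit k}}, it keeps a bounded heap of at most k elements, so memory stays O(k) regardless of bag size.

```python
import heapq
import itertools

def limited_sort(records, k, key):
    """Return the k smallest records (by key) in ascending order,
    holding at most k records in memory at any time.

    Models `c1 = order a by age; c2 = limit c1 5;` in spirit.
    """
    heap = []                 # max-heap of the current k smallest, via negated keys
    tie = itertools.count()   # tiebreaker so records themselves are never compared
    for r in records:
        entry = (-key(r), next(tie), r)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:       # key(r) is smaller than the current worst
            heapq.heapreplace(heap, entry)
    # pops come out largest-key first, so reverse for ascending order
    out = [heapq.heappop(heap)[2] for _ in range(len(heap))]
    out.reverse()
    return out

# e.g. top 3 youngest from (name, age) rows
rows = [('a', 5), ('b', 2), ('c', 9), ('d', 1), ('e', 3), ('f', 7)]
top3 = limited_sort(rows, 3, key=lambda r: r[1])
```

Note that ties among equal keys are broken arbitrarily here; a production implementation would need to match Pig's ordering semantics.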
[jira] [Commented] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954222#comment-15954222 ] Daniel Dai commented on PIG-5210: - This will be useful. For Python scripts, it is hard to run explain and get the execution plan. With this option, Pig will print the MR/Tez plan on the console, so the user gets an idea of what the MR/Tez job is doing. We need to put "pig.print.exec.plan" in PigConfiguration, and it would be better to add a test case. > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PrintPlan.patch > > > Adding an option to print out MR/Tez plan before launching. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
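If the property lands in PigConfiguration as suggested, a user would presumably enable it the same way as any other Pig property. The exact invocations below are a sketch based only on the property name mentioned in this thread, not a confirmed interface:

```
# on the command line (hypothetical, once the patch is committed)
pig -Dpig.print.exec.plan=true -x tez script.pig

# or inside a Pig script
set pig.print.exec.plan 'true';
```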
[jira] [Updated] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5210: Summary: Option to print MR/Tez plan before launching (was: Print MR/Tez plan before launching) > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PrintPlan.patch > > > Set pig.print.exec.plan "true" in > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java > > and > src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobCompiler.java > Print out MR/Tez plan -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5210: Description: Adding an option to print out MR/Tez plan before launching. (was: Set pig.print.exec.plan "true" in src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java and src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobCompiler.java Print out MR/Tez plan) > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PrintPlan.patch > > > Adding an option to print out MR/Tez plan before launching. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954167#comment-15954167 ] Daniel Dai commented on PIG-5201: - Thanks for pointing that out. Yes, PIG-2537 is for tuples. For bags, I agree a null bag should be dropped (same as an empty bag), and a bag with a null item should generate output according to the schema (same as PIG-2537); if there is no schema, we can generate a single null column. > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5201-v00-testonly.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seems to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
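The null-handling semantics described in that comment can be modeled concretely. This Python sketch is purely illustrative (it is not Pig's code; `flatten_bag` and `schema_width` are hypothetical names): a null or empty bag drops the row, a null item inside a bag emits one null per declared schema field, and without a schema a single null column is emitted.

```python
def flatten_bag(bag, schema_width=None):
    """Model of the FLATTEN(bag) semantics discussed in PIG-5201.

    `bag` is a list of tuples or None; None stands for Pig's null.
    Returns the list of output tuples (one per bag item).
    """
    if bag is None or len(bag) == 0:
        return []                               # null or empty bag: row is dropped
    out = []
    for item in bag:
        if item is None:
            if schema_width is not None:
                out.append((None,) * schema_width)  # nulls per the declared schema
            else:
                out.append((None,))                 # no schema: single null column
        else:
            out.append(tuple(item))                 # normal item: emit its fields
    return out
```

Under this model, FLATTEN of a null bag and of an empty bag both contribute no rows, matching the "null bag should drop" position above.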
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952064#comment-15952064 ] Daniel Dai commented on PIG-5201: - This is PIG-2537; it is a bug we should fix. > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5201-v00-testonly.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seems to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (PIG-5210) Print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5210: --- Assignee: Lili Yu > Print MR/Tez plan before launching > -- > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > > Set pig.print.exec.plan "true" in > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java > > and > src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobCompiler.java > Print out MR/Tez plan -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4677) Display failure information on stop on failure
[ https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951709#comment-15951709 ] Daniel Dai commented on PIG-4677: - +1 on PIG-4677-fixflakytest.patch. > Display failure information on stop on failure > -- > > Key: PIG-4677 > URL: https://issues.apache.org/jira/browse/PIG-4677 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11.1 >Reporter: Mit Desai >Assignee: Rohini Palaniswamy > Fix For: 0.17.0 > > Attachments: PIG-4677.2.patch, PIG-4677.3.patch, PIG-4677.4.patch, > PIG-4677-5.patch, PIG-4677-fixflakytest.patch, PIG-4677.patch > > > When stop on failure option is specified, pig abruptly exits without > displaying any job stats or failed job information which it usually does in > case of failures. > {code} > 2015-06-04 20:35:38,170 [uber-SubtaskRunner] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - 9% complete > 2015-06-04 20:35:38,171 [uber-SubtaskRunner] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - Running jobs are > [job_1428329756093_3741748,job_1428329756093_3741752,job_1428329756093_3741753,job_1428329756093_3741754,job_1428329756093_3741756] > 2015-06-04 20:35:40,201 [uber-SubtaskRunner] ERROR > org.apache.pig.tools.grunt.Grunt - ERROR 6017: Job failed! > Hadoop Job IDs executed by Pig: > job_1428329756093_3739816,job_1428329756093_3741752,job_1428329756093_3739814,job_1428329756093_3741748,job_1428329756093_3741756,job_1428329756093_3741753,job_1428329756093_3741754 > <<< Invocation of Main class completed <<< > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4677) Display failure information on stop on failure
[ https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948477#comment-15948477 ] Daniel Dai commented on PIG-4677: - +1 > Display failure information on stop on failure > -- > > Key: PIG-4677 > URL: https://issues.apache.org/jira/browse/PIG-4677 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11.1 >Reporter: Mit Desai >Assignee: Rohini Palaniswamy > Fix For: 0.17.0 > > Attachments: PIG-4677.2.patch, PIG-4677.3.patch, PIG-4677.4.patch, > PIG-4677-5.patch, PIG-4677.patch > > > When stop on failure option is specified, pig abruptly exits without > displaying any job stats or failed job information which it usually does in > case of failures. > {code} > 2015-06-04 20:35:38,170 [uber-SubtaskRunner] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - 9% complete > 2015-06-04 20:35:38,171 [uber-SubtaskRunner] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - Running jobs are > [job_1428329756093_3741748,job_1428329756093_3741752,job_1428329756093_3741753,job_1428329756093_3741754,job_1428329756093_3741756] > 2015-06-04 20:35:40,201 [uber-SubtaskRunner] ERROR > org.apache.pig.tools.grunt.Grunt - ERROR 6017: Job failed! > Hadoop Job IDs executed by Pig: > job_1428329756093_3739816,job_1428329756093_3741752,job_1428329756093_3739814,job_1428329756093_3741748,job_1428329756093_3741756,job_1428329756093_3741753,job_1428329756093_3741754 > <<< Invocation of Main class completed <<< > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (PIG-5190) ant docs issue by pig-5110
[ https://issues.apache.org/jira/browse/PIG-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-5190. - Resolution: Invalid > ant docs issue by pig-5110 > -- > > Key: PIG-5190 > URL: https://issues.apache.org/jira/browse/PIG-5190 > Project: Pig > Issue Type: Bug >Reporter: Jiang Song > Fix For: 0.17.0 > > > ant docs issue by pig-5110 -- This message was sent by Atlassian JIRA (v6.3.15#6346)