[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847097#comment-17847097 ]

Daniel Dai commented on PIG-5453:
---------------------------------

+1

> FLATTEN shifting fields incorrectly
> -----------------------------------
>
>                 Key: PIG-5453
>                 URL: https://issues.apache.org/jira/browse/PIG-5453
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>             Fix For: 0.19.0
>
>         Attachments: pig-5453-v01.patch, pig-5453-v02.patch
>
>
> Follow-up from PIG-5201 and PIG-5452.
> When the flattened tuple has fewer or more fields than specified, the fields
> after it shift incorrectly.
> Input
> {noformat}
> A (a,b,c)
> B (a,b,c)
> C (a,b,c)
> Y (a,b)
> Z (a,b,c,d,e,f)
> E
> {noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B;
> {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (E,,,,E)
> {noformat}
> E is correct; it was fixed as part of PIG-5201 and PIG-5452.
> Y has shifted a4 (Y) to the left incorrectly. It should have been (Y,a,b,,Y).
> Z has dropped a4 (Z) and overwritten it with the contents of FLATTEN(a2). It
> should have been (Z,a,b,c,Z).


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
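The expected behavior described in the issue (pad a short flattened tuple with nulls, truncate a long one to the declared width, so that fields generated after the FLATTEN never shift) can be sketched outside Pig. This Python model is purely illustrative and is not Pig's implementation; the function names are made up:

```python
# Sketch (not Pig's code) of the FLATTEN semantics this issue asks for:
# the flattened tuple is padded with nulls or truncated to the declared
# width, so the fields generated after it (a4 here) never shift.

def flatten_fixed(tup, width):
    """Pad a short tuple with None, or truncate a long one, to exactly `width` fields."""
    fields = list(tup)[:width]
    fields += [None] * (width - len(fields))
    return fields

def generate(a1, a2, width=3):
    # mirrors: GENERATE a1, FLATTEN(a2) as (b1,b2,b3), a1 as a4
    return tuple([a1] + flatten_fixed(a2, width) + [a1])

print(generate('Y', ('a', 'b')))                      # ('Y', 'a', 'b', None, 'Y')
print(generate('Z', ('a', 'b', 'c', 'd', 'e', 'f')))  # ('Z', 'a', 'b', 'c', 'Z')
```

These two calls reproduce the corrected rows (Y,a,b,,Y) and (Z,a,b,c,Z) from the issue, with None standing in for Pig's null.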
[jira] [Commented] (PIG-5453) FLATTEN shifting fields incorrectly
[ https://issues.apache.org/jira/browse/PIG-5453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838865#comment-17838865 ]

Daniel Dai commented on PIG-5453:
---------------------------------

+1

> FLATTEN shifting fields incorrectly
> -----------------------------------
>
>                 Key: PIG-5453
>                 URL: https://issues.apache.org/jira/browse/PIG-5453
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5453-v01.patch
>
>
> Follow-up from PIG-5201 and PIG-5452.
> When the flattened tuple has fewer or more fields than specified, the fields
> after it shift incorrectly.
> Input
> {noformat}
> A (a,b,c)
> B (a,b,c)
> C (a,b,c)
> Y (a,b)
> Z (a,b,c,d,e,f)
> E
> {noformat}
> Script
> {code:java}
> A = load 'input.txt' as (a1:chararray, a2:tuple());
> B = FOREACH A GENERATE a1, FLATTEN(a2) as (b1:chararray,b2:chararray,b3:chararray), a1 as a4;
> dump B;
> {code}
> Incorrect results
> {noformat}
> (A,a,b,c,A)
> (B,a,b,c,B)
> (C,a,b,c,C)
> (Y,a,b,Y,)
> (Z,a,b,c,d)
> (E,,,,E)
> {noformat}
> E is correct; it was fixed as part of PIG-5201 and PIG-5452.
> Y has shifted a4 (Y) to the left incorrectly. It should have been (Y,a,b,,Y).
> Z has dropped a4 (Z) and overwritten it with the contents of FLATTEN(a2). It
> should have been (Z,a,b,c,Z).


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (PIG-5406) TestJoinLocal imports org.python.google.common.collect.Lists instead of org.google.common.collect.Lists
[ https://issues.apache.org/jira/browse/PIG-5406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17581065#comment-17581065 ]

Daniel Dai commented on PIG-5406:
---------------------------------

+1

> TestJoinLocal imports org.python.google.common.collect.Lists instead of
> org.google.common.collect.Lists
> ------------------------------------------------------------------------
>
>                 Key: PIG-5406
>                 URL: https://issues.apache.org/jira/browse/PIG-5406
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.15.0, 0.16.0, 0.17.0
>            Reporter: James Z.M. Gao
>            Assignee: Rohini Palaniswamy
>            Priority: Minor
>             Fix For: 0.18.0
>
>         Attachments: PIG-5406-v1.patch
>
>
> [PIG-4366|https://github.com/apache/pig/commit/81abb6bd0adb6e101898d67b3c2a9e35e11ce993]
> brought PIG-2861 back.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (PIG-5404) FLATTEN infers wrong datatype
[ https://issues.apache.org/jira/browse/PIG-5404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214281#comment-17214281 ]

Daniel Dai commented on PIG-5404:
---------------------------------

+1

> FLATTEN infers wrong datatype
> -----------------------------
>
>                 Key: PIG-5404
>                 URL: https://issues.apache.org/jira/browse/PIG-5404
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.17.0
>            Reporter: Bruno Pusztahazi
>            Assignee: Koji Noguchi
>            Priority: Blocker
>              Labels: datatypes, flatten
>         Attachments: pig-5404-v01.patch
>
>
> In version 0.12 (checked out branch-0.12) the following code works as
> expected.
> With the following input file test.csv:
> {code:java}
> John_5,18,4.0F
> Mary_6,19,3.8F
> Bill_7,20,3.9F
> Joe_8,18,3.8F
> {code}
> {code:java}
> A = LOAD 'test.csv' USING PigStorage (',') AS (name:chararray,age:int,gpr:float);
> B = FOREACH A GENERATE FLATTEN(STRSPLIT(name,'_')) as (name1:chararray,name2:chararray),age,gpr;
> DESCRIBE B;
> {code}
> It produces the following output:
> {code:java}
> B: {name1: chararray,name2: chararray,age: int,gpr: float}
> {code}
> This is the expected output, as the result of the FLATTEN is declared as
> chararrays.
> When using version 0.17 (checked out branch-0.17), the code produces:
> {code:java}
> B: {name1: bytearray,name2: bytearray,age: int,gpr: float}
> {code}
> This shows that FLATTEN somehow inferred the wrong data types (bytearray
> instead of chararray).
> Using explicit casting as a workaround on 0.17:
> {code:java}
> B1 = FOREACH B GENERATE (chararray)name1,(chararray)name2,age,gpr;
> DESCRIBE B1;
> {code}
> produces
> {code:java}
> B1: {name1: chararray,name2: chararray,age: int,gpr: float}
> {code}
> this time with the expected data types.
> The explained plan shows some strange cast operators that are not actually
> used (or at least their data types are wrong):
> {code:java}
> #-----------------------------------------------
> # New Logical Plan:
> #-----------------------------------------------
> B: (Name: LOStore Schema: name1#121:chararray,name2#122:chararray,age#105:int,gpr#106:float)
> |
> |---B: (Name: LOForEach Schema: name1#121:chararray,name2#122:chararray,age#105:int,gpr#106:float)
>     |   |
>     |   (Name: LOGenerate[false,false,false,false] Schema: name1#121:chararray,name2#122:chararray,age#105:int,gpr#106:float)ColumnPrune:OutputUids=[121, 105, 122, 106]ColumnPrune:InputUids=[121, 105, 122, 106]
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 121)
>     |   |   |
>     |   |   |---name1:(Name: Project Type: bytearray Uid: 121 Input: 0 Column: 0)
>     |   |   |
>     |   |   (Name: Cast Type: chararray Uid: 122)
>     |   |   |
>     |   |   |---name2:(Name: Project Type: bytearray Uid: 122 Input: 1 Column: 0)
>     |   |   |
>     |   |   age:(Name: Project Type: int Uid: 105 Input: 2 Column: 0)
>     |   |   |
>     |   |   gpr:(Name: Project Type: float Uid: 106 Input: 3 Column: 0)
>     |   |
>     |   |---(Name: LOInnerLoad[0] Schema: name1#121:bytearray)
>     |   |
>     |   |---(Name: LOInnerLoad[1] Schema: name2#122:bytearray)
>     |   |
>     |   |---(Name: LOInnerLoad[2] Schema: age#105:int)
>     |   |
>     |   |---(Name: LOInnerLoad[3] Schema: gpr#106:float)
>     |
>     |---B: (Name: LOForEach Schema: name1#135:bytearray,name2#136:bytearray,age#105:int,gpr#106:float)
>         |   |
>         |   (Name: LOGenerate[true,false,false] Schema: name1#135:bytearray,name2#136:bytearray,age#105:int,gpr#106:float)
>         |   |   |
>         |   |   (Name: UserFunc(org.apache.pig.builtin.STRSPLIT) Type: tuple Uid: 132)
>         |   |   |
>         |   |   |---(Name: Cast Type: chararray Uid: 104)
>         |   |   |   |
>         |   |   |   |---name:(Name: Project Type: bytearray Uid: 104 Input: 0 Column: (*))
>         |   |   |
>         |   |   |---(Name: Constant Type: chararray Uid: 131)
>         |   |   |
>         |   |   (Name: Cast Type: int Uid: 105)
>         |   |   |
>         |   |   |---age:(Name: Project Type: bytearray Uid: 105 Input: 1 Column: (*))
>         |   |   |
>         |   |   (Name: Cast Type: float Uid: 106)
>         |   |   |
>         |   |   |---gpr:(Name: Project Type: bytearray Uid: 106 Input: 2 Column: (*))
>         |   |
>         |   |---(Name: LOInnerLoad[0] Schema: name#104:bytearray)
>         |   |
>         |   |---(Name: LOInnerLoad[1] Schema: age#105:bytearray)
>         |   |
>         |   |---(Name: LOInnerLoad[2] Schema: gpr#106:bytearray)
>         |
>         |---A: (Name: LOLoad Schema: name#104:bytearray,age#105:bytearray,gpr#106:bytearray)RequiredFields:null
> {code}


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (PIG-5243) describe with typecast on as-clause shows the types before the typecasting
[ https://issues.apache.org/jira/browse/PIG-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17214280#comment-17214280 ]

Daniel Dai commented on PIG-5243:
---------------------------------

+1

> describe with typecast on as-clause shows the types before the typecasting
> --------------------------------------------------------------------------
>
>                 Key: PIG-5243
>                 URL: https://issues.apache.org/jira/browse/PIG-5243
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Minor
>         Attachments: pig-5243-v01.patch
>
>
> For code like
> {code}
> a = load 'test.txt' as (mytuple:tuple (), gpa:float);
> b = foreach a generate mytuple as (mytuple2:(name:int, age:double));
> store b into '/tmp/deleteme';
> {code}
> {{describe b}} shows
> {noformat}
> b: {mytuple2: (name: bytearray,age: bytearray)}
> {noformat}
> Execution-wise it is fine, since an extra foreach typecasts the relation
> above.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (PIG-5372) SAMPLE/RANDOM(udf) before skewed join failing with NPE
[ https://issues.apache.org/jira/browse/PIG-5372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16732373#comment-16732373 ]

Daniel Dai commented on PIG-5372:
---------------------------------

Wow, that's back in 2010 :). I think SkewedPartitioner.setConf is passing conf to MapRedUtil.loadPartitionFileFromLocalCache via PigMapReduce.sJobConf. This is no longer necessary, since MapRedUtil.loadPartitionFileFromLocalCache takes a mapConf parameter (added in a later patch). We can change MapRedUtil.loadPartitionFileFromLocalCache to retrieve fs.file.impl/fs.hdfs.impl from mapConf; then we no longer need to overwrite PigMapReduce.sJobConf in SkewedPartitioner.setConf.

> SAMPLE/RANDOM(udf) before skewed join failing with NPE
> ------------------------------------------------------
>
>                 Key: PIG-5372
>                 URL: https://issues.apache.org/jira/browse/PIG-5372
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5372-v1.patch
>
>
> A sample short script like the one below
> {code}
> A = LOAD 'input.txt' AS (a1:int, a2:chararray, a3:int);
> B = LOAD 'input.txt' AS (b1:int, b2:chararray, b3:int);
> A2 = FOREACH A generate *, RANDOM() as randnum;
> D = join A2 by a1, B by b1 using 'skewed' parallel 2;
> store D into '$output';
> {code}
> fails with an NPE:
> {noformat}
> 2018-12-12 16:06:04,860 [Dispatcher thread: Central] INFO  org.apache.tez.dag.history.HistoryEventHandler - [HISTORY][DAG:dag_1544648742542_0001_1][Event:TASK_FINISHED]: vertexName=scope-55, taskId=task_1544648742542_0001_1_02_00, startTime=1544648745036, finishTime=1544648764857, timeTaken=19821, status=KILLED, successfulAttemptID=null, diagnostics=TaskAttempt 0 failed, info=[Error: Failure while running task:org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing (Name: Local Rearrange[tuple]{int}(false) - scope-29 -> scope-58 Operator Key: scope-29): org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:315)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNextTuple(POLocalRearrange.java:287)
>         at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POLocalRearrangeTez.getNextTuple(POLocalRearrangeTez.java:131)
>         at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:420)
>         at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:282)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:337)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.RANDOM)[double] - scope-40 Operator Key: scope-40) children: null at []]: java.lang.NullPointerException
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:367)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:408)
>         at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:325)
> {noformat}
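Daniel's suggestion above amounts to a standard refactor: stop reading configuration out of a mutable static (PigMapReduce.sJobConf, which SkewedPartitioner had to overwrite) and instead pass it as an explicit parameter (mapConf). A minimal sketch of that pattern, in Python with made-up names rather than Pig's actual code:

```python
# Illustration only: the refactor pattern behind the comment, not Pig's code.

GLOBAL_CONF = {}  # stands in for the mutable static PigMapReduce.sJobConf


def load_partition_file_via_global():
    # Fragile: behavior depends on whoever overwrote GLOBAL_CONF last,
    # which is why SkewedPartitioner.setConf had to set it first.
    return GLOBAL_CONF.get("fs.file.impl", "<unset>")


def load_partition_file(conf):
    # Robust: the configuration travels with the call (like the mapConf
    # parameter), so no caller needs to mutate shared state beforehand.
    return conf.get("fs.file.impl", "<unset>")


map_conf = {"fs.file.impl": "org.apache.hadoop.fs.LocalFileSystem"}
print(load_partition_file(map_conf))
```

With the explicit-parameter version, the fs.file.impl/fs.hdfs.impl lookup no longer depends on the static being set, which is the change proposed for MapRedUtil.loadPartitionFileFromLocalCache.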
[jira] [Commented] (PIG-5368) Braces without escaping in regexes throws error in recent perl versions
[ https://issues.apache.org/jira/browse/PIG-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715575#comment-16715575 ]

Daniel Dai commented on PIG-5368:
---------------------------------

Also committed PIG-5368-1.addendum.patch.

> Braces without escaping in regexes throws error in recent perl versions
> -----------------------------------------------------------------------
>
>                 Key: PIG-5368
>                 URL: https://issues.apache.org/jira/browse/PIG-5368
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>             Fix For: 0.18.0
>
>         Attachments: PIG-5368-1.addendum.patch, PIG-5368-1.patch
>
>
> |In Perl v5.22, using a literal { in a regular expression was deprecated, and
> will emit a warning if it isn't escaped: \{. In v5.26, this won't just warn,
> it'll cause a syntax error.|
> Example:
> [https://github.com/apache/pig/blob/e766b6bf29e610b6312f8447fc008bed6beb4090/test/e2e/pig/tests/cmdline.conf#L47]
> {code}
> $ perl -e 'print "It matches\n" if "Hello World" =~ /World{abc}/'
> Unescaped left brace in regex is illegal here in regex; marked by <-- HERE in m/World{ <-- HERE abc}/ at -e line 1.
> $ perl -e 'print "It matches\n" if "Hello World" =~ /World\{abc}/'
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (PIG-5370) Union onschema + columnprune dropping used fields
[ https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705272#comment-16705272 ]

Daniel Dai commented on PIG-5370:
---------------------------------

+1

> Union onschema + columnprune dropping used fields
> -------------------------------------------------
>
>                 Key: PIG-5370
>                 URL: https://issues.apache.org/jira/browse/PIG-5370
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5370-v1.patch, pig-5370-v2.patch
>
>
> After PIG-5312, the query below started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
>     A_FOREACH = FOREACH A GENERATE a2,a3;
>     GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1 a 3
> 2 b 4
> 2 c 5
> 1 a 6
> 2 b 7
> 1 c 8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})      <-- ONLY 1 field
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
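For illustration, the invariant UNION ONSCHEMA must keep even after column pruning can be modeled in a few lines: every row from every input is aligned to the combined schema by field name, with nulls for fields an input lacks, and no used field is ever dropped. This is a hypothetical Python model (function and variable names are made up), not Pig's implementation:

```python
# Sketch of the UNION ONSCHEMA guarantee that the bug violated:
# rows are aligned to the output schema by field name; a field an input
# lacks becomes null (None here), but a field the input has is never dropped.

def union_onschema(out_schema, *relations):
    """Each relation is (schema, rows), where rows are tuples matching that schema."""
    result = []
    for schema, rows in relations:
        index = {name: i for i, name in enumerate(schema)}
        for row in rows:
            result.append(tuple(row[index[f]] if f in index else None
                                for f in out_schema))
    return result

rows = union_onschema(["x", "y"],
                      (["x"], [(1,)]),          # input missing field y -> null
                      (["x", "y"], [(2, 3)]))   # full-width input kept intact
print(rows)  # [(1, None), (2, 3)]
```

In the bug, the pruner effectively shrank one branch's schema so its rows came out one field wide, which this by-name alignment rules out.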
[jira] [Commented] (PIG-5370) Union onschema + columnprune dropping used fields
[ https://issues.apache.org/jira/browse/PIG-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702822#comment-16702822 ]

Daniel Dai commented on PIG-5370:
---------------------------------

+1, sounds good to me.

> Union onschema + columnprune dropping used fields
> -------------------------------------------------
>
>                 Key: PIG-5370
>                 URL: https://issues.apache.org/jira/browse/PIG-5370
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>            Priority: Major
>         Attachments: pig-5370-v1.patch
>
>
> After PIG-5312, the query below started failing.
> {code}
> A = load 'input.txt' as (a1:int, a2:chararray, a3:int);
> B = FOREACH (GROUP A by (a1,a2)) {
>     A_FOREACH = FOREACH A GENERATE a2,a3;
>     GENERATE A, FLATTEN(A_FOREACH) as (a2,a3);
> }
> C = load 'input2.txt' as (A:bag{tuple:(a1: int,a2: chararray,a3:int)},a2: chararray,a3:int);
> D = UNION ONSCHEMA B, C;
> dump D;
> {code}
> {code:title=input1.txt}
> 1 a 3
> 2 b 4
> 2 c 5
> 1 a 6
> 2 b 7
> 1 c 8
> {code}
> {code:title=input2.txt}
> {(10,a0,30),(20,b0,40)} zzz 222
> {code}
> {noformat:title=Expected output}
> ({(10,a0,30),(20,b0,40)},zzz,222)
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}
> {noformat:title=Actual (incorrect) output}
> ({(10,a0,30),(20,b0,40)})      <-- ONLY 1 field
> ({(1,a,6),(1,a,3)},a,6)
> ({(1,a,6),(1,a,3)},a,3)
> ({(1,c,8)},c,8)
> ({(2,b,7),(2,b,4)},b,7)
> ({(2,b,7),(2,b,4)},b,4)
> ({(2,c,5)},c,5)
> {noformat}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (PIG-5368) Braces without escaping in regexes throws error in recent perl versions
[ https://issues.apache.org/jira/browse/PIG-5368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-5368.
-----------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 0.18.0

Patch committed to trunk. Thanks Laszlo!

> Braces without escaping in regexes throws error in recent perl versions
> -----------------------------------------------------------------------
>
>                 Key: PIG-5368
>                 URL: https://issues.apache.org/jira/browse/PIG-5368
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>             Fix For: 0.18.0
>
>         Attachments: PIG-5368-1.patch
>
>
> |In Perl v5.22, using a literal { in a regular expression was deprecated, and
> will emit a warning if it isn't escaped: \{. In v5.26, this won't just warn,
> it'll cause a syntax error.|
> Example:
> [https://github.com/apache/pig/blob/e766b6bf29e610b6312f8447fc008bed6beb4090/test/e2e/pig/tests/cmdline.conf#L47]
> {code}
> $ perl -e 'print "It matches\n" if "Hello World" =~ /World{abc}/'
> Unescaped left brace in regex is illegal here in regex; marked by <-- HERE in m/World{ <-- HERE abc}/ at -e line 1.
> $ perl -e 'print "It matches\n" if "Hello World" =~ /World\{abc}/'
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (PIG-5366) Enable PigStreamingDepend to load from current directory in newer Perl versions
[ https://issues.apache.org/jira/browse/PIG-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-5366:
----------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 0.18.0
           Status: Resolved  (was: Patch Available)

+1. Patch committed to trunk. Thanks [~abstractdog]!

> Enable PigStreamingDepend to load from current directory in newer Perl
> versions
> ----------------------------------------------------------------------
>
>                 Key: PIG-5366
>                 URL: https://issues.apache.org/jira/browse/PIG-5366
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>             Fix For: 0.18.0
>
>         Attachments: PIG-5366_1.patch
>
>
> A Perl-related issue found while testing streaming. In newer Perl versions
> (>= 5.26), the current directory (".") is not included in @INC, so
> PigStreamingDepend may fail during "use PigStreamingModule;". A possible
> solution is to let this module add the current directory for itself, to make
> it more independent of the environment (the current Perl version).
> The test case was:
> {code}
> define CMD `perl PigStreamingDepend.pl - sio_5_1 sio_5_2` input(stdin) output('sio_5_1', 'sio_5_2') ship('./libexec/PigStreamingDepend.pl', './libexec/PigStreamingModule.pm');
> A = load '/user/hrt_qa/tests/data/singlefile/studenttab10k';
> B = stream A through CMD;
> store B into '/user/hrt_qa/out/hrtqa-1539851229-streaming.conf-StreamingIO/StreamingIO_5.out';
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
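The fix described above is for the script to extend its own module search path instead of relying on the interpreter's default. In Perl that is `use lib '.';` (or a `BEGIN` block pushing "." onto @INC) before `use PigStreamingModule;`. For an executable illustration, here is the same pattern in Python, offered only as an analogue of the Perl change, not as part of the actual patch:

```python
# Analogue of the Perl fix in Python: newer interpreters may not search the
# current directory for modules, so the script adds it to its own search
# path up front rather than depending on the environment.
import sys

if "." not in sys.path:
    sys.path.insert(0, ".")  # modules in the current directory are now importable

# after this, an `import some_local_module` next to the script would resolve
```

The design point is the same in both languages: the dependency on "." is made explicit inside the script that needs it, so the behavior no longer changes with the interpreter version.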
[jira] [Commented] (PIG-5366) Enable PigStreamingDepend to load from current directory in newer Perl versions
[ https://issues.apache.org/jira/browse/PIG-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16657989#comment-16657989 ]

Daniel Dai commented on PIG-5366:
---------------------------------

[~abstractdog], assigned to you.

> Enable PigStreamingDepend to load from current directory in newer Perl
> versions
> ----------------------------------------------------------------------
>
>                 Key: PIG-5366
>                 URL: https://issues.apache.org/jira/browse/PIG-5366
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>         Attachments: PIG-5366_1.patch
>
>
> A Perl-related issue found while testing streaming. In newer Perl versions
> (>= 5.26), the current directory (".") is not included in @INC, so
> PigStreamingDepend may fail during "use PigStreamingModule;". A possible
> solution is to let this module add the current directory for itself, to make
> it more independent of the environment (the current Perl version).
> The test case was:
> {code}
> define CMD `perl PigStreamingDepend.pl - sio_5_1 sio_5_2` input(stdin) output('sio_5_1', 'sio_5_2') ship('./libexec/PigStreamingDepend.pl', './libexec/PigStreamingModule.pm');
> A = load '/user/hrt_qa/tests/data/singlefile/studenttab10k';
> B = stream A through CMD;
> store B into '/user/hrt_qa/out/hrtqa-1539851229-streaming.conf-StreamingIO/StreamingIO_5.out';
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (PIG-5366) Enable PigStreamingDepend to load from current directory in newer Perl versions
[ https://issues.apache.org/jira/browse/PIG-5366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-5366:
-------------------------------
    Assignee: Laszlo Bodor

> Enable PigStreamingDepend to load from current directory in newer Perl
> versions
> ----------------------------------------------------------------------
>
>                 Key: PIG-5366
>                 URL: https://issues.apache.org/jira/browse/PIG-5366
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Laszlo Bodor
>            Assignee: Laszlo Bodor
>            Priority: Major
>         Attachments: PIG-5366_1.patch
>
>
> A Perl-related issue found while testing streaming. In newer Perl versions
> (>= 5.26), the current directory (".") is not included in @INC, so
> PigStreamingDepend may fail during "use PigStreamingModule;". A possible
> solution is to let this module add the current directory for itself, to make
> it more independent of the environment (the current Perl version).
> The test case was:
> {code}
> define CMD `perl PigStreamingDepend.pl - sio_5_1 sio_5_2` input(stdin) output('sio_5_1', 'sio_5_2') ship('./libexec/PigStreamingDepend.pl', './libexec/PigStreamingModule.pm');
> A = load '/user/hrt_qa/tests/data/singlefile/studenttab10k';
> B = stream A through CMD;
> store B into '/user/hrt_qa/out/hrtqa-1539851229-streaming.conf-StreamingIO/StreamingIO_5.out';
> {code}


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Updated] (PIG-4373) Implement PIG-3861 in Tez
[ https://issues.apache.org/jira/browse/PIG-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-4373:
----------------------------
    Status: Patch Available  (was: Open)

> Implement PIG-3861 in Tez
> -------------------------
>
>                 Key: PIG-4373
>                 URL: https://issues.apache.org/jira/browse/PIG-4373
>             Project: Pig
>          Issue Type: Improvement
>          Components: tez
>    Affects Versions: 0.14.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Daniel Dai
>            Priority: Major
>              Labels: MissingFeature
>             Fix For: 0.18.0
>
>         Attachments: PIG-4373_1.patch
>


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (PIG-4373) Implement PIG-3861 in Tez
[ https://issues.apache.org/jira/browse/PIG-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai reassigned PIG-4373:
-------------------------------
    Assignee: Daniel Dai  (was: Rohini Palaniswamy)

> Implement PIG-3861 in Tez
> -------------------------
>
>                 Key: PIG-4373
>                 URL: https://issues.apache.org/jira/browse/PIG-4373
>             Project: Pig
>          Issue Type: Improvement
>          Components: tez
>    Affects Versions: 0.14.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Daniel Dai
>            Priority: Major
>              Labels: MissingFeature
>             Fix For: 0.18.0
>


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (PIG-5329) cwiki training links
[ https://issues.apache.org/jira/browse/PIG-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-5329.
-----------------------------
       Resolution: Fixed
         Assignee: Daniel Dai
    Fix Version/s: site

Updated, thanks!

> cwiki training links
> --------------------
>
>                 Key: PIG-5329
>                 URL: https://issues.apache.org/jira/browse/PIG-5329
>             Project: Pig
>          Issue Type: Bug
>          Components: documentation
>            Reporter: Csaba Skrabak
>            Assignee: Daniel Dai
>            Priority: Trivial
>             Fix For: site
>
>
> Every single link on the page
> [https://cwiki.apache.org/confluence/display/PIG/Pig+Training]
> is broken.
> Google finds better training courses, e.g.:
> [https://hortonworks.com/tutorial/beginners-guide-to-apache-pig/]
> [https://www.tutorialspoint.com/apache_pig/index.htm]
> [https://cognitiveclass.ai/courses/introduction-to-pig/]


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Comment Edited] (PIG-4608) FOREACH ... UPDATE
[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336593#comment-16336593 ]

Daniel Dai edited comment on PIG-4608 at 1/24/18 1:00 AM:
----------------------------------------------------------

bq. a = FOREACH b UPDATE q AS q:int -- This should be illegal, right? If the type is changed, an explicit modify of the value should occur

This should be valid; the AS clause has the capacity to change types. The UPDATE clause is evaluated before the AS clause, so

a = FOREACH b UPDATE q WITH (int)q AS q:chararray;

will result in a chararray q.

bq. flattening a tuple into existing fields - does this make sense

This makes sense; it is symmetric to the AS clause.

I didn't see UPDATE/DROP in a single statement in the example -- are we not going to support both in the same statement? I would actually prefer them in the same statement, as I feel users usually think about adjusting all columns at the same time. How about APPEND? Actually, when I think about DROP/APPEND, I feel we have to have INSERT as well to close the loop. But if adding INSERT, other syntax might be more appropriate, such as:

a = FOREACH b generate .., UPDATE a10 WITH 1 as new_a10, ..a20, 2 as a_20_plus_half, ..a30, a32.., UPDATE a40 WITH 2 as new_a40, 1 as a41;

Here:
Update: a10, a40 using the UPDATE clause
Insert: a_20_plus_half
Drop: a31
Append: a41

In the original use case, it can be written as:

intermediate = foreach i generate .., 3 as f3, .., 6 as f6, .., 48 as f48, ..;

The idea is to make the ".." syntax more flexible: skip the prefix/suffix when it can be inferred. It is probably more natural to add support for INSERT with this, thus making the syntax complete. How does that sound?


was (Author: daijy):
bq. a = FOREACH b UPDATE q AS q:int -- This should be illegal, right? If the type is changed, an explicit modify of the value should occur

This should be valid; the AS clause has the capacity to change types. The UPDATE clause is evaluated before the AS clause, so

a = FOREACH b UPDATE q WITH (int)q AS q:chararray;

will result in a chararray q.

bq. flattening a tuple into existing fields - does this make sense

This makes sense; it is symmetric to the AS clause.

I didn't see UPDATE/DROP in a single statement in the example -- are we not going to support both in the same statement? I would actually prefer them in the same statement, as I feel users usually think about adjusting all columns at the same time. How about APPEND? Actually, when I think about DROP/APPEND, I feel we have to have INSERT as well to close the loop. But if adding INSERT, other syntax might be more appropriate, such as:

a = FOREACH b generate .., UPDATE a10 WITH 1 as new_a10, ..a20, 2 as a_20_plus_half, ..a30, a32.., UPDATE a40 WITH 2 as new_a40, 1 as a41;

Here:
Update: a10, a40 using the UPDATE clause
Insert: a_20_plus_half
Drop: a31
Append: a41

In the original use case, it can be written as:

intermediate = foreach i generate .., 3 as f3, .., 6 as f6, .. 48 as f48, ..;

The idea is to make the ".." syntax more flexible: skip the prefix/suffix when it can be inferred. It is probably more natural to add support for INSERT with this, thus making the syntax complete. How does that sound?

> FOREACH ... UPDATE
> ------------------
>
>                 Key: PIG-4608
>                 URL: https://issues.apache.org/jira/browse/PIG-4608
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Haley Thrapp
>            Priority: Major
>
> I would like to propose a new command in Pig, FOREACH...UPDATE.
> Syntactically, it would look much like FOREACH ... GENERATE.
> Example:
> Input data:
> (1,2,3)
> (2,3,4)
> (3,4,5)
> -- Load the data
> three_numbers = LOAD 'input_data'
> USING PigStorage()
> AS (f1:int, f2:int, f3:int);
> -- Sum up the row
> updated = FOREACH three_numbers UPDATE
> 5 as f1,
> f1+f2 as new_sum
> ;
> Dump updated;
> (5,2,3,3)
> (5,3,4,5)
> (5,4,5,7)
> Fields to update must be specified by alias. Any fields in the UPDATE that do
> not match an existing field will be appended to the end of the tuple.
> This command is particularly desirable in scripts that deal with a large
> number of fields (in the 20-200 range). Often, we need to make modifications
> to only a few fields. The FOREACH ... UPDATE statement allows the developer
> to focus on the actual logical changes instead of having to list all of the
> fields that are also being passed through.
> My team has prototyped this with changes to FOREACH ... GENERATE. We believe
> this can be done with changes to the parser and the creation of a new
> LOUpdate. No physical plan changes should be needed because we will leverage
> what LOGenerate does.


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (PIG-4608) FOREACH ... UPDATE
[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16336593#comment-16336593 ] Daniel Dai commented on PIG-4608: - bq. a = FOREACH b UPDATE q AS q:int – This should be illegal, right? If the type is changed, an explicit modify of the value should occur This should be valid, AS clause has the capacity to change types. UPDATE clause is evaluated before AS clause, so a = FOREACH b UPDATE q WITH (int)q AS q:chararray; Will result a chararray q. bq. flattening a tuple into existing fields - does this make sense This makes sense, it is a symmetry to the AS clause I didn't see UPDATE/DROP in a single statement in the example, are we not going to support both in the same statement? I actually prefer those in the same statement, as I feel users usually think about adjusting all columns in the same time. How about APPEND? Actually when I think about DROP/APPEND, I feel we have to have INSERT as well to close the loop. But if adding INSERT, other syntax might be more proper, such as: a = FOREACH b generate .., UPDATE a10 WITH 1 as new_a10, ..a20, 2 as a_20_plus_half, ..a30, a32.., UPDATE a40 WITH 2 as new_a40, 1 as a41; Here: Update: a10, a40 using UPDATE clause Insert: a_20_plus_half Drop: a31 Append: a41 In the original use case, it can be written as: intermediate = foreach i generate .., 3 as f3, .., 6 as f6, .. 48 as f48, ..; The idea is to make ".." syntax more flexible, skip prefix/suffix if can be inferred. Probably more natural to add support for INSERT with this, thus make the syntax complete. How's that sound? > FOREACH ... UPDATE > -- > > Key: PIG-4608 > URL: https://issues.apache.org/jira/browse/PIG-4608 > Project: Pig > Issue Type: New Feature >Reporter: Haley Thrapp >Priority: Major > > I would like to propose a new command in Pig, FOREACH...UPDATE. > Syntactically, it would look much like FOREACH … GENERATE. 
> Example: > Input data: > (1,2,3) > (2,3,4) > (3,4,5) > -- Load the data > three_numbers = LOAD 'input_data' > USING PigStorage() > AS (f1:int, f2:int, f3:int); > -- Sum up the row > updated = FOREACH three_numbers UPDATE > 5 as f1, > f1+f2 as new_sum > ; > Dump updated; > (5,2,3,3) > (5,3,4,5) > (5,4,5,7) > Fields to update must be specified by alias. Any fields in the UPDATE that do > not match an existing field will be appended to the end of the tuple. > This command is particularly desirable in scripts that deal with a large > number of fields (in the 20-200 range). Often, we only need to make > modifications to a few fields. The FOREACH ... UPDATE statement allows the > developer to focus on the actual logical changes instead of having to list > all of the fields that are also being passed through. > My team has prototyped this with changes to FOREACH ... GENERATE. We believe > this can be done with changes to the parser and the creation of a new > LOUpdate. No physical plan changes should be needed because we will leverage > what LOGenerate does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
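The proposed semantics in the example above (update matched aliases in place, append unmatched ones) can be simulated outside Pig. This is a minimal Python sketch; `foreach_update` is a hypothetical helper based only on this ticket's example, not Pig's implementation:

```python
# Hypothetical simulation of the proposed FOREACH ... UPDATE semantics.
def foreach_update(row, schema, updates):
    out, out_schema = list(row), list(schema)
    env = dict(zip(schema, row))  # expressions see the *input* row
    for alias, expr in updates:
        value = expr(env)
        if alias in out_schema:
            out[out_schema.index(alias)] = value  # matched alias: update in place
        else:
            out_schema.append(alias)              # unmatched alias: append
            out.append(value)
    return tuple(out)

schema = ["f1", "f2", "f3"]
updates = [("f1", lambda e: 5),
           ("new_sum", lambda e: e["f1"] + e["f2"])]
for row in [(1, 2, 3), (2, 3, 4), (3, 4, 5)]:
    print(foreach_update(row, schema, updates))
# (5, 2, 3, 3)
# (5, 3, 4, 5)
# (5, 4, 5, 7)
```

Note the sketch evaluates expressions against the input row, which is what makes `new_sum` come out as 3 for the first row even though `f1` was updated to 5.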
[jira] [Commented] (PIG-4608) FOREACH ... UPDATE
[ https://issues.apache.org/jira/browse/PIG-4608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16329539#comment-16329539 ] Daniel Dai commented on PIG-4608: - "add" (or append?)/"update xxx as"/"drop" syntax sounds good to me. We also want to make sure it works with positional references ($0, $1, etc.). You might take a look at PIG-3122 for keyword conflicts, if applicable. > FOREACH ... UPDATE > -- > > Key: PIG-4608 > URL: https://issues.apache.org/jira/browse/PIG-4608 > Project: Pig > Issue Type: New Feature >Reporter: Haley Thrapp >Priority: Major > > I would like to propose a new command in Pig, FOREACH...UPDATE. > Syntactically, it would look much like FOREACH … GENERATE. > Example: > Input data: > (1,2,3) > (2,3,4) > (3,4,5) > -- Load the data > three_numbers = LOAD 'input_data' > USING PigStorage() > AS (f1:int, f2:int, f3:int); > -- Sum up the row > updated = FOREACH three_numbers UPDATE > 5 as f1, > f1+f2 as new_sum > ; > Dump updated; > (5,2,3,3) > (5,3,4,5) > (5,4,5,7) > Fields to update must be specified by alias. Any fields in the UPDATE that do > not match an existing field will be appended to the end of the tuple. > This command is particularly desirable in scripts that deal with a large > number of fields (in the 20-200 range). Often, we only need to make > modifications to a few fields. The FOREACH ... UPDATE statement allows the > developer to focus on the actual logical changes instead of having to list > all of the fields that are also being passed through. > My team has prototyped this with changes to FOREACH ... GENERATE. We believe > this can be done with changes to the parser and the creation of a new > LOUpdate. No physical plan changes should be needed because we will leverage > what LOGenerate does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PIG-5293) Suspicious code as missing `this' for a member
[ https://issues.apache.org/jira/browse/PIG-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-5293. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 0.18.0 Patch committed to trunk. Thanks [~lifove]! > Suspicious code as missing `this' for a member > -- > > Key: PIG-5293 > URL: https://issues.apache.org/jira/browse/PIG-5293 > Project: Pig > Issue Type: Bug >Reporter: JC >Assignee: JC > Fix For: 0.18.0 > > > Hi > In a recent github mirror, I've found suspicious code. > Branch: trunk > Path: src/org/apache/pig/pen/util/ExampleTuple.java > {code:java} > ... > 39 Tuple t = null; > ... > 110 @Override > 111 public void reference(Tuple t) { > 112 t.reference(t); > 113 } > {code} > In Line 112, `t.reference' should be `this.t.reference'? This might be just a > trivial thing, as the class name is ExampleTuple. But I wanted to report just > in case. > Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
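The shadowing pattern behind PIG-5293 is easy to reproduce in any language: a method parameter named like a field hides the field, so the delegate is never touched. A minimal Python analog for illustration only — `Recorder` and `ExampleTupleLike` are hypothetical stand-ins, not Pig code:

```python
# A stub standing in for Tuple: it records what it was last asked to reference.
class Recorder:
    def __init__(self):
        self.referenced = None
    def reference(self, other):
        self.referenced = other

class ExampleTupleLike:
    def __init__(self, delegate):
        self.t = delegate            # analog of the field `Tuple t`
    def reference_buggy(self, t):
        t.reference(t)               # as written in the report: parameter shadows the field
    def reference_fixed(self, t):
        self.t.reference(t)          # what the report suggests was intended

inner, arg = Recorder(), Recorder()
et = ExampleTupleLike(inner)
et.reference_buggy(arg)
print(inner.referenced)              # None: the field's delegate was never updated
et.reference_fixed(arg)
print(inner.referenced is arg)       # True
```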
[jira] [Commented] (PIG-5191) Pig HBase 2.0.0 support
[ https://issues.apache.org/jira/browse/PIG-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137884#comment-16137884 ] Daniel Dai commented on PIG-5191: - Looks good. Can you also check whether bin/pig registers all dependent hbase jars automatically with hbase2? i.e., no need to manually register jars when using HBaseStorage. We have done this for hbase1. > Pig HBase 2.0.0 support > --- > > Key: PIG-5191 > URL: https://issues.apache.org/jira/browse/PIG-5191 > Project: Pig > Issue Type: Improvement >Reporter: Nandor Kollar >Assignee: Nandor Kollar > Fix For: 0.18.0 > > Attachments: PIG-5191_1.patch > > > Pig doesn't support HBase 2.0.0. Since the new HBase API introduces several > API changes, we should find a way to support both the 1.x and 2.x HBase APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (PIG-5293) Suspicious code as missing `this' for a member
[ https://issues.apache.org/jira/browse/PIG-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5293: --- Assignee: JC > Suspicious code as missing `this' for a member > -- > > Key: PIG-5293 > URL: https://issues.apache.org/jira/browse/PIG-5293 > Project: Pig > Issue Type: Bug >Reporter: JC >Assignee: JC > > Hi > In a recent github mirror, I've found suspicious code. > Branch: trunk > Path: src/org/apache/pig/pen/util/ExampleTuple.java > {code:java} > ... > 39 Tuple t = null; > ... > 110 @Override > 111 public void reference(Tuple t) { > 112 t.reference(t); > 113 } > {code} > In Line 112, `t.reference' should be `this.t.reference'? This might be just a > trivial thing as the class name as ExampleTuple. But I wanted to report just > in case. > Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5293) Suspicious code as missing `this' for a member
[ https://issues.apache.org/jira/browse/PIG-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137880#comment-16137880 ] Daniel Dai commented on PIG-5293: - That sounds valid. Can you upload a patch? > Suspicious code as missing `this' for a member > -- > > Key: PIG-5293 > URL: https://issues.apache.org/jira/browse/PIG-5293 > Project: Pig > Issue Type: Bug >Reporter: JC > > Hi > In a recent github mirror, I've found suspicious code. > Branch: trunk > Path: src/org/apache/pig/pen/util/ExampleTuple.java > {code:java} > ... > 39 Tuple t = null; > ... > 110 @Override > 111 public void reference(Tuple t) { > 112 t.reference(t); > 113 } > {code} > In Line 112, `t.reference' should be `this.t.reference'? This might be just a > trivial thing as the class name as ExampleTuple. But I wanted to report just > in case. > Thanks! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5289) update .eclipse.templates/.classpath with latest jars
[ https://issues.apache.org/jira/browse/PIG-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137875#comment-16137875 ] Daniel Dai commented on PIG-5289: - I don't think .eclipse.templates is used in eclipse-files. We already use the ivy cache to generate the .classpath file in PIG-2282. > update .eclipse.templates/.classpath with latest jars > - > > Key: PIG-5289 > URL: https://issues.apache.org/jira/browse/PIG-5289 > Project: Pig > Issue Type: Bug > Components: build >Affects Versions: 0.17.0 >Reporter: Artem Ervits >Assignee: Artem Ervits > Fix For: trunk > > > The file still references hadoop 0.20, zk 3.3.3, etc. We have to fix it > sometime to work with newer versions of hadoop and add Tez and Spark. Instead > of having a hardcoded file that goes outdated as versions are > incremented, it would be better to have an ant target that generates the file > based on dependencies in build/ivy/lib -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5286) Run verify_pig in e2e with old version of Pig
[ https://issues.apache.org/jira/browse/PIG-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137870#comment-16137870 ] Daniel Dai commented on PIG-5286: - Sounds good to me. Running verify_pig on the old Pig is more natural. > Run verify_pig in e2e with old version of Pig > - > > Key: PIG-5286 > URL: https://issues.apache.org/jira/browse/PIG-5286 > Project: Pig > Issue Type: Bug >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.18.0 > > Attachments: PIG-5286-1.patch > > > Currently verify_pig runs a different but equivalent script from the testcase, yet > runs with the same version of Pig. We ran into an issue where a test passed when > a bug was introduced and benchmark files were not present. The newly > generated benchmarks were also wrong. We caught the failure when running again, > pointing to previously generated benchmarks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5271) StackOverflowError when compiling in Tez mode (with union and replicated join)
[ https://issues.apache.org/jira/browse/PIG-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137589#comment-16137589 ] Daniel Dai commented on PIG-5271: - Looks good to me. [~rohini], do you want a second look? > StackOverflowError when compiling in Tez mode (with union and replicated join) > -- > > Key: PIG-5271 > URL: https://issues.apache.org/jira/browse/PIG-5271 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5271-v01.patch, pig-5271-v02.patch > > > Sample script > {code} > a4 = LOAD 'studentnulltab10k' as (name, age:int, gpa:float); > a4_1 = filter a4 by gpa is null or gpa >= 3.9; > a4_2 = filter a4 by gpa < 1; > b4 = union a4_1, a4_2; > b4_1 = filter b4 by age < 30; > b4_2 = foreach b4 generate name, age, FLOOR(gpa) as gpa; > c4 = load 'voternulltab10k' as (name, age, registration, contributions); > d4 = join b4_2 by name, c4 by name using 'replicated'; > e4 = foreach d4 generate b4_2::name as name, b4_2::age as age, gpa, > registration, contributions; > f4 = order e4 by name, age DESC; > store f4 into 'tmp_table_4' ; > a5_1 = filter a4 by gpa is null or gpa <= 3.9; > a5_2 = filter a4 by gpa < 2; > b5 = union a5_1, a5_2; > d5 = join c4 by name, b5 by name using 'replicated'; > store d5 into 'tmp_table_5' ; > {code} > This script fails to compile with StackOverflowError. > {noformat} > at > org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323) > Pig Stack Trace > --- > ERROR 2998: Unhandled internal error. 
null > java.lang.StackOverflowError > at java.lang.reflect.Constructor.newInstance(Constructor.java:415) > at java.lang.Class.newInstance(Class.java:442) > at org.apache.pig.impl.util.Utils.mergeCollection(Utils.java:490) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:101) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > at > org.apache.pig.impl.plan.DependencyOrderWalker.doAllPredecessors(DependencyOrderWalker.java:105) > ... > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
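The trace above shows `DependencyOrderWalker.doAllPredecessors` recursing once per predecessor hop, so a sufficiently deep plan (here, effectively multiplied by the union feeding a replicated join) exhausts the JVM stack. A hedged Python sketch of the recursive shape and an iterative equivalent with an explicit stack — illustrative only, not the fix in the attached patches:

```python
def walk_recursive(node, preds, seen, order):
    # Shape of doAllPredecessors: visit all predecessors recursively,
    # then emit the node. Depth grows with the plan's longest chain.
    if node in seen:
        return
    seen.add(node)
    for p in preds.get(node, []):
        walk_recursive(p, preds, seen, order)
    order.append(node)

def walk_iterative(start, preds):
    # Same dependency order, but with an explicit stack: no depth limit.
    order, seen, stack = [], set(), [(start, False)]
    while stack:
        node, expanded = stack.pop()
        if expanded:
            order.append(node)
        elif node not in seen:
            seen.add(node)
            stack.append((node, True))       # emit after predecessors
            for p in preds.get(node, []):
                stack.append((p, False))
    return order

# Sanity check: both walks agree on a small chain.
small = {i: [i - 1] for i in range(1, 5)}
seen, order = set(), []
walk_recursive(4, small, seen, order)
print(order == walk_iterative(4, small))     # True

# A chain far deeper than the default recursion limit: the recursive walk
# would raise RecursionError; the iterative one completes.
deep = {i: [i - 1] for i in range(1, 50_000)}
print(walk_iterative(49_999, deep)[0])       # 0 (deepest predecessor first)
```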
[jira] [Comment Edited] (PIG-5272) BagToString Output Schema
[ https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137270#comment-16137270 ] Daniel Dai edited comment on PIG-5272 at 8/22/17 7:41 PM: -- Are you saying your data does not match your declared schema? If you are not sure about the bag inner schema, you shall leave it empty by just declaring it as \{()\}, which means this is a bag with unknown inner schema. I see BagToString does have an issue, it does not deal with unknown inner schema. If that's the issue you are trying to fix, you are welcome to submit a patch. was (Author: daijy): Are you saying your data does not match your declared schema? If you are not sure about the bag inner schema, you shall leave it empty by just declaring it as {()}, which means this is a bag with unknown inner schema. I see BagToString does have an issue, it does not deal with unknown inner schema. If that's the issue you are trying to fix, you are welcome to submit a patch. > BagToString Output Schema > - > > Key: PIG-5272 > URL: https://issues.apache.org/jira/browse/PIG-5272 > Project: Pig > Issue Type: Improvement >Reporter: Joshua Juen >Priority: Minor > > The output schema from BagToTuple is nonsensical causing problems using the > tuple later in the same script. > For example: Given a bag: { data:chararray }, calling BagToTuple yields the > schema: ( data:chararray ) > But, this makes no sense since if the above bag contains: {data1, data2, > data3} entries, the output tuple from BagToTuple will be: > (data1:chararray, data2:chararray, data3:chararray) != (data:chararray),the > declared output schema from the UDF. > Unfortunately, the schema of the tuple cannot be known during the initial > validation phase. Thus, I believe the output schema from the UDF should be > modified to be type tuple without the number of fields being fixed to the > number of columns in the input bag. 
> Under the current way, the elements in the tuple cannot be accessed in the > script after calling BagToTuple without getting an incompatible type error. > We have modified the UDF in our internal UDF jars to work around the issue. > Let me know if this sounds reasonable and I can generate the patch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
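The reported mismatch can be simulated directly: the runtime tuple has one field per bag entry, while the declared output schema is fixed at validation time. A minimal Python sketch, where `bag_to_tuple` is a hypothetical stand-in mimicking only the runtime flattening, not the real UDF:

```python
# Simulate BagToTuple's runtime behavior on a bag of single-field tuples.
def bag_to_tuple(bag):
    out = []
    for t in bag:
        out.extend(t)  # flatten each inner tuple's fields into one tuple
    return tuple(out)

bag = [("data1",), ("data2",), ("data3",)]  # bag schema: { data:chararray }
result = bag_to_tuple(bag)
declared = ("data",)                        # declared output: ( data:chararray )
print(result)                               # ('data1', 'data2', 'data3')
print(len(result) == len(declared))         # False: the schema mismatch in PIG-5272
```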
[jira] [Commented] (PIG-5272) BagToString Output Schema
[ https://issues.apache.org/jira/browse/PIG-5272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16137270#comment-16137270 ] Daniel Dai commented on PIG-5272: - Are you saying your data does not match your declared schema? If you are not sure about the bag inner schema, you shall leave it empty by just declaring it as {()}, which means this is a bag with unknown inner schema. I see BagToString does have an issue, it does not deal with unknown inner schema. If that's the issue you are trying to fix, you are welcome to submit a patch. > BagToString Output Schema > - > > Key: PIG-5272 > URL: https://issues.apache.org/jira/browse/PIG-5272 > Project: Pig > Issue Type: Improvement >Reporter: Joshua Juen >Priority: Minor > > The output schema from BagToTuple is nonsensical causing problems using the > tuple later in the same script. > For example: Given a bag: { data:chararray }, calling BagToTuple yields the > schema: ( data:chararray ) > But, this makes no sense since if the above bag contains: {data1, data2, > data3} entries, the output tuple from BagToTuple will be: > (data1:chararray, data2:chararray, data3:chararray) != (data:chararray),the > declared output schema from the UDF. > Unfortunately, the schema of the tuple cannot be known during the initial > validation phase. Thus, I believe the output schema from the UDF should be > modified to be type tuple without the number of fields being fixed to the > number of columns in the input bag. > Under the current way, the elements in the tuple cannot be accessed in the > script after calling BagToTuple without getting an incompatible type error. > We have modified the UDF in our internal UDF jars to work around the issue. > Let me know if this sounds reasonable and I can generate the patch. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PIG-5268) Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage
[ https://issues.apache.org/jira/browse/PIG-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5268: Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 0.18.0 Status: Resolved (was: Patch Available) +1. I don't think there's anything wrong with cleaning up code, especially for new contributors. Patch committed to trunk. Thanks Beluga! > Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage > > > Key: PIG-5268 > URL: https://issues.apache.org/jira/browse/PIG-5268 > Project: Pig > Issue Type: Improvement > Components: data >Affects Versions: 0.17.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Fix For: 0.18.0 > > Attachments: PIG-5268.1.patch, PIG-5268.2.patch > > > # Optimize for case where {{asCollection}} is empty > # Tidy up -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (PIG-5268) Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage
[ https://issues.apache.org/jira/browse/PIG-5268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5268: --- Assignee: BELUGA BEHR > Review of org.apache.pig.backend.hadoop.datastorage.HDataStorage > > > Key: PIG-5268 > URL: https://issues.apache.org/jira/browse/PIG-5268 > Project: Pig > Issue Type: Improvement > Components: data >Affects Versions: 0.17.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Attachments: PIG-5268.1.patch, PIG-5268.2.patch > > > # Optimize for case where {{asCollection}} is empty > # Tidy up -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121076#comment-16121076 ] Daniel Dai commented on PIG-5201: - What's your idea for the column padding? > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.18.0 > > Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, > pig-5201-v02.patch, pig-5201-v03.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5256) Bytecode generation for POFilter and POForeach
[ https://issues.apache.org/jira/browse/PIG-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121068#comment-16121068 ] Daniel Dai commented on PIG-5256: - Haven't checked the code line by line, but the overall approach looks good to me. The patch flattens the expression tree and nested plan, so we can avoid virtual function calls and have a cleaner solution for the multi-time evaluation issue. This can be extended to flatten the whole operator tree in the future (whole-stage codegen). > Bytecode generation for POFilter and POForeach > -- > > Key: PIG-5256 > URL: https://issues.apache.org/jira/browse/PIG-5256 > Project: Pig > Issue Type: Sub-task > Components: impl >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.18.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120551#comment-16120551 ] Daniel Dai commented on PIG-5201: - It is equivalent to flattening a scalar. That sounds fine. How about columns? Shall we produce the same number of null columns according to the schema? It might be the same as PIG-2537. > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.18.0 > > Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, > pig-5201-v02.patch, pig-5201-v03.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
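The column-padding question above can be made concrete. A hedged Python sketch of one candidate behavior — pad short (or null) flattened tuples with nulls up to the declared schema width; what to do with extra fields (truncate, as sketched here, versus raising an error) is a separate design choice, and this is not necessarily what Pig implements:

```python
# Pad/trim a flattened value to the declared schema width.
def flatten_to_width(value, width):
    fields = list(value) if value is not None else []
    fields = fields[:width]                      # extra fields: truncated (one option)
    fields += [None] * (width - len(fields))     # missing fields: padded with null
    return tuple(fields)

print(flatten_to_width(("a", "b"), 3))           # ('a', 'b', None)
print(flatten_to_width(None, 3))                 # (None, None, None)
print(flatten_to_width(("a", "b", "c", "d"), 3)) # ('a', 'b', 'c')
```

Under this behavior, a row whose tuple is short keeps later fields (like the trailing `a4` in the related FLATTEN tickets) in their declared positions instead of shifting them left.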
[jira] [Updated] (PIG-5254) Hit Ctrl-D to quit grunt shell fail
[ https://issues.apache.org/jira/browse/PIG-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5254: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to both trunk and 0.17 branch. Thanks Weijun for contributing! > Hit Ctrl-D to quit grunt shell fail > --- > > Key: PIG-5254 > URL: https://issues.apache.org/jira/browse/PIG-5254 > Project: Pig > Issue Type: Bug > Components: impl >Affects Versions: 0.18.0, 0.17.1 >Reporter: Daniel Dai >Assignee: Weijun Qian > Fix For: 0.18.0, 0.17.1 > > Attachments: PIG-5254.patch > > > Exception: > {code} > java.lang.NullPointerException > at > org.apache.pig.tools.grunt.ConsoleReaderInputStream$ConsoleLineInputStream.read(ConsoleReaderInputStream.java:107) > at java.io.InputStream.read(InputStream.java:170) > at java.io.SequenceInputStream.read(SequenceInputStream.java:207) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.read1(BufferedReader.java:212) > at java.io.BufferedReader.read(BufferedReader.java:286) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.FillBuff(JavaCharStream.java:143) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.ReadByte(JavaCharStream.java:171) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.readChar(JavaCharStream.java:274) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.BeginToken(JavaCharStream.java:193) > at > org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3215) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1511) > at > 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:117) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205) > at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66) > at org.apache.pig.Main.run(Main.java:564) > at org.apache.pig.Main.main(Main.java:175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5282) Upgade to Java 8
[ https://issues.apache.org/jira/browse/PIG-5282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111280#comment-16111280 ] Daniel Dai commented on PIG-5282: - I don't have problem for that. Hive already move to JDK 8 and we can follow. > Upgade to Java 8 > > > Key: PIG-5282 > URL: https://issues.apache.org/jira/browse/PIG-5282 > Project: Pig > Issue Type: Improvement >Reporter: Nandor Kollar > Fix For: 0.18.0 > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (PIG-5270) Typo in Pig Logging
[ https://issues.apache.org/jira/browse/PIG-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-5270. - Resolution: Fixed Hadoop Flags: Reviewed Committed to trunk. Thanks for fixing the typo. > Typo in Pig Logging > --- > > Key: PIG-5270 > URL: https://issues.apache.org/jira/browse/PIG-5270 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.13.0, 0.14.0, 0.15.0, 0.16.0, 0.17.0 > Environment: All >Reporter: Andrew Hutton >Assignee: Andrew Hutton >Priority: Minor > Labels: easyfix, patch > Fix For: 0.18.0 > > Attachments: PIG-5270.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > In the log output of the internalCopyAllGeneratedToDistributedCache() method > in pig/data/SchemaTupleFrontend.java the word "cache" is misspelled as > "cacche". According to another issue, this was already addressed and resolved > in 2013; however, the issue persists in the latest releases. > Here is a link to the previous issue: > https://issues.apache.org/jira/browse/PIG-3432 > I also issued a pull request to the Github mirror: > https://github.com/apache/pig/pull/30 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PIG-5270) Typo in Pig Logging
[ https://issues.apache.org/jira/browse/PIG-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5270: Fix Version/s: (was: trunk) (was: 0.17.1) (was: 0.16.1) (was: 0.17.0) (was: 0.15.1) (was: 0.16.0) (was: 0.15.0) (was: 0.14.0) (was: 0.13.0) > Typo in Pig Logging > --- > > Key: PIG-5270 > URL: https://issues.apache.org/jira/browse/PIG-5270 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.13.0, 0.14.0, 0.15.0, 0.16.0, 0.17.0 > Environment: All >Reporter: Andrew Hutton >Assignee: Andrew Hutton >Priority: Minor > Labels: easyfix, patch > Fix For: 0.18.0 > > Attachments: PIG-5270.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > In the log output of the internalCopyAllGeneratedToDistributedCache() method > in pig/data/SchemaTupleFrontend.java the word "cache" is misspelled as > "cacche". According to another issue, this was already addressed and resolved > in 2013, however the issue persists in the latest releases. > Here is a link to the previous issue: > https://issues.apache.org/jira/browse/PIG-3432 > I also issued a pull request to the Github mirror: > https://github.com/apache/pig/pull/30 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (PIG-5270) Typo in Pig Logging
[ https://issues.apache.org/jira/browse/PIG-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5270: --- Assignee: Andrew Hutton > Typo in Pig Logging > --- > > Key: PIG-5270 > URL: https://issues.apache.org/jira/browse/PIG-5270 > Project: Pig > Issue Type: Bug > Components: data >Affects Versions: 0.13.0, 0.14.0, 0.15.0, 0.16.0, 0.17.0 > Environment: All >Reporter: Andrew Hutton >Assignee: Andrew Hutton >Priority: Minor > Labels: easyfix, patch > Fix For: 0.18.0 > > Attachments: PIG-5270.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > In the log output of the internalCopyAllGeneratedToDistributedCache() method > in pig/data/SchemaTupleFrontend.java the word "cache" is misspelled as > "cacche". According to another issue, this was already addressed and resolved > in 2013, however the issue persists in the latest releases. > Here is a link to the previous issue: > https://issues.apache.org/jira/browse/PIG-3432 > I also issued a pull request to the Github mirror: > https://github.com/apache/pig/pull/30 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-4767) Partition filter not pushed down when filter clause references variable from another load path
[ https://issues.apache.org/jira/browse/PIG-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084758#comment-16084758 ] Daniel Dai commented on PIG-4767: - That's right, PartitionFilterOptimizer and PredicatePushdownOptimizer do not push filters up. The problem PIG-1669 tries to solve does not exist. +1. > Partition filter not pushed down when filter clause references variable from > another load path > -- > > Key: PIG-4767 > URL: https://issues.apache.org/jira/browse/PIG-4767 > Project: Pig > Issue Type: Bug >Affects Versions: 0.15.0 >Reporter: Anthony Hsu >Assignee: Koji Noguchi > Fix For: 0.18.0 > > Attachments: pig-4767-v01.patch > > > To reproduce: > {noformat:title=test.pig} > a = load 'a.txt'; > a_group = group a all; > a_count = foreach a_group generate COUNT(a) as count; > b = load 'mytable' using org.apache.hcatalog.pig.HCatLoader(); > b = filter b by datepartition == '2015-09-01-00' and foo == a_count.count; > dump b; > {noformat} > The above query ends up reading all the table partitions. If you remove the > {{foo == a_count.count}} clause or replace {{a_count.count}} with a constant, > then partition filtering happens properly. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5264) Remove deprecated keys from PigConfiguration
[ https://issues.apache.org/jira/browse/PIG-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084623#comment-16084623 ] Daniel Dai commented on PIG-5264: - +1 > Remove deprecated keys from PigConfiguration > > > Key: PIG-5264 > URL: https://issues.apache.org/jira/browse/PIG-5264 > Project: Pig > Issue Type: Improvement >Reporter: Nandor Kollar >Assignee: Nandor Kollar >Priority: Minor > Fix For: 0.18.0 > > Attachments: PIG-5264_1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > PigConfiguration includes several deprecated constants (like INSERT_ENABLED, > SCHEMA_TUPLE_SHOULD_ALLOW_FORCE, etc.). These should be removed, as they have all > been deprecated for multiple versions. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (PIG-5254) Hit Ctrl-D to quit grunt shell fail
[ https://issues.apache.org/jira/browse/PIG-5254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5254: Fix Version/s: 0.17.1 > Hit Ctrl-D to quit grunt shell fail > --- > > Key: PIG-5254 > URL: https://issues.apache.org/jira/browse/PIG-5254 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.18.0, 0.17.1 > > > Exception: > {code} > java.lang.NullPointerException > at > org.apache.pig.tools.grunt.ConsoleReaderInputStream$ConsoleLineInputStream.read(ConsoleReaderInputStream.java:107) > at java.io.InputStream.read(InputStream.java:170) > at java.io.SequenceInputStream.read(SequenceInputStream.java:207) > at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > at java.io.InputStreamReader.read(InputStreamReader.java:184) > at java.io.BufferedReader.fill(BufferedReader.java:161) > at java.io.BufferedReader.read1(BufferedReader.java:212) > at java.io.BufferedReader.read(BufferedReader.java:286) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.FillBuff(JavaCharStream.java:143) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.ReadByte(JavaCharStream.java:171) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.readChar(JavaCharStream.java:274) > at > org.apache.pig.tools.pigscript.parser.JavaCharStream.BeginToken(JavaCharStream.java:193) > at > org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3215) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1511) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:117) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205) > at 
org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66) > at org.apache.pig.Main.run(Main.java:564) > at org.apache.pig.Main.main(Main.java:175) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (PIG-5254) Hit Ctrl-D to quit grunt shell fail
Daniel Dai created PIG-5254: --- Summary: Hit Ctrl-D to quit grunt shell fail Key: PIG-5254 URL: https://issues.apache.org/jira/browse/PIG-5254 Project: Pig Issue Type: Bug Components: impl Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.18.0 Exception: {code} java.lang.NullPointerException at org.apache.pig.tools.grunt.ConsoleReaderInputStream$ConsoleLineInputStream.read(ConsoleReaderInputStream.java:107) at java.io.InputStream.read(InputStream.java:170) at java.io.SequenceInputStream.read(SequenceInputStream.java:207) at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.BufferedReader.fill(BufferedReader.java:161) at java.io.BufferedReader.read1(BufferedReader.java:212) at java.io.BufferedReader.read(BufferedReader.java:286) at org.apache.pig.tools.pigscript.parser.JavaCharStream.FillBuff(JavaCharStream.java:143) at org.apache.pig.tools.pigscript.parser.JavaCharStream.ReadByte(JavaCharStream.java:171) at org.apache.pig.tools.pigscript.parser.JavaCharStream.readChar(JavaCharStream.java:274) at org.apache.pig.tools.pigscript.parser.JavaCharStream.BeginToken(JavaCharStream.java:193) at org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:3215) at org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:1511) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:117) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66) at org.apache.pig.Main.run(Main.java:564) at org.apache.pig.Main.main(Main.java:175) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5225) Several unit tests are not annotated with @Test
[ https://issues.apache.org/jira/browse/PIG-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16032007#comment-16032007 ] Daniel Dai commented on PIG-5225: - The test was added even before my time. The test won't throw an exception; it will get a null result and a warning counter, as Rohini points out. However, the test name suggests it is testing a failed UDF. I don't think this is valid anymore, and it's fine to remove it. > Several unit tests are not annotated with @Test > --- > > Key: PIG-5225 > URL: https://issues.apache.org/jira/browse/PIG-5225 > Project: Pig > Issue Type: Bug >Reporter: Nandor Kollar >Assignee: Nandor Kollar > Fix For: 0.18.0 > > Attachments: PIG-5225.patch > > > Several test cases are not annotated with @Test. Since we use JUnit 4, these > test cases seem to be excluded. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
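The mechanics behind PIG-5225 are worth spelling out: JUnit 3 discovered tests by the test* naming convention, while JUnit 4 discovers them purely by the @Test annotation, so an un-annotated method silently never runs. A self-contained sketch of annotation-based discovery (using a stand-in annotation, since org.junit.Test is not on the classpath here):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class AnnotationDiscovery {
    // Stand-in for org.junit.Test; JUnit 4 runners discover tests the same way.
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Test {}

    public static class SomeTests {
        @Test public void testAnnotated() {}
        public void testForgotten() {} // JUnit 3 style name, but never run under JUnit 4
    }

    // Collect the methods a JUnit-4-style runner would actually execute.
    public static List<String> discovered(Class<?> clazz) {
        List<String> names = new ArrayList<>();
        for (Method m : clazz.getDeclaredMethods()) {
            if (m.isAnnotationPresent(Test.class)) {
                names.add(m.getName());
            }
        }
        return names;
    }
}
```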
[jira] [Resolved] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-5216. - Resolution: Fixed Hadoop Flags: Reviewed Also rebased after the Spark merge. Patch committed to trunk. Thanks Iris! > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.18.0 > > Attachments: PIG-5216-1.patch, PIG-5216-2.patch, PIG-5216-3.patch, > PIG-5216-4.patch > > > Add error handling for loaders in Pig, so that users can choose to allow > errors when loading data, and set error counts / rates > Ideas are based on the error handling for store funcs; see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5216: Attachment: PIG-5216-4.patch Found several issues when running unit tests: 1. In POLoad.setup, we also need to set the LoadFuncDecorator. 2. When serializing "pig.loads", set POLoad.parentPlan to null, as we don't want to serialize the whole physical plan. 3. In MRJobStats, we still refer to "pig.inputs". 4. Some formatting issues. Attaching PIG-5216-4.patch. > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.18.0 > > Attachments: PIG-5216-1.patch, PIG-5216-2.patch, PIG-5216-3.patch, > PIG-5216-4.patch > > > Add error handling for loaders in Pig, so that users can choose to allow > errors when loading data, and set error counts / rates > Ideas are based on the error handling for store funcs; see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5184) set command to view value of a variable
[ https://issues.apache.org/jira/browse/PIG-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028088#comment-16028088 ] Daniel Dai commented on PIG-5184: - Addressing Rohini's review comments. > set command to view value of a variable > --- > > Key: PIG-5184 > URL: https://issues.apache.org/jira/browse/PIG-5184 > Project: Pig > Issue Type: Improvement > Components: parser >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.18.0 > > Attachments: PIG-5184-1.patch, PIG-5184-2.patch > > > Currently, the set command can set the value of a variable, or show all variables > along with their values. I'd like to add another form which shows the value of a > particular variable. For example: > >set fs.defaultFS (show value of fs.defaultFS). > That will help us debug a Pig session in a cleaner way (as compared to showing > all variables). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5184) set command to view value of a variable
[ https://issues.apache.org/jira/browse/PIG-5184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5184: Attachment: PIG-5184-2.patch > set command to view value of a variable > --- > > Key: PIG-5184 > URL: https://issues.apache.org/jira/browse/PIG-5184 > Project: Pig > Issue Type: Improvement > Components: parser >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.18.0 > > Attachments: PIG-5184-1.patch, PIG-5184-2.patch > > > Currently, the set command can set the value of a variable, or show all variables > along with their values. I'd like to add another form which shows the value of a > particular variable. For example: > >set fs.defaultFS (show value of fs.defaultFS). > That will help us debug a Pig session in a cleaner way (as compared to showing > all variables). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4059) Pig on Spark
[ https://issues.apache.org/jira/browse/PIG-4059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027728#comment-16027728 ] Daniel Dai commented on PIG-4059: - +1. Didn't get a chance to review the patch, but we should not delay it further. > Pig on Spark > > > Key: PIG-4059 > URL: https://issues.apache.org/jira/browse/PIG-4059 > Project: Pig > Issue Type: New Feature > Components: spark >Reporter: Rohini Palaniswamy >Assignee: Praveen Rachabattuni > Labels: spork > Fix For: spark-branch > > Attachments: Pig-on-Spark-Design-Doc.pdf, Pig-on-Spark-Scope.pdf > > > Setting up your development environment: > 0. Download the Spark release package (currently Pig on Spark only supports Spark > 1.6). > 1. Check out the Pig Spark branch. > 2. Build Pig by running "ant jar" and "ant -Dhadoopversion=23 jar" for > hadoop-2.x versions. > 3. Configure these environment variables: > export HADOOP_USER_CLASSPATH_FIRST="true" > Now we support "local" and "yarn-client" modes; you can export the system variable > "SPARK_MASTER" like: > export SPARK_MASTER=local or export SPARK_MASTER="yarn-client" > 4. In local mode: ./pig -x spark_local xxx.pig > In yarn-client mode: > export SPARK_HOME=xx; > export SPARK_JAR=hdfs://example.com:8020/ (the hdfs location where > you upload the spark-assembly*.jar) > ./pig -x spark xxx.pig -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026631#comment-16026631 ] Daniel Dai commented on PIG-5201: - It is not a regression, so it is certainly fine to push out. Koji can still try but it should not be a release blocker. > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Fix For: 0.17.0 > > Attachments: pig-5201-v00-testonly.patch, pig-5201-v01.patch, > pig-5201-v02.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seem to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4662) New optimizer rule: filter nulls before inner joins
[ https://issues.apache.org/jira/browse/PIG-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025387#comment-16025387 ] Daniel Dai commented on PIG-4662: - I don't think it would make a noticeable performance difference either way. I'd like to see a modular design rather than intermingling different concepts. Also, I don't feel it is hard to find the join key in the logical optimizer and add a filter on it. > New optimizer rule: filter nulls before inner joins > --- > > Key: PIG-4662 > URL: https://issues.apache.org/jira/browse/PIG-4662 > Project: Pig > Issue Type: Improvement >Reporter: Ido Hadanny >Assignee: Satish Subhashrao Saley >Priority: Minor > Labels: Performance > Fix For: 0.18.0 > > > As stated in the docs, rewriting an inner join and filtering nulls from > inputs can be a big performance gain: > http://pig.apache.org/docs/r0.14.0/perf.html#nulls > We would like to add an optimizer rule which detects inner joins, and filters > nulls in all inputs: > A = filter A by t is not null; > B = filter B by x is not null; > C = join A by t, B by x; > see also: > http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining -- This message was sent by Atlassian JIRA (v6.3.15#6346)
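The reasoning behind the safety of the PIG-4662 rewrite can be made concrete: under Pig's null semantics a null key matches nothing in an inner join, so pre-filtering null keys cannot change the result; it only shrinks the inputs. A minimal model of that semantics, with plain Java lists standing in for relations (illustrative only, not Pig's join implementation):

```java
import java.util.ArrayList;
import java.util.List;

public class NullJoinModel {
    // Inner join on key equality; a null key never matches anything,
    // which is why filtering nulls up front is a pure optimization.
    public static List<String> innerJoin(List<Integer> left, List<Integer> right) {
        List<String> out = new ArrayList<>();
        for (Integer l : left) {
            if (l == null) continue; // null joins nothing; skipping == pre-filtering
            for (Integer r : right) {
                if (l.equals(r)) {   // equals(null) is false, so null rows on the right drop too
                    out.add(l + "|" + r);
                }
            }
        }
        return out;
    }
}
```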
[jira] [Commented] (PIG-5194) HiveUDF fails with Spark exec type
[ https://issues.apache.org/jira/browse/PIG-5194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025249#comment-16025249 ] Daniel Dai commented on PIG-5194: - +1 for the HiveUDAF change, thanks for catching this! > HiveUDF fails with Spark exec type > -- > > Key: PIG-5194 > URL: https://issues.apache.org/jira/browse/PIG-5194 > Project: Pig > Issue Type: Sub-task > Components: spark >Reporter: Adam Szita >Assignee: Adam Szita > Fix For: spark-branch > > Attachments: PIG-5194.0.patch, PIG-5194.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5231) PigStorage with -schema may produce inconsistent outputs with more fields
[ https://issues.apache.org/jira/browse/PIG-5231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025155#comment-16025155 ] Daniel Dai commented on PIG-5231: - Vote for 3. We pick the first schema among the dirs in all LoadFuncs, such as OrcStorage and AvroStorage. I don't think we should make an exception for PigStorage. +1 for the patch. > PigStorage with -schema may produce inconsistent outputs with more fields > - > > Key: PIG-5231 > URL: https://issues.apache.org/jira/browse/PIG-5231 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5231-v01.patch > > > When multiple directories are passed to PigStorage(',','-schema'), pig will > {quote} > No attempt to merge conflicting schemas is made during loading. The first > schema encountered during a file system scan is used. > {quote} > For two directories input with schema > file1: (f1:chararray, f2:int) and > file2: (f1:chararray, f2:int, f3:int) > Pig will pick the first schema from file1 and only allow f1, f2 access. > However, output would still contain 3 fields for tuples from file2. This > later leads to completely corrupt outputs due to shifted fields resulting in > incorrect references. > (This may also happen when input itself contains the delimiter.) > If file2 schema is picked, this is already handled by filling the missing > fields with null. (PIG-3100) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024919#comment-16024919 ] Daniel Dai commented on PIG-5224: - bq. Well, if next LOForEach is not removing all the columns which are not used, then essentially those columns are being used and therefore ColumnPruner would not have tried to prune them in the first place? That's only if the user writes the "foreach" statement carefully. If he projects a column that is never used in the script, the column pruner will still think it is a column that should be removed. +1 for pig-5224-v2.patch. > Extra foreach from ColumnPrune preventing Accumulator usage > --- > > Key: PIG-5224 > URL: https://issues.apache.org/jira/browse/PIG-5224 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5224-v0-testonly.patch, pig-5224-v1.patch, > pig-5224-v2.patch > > > {code} > A = load 'input' as (id:int, fruit); > B = foreach A generate id; -- to enable columnprune > C = group B by id; > D = foreach C { > o = order B by id; > generate org.apache.pig.test.utils.AccumulatorBagCount(o); > } > STORE D into ... > {code} > Pig fails to use Accumulator interface for this UDF. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (PIG-3021) Split results missing records when there is null values in the column comparison
[ https://issues.apache.org/jira/browse/PIG-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-3021. - Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 0.17.0 +1 for PIG-3021-4.patch. Patch committed to trunk. Thanks Nian, Cheolsoo! [~jeffjee617], do you mind adding some documentation as well (in another Jira)? > Split results missing records when there is null values in the column > comparison > > > Key: PIG-3021 > URL: https://issues.apache.org/jira/browse/PIG-3021 > Project: Pig > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Chang Luo >Assignee: Nian Ji > Fix For: 0.17.0 > > Attachments: PIG-3021-2.patch, PIG-3021-3.patch, PIG-3021-4.patch, > PIG-3021.patch > > > Suppose a(x, y) > split a into b if x==y, c otherwise; > One would expect the union of b and c to be a. However, if x or y is null, > the record won't appear in either b or c. > To work around this, I have to change to the following: > split a into b if x is not null and y is not null and x==y, c otherwise; -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024320#comment-16024320 ] Daniel Dai commented on PIG-5224: - The inserted LOForEach removes all the columns which are not used in the script going forward. The next LOForEach is not necessarily doing that. I believe this is not for performance reasons (the performance gain from removing several columns might be debatable); it is to make the ColumnPruner simpler. > Extra foreach from ColumnPrune preventing Accumulator usage > --- > > Key: PIG-5224 > URL: https://issues.apache.org/jira/browse/PIG-5224 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5224-v0-testonly.patch, pig-5224-v1.patch > > > {code} > A = load 'input' as (id:int, fruit); > B = foreach A generate id; -- to enable columnprune > C = group B by id; > D = foreach C { > o = order B by id; > generate org.apache.pig.test.utils.AccumulatorBagCount(o); > } > STORE D into ... > {code} > Pig fails to use Accumulator interface for this UDF. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5235) Typecast with as-clause fails for tuple/bag with an empty schema
[ https://issues.apache.org/jira/browse/PIG-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024288#comment-16024288 ] Daniel Dai commented on PIG-5235: - +1 > Typecast with as-clause fails for tuple/bag with an empty schema > > > Key: PIG-5235 > URL: https://issues.apache.org/jira/browse/PIG-5235 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5235-v01.patch > > > Following script fails with trunk(0.17). > {code} > a = load 'test.txt' as (mytuple:tuple (), gpa:float); > b = foreach a generate mytuple as (mytuple2:(name:int, age:double)); > store b into '/tmp/deleteme'; > {code} > 2017-05-16 09:52:31,280 \[main] ERROR org.apache.pig.tools.grunt.Grunt - > ERROR 2999: Unexpected internal error. null > (This is a continuation from the as-clause fix at PIG-2315 and follow up jira > PIG-4933) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4924) Translate failures.maxpercent MR setting to Tez
[ https://issues.apache.org/jira/browse/PIG-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024195#comment-16024195 ] Daniel Dai commented on PIG-4924: - +1 > Translate failures.maxpercent MR setting to Tez > --- > > Key: PIG-4924 > URL: https://issues.apache.org/jira/browse/PIG-4924 > Project: Pig > Issue Type: Improvement >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.17.0 > > Attachments: PIG-4924-1.patch > > > TEZ-3271 adds support equivalent to mapreduce.map.failures.maxpercent and > mapreduce.reduce.failures.maxpercent. We need to translate that per vertex. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4662) New optimizer rule: filter nulls before inner joins
[ https://issues.apache.org/jira/browse/PIG-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024192#comment-16024192 ] Daniel Dai commented on PIG-4662: - I prefer to do it in the optimizer; it seems clearer. > New optimizer rule: filter nulls before inner joins > --- > > Key: PIG-4662 > URL: https://issues.apache.org/jira/browse/PIG-4662 > Project: Pig > Issue Type: Improvement >Reporter: Ido Hadanny >Assignee: Satish Subhashrao Saley >Priority: Minor > Labels: Performance > Fix For: 0.18.0 > > > As stated in the docs, rewriting an inner join and filtering nulls from > inputs can be a big performance gain: > http://pig.apache.org/docs/r0.14.0/perf.html#nulls > We would like to add an optimizer rule which detects inner joins, and filters > nulls in all inputs: > A = filter A by t is not null; > B = filter B by x is not null; > C = join A by t, B by x; > see also: > http://stackoverflow.com/questions/32088389/is-the-pig-optimizer-filtering-nulls-before-joining -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4914) Add testcase for join with special characters in chararray
[ https://issues.apache.org/jira/browse/PIG-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024182#comment-16024182 ] Daniel Dai commented on PIG-4914: - This is only for tuple join keys, right? String join keys with UTF-8 characters are already covered in PIG-4358. > Add testcase for join with special characters in chararray > -- > > Key: PIG-4914 > URL: https://issues.apache.org/jira/browse/PIG-4914 > Project: Pig > Issue Type: Improvement >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy > Fix For: 0.18.0 > > > This jira is to add testcase for PIG-4821. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5185) Job name show "DefaultJobName" when running a Python script
[ https://issues.apache.org/jira/browse/PIG-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5185: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks Rohini for review! > Job name show "DefaultJobName" when running a Python script > --- > > Key: PIG-5185 > URL: https://issues.apache.org/jira/browse/PIG-5185 > Project: Pig > Issue Type: Bug > Components: impl >Reporter: Daniel Dai >Assignee: Daniel Dai > Fix For: 0.17.0 > > Attachments: PIG-5185-1.patch, PIG-5185-2.patch > > > Run a python script with Pig, Hadoop WebUI show "DefaultJobName" instead of > script name. We shall use script name, the same semantic for regular Pig > script. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5222) Fix Junit Deprecations
[ https://issues.apache.org/jira/browse/PIG-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5222: Attachment: PIG-5222-fixtest.patch TestEvalPipelineLocal.testFunctionInsideFunction, TestEvalPipelineLocal.testBagFunctionWithFlattening, and TestEvalPipelineLocal.testMapLookup failed with the patch. Attaching a fix. > Fix Junit Deprecations > -- > > Key: PIG-5222 > URL: https://issues.apache.org/jira/browse/PIG-5222 > Project: Pig > Issue Type: Improvement >Reporter: William Watson >Assignee: William Watson > Fix For: 0.17.0 > > Attachments: fix-junit-deprecations.patch, PIG-5222-fixtest.patch > > > junit.framework.Assert is deprecated in favor of org.junit.Assert. Warnings > pop up all over the tests -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5221) More fs.default.name deprecation warnings
[ https://issues.apache.org/jira/browse/PIG-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5221: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Switching the order of checking should be OK. Patch committed to trunk. Thanks William! > More fs.default.name deprecation warnings > - > > Key: PIG-5221 > URL: https://issues.apache.org/jira/browse/PIG-5221 > Project: Pig > Issue Type: Improvement >Reporter: William Watson >Assignee: William Watson > Fix For: 0.17.0 > > Attachments: remove-fs-default-name-deprecations.patch > > > There are more places in the code, especially in the tests, where we're still > using fs.default.name instead of fs.defaultFS and we get deprecation warnings > because of it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5222) Fix Junit Deprecations
[ https://issues.apache.org/jira/browse/PIG-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5222: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Patch committed to trunk. Thanks William! > Fix Junit Deprecations > -- > > Key: PIG-5222 > URL: https://issues.apache.org/jira/browse/PIG-5222 > Project: Pig > Issue Type: Improvement >Reporter: William Watson >Assignee: William Watson > Fix For: 0.17.0 > > Attachments: fix-junit-deprecations.patch > > > junit.framework.Assert is deprecated in favor of org.junit.Assert. Warnings > pop up all over the tests -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5224) Extra foreach from ColumnPrune preventing Accumulator usage
[ https://issues.apache.org/jira/browse/PIG-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969841#comment-15969841 ] Daniel Dai commented on PIG-5224: - If this is a problem of an extra foreach after LOCogroup, why not add the same check in ColumnPruneVisitor.visit(LOCogroup cg) instead of addForEachIfNecessary? > Extra foreach from ColumnPrune preventing Accumulator usage > --- > > Key: PIG-5224 > URL: https://issues.apache.org/jira/browse/PIG-5224 > Project: Pig > Issue Type: Improvement >Reporter: Koji Noguchi >Assignee: Koji Noguchi > Attachments: pig-5224-v0-testonly.patch, pig-5224-v1.patch > > > {code} > A = load 'input' as (id:int, fruit); > B = foreach A generate id; -- to enable columnprune > C = group B by id; > D = foreach C { > o = order B by id; > generate org.apache.pig.test.utils.AccumulatorBagCount(o); > } > STORE D into ... > {code} > Pig fails to use Accumulator interface for this UDF. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5223: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Both tests pass. Thanks for the additional test. Patch committed to trunk. Thanks Jin! > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch, PIG-5223-2.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969265#comment-15969265 ] Daniel Dai commented on PIG-5223: - Both conditions should be met in the limit-by-variable case. The state mLimit==-1 with mlimitPlan==null should not happen. > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15969250#comment-15969250 ] Daniel Dai commented on PIG-5223: - mlimitPlan is an expression to calculate the limit variable. That is, the number of limited rows is determined at runtime by evaluating expressionPlan (the physical-plan equivalent of mlimitPlan). If (mLimit == -1 && mlimitPlan != null), that means limiting by a variable, not a constant. It is basically the same as your (mLimit == -1) condition, but adds another condition for assurance. > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
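The guard being discussed can be captured in a small predicate. This is only an illustrative sketch under the assumption stated in the comment (the real check lives in Pig's operator/optimizer code, and the names below simply follow the comment):

```java
public class LimitGuard {
    // mLimit == -1 marks "no constant limit"; a non-null limit plan means the
    // limit is an expression evaluated at runtime, so a compile-time
    // limited-sort optimization must be disabled in that case.
    public static boolean isVariableLimit(long mLimit, Object mLimitPlan) {
        return mLimit == -1 && mLimitPlan != null;
    }
}
```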
[jira] [Comment Edited] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968670#comment-15968670 ] Daniel Dai edited comment on PIG-5223 at 4/14/17 5:54 AM: -- Yes, you can use the condition (mLimit == -1 && mlimitPlan != null), and if true, disable the optimization. It is possible to push the limitPlan to LimitedSortedDataBag, but it is not a one line change and we can do it in followup. was (Author: daijy): Yes, you can use the condition (mLimit == -1 && mlimitPlan != null), and if true, disable the optimization. It is possible to push the limitPlan to LimitedSortedDataBag, but it is not a one line change and we can do it in followup. Please upload the patch. > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5223) TestLimitVariable.testNestedLimitVariable1 and TestSecondarySortMR.testNestedLimitedSort failing
[ https://issues.apache.org/jira/browse/PIG-5223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968670#comment-15968670 ] Daniel Dai commented on PIG-5223: - Yes, you can use the condition (mLimit == -1 && mlimitPlan != null), and if true, disable the optimization. It is possible to push the limitPlan to LimitedSortedDataBag, but it is not a one line change and we can do it in followup. Please upload the patch. > TestLimitVariable.testNestedLimitVariable1 and > TestSecondarySortMR.testNestedLimitedSort failing > - > > Key: PIG-5223 > URL: https://issues.apache.org/jira/browse/PIG-5223 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5223-1.patch > > > TestLimitVariable.testNestedLimitVariable1 > {quote} > Comparing actual and expected results. expected:<\[(1,11), (2,3), (3,10), > (6,15)]> but was:<\[(1,11), (2,3), (3,10), (4,11), (5,10), (6,15)]> > {quote} > TestSecondarySortMR.testNestedLimitedSort > {quote} > Error during parsing. mismatched input 'in' expecting > INTO > {quote} > Latter is probably a simple syntax error. Former looks serious. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967121#comment-15967121 ] Daniel Dai commented on PIG-5211: - I must have made a mistake when rebasing the patch. Sure, go ahead, thanks Koji! > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch, > PIG-5211-4.patch, PIG-5211-5.patch, pig-5211-testfix-postcommit.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5217) Pig Streaming over python multiprocessing
[ https://issues.apache.org/jira/browse/PIG-5217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966570#comment-15966570 ] Daniel Dai commented on PIG-5217: - Are you able to give an example? Pig should run python as an external process, and I don't see a reason why Pig would disrupt multiprocessing. Do you have a sense of why? > Pig Streaming over python multiprocessing > - > > Key: PIG-5217 > URL: https://issues.apache.org/jira/browse/PIG-5217 > Project: Pig > Issue Type: Bug > Components: internal-udfs >Affects Versions: 0.15.0 > Environment: python 2.7,pig 0.15.0,multi-core processor >Reporter: bharatpattani > > python multiprocessing is not working with pig streaming. > Following are the steps for that: > 1. Create python script with "multiprocessing" which can utilise at least two > cores of the processor. > 2. Create a pig script which will call the python script mentioned in > the above step. > Please have a look at it and do the needful. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5219) IndexOutOfBoundsException when loading multiple directories with different schemas using OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5219: Fix Version/s: 0.17.0 > IndexOutOfBoundsException when loading multiple directories with different > schemas using OrcStorage > --- > > Key: PIG-5219 > URL: https://issues.apache.org/jira/browse/PIG-5219 > Project: Pig > Issue Type: Bug >Affects Versions: 0.16.0 > Environment: Pig Version: 0.16.0 > OS: EMR 5.3.1 >Reporter: Omer Tal >Assignee: Daniel Dai > Fix For: 0.17.0 > > > Scenario: > # Data set based on two hours in the same day. In hour 00 the ORC file has 4 > columns {a,b,c,d} and during hour 02 it changes to 5 columns {a,b,c,d,e} > # Loading ORC files with the same schema (hour 00): > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=00' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading ORC files with different schemas in the same directory: > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=02' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading the whole day (both hour 00 and 02): > {code} > x = load 's3://orc_files/dt=2017-03-21' using OrcStorage(); > dump x; > {code} > Result: > {code} > 37332 [PigTezLauncher-0] INFO > org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: > status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 > Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, > vertexName=scope-2, vertexId=vertex_1491991474861_0006_1_00, > diagnostics=[Task failed, taskId=task_1491991474861_0006_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1491991474861_0006_1_00_00_0:java.lang.IndexOutOfBoundsException: > Index: 
4, Size: 4 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > org.apache.pig.impl.util.hive.HiveUtils.convertHiveToPig(HiveUtils.java:97) > at org.apache.pig.builtin.OrcStorage.getNext(OrcStorage.java:381) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:119) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5219) IndexOutOfBoundsException when loading multiple directories with different schemas using OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966567#comment-15966567 ] Daniel Dai commented on PIG-5219: - Pig uses the schema of the first ORC file as the schema for the relation. In general, Pig doesn't know what the schema should be when it differs across ORC files. As a solution, Pig should not fail in the first place; it should generate null instead. The user can cast it to the right schema eventually: {code} x = load 's3://orc_files/dt=2017-03-21/hour=00' using OrcStorage() as (a0, a1, a2, a3); {code} > IndexOutOfBoundsException when loading multiple directories with different > schemas using OrcStorage > --- > > Key: PIG-5219 > URL: https://issues.apache.org/jira/browse/PIG-5219 > Project: Pig > Issue Type: Bug >Affects Versions: 0.16.0 > Environment: Pig Version: 0.16.0 > OS: EMR 5.3.1 >Reporter: Omer Tal >Assignee: Daniel Dai > Fix For: 0.17.0 > > > Scenario: > # Data set based on two hours in the same day. 
In hour 00 the ORC file has 4 > columns {a,b,c,d} and during hour 02 it changes to 5 columns {a,b,c,d,e} > # Loading ORC files with the same schema (hour 00): > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=00' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading ORC files with different schemas in the same directory: > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=02' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading the whole day (both hour 00 and 02): > {code} > x = load 's3://orc_files/dt=2017-03-21' using OrcStorage(); > dump x; > {code} > Result: > {code} > 37332 [PigTezLauncher-0] INFO > org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: > status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 > Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, > vertexName=scope-2, vertexId=vertex_1491991474861_0006_1_00, > diagnostics=[Task failed, taskId=task_1491991474861_0006_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1491991474861_0006_1_00_00_0:java.lang.IndexOutOfBoundsException: > Index: 4, Size: 4 > at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > org.apache.pig.impl.util.hive.HiveUtils.convertHiveToPig(HiveUtils.java:97) > at org.apache.pig.builtin.OrcStorage.getNext(OrcStorage.java:381) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:119) > at > 
org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at
[jira] [Assigned] (PIG-5219) IndexOutOfBoundsException when loading multiple directories with different schemas using OrcStorage
[ https://issues.apache.org/jira/browse/PIG-5219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5219: --- Assignee: Daniel Dai > IndexOutOfBoundsException when loading multiple directories with different > schemas using OrcStorage > --- > > Key: PIG-5219 > URL: https://issues.apache.org/jira/browse/PIG-5219 > Project: Pig > Issue Type: Bug >Affects Versions: 0.16.0 > Environment: Pig Version: 0.16.0 > OS: EMR 5.3.1 >Reporter: Omer Tal >Assignee: Daniel Dai > > Scenario: > # Data set based on two hours in the same day. In hour 00 the ORC file has 4 > columns {a,b,c,d} and during hour 02 it changes to 5 columns {a,b,c,d,e} > # Loading ORC files with the same schema (hour 00): > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=00' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading ORC files with different schemas in the same directory: > {code} > x = load 's3://orc_files/dt=2017-03-21/hour=02' using OrcStorage(); > dump x; > {code} > Result: > {code} > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4,5) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > (1,2,3,4) > {code} > # Loading the whole day (both hour 00 and 02): > {code} > x = load 's3://orc_files/dt=2017-03-21' using OrcStorage(); > dump x; > {code} > Result: > {code} > 37332 [PigTezLauncher-0] INFO > org.apache.pig.backend.hadoop.executionengine.tez.TezJob - DAG Status: > status=FAILED, progress=TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 1 > Killed: 0 FailedTaskAttempts: 4, diagnostics=Vertex failed, > vertexName=scope-2, vertexId=vertex_1491991474861_0006_1_00, > diagnostics=[Task failed, taskId=task_1491991474861_0006_1_00_00, > diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( > failure ) : > attempt_1491991474861_0006_1_00_00_0:java.lang.IndexOutOfBoundsException: > Index: 4, Size: 4 > 
at java.util.ArrayList.rangeCheck(ArrayList.java:653) > at java.util.ArrayList.get(ArrayList.java:429) > at > org.apache.pig.impl.util.hive.HiveUtils.convertHiveToPig(HiveUtils.java:97) > at org.apache.pig.builtin.OrcStorage.getNext(OrcStorage.java:381) > at > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204) > at > org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:119) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:140) > at > org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:305) > at > org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:123) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:376) > at > org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:241) > at > org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) > at > org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) > at > org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966502#comment-15966502 ] Daniel Dai commented on PIG-5216: - This is a very decent patch. Several comments: 1. I think you also intend to remove TestErrorHandlingStoreFunc.java. This can be done via "git rm xxx/TestErrorHandlingStoreFunc.java" before generating the patch. 2. {code} public static final String PIG_LOADS = "pig.inputs"; {code} We'd better change the constant to "pig.loads", since the content changes. People might suspect something is wrong if they find "pig.inputs" is different than expected. 3. {code} public static final String ERROR_HANDLER_COUNTER_GROUP = "storer_Error_Handler"; {code} Make it "Error_Handler". 4. Some documentation is needed; please refer to PIG-4719 and include something similar to the changes in src/docs/src/documentation/content/xdocs/udf.xml. > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.17.0 > > Attachments: PIG-5216-1.patch, PIG-5216-2.patch > > > Add Error Handling for Loaders in Pig, so that user can choose to allow > errors when load data, and set error numbers / rate > Ideas based on error handling on store func see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15966292#comment-15966292 ] Daniel Dai commented on PIG-5211: - Also opened PIG-5220 to improve NestedLimitOptimizer. > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch, > PIG-5211-4.patch, PIG-5211-5.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (PIG-5220) Improve NestedLimitOptimizer to handle general limit push up
Daniel Dai created PIG-5220: --- Summary: Improve NestedLimitOptimizer to handle general limit push up Key: PIG-5220 URL: https://issues.apache.org/jira/browse/PIG-5220 Project: Pig Issue Type: Improvement Components: impl Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.17.0 Currently, NestedLimitOptimizer only handles the case where the limit comes right after the sort. In general, we should push the limit up recursively, similar to LimitOptimizer. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5211: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Patch committed to trunk. Thanks [~jins]! > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch, > PIG-5211-4.patch, PIG-5211-5.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5214) search any substring in the input string
[ https://issues.apache.org/jira/browse/PIG-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5214: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Patch committed to trunk. Thanks Yuxiang! > search any substring in the input string > > > Key: PIG-5214 > URL: https://issues.apache.org/jira/browse/PIG-5214 > Project: Pig > Issue Type: New Feature > Components: internal-udfs >Reporter: Yuxiang Wang >Assignee: Yuxiang Wang > Fix For: 0.17.0 > > Attachments: PIG-5214-1.patch, PIG-5214-2.patch > > > A new Pig UDF *STRING_SEARCH_ALL* that Implementing regex for searching > keyword(substring) in a line of string, and all matched substrings will be > stored as individual tuples in a bag, i.e.{code} output: ({(a),(b),(c)}){code} > Help us to find all regex matches, for example, we may use > *FLATTEN(STRING_SEARCH_ALL(string, regex))* to list all matches for an easier > view of output. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961987#comment-15961987 ] Daniel Dai commented on PIG-5216: - LoadFuncDecorator.java is not included in the patch. If you are using git, you need to use "git add" to add new files, commit your changes, and use "git show" to generate a patch that includes the new files. > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.17.0 > > Attachments: PIG-5216-1.patch > > > Add Error Handling for Loaders in Pig, so that user can choose to allow > errors when load data, and set error numbers / rate > Ideas based on error handling on store func see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961983#comment-15961983 ] Daniel Dai commented on PIG-5211: - The code changes look good now. We'd better add several more tests: 1. Improve TestOptimizeNestedLimit to translate the logical plan to a physical plan, then an MR plan; please refer to TestPlanGeneration.testStoreAlias for how to translate a query into a logical plan/physical plan/MR plan. 2. Add a test that runs a query with a nested limited sort, to make sure the result is correct; please refer to TestEvalPipelineLocal for how to run a query and compare results. 3. Add a test to TestSecondarySort to make sure the nested limited sort does not get optimized by SecondaryKeyOptimizer. > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch, > PIG-5211-4.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5216: --- Assignee: Iris Zeng > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.17.0 > > > Add Error Handling for Loaders in Pig, so that user can choose to allow > errors when load data, and set error numbers / rate > Ideas based on error handling on store func see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5216) Customizable Error Handling for Loaders in Pig
[ https://issues.apache.org/jira/browse/PIG-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5216: Fix Version/s: 0.17.0 > Customizable Error Handling for Loaders in Pig > -- > > Key: PIG-5216 > URL: https://issues.apache.org/jira/browse/PIG-5216 > Project: Pig > Issue Type: Improvement >Reporter: Iris Zeng >Assignee: Iris Zeng > Fix For: 0.17.0 > > > Add Error Handling for Loaders in Pig, so that user can choose to allow > errors when load data, and set error numbers / rate > Ideas based on error handling on store func see > https://issues.apache.org/jira/browse/PIG-4704 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961945#comment-15961945 ] Daniel Dai commented on PIG-5211: - Another comment: in OptimizeNestedLimitTransformer, when we iterate through innerPlan.getOperators(), we cannot assume the order of operators in the iterator, so the check (pred instanceof LOSort && op instanceof LOLimit) is not always valid. We can find the limit operator first, and then use currentPlan.getPredecessors(limit) to make sure the predecessor is an LOSort. > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961941#comment-15961941 ] Daniel Dai commented on PIG-5211: - There are still two instances of Java 1.8-only code; error messages when compiling with Java 1.7: {code} [javac] /Users/daijy/pig2/src/org/apache/pig/data/LimitedSortedDataBag.java:61: error: no suitable constructor found for PriorityQueue(Comparator) [javac] this.priorityQ = new PriorityQueue(getReversedComparator(mComp)); [javac] /Users/daijy/pig2/src/org/apache/pig/data/LimitedSortedDataBag.java:281: error: local variable comp is accessed from within inner class; needs to be declared final [javac] return -comp.compare(o1, o2); {code} > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch, PIG-5211-3.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
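Both compile errors have straightforward JDK 1.7-compatible workarounds. A hedged sketch, specialized to Integer for brevity (the actual LimitedSortedDataBag fields and generics may differ):

```java
import java.util.Comparator;
import java.util.PriorityQueue;

public class Jdk7Compat {
    // JDK 1.7 has no PriorityQueue(Comparator) constructor; supply an
    // explicit initial capacity alongside the comparator instead.
    static PriorityQueue<Integer> newReversedQueue(Comparator<Integer> comp) {
        return new PriorityQueue<Integer>(11, reversed(comp));
    }

    // Comparator.reversed() is JDK 1.8+; on 1.7, hand-roll the reversal.
    // The captured local must be declared final to be usable inside the
    // anonymous inner class, which fixes the second compile error.
    static Comparator<Integer> reversed(final Comparator<Integer> comp) {
        return new Comparator<Integer>() {
            @Override
            public int compare(Integer o1, Integer o2) {
                return -comp.compare(o1, o2);
            }
        };
    }
}
```

With the reversed comparator, the queue's head is the largest element, which is what a bounded top-N structure needs to evict first.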
[jira] [Commented] (PIG-3021) Split results missing records when there is null values in the column comparison
[ https://issues.apache.org/jira/browse/PIG-3021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15961565#comment-15961565 ] Daniel Dai commented on PIG-3021: - [~cheolsoo], do you remember why you changed POIsNull.java? If all tests pass without the change, I'd like to commit the patch as is. > Split results missing records when there is null values in the column > comparison > > > Key: PIG-3021 > URL: https://issues.apache.org/jira/browse/PIG-3021 > Project: Pig > Issue Type: Bug >Affects Versions: 0.10.0 >Reporter: Chang Luo >Assignee: Nian Ji > Attachments: PIG-3021-2.patch, PIG-3021-3.patch, PIG-3021-4.patch, > PIG-3021.patch > > > Suppose a(x, y) > split a into b if x==y, c otherwise; > One will expect the union of b and c will be a. However, if x or y is null, > the record won't appear in either b or c. > To workaround this, I have to change to the following: > split a into b if x is not null and y is not null and x==y, c otherwise; -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960409#comment-15960409 ] Daniel Dai commented on PIG-5211: - Thanks for the patch, pretty good actually. Several comments: 1. LimitedSortedDataBag should not extend SortedDataBag, or even DefaultAbstractBag, since it does not use mContents and does not handle spill. I'd rather implement DataBag directly and implement all of its methods. It shouldn't be too hard since we don't need to deal with spill. 2. Comparator.reversed is only valid in JDK 1.8. We need to make sure Pig compiles under JDK 1.7 as well. 3. We need to add a test case that not only makes sure it uses a limited LOSort, but also makes sure it translates to the right physical plan, runs, and generates the right result. 4. I am fine with NestedLimitOptimizer only dealing with a limit right after a sort currently; we need to create a Jira to deal with operators in the middle though (push the limit all the way up, similar to LimitOptimizer). > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch, PIG-5211-2.patch > > > Currently in FOREACH clause, if both LIMIT and ORDER BY are present, pig > stores all elements and sort them. It should use a priority queue to be more > efficient in space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
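The space win behind the priority-queue idea can be made concrete with a bounded reverse-ordered heap. This is a standalone sketch over integers, not Pig's actual LimitedSortedDataBag code: it retains only the `limit` smallest elements in O(limit) memory instead of buffering and sorting the whole input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class LimitedSorter {
    // Keep only the `limit` smallest elements seen so far (limit >= 1).
    public static List<Integer> topN(Iterable<Integer> input, int limit) {
        // Reverse order so peek() is the largest retained element.
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(
                limit, Collections.<Integer>reverseOrder());
        for (Integer t : input) {
            if (heap.size() < limit) {
                heap.add(t);
            } else if (t.compareTo(heap.peek()) < 0) {
                heap.poll();   // evict the current largest
                heap.add(t);
            }
        }
        List<Integer> out = new ArrayList<Integer>(heap);
        Collections.sort(out); // emit in ascending order
        return out;
    }
}
```

The final sort touches at most `limit` elements, so the per-group cost is O(n log limit) time and O(limit) space, which is why no spill handling is needed.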
[jira] [Commented] (PIG-5214) search any substring in the input string
[ https://issues.apache.org/jira/browse/PIG-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15960357#comment-15960357 ] Daniel Dai commented on PIG-5214: - Thanks for the patch. Left some comments on review board. > search any substring in the input string > > > Key: PIG-5214 > URL: https://issues.apache.org/jira/browse/PIG-5214 > Project: Pig > Issue Type: New Feature > Components: internal-udfs >Reporter: Yuxiang Wang >Assignee: Yuxiang Wang > Fix For: 0.17.0 > > Attachments: PIG-5214-1.patch > > > A new Pig UDF *STRING_SEARCH_ALL* that Implementing regex for searching > keyword(substring) in a line of string, and all matched substrings will be > stored as individual tuples in a bag, i.e.{code} output: ({(a),(b),(c)}){code} > Help us to find all regex matches, for example, we may use > *FLATTEN(STRING_SEARCH_ALL(string, regex))* to list all matches for an easier > view of output. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5210: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) +1. Patch looks good. Committed to trunk. Thanks Lili! > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PIG-5210-new.patch, screenshot MR plan.png, screenshot > Tez Plan.png > > > For pig script, users need to use {{pig -e "explain -script test.pig"}} to > print out MR/Tez Plan. But for Python script, it is a hard thing for PIG to > explain the plan automatically. This option can help to print out MR/Tez > plan automatically before implementing MapReduce. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5210: Description: For pig script, users need to use {{pig -e "explain -script test.pig"}} to print out MR/Tez Plan. But for Python script, it is a hard thing for PIG to explain the plan automatically. This option can help to print out MR/Tez plan automatically before implementing MapReduce. was: h5. Adding an option to print out MR/Tez plan before launching For pig script, users need to use {{pig -e "explain -script test.pig"}} to print out MR/Tez Plan. But for Python script, it is a hard thing for PIG to explain the plan automatically. This option can help to print out MR/Tez plan automatically before implement of MapReduce. Steps: - Get clone of 0.17.0 version PIG by git pull - Set up Eclipse - Import Pig src to Eclipse, and set pig.print.exec.plan "true" in file _JobControlCompiler.java_,_TezJobCompiler.java_ before Mapreduce starts - Check for compiling {{ant}} - After building successful, Start remote debugger in Eclipse {{export PIG_OPTS="- agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"}} Or start to run pig only in terminal {{unset PIG_OPTS}} - Test cases: For MR engine {{-x local test.pig}}; For Tez engine {{-x tez_local test.pig}} - Get the plan and test results as expected > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PrintPlan.patch, screenshot-change.png, screenshot MR > plan.png, screenshot Tez Plan.png, test.pig > > > For pig script, users need to use {{pig -e "explain -script test.pig"}} to > print out MR/Tez Plan. But for Python script, it is a hard thing for PIG to > explain the plan automatically. 
This option can help to print out MR/Tez > plan automatically before implement of MapReduce. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5211) Optimize Nested Limited Sort
[ https://issues.apache.org/jira/browse/PIG-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954743#comment-15954743 ] Daniel Dai commented on PIG-5211: - Looks pretty good so far. NestedLimitOptimizer needs fine tuning: the existence of both LOLimit and LOSort is not enough; we must make sure LOLimit comes right after LOSort. Alternatively, you can follow LimitOptimizer and push LOLimit all the way up, which is more sophisticated (I am not insisting on this, though). Also, SecondaryKeyOptimizer does not currently recognize a limited nested sort, so it is possible for SecondaryKeyOptimizer to turn the limited sort into an MR/Tez secondary sort, losing the limit. We should therefore disable that optimization in SecondaryKeyOptimizer when the nested sort is a limited sort. You can use the following script as a test case in which SecondaryKeyOptimizer gets involved: {code} a = load 'studenttab10k' as (name:chararray, age:int, gpa:double); b = group a by name; c = foreach b { c1 = order a by age; c2 = limit c1 5; generate c2; } explain c; {code} > Optimize Nested Limited Sort > > > Key: PIG-5211 > URL: https://issues.apache.org/jira/browse/PIG-5211 > Project: Pig > Issue Type: Improvement >Reporter: Jin Sun >Assignee: Jin Sun > Fix For: 0.17.0 > > Attachments: PIG-5211-1.patch > > > Currently in a FOREACH clause, if both LIMIT and ORDER BY are present, Pig > stores all elements and sorts them. It should use a priority queue to be more > space-efficient. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
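The space saving behind the priority-queue suggestion can be sketched outside Pig. The following Python model is illustrative only (not Pig's actual implementation): instead of buffering and sorting the whole bag for {{order ... limit k}}, it keeps a bounded heap of at most k elements, so memory stays O(k) regardless of bag size.

```python
import heapq
import itertools

def limited_sort(records, k, key):
    """Return the k smallest records (by key) in ascending order,
    holding at most k records in memory at any time.

    Models `c1 = order a by age; c2 = limit c1 5;` in spirit.
    """
    heap = []                 # max-heap of the current k smallest, via negated keys
    tie = itertools.count()   # tiebreaker so records themselves are never compared
    for r in records:
        entry = (-key(r), next(tie), r)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:       # key(r) is smaller than the current worst
            heapq.heapreplace(heap, entry)
    # pops come out largest-key first, so reverse for ascending order
    out = [heapq.heappop(heap)[2] for _ in range(len(heap))]
    out.reverse()
    return out

# e.g. top 3 youngest from (name, age) rows
rows = [('a', 5), ('b', 2), ('c', 9), ('d', 1), ('e', 3), ('f', 7)]
top3 = limited_sort(rows, 3, key=lambda r: r[1])
```

Note that ties among equal keys are broken arbitrarily here; a production implementation would need to match Pig's ordering semantics.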
[jira] [Commented] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954222#comment-15954222 ] Daniel Dai commented on PIG-5210: - This will be useful. For Python scripts, it is hard to run explain and get the execution plan. With this option, Pig will print the MR/Tez plan on the console, so the user gets an idea of what the MR/Tez job is doing. We need to put "pig.print.exec.plan" in PigConfiguration, and it would be better to add a test case. > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PrintPlan.patch > > > Adding an option to print out MR/Tez plan before launching. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
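If the property lands in PigConfiguration as suggested, a user would presumably enable it the same way as any other Pig property. The exact invocations below are a sketch based only on the property name mentioned in this thread, not a confirmed interface:

```
# on the command line (hypothetical, once the patch is committed)
pig -Dpig.print.exec.plan=true -x tez script.pig

# or inside a Pig script
set pig.print.exec.plan 'true';
```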
[jira] [Updated] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5210: Summary: Option to print MR/Tez plan before launching (was: Print MR/Tez plan before launching) > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PrintPlan.patch > > > Set pig.print.exec.plan "true" in > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java > > and > src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobCompiler.java > Print out MR/Tez plan -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (PIG-5210) Option to print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-5210: Description: Adding an option to print out MR/Tez plan before launching. (was: Set pig.print.exec.plan "true" in src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java and src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobCompiler.java Print out MR/Tez plan) > Option to print MR/Tez plan before launching > > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Affects Versions: 0.17.0 >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > Attachments: PrintPlan.patch > > > Adding an option to print out MR/Tez plan before launching. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954167#comment-15954167 ] Daniel Dai commented on PIG-5201: - Thanks for pointing that out. Yes, PIG-2537 is for tuples. For bags, I agree a null bag should be dropped (same as an empty bag), and a bag with a null item should generate output according to the schema (same as PIG-2537); if there is no schema, we can generate a single null column. > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5201-v00-testonly.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seems to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
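The null-handling semantics described in that comment can be modeled concretely. This Python sketch is purely illustrative (it is not Pig's code; `flatten_bag` and `schema_width` are hypothetical names): a null or empty bag drops the row, a null item inside a bag emits one null per declared schema field, and without a schema a single null column is emitted.

```python
def flatten_bag(bag, schema_width=None):
    """Model of the FLATTEN(bag) semantics discussed in PIG-5201.

    `bag` is a list of tuples or None; None stands for Pig's null.
    Returns the list of output tuples (one per bag item).
    """
    if bag is None or len(bag) == 0:
        return []                               # null or empty bag: row is dropped
    out = []
    for item in bag:
        if item is None:
            if schema_width is not None:
                out.append((None,) * schema_width)  # nulls per the declared schema
            else:
                out.append((None,))                 # no schema: single null column
        else:
            out.append(tuple(item))                 # normal item: emit its fields
    return out
```

Under this model, FLATTEN of a null bag and of an empty bag both contribute no rows, matching the "null bag should drop" position above.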
[jira] [Commented] (PIG-5201) Null handling on FLATTEN
[ https://issues.apache.org/jira/browse/PIG-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15952064#comment-15952064 ] Daniel Dai commented on PIG-5201: - This is PIG-2537; it is a bug we should fix. > Null handling on FLATTEN > > > Key: PIG-5201 > URL: https://issues.apache.org/jira/browse/PIG-5201 > Project: Pig > Issue Type: Bug >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: pig-5201-v00-testonly.patch > > > Sometimes, FLATTEN(null) or FLATTEN(bag-with-null) seems to produce incorrect > results. > Test code/script to follow. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (PIG-5210) Print MR/Tez plan before launching
[ https://issues.apache.org/jira/browse/PIG-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-5210: --- Assignee: Lili Yu > Print MR/Tez plan before launching > -- > > Key: PIG-5210 > URL: https://issues.apache.org/jira/browse/PIG-5210 > Project: Pig > Issue Type: Improvement >Reporter: Lili Yu >Assignee: Lili Yu > Fix For: 0.17.0 > > > Set pig.print.exec.plan "true" in > src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java > > and > src/org/apache/pig/backend/hadoop/executionengine/tez/TezJobCompiler.java > Print out MR/Tez plan -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4677) Display failure information on stop on failure
[ https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15951709#comment-15951709 ] Daniel Dai commented on PIG-4677: - +1 on PIG-4677-fixflakytest.patch. > Display failure information on stop on failure > -- > > Key: PIG-4677 > URL: https://issues.apache.org/jira/browse/PIG-4677 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11.1 >Reporter: Mit Desai >Assignee: Rohini Palaniswamy > Fix For: 0.17.0 > > Attachments: PIG-4677.2.patch, PIG-4677.3.patch, PIG-4677.4.patch, > PIG-4677-5.patch, PIG-4677-fixflakytest.patch, PIG-4677.patch > > > When stop on failure option is specified, pig abruptly exits without > displaying any job stats or failed job information which it usually does in > case of failures. > {code} > 2015-06-04 20:35:38,170 [uber-SubtaskRunner] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - 9% complete > 2015-06-04 20:35:38,171 [uber-SubtaskRunner] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - Running jobs are > [job_1428329756093_3741748,job_1428329756093_3741752,job_1428329756093_3741753,job_1428329756093_3741754,job_1428329756093_3741756] > 2015-06-04 20:35:40,201 [uber-SubtaskRunner] ERROR > org.apache.pig.tools.grunt.Grunt - ERROR 6017: Job failed! > Hadoop Job IDs executed by Pig: > job_1428329756093_3739816,job_1428329756093_3741752,job_1428329756093_3739814,job_1428329756093_3741748,job_1428329756093_3741756,job_1428329756093_3741753,job_1428329756093_3741754 > <<< Invocation of Main class completed <<< > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (PIG-4677) Display failure information on stop on failure
[ https://issues.apache.org/jira/browse/PIG-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15948477#comment-15948477 ] Daniel Dai commented on PIG-4677: - +1 > Display failure information on stop on failure > -- > > Key: PIG-4677 > URL: https://issues.apache.org/jira/browse/PIG-4677 > Project: Pig > Issue Type: Bug >Affects Versions: 0.11.1 >Reporter: Mit Desai >Assignee: Rohini Palaniswamy > Fix For: 0.17.0 > > Attachments: PIG-4677.2.patch, PIG-4677.3.patch, PIG-4677.4.patch, > PIG-4677-5.patch, PIG-4677.patch > > > When stop on failure option is specified, pig abruptly exits without > displaying any job stats or failed job information which it usually does in > case of failures. > {code} > 2015-06-04 20:35:38,170 [uber-SubtaskRunner] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - 9% complete > 2015-06-04 20:35:38,171 [uber-SubtaskRunner] INFO > org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher > - Running jobs are > [job_1428329756093_3741748,job_1428329756093_3741752,job_1428329756093_3741753,job_1428329756093_3741754,job_1428329756093_3741756] > 2015-06-04 20:35:40,201 [uber-SubtaskRunner] ERROR > org.apache.pig.tools.grunt.Grunt - ERROR 6017: Job failed! > Hadoop Job IDs executed by Pig: > job_1428329756093_3739816,job_1428329756093_3741752,job_1428329756093_3739814,job_1428329756093_3741748,job_1428329756093_3741756,job_1428329756093_3741753,job_1428329756093_3741754 > <<< Invocation of Main class completed <<< > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (PIG-5190) ant docs issue by pig-5110
[ https://issues.apache.org/jira/browse/PIG-5190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-5190. - Resolution: Invalid > ant docs issue by pig-5110 > -- > > Key: PIG-5190 > URL: https://issues.apache.org/jira/browse/PIG-5190 > Project: Pig > Issue Type: Bug >Reporter: Jiang Song > Fix For: 0.17.0 > > > ant docs issue by pig-5110 -- This message was sent by Atlassian JIRA (v6.3.15#6346)