[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498819#comment-14498819
 ] 

Rohini Palaniswamy commented on PIG-4509:
-

Committed to trunk. Thanks for reporting this and the review [~thejas].

> [Pig on Tez] Unassigned applications not killed on shutdown
> ---
>
> Key: PIG-4509
> URL: https://issues.apache.org/jira/browse/PIG-4509
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4509-1.patch, PIG-4509-FixCompileError.patch
>
>
>  tezclient.stop() should be called when tezClient.waitTillReady() is 
> interrupted on shutdown.
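
The behaviour described in the issue summary can be sketched as follows. This is a minimal illustration, not the committed patch: SessionClient is a hypothetical stand-in for the Tez client, and the method and class names are assumptions made for this example.

```java
public class StopOnInterruptSketch {

    /** Hypothetical stand-in for a session client; not the real Tez API. */
    interface SessionClient {
        void waitTillReady() throws InterruptedException;
        void stop();
    }

    /** Waits for the session; on interrupt, stops the client before rethrowing. */
    static void startSession(SessionClient client) throws InterruptedException {
        try {
            client.waitTillReady();
        } catch (InterruptedException e) {
            // The application may still be unassigned at this point, so ask
            // the client to stop it instead of leaving it behind.
            client.stop();
            throw e;
        }
    }

    /** Runs the sketch against a fake client whose wait is always interrupted. */
    static boolean stopCalledOnInterrupt() {
        final boolean[] stopped = {false};
        SessionClient fake = new SessionClient() {
            @Override
            public void waitTillReady() throws InterruptedException {
                throw new InterruptedException("simulated shutdown");
            }

            @Override
            public void stop() {
                stopped[0] = true;
            }
        };
        try {
            startSession(fake);
        } catch (InterruptedException expected) {
            // The interrupt is still propagated to the caller.
        }
        return stopped[0];
    }

    public static void main(String[] args) {
        System.out.println("stop() called: " + stopCalledOnInterrupt());
    }
}
```

Catching InterruptedException specifically, rather than a broader type, keeps the rethrow valid under every Java source level.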



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498812#comment-14498812
 ] 

Thejas M Nair commented on PIG-4509:


+1
The change looks good to me.
Thanks Rohini!





[jira] [Updated] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4509:

Attachment: PIG-4509-FixCompileError.patch

Attached fix. 

Was surprised that I did not get a compilation error. Just wanted to be sure 
that I don't have a screwed up build environment. Interesting to know that the 
JDK on Mac does not fail compilation for this.



[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498787#comment-14498787
 ] 

Thejas M Nair commented on PIG-4509:


It builds fine on my Mac as well with JDK 7. However, it is failing with JDK 7 
in our internal build environment (probably Linux).

The fact that it passes in some setups is certainly very strange. I think we 
should still go ahead and fix this; as far as I know, this should result in a 
compilation error.


> [Pig on Tez] Unassigned applications not killed on shutdown
> ---
>
> Key: PIG-4509
> URL: https://issues.apache.org/jira/browse/PIG-4509
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4509-1.patch
>
>
>  tezclient.stop() should be called when tezClient.waitTillReady() is 
> interrupted on shutdown.





[jira] [Updated] (PIG-4489) Enable local mode tests for Spark engine

2015-04-16 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated PIG-4489:
-
Attachment: PIG-4489.1.patch

> Enable local mode tests for Spark engine
> 
>
> Key: PIG-4489
> URL: https://issues.apache.org/jira/browse/PIG-4489
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4489.1.patch, PIG-4489.patch
>
>
> Util.getLocalTestMode() currently only returns "tez_local" or "local".
> I see that ~212 test cases do this check, and we are not running these tests 
> against Spark at this point.
> Currently all Spark tests run in local mode ("local" as the Spark cluster 
> URL passed to JavaSparkContext), so we should enable these tests as well.
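
A minimal sketch of the kind of change the summary describes, assuming the local mode is chosen from an engine name. The method below is hypothetical and is not the body of Util.getLocalTestMode(); in particular, the "spark_local" value is an assumed name for the new option, since the description only confirms "tez_local" and "local".

```java
public class LocalTestModeSketch {

    /**
     * Hypothetical helper: maps an engine name to a local test mode.
     * "tez_local" and "local" come from the issue description;
     * "spark_local" is an assumed name used only for illustration.
     */
    static String localTestMode(String engine) {
        if ("tez".equals(engine)) {
            return "tez_local";
        }
        if ("spark".equals(engine)) {
            return "spark_local";
        }
        // Anything else, including null, falls back to plain local mode.
        return "local";
    }

    public static void main(String[] args) {
        System.out.println(localTestMode("tez"));
        System.out.println(localTestMode("spark"));
        System.out.println(localTestMode(null));
    }
}
```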





[jira] [Commented] (PIG-4489) Enable local mode tests for Spark engine

2015-04-16 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498432#comment-14498432
 ] 

Mohit Sabharwal commented on PIG-4489:
--

FYI [~kellyzly] and [~praveenr019], please take a look at the attached patch. 
I will link it to RB later; I am having some issues there.

We'll be running tests with this patch that we weren't running before. For 
example:
{code}
ant test-spark -Dhadoopversion=23 -Dtestcase=TestLoadStoreFuncLifeCycle
{code}

> Enable local mode tests for Spark engine
> 
>
> Key: PIG-4489
> URL: https://issues.apache.org/jira/browse/PIG-4489
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
> Attachments: PIG-4489.patch
>
>
> Util.getLocalTestMode() currently only returns "tez_local" or "local".
> I see that ~212 test cases do this check, and we are not running these tests 
> against Spark at this point.
> Currently all Spark tests run in local mode ("local" as the Spark cluster 
> URL passed to JavaSparkContext), so we should enable these tests as well.





[jira] [Updated] (PIG-4489) Enable local mode tests for Spark engine

2015-04-16 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated PIG-4489:
-
Attachment: PIG-4489.patch



[jira] [Updated] (PIG-4489) Enable local mode tests for Spark engine

2015-04-16 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated PIG-4489:
-
Status: Patch Available  (was: Open)



[jira] [Updated] (PIG-4489) Enable local mode tests for Spark engine

2015-04-16 Thread Mohit Sabharwal (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mohit Sabharwal updated PIG-4489:
-
Summary: Enable local mode tests for Spark engine  (was: Add spark option 
to Util.getLocalTestMode() )

> Enable local mode tests for Spark engine
> 
>
> Key: PIG-4489
> URL: https://issues.apache.org/jira/browse/PIG-4489
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Mohit Sabharwal
>Assignee: Mohit Sabharwal
> Fix For: spark-branch
>
>
> Util.getLocalTestMode() currently only returns "tez_local" or "local".
> I see that ~212 test cases do this check, and we are not running these tests 
> against Spark at this point.
> Currently all Spark tests run in local mode ("local" as the Spark cluster 
> URL passed to JavaSparkContext), so we should enable these tests as well.





[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498377#comment-14498377
 ] 

Rohini Palaniswamy commented on PIG-4509:
-

[~knoguchi] tried and it compiles fine for him too. I have JDK 8 in Eclipse and 
JDK 7 for ant, but it failed for me when I compiled with JDK 6 on a Linux box. 
[~thejas], which JDK version are you using, and is it on Mac or Linux?



[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498357#comment-14498357
 ] 

Rohini Palaniswamy commented on PIG-4509:
-

I see the problem with the code. But it is odd that it builds fine for me with 
ant clean jar -Dhadoopversion=23 and that Eclipse does not show any errors 
either. Let me put up a patch to fix that.



[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498325#comment-14498325
 ] 

Thejas M Nair commented on PIG-4509:


[~rohini] This results in a compilation failure. 

{code}
src/org/apache/pig/backend/hadoop/executionengine/tez/TezSessionManager.java:105: error: unreported exception Throwable; must be caught or declared to be thrown
[javac] throw e;
[javac] ^
{code}
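
One Java language rule that can produce this kind of setup-dependent result is the more precise rethrow analysis introduced in Java SE 7. The thread does not confirm that this is what happened here, so the sketch below is only an illustration under that assumption; the class and method names are made up for the example. At source level 7 or later, a caught Throwable that is never reassigned may be rethrown from a method that declares only the checked exceptions the try block can actually throw. At source level 6 the same code is rejected with "unreported exception Throwable".

```java
import java.io.IOException;

public class PreciseRethrowSketch {

    static void mayFail(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("simulated failure");
        }
    }

    // Declares only IOException, yet catches and rethrows Throwable.
    // Accepted at source level 7 or later because e is never reassigned;
    // rejected at source level 6 unless the method declares Throwable.
    static void cleanupAndRethrow(boolean fail) throws IOException {
        try {
            mayFail(fail);
        } catch (Throwable e) {
            // Cleanup work would go here before propagating the failure.
            throw e;
        }
    }

    static String describe(boolean fail) {
        try {
            cleanupAndRethrow(fail);
            return "ok";
        } catch (IOException e) {
            return "rethrown: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(false));
        System.out.println(describe(true));
    }
}
```

Catching the specific exception types, or wrapping the Throwable before rethrowing, avoids depending on the compiler's source level.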



[jira] [Comment Edited] (PIG-4505) [Pig on Tez] Auto adjust AM memory can hit OOM with 3.5GXmx

2015-04-16 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497370#comment-14497370
 ] 

Rohini Palaniswamy edited comment on PIG-4505 at 4/16/15 3:28 PM:
--

The AM creates one thread for each NodeManager it needs to talk to. That count 
can be high in a big cluster with many nodes when the number of tasks is high, 
which requires more native memory (thread stack size). More threads not only 
use up pmem, but also allocate a lot more vmem, most of which goes unused. In 
fact, JDK 8 and 64-bit JVMs use a lot more virtual address space. Since the 
pmem-vmem ratio check is usually turned off or set to a higher value in Hadoop 
2.x when JDK 8 is being used, virtual address space usage exceeding the 
container size or the pmem-vmem ratio is not an issue for 64-bit JVMs. But a 
32-bit JVM (irrespective of the value of the pmem-vmem ratio) cannot go beyond 
the 4G limit for allocating virtual address space, and that is where it hits 
the problem. So we need to keep the heap size small, leaving room for it to not 
hit the 4G limit.


was (Author: rohini):
AM creates one thread for each nodemanager it needs to talk to and can be high 
in a big cluster with more nodes when number of tasks is high requiring more 
native memory (thread stack size). More threads will not only use up pmem, but 
also will allocate lot more vmem most of which mostly go unused. In fact jdk8 
and 64bit jvms use lot more virtual address space. Since pmem-vmem ratio is 
turned off in Hadoop 2.x, virtual address space usage exceeding container size 
or pmem-vmem ratio is not an issue. But with 32-bit jvm, jvm cannot go beyond 
4G limit for allocating virtual address space and that is where it hits the 
problem. So need to keep the heap size small giving room for it to not hit 4G 
limit.

> [Pig on Tez] Auto adjust AM memory can hit OOM with 3.5GXmx
> ---
>
> Key: PIG-4505
> URL: https://issues.apache.org/jira/browse/PIG-4505
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4505-1.patch
>
>
>   If the cluster is big and can launch many containers, Tez can try to create 
> more threads to talk to the nodes and that can cause 
> java.lang.OutOfMemoryError: unable to create new native thread as there is 
> only 512MB of native memory (4GB limit due to 32-bit jvm) to create threads. 
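
The arithmetic behind the 512MB figure can be sketched as below. The 4GB address space and the 3.5G heap come from the issue; the per-thread stack sizes are assumptions chosen only for illustration, and the result is an upper bound, since the same native memory also has to hold everything else the JVM allocates outside the heap.

```java
public class NativeThreadBudgetSketch {

    /** Upper bound on thread count if all non-heap address space went to stacks. */
    static long maxThreads(long addressSpaceBytes, long heapBytes, long stackBytes) {
        return (addressSpaceBytes - heapBytes) / stackBytes;
    }

    public static void main(String[] args) {
        long addressSpace = 4L << 30;   // 4GB limit of a 32-bit JVM
        long heap = 3584L << 20;        // 3.5G heap (-Xmx)
        long nativeLeft = addressSpace - heap;
        System.out.println("native memory left (MB): " + (nativeLeft >> 20));

        // Stack sizes below are assumed values, not taken from the issue.
        System.out.println("threads at 1MB stacks:   " + maxThreads(addressSpace, heap, 1L << 20));
        System.out.println("threads at 512KB stacks: " + maxThreads(addressSpace, heap, 512L << 10));
    }
}
```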





[jira] [Resolved] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy resolved PIG-4509.
-
  Resolution: Fixed
Hadoop Flags: Reviewed

Committed to trunk. Thanks for the review Daniel.



[jira] [Updated] (PIG-4508) [Pig on Tez] PigProcessor check for commit only on MROutput

2015-04-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4508:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the review Daniel.

> [Pig on Tez] PigProcessor check for commit only on MROutput
> ---
>
> Key: PIG-4508
> URL: https://issues.apache.org/jira/browse/PIG-4508
> Project: Pig
>  Issue Type: Bug
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4508-1.patch, PIG-4508-2.patch
>
>






[jira] [Updated] (PIG-4505) [Pig on Tez] Auto adjust AM memory can hit OOM with 3.5GXmx

2015-04-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4505:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks for the review Daniel.



[jira] [Updated] (PIG-4503) [Pig on Tez] NPE in UnionOptimizer with multiple levels of union

2015-04-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4503:

Attachment: PIG-4503-1.patch

Changes done:
   - Reversed the equals checks in UnionOptimizer to avoid the NPE.
   - Added an additional store to TestTezCompiler.testUnionUnion to simulate 
this issue.
   - testUnionScalar is a new test. It is not related to this issue, but I 
included it because I wrote it while initially debugging the big script in 
this issue, and it is a good one to have to cover the missing case of union 
and scalar. 
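
Reversing an equals check to avoid an NPE can be illustrated with plain strings. This is a sketch of the general technique, not the UnionOptimizer code: it assumes the value that may be null plays the role of getOutput() and that the operator key string is never null.

```java
public class NullSafeEqualsSketch {

    /** Original order: throws NullPointerException when output is null. */
    static boolean matchesUnsafe(String output, String operatorKey) {
        return output.equals(operatorKey);
    }

    /** Reversed order: the non-null key is the receiver, so a null output is just "no match". */
    static boolean matchesSafe(String output, String operatorKey) {
        return operatorKey.equals(output);
    }

    public static void main(String[] args) {
        System.out.println(matchesSafe(null, "scope-42"));
        System.out.println(matchesSafe("scope-42", "scope-42"));
        try {
            matchesUnsafe(null, "scope-42");
        } catch (NullPointerException e) {
            System.out.println("unsafe order threw NullPointerException");
        }
    }
}
```

An explicit null check before the comparison, as the issue description suggests, has the same effect and makes the intent more visible.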

> [Pig on Tez] NPE in UnionOptimizer with multiple levels of union
> 
>
> Key: PIG-4503
> URL: https://issues.apache.org/jira/browse/PIG-4503
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4503-1.patch
>
>
> When there are multiple levels of union, with the last union having both a 
> store and an output (group by, join, etc.) following it, there is an NPE in 
> {code}
> if (succ.isVertexGroup()
>         && succ.getVertexGroupInfo().getOutput()
>                 .equals(succOp.getOperatorKey().toString())) {
>     succOpVertexGroup = succ;
>     break;
> }
> {code}
> It should check for getOutput() != null as it now has a store vertexgroup.





[jira] [Updated] (PIG-4503) [Pig on Tez] NPE in UnionOptimizer with multiple levels of union

2015-04-16 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4503:

Status: Patch Available  (was: Open)



[jira] [Commented] (PIG-4418) NullPointerException in JVMReuseImpl

2015-04-16 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498075#comment-14498075
 ] 

Rohini Palaniswamy commented on PIG-4418:
-

Just general prevention

> NullPointerException in JVMReuseImpl
> 
>
> Key: PIG-4418
> URL: https://issues.apache.org/jira/browse/PIG-4418
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.14.0
>Reporter: Jeff Zhang
>Assignee: Rohini Palaniswamy
> Fix For: 0.15.0
>
> Attachments: PIG-4418-1.patch
>
>
> {code}
> 2015-02-13 15:17:11,067 INFO [TezChild] task.TezTaskRunner: Encounted an error while executing task: attempt_1423730493153_0019_1_04_02_0
> java.lang.NullPointerException
>   at org.apache.pig.JVMReuseImpl.cleanupStaticData(JVMReuseImpl.java:46)
>   at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.close(PigProcessor.java:175)
>   at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.close(LogicalIOProcessorRuntimeTask.java:338)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:171)
>   at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:166)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}





Jenkins build is back to normal : Pig-trunk-commit #2098

2015-04-16 Thread Apache Jenkins Server
See 



[jira] [Commented] (PIG-4507) Problem with REGEX which just match for the first word

2015-04-16 Thread Adrien Bidault (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497873#comment-14497873
 ] 

Adrien Bidault commented on PIG-4507:
-

Thanks for your answer, I really appreciate it.

We have thought of a similar technique. However, there is a particularity to 
the punctuation cleaning in our case: we need to suppress all the dots except 
for those that belong to a "word" in a given list (for instance, since ".net" 
is part of this list, the dot that is part of this word must not be cleaned 
out). This list is pretty extensive, so looping through it with a condition 
seems tedious and inefficient. This is where the idea of the REGEX treatment 
stemmed from. 

At the moment we are doing this, but it is not effective because the list of 
"reserved terms" may contain pairs of words, and the comparison with a single 
token never matches.
Example:
Now we have (.net)
  (3.0)
And if we just keep the pairs of terms (.net 3.0), it can't work here.

Consequently, we need to apply the REGEX to the entire string (not a 
collection of tokens).

clean2 = FOREACH clean1 GENERATE id, FLATTEN(TOKENIZE(query)) as query;
clean3 = FILTER clean2 by query MATCHES '(.net)|(.net 3.0)|(.net 4.0)|.*(\\w+).*' ; (it's just an extract of the REGEX)

Regards

Adrien
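
Since Pig patterns use Java regular expression syntax, the idea of matching against the entire string can be sketched in plain Java. This is an illustration only, not Pig's REGEX_EXTRACT_ALL implementation, and the class and method names are assumptions. It builds one alternation with the reserved terms first (longest first, quoted literally) followed by \w+, then walks the whole string with Matcher.find() so every match is collected rather than only the first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReservedTermTokenizerSketch {

    /** Returns reserved terms and plain words found in text, in order of appearance. */
    static List<String> tokens(String text, List<String> reservedTerms) {
        // Longest terms first so ".net 3.0" wins over ".net" at the same position.
        List<String> sorted = new ArrayList<>(reservedTerms);
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));

        StringBuilder alternation = new StringBuilder();
        for (String term : sorted) {
            // Pattern.quote keeps the dot literal instead of "any character".
            alternation.append(Pattern.quote(term)).append('|');
        }
        alternation.append("\\w+");

        Matcher matcher = Pattern.compile(alternation.toString()).matcher(text);
        List<String> result = new ArrayList<>();
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("toto,  likes ... to play ", Collections.<String>emptyList()));
        System.out.println(tokens("we use .net 3.0, and more.",
                Arrays.asList(".net", ".net 3.0", ".net 4.0")));
    }
}
```

Note that an unquoted dot in a pattern such as '(.net)' matches any character, so "xnet" would also match; quoting or escaping the dot avoids that.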


> Problem with REGEX which just match for the first word
> --
>
> Key: PIG-4507
> URL: https://issues.apache.org/jira/browse/PIG-4507
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.12.0
> Environment: IBM Infosphere BigInsights v3.0.0.1
>Reporter: Adrien Bidault
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> I am trying to eliminate punctuation and special symbols from a string using 
> a REGEX of the type "(\\w+)". The problem is that this REGEX treatment is 
> applied to the first word of the string only.
> Example:
> clean3 = FOREACH clean1 GENERATE id, REGEX_EXTRACT_ALL('toto,  likes ... to play ', '(\\w+)');
> It just returns "toto" instead of "toto likes to play".
> Would you guys have any ideas?





[jira] [Commented] (PIG-4418) NullPointerException in JVMReuseImpl

2015-04-16 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497699#comment-14497699
 ] 

Daniel Dai commented on PIG-4418:
-

Rohini, did you root-cause it, or is this just a general prevention of the NPE?



[jira] [Commented] (PIG-4508) [Pig on Tez] PigProcessor check for commit only on MROutput

2015-04-16 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497693#comment-14497693
 ] 

Daniel Dai commented on PIG-4508:
-

+1



[jira] [Commented] (PIG-4509) [Pig on Tez] Unassigned applications not killed on shutdown

2015-04-16 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497675#comment-14497675
 ] 

Daniel Dai commented on PIG-4509:
-

+1



[jira] [Commented] (PIG-4505) [Pig on Tez] Auto adjust AM memory can hit OOM with 3.5GXmx

2015-04-16 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14497670#comment-14497670
 ] 

Daniel Dai commented on PIG-4505:
-

+1
