Re: Welcome our new Pig PMC chair Daniel Dai

2016-03-23 Thread Xuefu Zhang
Congratulations, Daniel!

On Wed, Mar 23, 2016 at 3:23 PM, Rohini Palaniswamy  wrote:

> Hi folks,
> I am very happy to announce that we elected Daniel Dai as our new Pig
> PMC Chair and it is official now.  Please join me in congratulating Daniel.
>
> Regards,
> Rohini
>


Welcome our new Pig PMC chair Daniel Dai

2016-03-23 Thread Rohini Palaniswamy
Hi folks,
I am very happy to announce that we elected Daniel Dai as our new Pig
PMC Chair and it is official now.  Please join me in congratulating Daniel.

Regards,
Rohini


[jira] [Commented] (PIG-4849) pig on tez will cause tez-ui to crash,because the content from timeline server is too long.

2016-03-23 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209274#comment-15209274
 ] 

Rohini Palaniswamy commented on PIG-4849:
-

  We do run with the following setting in our tez-site.xml. We should also 
probably put this in pig-default.properties so that other users do not hit this 
issue. 

{code}
<property>
  <description>Publish configuration information to Timeline server.</description>
  <name>tez.runtime.convert.user-payload.to.history-text</name>
  <value>false</value>
</property>
{code}

> pig on tez will cause tez-ui to crash,because the content from timeline 
> server is too long. 
> 
>
> Key: PIG-4849
> URL: https://issues.apache.org/jira/browse/PIG-4849
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
> Environment: pig:0.15,tez:0.6.2,hadoop:2.5.0
>Reporter: shenxianqiang
>
> After running several Pig jobs, the tez-ui server crashed because the content
> from the timeline server exceeded 80MB.
> When I request:
> http://timeline:48188/ws/v1/timeline/TEZ_DAG_ID?limit=51
> the content is too long, like this:
> {quote}
> {"vertexName":"scope-18","processorClass":"org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor","userPayloadAsText":"{\"desc\":\"wordcount[4,12]
>  
> (GROUP_BY)\",\"config\":{\"dfs.datanode.data.dir\":\"file:\\/\\/\\/search\\/hadoop\\/dfs_data,\",\"dfs.namenode.checkpoint.txns\":\"100\",\"s3.replication\":\"3\",\"mapreduce.output.fileoutputformat.compress.type\":\"RECORD\",\"mapreduce.jobtracker.jobhistory.lru.cache.size\":\"5\",\"hadoop.http.filter.initializers\":\"org.apache.hadoop.http.lib.StaticUserWebFilter\",\"yarn.nodemanager.keytab\":\"\\/etc\\/krb5.keytab\",\"nfs.mountd.port\":\"4242\",\"yarn.resourcemanager.zk-acl\":\"world:anyone:rwcda\",\"dfs.https.server.keystore.resource\":\"ssl-server.xml\",\"mapr..
> {quote}
> I am surprised that each vertex has a "userPayloadAsText", and its "config"
> information is particularly large.
> When I run a more complex Pig on Tez job, the tez-ui server crashes easily.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Build failed in Jenkins: Pig-trunk-commit #2306

2016-03-23 Thread Apache Jenkins Server
See 

Changes:

[rohini] PIG-4847: POPartialAgg processing and spill improvements (rohini)

[rohini] PIG-4847: POPartialAgg processing and spill improvements (rohini)

--
[...truncated 2589 lines...]
A license/javacc-LICENSE.txt
A license/junit-LICENSE.txt
A license/hadoop-LICENSE.txt
A shims
A shims/src
A shims/src/hadoop20
A shims/src/hadoop20/org
A shims/src/hadoop20/org/apache
A shims/src/hadoop20/org/apache/pig
A shims/src/hadoop20/org/apache/pig/backend
A shims/src/hadoop20/org/apache/pig/backend/hadoop
A shims/src/hadoop20/org/apache/pig/backend/hadoop/executionengine
A shims/src/hadoop20/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer
A shims/src/hadoop20/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapReduce.java
A shims/src/hadoop20/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapBase.java
A shims/src/hadoop20/org/apache/pig/backend/hadoop/executionengine/shims
A shims/src/hadoop20/org/apache/pig/backend/hadoop/executionengine/shims/HadoopShims.java
A shims/src/hadoop20/org/apache/pig/backend/hadoop20
A shims/src/hadoop20/org/apache/pig/backend/hadoop20/PigJobControl.java
A shims/src/hadoop23
A shims/src/hadoop23/org
A shims/src/hadoop23/org/apache
A shims/src/hadoop23/org/apache/pig
A shims/src/hadoop23/org/apache/pig/backend
A shims/src/hadoop23/org/apache/pig/backend/hadoop
A shims/src/hadoop23/org/apache/pig/backend/hadoop/executionengine
A shims/src/hadoop23/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer
A shims/src/hadoop23/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapReduce.java
A shims/src/hadoop23/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigMapBase.java
A shims/src/hadoop23/org/apache/pig/backend/hadoop/executionengine/shims
A shims/src/hadoop23/org/apache/pig/backend/hadoop/executionengine/shims/HadoopShims.java
A shims/src/hadoop23/org/apache/pig/backend/hadoop23
A shims/src/hadoop23/org/apache/pig/backend/hadoop23/PigJobControl.java
A shims/src/hadoop23/org/apache/hadoop
A shims/src/hadoop23/org/apache/hadoop/mapred
A shims/src/hadoop23/org/apache/hadoop/mapred/DowngradeHelper.java
A shims/test
A shims/test/hadoop20
A shims/test/hadoop20/org
A shims/test/hadoop20/org/apache
A shims/test/hadoop20/org/apache/pig
A shims/test/hadoop20/org/apache/pig/test
A shims/test/hadoop20/org/apache/pig/test/MiniCluster.java
A shims/test/hadoop20/org/apache/pig/test/TezMiniCluster.java
A shims/test/hadoop23
A shims/test/hadoop23/org
A shims/test/hadoop23/org/apache
A shims/test/hadoop23/org/apache/pig
A shims/test/hadoop23/org/apache/pig/test
A shims/test/hadoop23/org/apache/pig/test/MiniCluster.java
A shims/test/hadoop23/org/apache/pig/test/TezMiniCluster.java
 U.
At revision 1736379
Cleaning local Directory nightly
Checking out http://svn.apache.org/repos/asf/hadoop/nightly at revision 
'2016-03-23T20:58:22.418 +'
A buildMR-279Branch.sh
AU hudsonBuildHadoopPatch.sh
AU hudsonBuildHadoopRelease.sh
AU processHadoopPatchEmailRemote.sh
AU processHadoopPatchEmail.sh
A README.txt
A test-patch
A test-patch/test-patch.sh
AU tar-munge
A commitBuild.sh
AU jenkinsPrecommitAdmin.py
A hudsonEnv.sh
A jenkinsSetup
A jenkinsSetup/installTools.sh
AU hudsonBuildHadoopNightly.sh
At revision 1736379
no change for http://svn.apache.org/repos/asf/hadoop/nightly since the previous 
build
[Pig-trunk-commit] $ /bin/bash /tmp/hudson2384814851134059296.sh
6


==
==
CLEAN: cleaning workspace
==
==


Buildfile: 

clean:

clean:

clean:

BUILD SUCCESSFUL
Total time: 1 second


==
==
BUILD: ant mvn-deploy -Dtest.junit.output.format=xml 
-Dfindbugs.home=$FINDBUGS_HOME -Djava5.home=$JAVA5_HOME 
-Dforrest.home=$FORREST_HOME -Dclover.home=$CLOVER_HOME 
-Declipse.home=$ECLIPSE_HOME
==
==



[jira] [Updated] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4847:

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Daniel and Koji for the review.

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch, 
> PIG-4847-3-reviewcomments.patch
>
>






[jira] [Commented] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209125#comment-15209125
 ] 

Koji Noguchi commented on PIG-4847:
---

Thanks Rohini.  +1.

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch, 
> PIG-4847-3-reviewcomments.patch
>
>






[jira] [Updated] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4847:

Attachment: PIG-4847-3-reviewcomments.patch

Thanks Koji. Makes sense. Updated patch with review comments.

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch, 
> PIG-4847-3-reviewcomments.patch
>
>






[jira] [Commented] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209059#comment-15209059
 ] 

Koji Noguchi commented on PIG-4847:
---

Can we get rid of the 1G condition? 
Why not simply compare {{30% of oldgen}} and 
{{pig.spill.unused.memory.threshold.size}} and take the smaller one? 
Also maybe use pig.spill.unused.memory.threshold.size=0 to disable this new 
condition.
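
The suggestion above can be sketched roughly as follows. This is only an illustration of the proposal, not the code from the patch: the class and method names (SpillThresholdSketch, spillThreshold) are hypothetical, the parameter unusedThresholdBytes stands in for pig.spill.unused.memory.threshold.size, and integer arithmetic is used for the 30% to keep the numbers exact.

```java
final class SpillThresholdSketch {

    /**
     * Returns the old-gen usage (in bytes) at which a spill would be triggered.
     *
     * unusedThresholdBytes models pig.spill.unused.memory.threshold.size.
     * A value of 0 disables the size-based condition (as proposed above),
     * leaving only the "30% of oldgen" rule.
     */
    static long spillThreshold(long oldGenBytes, long unusedThresholdBytes) {
        // 30% of old gen, computed with integers to avoid rounding surprises
        long unusedByFraction = oldGenBytes * 3 / 10;
        long unused = unusedByFraction;
        if (unusedThresholdBytes > 0) {
            // take the smaller of the two "leave this much unused" amounts
            unused = Math.min(unusedByFraction, unusedThresholdBytes);
        }
        return oldGenBytes - unused;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Small old gen: 30% (300 MB) is smaller than 350 MB, so spill at 700 MB.
        System.out.println(spillThreshold(1000 * mb, 350 * mb) / mb);
        // Large old gen: 350 MB is smaller than 30% (1200 MB), so spill at 3650 MB.
        System.out.println(spillThreshold(4000 * mb, 350 * mb) / mb);
        // Size-based condition disabled: back to 70% of old gen, 2800 MB.
        System.out.println(spillThreshold(4000 * mb, 0) / mb);
    }
}
```

With a single min() there is no need for a separate 1G special case: small heaps are governed by the percentage and large heaps by the fixed size.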

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch
>
>






[jira] [Commented] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15209003#comment-15209003
 ] 

Rohini Palaniswamy commented on PIG-4847:
-

Also wanted to note that I wanted to get rid of the collection threshold
notifications, because when big sort buffers are involved we will keep hitting
them. But I retained them and kept them the same as the usage threshold due to
the below comment in the code.

{code}
// we want to set both collection and usage threshold alerts to be
// safe. In some local tests after a point only collection threshold
// notifications were being sent though usage threshold notifications
// were sent early on.
{code}

Another thing that would happen with this patch is that it might try to free
more memory, as the threshold size is higher now. The spillFileSizeThreshold of
5MB should avoid spilling small bags. But if there is unnecessary spilling of
other bigger bags, we might have to cap toFree at a maximum fixed size instead
of 50% of the threshold size so that toFree is not too big for bigger heaps.
{code}
toFree = info.getUsage().getUsed() - memoryThresholdSize + 
(long)(memoryThresholdSize * 0.5);
{code}
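
To make the concern concrete, here is a small sketch comparing the formula quoted above with one possible capped variant. The class name, the toFreeCapped method and its maxExtraBytes parameter are hypothetical, and capping only the 50% term is just one reading of "cap toFree at a maximum fixed size"; none of this is taken from the patch.

```java
final class ToFreeSketch {

    /** The formula quoted above: free back down to 50% of the threshold below the threshold. */
    static long toFree(long usedBytes, long memoryThresholdSize) {
        return usedBytes - memoryThresholdSize + (long) (memoryThresholdSize * 0.5);
    }

    /**
     * Hypothetical variant: the extra amount beyond the threshold overshoot is
     * limited to maxExtraBytes, so it stops growing with the heap size.
     */
    static long toFreeCapped(long usedBytes, long memoryThresholdSize, long maxExtraBytes) {
        long extra = Math.min((long) (memoryThresholdSize * 0.5), maxExtraBytes);
        return usedBytes - memoryThresholdSize + extra;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Bigger heap: threshold 2800 MB, usage 3000 MB.
        System.out.println(toFree(3000 * mb, 2800 * mb) / mb);                 // 1600
        System.out.println(toFreeCapped(3000 * mb, 2800 * mb, 500 * mb) / mb); // 700
        // Smaller heap: the cap does not kick in, both give the same answer.
        System.out.println(toFree(800 * mb, 700 * mb) / mb);                   // 450
        System.out.println(toFreeCapped(800 * mb, 700 * mb, 500 * mb) / mb);   // 450
    }
}
```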

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch
>
>






[jira] [Commented] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208890#comment-15208890
 ] 

Rohini Palaniswamy commented on PIG-4847:
-

This patch reduced runtime of the problem job which used 4G heap with 
mapPartAgg from 2 hr 40 mins to 43 min. Keeping fingers crossed for any corner 
case OOMs with pig scripts having big bag spills.

bq. Also let's turn on mapPartAgg by default in 0.16.
  Definitely. Will do that in a separate patch before the release. 



> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch
>
>






[jira] [Commented] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208876#comment-15208876
 ] 

Daniel Dai commented on PIG-4847:
-

+1. Also let's turn on mapPartAgg by default in 0.16.

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch
>
>






[jira] [Updated] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4847:

Attachment: PIG-4847-2.patch

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch
>
>






[jira] [Updated] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4847:

Attachment: (was: PIG-4847-2.patch)

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch
>
>






[jira] [Updated] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4847:

Status: Patch Available  (was: Open)

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch
>
>






[jira] [Updated] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4847:

Attachment: PIG-4847-2.patch

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch, PIG-4847-2.patch
>
>






[jira] [Commented] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208727#comment-15208727
 ] 

Rohini Palaniswamy commented on PIG-4847:
-

Better to make everything configurable. Attaching a new patch.

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch
>
>






[jira] [Commented] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208635#comment-15208635
 ] 

Daniel Dai commented on PIG-4847:
-

+1

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch
>
>






[jira] [Updated] (PIG-4847) POPartialAgg processing and spill improvements

2016-03-23 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-4847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4847:

Attachment: PIG-4847-1.patch

Changes done:
SpillableMemoryManager:
- Set the collection and usage threshold to 70% of old gen or
(old gen size - 350 MB). This avoids unnecessary spills with bigger heaps.
Previously the collection threshold was set at 50%, which was causing
unnecessary spills. Especially in Tez with multiple inputs and outputs, the
sort buffers (io.sort.mb) can take up a lot of space. For example, one user had
a 2G heap configured and io.sort.mb 896. Spill was triggered at around 1G many
times because the 896MB sort buffer cannot be GCed, so the collection threshold
was hit way too often.

POPartialAgg:
   - For the same case above with thresholds - Primary: 170629. Secondary:
28438, due to the early trigger of spills there would be only < 1000 entries in
primary before POPartialAgg.spill() is invoked. The secondary value stayed
around 20K, so aggregation was very inefficient.
   - Avoided running through valuePlans if there was only a single tuple.
   - Update processedMap in place instead of creating another hashmap of the
same size.

> POPartialAgg processing and spill improvements
> --
>
> Key: PIG-4847
> URL: https://issues.apache.org/jira/browse/PIG-4847
> Project: Pig
>  Issue Type: Improvement
>Reporter: Rohini Palaniswamy
>Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4847-1.patch
>
>






[jira] [Commented] (PIG-4849) pig on tez will cause tez-ui to crash,because the content from timeline server is too long.

2016-03-23 Thread shenxianqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208139#comment-15208139
 ] 

shenxianqiang commented on PIG-4849:


In src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java,
the newVertex function calls (vertexInfo, payloadConf).
This is what causes tez-ui to crash.
However, in tez-0.6.2 there is a
property: tez.runtime.convert.user-payload.to.history-text
It is mainly used to control the output of the user payload information.
Should we fix this bug in Tez or in Pig? Any suggestions?

> pig on tez will cause tez-ui to crash,because the content from timeline 
> server is too long. 
> 
>
> Key: PIG-4849
> URL: https://issues.apache.org/jira/browse/PIG-4849
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.15.0
> Environment: pig:0.15,tez:0.6.2,hadoop:2.5.0
>Reporter: shenxianqiang
>
> After running several Pig jobs, the tez-ui server crashed because the content
> from the timeline server exceeded 80MB.
> When I request:
> http://timeline:48188/ws/v1/timeline/TEZ_DAG_ID?limit=51
> the content is too long, like this:
> {quote}
> {"vertexName":"scope-18","processorClass":"org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor","userPayloadAsText":"{\"desc\":\"wordcount[4,12]
>  
> (GROUP_BY)\",\"config\":{\"dfs.datanode.data.dir\":\"file:\\/\\/\\/search\\/hadoop\\/dfs_data,\",\"dfs.namenode.checkpoint.txns\":\"100\",\"s3.replication\":\"3\",\"mapreduce.output.fileoutputformat.compress.type\":\"RECORD\",\"mapreduce.jobtracker.jobhistory.lru.cache.size\":\"5\",\"hadoop.http.filter.initializers\":\"org.apache.hadoop.http.lib.StaticUserWebFilter\",\"yarn.nodemanager.keytab\":\"\\/etc\\/krb5.keytab\",\"nfs.mountd.port\":\"4242\",\"yarn.resourcemanager.zk-acl\":\"world:anyone:rwcda\",\"dfs.https.server.keystore.resource\":\"ssl-server.xml\",\"mapr..
> {quote}
> I am surprised that each vertex has a "userPayloadAsText", and its "config"
> information is particularly large.
> When I run a more complex Pig on Tez job, the tez-ui server crashes easily.





[jira] [Created] (PIG-4849) pig on tez will cause tez-ui to crash,because the content from timeline server is too long.

2016-03-23 Thread shenxianqiang (JIRA)
shenxianqiang created PIG-4849:
--

 Summary: pig on tez will cause tez-ui to crash,because the content 
from timeline server is too long. 
 Key: PIG-4849
 URL: https://issues.apache.org/jira/browse/PIG-4849
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.15.0
 Environment: pig:0.15,tez:0.6.2,hadoop:2.5.0
Reporter: shenxianqiang


After running several Pig jobs, the tez-ui server crashed because the content
from the timeline server exceeded 80MB.
When I request:
http://timeline:48188/ws/v1/timeline/TEZ_DAG_ID?limit=51
the content is too long, like this:
{quote}
{"vertexName":"scope-18","processorClass":"org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor","userPayloadAsText":"{\"desc\":\"wordcount[4,12]
 
(GROUP_BY)\",\"config\":{\"dfs.datanode.data.dir\":\"file:\\/\\/\\/search\\/hadoop\\/dfs_data,\",\"dfs.namenode.checkpoint.txns\":\"100\",\"s3.replication\":\"3\",\"mapreduce.output.fileoutputformat.compress.type\":\"RECORD\",\"mapreduce.jobtracker.jobhistory.lru.cache.size\":\"5\",\"hadoop.http.filter.initializers\":\"org.apache.hadoop.http.lib.StaticUserWebFilter\",\"yarn.nodemanager.keytab\":\"\\/etc\\/krb5.keytab\",\"nfs.mountd.port\":\"4242\",\"yarn.resourcemanager.zk-acl\":\"world:anyone:rwcda\",\"dfs.https.server.keystore.resource\":\"ssl-server.xml\",\"mapr..
{quote}
I am surprised that each vertex has a "userPayloadAsText", and its "config"
information is particularly large.
When I run a more complex Pig on Tez job, the tez-ui server crashes easily.





[jira] [Commented] (PIG-4848) pig.noSplitCombination=true should always be set internally for a merge join

2016-03-23 Thread Xianda Ke (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15208067#comment-15208067
 ] 

Xianda Ke commented on PIG-4848:


In MR mode, the flag was set as true internally for a merge join.
{code}
MRCompiler.visitMergeJoin() {
 //...
 curMROp.noCombineSmallSplits();
 //...
}
{code}
and
{code}
JobControlCompiler.getJob() {
    //..
    if (!mro.combineSmallSplits() ||
            pigContext.getProperties().getProperty("pig.splitCombination", "true").equals("false"))
        conf.setBoolean("pig.noSplitCombination", true);
    //..
}
{code}

However, it doesn't work now in MR mode. The output is still out of order, 
because the input splits of pig will be sorted again based on size by hadoop.
{code:title=org.apache.hadoop.mapreduce.JobSubmitter.java}
writeNewSplits () {
    List<InputSplit> splits = input.getSplits(job);
    //...
    T[] array = (T[]) splits.toArray(new InputSplit[splits.size()]);

    // sort the splits into order based on size, so that the biggest
    // go first
    Arrays.sort(array, new SplitComparator());
    JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
        jobSubmitDir.getFileSystem(conf), array);
}
{code}

In spark mode, there is no such sorting. If we set pig.noSplitCombination=true
internally, it should work.
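
The effect of that size-based sort can be reproduced in isolation. The sketch below does not use any Hadoop classes: Split is a hypothetical stand-in for InputSplit, and the byte lengths are assumed values for the three files in the scenario (two short files and one longer one).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SplitOrderSketch {

    /** Hypothetical stand-in for an input split: a file name plus its length in bytes. */
    static final class Split {
        final String name;
        final long length;

        Split(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    /** Sorts the splits so that the biggest go first and returns the resulting file order. */
    static List<String> orderAfterSizeSort(List<Split> splits) {
        Split[] array = splits.toArray(new Split[0]);
        Arrays.sort(array, new Comparator<Split>() {
            @Override
            public int compare(Split a, Split b) {
                return Long.compare(b.length, a.length); // descending by size
            }
        });
        List<String> names = new ArrayList<String>();
        for (Split s : array) {
            names.add(s.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Assumed sizes: input1 and input2 hold one short row each, input3 a longer row.
        List<Split> splits = Arrays.asList(
                new Split("input1", 4),
                new Split("input2", 4),
                new Split("input3", 6));
        System.out.println(orderAfterSizeSort(splits)); // [input3, input1, input2]
    }
}
```

Because Arrays.sort is stable for object arrays, the two equal-sized files keep their relative order and only the largest one jumps to the front, which matches the out-of-order result reported in the issue.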

> pig.noSplitCombination=true should always be set internally for a merge join
> 
>
> Key: PIG-4848
> URL: https://issues.apache.org/jira/browse/PIG-4848
> Project: Pig
>  Issue Type: Sub-task
>  Components: spark
>Reporter: Xianda Ke
>Assignee: Xianda Ke
> Fix For: spark-branch
>
>
> In spark mode, for a merge join, the flag is NOT set as true internally. The 
> input splits will be in the order of file size. The output is out of order.
> Scenario:
> cat input1
> {code}
> 1 1
> {code}
> cat input2
> {code}
> 2 2
> {code}
> cat input3
> {code}
> 33 33
> {code}
> A = LOAD 'input*' as (a:int, b:int);
> B = LOAD 'input*' as (a:int, b:int);
> C = JOIN A BY $0, B BY $0 USING 'merge';
> DUMP C;
> expected result:
> {code}
> (1,1,1,1)
> (2,2,2,2)
> (33,33,33,33)
> {code}
> actual result:
> {code}
> (33,33,33,33)
> (1,1,1,1)
> (2,2,2,2)
> {code}
> In MR mode, the flag was set as true internally for a merge join(see: 
> PIG-2773). However, it doesn't work now. The output is still out of order, 
> because the splits will be ordered again by hadoop-client. In spark mode, we 
> can solve this issue.





[jira] Subscription: PIG patch available

2016-03-23 Thread jira
Issue Subscription
Filter: PIG patch available (30 issues)

Subscriber: pigdaily

Key         Summary
PIG-4788    the value BytesRead metric info always returns 0 even the length of input file is not 0 in spark engine
            https://issues.apache.org/jira/browse/PIG-4788
PIG-4771    Implement FR Join for spark engine
            https://issues.apache.org/jira/browse/PIG-4771
PIG-4745    DataBag should protect content of passed list of tuples
            https://issues.apache.org/jira/browse/PIG-4745
PIG-4734    TOMAP schema inferring breaks some scripts in type checking for bincond
            https://issues.apache.org/jira/browse/PIG-4734
PIG-4684    Exception should be changed to warning when job diagnostics cannot be fetched
            https://issues.apache.org/jira/browse/PIG-4684
PIG-4656    Improve String serialization and comparator performance in BinInterSedes
            https://issues.apache.org/jira/browse/PIG-4656
PIG-4641    Print the instance of Object without using toString()
            https://issues.apache.org/jira/browse/PIG-4641
PIG-4598    Allow user defined plan optimizer rules
            https://issues.apache.org/jira/browse/PIG-4598
PIG-4581    thread safe issue in NodeIdGenerator
            https://issues.apache.org/jira/browse/PIG-4581
PIG-4551    Partition filter is not pushed down in case of SPLIT
            https://issues.apache.org/jira/browse/PIG-4551
PIG-4539    New PigUnit
            https://issues.apache.org/jira/browse/PIG-4539
PIG-4526    Make setting up the build environment easier
            https://issues.apache.org/jira/browse/PIG-4526
PIG-4515    org.apache.pig.builtin.Distinct throws ClassCastException
            https://issues.apache.org/jira/browse/PIG-4515
PIG-4455    Should use DependencyOrderWalker instead of DepthFirstWalker in MRPrinter
            https://issues.apache.org/jira/browse/PIG-4455
PIG-4341    Add CMX support to pig.tmpfilecompression.codec
            https://issues.apache.org/jira/browse/PIG-4341
PIG-4323    PackageConverter hanging in Spark
            https://issues.apache.org/jira/browse/PIG-4323
PIG-4313    StackOverflowError in LIMIT operation on Spark
            https://issues.apache.org/jira/browse/PIG-4313
PIG-4251    Pig on Storm
            https://issues.apache.org/jira/browse/PIG-4251
PIG-4111    Make Pig compiles with avro-1.7.7
            https://issues.apache.org/jira/browse/PIG-4111
PIG-4002    Disable combiner when map-side aggregation is used
            https://issues.apache.org/jira/browse/PIG-4002
PIG-3952    PigStorage accepts '-tagSplit' to return full split information
            https://issues.apache.org/jira/browse/PIG-3952
PIG-3911    Define unique fields with @OutputSchema
            https://issues.apache.org/jira/browse/PIG-3911
PIG-3906    ant site errors out
            https://issues.apache.org/jira/browse/PIG-3906
PIG-3877    Getting Geo Latitude/Longitude from Address Lines
            https://issues.apache.org/jira/browse/PIG-3877
PIG-3873    Geo distance calculation using Haversine
            https://issues.apache.org/jira/browse/PIG-3873
PIG-3866    Create ThreadLocal classloader per PigContext
            https://issues.apache.org/jira/browse/PIG-3866
PIG-3864    ToDate(userstring, format, timezone) computes DateTime with strange handling of Daylight Saving Time with location based timezones
            https://issues.apache.org/jira/browse/PIG-3864
PIG-3851    Upgrade jline to 2.11
            https://issues.apache.org/jira/browse/PIG-3851
PIG-3668    COR built-in function when atleast one of the coefficient values is NaN
            https://issues.apache.org/jira/browse/PIG-3668
PIG-3587    add functionality for rolling over dates
            https://issues.apache.org/jira/browse/PIG-3587

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=16328=12322384