[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-23 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794217#action_12794217
 ] 

Sriranjan Manjunath commented on PIG-1102:
--

(3) refers to the case where we estimate the number of records that fit into 
memory and spill the remaining records to disk. InternalCachedBag.java 
addresses this case:

+if (cacheLimit != 0 && mContents.size() % cacheLimit == 0) {
+    /* Increment the spill count */
+    incSpillCount(PigCounters.PROACTIVE_SPILL_COUNT);
+}
 }

cacheLimit holds the number of records that can be held in memory, whereas 
mContents holds all the records. Here, I do not increment the counter for 
every record. Instead, I count every nth record, n being the cacheLimit.

This, however, does not increment the counter by the buffer size. Incrementing 
it by the buffer size would give us a value approximately equal to the number 
of spilled records.
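
To make the per-buffer counting concrete, here is a minimal, self-contained 
sketch, assuming illustrative class and field names (this is not the code from 
the patch): one spill event is recorded per cacheLimit records, and the count 
is advanced by cacheLimit so that it approximates the number of spilled records.

// Illustrative sketch only; ProactiveSpillEstimate and spilledRecordEstimate
// are assumed names, not identifiers from the patch.
import java.util.ArrayList;
import java.util.List;

class ProactiveSpillEstimate {
    private final int cacheLimit;            // estimated records that fit in memory
    private final List<Object> mContents = new ArrayList<Object>();
    private long spilledRecordEstimate = 0;  // approx. records written to disk

    ProactiveSpillEstimate(int cacheLimit) {
        this.cacheLimit = cacheLimit;
    }

    void add(Object tuple) {
        mContents.add(tuple);
        // Count once per full buffer, not once per record; advancing by
        // cacheLimit approximates the number of spilled records.
        if (cacheLimit != 0 && mContents.size() % cacheLimit == 0) {
            spilledRecordEstimate += cacheLimit;
            // In the patch this is where incSpillCount(PigCounters.PROACTIVE_SPILL_COUNT)
            // would bump a Hadoop counter instead of a local field.
        }
    }

    long getSpilledRecordEstimate() {
        return spilledRecordEstimate;
    }
}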

 Collect number of spills per job
 

 Key: PIG-1102
 URL: https://issues.apache.org/jira/browse/PIG-1102
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
 Fix For: 0.7.0

 Attachments: PIG_1102.patch, PIG_1102.patch.1


 Memory shortage is one of the main performance issues in Pig. Knowing when we 
 spill to disk is useful for understanding query performance and also for 
 seeing how certain changes in Pig affect that.
 Other interesting stats to collect would be average CPU usage and max memory 
 usage, but I am not sure whether this information is easily retrievable.
 Using Hadoop counters for this would make sense.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-22 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793771#action_12793771
 ] 

Sriranjan Manjunath commented on PIG-1102:
--

1. The default is -1 to distinguish it from the case where there is no spill. 
The value stays at -1 only if the counters could not be initialized, which is 
an exceptional case.
2. warn is a misnomer. I reused an existing function that updates the counter 
if it is initialized; if the counter is not initialized, it logs a warning 
instead (see the sketch below).
3, 4. I have fixed these and submitted a new patch.
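
A minimal sketch of that update-or-warn pattern, assuming a hypothetical helper 
name and using Hadoop's Reporter interface (the actual Pig function and 
reporter type may differ):

// Illustrative only; SpillCounterHelper is not the name of the Pig function.
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.mapred.Reporter;

class SpillCounterHelper {
    private static final Log LOG = LogFactory.getLog(SpillCounterHelper.class);
    private final Reporter reporter;   // null if counters were never initialized

    SpillCounterHelper(Reporter reporter) {
        this.reporter = reporter;
    }

    // Increment the given counter if a reporter is available; otherwise warn.
    void incSpillCount(Enum<?> counter, long amount) {
        if (reporter != null) {
            reporter.incrCounter(counter, amount);
        } else {
            LOG.warn("Could not increment spill counter " + counter
                    + ": counters not initialized");
        }
    }
}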





[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793868#action_12793868
 ] 

Olga Natkovich commented on PIG-1102:
-

I could not figure out how the patch addresses (3). Could you please explain?




[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793911#action_12793911
 ] 

Hadoop QA commented on PIG-1102:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428760/PIG_1102.patch.1
  against trunk revision 893053.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 411 release audit warnings 
(more than the trunk's current 408 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/152/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/152/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/152/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/152/console

This message is automatically generated.




[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-21 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793369#action_12793369
 ] 

Olga Natkovich commented on PIG-1102:
-

A few questions/comments on the patch:

(1) I think the count should default to 0, not -1.
(2) Does the increment of the count have to be combined with the warn 
statement? Does this mean that users will see this many warnings? If so, 
should we combine it with the spill message we already print?
(3) I thought we discussed incrementing per buffer, not per record, and 
approximating the count based on the buffer size. I did not see the code that 
does this.
(4) I don't think you correctly separated the bags that proactively spill from 
the bags that are spilled by the memory manager. All the bags created by 
DefaultBagFactory get registered with the SpillableMemoryManager and belong to 
the second category (see the sketch below).
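
To make the distinction in (4) concrete, here is a hedged sketch with 
illustrative names: only PROACTIVE_SPILL_COUNT appears in the patch excerpt 
above; MEMORY_MANAGER_SPILL_COUNT and the surrounding types are assumptions, 
not Pig identifiers.

// Illustrative only: two separate counters for the two spill paths.
enum SpillCounter { PROACTIVE_SPILL_COUNT, MEMORY_MANAGER_SPILL_COUNT }

interface CounterSink {
    void incr(SpillCounter counter, long amount);
}

class SpillPaths {
    private final CounterSink counters;

    SpillPaths(CounterSink counters) {
        this.counters = counters;
    }

    // Called from a self-managed bag (e.g. InternalCachedBag) when it decides
    // on its own that its estimated in-memory budget is exhausted.
    void onProactiveSpill(long recordsSpilled) {
        counters.incr(SpillCounter.PROACTIVE_SPILL_COUNT, recordsSpilled);
    }

    // Called when the SpillableMemoryManager asks a registered bag
    // (e.g. one created by DefaultBagFactory) to spill under memory pressure.
    void onManagedSpill(long recordsSpilled) {
        counters.incr(SpillCounter.MEMORY_MANAGER_SPILL_COUNT, recordsSpilled);
    }
}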





[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-18 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792399#action_12792399
 ] 

Hadoop QA commented on PIG-1102:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428356/PIG_1102.patch
  against trunk revision 892125.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 400 release audit warnings 
(more than the trunk's current 397 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/console

This message is automatically generated.




[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-18 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792612#action_12792612
 ] 

Olga Natkovich commented on PIG-1102:
-

I will be reviewing this patch.




[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-18 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792680#action_12792680
 ] 

Sriranjan Manjunath commented on PIG-1102:
--

I ran the test again on my local machine, and it passes. The test failed 
because of too many open file descriptors. Is this a Hudson-related issue?




[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791810#action_12791810
 ] 

Hadoop QA commented on PIG-1102:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12428253/PIG_1102.patch
  against trunk revision 891499.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 406 release audit warnings 
(more than the trunk's current 403 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/134/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/134/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/134/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/134/console

This message is automatically generated.




[jira] Commented: (PIG-1102) Collect number of spills per job

2009-12-01 Thread Sriranjan Manjunath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784148#action_12784148
 ] 

Sriranjan Manjunath commented on PIG-1102:
--

Hadoop currently does not provide average CPU usage or memory usage per job; 
it does not even provide the number of spills per job. I have created a JIRA 
requesting this: https://issues.apache.org/jira/browse/MAPREDUCE-1257

The only information we can currently gather is the number of spilled records 
(see the sketch below for reading that counter).
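
As a rough sketch of reading that counter from a finished job with the old 
mapred API (Hadoop 0.20 era), assuming the task-counter group and name strings 
of that era, which may differ across Hadoop versions:

// Rough sketch; the counter group/name strings are assumptions.
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class SpilledRecordsReader {
    public static long spilledRecords(JobConf conf, String jobIdStr) throws Exception {
        JobClient client = new JobClient(conf);
        RunningJob job = client.getJob(JobID.forName(jobIdStr));
        Counters counters = job.getCounters();
        // Assumed group/name of the built-in task counter for spilled records.
        Counters.Counter c = counters.findCounter(
                "org.apache.hadoop.mapred.Task$Counter", "SPILLED_RECORDS");
        return c == null ? 0L : c.getCounter();
    }
}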
