[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055199#comment-15055199
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

[~lewuathe], thanks for working on this issue. About the patch, We don't need 
to calculate the count for the entries being removed.
Can we do all the calculations in the {{else}} section:
{code}
if(firstValue.didMoveFail() &&
firstValue.jobIndexInfo.getFinishTime() <= cutoff) {
...
} else {
  if (firstValue.didMoveFail()) {
if (moveFailedCount == 0) {
  firstMoveFailedKey = key;
}
moveFailedCount += 1;
  } else {
if (inIntermediateCount == 0) {
  firstInIntermediateKey = key;
}
inIntermediateCount += 1;
  }
}
{code}

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> stacktrace1.txt, stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (MAPREDUCE-6436) JobHistory cache issue

2015-12-13 Thread Kai Sasaki (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kai Sasaki updated MAPREDUCE-6436:
--
Attachment: MAPREDUCE-6436.3.patch

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, stacktrace1.txt, stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055308#comment-15055308
 ] 

Hadoop QA commented on MAPREDUCE-6436:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s 
{color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s 
{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s 
{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 
20s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 
11s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
39s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} trunk passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 
22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 19s 
{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 11s 
{color} | {color:red} Patch generated 5 new checkstyle issues in 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs 
(total was 16, now 21). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 25s 
{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 
14s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 
0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 
47s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s 
{color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s 
{color} | {color:green} the patch passed with JDK v1.7.0_91 {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 5m 55s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 6m 12s 
{color} | {color:green} hadoop-mapreduce-client-hs in the patch passed with JDK 
v1.7.0_91. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 23s 
{color} | {color:red} Patch generated 14 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 27m 42s {color} 
| {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12777379/MAPREDUCE-6436.3.patch
 |
| JIRA Issue | MAPREDUCE-6436 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  
unit  findbugs  checkstyle  |
| uname | Linux 0098dd90cfb6 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT 

[jira] [Commented] (MAPREDUCE-6417) MapReduceClient's primitives.h is toxic and should be extirpated

2015-12-13 Thread Binglin Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055444#comment-15055444
 ] 

Binglin Chang commented on MAPREDUCE-6417:
--

I think it's probably OK to remove them, most of those method are added because 
of old compilers(gcc3) or glibc, which have inefficient memcpy & memcmp, like 
which we used in 2011 in our production environment, I recall it's the main 
reason adding them.
At least on macosx clang and gcc4.4+, I see memcpy are pretty fast. 
So if we are drop old compilers/oses support and need to support sparc/arm, use 
system library is better.


> MapReduceClient's primitives.h is toxic and should be extirpated
> 
>
> Key: MAPREDUCE-6417
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6417
> Project: Hadoop Map/Reduce
>  Issue Type: Sub-task
>  Components: client
>Affects Versions: 3.0.0
>Reporter: Alan Burlison
>Assignee: Alan Burlison
>Priority: Blocker
> Attachments: MAPREDUCE-6417.001.patch
>
>
> MapReduceClient's primitives.h attempts to provide optimised versions of 
> standard library memory copy and comparison functions. It has been the 
> subject of several portability-related bugs:
> * HADOOP-11505 hadoop-mapreduce-client-nativetask uses bswap where be32toh is 
> needed, doesn't work on non-x86
> * HADOOP-11665 Provide and unify cross platform byteorder support in native 
> code
> * MAPREDUCE-6397 MAPREDUCE makes many endian-dependent assumptions
> * HADOOP-11484 hadoop-mapreduce-client-nativetask fails to build on ARM 
> AARCH64 due to x86 asm statements
> At present it only works on x86 and ARM64 as it lacks definitions for bswap 
> and bswap64 for any platforms other than those.
> However it has even more serious problems on non-x86 architectures, for 
> example on SPARC simple_memcpy simply doesn't work at all:
> {code}
> $ cat bang.cc
> #include 
> #define SIMPLE_MEMCPY
> #include "primitives.h"
> int main(int argc, char **argv)
> {
> char b1[9];
> char b2[9];
> simple_memcpy(b2, b1, sizeof(b1));
> }
> $ gcc -o bang bang.cc && ./bang
> Bus Error (core dumped)
> {code}
> That's because simple_memcpy does pointer fiddling that results in misaligned 
> accesses, which are illegal on SPARC.
> fmemcmp is also broken. Even if a definition of bswap is provided, on 
> big-endian architectures the result is simply wrong because of its 
> unconditional use of bswap:
> {code}
> $ cat thud.cc
> #include 
> #include 
> #include "primitives.h"
> int main(int argc, char **argv)
> {
> char a[] = { 0,1,2,0 };
> char b[] = { 0,2,1,0 };
> printf("%lld %d\n", fmemcmp(a, b, sizeof(a), memcmp(a, b, sizeof(a;
> }
> $ g++ -o thud thud.cc && ./thud
> 65280 -1
> {code}
> And in addition fmemcmp suffers from the same misalignment issues as 
> simple_memcpy and coredumps on SPARC when asked to compare odd-sized buffers.
> primitives.h provides the following functions:
> * bswap - used in 12 files in MRC but as HADOOP-11505 points out, mostly 
> incorrectly as it takes no account of platform endianness
> * bswap64 - used in 4 files in MRC, same comments as per bswap apply
> * simple_memcpy - used in 3 files in MRC, should be replaced with the 
> standard memcpy
> * fmemcmp - used in 1 file, should be replaced with the standard memcmp
> * fmemeq - used in 1 file, should be replaced with the standard memcmp
> * frmemeq - not used at all, should just be removed
> *Summary*: primitives.h should simply be deleted and replaced with the 
> standard memory copy & compare functions, or with thin wrappers around them 
> where the APIs are different.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6436) JobHistory cache issue

2015-12-13 Thread zhihai xu (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055461#comment-15055461
 ] 

zhihai xu commented on MAPREDUCE-6436:
--

Thanks for updating the patch [~lewuathe]! the new patch looks good except the 
checkstyle issue.
{code}
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:265:
 Line is longer than 80 characters (found 97).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:267:
 Line is longer than 80 characters (found 102).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:268:
 Line is longer than 80 characters (found 118).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:271:
 Line is longer than 80 characters (found 94).
./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java:272:
 Line is longer than 80 characters (found 114).
{code}
Could you fix the above checkstyle issue?

> JobHistory cache issue
> --
>
> Key: MAPREDUCE-6436
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6436
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Ryu Kobayashi
>Assignee: Kai Sasaki
> Attachments: MAPREDUCE-6436.1.patch, MAPREDUCE-6436.2.patch, 
> MAPREDUCE-6436.3.patch, stacktrace1.txt, stacktrace2.txt, stacktrace3.txt
>
>
> Problem: 
> HistoryFileManager.addIfAbsent produces large amount of logs if number of
> cached entries whose age is less than mapreduce.jobhistory.max-age-ms becomes
> larger than mapreduce.jobhistory.joblist.cache.size by far.
> Example:
> For example, if the cache contains 5 entries in total and 10,000 entries
> newer than mapreduce.jobhistory.max-age-ms where
> mapreduce.jobhistory.joblist.cache.size is 2, 
> HistoryFileManager.addIfAbsent
> method produces 5 - 2 = 3 lines of "Waiting to remove  from
> JobListCache because it is not in done yet" message.
> It will attach a stacktrace.
> Impact:
> In addition to large disk consumption, this issue blocks JobHistory.getJob
> long time and slows job execution down significantly because getJob is called
> by RPC such as HistoryClientService.HSClientProtocolHandler.getJobReport.
> This impact happens because HistoryFileManager.UserLogDir.scanIfNeeded
> eventually calls HistoryFileManager.addIfAbsent in a synchronized block. When
> multiple threads call scanIfNeeded simultaneously, one of them acquires lock
> and the other threads are blocked until the first thread completes 
> long-running
> HistoryFileManager.addIfAbsent call.
> Solution: 
> * Reduce amount of logs so that HistoryFileManager.addIfAbsent doesn't take 
> too long time.
> * Good to have if possible: HistoryFileManager.UserLogDir.scanIfNeeded skips
>   scanning if another thread is already scanning. This changes semantics of
>   some HistoryFileManager methods (such as getAllFileInfo and getFileInfo)
>   because scanIfNeeded keep outdated state.
> * Good to have if possible: Make scanIfNeeded asynchronous so that RPC calls 
> are
>   not blocked by a loop at scale of tens of thousands.
>  
> This patch implemented the first item.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)