[jira] [Commented] (MAPREDUCE-6693) Job history entry missing when JOB name is of mapreduce.jobhistory.jobname.limit length

2016-05-11 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280373#comment-15280373
 ] 

Kousuke Saruta commented on MAPREDUCE-6693:
---

On second thought, only 
{code}
if (encodedString.length() < limitLength)
{code}
should be changed to
{code}
if (encodedString.length() <= limitLength)
{code}

and 

{code}
index + increase > limitLength
{code}
should be kept.

The reason is if we have
{code}
if (encodedString.length() <= limitLength) {
  return encodedString;
}
{code}
the size of strBytes is at least limitLength + 1, which means the maximum valid 
index is at least limitLength. So even when index + increase equals limitLength, 
the access is safe.
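The boundary argument above can be sketched with a simplified, hypothetical version of the trimming logic. The method below is illustrative only (the class and method names are made up; it does not reproduce the actual FileNameIndexUtils source): once the early return uses `<=`, any string that reaches the loop has length of at least limitLength + 1, so the `index + increase > limitLength` guard is sufficient.

```java
// Hypothetical sketch of the trimming boundary discussed above; not the Hadoop code.
public class TrimBoundaryDemo {
  static String trim(String encodedString, int limitLength) {
    if (encodedString.length() <= limitLength) {
      return encodedString; // at or under the limit: nothing to trim
    }
    // Past this point, length >= limitLength + 1, so index limitLength is valid.
    byte[] strBytes = encodedString.getBytes();
    int index = 0;
    while (index < limitLength) {
      // A URL escape "%XX" must be kept whole: it occupies 3 bytes.
      int increase = (strBytes[index] == '%') ? 3 : 1;
      if (index + increase > limitLength) {
        break; // keeping this unit would cross the limit
      }
      index += increase;
    }
    return encodedString.substring(0, index);
  }

  public static void main(String[] args) {
    // 52 chars: passes the early return; the loop must stop before the escape
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 49; i++) {
      sb.append('a');
    }
    sb.append("%41");
    System.out.println(trim(sb.toString(), 50).length()); // 49
  }
}
```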

> Job history entry missing when JOB name is of 
> mapreduce.jobhistory.jobname.limit length
> ---
>
> Key: MAPREDUCE-6693
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6693
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Critical
>
> Job history entry missing when JOB name is of 
> {{mapreduce.jobhistory.jobname.limit}} characters
> {noformat}
> 2016-05-10 06:51:00,674 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Interrupting 
> Event Handling thread
> 2016-05-10 06:51:00,674 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Waiting for 
> Event Handling thread to complete
> 2016-05-10 06:51:00,674 ERROR [eventHandlingThread] 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[eventHandlingThread,5,main] threw an Exception.
> java.lang.ArrayIndexOutOfBoundsException: 50
>   at 
> org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.trimURLEncodedString(FileNameIndexUtils.java:326)
>   at 
> org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:86)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:1147)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:635)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:341)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-05-10 06:51:00,675 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Shutting down 
> timer for Job MetaInfo for job_1462840033869_0009 history file 
> hdfs://hacluster:9820/staging-dir/dsperf/.staging/job_1462840033869_0009/job_1462840033869_0009_1.jhist
> 2016-05-10 06:51:00,675 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Shutting down 
> timer Job MetaInfo for job_1462840033869_0009 history file 
> hdfs://hacluster:9820/staging-dir/dsperf/.staging/job_1462840033869_0009/job_1462840033869_0009_1.jhist
> 2016-05-10 06:51:00,676 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Closing Writer
> {noformat}
> Looks like the 50-character check is going wrong



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6693) Job history entry missing when JOB name is of mapreduce.jobhistory.jobname.limit length

2016-05-10 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278330#comment-15278330
 ] 

Kousuke Saruta commented on MAPREDUCE-6693:
---

Good catch. Thanks for working on a fix.

{code}
if (encodedString.length() <= limitLength)
{code}

and

{code}
index + increase >= limitLength
{code}

make sense.

About adding
{code}
index + increase >= strBytes.length
{code}
I don't disagree, but I wonder whether there is any possibility that 
index + increase >= strBytes.length.
If the size of strBytes is less than 50, e.g. 49, then encodedString.length() is 
49, which means encodedString.length() <= 50, so the trimURLEncodedString method 
will return early in

{code}
if (encodedString.length() < limitLength) {
  return encodedString;
}
{code}

If I'm missing something, please correct me.


> Job history entry missing when JOB name is of 
> mapreduce.jobhistory.jobname.limit length
> ---
>
> Key: MAPREDUCE-6693
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6693
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Critical
>
> Job history entry missing when JOB name is of 
> {{mapreduce.jobhistory.jobname.limit}} characters
> {noformat}
> 2016-05-10 06:51:00,674 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Interrupting 
> Event Handling thread
> 2016-05-10 06:51:00,674 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Waiting for 
> Event Handling thread to complete
> 2016-05-10 06:51:00,674 ERROR [eventHandlingThread] 
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread 
> Thread[eventHandlingThread,5,main] threw an Exception.
> java.lang.ArrayIndexOutOfBoundsException: 50
>   at 
> org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.trimURLEncodedString(FileNameIndexUtils.java:326)
>   at 
> org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:86)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:1147)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:635)
>   at 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:341)
>   at java.lang.Thread.run(Thread.java:745)
> 2016-05-10 06:51:00,675 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Shutting down 
> timer for Job MetaInfo for job_1462840033869_0009 history file 
> hdfs://hacluster:9820/staging-dir/dsperf/.staging/job_1462840033869_0009/job_1462840033869_0009_1.jhist
> 2016-05-10 06:51:00,675 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Shutting down 
> timer Job MetaInfo for job_1462840033869_0009 history file 
> hdfs://hacluster:9820/staging-dir/dsperf/.staging/job_1462840033869_0009/job_1462840033869_0009_1.jhist
> 2016-05-10 06:51:00,676 DEBUG [Thread-73] 
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Closing Writer
> {noformat}
> Looks like the 50-character check is going wrong







[jira] [Updated] (MAPREDUCE-6616) Fail to create jobhistory file if there are some multibyte characters in the job name

2016-01-28 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6616:
--
Attachment: MAPREDUCE-6616.1.patch

Fixed style issues and brushed up.

> Fail to create jobhistory file if there are some multibyte characters in the 
> job name
> -
>
> Key: MAPREDUCE-6616
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6616
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Akira AJISAKA
>Assignee: Kousuke Saruta
>  Labels: i18n
> Attachments: MAPREDUCE-6616-test.patch, MAPREDUCE-6616.0.patch, 
> MAPREDUCE-6616.1.patch
>
>
> When creating the jobhistory file, the job name is trimmed to 50 characters by 
> default, and the name is URL-encoded *after* the job name is trimmed. 
> Therefore, if there are multibyte characters in the job name, the 
> encoded job name can be longer than 50 characters. Eventually it can exceed 
> the file name length limit (usually 255 characters).
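The length expansion described above is easy to see with the plain JDK encoder. This sketch is illustrative and independent of the Hadoop code (EncodedLengthDemo and encodedLength are made-up names): a 3-byte UTF-8 character becomes a 9-character escape sequence.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncodedLengthDemo {
  static int encodedLength(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8").length();
    } catch (UnsupportedEncodingException e) {
      throw new AssertionError(e); // UTF-8 is always supported
    }
  }

  public static void main(String[] args) {
    // 50 characters, already "trimmed", but each is a 3-byte UTF-8 character
    StringBuilder jobName = new StringBuilder();
    for (int i = 0; i < 50; i++) {
      jobName.append('\u3042'); // Japanese hiragana 'a'
    }
    // Each character encodes to "%E3%81%82" (9 chars), so 50 chars -> 450 chars
    System.out.println(encodedLength(jobName.toString())); // 450
  }
}
```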





[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)

2016-01-27 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6615:
--
Attachment: MAPREDUCE-6615.0.patch

Fixed findbugs issue.

> Remove useless boxing/unboxing code (Hadoop MapReduce)
> --
>
> Key: MAPREDUCE-6615
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Attachments: MAPREDUCE-6615.0.patch
>
>
> There are lots of places where useless boxing/unboxing occurs.
> To avoid performance issues, let's remove them.





[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)

2016-01-27 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6615:
--
Attachment: (was: MAPREDUCE-6277.patch.0)

> Remove useless boxing/unboxing code (Hadoop MapReduce)
> --
>
> Key: MAPREDUCE-6615
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> There are lots of places where useless boxing/unboxing occurs.
> To avoid performance issues, let's remove them.





[jira] [Updated] (MAPREDUCE-6616) Fail to create jobhistory file if there are some multibyte characters in the job name

2016-01-26 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6616:
--
Attachment: MAPREDUCE-6616.0.patch

I've added the initial patch.

> Fail to create jobhistory file if there are some multibyte characters in the 
> job name
> -
>
> Key: MAPREDUCE-6616
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6616
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: jobhistoryserver
>Reporter: Akira AJISAKA
>Assignee: Kousuke Saruta
>  Labels: i18n
> Attachments: MAPREDUCE-6616-test.patch, MAPREDUCE-6616.0.patch
>
>
> When creating the jobhistory file, the job name is trimmed to 50 characters by 
> default, and the name is URL-encoded *after* the job name is trimmed. 
> Therefore, if there are multibyte characters in the job name, the 
> encoded job name can be longer than 50 characters. Eventually it can exceed 
> the file name length limit (usually 255 characters).





[jira] [Updated] (MAPREDUCE-6617) flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask

2016-01-25 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6617:
--
Attachment: MAPREDUCE-6617.0.patch

I've attached the initial patch.

> flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask
> -
>
> Key: MAPREDUCE-6617
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6617
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Attachments: MAPREDUCE-6617.0.patch
>
>
> In JobHistoryEventHandler, flushTask is not purged after it is canceled, so the 
> GC can never sweep flushTask.
> This can cause a memory leak.





[jira] [Updated] (MAPREDUCE-6617) flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask

2016-01-25 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6617:
--
Status: Patch Available  (was: Open)

> flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask
> -
>
> Key: MAPREDUCE-6617
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6617
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Attachments: MAPREDUCE-6617.0.patch
>
>
> In JobHistoryEventHandler, flushTask is not purged after it is canceled, so the 
> GC can never sweep flushTask.
> This can cause a memory leak.





[jira] [Created] (MAPREDUCE-6617) flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask

2016-01-25 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created MAPREDUCE-6617:
-

 Summary: flushTimer in JobHistoryEventHandler should purge 
canceled flushTimerTask
 Key: MAPREDUCE-6617
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6617
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 3.0.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


In JobHistoryEventHandler, flushTask is not purged after it is canceled, so the 
GC can never sweep flushTask.
This can cause a memory leak.
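A minimal sketch of the purge pattern, using only java.util.Timer. The names flushTimer/flushTask mirror the description; the actual JobHistoryEventHandler code is not reproduced here.

```java
import java.util.Timer;
import java.util.TimerTask;

public class TimerPurgeDemo {
  public static void main(String[] args) {
    Timer flushTimer = new Timer("FlushTimer", true); // daemon timer
    TimerTask flushTask = new TimerTask() {
      @Override public void run() { /* flush job history events */ }
    };
    flushTimer.schedule(flushTask, 60_000L);

    // cancel() only marks the task; it stays referenced by the timer's
    // internal queue until purge() removes it, so it cannot be collected.
    flushTask.cancel();
    int removed = flushTimer.purge();
    System.out.println(removed); // 1
  }
}
```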





[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)

2016-01-23 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6615:
--
Priority: Minor  (was: Major)

> Remove useless boxing/unboxing code (Hadoop MapReduce)
> --
>
> Key: MAPREDUCE-6615
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Attachments: MAPREDUCE-6277.patch.0
>
>
> There are lots of places where useless boxing/unboxing occurs.
> To avoid performance issues, let's remove them.





[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)

2016-01-23 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6615:
--
Attachment: (was: MAPREDUCE-6615.patch.0)

> Remove useless boxing/unboxing code (Hadoop MapReduce)
> --
>
> Key: MAPREDUCE-6615
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Attachments: MAPREDUCE-6277.patch.0
>
>
> There are lots of places where useless boxing/unboxing occurs.
> To avoid performance issues, let's remove them.





[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)

2016-01-23 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6615:
--
Attachment: MAPREDUCE-6277.patch.0

I've fixed the compile error.

> Remove useless boxing/unboxing code (Hadoop MapReduce)
> --
>
> Key: MAPREDUCE-6615
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Attachments: MAPREDUCE-6277.patch.0
>
>
> There are lots of places where useless boxing/unboxing occurs.
> To avoid performance issues, let's remove them.





[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)

2016-01-22 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6615:
--
Status: Patch Available  (was: Open)

> Remove useless boxing/unboxing code (Hadoop MapReduce)
> --
>
> Key: MAPREDUCE-6615
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Attachments: MAPREDUCE-6615.patch.0
>
>
> There are lots of places where useless boxing/unboxing occurs.
> To avoid performance issues, let's remove them.





[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)

2016-01-22 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-6615:
--
Attachment: MAPREDUCE-6615.patch.0

I've attached the initial patch.

> Remove useless boxing/unboxing code (Hadoop MapReduce)
> --
>
> Key: MAPREDUCE-6615
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: performance
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Attachments: MAPREDUCE-6615.patch.0
>
>
> There are lots of places where useless boxing/unboxing occurs.
> To avoid performance issues, let's remove them.





[jira] [Created] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)

2016-01-22 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created MAPREDUCE-6615:
-

 Summary: Remove useless boxing/unboxing code (Hadoop MapReduce)
 Key: MAPREDUCE-6615
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance
Affects Versions: 3.0.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


There are lots of places where useless boxing/unboxing occurs.
To avoid performance issues, let's remove them.
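A hypothetical example of the kind of pattern such a cleanup typically targets (BoxingDemo and both method names are made up for illustration; the issue's patch is not reproduced here):

```java
public class BoxingDemo {
  static int parseBoxed(String s) {
    // Useless boxing: allocates an Integer just to unbox it immediately
    return Integer.valueOf(s).intValue();
  }

  static int parsePrimitive(String s) {
    // Idiomatic: parse straight to the primitive, no allocation
    return Integer.parseInt(s);
  }

  public static void main(String[] args) {
    System.out.println(parseBoxed("42") == parsePrimitive("42")); // true
  }
}
```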





[jira] [Updated] (MAPREDUCE-5895) FileAlreadyExistsException was thrown : Temporary Index File can not be cleaned up because OutputStream doesn't close properly

2014-05-27 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-5895:
--

Attachment: MAPREDUCE-5895.patch

Understood.
In the new patch, bos is closed and the two cleanup method calls are merged into 
one method call.

> FileAlreadyExistsException was thrown : Temporary Index File can not be 
> cleaned up because OutputStream doesn't close properly
> --
>
> Key: MAPREDUCE-5895
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
> Attachments: MAPREDUCE-5895.patch, MAPREDUCE-5895.patch
>
>
> In TaskLog.java, the temporary index file is created by the following code.
> {code}
> BufferedOutputStream bos =
>     new BufferedOutputStream(
>         SecureIOUtils.createForWrite(tmpIndexFile, 0644));
> DataOutputStream dos = new DataOutputStream(bos);
> {code}
> The code is surrounded by try-finally, so if an Exception/Error is thrown 
> between constructing bos and dos, the temporary file is not cleaned up.
> I hit a situation where an OOM was thrown after bos was created and the 
> temporary file was not cleaned up. Later, another thread executed the same 
> logic and failed because of FileAlreadyExistsException.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (MAPREDUCE-5895) FileAlreadyExistsException was thrown : Temporary Index File can not be cleaned up because OutputStream doesn't close properly

2014-05-27 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009771#comment-14009771
 ] 

Kousuke Saruta commented on MAPREDUCE-5895:
---

Thank you for your comment [~devaraj.k]!
I think dos.close() calls bos.close() internally, right?
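That is indeed how the JDK stream wrappers behave; here is a small self-contained check (CloseChainDemo is a made-up name for illustration):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class CloseChainDemo {
  public static void main(String[] args) throws IOException {
    final boolean[] innerClosed = {false};
    ByteArrayOutputStream inner = new ByteArrayOutputStream() {
      @Override public void close() throws IOException {
        innerClosed[0] = true;
        super.close();
      }
    };
    DataOutputStream dos = new DataOutputStream(new BufferedOutputStream(inner));
    // FilterOutputStream.close() flushes and closes the wrapped stream,
    // so closing dos closes bos, which closes the innermost stream.
    dos.close();
    System.out.println(innerClosed[0]); // true
  }
}
```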

> FileAlreadyExistsException was thrown : Temporary Index File can not be 
> cleaned up because OutputStream doesn't close properly
> --
>
> Key: MAPREDUCE-5895
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
> Attachments: MAPREDUCE-5895.patch
>
>
> In TaskLog.java, the temporary index file is created by the following code.
> {code}
> BufferedOutputStream bos =
>     new BufferedOutputStream(
>         SecureIOUtils.createForWrite(tmpIndexFile, 0644));
> DataOutputStream dos = new DataOutputStream(bos);
> {code}
> The code is surrounded by try-finally, so if an Exception/Error is thrown 
> between constructing bos and dos, the temporary file is not cleaned up.
> I hit a situation where an OOM was thrown after bos was created and the 
> temporary file was not cleaned up. Later, another thread executed the same 
> logic and failed because of FileAlreadyExistsException.





[jira] [Updated] (MAPREDUCE-5895) FileAlreadyExistsException was thrown : Temporary Index File can not be cleaned up because OutputStream doesn't close properly

2014-05-26 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-5895:
--

Summary: FileAlreadyExistsException was thrown : Temporary Index File can 
not be cleaned up because OutputStream doesn't close properly  (was: Temporary 
Index File can not be cleaned up because OutputStream doesn't close properly)

> FileAlreadyExistsException was thrown : Temporary Index File can not be 
> cleaned up because OutputStream doesn't close properly
> --
>
> Key: MAPREDUCE-5895
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
> Attachments: MAPREDUCE-5895.patch
>
>
> In TaskLog.java, the temporary index file is created by the following code.
> {code}
> BufferedOutputStream bos =
>     new BufferedOutputStream(
>         SecureIOUtils.createForWrite(tmpIndexFile, 0644));
> DataOutputStream dos = new DataOutputStream(bos);
> {code}
> The code is surrounded by try-finally, so if an Exception/Error is thrown 
> between constructing bos and dos, the temporary file is not cleaned up.
> I hit a situation where an OOM was thrown after bos was created and the 
> temporary file was not cleaned up. Later, another thread executed the same 
> logic and failed because of FileAlreadyExistsException.





[jira] [Commented] (MAPREDUCE-5895) Temporary Index File can not be cleaned up because OutputStream doesn't close properly

2014-05-22 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006327#comment-14006327
 ] 

Kousuke Saruta commented on MAPREDUCE-5895:
---

Could someone review the patch?

> Temporary Index File can not be cleaned up because OutputStream doesn't close 
> properly
> --
>
> Key: MAPREDUCE-5895
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
> Attachments: MAPREDUCE-5895.patch
>
>
> In TaskLog.java, the temporary index file is created by the following code.
> {code}
> BufferedOutputStream bos =
>     new BufferedOutputStream(
>         SecureIOUtils.createForWrite(tmpIndexFile, 0644));
> DataOutputStream dos = new DataOutputStream(bos);
> {code}
> The code is surrounded by try-finally, so if an Exception/Error is thrown 
> between constructing bos and dos, the temporary file is not cleaned up.
> I hit a situation where an OOM was thrown after bos was created and the 
> temporary file was not cleaned up. Later, another thread executed the same 
> logic and failed because of FileAlreadyExistsException.





[jira] [Updated] (MAPREDUCE-5895) Temporary Index File can not be cleaned up because OutputStream doesn't close properly

2014-05-20 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-5895:
--

Status: Patch Available  (was: Open)

> Temporary Index File can not be cleaned up because OutputStream doesn't close 
> properly
> --
>
> Key: MAPREDUCE-5895
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
> Attachments: MAPREDUCE-5895.patch
>
>
> In TaskLog.java, the temporary index file is created by the following code.
> {code}
> BufferedOutputStream bos =
>     new BufferedOutputStream(
>         SecureIOUtils.createForWrite(tmpIndexFile, 0644));
> DataOutputStream dos = new DataOutputStream(bos);
> {code}
> The code is surrounded by try-finally, so if an Exception/Error is thrown 
> between constructing bos and dos, the temporary file is not cleaned up.
> I hit a situation where an OOM was thrown after bos was created and the 
> temporary file was not cleaned up. Later, another thread executed the same 
> logic and failed because of FileAlreadyExistsException.





[jira] [Updated] (MAPREDUCE-5895) Temporary Index File can not be cleaned up because OutputStream doesn't close properly

2014-05-20 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-5895:
--

Attachment: MAPREDUCE-5895.patch

I've attached a patch for this issue.

> Temporary Index File can not be cleaned up because OutputStream doesn't close 
> properly
> --
>
> Key: MAPREDUCE-5895
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
> Attachments: MAPREDUCE-5895.patch
>
>
> In TaskLog.java, the temporary index file is created by the following code.
> {code}
> BufferedOutputStream bos =
>     new BufferedOutputStream(
>         SecureIOUtils.createForWrite(tmpIndexFile, 0644));
> DataOutputStream dos = new DataOutputStream(bos);
> {code}
> The code is surrounded by try-finally, so if an Exception/Error is thrown 
> between constructing bos and dos, the temporary file is not cleaned up.
> I hit a situation where an OOM was thrown after bos was created and the 
> temporary file was not cleaned up. Later, another thread executed the same 
> logic and failed because of FileAlreadyExistsException.





[jira] [Created] (MAPREDUCE-5895) Temporary Index File can not be cleaned up because OutputStream doesn't close properly

2014-05-20 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created MAPREDUCE-5895:
-

 Summary: Temporary Index File can not be cleaned up because 
OutputStream doesn't close properly
 Key: MAPREDUCE-5895
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.0.0
Reporter: Kousuke Saruta


In TaskLog.java, the temporary index file is created by the following code.

{code}
BufferedOutputStream bos =
    new BufferedOutputStream(
        SecureIOUtils.createForWrite(tmpIndexFile, 0644));
DataOutputStream dos = new DataOutputStream(bos);
{code}

The code is surrounded by try-finally, so if an Exception/Error is thrown 
between constructing bos and dos, the temporary file is not cleaned up.
I hit a situation where an OOM was thrown after bos was created and the 
temporary file was not cleaned up. Later, another thread executed the same 
logic and failed because of FileAlreadyExistsException.
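One way to make the cleanup robust is try-with-resources, sketched below under stated assumptions: FileOutputStream stands in for the Hadoop-specific SecureIOUtils.createForWrite, and the class/method names and written contents are placeholders, not the actual TaskLog code.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SafeIndexFileWrite {
  static void writeIndex(File tmpIndexFile) throws IOException {
    try (DataOutputStream dos = new DataOutputStream(
             new BufferedOutputStream(new FileOutputStream(tmpIndexFile)))) {
      dos.writeLong(0L); // placeholder for the real index entries
    } // dos (and the wrapped streams) are closed even if a write fails
  }

  public static void main(String[] args) throws IOException {
    File f = File.createTempFile("tmpindex", ".tmp");
    writeIndex(f);
    System.out.println(f.length()); // 8 (one long was written and flushed)
    f.delete();
  }
}
```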





[jira] [Commented] (MAPREDUCE-5600) ConcurrentModificationException on /tasktracker.jsp

2013-11-12 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820550#comment-13820550
 ] 

Kousuke Saruta commented on MAPREDUCE-5600:
---

I think branch-1 has the same issue.

When TaskTracker#getTasksFromRunningJobs is called, we'll get an iterator over 
"runningJobs":

{code}
  /**
   * Get the list of tasks from running jobs on this task tracker.
   * @return a copy of the list of TaskStatus objects
   */
  synchronized List<TaskStatus> getTasksFromRunningJobs() {
    List<TaskStatus> result = new ArrayList<TaskStatus>(tasks.size());
    for (Map.Entry<JobID, RunningJob> item : runningJobs.entrySet()) {
      ...
    }
    return result;
  }
{code}

On the other hand, TaskTracker#addTaskToJob can be called during that iteration.
The addTaskToJob method modifies "runningJobs", so it can cause a 
ConcurrentModificationException.


{code}
  private RunningJob addTaskToJob(JobID jobId,
  TaskInProgress tip) {
synchronized (runningJobs) {
   ...
runningJobs.put(jobId, rJob);
   ...
} 
  }
{code}

When we call getTasksFromRunningJobs, we acquire the monitor of the TaskTracker 
instance, but when we call addTaskToJob, we acquire the monitor of 
"runningJobs". So we may need to modify TaskTracker to acquire the appropriate 
monitor when calling getTasksFromRunningJobs.
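The fix direction above can be sketched as follows. This is a minimal sketch with simplified types (String stands in for JobID/RunningJob/TaskStatus, and the class name is made up): both the reader and the writer lock the same monitor, so iteration never races a put().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RunningJobsSnapshot {
  private final Map<String, String> runningJobs = new TreeMap<String, String>();

  List<String> getTasksFromRunningJobs() {
    synchronized (runningJobs) { // same monitor as the writer below
      List<String> result = new ArrayList<String>(runningJobs.size());
      for (Map.Entry<String, String> item : runningJobs.entrySet()) {
        result.add(item.getValue());
      }
      return result;
    }
  }

  void addTaskToJob(String jobId, String task) {
    synchronized (runningJobs) { // never mutate while a snapshot is being taken
      runningJobs.put(jobId, task);
    }
  }

  public static void main(String[] args) {
    RunningJobsSnapshot tt = new RunningJobsSnapshot();
    tt.addTaskToJob("job_1", "task_1");
    System.out.println(tt.getTasksFromRunningJobs()); // [task_1]
  }
}
```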

> ConcurrentModificationException on /tasktracker.jsp
> ---
>
> Key: MAPREDUCE-5600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5600
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: tasktracker
> Environment: Found this in the MR1 bundled with 
> hadoop-2.0.0-mr1-cdh4.1.3 (which I think is based on some 0.20 version).
>Reporter: Benoit Sigoure
>
> If you request {{/tasktracker.jsp}} frequently on a TaskTracker that's busy, 
> every once in a while you'll get this:
> {code}
> 2013-10-29 13:25:55,524 ERROR org.mortbay.log: /tasktracker.jsp
> java.util.ConcurrentModificationException
> at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1117)
> at java.util.TreeMap$EntryIterator.next(TreeMap.java:1153)
> at java.util.TreeMap$EntryIterator.next(TreeMap.java:1148)
> at 
> org.apache.hadoop.mapred.TaskTracker.getTasksFromRunningJobs(TaskTracker.java:3991)
> at 
> org.apache.hadoop.mapred.tasktracker_jsp._jspService(tasktracker_jsp.java:98)
> at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> at 
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> at 
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at 
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1056)
> at 
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> at 
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> at 
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> at 
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> at 
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> at 
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> at 
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> at 
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> at org.mortbay.jetty.Server.handle(Server.java:326)
> at 
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> at 
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> at 
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> at 
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (MAPREDUCE-5600) ConcurrentModificationException on /tasktracker.jsp

2013-11-11 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819851#comment-13819851
 ] 

Kousuke Saruta commented on MAPREDUCE-5600:
---

Hi Benoit,
Can you reproduce that?



[jira] [Commented] (MAPREDUCE-5503) TestMRJobClient.testJobClient is failing

2013-09-20 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773173#comment-13773173
 ] 

Kousuke Saruta commented on MAPREDUCE-5503:
---

Oh, I misunderstood. I got it now.

> TestMRJobClient.testJobClient is failing
> 
>
> Key: MAPREDUCE-5503
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5503
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.0.0, 2.1.0-beta
>Reporter: Jason Lowe
>Priority: Blocker
> Attachments: MAPREDUCE-5503.patch
>
>
> TestMRJobClient.testJobClient is failing on trunk and causing precommit 
> builds to complain:
> {noformat}
> testJobClient(org.apache.hadoop.mapreduce.TestMRJobClient)  Time elapsed: 
> 26.361 sec  <<< FAILURE!
> junit.framework.AssertionFailedError: expected:<1> but was:<0>
>   at junit.framework.Assert.fail(Assert.java:50)
>   at junit.framework.Assert.failNotEquals(Assert.java:287)
>   at junit.framework.Assert.assertEquals(Assert.java:67)
>   at junit.framework.Assert.assertEquals(Assert.java:199)
>   at junit.framework.Assert.assertEquals(Assert.java:205)
>   at 
> org.apache.hadoop.mapreduce.TestMRJobClient.testJobList(TestMRJobClient.java:474)
>   at 
> org.apache.hadoop.mapreduce.TestMRJobClient.testJobClient(TestMRJobClient.java:112)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (MAPREDUCE-5503) TestMRJobClient.testJobClient is failing

2013-09-20 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772926#comment-13772926
 ] 

Kousuke Saruta commented on MAPREDUCE-5503:
---

Hey Akira, I think TestMRJobClient is meant to test operations on running 
MapReduce jobs. So simply changing the expected value doesn't make sense; then it 
no longer tests anything.
The current implementation of TestMRJobClient has some wrong preconditions (e.g. 
running "mapred job -list" after job completion, as you said), and some parts of 
the test whose results depend on timing.

We should address both of those.



[jira] [Updated] (MAPREDUCE-5504) mapred queue -info inconsistent with types

2013-09-13 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-5504:
--

Target Version/s: 3.0.0, 0.23.10  (was: 0.23.10)
  Status: Patch Available  (was: Open)

> mapred queue -info inconsistent with types
> --
>
> Key: MAPREDUCE-5504
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5504
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 0.23.9
>Reporter: Thomas Graves
> Attachments: MAPREDUCE-5504.patch
>
>
> $ mapred queue -info default
> ==
> Queue Name : default
> Queue State : running
> Scheduling Info : Capacity: 4.0, MaximumCapacity: 0.67, CurrentCapacity: 
> 0.9309831
> The capacity is displayed in % as 4, however maximum capacity is displayed as 
> an absolute number 0.67 instead of 67%.
> We should make these consistent with the type we are displaying
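A minimal sketch of the consistency fix described above, under the assumption that the fix is to render every capacity figure as a percentage (the class and method names here are illustrative, not the actual Hadoop code):

```java
import java.util.Locale;

public class QueueInfoFormat {
    // Render a 0..1 capacity fraction uniformly as a percentage string.
    static String pct(float fraction) {
        return String.format(Locale.US, "%.1f%%", fraction * 100f);
    }

    public static void main(String[] args) {
        // Values from the report: capacity was shown as a percent (4.0),
        // but maximum/current capacity as raw fractions.
        float capacity = 0.04f, maxCapacity = 0.67f, current = 0.9309831f;
        System.out.println("Capacity: " + pct(capacity));          // 4.0%
        System.out.println("MaximumCapacity: " + pct(maxCapacity)); // 67.0%
        System.out.println("CurrentCapacity: " + pct(current));     // 93.1%
    }
}
```

With one formatting path for all three fields, 0.67 is displayed as 67.0% rather than as a bare fraction next to a percent value.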



[jira] [Updated] (MAPREDUCE-5504) mapred queue -info inconsistent with types

2013-09-13 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated MAPREDUCE-5504:
--

Attachment: MAPREDUCE-5504.patch

Hi,
I've created a patch for this issue.



[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix

2013-07-16 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709588#comment-13709588
 ] 

Kousuke Saruta commented on MAPREDUCE-5247:
---

Thanks Stan. I have asked in HADOOP-7771.

> FileInputFormat should filter files with '._COPYING_' sufix
> ---
>
> Key: MAPREDUCE-5247
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5247
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Stan Rosenberg
>
> FsShell copy/put creates staging files with '._COPYING_' suffix.  These files 
> should be considered hidden by FileInputFormat.  (A simple fix is to add the 
> following conjunct to the existing hiddenFilter: 
> {code}
> !name.endsWith("._COPYING_")
> {code}
> After upgrading to CDH 4.2.0 we encountered this bug. We have a legacy data 
> loader which uses 'hadoop fs -put' to load data into hourly partitions.  We 
> also have intra-hourly jobs which are scheduled to execute several times per 
> hour using the same hourly partition as input.  Thus, as the new data is 
> continuously loaded, these staging files (i.e., ._COPYING_) are breaking our 
> jobs (since when copy/put completes staging files are moved).
> As a workaround, we've defined a custom input path filter and loaded it with 
> "mapred.input.pathFilter.class".
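The workaround described above can be sketched as follows. This is a plain-Java stand-in for the filter logic (the real class would implement org.apache.hadoop.fs.PathFilter and be registered via "mapred.input.pathFilter.class"); the file names in main are hypothetical examples:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StagingFileFilter {
    // The usual hidden-file convention plus the proposed '._COPYING_' conjunct.
    static boolean accept(String name) {
        return !name.startsWith(".")
            && !name.startsWith("_")
            && !name.endsWith("._COPYING_");  // extra conjunct from this issue
    }

    public static void main(String[] args) {
        List<String> listing = Arrays.asList(
            "part-00000", "data.avro._COPYING_", "_SUCCESS", ".hidden");
        List<String> visible = listing.stream()
            .filter(StagingFileFilter::accept)
            .collect(Collectors.toList());
        System.out.println(visible); // prints [part-00000]
    }
}
```

An in-flight `hadoop fs -put` staging file such as data.avro._COPYING_ is then skipped by input listing, so jobs scheduled against the partition no longer race with the copy.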



[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix

2013-07-07 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701539#comment-13701539
 ] 

Kousuke Saruta commented on MAPREDUCE-5247:
---

May I recreate this jira as HDFS issue?



[jira] [Commented] (MAPREDUCE-5153) Support for running combiners without reducers

2013-06-28 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695641#comment-13695641
 ] 

Kousuke Saruta commented on MAPREDUCE-5153:
---

Radim, for what kind of workload do you want to run combiners without reducers?
We should consider whether this feature is really needed.

> Support for running combiners without reducers
> --
>
> Key: MAPREDUCE-5153
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Reporter: Radim Kolar
>
> scenario: Workflow mapper -> sort -> combiner -> hdfs
> No api change is need, if user set combiner class and reducers = 0 then run 
> combiner and sent output to HDFS.
> Popular libraries such as scalding and cascading are offering this 
> functionality, but they use caching entire mapper output in memory.



[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix

2013-06-21 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691008#comment-13691008
 ] 

Kousuke Saruta commented on MAPREDUCE-5247:
---

As Devaraj said, we can use "mapred.input.pathFilter.class", but as far as I know 
the name of the temporary file is undocumented, and changes to the specification 
or implementation of HDFS should not affect existing HDFS users.
So I think we should reconsider the name of the temporary file. It may be good 
for the temporary file name to start with "." or "_".




[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix

2013-06-19 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688650#comment-13688650
 ] 

Kousuke Saruta commented on MAPREDUCE-5247:
---

I found the jira where the code that creates the "._COPYING_" temporary file was 
added:
https://issues.apache.org/jira/browse/HADOOP-7771
In that jira, they discussed an NPE when using copyToLocal; the "._COPYING_" file 
is created so that the copy completes reliably.
So I think the temporary file does not necessarily need a name with the 
"._COPYING_" suffix.



[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix

2013-06-19 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688335#comment-13688335
 ] 

Kousuke Saruta commented on MAPREDUCE-5247:
---

Stan, rather than modifying FileInputFormat to treat files with the ._COPYING_ 
suffix as hidden, I think we should modify FsShell to give the file being created 
a name with an underscore prefix, so that FileInputFormat already ignores it.
It's purely an HDFS-side matter, and I think that specification change shouldn't 
affect MapReduce.
What do you think?



[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix

2013-06-17 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686274#comment-13686274
 ] 

Kousuke Saruta commented on MAPREDUCE-5247:
---

I was able to reproduce this on branch-2.1-beta.
I saw a temporary file with the "._COPYING_" suffix while putting a file into HDFS.
As you say, bad things happen when we run MapReduce jobs while ._COPYING_ files 
exist in a directory that the jobs use as an input path.



[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix

2013-06-11 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680872#comment-13680872
 ] 

Kousuke Saruta commented on MAPREDUCE-5247:
---

OK. I think we should change Affects Version/s to trunk.



[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix

2013-06-11 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680830#comment-13680830
 ] 

Kousuke Saruta commented on MAPREDUCE-5247:
---

We should discuss only the community version here, not a specific distribution.
But what you describe seems to affect Hadoop 2 and trunk.
