[jira] [Commented] (MAPREDUCE-6693) Job history entry missing when JOB name is of mapreduce.jobhistory.jobname.limit length
[ https://issues.apache.org/jira/browse/MAPREDUCE-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280373#comment-15280373 ] Kousuke Saruta commented on MAPREDUCE-6693: ---

On second thought, only
{code}
if (encodedString.length() < limitLength)
{code}
should be changed to
{code}
if (encodedString.length() <= limitLength)
{code}
and
{code}
index + increase > limitLength
{code}
should be kept. The reason is that if we have
{code}
if (encodedString.length() <= limitLength) {
  return encodedString;
}
{code}
then past this check the size of strBytes is at least limitLength + 1, which means the maximum valid index is limitLength. So even if index + increase equals limitLength, it's safe.

> Job history entry missing when JOB name is of
> mapreduce.jobhistory.jobname.limit length
> ---
>
> Key: MAPREDUCE-6693
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6693
> Project: Hadoop Map/Reduce
> Issue Type: Bug
>Reporter: Bibin A Chundatt
>Assignee: Ajith S
>Priority: Critical
>
> Job history entry missing when JOB name is of
> {{mapreduce.jobhistory.jobname.limit}} character
> {noformat}
> 2016-05-10 06:51:00,674 DEBUG [Thread-73]
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Interrupting
> Event Handling thread
> 2016-05-10 06:51:00,674 DEBUG [Thread-73]
> org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Waiting for
> Event Handling thread to complete
> 2016-05-10 06:51:00,674 ERROR [eventHandlingThread]
> org.apache.hadoop.yarn.YarnUncaughtExceptionHandler: Thread
> Thread[eventHandlingThread,5,main] threw an Exception.
> java.lang.ArrayIndexOutOfBoundsException: 50 > at > org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.trimURLEncodedString(FileNameIndexUtils.java:326) > at > org.apache.hadoop.mapreduce.v2.jobhistory.FileNameIndexUtils.getDoneFileName(FileNameIndexUtils.java:86) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processDoneFiles(JobHistoryEventHandler.java:1147) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleEvent(JobHistoryEventHandler.java:635) > at > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$1.run(JobHistoryEventHandler.java:341) > at java.lang.Thread.run(Thread.java:745) > 2016-05-10 06:51:00,675 DEBUG [Thread-73] > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Shutting down > timer for Job MetaInfo for job_1462840033869_0009 history file > hdfs://hacluster:9820/staging-dir/dsperf/.staging/job_1462840033869_0009/job_1462840033869_0009_1.jhist > 2016-05-10 06:51:00,675 DEBUG [Thread-73] > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Shutting down > timer Job MetaInfo for job_1462840033869_0009 history file > hdfs://hacluster:9820/staging-dir/dsperf/.staging/job_1462840033869_0009/job_1462840033869_0009_1.jhist > 2016-05-10 06:51:00,676 DEBUG [Thread-73] > org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Closing Writer > {noformat} > Looks like 50 character check is going wrong -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
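The trimming logic under discussion can be sketched as a standalone method (a hypothetical simplification following the names in the comments above, not the actual FileNameIndexUtils source):

```java
// Hypothetical simplification of the trimming logic discussed above:
// cut a URL-encoded string to at most limitLength characters without
// splitting a %XX escape sequence.
public class TrimDemo {
    public static String trimURLEncodedString(String encodedString, int limitLength) {
        // With <=, a string exactly at the limit returns early, so below
        // this check strBytes.length >= limitLength + 1 and the loop
        // never reads past index limitLength - 1.
        if (encodedString.length() <= limitLength) {
            return encodedString;
        }
        byte[] strBytes = encodedString.getBytes();
        int index = 0;
        while (index < limitLength) {
            // a %XX escape occupies three characters and must stay whole
            int increase = (strBytes[index] == '%') ? 3 : 1;
            if (index + increase > limitLength) {
                break; // the next unit would cross the limit
            }
            index += increase;
        }
        return encodedString.substring(0, index);
    }

    public static void main(String[] args) {
        System.out.println(trimURLEncodedString("%41%42%43", 6)); // prints %41%42
    }
}
```

With the early return using `<=`, a name whose encoded form is exactly at the limit no longer enters the loop, which is the off-by-one the stack trace above exposes.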
[jira] [Commented] (MAPREDUCE-6693) Job history entry missing when JOB name is of mapreduce.jobhistory.jobname.limit length
[ https://issues.apache.org/jira/browse/MAPREDUCE-6693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15278330#comment-15278330 ] Kousuke Saruta commented on MAPREDUCE-6693: ---

Good catch. Thanks for trying to fix the issue.
{code}
if (encodedString.length() <= limitLength)
{code}
and
{code}
index + increase >= limitLength
{code}
make sense. About adding
{code}
index + increase >= strBytes.length
{code}
I don't disagree, but I wonder whether there is any possibility that index + increase >= strBytes.length. If the size of strArray is less than 50, e.g. 49, encodedString.length() is 49, which means encodedString.length() <= 50, so the trimURLEncodedString method will return early in
{code}
if (encodedString.length() < limitLength) {
  return encodedString;
}
{code}
If I miss something, please correct me.
[jira] [Updated] (MAPREDUCE-6616) Fail to create jobhistory file if there are some multibyte characters in the job name
[ https://issues.apache.org/jira/browse/MAPREDUCE-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6616: -- Attachment: MAPREDUCE-6616.1.patch

Fixed style issues and brushed up the patch.

> Fail to create jobhistory file if there are some multibyte characters in the
> job name
> -
>
> Key: MAPREDUCE-6616
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6616
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: jobhistoryserver
>Reporter: Akira AJISAKA
>Assignee: Kousuke Saruta
> Labels: i18n
> Attachments: MAPREDUCE-6616-test.patch, MAPREDUCE-6616.0.patch,
> MAPREDUCE-6616.1.patch
>
>
> When creating the jobhistory file, the job name is trimmed to 50 characters by
> default, and the name is URL-encoded *after* the job name is trimmed.
> Therefore, if there are some multibyte characters in the job name, the
> encoded job name can be longer than 50 characters. Eventually it can break
> the limit of the file name (usually 255 characters).
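The failure mode described here is easy to reproduce: URL-encoding a single multibyte character after trimming expands it to several characters. A minimal illustration (not Hadoop code; the variable names are made up):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Shows why trimming the job name before URL-encoding breaks the length
// limit: one multibyte character becomes three percent-escapes.
public class EncodeLengthDemo {
    public static void main(String[] args) {
        String trimmed = "あ"; // one character survives the trim
        String encoded = URLEncoder.encode(trimmed, StandardCharsets.UTF_8);
        System.out.println(encoded);          // %E3%81%82
        System.out.println(encoded.length()); // 9
    }
}
```

So a 50-character trimmed name of such characters can encode to 450 characters, well past typical filesystem name limits.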
[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6615: -- Attachment: MAPREDUCE-6615.0.patch

Fixed the findbugs issue.

> Remove useless boxing/unboxing code (Hadoop MapReduce)
> --
>
> Key: MAPREDUCE-6615
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: performance
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Attachments: MAPREDUCE-6615.0.patch
>
>
> There are lots of places where useless boxing/unboxing occurs.
> To avoid performance issues, let's remove them.
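An example of the pattern this change targets (illustrative only; not taken from the patch):

```java
// Useless boxing/unboxing: the value takes a round trip through Integer
// even though only the primitive is needed.
public class BoxingDemo {
    public static void main(String[] args) {
        // Before: explicit box then unbox
        int before = Integer.valueOf(42).intValue();
        // After: use the primitive directly
        int after = 42;
        System.out.println(before == after); // true
    }
}
```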
[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6615: -- Attachment: (was: MAPREDUCE-6277.patch.0)
[jira] [Updated] (MAPREDUCE-6616) Fail to create jobhistory file if there are some multibyte characters in the job name
[ https://issues.apache.org/jira/browse/MAPREDUCE-6616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6616: -- Attachment: MAPREDUCE-6616.0.patch

I've added the initial patch.
[jira] [Updated] (MAPREDUCE-6617) flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6617: -- Attachment: MAPREDUCE-6617.0.patch

I've attached the initial patch.

> flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask
> -
>
> Key: MAPREDUCE-6617
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6617
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Attachments: MAPREDUCE-6617.0.patch
>
>
> In JobHistoryEventHandler, flushTask is not purged after it is canceled, so
> the GC never sweeps flushTask. This can cause a memory leak.
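The leak pattern can be sketched with java.util.Timer directly (a minimal sketch under the assumption stated in the description that the task is cancelled but never purged; names are illustrative, not the handler's actual fields):

```java
import java.util.Timer;
import java.util.TimerTask;

// A cancelled TimerTask stays referenced by the Timer's internal queue
// until purge() (or its scheduled slot) removes it, so cancel() alone
// does not make the task collectable.
public class TimerPurgeDemo {
    public static void main(String[] args) {
        Timer flushTimer = new Timer("flush", true);
        TimerTask flushTask = new TimerTask() {
            @Override public void run() { /* flush pending events */ }
        };
        flushTimer.schedule(flushTask, 60_000L, 60_000L);
        flushTask.cancel();
        // Without this call the cancelled task lingers in the queue.
        int removed = flushTimer.purge();
        System.out.println(removed); // 1
        flushTimer.cancel();
    }
}
```

Timer.purge() returns the number of cancelled tasks it removed, which is why adding a purge() call after cancel() is enough to let the GC reclaim the task.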
[jira] [Updated] (MAPREDUCE-6617) flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask
[ https://issues.apache.org/jira/browse/MAPREDUCE-6617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6617: -- Status: Patch Available (was: Open)
[jira] [Created] (MAPREDUCE-6617) flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask
Kousuke Saruta created MAPREDUCE-6617: - Summary: flushTimer in JobHistoryEventHandler should purge canceled flushTimerTask
Key: MAPREDUCE-6617
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6617
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 3.0.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta

In JobHistoryEventHandler, flushTask is not purged after it is canceled, so the GC never sweeps flushTask. This can cause a memory leak.
[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6615: -- Priority: Minor (was: Major)
[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6615: -- Attachment: (was: MAPREDUCE-6615.patch.0)
[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6615: -- Attachment: MAPREDUCE-6277.patch.0

I've fixed the compile error.
[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6615: -- Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)
[ https://issues.apache.org/jira/browse/MAPREDUCE-6615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-6615: -- Attachment: MAPREDUCE-6615.patch.0

I've attached the initial patch.
[jira] [Created] (MAPREDUCE-6615) Remove useless boxing/unboxing code (Hadoop MapReduce)
Kousuke Saruta created MAPREDUCE-6615: - Summary: Remove useless boxing/unboxing code (Hadoop MapReduce)
Key: MAPREDUCE-6615
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6615
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: performance
Affects Versions: 3.0.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta

There are lots of places where useless boxing/unboxing occurs. To avoid performance issues, let's remove them.
[jira] [Updated] (MAPREDUCE-5895) FileAlreadyExistsException was thrown : Temporary Index File can not be cleaned up because OutputStream doesn't close properly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-5895: -- Attachment: MAPREDUCE-5895.patch

Understood. In the new patch, bos is closed and the two cleanup method calls are merged into one method call.

> FileAlreadyExistsException was thrown : Temporary Index File can not be
> cleaned up because OutputStream doesn't close properly
> --
>
> Key: MAPREDUCE-5895
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: client
>Affects Versions: 3.0.0
>Reporter: Kousuke Saruta
> Attachments: MAPREDUCE-5895.patch, MAPREDUCE-5895.patch
>
>
> In TaskLog.java, the temporary index file is created by the following code.
> {code}
> BufferedOutputStream bos =
>   new BufferedOutputStream(
>     SecureIOUtils.createForWrite(tmpIndexFile, 0644));
> DataOutputStream dos = new DataOutputStream(bos);
> {code}
> The code is surrounded by try-finally, so if some Exception/Error is thrown
> between constructing bos and dos, the temporary file is not cleaned up.
> I hit a situation where, while one thread ran, an OOM was thrown after bos
> was created and the temporary file was not cleaned up. Later, another thread
> executed the same logic and failed because of FileAlreadyExistsException.
-- This message was sent by Atlassian JIRA (v6.2#6252)
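The shape of the fix being discussed can be sketched as follows (an assumed simplification using plain java.io streams in place of SecureIOUtils; this is not the actual patch):

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Close the underlying stream even if constructing the outer stream fails.
// Closing dos also closes bos, so bos is closed directly only when dos was
// never created.
public class SafeIndexWrite {
    public static void writeIndex(File tmpIndexFile, byte[] data) throws IOException {
        BufferedOutputStream bos = null;
        DataOutputStream dos = null;
        try {
            bos = new BufferedOutputStream(new FileOutputStream(tmpIndexFile));
            dos = new DataOutputStream(bos);
            dos.write(data);
        } finally {
            if (dos != null) {
                dos.close();
            } else if (bos != null) {
                bos.close();
            }
        }
    }
}
```

The key point from the thread is that cleanup must run even when the failure happens between constructing bos and dos, which the null checks in the finally block handle.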
[jira] [Commented] (MAPREDUCE-5895) FileAlreadyExistsException was thrown : Temporary Index File can not be cleaned up because OutputStream doesn't close properly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009771#comment-14009771 ] Kousuke Saruta commented on MAPREDUCE-5895: ---

Thank you for your comment [~devaraj.k]! I think dos.close() calls bos.close() internally, right?
[jira] [Updated] (MAPREDUCE-5895) FileAlreadyExistsException was thrown : Temporary Index File can not be cleaned up because OutputStream doesn't close properly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-5895: -- Summary: FileAlreadyExistsException was thrown : Temporary Index File can not be cleaned up because OutputStream doesn't close properly (was: Temporary Index File can not be cleaned up because OutputStream doesn't close properly)
[jira] [Commented] (MAPREDUCE-5895) Temporary Index File can not be cleaned up because OutputStream doesn't close properly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006327#comment-14006327 ] Kousuke Saruta commented on MAPREDUCE-5895: ---

Who can review the patch?
[jira] [Updated] (MAPREDUCE-5895) Temporary Index File can not be cleaned up because OutputStream doesn't close properly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-5895: -- Status: Patch Available (was: Open)
[jira] [Updated] (MAPREDUCE-5895) Temporary Index File can not be cleaned up because OutputStream doesn't close properly
[ https://issues.apache.org/jira/browse/MAPREDUCE-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-5895: -- Attachment: MAPREDUCE-5895.patch

I've attached a patch for this issue.
[jira] [Created] (MAPREDUCE-5895) Temporary Index File can not be cleaned up because OutputStream doesn't close properly
Kousuke Saruta created MAPREDUCE-5895: - Summary: Temporary Index File can not be cleaned up because OutputStream doesn't close properly
Key: MAPREDUCE-5895
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5895
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 3.0.0
Reporter: Kousuke Saruta

In TaskLog.java, the temporary index file is created by the following code.
{code}
BufferedOutputStream bos =
  new BufferedOutputStream(
    SecureIOUtils.createForWrite(tmpIndexFile, 0644));
DataOutputStream dos = new DataOutputStream(bos);
{code}
The code is surrounded by try-finally, so if some Exception/Error is thrown between constructing bos and dos, the temporary file is not cleaned up.
I hit a situation where, while one thread ran, an OOM was thrown after bos was created and the temporary file was not cleaned up. Later, another thread executed the same logic and failed because of FileAlreadyExistsException.
[jira] [Commented] (MAPREDUCE-5600) ConcurrentModificationException on /tasktracker.jsp
[ https://issues.apache.org/jira/browse/MAPREDUCE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13820550#comment-13820550 ] Kousuke Saruta commented on MAPREDUCE-5600: ---

I think branch-1 has the same issue. When TaskTracker#getTasksFromRunningJobs is called, we'll get an iterator over "runningJobs":
{code}
/**
 * Get the list of tasks from running jobs on this task tracker.
 * @return a copy of the list of TaskStatus objects
 */
synchronized List<TaskStatus> getTasksFromRunningJobs() {
  List<TaskStatus> result = new ArrayList<TaskStatus>(tasks.size());
  for (Map.Entry<JobID, RunningJob> item : runningJobs.entrySet()) {
    ...
  }
  return result;
}
{code}
On the other hand, TaskTracker#addTaskToJob can be called during the iteration. The addTaskToJob method modifies "runningJobs", so it can cause a ConcurrentModificationException.
{code}
private RunningJob addTaskToJob(JobID jobId, TaskInProgress tip) {
  synchronized (runningJobs) {
    ...
    runningJobs.put(jobId, rJob);
    ...
  }
}
{code}
When we call getTasksFromRunningJobs, we acquire the monitor of the TaskTracker instance, but when we call addTaskToJob, we acquire the monitor of "runningJobs". So we may need to modify TaskTracker to acquire an appropriate monitor when calling getTasksFromRunningJobs.

> ConcurrentModificationException on /tasktracker.jsp
> ---
>
> Key: MAPREDUCE-5600
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-5600
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: tasktracker
> Environment: Found this in the MR1 bundled with
> hadoop-2.0.0-mr1-cdh4.1.3 (which I think is based on some 0.20 version).
>Reporter: Benoit Sigoure > > If you request {{/tasktracker.jsp}} frequently on a TaskTracker that's busy, > every once in a while you'll get this: > {code} > 2013-10-29 13:25:55,524 ERROR org.mortbay.log: /tasktracker.jsp > java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1117) > at java.util.TreeMap$EntryIterator.next(TreeMap.java:1153) > at java.util.TreeMap$EntryIterator.next(TreeMap.java:1148) > at > org.apache.hadoop.mapred.TaskTracker.getTasksFromRunningJobs(TaskTracker.java:3991) > at > org.apache.hadoop.mapred.tasktracker_jsp._jspService(tasktracker_jsp.java:98) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1056) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
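The race described in the comment above boils down to fail-fast iteration over a TreeMap that is mutated mid-iteration. A single-threaded reproduction (illustrative only; not TaskTracker code):

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

// TreeMap's iterators are fail-fast: a structural modification during
// iteration makes the next call to next() throw, which is the same
// failure tasktracker.jsp hits when addTaskToJob runs concurrently.
public class CmeDemo {
    public static void main(String[] args) {
        Map<String, Integer> runningJobs = new TreeMap<>();
        runningJobs.put("job_1", 1);
        runningJobs.put("job_2", 2);
        try {
            for (Map.Entry<String, Integer> e : runningJobs.entrySet()) {
                runningJobs.put("job_3", 3); // mutate while iterating
            }
        } catch (ConcurrentModificationException cme) {
            System.out.println("ConcurrentModificationException");
        }
    }
}
```

The exception is thrown on the iterator's side, not the writer's, which is why the stack trace above points at the JSP's iteration rather than at addTaskToJob.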
[jira] [Commented] (MAPREDUCE-5600) ConcurrentModificationException on /tasktracker.jsp
[ https://issues.apache.org/jira/browse/MAPREDUCE-5600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819851#comment-13819851 ] Kousuke Saruta commented on MAPREDUCE-5600: --- Hi Benoit, Can you reproduce that? > ConcurrentModificationException on /tasktracker.jsp > --- > > Key: MAPREDUCE-5600 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5600 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: tasktracker > Environment: Found this in the MR1 bundled with > hadoop-2.0.0-mr1-cdh4.1.3 (which I think is based on some 0.20 version). >Reporter: Benoit Sigoure > > If you request {{/tasktracker.jsp}} frequently on a TaskTracker that's busy, > every once in a while you'll get this: > {code} > 2013-10-29 13:25:55,524 ERROR org.mortbay.log: /tasktracker.jsp > java.util.ConcurrentModificationException > at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1117) > at java.util.TreeMap$EntryIterator.next(TreeMap.java:1153) > at java.util.TreeMap$EntryIterator.next(TreeMap.java:1148) > at > org.apache.hadoop.mapred.TaskTracker.getTasksFromRunningJobs(TaskTracker.java:3991) > at > org.apache.hadoop.mapred.tasktracker_jsp._jspService(tasktracker_jsp.java:98) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) > at > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221) > at > org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1056) > at > org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) > at > 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) > at > org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) > at > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) > at > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) > at > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) > at > org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) > at > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) > at org.mortbay.jetty.Server.handle(Server.java:326) > at > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) > at > org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) > at > org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410) > at > org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) > {code} -- This message was sent by Atlassian JIRA (v6.1#6144)
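The failure mode above is easy to reproduce in isolation: TreeMap's iterators are fail-fast, so a JSP thread walking the running-jobs map races against any thread mutating it. A minimal, self-contained sketch (the map contents and method names are hypothetical, not the TaskTracker's actual code) — the first method shows the race, the second a common fix of snapshotting the map before iterating:

```java
import java.util.ConcurrentModificationException;
import java.util.Map;
import java.util.TreeMap;

public class CmeDemo {
    // Mutating a TreeMap while iterating its entrySet trips the fail-fast
    // modCount check, exactly as in the stack trace above.
    static boolean iterateWhileMutating(TreeMap<String, String> tasks) {
        try {
            for (Map.Entry<String, String> e : tasks.entrySet()) {
                tasks.remove(e.getKey()); // simulates another thread's mutation
            }
            return false; // no exception: did not reproduce
        } catch (ConcurrentModificationException cme) {
            return true;  // reproduced the bug
        }
    }

    // One fix: iterate a snapshot, so concurrent mutation of the original
    // map cannot invalidate the iterator.
    static int iterateSnapshot(TreeMap<String, String> tasks) {
        int seen = 0;
        for (Map.Entry<String, String> e : new TreeMap<>(tasks).entrySet()) {
            tasks.remove(e.getKey()); // safe: we iterate the copy
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        TreeMap<String, String> m = new TreeMap<>();
        m.put("task_1", "RUNNING");
        m.put("task_2", "RUNNING");
        System.out.println(iterateWhileMutating(new TreeMap<>(m))); // true
        System.out.println(iterateSnapshot(m)); // 2
    }
}
```

Synchronizing the JSP read against the tracker's updates would also work, at the cost of holding a lock while rendering the page.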
[jira] [Commented] (MAPREDUCE-5503) TestMRJobClient.testJobClient is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773173#comment-13773173 ] Kousuke Saruta commented on MAPREDUCE-5503: --- Oh, I misunderstood. I got it. > TestMRJobClient.testJobClient is failing > > > Key: MAPREDUCE-5503 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5503 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 >Affects Versions: 3.0.0, 2.1.0-beta >Reporter: Jason Lowe >Priority: Blocker > Attachments: MAPREDUCE-5503.patch > > > TestMRJobClient.testJobClient is failing on trunk and causing precommit > builds to complain: > {noformat} > testJobClient(org.apache.hadoop.mapreduce.TestMRJobClient) Time elapsed: > 26.361 sec <<< FAILURE! > junit.framework.AssertionFailedError: expected:<1> but was:<0> > at junit.framework.Assert.fail(Assert.java:50) > at junit.framework.Assert.failNotEquals(Assert.java:287) > at junit.framework.Assert.assertEquals(Assert.java:67) > at junit.framework.Assert.assertEquals(Assert.java:199) > at junit.framework.Assert.assertEquals(Assert.java:205) > at > org.apache.hadoop.mapreduce.TestMRJobClient.testJobList(TestMRJobClient.java:474) > at > org.apache.hadoop.mapreduce.TestMRJobClient.testJobClient(TestMRJobClient.java:112) > {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5503) TestMRJobClient.testJobClient is failing
[ https://issues.apache.org/jira/browse/MAPREDUCE-5503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772926#comment-13772926 ] Kousuke Saruta commented on MAPREDUCE-5503: --- Hey Akira, I think TestMRJobClient is expected to test operations on running MapReduce jobs, so just modifying the expected value doesn't make sense; it wouldn't really test anything. The current implementation of TestMRJobClient has some wrong preconditions (e.g. running "mapred job -list" after job completion, as you said), and some parts of the test whose results depend on timing. We should take that into account and fix them.
[jira] [Updated] (MAPREDUCE-5504) mapred queue -info inconsistent with types
[ https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-5504: -- Target Version/s: 3.0.0, 0.23.10 (was: 0.23.10) Status: Patch Available (was: Open) > mapred queue -info inconsistent with types > -- > > Key: MAPREDUCE-5504 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5504 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: client >Affects Versions: 0.23.9 >Reporter: Thomas Graves > Attachments: MAPREDUCE-5504.patch > > > $ mapred queue -info default > == > Queue Name : default > Queue State : running > Scheduling Info : Capacity: 4.0, MaximumCapacity: 0.67, CurrentCapacity: > 0.9309831 > The capacity is displayed in % as 4, however maximum capacity is displayed as > an absolute number 0.67 instead of 67%. > We should make these consistent with the type we are displaying -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (MAPREDUCE-5504) mapred queue -info inconsistent with types
[ https://issues.apache.org/jira/browse/MAPREDUCE-5504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated MAPREDUCE-5504: -- Attachment: MAPREDUCE-5504.patch Hi, I've created a patch for this issue.
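The inconsistency reported here is purely a formatting one: the scheduler reports some capacities as 0..1 fractions and others as percentages, and rendering them all as percentages would make the `mapred queue -info` output uniform. A hypothetical helper sketching that (this is an illustration, not the attached patch):

```java
import java.util.Locale;

public class QueueInfoFormat {
    // Render a scheduler capacity, reported as a fraction in [0, 1],
    // as a percentage string, so all three figures print consistently.
    static String formatCapacity(float fraction) {
        return String.format(Locale.US, "%.1f%%", fraction * 100f);
    }

    public static void main(String[] args) {
        // The values from the issue description:
        System.out.println(formatCapacity(0.67f));      // MaximumCapacity
        System.out.println(formatCapacity(0.9309831f)); // CurrentCapacity
    }
}
```

With a helper like this, MaximumCapacity 0.67 prints as "67.0%" alongside the already-percentage Capacity figure.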
[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix
[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13709588#comment-13709588 ] Kousuke Saruta commented on MAPREDUCE-5247: --- Thanks Stan. I have asked in HADOOP-7771. > FileInputFormat should filter files with '._COPYING_' sufix > --- > > Key: MAPREDUCE-5247 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5247 > Project: Hadoop Map/Reduce > Issue Type: Bug >Reporter: Stan Rosenberg > > FsShell copy/put creates staging files with '._COPYING_' suffix. These files > should be considered hidden by FileInputFormat. (A simple fix is to add the > following conjunct to the existing hiddenFilter: > {code} > !name.endsWith("._COPYING_") > {code} > After upgrading to CDH 4.2.0 we encountered this bug. We have a legacy data > loader which uses 'hadoop fs -put' to load data into hourly partitions. We > also have intra-hourly jobs which are scheduled to execute several times per > hour using the same hourly partition as input. Thus, as the new data is > continuously loaded, these staging files (i.e., ._COPYING_) are breaking our > jobs (since when copy/put completes staging files are moved). > As a workaround, we've defined a custom input path filter and loaded it with > "mapred.input.pathFilter.class". -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix
[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701539#comment-13701539 ] Kousuke Saruta commented on MAPREDUCE-5247: --- May I recreate this JIRA as an HDFS issue?
[jira] [Commented] (MAPREDUCE-5153) Support for running combiners without reducers
[ https://issues.apache.org/jira/browse/MAPREDUCE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13695641#comment-13695641 ] Kousuke Saruta commented on MAPREDUCE-5153: --- Radim, for what kind of workload would you want to use combiners without reducers? We should consider whether the feature is really needed. > Support for running combiners without reducers > -- > > Key: MAPREDUCE-5153 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153 > Project: Hadoop Map/Reduce > Issue Type: New Feature >Reporter: Radim Kolar > > scenario: Workflow mapper -> sort -> combiner -> hdfs > No api change is need, if user set combiner class and reducers = 0 then run > combiner and sent output to HDFS. > Popular libraries such as scalding and cascading are offering this > functionality, but they use caching entire mapper output in memory. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix
[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691008#comment-13691008 ] Kousuke Saruta commented on MAPREDUCE-5247: --- As Devaraj said, we can use "mapred.input.pathFilter.class", but as far as I know the name of the temporary file is undocumented, and I think changes to the specification or implementation of HDFS should not affect existing HDFS users. So I think we should reconsider the name of the temporary file. It may be good for the temporary file's name to start with "." or "_".
[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix
[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688650#comment-13688650 ] Kousuke Saruta commented on MAPREDUCE-5247: --- I found the JIRA where the code that creates the "._COPYING_" temporary file was added: https://issues.apache.org/jira/browse/HADOOP-7771 In that JIRA, they discussed an NPE problem when using copyToLocal, and the reason the "._COPYING_" file is created is to make the copy reliable (the final name appears only once the copy completes). So I think the temporary file does not necessarily have to carry the "._COPYING_" suffix.
[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix
[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688335#comment-13688335 ] Kousuke Saruta commented on MAPREDUCE-5247: --- Stan, I think we should modify FsShell to give the in-progress file a name with an underscore prefix, so that FileInputFormat can ignore it, rather than modify FileInputFormat to treat a ._COPYING_-suffixed file as hidden. It's purely an HDFS matter, and I think the specification change shouldn't affect MapReduce. What do you think?
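For reference, the conjunct proposed in the issue description would slot into FileInputFormat's existing hidden-file check, which already skips names starting with "_" or ".". A standalone sketch of the combined predicate (plain strings instead of Hadoop's Path/PathFilter types, so it runs without Hadoop on the classpath; the class name is made up for illustration):

```java
public class StagingFileFilter {
    // Mirrors FileInputFormat's hidden-file convention ("_" and "."
    // prefixes) plus the proposed conjunct rejecting FsShell's
    // ._COPYING_ staging suffix. Note the suffix check is needed:
    // "part-00000._COPYING_" starts with neither "_" nor ".".
    static boolean accept(String name) {
        return !name.startsWith("_")
            && !name.startsWith(".")
            && !name.endsWith("._COPYING_");
    }

    public static void main(String[] args) {
        System.out.println(accept("part-00000"));           // true
        System.out.println(accept("part-00000._COPYING_")); // false
        System.out.println(accept("_SUCCESS"));             // false
    }
}
```

Saruta's counter-proposal above would make the third clause unnecessary, since a staging file named with a leading "_" or "." would already be caught by the first two.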
[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix
[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13686274#comment-13686274 ] Kousuke Saruta commented on MAPREDUCE-5247: --- I managed to reproduce this on branch-2.1-beta. I saw a temporary file with the "._COPYING_" suffix while putting a file into HDFS. As you say, running MapReduce jobs while ._COPYING_ files are present in a job's input directory causes problems.
[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix
[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680872#comment-13680872 ] Kousuke Saruta commented on MAPREDUCE-5247: --- OK. I think we should change Affects Version/s to trunk.
[jira] [Commented] (MAPREDUCE-5247) FileInputFormat should filter files with '._COPYING_' sufix
[ https://issues.apache.org/jira/browse/MAPREDUCE-5247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13680830#comment-13680830 ] Kousuke Saruta commented on MAPREDUCE-5247: --- We should discuss the community version here, not a specific distribution. But what you describe seems to affect Hadoop 2 and trunk as well.