[jira] [Resolved] (MAPREDUCE-7478) [Decommission]Show Info Log for Repeated Useless refreshNode Operation

2024-07-04 Thread wuchang (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuchang resolved MAPREDUCE-7478.

Resolution: Abandoned

> [Decommission]Show Info Log for Repeated Useless refreshNode Operation
> --
>
> Key: MAPREDUCE-7478
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7478
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: wuchang
>Priority: Major
>
> https://github.com/apache/hadoop/pull/6921






[jira] [Created] (MAPREDUCE-7478) [Decommission]Show Info Log for Repeated Useless refreshNode Operation

2024-07-04 Thread wuchang (Jira)
wuchang created MAPREDUCE-7478:
--

 Summary: [Decommission]Show Info Log for Repeated Useless 
refreshNode Operation
 Key: MAPREDUCE-7478
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7478
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: wuchang


https://github.com/apache/hadoop/pull/6921






[jira] [Resolved] (MAPREDUCE-7475) Fix non-idempotent unit tests

2024-05-19 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved MAPREDUCE-7475.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Fix non-idempotent unit tests
> -
>
> Key: MAPREDUCE-7475
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7475
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
> Environment: Ubuntu 22.04, Java 17
>Reporter: Kaiyao Ke
>Assignee: Kaiyao Ke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> 2 tests are not idempotent and fail upon repeated execution within the same 
> JVM instance due to self-induced state pollution. Specifically, these tests 
> try to make the directory TEST_ROOT_DIR and write to it. The tests do not 
> clean up (remove) the directory after execution. Therefore, in the second 
> execution, TEST_ROOT_DIR would already exist and the exception `Could not 
> create test dir` would be thrown. Below are the 2 non-idempotent tests:
>  * org.apache.hadoop.mapred.TestOldCombinerGrouping.testCombiner
>  * org.apache.hadoop.mapreduce.TestNewCombinerGrouping.testCombiner
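
A minimal sketch of the missing cleanup, assuming JUnit 4 and Hadoop's FileUtil 
(the directory constant is illustrative, not the tests' exact value):

{code:java}
import java.io.File;
import org.apache.hadoop.fs.FileUtil;
import org.junit.After;

public class CombinerGroupingCleanupSketch {
  // Illustrative stand-in for the tests' TEST_ROOT_DIR constant.
  private static final File TEST_ROOT_DIR =
      new File(System.getProperty("test.build.data", "/tmp"), "TestCombinerGrouping");

  @After
  public void cleanup() {
    // Remove TEST_ROOT_DIR so a repeated run in the same JVM can recreate it.
    FileUtil.fullyDelete(TEST_ROOT_DIR);
  }
}
{code}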






[jira] [Resolved] (MAPREDUCE-7474) [ABFS] Improve commit resilience and performance in Manifest Committer

2024-05-15 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7474.
---
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
   Resolution: Fixed

> [ABFS] Improve commit resilience and performance in Manifest Committer
> --
>
> Key: MAPREDUCE-7474
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7474
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> * Manifest committer is not resilient to rename failures on task commit 
> without HADOOP-18012 rename recovery enabled. 
> * large burst of delete calls noted: are they needed?
> relates to HADOOP-19093 but takes a more minimal approach with the goal of 
> changing the manifest committer only.
> Initial proposed changes
> * retry recovery on task commit rename, always (repeat save, delete, rename)
> * audit delete use and see if it can be pruned






[jira] [Created] (MAPREDUCE-7476) Follow up of https://issues.apache.org/jira/browse/MAPREDUCE-7475 - detected 5 more non-idempotent tests (pass in the first run but fail in repeated runs in the same

2024-05-03 Thread Kaiyao Ke (Jira)
Kaiyao Ke created MAPREDUCE-7476:


 Summary: Follow up of 
https://issues.apache.org/jira/browse/MAPREDUCE-7475 - detected 5 more 
non-idempotent tests (pass in the first run but fail in repeated runs in the 
same JVM)
 Key: MAPREDUCE-7476
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7476
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Kaiyao Ke


Similar to https://issues.apache.org/jira/browse/MAPREDUCE-7475, 5 more 
non-idempotent unit tests were detected.

The following two tests do not reset `NotificationServlet.counter`, so 
repeated runs throw assertion failures due to accumulation:
 * org.apache.hadoop.mapred.TestClusterMRNotification#testMR
 * org.apache.hadoop.mapred.TestLocalMRNotification#testMR

The following test does not remove the key `AMParams.ATTEMPT_STATE`, so 
repeated runs never see a missing attempt-state:
 * org.apache.hadoop.mapreduce.v2.app.webapp.TestAppController.testAttempts

The following test fully deletes `TEST_ROOT_DIR` after execution, so repeated 
runs will throw a `DiskErrorException`:
 * org.apache.hadoop.mapred.TestMapTask#testShufflePermissions

The following test does not restore the static variable `statusUpdateTimes` 
after execution, so consecutive runs throw an `AssertionError`:
 * org.apache.hadoop.mapred.TestTaskProgressReporter#testTaskProgress
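
A minimal sketch of the kind of per-test reset that would make the first two 
tests idempotent (assuming JUnit 4, and that `NotificationServlet.counter` is 
a writable static field, as the description suggests):

{code:java}
import org.junit.After;

public class NotificationResetSketch {
  @After
  public void resetSharedState() {
    // Sketch only: restore the shared static counter so a repeated run in
    // the same JVM starts counting notifications from zero again.
    NotificationServlet.counter = 0;
  }
}
{code}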






[jira] [Created] (MAPREDUCE-7475) 2 tests are non-idempotent (pass in the first run but fail in repeated runs in the same JVM):

2024-04-30 Thread Kaiyao Ke (Jira)
Kaiyao Ke created MAPREDUCE-7475:


 Summary: 2 tests are non-idempotent (pass in the first run but 
fail in repeated runs in the same JVM):
 Key: MAPREDUCE-7475
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7475
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Kaiyao Ke









[jira] [Created] (MAPREDUCE-7474) [ABFS] Improve commit resilience and performance in Manifest Committer

2024-04-03 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7474:
-

 Summary: [ABFS] Improve commit resilience and performance in 
Manifest Committer
 Key: MAPREDUCE-7474
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7474
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran



* Manifest committer is not resilient to rename failures on task commit without 
HADOOP-18012 rename recovery enabled. 
* large burst of delete calls noted: are they needed?


relates to HADOOP-19093 but takes a more minimal approach with the goal of 
changing the manifest committer only.

Initial proposed changes (the first item is sketched after this list):
* retry recovery on task commit rename, always (repeat save, delete, rename)
* audit delete use and see if it can be pruned
* maybe: rate limit some IO internally, but not delegate to abfs
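
A rough sketch of the retry idea from the first bullet (the manifest-save 
helper and retry budget are illustrative, not the actual committer API):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: always retry task-commit rename (repeat save, delete, rename).
void commitManifestWithRetry(FileSystem fs, Path tmpManifest, Path finalManifest,
    int maxAttempts) throws IOException {
  IOException last = new IOException("manifest rename failed");
  for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      saveManifest(fs, tmpManifest);     // hypothetical: (re)write the manifest
      fs.delete(finalManifest, false);   // clear any partial destination
      if (fs.rename(tmpManifest, finalManifest)) {
        return;                          // committed
      }
      last = new IOException("rename returned false on attempt " + attempt);
    } catch (IOException e) {
      last = e;                          // remember and retry
    }
  }
  throw last;
}

void saveManifest(FileSystem fs, Path p) throws IOException {
  // hypothetical helper: serialize the task manifest to p
}
{code}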






[jira] [Resolved] (MAPREDUCE-7469) NNBench createControlFiles should use thread pool to improve performance.

2024-03-22 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved MAPREDUCE-7469.
---
Fix Version/s: 3.4.1
   Resolution: Fixed

> NNBench createControlFiles should use thread pool to improve performance.
> -
>
> Key: MAPREDUCE-7469
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7469
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: liuguanghua
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> NNBench is a good tool for NameNode performance testing. And with multiple 
> maps it will wait a long time in createControlFiles. This can use a thread 
> pool to increase concurrency.






[jira] [Resolved] (MAPREDUCE-7470) multi-thread mapreduce committer

2024-03-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7470.
---
Resolution: Duplicate

> multi-thread mapreduce committer
> 
>
> Key: MAPREDUCE-7470
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7470
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: mrv2
>Reporter: TianyiMa
>Priority: Major
>  Labels: mapreduce, pull-request-available
> Attachments: MAPREDUCE-7470.0.patch
>
>
> In cloud environments, such as AWS, Aliyun etc., the network delay is 
> non-trivial when we commit thousands of files.
> In our situation, the ping delay is about 0.03ms in the IDC, but after moving 
> to the cloud the ping delay is about 3ms, which is roughly 100x slower. We 
> found that committing tens of thousands of files can cost tens of minutes. 
> The more files there are, the longer it takes.
> So we propose a new committer algorithm, a variant of committer algorithm 
> version 1, called version 3. In this new algorithm, in order to decrease the 
> commit time, we use a thread pool to commit the job's final output.
> Our test results in cloud production show that the new algorithm decreases 
> the commit time by several tens of times.






[jira] [Created] (MAPREDUCE-7473) Entity id/type not updated for HistoryEvent NORMALIZED_RESOURCE

2024-03-20 Thread Bilwa S T (Jira)
Bilwa S T created MAPREDUCE-7473:


 Summary: Entity id/type not updated for HistoryEvent 
NORMALIZED_RESOURCE
 Key: MAPREDUCE-7473
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7473
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Bilwa S T
Assignee: Bilwa S T


Getting the below exception in the MR AM logs:

2024-03-09 16:23:30,329 ERROR [Job ATS Event Dispatcher] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error putting 
entity null to TimelineServer
org.apache.hadoop.yarn.exceptions.YarnException: Incomplete entity without 
entity id/type
at 
org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:88)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:187)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1129)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:745)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:241)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:156)
at java.base/java.lang.Thread.run(Thread.java:840)
2024-03-09 16:23:30,332 ERROR [Job ATS Event Dispatcher] 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler: Error putting 
entity null to TimelineServer
org.apache.hadoop.yarn.exceptions.YarnException: Incomplete entity without 
entity id/type
at 
org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:88)
at 
org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:187)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.processEventForTimelineServer(JobHistoryEventHandler.java:1129)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.handleTimelineEvent(JobHistoryEventHandler.java:745)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler.access$1200(JobHistoryEventHandler.java:93)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1795)
at 
org.apache.hadoop.mapreduce.jobhistory.JobHistoryEventHandler$ForwardingEventHandler.handle(JobHistoryEventHandler.java:1791)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:241)
at 
org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:156)
at java.base/java.lang.Thread.run(Thread.java:840)
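
Presumably the NORMALIZED_RESOURCE history event is translated into a 
TimelineEntity without its entity id/type being set. A hedged sketch of the 
shape of a fix (the id and type values are illustrative, not the actual ones):

{code:java}
import org.apache.hadoop.mapreduce.v2.api.records.JobId;
import org.apache.hadoop.yarn.api.records.timeline.TimelineEntity;

// Sketch only: every entity put to the Timeline Server needs an id and a type,
// otherwise TimelineWriter.putEntities fails as in the stack above.
static TimelineEntity entityFor(JobId jobId) {
  TimelineEntity entity = new TimelineEntity();
  entity.setEntityId(jobId.toString());  // illustrative: key by the MR job id
  entity.setEntityType("MAPREDUCE_JOB"); // illustrative type string
  return entity;
}
{code}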






[jira] [Reopened] (MAPREDUCE-7402) In the merge phase, if configuration.set("mapreduce.task.io.sort.factor", "1"), it may lead to an infinite loop

2024-03-11 Thread Jian Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian Zhang reopened MAPREDUCE-7402:
---

> In the merge phase, if configuration.set("mapreduce.task.io.sort.factor", 
> "1"), it may lead to an infinite loop
> ---
>
> Key: MAPREDUCE-7402
>     URL: https://issues.apache.org/jira/browse/MAPREDUCE-7402
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Jian Zhang
>Priority: Minor
>
> Function with the infinite loop: Merger$MergeQueue#computeBytesInMerges(int 
> factor, int inMem).






[jira] [Created] (MAPREDUCE-7472) Decode value of hive.query.string for the job Configuration which was encoded by Hive

2024-02-01 Thread wangzhongwei (Jira)
wangzhongwei created MAPREDUCE-7472:
---

 Summary: Decode value of hive.query.string for the job 
Configuration which was encoded by Hive
 Key: MAPREDUCE-7472
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7472
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.3.3
Reporter: wangzhongwei
Assignee: wangzhongwei
 Attachments: image-2024-02-02-09-44-57-503.png

 The value of hive.query.string in the job Configuration is URL-encoded by Hive 
and written to HDFS, which should be decoded before being rendered.

!image-2024-02-02-09-44-57-503.png!
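
A minimal sketch of the decoding step (using the JDK's URLDecoder; where 
exactly it belongs in the rendering path is not shown here):

{code:java}
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Sketch only: decode the Hive-encoded query string before rendering it.
static String decodeHiveQuery(String raw) throws UnsupportedEncodingException {
  return raw == null ? null : URLDecoder.decode(raw, "UTF-8");
}
{code}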






[jira] [Created] (MAPREDUCE-7471) Hadoop mapred minicluster command line fail with class not found

2024-01-27 Thread Duo Zhang (Jira)
Duo Zhang created MAPREDUCE-7471:


 Summary: Hadoop mapred minicluster command line fail with class 
not found
 Key: MAPREDUCE-7471
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7471
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.3.5
Reporter: Duo Zhang


If you run

./bin/mapred minicluster

It will fail with
{noformat}
Exception in thread "Listener at localhost/35325" 
java.lang.NoClassDefFoundError: org/mockito/stubbing/Answer
at 
org.apache.hadoop.hdfs.MiniDFSCluster.isNameNodeUp(MiniDFSCluster.java:2648)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.isClusterUp(MiniDFSCluster.java:2662)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.waitClusterUp(MiniDFSCluster.java:1510)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:989)
at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:588)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:530)
at 
org.apache.hadoop.mapreduce.MiniHadoopClusterManager.start(MiniHadoopClusterManager.java:160)
at 
org.apache.hadoop.mapreduce.MiniHadoopClusterManager.run(MiniHadoopClusterManager.java:132)
at 
org.apache.hadoop.mapreduce.MiniHadoopClusterManager.main(MiniHadoopClusterManager.java:320)
Caused by: java.lang.ClassNotFoundException: org.mockito.stubbing.Answer
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 9 more
{noformat}

This line

https://github.com/apache/hadoop/blob/835403d872506c4fa76eb2d721f2d91f413473d5/hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/MiniDFSCluster.java#L2648

This is because we rely on mockito in NameNodeAdapter but we do not have 
mockito on our classpath, at least in our published hadoop-3.3.5 binary.

And there is another problem: if we do not run the above command in the 
HADOOP_HOME directory, i.e., if we run it from another directory by typing the 
absolute path of the mapred command, it will fail with

{noformat}
Exception in thread "main" java.lang.NoClassDefFoundError: org/junit/Assert
at 
org.apache.hadoop.test.GenericTestUtils.assertExists(GenericTestUtils.java:336)
at 
org.apache.hadoop.test.GenericTestUtils.getTestDir(GenericTestUtils.java:280)
at 
org.apache.hadoop.test.GenericTestUtils.getTestDir(GenericTestUtils.java:289)
at 
org.apache.hadoop.hdfs.MiniDFSCluster.getBaseDirectory(MiniDFSCluster.java:3069)
at 
org.apache.hadoop.hdfs.MiniDFSCluster$Builder.<init>(MiniDFSCluster.java:239)
at 
org.apache.hadoop.mapreduce.MiniHadoopClusterManager.start(MiniHadoopClusterManager.java:157)
at 
org.apache.hadoop.mapreduce.MiniHadoopClusterManager.run(MiniHadoopClusterManager.java:132)
at 
org.apache.hadoop.mapreduce.MiniHadoopClusterManager.main(MiniHadoopClusterManager.java:320)
Caused by: java.lang.ClassNotFoundException: org.junit.Assert
at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 8 more
{noformat}

This is simply because of this line:

https://github.com/apache/hadoop/blob/835403d872506c4fa76eb2d721f2d91f413473d5/hadoop-common-project/hadoop-common/src/main/bin/hadoop-functions.sh#L601

We should add the $HADOOP_TOOLS_HOME prefix for the default value of 
HADOOP_TOOLS_DIR.






[jira] [Created] (MAPREDUCE-7470) hadoop MR multi-thread committer

2024-01-19 Thread TianyiMa (Jira)
TianyiMa created MAPREDUCE-7470:
---

 Summary: hadoop MR multi-thread committer
 Key: MAPREDUCE-7470
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7470
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: mrv2
Reporter: TianyiMa


In cloud environments, such as AWS, Aliyun etc., the network delay is 
non-trivial when we commit thousands of files.

In our situation, the ping delay is about 0.03ms in the IDC, but after moving 
to the cloud the ping delay is about 3ms, which is roughly 100x slower. We 
found that committing tens of thousands of files can cost tens of minutes. The 
more files there are, the longer it takes.

So we propose a new committer algorithm, a variant of committer algorithm 
version 1, called version 3. In this new algorithm, in order to decrease the 
commit time, we use a thread pool to commit the job's final output.

Our test results in cloud production show that the new algorithm decreases the 
commit time by several tens of times.
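
A rough sketch of the thread-pool commit described above (not the attached 
patch; the per-task merge step and pool sizing are illustrative):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: a v1-style commit that merges task outputs in parallel.
void commitJobParallel(FileSystem fs, List<Path> taskAttemptDirs, Path finalDir,
    int threads) throws Exception {
  ExecutorService pool = Executors.newFixedThreadPool(threads);
  List<Future<?>> futures = new ArrayList<>();
  for (Path attemptDir : taskAttemptDirs) {
    futures.add(pool.submit(() -> {
      // move one task's committed files under the job's final directory
      for (FileStatus st : fs.listStatus(attemptDir)) {
        fs.rename(st.getPath(), new Path(finalDir, st.getPath().getName()));
      }
      return null;
    }));
  }
  for (Future<?> f : futures) {
    f.get(); // fail the commit on the first error
  }
  pool.shutdown();
}
{code}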






[jira] [Created] (MAPREDUCE-7469) NNBench's createControlFiles should use thread pool to improve performance.

2024-01-18 Thread liuguanghua (Jira)
liuguanghua created MAPREDUCE-7469:
--

 Summary: NNBench's createControlFiles should use thread pool to 
improve performance.
 Key: MAPREDUCE-7469
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7469
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: liuguanghua


NNBench is a good tool for NameNode performance testing. And with multiple 
maps it will wait a long time in createControlFiles. This can use a thread 
pool to increase concurrency.
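
A rough sketch of the idea (an executor fans the control-file creation out; 
the helper and pool size are illustrative, not NNBench's actual code):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch only: create the per-map control files concurrently, not serially.
void createControlFilesConcurrently(int numMaps) throws Exception {
  ExecutorService pool = Executors.newFixedThreadPool(Math.min(numMaps, 16));
  List<Future<?>> futures = new ArrayList<>();
  for (int i = 0; i < numMaps; i++) {
    final int id = i;
    futures.add(pool.submit(() -> {
      createControlFile(id); // hypothetical helper: writes one control file
      return null;
    }));
  }
  for (Future<?> f : futures) {
    f.get(); // surface the first failure, if any
  }
  pool.shutdown();
}

void createControlFile(int id) throws IOException {
  // hypothetical: write control file #id to HDFS
}
{code}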






[jira] [Resolved] (MAPREDUCE-7468) Change add-opens flag's default value from true to false

2024-01-15 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke resolved MAPREDUCE-7468.
--
Fix Version/s: 3.4.0
   3.3.7
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Change add-opens flag's default value from true to false
> 
>
> Key: MAPREDUCE-7468
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7468
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.4.0, 3.3.7
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.7
>
>
> To avoid issues when a newer JobClient is used with Hadoop versions without 
> MAPREDUCE-7449, the default value of mapreduce.jvm.add-opens-as-default 
> should be false. Currently it's true; this can cause failures if a newer 
> JobClient is used to submit apps, as the placeholder replacement won't happen 
> during container launch, resulting in a failed submission.






[jira] [Created] (MAPREDUCE-7468) Change add-opens flag's default value from true to false

2024-01-10 Thread Benjamin Teke (Jira)
Benjamin Teke created MAPREDUCE-7468:


 Summary: Change add-opens flag's default value from true to false
 Key: MAPREDUCE-7468
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7468
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Benjamin Teke
Assignee: Benjamin Teke


To avoid issues when a newer JobClient is used with Hadoop versions without 
MAPREDUCE-7449, the default value of mapreduce.jvm.add-opens-as-default should 
be false. Currently it's true; this can cause failures if a newer JobClient is 
used to submit apps, as the placeholder replacement won't happen during 
container launch, resulting in a failed submission.






[jira] [Created] (MAPREDUCE-7466) WordMedian example fails to compute the right median

2023-12-28 Thread Matthew Rossi (Jira)
Matthew Rossi created MAPREDUCE-7466:


 Summary: WordMedian example fails to compute the right median
 Key: MAPREDUCE-7466
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7466
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: examples, test
Reporter: Matthew Rossi


The WordMedian example does not correctly handle the case where the median 
falls exactly between two different values (e.g., the median word length of 
"Hello Hadoop" should be 5.5).

This affects both the example and its test (i.e., TestWordStats).
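
A small worked illustration of the even-count case (plain Java, not the 
example's actual code):

{code:java}
// Word lengths for "Hello Hadoop", sorted, are {5, 6}: an even count, so the
// median is the mean of the two middle values, (5 + 6) / 2.0 = 5.5.
static double median(int[] sortedLengths) {
  int n = sortedLengths.length;
  return (n % 2 == 0)
      ? (sortedLengths[n / 2 - 1] + sortedLengths[n / 2]) / 2.0
      : sortedLengths[n / 2];
}
{code}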






[jira] [Created] (MAPREDUCE-7465) performance problem in FileOutputCommitter for a big list processed by a single thread

2023-12-23 Thread Arnaud Nauwynck (Jira)
Arnaud Nauwynck created MAPREDUCE-7465:
--

 Summary: performance problem in FileOutputCommitter for a big 
list processed by a single thread
 Key: MAPREDUCE-7465
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7465
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: performance
Affects Versions: 3.3.6, 3.3.4, 3.3.3, 3.3.5, 3.2.4, 3.3.2, 3.2.3
Reporter: Arnaud Nauwynck


When committing a big Hadoop job (for example via Spark) having many 
partitions, the class FileOutputCommitter processes thousands of dirs/files to 
rename with a single thread. This is a performance issue, caused by lots of 
waits on FileSystem storage operations.


Notice that sub-class instances of FileOutputCommitter are supposed to be 
created at runtime depending on a configurable property 
([PathOutputCommitterFactory.java|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/PathOutputCommitterFactory.java]).

But for example in Parquet + Spark, this is buggy and cannot be changed at 
runtime.
There is an ongoing Jira and PR to fix it in Parquet + Spark: 
https://issues.apache.org/jira/browse/PARQUET-2416








[jira] [Created] (MAPREDUCE-7464) Make priority of mapreduce containers configurable

2023-12-21 Thread liu bin (Jira)
liu bin created MAPREDUCE-7464:
--

 Summary: Make priority of mapreduce containers configurable
 Key: MAPREDUCE-7464
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7464
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: liu bin


When maps and reduces run simultaneously, if resources are insufficient, 
reduces will be preempted.

Because the priority of reduce is higher than map, the preempted reduces will 
first obtain resources when rerun, and then be preempted again, falling into a 
loop.

We need to be able to configure the priority of maps higher than reduces to 
avoid this situation.






[jira] [Created] (MAPREDUCE-7463) Modify HistoryServerRest.html content, add a comma to the /ws/v1/history/mapreduce/jobs Response Body

2023-12-10 Thread wangzhongwei (Jira)
wangzhongwei created MAPREDUCE-7463:
---

 Summary: Modify HistoryServerRest.html content, add a comma to 
the /ws/v1/history/mapreduce/jobs Response Body
 Key: MAPREDUCE-7463
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7463
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: documentation
Affects Versions: 3.3.6, 3.3.3
Reporter: wangzhongwei
 Attachments: image-2023-12-11-15-55-09-196.png

 The /ws/v1/history/mapreduce/jobs Response Body is missing a comma.

!image-2023-12-11-15-55-09-196.png|width=448,height=336!

 






[jira] [Resolved] (MAPREDUCE-7462) Use thread pool to improve the speed of creating control files in TestDFSIO

2023-11-23 Thread farmmamba (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

farmmamba resolved MAPREDUCE-7462.
--
Resolution: Invalid

> Use thread pool to improve the speed of creating control files in TestDFSIO
> ---
>
> Key: MAPREDUCE-7462
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7462
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: examples, test
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Priority: Major
>
> When we use the TestDFSIO tool to test the throughput of HDFS clusters, we 
> found it is very slow in the control-file creation stage.
> After referring to the source code, we found that the method 
> createControlFile tries to create control files serially. It can be improved 
> by using a thread pool.






[jira] [Created] (MAPREDUCE-7462) Use thread pool to improve the speed of creating control files in TestDFSIO

2023-11-23 Thread farmmamba (Jira)
farmmamba created MAPREDUCE-7462:


 Summary: Use thread pool to improve the speed of creating control 
files in TestDFSIO
 Key: MAPREDUCE-7462
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7462
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: examples, test
Affects Versions: 3.3.6
Reporter: farmmamba


When we use the TestDFSIO tool to test the throughput of HDFS clusters, we 
found it is very slow in the control-file creation stage.

After referring to the source code, we found that the method createControlFile 
tries to create control files serially. It can be improved by using a thread 
pool.






[jira] [Resolved] (MAPREDUCE-7459) Fixed TestHistoryViewerPrinter flakiness during string comparison

2023-11-03 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved MAPREDUCE-7459.
-
Fix Version/s: 3.4.0
   (was: 3.3.6)
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fixed TestHistoryViewerPrinter flakiness during string comparison 
> --
>
> Key: MAPREDUCE-7459
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7459
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.3.6
> Environment: Java version: openjdk 11.0.20.1
> Maven version: Apache Maven 3.6.3
>Reporter: Rajiv Ramachandran
>Assignee: Rajiv Ramachandran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The test 
> {{_org.apache.hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter#testHumanPrinterAll_}}
> can fail due to flakiness. This flakiness occurs because the test uses 
> HashMap values and converts them to strings to perform the comparison, and 
> the order of the objects returned is not necessarily maintained. 
> The stack trace is as follows:
> testHumanPrinterAll(org.apache.hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter)
>   Time elapsed: 0.297 s  <<< FAILURE!
> org.junit.ComparisonFailure:
> expected:<...8501754_0001_m_0[7    6-Oct-2011 19:15:09    6-Oct-2011 
> 19:15:16 (7sec)
> SUCCEEDED MAP task list for job_1317928501754_0001
> TaskId        StartTime    FinishTime    Error    InputSplits
> 
> task_1317928501754_0001_m_06    6-Oct-2011 19:15:08    6-Oct-2011 
> 19:15:14 (6sec)
> ...
> /tasklog?attemptid=attempt_1317928501754_0001_m_03]_1
> REDUCE task list...> but was:<...8501754_0001_m_0[5    6-Oct-2011 
> 19:15:07    6-Oct-2011 19:15:12 (5sec)
> SUCCEEDED MAP task list for job_1317928501754_0001
> TaskId        StartTime    FinishTime    Error    InputSplits
> 
> task_1317928501754_0001_m_06    6-Oct-2011 19:15:08    6-Oct-2011 
> 19:15:14 (6sec)
> SUCCEEDED MAP task list for job_1317928501754_0001
> TaskId        StartTime    FinishTime    Error    InputSplits
> 
> task_1317928501754_0001_m_04    6-Oct-2011 19:15:06    6-Oct-2011 
> 19:15:10 (4sec)
> SUCCEEDED MAP task list for job_1317928501754_0001
> TaskId        StartTime    FinishTime    Error    InputSplits
> 
> task_1317928501754_0001_m_07    6-Oct-2011 19:15:09    6-Oct-2011 
> 19:15:16 (7sec)
> ...
> /tasklog?attemptid=attempt_1317928501754_0001_m_06]_1






[jira] [Created] (MAPREDUCE-7461) Fixed assertion comparison failure by resolving XML paths for child elements correctly

2023-11-02 Thread Rajiv Ramachandran (Jira)
Rajiv Ramachandran created MAPREDUCE-7461:
-

 Summary: Fixed assertion comparison failure by resolving XML 
paths for child elements correctly
 Key: MAPREDUCE-7461
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7461
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 3.3.6
Reporter: Rajiv Ramachandran
 Fix For: 3.3.6


The following tests depend on underlying implementation orders, which are not 
guaranteed, while comparing the contents of the generated XML response.

_org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobs#testJobIdXML_
_org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobs#testJobsXML_

The test sends an HTTP GET request to a specific URL and expects a response in 
XML format. However, the order of elements in the XML response is not 
necessarily guaranteed. When comparing the XML contents, the <name> tag occurs 
in multiple places inside the <job> element, but the <name> directly under the 
root is not always the one compared, due to the non-deterministic order. The 
[getXmlString 
utility|https://github.com/kavvya97/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/test/java/org/apache/hadoop/yarn/webapp/WebServicesTestUtils.java#L78]
 always takes the first <name> tag irrespective of whether it is nested or in 
the root, which causes the test to become flaky.
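
A hedged sketch of the scoping idea (assuming the response nests <name> under 
a <job> element, as the assertion output suggests; plain DOM API):

{code:java}
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch only: resolve <name> relative to the <job> element under test,
// instead of taking the first <name> anywhere in the document.
static String jobName(Document dom) {
  Element job = (Element) dom.getElementsByTagName("job").item(0);
  return job.getElementsByTagName("name").item(0).getTextContent();
}
{code}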




[ERROR] Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.377 s 
<<< FAILURE! - in 
org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobs
[ERROR] 
testJobIdXML(org.apache.hadoop.mapreduce.v2.app.webapp.TestAMWebServicesJobs)  
Time elapsed: 8.361 s  <<< FAILURE!
java.lang.AssertionError:
[name]
Expecting:
 "mapreduce.job.acl-view-job"
to match pattern:
 "RandomWriter"







[jira] [Resolved] (MAPREDUCE-7457) Limit number of spill files getting created

2023-10-30 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved MAPREDUCE-7457.
---
   Fix Version/s: 3.4.0
Hadoop Flags: Reviewed
Target Version/s: 3.4.0
  Resolution: Fixed

> Limit number of spill files getting created
> ---
>
> Key: MAPREDUCE-7457
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7457
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mudit Sharma
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Hi,
>  
> We have been facing some issues where many of our cluster node disks go full 
> because of some rogue applications creating a lot of spill data
> We wanted to fail the app if more than a threshold number of spill files are 
> written.
> Please let us know if any such capability is supported
>  
> If the capability is not there, we are proposing it to support it via a 
> config, we have added a PR for the same: 
> [https://github.com/apache/hadoop/pull/6155]  please let us know your 
> thoughts on it






[jira] [Created] (MAPREDUCE-7460) When "yarn.nodemanager.resource.memory-mb" and "mapreduce.map.memory.mb" work together, the mapreduce sample program blocks

2023-10-25 Thread ECFuzz (Jira)
ECFuzz created MAPREDUCE-7460:
-

 Summary: When "yarn.nodemanager.resource.memory-mb" and 
"mapreduce.map.memory.mb" work together, the mapreduce sample program blocks 
 Key: MAPREDUCE-7460
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7460
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.3.6
Reporter: ECFuzz


My Hadoop version is 3.3.6.

The core-site.xml and hdfs-site.xml are set as default.

yarn-site.xml is like below.
{code:java}
 {code}
 






[jira] [Reopened] (MAPREDUCE-7449) Add add-opens flag to container launch commands on JDK17 nodes

2023-10-19 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke reopened MAPREDUCE-7449:
--

> Add add-opens flag to container launch commands on JDK17 nodes
> --
>
> Key: MAPREDUCE-7449
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7449
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mr-am, mrv2
>Affects Versions: 3.4.0
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> To allow containers to launch on JDK17 nodes the add-opens flag should be 
> added to the container launch commands if the node has JDK17+, and it 
> shouldn't on previous JDKs. This behaviour should be configurable.






[jira] [Created] (MAPREDUCE-7458) Race condition in TaskReportPBImpl#getProto when generating task reports in concurrent scenarios

2023-10-10 Thread Tao Yang (Jira)
Tao Yang created MAPREDUCE-7458:
---

 Summary: Race condition in TaskReportPBImpl#getProto when 
generating task reports in concurrent scenarios
 Key: MAPREDUCE-7458
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7458
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: jobhistoryserver
Reporter: Tao Yang


There is a rare race condition in *TaskReportPBImpl.getProto* when 
JobHistoryServer gets concurrent getTaskReports requests for the same job at 
the same time.

Exception scenario:
 # client calls JobClient#getTaskReports in parallel for the same job at the 
same time.
 # JobHistoryServer gets these requests and then generates responses based on 
*cached* task reports, according to 
HistoryClientService$HSClientProtocolHandler#getTaskReports.
 # When the same task report is processed concurrently, we may see 
UnsupportedOperationException exceptions with different stacks, as follows.

ExceptionStack-1:  TaskReportPBImpl#convertToProtoFormat
{noformat}
java.lang.UnsupportedOperationException
at java.util.AbstractList.add(AbstractList.java:148)
at java.util.AbstractList.add(AbstractList.java:108)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:330)
at 
org.apache.hadoop.mapreduce.v2.proto.MRProtos$CounterGroupProto$Builder.addAllCounters(MRProtos.java:4393)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CounterGroupPBImpl.addContersToProto(CounterGroupPBImpl.java:182)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CounterGroupPBImpl.mergeLocalToBuilder(CounterGroupPBImpl.java:63)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CounterGroupPBImpl.mergeLocalToProto(CounterGroupPBImpl.java:70)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CounterGroupPBImpl.getProto(CounterGroupPBImpl.java:55)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CountersPBImpl.convertToProtoFormat(CountersPBImpl.java:195)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CountersPBImpl.access$100(CountersPBImpl.java:38)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CountersPBImpl$1$1.next(CountersPBImpl.java:162)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CountersPBImpl$1$1.next(CountersPBImpl.java:150)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:329)
at 
org.apache.hadoop.mapreduce.v2.proto.MRProtos$CountersProto$Builder.addAllCounterGroups(MRProtos.java:5102)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CountersPBImpl.addCounterGroupsToProto(CountersPBImpl.java:172)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CountersPBImpl.mergeLocalToBuilder(CountersPBImpl.java:64)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CountersPBImpl.mergeLocalToProto(CountersPBImpl.java:71)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.CountersPBImpl.getProto(CountersPBImpl.java:56)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.TaskReportPBImpl.convertToProtoFormat(TaskReportPBImpl.java:401)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.TaskReportPBImpl.mergeLocalToBuilder(TaskReportPBImpl.java:76)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.TaskReportPBImpl.mergeLocalToProto(TaskReportPBImpl.java:92)
at 
org.apache.hadoop.mapreduce.v2.api.records.impl.pb.TaskReportPBImpl.getProto(TaskReportPBImpl.java:64)
at 
org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetTaskReportsResponsePBImpl.convertToProtoFormat(GetTaskReportsResponsePBImpl.java:173)
at 
org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetTaskReportsResponsePBImpl.access$100(GetTaskReportsResponsePBImpl.java:36)
at 
org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetTaskReportsResponsePBImpl$1$1.next(GetTaskReportsResponsePBImpl.java:138)
at 
org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetTaskReportsResponsePBImpl$1$1.next(GetTaskReportsResponsePBImpl.java:127)
at 
com.google.protobuf.AbstractMessageLite$Builder.addAll(AbstractMessageLite.java:329)
at 
org.apache.hadoop.mapreduce.v2.proto.MRServiceProtos$GetTaskReportsResponseProto$Builder.addAllTaskReports(MRServiceProtos.java:7049)
at 
org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetTaskReportsResponsePBImpl.addTaskReportsToProto(GetTaskReportsResponsePBImpl.java:150)
at 
org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetTaskReportsResponsePBImpl.mergeLocalToBuilder(GetTaskReportsResponsePBImpl.java:62)
at 
org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetTaskReportsResponsePBImpl.mergeLocalToProto(GetTaskReportsResponsePBImpl.java:69)
at 
org.apache.hadoop.mapreduce.v2.api.protocolrecords.impl.pb.GetTaskReportsResponsePBImpl.getProto(GetTaskReportsResponsePBImpl.java:54
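
One conceivable mitigation, sketched against the usual PBImpl pattern (not 
necessarily the reporter's intended fix): make proto construction mutually 
exclusive so concurrent getTaskReports calls cannot mutate the same builder.

{code:java}
// Sketch only: serializing getProto() prevents two threads from merging
// local state into the shared protobuf builder at the same time.
public synchronized TaskReportProto getProto() {
  mergeLocalToProto();                        // existing PBImpl merge step
  proto = viaProto ? proto : builder.build(); // existing PBImpl fields
  viaProto = true;
  return proto;
}
{code}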

[jira] [Created] (MAPREDUCE-7457) Limit number of spill files getting created

2023-10-06 Thread Mudit Sharma (Jira)
Mudit Sharma created MAPREDUCE-7457:
---

 Summary: Limit number of spill files getting created
 Key: MAPREDUCE-7457
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7457
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mudit Sharma


Hi,

 

We have been facing some issues where many of our cluster node disks go full 
because of some rogue applications creating a lot of spill data

We wanted to fail the app if more than a threshold number of spill files are 
written.

Please let us know if any such capability is supported.
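
A hedged sketch of what such a guard could look like in the map-side sorter 
(the property name is hypothetical):

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

// Sketch only: fail the task once the spill-file count crosses a configured limit.
static void checkSpillLimit(Configuration conf, int numSpills) throws IOException {
  int limit = conf.getInt("mapreduce.task.spill.files.limit", -1); // hypothetical key
  if (limit > 0 && numSpills >= limit) {
    throw new IOException("Number of spill files " + numSpills
        + " exceeded the configured limit " + limit);
  }
}
{code}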






[jira] [Resolved] (MAPREDUCE-7456) Extend add-opens flag to container launch commands on JDK17 nodes

2023-09-27 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth resolved MAPREDUCE-7456.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Extend add-opens flag to container launch commands on JDK17 nodes
> -
>
> Key: MAPREDUCE-7456
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7456
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>Affects Versions: 3.4.0
>Reporter: Peter Szucs
>Assignee: Peter Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> There was a previous ticket for adding add-opens flag to container launch to 
> be able to run them on JDK17 nodes: 
> https://issues.apache.org/jira/browse/MAPREDUCE-7449
> As testing discovered, this should be extended with 
> "{_}-add-exports=java.base/sun.net.dns=ALL-UNNAMED{_}" and 
> "{_}-add-exports=java.base/sun.net.util=ALL-UNNAMED{_}" options to be able to 
> run containers on Isilon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7455) org.apache.hadoop.mapred.SpillRecord crashes due to overflow in buffer size computation

2023-09-16 Thread ConfX (Jira)
ConfX created MAPREDUCE-7455:


 Summary: org.apache.hadoop.mapred.SpillRecord crashes due to 
overflow in buffer size computation
 Key: MAPREDUCE-7455
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7455
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.3.6
Reporter: ConfX


A large `mapreduce.job.reduces` can cause an overflow while computing the byte 
buffer size in `org.apache.hadoop.mapred.SpillRecord#SpillRecord(int)`, since 
the byte buffer size equals `mapreduce.job.reduces` * 
MapTask.MAP_OUTPUT_INDEX_RECORD_LENGTH.



To reproduce:
1. set `mapreduce.job.reduces` to 509103844
2. run `mvn surefire:test 
-Dtest=org.apache.hadoop.mapred.TestMapTask#testShufflePermissions`



We created a PR that provides a fix by checking that the computed buffer size 
is positive.
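
A sketch of that guard (illustrative, not necessarily the PR's exact code):

{code:java}
import java.nio.ByteBuffer;

// Sketch only: compute the size in long arithmetic and reject overflow
// before allocating the spill index buffer.
static ByteBuffer allocateIndexBuffer(int numPartitions, int recordLength) {
  long size = (long) numPartitions * recordLength;
  if (size < 0 || size > Integer.MAX_VALUE) {
    throw new IllegalArgumentException("Invalid spill index buffer size: " + size);
  }
  return ByteBuffer.allocate((int) size);
}
{code}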
 






[jira] [Resolved] (MAPREDUCE-7453) Container logs are missing when yarn.app.container.log.filesize is set to default value 0.

2023-09-15 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved MAPREDUCE-7453.
---
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Container logs are missing when yarn.app.container.log.filesize is set to 
> default value 0.
> --
>
> Key: MAPREDUCE-7453
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7453
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.3.6
>Reporter: zhengchenyu
>Assignee: zhengchenyu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Since HADOOP-18649, in container-log4j.properties, 
> log4j.appender.\{APPENDER}.MaxFileSize is set to 
> ${yarn.app.container.log.filesize}, but yarn.app.container.log.filesize is 0 
> by default, so logs go missing: the log is always rolling and only shows the 
> latest entries.
> The running log looks like this:
> {code:java}
> Log Type: syslog
> Log Upload Time: Fri Sep 08 11:36:09 +0800 2023
> Log Length: 0
> Log Type: syslog.1
> Log Upload Time: Fri Sep 08 11:36:09 +0800 2023
> Log Length: 179
> 2023-09-08 11:31:34,494 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.util.RackResolver: Got an error when resolve 
> hostNames. Falling back to /default-rack for all.  {code}
> Note: log4j.appender.\{APPENDER}.MaxFileSize was not set before, so the 
> default value of 10M was used; hence there was no problem before HADOOP-18649






[jira] [Created] (MAPREDUCE-7454) Missing null check when acquiring appId for a null jobId

2023-09-10 Thread ConfX (Jira)
ConfX created MAPREDUCE-7454:


 Summary: Missing null check when acquiring appId for a null 
jobId
 Key: MAPREDUCE-7454
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7454
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: ConfX
 Attachments: reproduce.sh

h2. What happened?

A null pointer exception is triggered when trying to acquire the appId for a 
null jobId.
h2. Where's the bug?

In line 90 of JobResourceUploader.java:
{code:java}
private ApplicationId jobIDToAppId(JobID jobId) {
    return ApplicationId.newInstance(Long.parseLong(jobId.getJtIdentifier()),
        jobId.getId());
  }
{code}
Here the jobId is not checked before generating the `ApplicationId` for it.
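
A minimal sketch of a guard (illustrative; whether to fail fast or skip the 
shared cache for a null jobId is a design decision):

{code:java}
private ApplicationId jobIDToAppId(JobID jobId) {
  // Sketch only: fail fast with a clear message instead of an NPE.
  if (jobId == null) {
    throw new IllegalArgumentException("jobId must not be null");
  }
  return ApplicationId.newInstance(Long.parseLong(jobId.getJtIdentifier()),
      jobId.getId());
}
{code}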
h2. How to reproduce?

1. set {{mapreduce.job.sharedcache.mode=archives, 
mapreduce.framework.name=yarn, yarn.sharedcache.enabled=true}}
2. run 
{{org.apache.hadoop.mapreduce.TestJobResourceUploader#testErasureCodingDisabled}}
and observe this exception:
{code:java}
java.lang.NullPointerException
at 
org.apache.hadoop.mapreduce.JobResourceUploader.jobIDToAppId(JobResourceUploader.java:91)
at 
org.apache.hadoop.mapreduce.JobResourceUploader.initSharedCache(JobResourceUploader.java:79)
at 
org.apache.hadoop.mapreduce.JobResourceUploader.uploadResources(JobResourceUploader.java:134)
at 
org.apache.hadoop.mapreduce.TestJobResourceUploader.testErasureCodingSetting(TestJobResourceUploader.java:442)
at 
org.apache.hadoop.mapreduce.TestJobResourceUploader.testErasureCodingDisabled(TestJobResourceUploader.java:380)
{code}
For an easy reproduction, run the {{reproduce.sh}} in the attachment.






[jira] [Created] (MAPREDUCE-7453) Container logs are missing when yarn.app.container.log.filesize is set to default value 0.

2023-09-08 Thread zhengchenyu (Jira)
zhengchenyu created MAPREDUCE-7453:
--

 Summary: Container logs are missing when 
yarn.app.container.log.filesize is set to default value 0.
 Key: MAPREDUCE-7453
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7453
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 3.3.6
Reporter: zhengchenyu
Assignee: zhengchenyu


Since HADOOP-18649, in container-log4j.properties, 
log4j.appender.\{APPENDER}.MaxFileSize is set to 
${yarn.app.container.log.filesize}, but yarn.app.container.log.filesize is 0 by 
default, so logs go missing: the log is always rolling and only shows the 
latest entries.

The running log looks like this:
{code:java}
Log Type: syslog
Log Upload Time: Fri Sep 08 11:36:09 +0800 2023
Log Length: 0

Log Type: syslog.1
Log Upload Time: Fri Sep 08 11:36:09 +0800 2023
Log Length: 179
2023-09-08 11:31:34,494 INFO [AsyncDispatcher event handler] 
org.apache.hadoop.yarn.util.RackResolver: Got an error when resolve hostNames. 
Falling back to /default-rack for all.  {code}
Note: log4j.appender.\{APPENDER}.MaxFileSize was not set before, so the default 
value of 10M was used; hence there was no problem before HADOOP-18649
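
The relevant configuration, roughly (sketch; {APPENDER} stands in for the 
concrete appender name, as above):

{noformat}
log4j.appender.{APPENDER}.MaxFileSize=${yarn.app.container.log.filesize}
# With yarn.app.container.log.filesize left at its default of 0 the appender
# rolls constantly; before HADOOP-18649 the implicit 10MB default applied.
{noformat}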






[jira] [Resolved] (MAPREDUCE-7451) review TrackerDistributedCacheManager.checkPermissionOfOther

2023-08-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7451.
---
Resolution: Won't Fix

The class TrackerDistributedCacheManager only exists in Hadoop releases <= 1.2; 
no need to look at it.

> review TrackerDistributedCacheManager.checkPermissionOfOther
> 
>
> Key: MAPREDUCE-7451
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7451
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 0.22.0, 1.2.1
>Reporter: Yiheng Cao
>Priority: Major
>
> TrackerDistributedCacheManager.checkPermissionOfOther() doesn't seem to work 
> reliably






[jira] [Created] (MAPREDUCE-7452) ManifestCommitter to support / as a destination

2023-08-21 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7452:
-

 Summary: ManifestCommitter to support / as a destination
 Key: MAPREDUCE-7452
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7452
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 3.3.6
Reporter: Steve Loughran


You can't commit work to the root of an object store through the manifest 
committer, as it will fail if the destination path exists, which always holds 
for root.

Proposed (a sketch follows):
* check for a root dest ("/") in job setup; if the path is not root, use 
createNewDirectory() as today
* if the path is root, delete all children but not the dir itself.
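
A hedged sketch of that setup logic (createNewDirectory() is the existing 
behaviour; fs.mkdirs stands in for it here, and the root branch is 
illustrative):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: root always exists, so clear its children instead of
// requiring the destination directory to be freshly created.
static void setupDestination(FileSystem fs, Path dest) throws IOException {
  if (dest.isRoot()) {
    for (FileStatus st : fs.listStatus(dest)) {
      fs.delete(st.getPath(), true);  // remove children, keep "/" itself
    }
  } else {
    fs.mkdirs(dest);                  // stand-in for createNewDirectory()
  }
}
{code}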








[jira] [Created] (MAPREDUCE-7451) Security Vulnerability - Action Required: “Incorrect Permission Assignment for Critical Resource” vulnerability in the newest version of hadoop

2023-08-17 Thread Yiheng Cao (Jira)
Yiheng Cao created MAPREDUCE-7451:
-

 Summary: Security Vulnerability - Action Required: “Incorrect 
Permission Assignment for Critical Resource” vulnerability in the newest 
version of hadoop
 Key: MAPREDUCE-7451
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7451
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Yiheng Cao


I think the method 
{{org.apache.hadoop.filecache.TrackerDistributedCacheManager.checkPermissionOfOther(FileSystem
 fs, Path path, FsAction action)}} may have an “Incorrect Permission Assignment 
for Critical Resource” vulnerability in the newest version of Hadoop. It shares 
similarities with a recent CVE disclosure, _CVE-2017-3166_, in the same 
_apache/hadoop_ project.

    The vulnerability is present in the method checkPermissionOfOther(FileSystem 
fs, Path path, FsAction action) of the class 
org.apache.hadoop.filecache.TrackerDistributedCacheManager, which is 
responsible for checking whether the file system object (FileSystem) at the 
specified path grants "other" users permission for the specified operation 
(action). *The check snippet is similar to the vulnerable snippet for 
CVE-2017-3166* and may have the same consequence: *a file in an encryption 
zone with access permissions will be stored in a world-readable location and 
can be freely shared with any application that requests the file to be 
localized*. Therefore, maybe you need to fix the vulnerability with much the 
same fix code as the CVE-2017-3166 patch.






[jira] [Resolved] (MAPREDUCE-7449) Add add-opens flag to container launch commands on JDK17 nodes

2023-08-10 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke resolved MAPREDUCE-7449.
--
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Add add-opens flag to container launch commands on JDK17 nodes
> --
>
> Key: MAPREDUCE-7449
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7449
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: mr-am, mrv2
>Affects Versions: 3.4.0
>Reporter: Benjamin Teke
>Assignee: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> To allow containers to launch on JDK17 nodes, the add-opens flag should be 
> added to the container launch commands if the node runs JDK17+, and it 
> shouldn't be added on earlier JDKs. This behaviour should be configurable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7450) Set the record delimiter for the input file based on its path

2023-08-09 Thread lvhu (Jira)
lvhu created MAPREDUCE-7450:
---

 Summary: Set the record delimiter for the input file based on its 
path
 Key: MAPREDUCE-7450
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7450
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Affects Versions: 3.3.6
 Environment: Any
Reporter: lvhu
 Fix For: MR-3902


In a MapReduce program, when reading files, we can easily set the record 
delimiter via the parameter textinputformat.record.delimiter.
This parameter is also easy to set from other engines, including Spark, for 
example:
spark.sparkContext.hadoopConfiguration.set("textinputformat.record.delimiter", 
"|@|")
val rdd = spark.sparkContext.newAPIHadoopFile(...)
But once the textinputformat.record.delimiter parameter is modified, it takes 
effect for all files. In real scenarios, different files often have different 
delimiters.

In Hive, as Hive does not support programmatic configuration, we cannot modify 
the record delimiter through the above methods. If modified through a 
configuration file, it takes effect on all Hive tables.
The only way to modify the record delimiter in Hive is to write a custom 
TextInputFormat class.
The current approach in Hive is as follows:
package abc.hive;

public class MyFstTextInputFormat extends FileInputFormat<LongWritable, Text>
    implements JobConfigurable {
  ...
}

create table test (
    id string,
    name string
) stored as
INPUTFORMAT 'abc.hive.MyFstTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
If there are multiple different record delimiters, multiple TextInputFormat 
classes need to be written.

My idea is to modify the TextInputFormat class to support setting the record 
delimiter for input files based on the prefix of the file path.
Specifically, make the following modifications to TextInputFormat:
public class TextInputFormat extends FileInputFormat<LongWritable, Text>
  implements JobConfigurable {

  public RecordReader<LongWritable, Text> getRecordReader(
                                          InputSplit genericSplit, JobConf job,
                                          Reporter reporter)
    throws IOException {

    reporter.setStatus(genericSplit.toString());
    // default delimiter
    String delimiter = job.get("textinputformat.record.delimiter");
    // obtain the path of the file (the InputSplit must be cast to FileSplit)
    String filePath = ((FileSplit) genericSplit).getPath().toUri().getPath();
    // obtain the mapping of path prefixes to delimiters from the configuration
    Map<String, String> pathToDelimiterMap = ...; // parsed from the configuration file
    for (Map.Entry<String, String> entry : pathToDelimiterMap.entrySet()) {
      // configured path prefix
      String configPath = entry.getKey();
      // if configPath is a prefix of filePath
      if (filePath.startsWith(configPath)) {
        // use the delimiter configured for this path prefix
        delimiter = entry.getValue();
      }
    }
    byte[] recordDelimiterBytes = null;
    if (null != delimiter) {
      recordDelimiterBytes = delimiter.getBytes(Charsets.UTF_8);
    }
    return new LineRecordReader(job, (FileSplit) genericSplit,
        recordDelimiterBytes);
  }
}

After implementing this per-path record delimiter feature, changing a 
delimiter no longer requires writing a custom InputFormat, and it is also very 
convenient for Hadoop and Spark, since frequent parameter configuration 
changes are no longer needed. A hypothetical configuration example follows.
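
For illustration, a hypothetical way the path-to-delimiter mapping could be 
supplied; the second property name and its value format are assumptions for 
this sketch, not an existing Hadoop parameter:
{code:java}
import org.apache.hadoop.mapred.JobConf;

JobConf job = new JobConf();
// global default delimiter
job.set("textinputformat.record.delimiter", "\n");
// hypothetical mapping: comma-separated "pathPrefix=delimiter" pairs
job.set("textinputformat.record.delimiter.by.path",
    "/data/logs=|@|,/data/exports=##");
{code}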

If you accept my idea, I hope you can assign the task to me. My GitHub account 
is lvhu-goodluck.
I really hope to contribute code to the community.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7449) Add add-opens flag to container launch commands on JDK17

2023-08-09 Thread Benjamin Teke (Jira)
Benjamin Teke created MAPREDUCE-7449:


 Summary: Add add-opens flag to container launch commands on JDK17
 Key: MAPREDUCE-7449
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7449
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: mr-am, mrv2
Affects Versions: 3.4.0
Reporter: Benjamin Teke
Assignee: Benjamin Teke


To allow containers to launch on JDK17 nodes, the add-opens flag should be added 
to the container launch commands if the node runs JDK17+, and it shouldn't be 
added on earlier JDKs. This behaviour should be configurable.
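
A minimal sketch of the conditional described; detecting the version via 
Runtime.version() and the exact module/package value are illustrative 
assumptions, not the committed implementation (in practice the node manager 
would check the JDK used for the container, not its own):
{code:java}
import java.util.ArrayList;
import java.util.List;

// append --add-opens to the launch command only on JDK17+ JVMs
List<String> vmArgs = new ArrayList<>();
if (Runtime.version().feature() >= 17) {
  vmArgs.add("--add-opens");
  vmArgs.add("java.base/java.lang=ALL-UNNAMED"); // illustrative value
}
{code}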



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7446) NegativeArraySizeException when running MR jobs with large data size

2023-08-01 Thread Peter Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Szucs resolved MAPREDUCE-7446.

Resolution: Fixed

> NegativeArraySizeException when running MR jobs with large data size
> 
>
> Key: MAPREDUCE-7446
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7446
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Peter Szucs
>Assignee: Peter Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> We are using bit shifting to double the byte array in IFile's 
> [nextRawValue|https://github.infra.cloudera.com/CDH/hadoop/blob/bef14a39c7616e3b9f437a6fb24fc7a55a676b57/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java#L437]
>  method to store the byte values in it. With a large dataset it can easily 
> happen that we shift into the leftmost (sign) bit when calculating the size 
> of the array, which can lead to a negative number as the array size, causing 
> the NegativeArraySizeException.
> It would be safer to expand the backing array by a 1.5x factor, with a check 
> not to exceed Integer's max value while doing so.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Reopened] (MAPREDUCE-7446) NegativeArraySizeException when running MR jobs with large data size

2023-08-01 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke reopened MAPREDUCE-7446:
--

Reopening for 3.2/3.3 backport.

> NegativeArraySizeException when running MR jobs with large data size
> 
>
> Key: MAPREDUCE-7446
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7446
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Reporter: Peter Szucs
>Assignee: Peter Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> We are using bit shifting to double the byte array in IFile's 
> [nextRawValue|https://github.infra.cloudera.com/CDH/hadoop/blob/bef14a39c7616e3b9f437a6fb24fc7a55a676b57/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java#L437]
>  method to store the byte values in it. With a large dataset it can easily 
> happen that we shift into the leftmost (sign) bit when calculating the size 
> of the array, which can lead to a negative number as the array size, causing 
> the NegativeArraySizeException.
> It would be safer to expand the backing array by a 1.5x factor, with a check 
> not to exceed Integer's max value while doing so.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7448) Inconsistent Behavior for FileOutputCommitter V1 to commit successfully many times

2023-07-27 Thread ConfX (Jira)
ConfX created MAPREDUCE-7448:


 Summary: Inconsistent Behavior for FileOutputCommitter V1 to 
commit successfully many times
 Key: MAPREDUCE-7448
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7448
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: ConfX
 Attachments: reproduce.sh

h2. What happened

I turned on {{mapreduce.fileoutputcommitter.cleanup.skipped=true}}, and then 
version 1 of {{FileOutputCommitter}} can commit several times, which is 
unexpected.
h2. Where's the problem

In {{FileOutputCommitter.commitJobInternal}},
{noformat}
if (algorithmVersion == 1) {
  for (FileStatus stat : getAllCommittedTaskPaths(context)) {
    mergePaths(fs, stat, finalOutput, context);
  }
}
if (skipCleanup) {
  LOG.info("Skip cleanup the _temporary folders under job's output " +
      "directory in commitJob.");
...{noformat}
Here, if we skip cleanup, the _temporary folder is not deleted and the _SUCCESS 
file is also not created, which causes the next {{mergePaths}} pass (on a 
repeated commit) to succeed instead of failing.
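
One possible guard, sketched under the assumption that commitJobInternal could 
persist a marker even when cleanup is skipped; the marker name and location, 
and the {{fs}}/{{outputPath}} variables of the surrounding method, are 
assumptions, not the actual fix:
{code:java}
// fail fast if commitJob already ran, even though _temporary still exists
Path committedMarker = new Path(outputPath, "_temporary/_COMMITTED");
if (fs.exists(committedMarker)) {
  throw new IOException("commitJob already completed for " + outputPath);
}
// ... mergePaths(fs, stat, finalOutput, context) for each task path ...
fs.create(committedMarker, false).close(); // atomic create; fails if present
{code}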
h2. How to reproduce
 # set {{mapreduce.fileoutputcommitter.cleanup.skipped}}={{true}}
 # run 
{{org.apache.hadoop.mapred.TestFileOutputCommitter#testCommitterWithDuplicatedCommitV1}}
you should observe
{noformat}
java.lang.AssertionError: Duplicate commit successful: wrong behavior for 
version 1.
    at org.junit.Assert.fail(Assert.java:89)
    at 
org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitInternal(TestFileOutputCommitter.java:295)
    at 
org.apache.hadoop.mapred.TestFileOutputCommitter.testCommitterWithDuplicatedCommitV1(TestFileOutputCommitter.java:269){noformat}
For an easy reproduction, run the reproduce.sh in the attachment.

We are happy to provide a patch if this issue is confirmed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7447) Unnecessary NPE encountered when starting CryptoOutputStream with encrypted-intermediate-data

2023-07-27 Thread ConfX (Jira)
ConfX created MAPREDUCE-7447:


 Summary: Unnecessary NPE encountered when starting 
CryptoOutputStream with encrypted-intermediate-data
 Key: MAPREDUCE-7447
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7447
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: ConfX
 Attachments: reproduce.sh

h2. What happened?

Got a NullPointerException when initializing a {{CryptoOutputStream}}.
h2. Where's the bug?

In line 106 of {{CryptoOutputStream}}, the code lacks a check to verify 
whether the key parameter is null.
{noformat}
public CryptoOutputStream(OutputStream out, CryptoCodec codec,
      int bufferSize, byte[] key, byte[] iv, long streamOffset,
      boolean closeOutputStream)
      throws IOException {
     ...
    this.key = key.clone();{noformat}
As a result, when the configuration provides a null key, the key.clone() 
operation will throw a NullPointerException.
It is essential to add a null check for the key parameter before using it.
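
A minimal sketch of the missing guard as a standalone helper (the helper name 
and the chosen exception type are illustrative):
{code:java}
import java.io.IOException;

// clone the key only after verifying it is present, so callers get a clear
// error instead of a bare NullPointerException from key.clone()
static byte[] cloneKeyChecked(byte[] key) throws IOException {
  if (key == null) {
    throw new IOException("encryption key must not be null");
  }
  return key.clone();
}
{code}
The constructor would then assign {{this.key = cloneKeyChecked(key);}} (and 
similarly for {{iv}}).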
h2. How to reproduce?

(1) set {{mapreduce.job.encrypted-intermediate-data}} to {{true}}
(2) run 
{{org.apache.hadoop.mapreduce.task.reduce.TestMergeManager#testLargeMemoryLimits}}
h2. Stacktrace
{noformat}
java.lang.NullPointerException
    at 
org.apache.hadoop.crypto.CryptoOutputStream.<init>(CryptoOutputStream.java:106)
    at 
org.apache.hadoop.fs.crypto.CryptoFSDataOutputStream.<init>(CryptoFSDataOutputStream.java:38)
    at 
org.apache.hadoop.mapreduce.CryptoUtils.wrapIfNecessary(CryptoUtils.java:141)
    at 
org.apache.hadoop.mapreduce.security.IntermediateEncryptedStream.wrapIfNecessary(IntermediateEncryptedStream.java:46)
    at 
org.apache.hadoop.mapreduce.task.reduce.OnDiskMapOutput.<init>(OnDiskMapOutput.java:87)
    at 
org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:274)
    at 
org.apache.hadoop.mapreduce.task.reduce.TestMergeManager.verifyReservedMapOutputType(TestMergeManager.java:309)
    at 
org.apache.hadoop.mapreduce.task.reduce.TestMergeManager.testLargeMemoryLimits(TestMergeManager.java:303){noformat}
For an easy reproduction, run the reproduce.sh in the attachment.

We are happy to provide a patch if this issue is confirmed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7446) NegativeArraySizeException when running MR jobs with large data size

2023-07-27 Thread Peter Szucs (Jira)
Peter Szucs created MAPREDUCE-7446:
--

 Summary: NegativeArraySizeException when running MR jobs with 
large data size
 Key: MAPREDUCE-7446
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7446
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Peter Szucs
Assignee: Peter Szucs


We are using bit shifting to double the byte array in IFile's 
[nextRawValue|https://github.infra.cloudera.com/CDH/hadoop/blob/bef14a39c7616e3b9f437a6fb24fc7a55a676b57/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java#L437]
 method to store the byte values in it. With a large dataset it can easily 
happen that we shift into the leftmost (sign) bit when calculating the size of 
the array, which can lead to a negative number as the array size, causing the 
NegativeArraySizeException.

It would be safer to expand the backing array by a 1.5x factor, with a check 
not to exceed Integer's max value while doing so.
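
A minimal sketch of the safer growth policy described above (1.5x growth, 
capped just below Integer.MAX_VALUE); the names are illustrative, not the 
actual IFile code:
{code:java}
import java.util.Arrays;

// grow by 1.5x instead of doubling, capping near the maximum array size
static byte[] grow(byte[] buf, int required) {
  // widen to long first so the intermediate size cannot overflow int
  long newLength = (long) buf.length + (buf.length >> 1);
  if (newLength < required) {
    newLength = required;
  }
  long maxArraySize = Integer.MAX_VALUE - 8; // common VM array-size limit
  if (newLength > maxArraySize) {
    newLength = maxArraySize;
  }
  return Arrays.copyOf(buf, (int) newLength);
}
{code}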



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7445) ShuffleSchedulerImpl causes ArithmeticException due to improper detailsInterval value checking

2023-07-24 Thread ConfX (Jira)
ConfX created MAPREDUCE-7445:


 Summary: ShuffleSchedulerImpl causes ArithmeticException due to 
improper detailsInterval value checking
 Key: MAPREDUCE-7445
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7445
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: ConfX
 Attachments: reproduce.sh

h2. What happened

There is no value checking for the parameter 
{{mapreduce.reduce.shuffle.maxfetchfailures}}. This may cause improper 
calculations and can crash the system, for example through division by 0.
h2. Buggy code

In {{ShuffleSchedulerImpl.java}}, there is no value checking for 
{{maxFetchFailuresBeforeReporting}}, and this variable is passed directly to 
the method {{checkAndInformMRAppMaster}}. When 
{{maxFetchFailuresBeforeReporting}} is mistakenly set to 0, the code performs 
a division by 0 and throws an ArithmeticException, crashing the system.

 
{noformat}
private void checkAndInformMRAppMaster(
     ...
    if (connectExcpt || (reportReadErrorImmediately && readError)
        || ((failures % maxFetchFailuresBeforeReporting) == 0) || hostFailed) {
      ...
  }{noformat}
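
A minimal sketch of the kind of validation that would prevent this, assuming 
the property name from this report (the default value shown is illustrative):
{code:java}
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();
// clamp to at least 1 so "failures % maxFetchFailuresBeforeReporting"
// can never divide by zero
int maxFetchFailuresBeforeReporting = Math.max(1,
    conf.getInt("mapreduce.reduce.shuffle.maxfetchfailures", 10));
{code}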
h2. How to reproduce

(1) set {{mapreduce.reduce.shuffle.maxfetchfailures}}={{0}}, 
{{mapreduce.reduce.shuffle.notify.readerror}}={{false}}
(2) run {{mvn surefire:test 
-Dtest=org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler#TestSucceedAndFailedCopyMap}}
h2. Stacktrace
{noformat}
java.lang.ArithmeticException: / by zero
    at 
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.checkAndInformMRAppMaster(ShuffleSchedulerImpl.java:347)
    at 
org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl.copyFailed(ShuffleSchedulerImpl.java:308)
    at 
org.apache.hadoop.mapreduce.task.reduce.TestShuffleScheduler.TestSucceedAndFailedCopyMap(TestShuffleScheduler.java:285){noformat}
For an easy reproduction, run the reproduce.sh in the attachment.

We are happy to provide a patch if this issue is confirmed.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7442) exception message is not intuitive when accessing the job configuration web UI

2023-07-24 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei resolved MAPREDUCE-7442.

Resolution: Fixed

> exception message is not intuitive when accessing the job configuration web UI
> -
>
> Key: MAPREDUCE-7442
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7442
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: applicationmaster
> Environment: 
>Reporter: Jiandan Yang 
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-07-14-11-23-10-762.png
>
>
> I launched a Teragen job on a hadoop-3.3.4 cluster. 
> The web UI showed an error when I clicked the Configuration link of the job. 
> The error page said "HTTP ERROR 500 java.lang.IllegalArgumentException: 
> RFC6265 Cookie values may not contain character: [ ]", and I couldn't find 
> any solution from this error message.
> I found some additional stack traces in the log of the AM, and those stacks 
> show that yarn did not have permission on the staging directory. When I gave 
> permission to yarn, I could access the configuration page.
> I think the problem is that the error page does not provide useful or 
> meaningful prompts.
> It would be better if there were a message like "yarn does not have hdfs 
> permission" in the error page.
> The snapshot of error page is as follows:
> !image-2023-07-14-11-23-10-762.png!
> The error logs of the AM are as follows:
> {code:java}
> 2023-07-14 11:20:08,218 ERROR [qtp1379757019-43] 
> org.apache.hadoop.yarn.webapp.View: Error while reading 
> hdfs://dmp/user/ubd_dmp_test/.staging/job_1689296289020_0006/job.xml
> org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=yarn, access=EXECUTE, 
> inode="/user/ubd_dmp_test/.staging":ubd_dmp_test:ubd_dmp_test:drwx--
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:506)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:422)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:333)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:370)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:240)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:713)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1892)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1910)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:727)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:154)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2089)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:762)
>   at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:458)
>   at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1043)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:971)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.r

[jira] [Created] (MAPREDUCE-7444) LineReader improperly sets AdditionalRecord, causing extra line read

2023-07-20 Thread ConfX (Jira)
ConfX created MAPREDUCE-7444:


 Summary: LineReader improperly sets AdditionalRecord, causing 
extra line read
 Key: MAPREDUCE-7444
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7444
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: ConfX
 Attachments: reproduce.sh

h2. What happened:

After setting {{io.file.buffer.size}} to certain precise values, the LineReader 
sometimes mistakenly reads an extra line after the split.

Note that this bug is highly critical. A malicious party can create a file that 
makes the while loop run forever given knowledge of the buffer size setting and 
control over the file being read.
h2. Buggy code:

Consider reading a record file with multiple lines. In MapReduce, this is done 
using 
[org.apache.hadoop.mapred.LineRecordReader|https://github.com/confuzz-org/fuzz-hadoop/blob/confuzz/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/LineRecordReader.java#L249],
 which then calls 
[org.apache.hadoop.util.LineReader.readCustomLine|https://github.com/confuzz-org/fuzz-hadoop/blob/confuzz/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/LineReader.java#L268]
 if you specify your own delimiter (instead of the default '\n' or "\r\n").

Internally, if the file is compressed, the reading of the file is 
implemented using 
[org.apache.hadoop.mapreduce.lib.input.CompressedSplitLineReader|https://github.com/confuzz-org/fuzz-hadoop/blob/confuzz/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CompressedSplitLineReader.java].
 Notice that in its 
[fillBuffer|https://github.com/confuzz-org/fuzz-hadoop/blob/confuzz/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/input/CompressedSplitLineReader.java#L128]
 function, which refills the buffer, when {{inDelimiter}} is set to {{true}} 
and EOF has not been reached, the switch for an additional record is set to 
true (assuming the delimiter is not CRLF).

Now, we sometimes want to read the file in different splits (or up until a 
specific position). Consider the case where the end of the split is a 
delimiter. This is where interesting things happen. We use a simple example to 
illustrate. Suppose the delimiter is 10 bytes long, and the buffer size is set 
precisely such that the second-to-last buffer ends exactly in the middle of 
the delimiter. In the final round of the loop in {{readCustomLine}}, the 
buffer is refilled first:
{noformat}
...
if (bufferPosn >= bufferLength) {
    startPosn = bufferPosn = 0;
    bufferLength = fillBuffer(in, buffer, ambiguousByteCount > 0);
...{noformat}
note that {{needAdditionalRecord}} would be set to {{true}} in 
{{CompressedSplitLineReader}}.
Next, after reading and comparing the second half of the final delimiter, the 
function calculates the bytes read:
{noformat}
int readLength = bufferPosn - startPosn;
bytesConsumed += readLength;
int appendLength = readLength - delPosn;
...
if (appendLength >= 0 && ambiguousByteCount > 0) {
...
    unsetNeedAdditionalRecordAfterSplit();
}{noformat}
note that here {{readLength}} would be 5 (half of the delimiter length) and 
{{appendLength}} would be {{5-10=-5}}. Thus, the condition of the {{if}} 
clause would be false and {{needAdditionalRecord}} would not be unset.
In fact, the {{needAdditionalRecord}} switch is never unset all the way 
until {{readCustomLine}} ends.
However, it should be set to {{false}}; otherwise, the next time the user 
calls {{next}}, the reader will read at least one more line, since the 
{{while}} condition in {{next}}:
{noformat}
while (getFilePosition() <= end || in.needAdditionalRecordAfterSplit()) {
{noformat}
is true due to not reaching EOF and {{needAdditionalRecord}} being set to 
{{true}}.
This can lead to severe problems if, after every split, the start of the final 
buffer falls inside the delimiter.
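
The failing arithmetic can be checked in isolation (values taken from the 
example above):
{code:java}
// 10-byte delimiter split 5/5 across a buffer refill
int delPosn = 10;                        // full delimiter length matched
int startPosn = 0, bufferPosn = 5;       // 5 delimiter bytes in the new buffer
int readLength = bufferPosn - startPosn; // = 5
int appendLength = readLength - delPosn; // = 5 - 10 = -5
// "appendLength >= 0" is false, so unsetNeedAdditionalRecordAfterSplit()
// never runs and needAdditionalRecord stays true after the split
{code}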
h2. How to reproduce:

(1) Set {{io.file.buffer.size}} to {{188}}
(2) Run test: 
{{org.apache.hadoop.mapred.TestLineRecordReader#testBzipWithMultibyteDelimiter}}

For an easy reproduction please run the {{reproduce.sh}} included in the 
attachment.
h2. StackTrace:
{noformat}
java.lang.AssertionError: Unexpected number of records in split expected:<60> 
but was:<61>
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at 
org.apache.hadoop.mapred.TestLineRecordReader.testSplitRecordsForFile(TestLineRecordReader.java:110)
    at 
org.apache.hadoop.mapred.TestLineRecordReader.testBzipWithMultibyteDelimiter(TestLineRecordReader.java:684){noformat}

[jira] [Created] (MAPREDUCE-7443) state polluter for system file permissions

2023-07-19 Thread ConfX (Jira)
ConfX created MAPREDUCE-7443:


 Summary:  state polluter for system file permissions
 Key: MAPREDUCE-7443
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7443
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: ConfX
 Attachments: reproduce.sh

h2. What happened:

After setting {{fs.permissions.umask-mode}} to disable write permission on a 
file, the file's write permission is also disabled at the host (OS) level.
h2. Buggy code:

When creating {{target/test-dir/output}}, the RawLocalFileSystem directly 
manipulates the system permissions (line 978 of {{RawLocalFileSystem.java}}):
{noformat}
      String perm = String.format("%04o", permission.toShort());
      Shell.execCommand(Shell.getSetPermissionCommand(perm, false,
        FileUtil.makeShellPath(pathToFile(p), true)));{noformat}
If the permission turns off write access to the folder, the test fails with 
permission denied. However, the test does not clean the folder up properly 
(by chmod-ing and deleting it in an @After method), causing all subsequent 
runs to be polluted; a sketch of such a cleanup follows.
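
A minimal sketch of such a cleanup, assuming JUnit 4 (as the tests use) and 
Hadoop's FileUtil; the directory path mirrors the one in the stack trace:
{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.stream.Stream;
import org.apache.hadoop.fs.FileUtil;
import org.junit.After;

@After
public void cleanupTestOutputDir() throws IOException {
  File outDir = new File("target/test-dir/output");
  if (outDir.exists()) {
    // restore write access recursively so the delete below can succeed
    try (Stream<java.nio.file.Path> paths = Files.walk(outDir.toPath())) {
      paths.forEach(p -> p.toFile().setWritable(true));
    }
    FileUtil.fullyDelete(outDir);
  }
}
{code}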
h2. StackTrace:
{noformat}
java.io.IOException: Mkdirs failed to create 
file:/home/ctestfuzz/fuzz-hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/target/test-dir/output/_temporary/1/_temporary/attempt_200707121733_0001_m_00_0
 (exists=false, 
cwd=file:/home/ctestfuzz/fuzz-hadoop/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core),
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:515),
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:500),
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1195),
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1081),
at 
org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:125),
at 
org.apache.hadoop.mapred.TestFileOutputCommitter.testRecoveryInternal(TestFileOutputCommitter.java:109),
at 
org.apache.hadoop.mapred.TestFileOutputCommitter.testRecoveryUpgradeV1V2(TestFileOutputCommitter.java:171){noformat}
h2. How to reproduce:

There are two ways to reproduce:
 # (1) Set {{fs.permissions.umask-mode}} to {{243}}
(2) Run test: 
{{org.apache.hadoop.mapred.TestFileOutputCommitter#testRecoveryUpgradeV1V2}} 
and observe an IOException
(3) Check to see that the current user has lost write access to 
{{target/test-dir/output}}
 # (1) Add an {{assertTrue(false);}} to line 112 of 
{{TestFileOutputCommitter.java}} to simulate the test failing in the middle
(2) Run test: 
{{org.apache.hadoop.mapred.TestFileOutputCommitter#testRecoveryUpgradeV1V2}} 
and observe an AssertionError
(3) Set {{fs.permissions.umask-mode}} to {{243}}
(4) Run test: 
{{org.apache.hadoop.mapred.TestFileOutputCommitter#testRecoveryUpgradeV1V2}} 
and observe an IOException.
(5) Check to see that the current user has lost write access to 
{{test-dir/output/_temporary/1/_temporary/attempt_200707121733_0001_m_00_0}}
 

For an easy reproduction, run the reproduce.sh in the attachment.

We are happy to provide a patch if this issue is confirmed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7442) exception message is not intuitive when accessing the job configuration web UI

2023-07-13 Thread Jiandan Yang (Jira)
Jiandan Yang  created MAPREDUCE-7442:


 Summary: exception message is not intuitive when accessing the job 
configuration web UI
 Key: MAPREDUCE-7442
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7442
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: applicationmaster
 Environment: 

Reporter: Jiandan Yang 
 Attachments: image-2023-07-14-11-23-10-762.png

I launched a Teragen job on a hadoop-3.3.4 cluster. The web UI showed an error 
when I clicked the Configuration link of the job. The error page said "HTTP 
ERROR 500 java.lang.IllegalArgumentException: RFC6265 Cookie values may not 
contain character: [ ]", and I couldn't find a solution from this error message.
I found some additional stack traces in the log of the AM, and those stacks 
show that yarn did not have permission on the staging directory. When I gave 
permission to yarn, I could access the configuration page.
I think the problem is that the error page does not provide useful or 
meaningful prompts.
It would be better if there were a message like "yarn does not have hdfs 
permission" in the error page.

The snapshot of error page is as follows:
!image-2023-07-14-11-23-10-762.png!

The error logs of the AM are as follows:
{code:java}
2023-07-14 11:20:08,218 ERROR [qtp1379757019-43] 
org.apache.hadoop.yarn.webapp.View: Error while reading 
hdfs://dmp/user/ubd_dmp_test/.staging/job_1689296289020_0006/job.xml
org.apache.hadoop.security.AccessControlException: Permission denied: 
user=yarn, access=EXECUTE, 
inode="/user/ubd_dmp_test/.staging":ubd_dmp_test:ubd_dmp_test:drwx--
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:506)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:422)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:333)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermissionWithContext(FSPermissionChecker.java:370)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:240)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:713)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1892)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkTraverse(FSDirectory.java:1910)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.resolvePath(FSDirectory.java:727)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:154)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2089)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:762)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:458)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:604)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:572)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:556)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1093)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1043)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:971)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2976)

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
at 
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
at 
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:902)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:889)
at org.apache.hadoop.hdfs.DFSClient.g

[jira] [Resolved] (MAPREDUCE-7432) Make Manifest Committer the default for abfs and gcs

2023-06-27 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7432.
---
Fix Version/s: 3.4.0
   3.3.9
 Release Note: By default, the mapreduce manifest committer is used for 
jobs working with abfs and gcs. Hadoop mapreduce jobs will pick this up 
automatically; for Spark it is a bit more complicated: read the docs to see the 
steps required.
   Resolution: Fixed

> Make Manifest Committer the default for abfs and gcs
> 
>
> Key: MAPREDUCE-7432
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7432
> Project: Hadoop Map/Reduce
>  Issue Type: New Feature
>  Components: client
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> Switch to the manifest committer as default for abfs and gcs
> * abfs: needed for performance, scale and resilience under some failure modes
> * gcs: provides correctness through atomic task commit and better job commit 
> performance



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7441) Race condition in closing FadvisedFileRegion

2023-06-23 Thread Szilard Nemeth (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szilard Nemeth resolved MAPREDUCE-7441.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Race condition in closing FadvisedFileRegion
> 
>
> Key: MAPREDUCE-7441
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7441
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.4.0
>Reporter: Benjamin Teke
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> This issue is similar to the one described in MAPREDUCE-7095, just for 
> FadvisedFileRegion.transferSuccessful. There are warning messages when 
> multiple threads are calling the transferSuccessful method:
> {code:java}
> 2023-05-25 08:41:57,288 WARN org.apache.hadoop.mapred.FadvisedFileRegion: 
> Failed to manage OS cache for 
> /hadoop/data04/yarn/nm/usercache/hive/appcache/application_1684916804740_8245/output/attempt_1684916804740_8245_1_00_001154_0_10003/file.out
> EBADF: Bad file descriptor
> at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
> at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:271)
> at 
> org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:148)
> at 
> org.apache.hadoop.mapred.FadvisedFileRegion.transferSuccessful(FadvisedFileRegion.java:163)
> at 
> org.apache.hadoop.mapred.ShuffleChannelHandler.lambda$sendMapOutput$0(ShuffleChannelHandler.java:516)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7441) Race condition in closing FadvisedFileRegion

2023-06-23 Thread Benjamin Teke (Jira)
Benjamin Teke created MAPREDUCE-7441:


 Summary: Race condition in closing FadvisedFileRegion
 Key: MAPREDUCE-7441
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7441
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.4.0
Reporter: Benjamin Teke


This issue is similar to the one described in MAPREDUCE-7095, just for 
FadvisedFileRegion.transferSuccessful. There are warning messages when multiple 
threads are calling the transferSuccessful method:

{code:java}
2023-05-25 08:41:57,288 WARN org.apache.hadoop.mapred.FadvisedFileRegion: 
Failed to manage OS cache for 
/hadoop/data04/yarn/nm/usercache/hive/appcache/application_1684916804740_8245/output/attempt_1684916804740_8245_1_00_001154_0_10003/file.out
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:271)
at 
org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:148)
at 
org.apache.hadoop.mapred.FadvisedFileRegion.transferSuccessful(FadvisedFileRegion.java:163)
at 
org.apache.hadoop.mapred.ShuffleChannelHandler.lambda$sendMapOutput$0(ShuffleChannelHandler.java:516)
at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
at 
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)


{code}
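
A minimal sketch of one way to serialize the transfer/close handoff, in the 
spirit of the MAPREDUCE-7095 fix referenced above; the class, field, and method 
names are illustrative, not the actual FadvisedFileRegion code:
{code:java}
// guard the fadvise call behind the same lock as descriptor close, so
// posix_fadvise can never run against an already-closed file descriptor
class FadvisedRegionSketch {
  private boolean released = false;

  synchronized void transferSuccessful() {
    if (!released) {
      manageOsCache(); // close cannot interleave while we hold the lock
    }
  }

  synchronized void releaseExternalResources() {
    if (!released) {
      released = true;
      closeFileDescriptor();
    }
  }

  private void manageOsCache() { /* NativeIO posix_fadvise call */ }
  private void closeFileDescriptor() { /* close the underlying channel */ }
}
{code}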




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7435) ManifestCommitter OOM on azure job

2023-06-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7435.
---
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> ManifestCommitter OOM on azure job
> --
>
> Key: MAPREDUCE-7435
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7435
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: client
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> I've got some reports of spark jobs OOM if the manifest committer is used 
> through abfs.
> either the manifests are using too much memory, or something is not working 
> with azure stream memory use (or both).
> before proposing a solution, first step should be to write a test to load 
> many, many manifests, each with lots of dirs and files to see what breaks.
> note: we did have OOM issues with the s3a committer, on teragen but those 
> structures have to include every etag of every block, so the manifest size is 
> O(blocks); the new committer is O(files + dirs).
> {code}
> java.lang.OutOfMemoryError: Java heap space
> at 
> org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.readOneBlock(AbfsInputStream.java:314)
> at 
> org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.read(AbfsInputStream.java:267)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at 
> com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:539)
> at 
> com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:133)
> at 
> com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:256)
> at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1656)
> at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1085)
> at 
> com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3585)
> at 
> org.apache.hadoop.util.JsonSerialization.fromJsonStream(JsonSerialization.java:164)
> at org.apache.hadoop.util.JsonSerialization.load(JsonSerialization.java:279)
> at 
> org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.TaskManifest.load(TaskManifest.java:361)
> at 
> org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperationsThroughFileSystem.loadTaskManifest(ManifestStoreOperationsThroughFileSystem.java:133)
> at 
> org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage.lambda$loadManifest$6(AbstractJobOrTaskStage.java:493)
> at 
> org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage$$Lambda$231/1813048085.apply(Unknown
>  Source)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding$$Lambda$217/489150849.apply(Unknown
>  Source)
> at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
> at 
> org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage.loadManifest(AbstractJobOrTaskStage.java:492)
> at 
> org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage.fetchTaskManifest(LoadManifestsStage.java:170)
> at 
> org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage.processOneManifest(LoadManifestsStage.java:138)
> at 
> org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage$$Lambda$229/137752948.run(Unknown
>  Source)
> at 
> org.apache.hadoop.util.functional.TaskPool$Builder.lambda$runParallel$0(TaskPool.java:410)
> at 
> org.apache.hadoop.util.functional.TaskPool$Builder$$Lambda$230/467893357.run(Unknown
>  Source)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7440) Enhancing Security in Hadoop Delegation Tokens: Phasing out DIGEST-MD5 Auth mechanism

2023-06-06 Thread Saurabh Rai (Jira)
Saurabh Rai created MAPREDUCE-7440:
--

 Summary: Enhancing Security in Hadoop Delegation Tokens: Phasing 
out DIGEST-MD5 Auth mechanism
 Key: MAPREDUCE-7440
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7440
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: security
Reporter: Saurabh Rai


SASL secured connections are commonly configured to negotiate confidential 
(encrypted) connections, known as the "auth-conf" quality of protection. This 
ensures both authentication and data encryption, enhancing the security of wire 
communication. The use of AES encryption, negotiated on "auth-conf" connections 
with Kerberos/GSSAPI, meets the requirements of modern commercial and 
governmental cryptographic regulations and policies.

However, a problem arises when deploying a YARN job that incorporates a network 
client expecting to negotiate the same level of security (for example an HBase 
client, but any code that integrates Hadoop's UGI and related classes with the 
JRE's SaslClient will be affected): delegation tokens, the only hard-coded 
option available for tasks, rely on the Digest-MD5 SASL mechanism. 
Unfortunately, the Digest-MD5 negotiation standard supports only five outdated 
and slow ciphers for SASL confidentiality: RC4 (40-bit key length), RC4 (56-bit 
key length), RC4 (128-bit key length), DES, and Triple DES. Notably, the use of 
RC4 has been prohibited by the IETF since 2015, and DES was compromised in 1999 
and subsequently withdrawn as a standard by NIST.

The limitations of the Digest-MD5 mechanism have significant implications for 
compliance with modern cryptographic regulations and policies that mandate wire 
encryption. As a result, YARN applications utilizing Digest-MD5 for 
confidentiality negotiation cannot adhere to these requirements. It is worth 
noting that this issue is not documented in the Hadoop documentation or logs, 
potentially leading developers and operators to remain unaware of the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7439) JHS MoveIntermediateToDone thread pool should override CallerContext

2023-05-22 Thread dzcxzl (Jira)
dzcxzl created MAPREDUCE-7439:
-

 Summary: JHS MoveIntermediateToDone thread pool should override 
CallerContext
 Key: MAPREDUCE-7439
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7439
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver
Reporter: dzcxzl


The job history server provides RPC services. If the client passes a 
CallerContext, the RPC handler may create a MoveIntermediateToDone thread. 
Since CallerContext supports parent-child thread variable inheritance, the 
MoveIntermediateToDone thread may permanently carry the caller's 
CallerContext, so the thread pool should override it.
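
A minimal sketch of the kind of override described, using Hadoop's 
org.apache.hadoop.ipc.CallerContext API (the context string is illustrative):
{code:java}
import org.apache.hadoop.ipc.CallerContext;

// reset the inherited context at the start of the pool task, so the mover
// thread is attributed to the history server rather than the original caller
Runnable moveTask = () -> {
  CallerContext.setCurrent(
      new CallerContext.Builder("mr_jhs_moveIntermediateToDone").build());
  // ... move the intermediate history files to the done directory ...
};
{code}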



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7438) Support removal of only selective node states in untracked removal flow

2023-05-21 Thread Mudit Sharma (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mudit Sharma resolved MAPREDUCE-7438.
-
Resolution: Invalid

> Support removal of only selective node states in untracked removal flow
> ---
>
> Key: MAPREDUCE-7438
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7438
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: Mudit Sharma
>Priority: Major
>  Labels: pull-request-available
>
> Currently, inactive nodes are removed from YARN's local memory irrespective 
> of which state they are in. This makes the node removal process not very 
> configurable.
> After this patch: https://issues.apache.org/jira/browse/YARN-10854
> If autoscaling is enabled, many nodes go into the DECOMMISSIONED state, while 
> nodes in other states like LOST and SHUTDOWN are far fewer, and operators 
> might want those to remain visible in the UI for better tracking.
> The proposal is to introduce a new config which, when set, will allow only 
> selective node states to be removed after going into the untracked state.
>  
> Attaching PR for reference: https://github.com/apache/hadoop/pull/5680
> Any thoughts/suggestions/feedback are welcome!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7438) Support removal of only selective node states in untracked removal flow

2023-05-21 Thread Mudit Sharma (Jira)
Mudit Sharma created MAPREDUCE-7438:
---

 Summary: Support removal of only selective node states in 
untracked removal flow
 Key: MAPREDUCE-7438
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7438
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Mudit Sharma


Currently, inactive nodes are removed from YARN's local memory irrespective of 
which state they are in. This makes the node removal process not very 
configurable.

After this patch: https://issues.apache.org/jira/browse/YARN-10854

If autoscaling is enabled, many nodes go into the DECOMMISSIONED state, while 
nodes in other states like LOST and SHUTDOWN are far fewer, and operators might 
want those to remain visible in the UI for better tracking.

The proposal is to introduce a new config which, when set, will allow only 
selective node states to be removed after going into the untracked state.

Any thoughts/suggestions/feedback are welcome!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7437) spotbugs complaining about .Fetcher's update of a nonatomic static counter

2023-04-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7437.
---
Fix Version/s: 3.4.0
   3.3.9
   Resolution: Fixed

> spotbugs complaining about .Fetcher's update of a nonatomic static counter
> --
>
> Key: MAPREDUCE-7437
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7437
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build, client
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> I'm having to do this to get MAPREDUCE-7435 through the build; spotbugs is 
> complaining about the Fetcher constructor incrementing a non-atomic static 
> counter. Which is true, just odd it has only just surfaced.
> going to fix as a standalone patch but include that in the commit chain of 
> that PR too



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7437) spotbugs complaining about .Fetcher's update of a nonatomic static counter

2023-04-21 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7437:
-

 Summary: spotbugs complaining about .Fetcher's update of a 
nonatomic static counter
 Key: MAPREDUCE-7437
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7437
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: build, client
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


I'm having to do this to get MAPREDUCE-7435 through the build; spotbugs is 
complaining about the Fetcher constructor incrementing a non-atomic static 
counter. Which is true, just odd it has only just surfaced.

going to fix as a standalone patch but include that in the commit chain of that 
PR too
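
A minimal sketch of the standard remedy for this spotbugs pattern (the names 
are illustrative, not the actual Fetcher code):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

class FetcherSketch {
  // an atomic counter is safe to bump from concurrently running
  // constructors, which is what spotbugs objects to with a plain int
  private static final AtomicInteger NEXT_ID = new AtomicInteger(0);
  private final int id;

  FetcherSketch() {
    id = NEXT_ID.incrementAndGet();
  }
}
{code}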



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7436) Fix a few testcase failures for org.apache.hadoop.mapreduce.v2

2023-03-28 Thread Susheel Gupta (Jira)
Susheel Gupta created MAPREDUCE-7436:


 Summary: Fix a few testcase failures for 
org.apache.hadoop.mapreduce.v2
 Key: MAPREDUCE-7436
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7436
 Project: Hadoop Map/Reduce
  Issue Type: Test
  Components: mr-am
Reporter: Susheel Gupta


1) TestUberAM#testThreadDumpOnTaskTimeout 

 
{noformat}
java.lang.AssertionError: No AppMaster log found! 
Expected :1
Actual   :0
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at 
org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1281)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:750){noformat}
2) TestMRJobs#testThreadDumpOnTaskTimeout
{noformat}
java.lang.AssertionError: No thread dump
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.assertTrue(Assert.java:42)
    at org.apache.hadoop.mapreduce.v2.TestMRJobs.testThreadDumpOnTaskTimeout(TestMRJobs.java:1273)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:750)
{noformat}
3) TestMRJobsWithProfiler#testDefaultProfiler
{noformat}
java.lang.AssertionError: 
Expected :4
Actual   :0

    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
    at org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler.testProfilerInternal(TestMRJobsWithProfiler.java:225)
    at org.apache.hadoop.mapreduce.v2.TestMRJobsWithProfiler.testDefaultProfiler(TestMRJobsWithProfiler.java:116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:750){noformat}
4) TestMRJobsWithProfiler#testDifferentProfilers
{noformat}
java.lang.AssertionError: 
Expected :4
Actual   :0
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
{noformat}

[jira] [Created] (MAPREDUCE-7435) ManifestCommitter OOM on azure job

2023-03-27 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7435:
-

 Summary: ManifestCommitter OOM on azure job
 Key: MAPREDUCE-7435
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7435
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: client
Affects Versions: 3.3.5
Reporter: Steve Loughran
Assignee: Steve Loughran


I've got some reports of Spark jobs hitting OOM when the manifest committer is 
used through abfs.

Either the manifests are using too much memory, or something is not working 
with azure stream memory use (or both).

Before proposing a solution, the first step should be to write a test that 
loads many, many manifests, each with lots of dirs and files, to see what 
breaks.

Note: we did have OOM issues with the s3a committer on teragen, but those 
structures have to include every etag of every block, so the manifest size is 
O(blocks); the new committer is O(files + dirs).
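
As a starting point, a minimal heap probe along these lines could approximate 
that load (a sketch only: it synthesises generic JSON manifests with Jackson 
rather than real TaskManifest files, and all names and sizes are illustrative):

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: hold many deserialised manifest-like objects and watch the heap. */
public class ManifestHeapProbe {
  public static void main(String[] args) throws Exception {
    ObjectMapper mapper = new ObjectMapper();
    int manifests = 1000;        // "many, many manifests"
    int filesPerManifest = 5000; // "lots of dirs and files"
    List<Object> loaded = new ArrayList<>(manifests);
    for (int m = 0; m < manifests; m++) {
      Map<String, Object> manifest = new HashMap<>();
      List<Map<String, String>> entries = new ArrayList<>(filesPerManifest);
      for (int f = 0; f < filesPerManifest; f++) {
        Map<String, String> e = new HashMap<>();
        e.put("source", "/job/tmp/task_" + m + "/part-" + f);
        e.put("dest", "/job/out/part-" + m + "-" + f);
        entries.add(e);
      }
      manifest.put("files", entries);
      // round-trip through JSON, as the committer does when loading manifests
      byte[] json = mapper.writeValueAsBytes(manifest);
      loaded.add(mapper.readValue(json, Map.class));
      if (m % 100 == 0) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("manifests=%d usedMB=%d%n",
            m, (rt.totalMemory() - rt.freeMemory()) >> 20);
      }
    }
    System.out.println("held " + loaded.size() + " manifests");
  }
}
{code}

One of the reported stacks: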

{code}
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.readOneBlock(AbfsInputStream.java:314)
at org.apache.hadoop.fs.azurebfs.services.AbfsInputStream.read(AbfsInputStream.java:267)
at java.io.DataInputStream.read(DataInputStream.java:149)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:539)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:133)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:256)
at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1656)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:1085)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3585)
at org.apache.hadoop.util.JsonSerialization.fromJsonStream(JsonSerialization.java:164)
at org.apache.hadoop.util.JsonSerialization.load(JsonSerialization.java:279)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.TaskManifest.load(TaskManifest.java:361)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.impl.ManifestStoreOperationsThroughFileSystem.loadTaskManifest(ManifestStoreOperationsThroughFileSystem.java:133)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage.lambda$loadManifest$6(AbstractJobOrTaskStage.java:493)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage$$Lambda$231/1813048085.apply(Unknown Source)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding$$Lambda$217/489150849.apply(Unknown Source)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.AbstractJobOrTaskStage.loadManifest(AbstractJobOrTaskStage.java:492)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage.fetchTaskManifest(LoadManifestsStage.java:170)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage.processOneManifest(LoadManifestsStage.java:138)
at org.apache.hadoop.mapreduce.lib.output.committer.manifest.stages.LoadManifestsStage$$Lambda$229/137752948.run(Unknown Source)
at org.apache.hadoop.util.functional.TaskPool$Builder.lambda$runParallel$0(TaskPool.java:410)
at org.apache.hadoop.util.functional.TaskPool$Builder$$Lambda$230/467893357.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

{code}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7428) Fix failures related to Junit 4 to Junit 5 upgrade in org.apache.hadoop.mapreduce.v2.app.webapp

2023-02-23 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved MAPREDUCE-7428.
-
Resolution: Fixed

> Fix failures related to Junit 4 to Junit 5 upgrade in 
> org.apache.hadoop.mapreduce.v2.app.webapp
> ---
>
> Key: MAPREDUCE-7428
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7428
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Akira Ajisaka
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> A few tests are failing due to the JUnit 4 to JUnit 5 upgrade in 
> org.apache.hadoop.mapreduce.v2.app.webapp 
> [https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1071/testReport/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7434) Fix testFailure TestShuffleHandler.testMapFileAccess

2023-02-21 Thread Tamas Domok (Jira)
Tamas Domok created MAPREDUCE-7434:
--

 Summary: Fix testFailure TestShuffleHandler.testMapFileAccess
 Key: MAPREDUCE-7434
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7434
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Affects Versions: 3.4.0
Reporter: Tamas Domok
Assignee: Tamas Domok



https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1143/testReport/junit/org.apache.hadoop.mapred/TestShuffleHandler/testMapFileAccess/

{code}
Error Message
Server returned HTTP response code: 500 for URL: 
http://127.0.0.1:13562/mapOutput?job=job_1_0001&reduce=0&map=attempt_1_0001_m_01_0
Stacktrace
java.io.IOException: Server returned HTTP response code: 500 for URL: 
http://127.0.0.1:13562/mapOutput?job=job_1_0001&reduce=0&map=attempt_1_0001_m_01_0
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1902)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1500)
at org.apache.hadoop.mapred.TestShuffleHandler.testMapFileAccess(TestShuffleHandler.java:292)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:750)
Standard Output
12:04:17.466 [Time-limited test] DEBUG o.a.h.m.lib.MutableMetricsFactory - 
field org.apache.hadoop.metrics2.lib.MutableGaugeInt 
org.apache.hadoop.mapred.ShuffleHandler$ShuffleMetrics.shuffleConnections with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, 
sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[# of 
current shuffle connections])
12:04:17.466 [Time-limited test] DEBUG o.a.h.m.lib.MutableMetricsFactory - 
field org.apache.hadoop.metrics2.lib.MutableCounterLong 
org.apache.hadoop.mapred.ShuffleHandler$ShuffleMetrics.shuffleOutputBytes with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, 
sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, 
value=[Shuffle output in bytes])
12:04:17.466 [Time-limited test] DEBUG o.a.h.m.lib.MutableMetricsFactory - 
field org.apache.hadoop.metrics2.lib.MutableCounterInt 
org.apache.hadoop.mapred.ShuffleHandler$ShuffleMetrics.shuffleOutputsFailed 
with annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, 
sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[# of 
failed shuffle outputs])
12:04:17.466 [Time-limited test] DEBUG o.a.h.m.lib.MutableMetricsFactory - 
field org.apache.hadoop.metrics2.lib.MutableCounterInt 
org.apache.hadoop.mapred.ShuffleHandler$ShuffleMetrics.shuffleOutputsOK with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, 
sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[# of 
succeeeded shuffle outputs])
12:04:17.466 [Time-limited test] DEBUG o.a.h.m.impl.MetricsSystemImpl - 
ShuffleMetrics, Shuffle output metrics
12:04:17.467 [Time-limited test] DEBUG o.a.hadoop.service.AbstractService - 
Service: mapreduce_shuffle entered state INITED
12:04:17.477 [Time-limited test] DEBUG o.a.hadoop.service.AbstractService - 
Config has been overridden during init
12:04:17.478 [Time-limited test] INFO  org.apache.hadoop.mapred.IndexCache - 
IndexCache created with max memory = 10485760
12:04:17.479 [Time-limited test] DEBUG o.a.h.m.lib.MutableMetricsFactory - 
field org.apache.hadoop.metrics2.lib.MutableGaugeInt 
org.apache.hadoop.mapred.ShuffleHandler$ShuffleMetrics.shuffleConnections with 
annotation @org.apache.hadoop.metrics2.annotation.Metric(always=false, 
sampleName=Ops, valueName=Time, about=, interval=10, type=DEFAULT, value=[# of 
current shuffle connections])
12:04:17.479 [Time-limited test] DEBUG o.a.h.m.lib.MutableMetricsFactory - 
field org.apache.hadoop.metrics2.lib.MutableCounterLong 
org.apache.hadoop.mapred.ShuffleHandler$ShuffleMetrics.shuffleOutputBytes with 
annotation @org.apache.hadoop.metrics2

[jira] [Reopened] (MAPREDUCE-7428) Fix failures related to Junit 4 to Junit 5 upgrade in org.apache.hadoop.mapreduce.v2.app.webapp

2023-02-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reopened MAPREDUCE-7428:
-

> Fix failures related to Junit 4 to Junit 5 upgrade in 
> org.apache.hadoop.mapreduce.v2.app.webapp
> ---
>
> Key: MAPREDUCE-7428
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7428
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Akira Ajisaka
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> A few tests are failing due to the JUnit 4 to JUnit 5 upgrade in 
> org.apache.hadoop.mapreduce.v2.app.webapp 
> [https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1071/testReport/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7433) Remove unused mapred/LoggingHttpResponseEncoder.java

2023-02-13 Thread Benjamin Teke (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Teke resolved MAPREDUCE-7433.
--
Hadoop Flags: Reviewed
Target Version/s: 3.4.0
  Resolution: Fixed

> Remove unused mapred/LoggingHttpResponseEncoder.java
> 
>
> Key: MAPREDUCE-7433
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7433
> Project: Hadoop Map/Reduce
>  Issue Type: Task
>Reporter: Tamas Domok
>Assignee: Tamas Domok
>Priority: Major
>  Labels: pull-request-available
>
> It's no longer needed after MAPREDUCE-7431 (I forgot to include the removal 
> in the previous PR).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7433) Remove unused mapred/LoggingHttpResponseEncoder.java

2023-02-13 Thread Tamas Domok (Jira)
Tamas Domok created MAPREDUCE-7433:
--

 Summary: Remove unused mapred/LoggingHttpResponseEncoder.java
 Key: MAPREDUCE-7433
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7433
 Project: Hadoop Map/Reduce
  Issue Type: Task
Reporter: Tamas Domok
Assignee: Tamas Domok


It's no longer needed after MAPREDUCE-7431 (I forgot to include the removal in 
the previous PR).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7432) Make Manifest Committer the default for abfs and gcs

2023-02-09 Thread Steve Loughran (Jira)
Steve Loughran created MAPREDUCE-7432:
-

 Summary: Make Manifest Committer the default for abfs and gcs
 Key: MAPREDUCE-7432
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7432
 Project: Hadoop Map/Reduce
  Issue Type: New Feature
  Components: client
Affects Versions: 3.3.5
Reporter: Steve Loughran


Switch to the manifest committer as default for abfs and gcs

* abfs: needed for performance, scale and resilience under some failure modes
* gcs: provides correctness through atomic task commit and better job commit 
performance
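
For context, jobs can already opt in today; a configuration sketch follows (the 
exact keys and factory class names below are assumptions to verify against the 
manifest committer documentation):

{code}
<!-- example cluster/job configuration (assumed keys) -->
<property>
  <name>mapreduce.outputcommitter.factory.scheme.abfs</name>
  <value>org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory</value>
</property>
<property>
  <name>mapreduce.outputcommitter.factory.scheme.gs</name>
  <value>org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterFactory</value>
</property>
{code}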



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7375) JobSubmissionFiles don't set right permission after mkdirs

2023-01-12 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-7375.
--
Fix Version/s: 3.4.0
   3.2.5
   3.3.9
   Resolution: Fixed

> JobSubmissionFiles don't set right permission after mkdirs
> --
>
> Key: MAPREDUCE-7375
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7375
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: mrv2
>Affects Versions: 3.3.2
>Reporter: Zhang Dongsheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.5, 3.3.9
>
> Attachments: MAPREDUCE-7375.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> JobSubmissionFiles provides getStagingDir to get the staging directory. If the 
> stagingArea is missing, the method creates a new directory with:
> {quote}fs.mkdirs(stagingArea, new FsPermission(JOB_DIR_PERMISSION));{quote}
> This creates the new directory with JOB_DIR_PERMISSION, but that permission is 
> masked by the umask. If the umask is too strict, the resulting permission may 
> be 000 (if the umask is 700). So we should set the permission explicitly after 
> creating the directory.
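> A hedged sketch of the suggested fix, assuming the standard FileSystem API 
> (the helper name is illustrative):
> {code:java}
> import java.io.IOException;
> 
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.permission.FsPermission;
> 
> class StagingDirs {
>   // mkdirs() applies the process umask; a follow-up setPermission() does not,
>   // so the directory ends up with the intended permission.
>   static void createStagingDir(FileSystem fs, Path stagingArea,
>       FsPermission jobDirPermission) throws IOException {
>     fs.mkdirs(stagingArea, jobDirPermission);
>     fs.setPermission(stagingArea, jobDirPermission);
>   }
> }
> {code}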



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7431) ShuffleHandler is not working correctly in SSL mode after the Netty 4 upgrade

2023-01-09 Thread Tamas Domok (Jira)
Tamas Domok created MAPREDUCE-7431:
--

 Summary: ShuffleHandler is not working correctly in SSL mode after 
the Netty 4 upgrade
 Key: MAPREDUCE-7431
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7431
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Affects Versions: 3.4.0
Reporter: Tamas Domok
 Attachments: sendMapPipeline.png

HADOOP-15327 introduced some regressions in the ShuffleHandler.
h3. 1. A memory leak
{code:java}
ERROR io.netty.util.ResourceLeakDetector: LEAK: ByteBuf.release() was not 
called before it's garbage-collected. See 
https://netty.io/wiki/reference-counted-objects.html for more information.
{code}
 
The Shuffle's channelRead didn't release the message properly; the fix would be 
this:
{code:java}
  try {
    // ... existing channelRead() handling ...
  } finally {
    ReferenceCountUtil.release(msg);
  }
{code}
Or even simpler:
{code:java}
extends SimpleChannelInboundHandler<FullHttpRequest>
{code}
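For illustration, a minimal handler of that shape (a sketch, not the actual 
ShuffleHandler code; the message type is assumed to be FullHttpRequest):
{code:java}
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http.FullHttpRequest;

/** Sketch: SimpleChannelInboundHandler releases the message automatically. */
public class ShuffleRequestHandler
    extends SimpleChannelInboundHandler<FullHttpRequest> {
  @Override
  protected void channelRead0(ChannelHandlerContext ctx, FullHttpRequest request) {
    // handle the shuffle request; no explicit ReferenceCountUtil.release()
    // is needed -- the superclass releases the message when this returns
  }
}
{code}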
h3. 2. A bug in SSL mode with more than one reducer

It manifested in multiple errors:
{code:java}
ERROR org.apache.hadoop.mapred.ShuffleHandler: Future is unsuccessful. Cause:
java.io.IOException: Broken pipe

ERROR org.apache.hadoop.mapred.ShuffleHandler: Future is unsuccessful. Cause:
java.nio.channels.ClosedChannelException

// if the reducer memory was not enough, then even this:
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:136)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:377)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.io.compress.BlockDecompressorStream.getCompressedData(BlockDecompressorStream.java:123)
at org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:98)
at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:105)
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:210)
at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.doShuffle(InMemoryMapOutput.java:91)
{code}
*Configuration* - mapred-site.xml
{code:java}
mapreduce.shuffle.ssl.enabled=true
{code}
An alternative is to build a custom jar where *FadvisedFileRegion* is replaced 
with *FadvisedChunkedFile* in {*}sendMapOutput{*}.

*Reproduction*
{code:java}
hdfs dfs -rm -r -skipTrash /tmp/sort_input
hdfs dfs -rm -r -skipTrash /tmp/sort_output
yarn jar 
hadoop-3.4.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar
 randomwriter "-Dmapreduce.randomwriter.totalbytes=100" /tmp/sort_input
yarn jar 
hadoop-3.4.0-SNAPSHOT/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.4.0-SNAPSHOT.jar
 sort -Dmapreduce.job.reduce.slowstart.completedmaps=1 -r 40 /tmp/sort_input 
/tmp/sort_output | tee sort_app_output.txt
{code}
h3. ShuffleHandler's protocol
{code:java}
// HTTP Request
GET 
/mapOutput?job=job_1672901779104_0001&reduce=0&map=attempt_1672901779104_0001_m_03_0,attempt_1672901779104_0001_m_02_0,attempt_1672901779104_0001_m_01_0,attempt_1672901779104_0001_m_00_0,attempt_1672901779104_0001_m_05_0,attempt_1672901779104_0001_m_12_0,attempt_1672901779104_0001_m_09_0,attempt_1672901779104_0001_m_10_0,attempt_1672901779104_0001_m_07_0,attempt_1672901779104_0001_m_11_0,attempt_1672901779104_0001_m_08_0,attempt_1672901779104_0001_m_13_0,attempt_1672901779104_0001_m_14_0,attempt_1672901779104_0001_m_15_0,attempt_1672901779104_0001_m_19_0,attempt_1672901779104_0001_m_18_0,attempt_1672901779104_0001_m_16_0,attempt_1672901779104_0001_m_17_0,attempt_1672901779104_0001_m_20_0,attempt_1672901779104_0001_m_23_0
 HTTP/1.1
+ keep alive headers

// HTTP Response Headers
content-length=sum(serialised ShuffleHeader in bytes + MapOutput size)
+ keep alive headers

// Response Data (transfer-encoding=chunked)
serialised ShuffleHeader
content of the MapOutput file (start offset - length)
serialised ShuffleHeader
content of the MapOutput file (start offset - length)
serialised ShuffleHeader
content of the MapOutput file (start offset - length)
serialised ShuffleHeader
content of the MapOutput file (start offset - length)
...
LastHttpContent
// close socket if no keep-alive
{code}
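To make the content-length rule above concrete, a toy sketch of the summation 
(illustrative names only, not ShuffleHandler code; the real sizes come from the 
serialised ShuffleHeader and the map-output index offsets):
{code:java}
/** Sketch: content-length = sum over map outputs of (header bytes + data bytes). */
public class ContentLengthCalc {
  static final class Part {
    final long headerBytes; // serialised ShuffleHeader size
    final long dataBytes;   // map-output slice length
    Part(long headerBytes, long dataBytes) {
      this.headerBytes = headerBytes;
      this.dataBytes = dataBytes;
    }
  }

  static long contentLength(Part[] parts) {
    long total = 0;
    for (Part p : parts) {
      total += p.headerBytes + p.dataBytes;
    }
    return total;
  }

  public static void main(String[] args) {
    System.out.println(contentLength(new Part[] {
        new Part(32, 1048576), new Part(32, 524288)}));
  }
}
{code}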
h3. Issues
 - {*}setResponseHeaders{*}: did not always set the content-length, and the 
transfer-encoding=chunked header was missing.
 - {*}ReduceMapFileCount.operationComplete{*}: messed up t

[jira] [Created] (MAPREDUCE-7430) FileSystemCount enumeration changes will cause mapreduce application failure during upgrade

2022-12-22 Thread Daniel Ma (Jira)
Daniel Ma created MAPREDUCE-7430:


 Summary: FileSystemCount enumeration changes will cause mapreduce 
application failure during upgrade
 Key: MAPREDUCE-7430
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7430
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: Daniel Ma






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7429) In IPV

2022-12-17 Thread Daniel Ma (Jira)
Daniel Ma created MAPREDUCE-7429:


 Summary: In IPV
 Key: MAPREDUCE-7429
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7429
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Daniel Ma






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7428) Fix failures related to Junit 4 to Junit 5 upgrade in org.apache.hadoop.mapreduce.v2.app.webapp

2022-12-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7428.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix failures related to Junit 4 to Junit 5 upgrade in 
> org.apache.hadoop.mapreduce.v2.app.webapp
> ---
>
> Key: MAPREDUCE-7428
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7428
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> A few tests are failing due to the JUnit 4 to JUnit 5 upgrade in 
> org.apache.hadoop.mapreduce.v2.app.webapp 
> [https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1071/testReport/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7428) Fix failures related to Junit 4 to Junit 5 upgrade in org.apache.hadoop.mapreduce.v2.app.webapp

2022-12-12 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7428:
-

 Summary: Fix failures related to Junit 4 to Junit 5 upgrade in 
org.apache.hadoop.mapreduce.v2.app.webapp
 Key: MAPREDUCE-7428
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7428
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: test
Affects Versions: 3.4.0
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


A few tests are failing due to the JUnit 4 to JUnit 5 upgrade in 
org.apache.hadoop.mapreduce.v2.app.webapp 

[https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1071/testReport/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7401) Optimize liststatus for better performance by using recursive listing

2022-11-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7401.
---
Resolution: Won't Fix

> Optimize liststatus for better performance by using recursive listing
> -
>
> Key: MAPREDUCE-7401
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7401
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.3.3
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This change adds recursive listing APIs to FileSystem. The purpose is to 
> enable different FileSystem implementations to optimize the listStatus calls 
> if they can. A default implementation is provided for plain FileSystem 
> implementations, which does level-by-level listing for each directory (see 
> the sketch below).
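> For reference, a level-by-level recursion of that shape, built only on the 
> public FileSystem API (a sketch, not the API added by this change):
> {code:java}
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
> 
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> 
> class RecursiveList {
>   /** List dir and everything under it, one listStatus call per directory. */
>   static List<FileStatus> listRecursively(FileSystem fs, Path dir)
>       throws IOException {
>     List<FileStatus> result = new ArrayList<>();
>     for (FileStatus status : fs.listStatus(dir)) {
>       result.add(status);
>       if (status.isDirectory()) {
>         result.addAll(listRecursively(fs, status.getPath()));
>       }
>     }
>     return result;
>   }
> }
> {code}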



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7427) Parent directory could be wrong while create done_intermediate directory

2022-11-21 Thread Zhang Dongsheng (Jira)
Zhang Dongsheng created MAPREDUCE-7427:
--

 Summary: Parent directory could be wrong while create 
done_intermediate directory
 Key: MAPREDUCE-7427
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7427
 Project: Hadoop Map/Reduce
  Issue Type: Bug
Reporter: Zhang Dongsheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7390) Remove WhiteBox in mapreduce module.

2022-11-13 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved MAPREDUCE-7390.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Committed to trunk.

> Remove WhiteBox in mapreduce module.
> 
>
> Key: MAPREDUCE-7390
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7390
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> WhiteBox is deprecated; try to remove its usage in hadoop-mapreduce.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7386) Maven parallel builds (skipping tests) fail

2022-11-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7386.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

in trunk, backport once we are happy that it is stable

> Maven parallel builds (skipping tests) fail
> ---
>
> Key: MAPREDUCE-7386
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7386
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.4.0, 3.3.5
> Environment: The problem occurred while using the Hadoop development 
> environment (Ubuntu)
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Running a parallel build fails during assembly with the following error when 
> running either package or install:
> {code:java}
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-assembly-plugin:2.4:single 
> (package-mapreduce) on project hadoop-mapreduce: Failed to create assembly: 
> Artifact: org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.4.0-SNAPSHOT 
> (included by module) does not have an artifact with a file. Please ensure the 
> package phase is run before the assembly is generated. {code}
> {code:java}
> Caused by: org.apache.maven.plugin.MojoExecutionException: Failed to create 
> assembly: Artifact: 
> org.apache.hadoop:hadoop-mapreduce-client-core:jar:3.4.0-SNAPSHOT (included 
> by module) does not have an artifact with a file. Please ensure the package 
> phase is run before the assembly is generated.  {code}
> The command executed was:
> {code:java}
> $ mvn -nsu clean install -Pdist,native -DskipTests -Dtar 
> -Dmaven.javadoc.skip=true -T 2C {code}
> Adding dependencies to the assembly plugin configuration addresses the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7425) Document Fix for yarn.app.mapreduce.client-am.ipc.max-retries

2022-11-01 Thread Chris Nauroth (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth resolved MAPREDUCE-7425.
--
Fix Version/s: 3.4.0
   3.3.5
   3.2.5
 Assignee: teng wang
   Resolution: Fixed

> Document Fix for yarn.app.mapreduce.client-am.ipc.max-retries
> -
>
> Key: MAPREDUCE-7425
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7425
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: yarn
>Affects Versions: 3.3.4
>Reporter: teng wang
>Assignee: teng wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5, 3.2.5
>
>
> The documentation of *yarn.app.mapreduce.client-am.ipc.max-retries* and 
> *yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts* is neither detailed 
> nor complete. *yarn.app.mapreduce.client-am.ipc.max-retries* is used to 
> *overwrite ipc.client.connect.max.retries* in ClientServiceDelegate.java. So 
> the documentation is suggested to be fixed as follows (refer to 
> yarn.client.failover-retries):
>  
> {code:java}
> // mapred-default.xml
> 
>   yarn.app.mapreduce.client-am.ipc.max-retries
>   3
>   The number of client retries to the AM - before reconnecting
> -    to the RM to fetch Application Status.
> +    to the RM to fetch Application Status. 
> +    In other words, it is the ipc.client.connect.max.retries to be used 
> during
> +    reconnecting to the RM and fetching Application Status.
>  {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7426) Fix typo in class StartEndTImesBase

2022-10-29 Thread Akira Ajisaka (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka resolved MAPREDUCE-7426.
--
Fix Version/s: 3.4.0
 Assignee: Samrat Deb
   Resolution: Fixed

Merged the PR into trunk.

> Fix typo in class StartEndTImesBase
> ---
>
> Key: MAPREDUCE-7426
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7426
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>Affects Versions: 3.3.4
>Reporter: Samrat Deb
>Assignee: Samrat Deb
>Priority: Trivial
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>
> While going through the code, I found a typo in a variable name:
> - +slowTaskRelativeTresholds+ is misspelled and can be fixed to 
> +slowTaskRelativeThresholds+



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7411) Use secure XML parser utils in MapReduce

2022-10-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved MAPREDUCE-7411.
---
Fix Version/s: 3.4.0
   3.3.5
   Resolution: Fixed

merged back to branch-3.3.5

> Use secure XML parser utils in MapReduce
> 
>
> Key: MAPREDUCE-7411
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7411
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Reporter: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.5
>
>
> Uptake of HADOOP-18469



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7425) Document Fix for yarn.app.mapreduce.client-am.ipc.max-retries

2022-10-25 Thread teng wang (Jira)
teng wang created MAPREDUCE-7425:


 Summary: Document Fix for 
yarn.app.mapreduce.client-am.ipc.max-retries
 Key: MAPREDUCE-7425
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7425
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: yarn
Affects Versions: 3.3.4
Reporter: teng wang


The documentation of *yarn.app.mapreduce.client-am.ipc.max-retries* and 
*yarn.app.mapreduce.client-am.ipc.max-retries-on-timeouts* is neither detailed 
nor complete. *yarn.app.mapreduce.client-am.ipc.max-retries* is used to *overwrite 
ipc.client.connect.max.retries* in ClientServiceDelegate.java. So the 
documentation is suggested to be fixed as follows (refer to 
yarn.client.failover-retries):

 
{code:java}
// mapred-default.xml

  yarn.app.mapreduce.client-am.ipc.max-retries
  3
  The number of client retries to the AM - before reconnecting
-    to the RM to fetch Application Status.
+    to the RM to fetch Application Status. 
+    In other words, it is the ipc.client.connect.max.retries to be used during
+    reconnecting to the RM and fetching Application Status.
 {code}
 

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7424) Document Fix: the dependency between mapreduce.job.sharedcache.mode and yarn.sharedcache.enabled

2022-10-19 Thread teng wang (Jira)
teng wang created MAPREDUCE-7424:


 Summary: Document Fix: the dependency between 
mapreduce.job.sharedcache.mode and yarn.sharedcache.enabled
 Key: MAPREDUCE-7424
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7424
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: job submission
Affects Versions: 3.3.4
Reporter: teng wang


Suggestions to fix the document (description of mapreduce.job.sharedcache.mode 
in mapred-default.xml):

There is a dependency between mapreduce.job.sharedcache.mode and 
yarn.sharedcache.enabled in the source code. That is, 
mapreduce.job.sharedcache.mode only takes effect if the shared cache 
(yarn.sharedcache.enabled) is enabled. However, the document 
(mapred-default.xml) does not mention this, which could affect the use of this 
configuration.

 

The dependency code:

```

/* /apache/hadoop/mapreduce/SharedCacheConfig.java */

public void init(Configuration conf) {

    if(!conf.getBoolean(YarnConfiguration.SHARED_CACHE_ENABLED,
        YarnConfiguration.DEFAULT_SHARED_CACHE_ENABLED)) {
      return;
    }


    Collection<String> configs = StringUtils.getTrimmedStringCollection(
        conf.get(MRJobConfig.SHARED_CACHE_MODE,
            MRJobConfig.SHARED_CACHE_MODE_DEFAULT));
    if (configs.contains("files")) {
      this.sharedCacheFilesEnabled = true;
    }

```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7423) The check for maxtaskfailures.per.tracker is missing

2022-10-19 Thread teng wang (Jira)
teng wang created MAPREDUCE-7423:


 Summary: The check for maxtaskfailures.per.tracker is missing
 Key: MAPREDUCE-7423
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7423
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: tasktracker
Affects Versions: 3.3.4
Reporter: teng wang


In the conf file mapred-default.xml, the description of 
*mapreduce.job.maxtaskfailures.per.tracker* is:

The number of task-failures on a node manager of a given job after which new 
tasks of that job aren't assigned to it. It *MUST be less* than 
mapreduce.map.maxattempts and mapreduce.reduce.maxattempts otherwise the failed 
task will never be tried on a different node.

However, no such check is implemented in the source code. Violating this 
constraint will prevent failed tasks from being retried on different nodes. So 
it is suggested to validate the relationship between the two configuration 
parameters before using maxtaskfailures.per.tracker, e.g. along the lines of 
the sketch below.
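
A minimal validation might look like this (a sketch only; where such a check 
should live is an open question, and the property names are read directly 
rather than through MRJobConfig constants):

{code:java}
import org.apache.hadoop.conf.Configuration;

/** Sketch: enforce the documented constraint on maxtaskfailures.per.tracker. */
public class TrackerFailureConfigCheck {
  public static void check(Configuration conf) {
    int perTracker = conf.getInt("mapreduce.job.maxtaskfailures.per.tracker", 3);
    int mapAttempts = conf.getInt("mapreduce.map.maxattempts", 4);
    int reduceAttempts = conf.getInt("mapreduce.reduce.maxattempts", 4);
    if (perTracker >= mapAttempts || perTracker >= reduceAttempts) {
      throw new IllegalArgumentException(
          "mapreduce.job.maxtaskfailures.per.tracker (" + perTracker
              + ") must be less than mapreduce.map.maxattempts (" + mapAttempts
              + ") and mapreduce.reduce.maxattempts (" + reduceAttempts + ")");
    }
  }
}
{code}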



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7422) Upgrade Junit 4 to 5 in hadoop-mapreduce-examples

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7422:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-examples
 Key: MAPREDUCE-7422
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7422
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-examples



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7421) Upgrade Junit 4 to 5 in hadoop-mapreduce-client-jobclient

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7421:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-client-jobclient
 Key: MAPREDUCE-7421
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7421
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-client-jobclient



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7420) Upgrade Junit 4 to 5 in hadoop-mapreduce-client-core

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7420:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-client-core
 Key: MAPREDUCE-7420
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7420
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-client-core



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7418) Upgrade Junit 4 to 5 in hadoop-mapreduce-client-app

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7418:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-client-app
 Key: MAPREDUCE-7418
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7418
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-client-app



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7419) Upgrade Junit 4 to 5 in hadoop-mapreduce-client-common

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7419:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-client-common
 Key: MAPREDUCE-7419
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7419
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-client-common



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7416) Upgrade Junit 4 to 5 in hadoop-mapreduce-client-shuffle

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7416:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-client-shuffle
 Key: MAPREDUCE-7416
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7416
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-client-shuffle



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7417) Upgrade Junit 4 to 5 in hadoop-mapreduce-client-uploader

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7417:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-client-uploader
 Key: MAPREDUCE-7417
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7417
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4, 3.3.3
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-client-uploader



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7415) Upgrade Junit 4 to 5 in hadoop-mapreduce-client-nativetask

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7415:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-client-nativetask
 Key: MAPREDUCE-7415
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7415
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-client-nativetask



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7414) Upgrade Junit 4 to 5 in hadoop-mapreduce-client-hs

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7414:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-client-hs
 Key: MAPREDUCE-7414
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7414
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4, 3.3.3
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-client-hs



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7413) Upgrade Junit 4 to 5 in hadoop-mapreduce-client-hs-plugins

2022-10-13 Thread Ashutosh Gupta (Jira)
Ashutosh Gupta created MAPREDUCE-7413:
-

 Summary: Upgrade Junit 4 to 5 in hadoop-mapreduce-client-hs-plugins
 Key: MAPREDUCE-7413
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7413
 Project: Hadoop Map/Reduce
  Issue Type: Sub-task
  Components: test
Affects Versions: 3.3.4, 3.3.3
Reporter: Ashutosh Gupta
Assignee: Ashutosh Gupta


Upgrade Junit 4 to 5 in hadoop-mapreduce-client-hs-plugins



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7412) I oder headset but I didn't get.

2022-10-06 Thread Ashutosh Gupta (Jira)


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Gupta resolved MAPREDUCE-7412.
---
Resolution: Invalid

> I oder headset but I didn't get. 
> -
>
> Key: MAPREDUCE-7412
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7412
> Project: Hadoop Map/Reduce
>  Issue Type: Bug
>  Components: performance
>Affects Versions: MR-2454
>Reporter: Jovithavl v
>Priority: Major
>  Labels: documentation
> Fix For: MR-3902
>
> Attachments: Screenshot_2022-10-07-03-50-10-098_com.phonepe.app.jpg
>
>   Original Estimate: 2,147,483,647h 21,810,231,541,955m
>  Remaining Estimate: 2,147,483,647h 21,810,231,541,955m
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7412) I oder headset but I didn't get.

2022-10-06 Thread Jovithavl v (Jira)
Jovithavl v created MAPREDUCE-7412:
--

 Summary: I oder headset but I didn't get. 
 Key: MAPREDUCE-7412
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7412
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: performance
Affects Versions: MR-2454
Reporter: Jovithavl v
 Fix For: MR-3902
 Attachments: Screenshot_2022-10-07-03-50-10-098_com.phonepe.app.jpg





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7411) Use secure XML parser utils

2022-10-06 Thread PJ Fanning (Jira)
PJ Fanning created MAPREDUCE-7411:
-

 Summary: Use secure XML parser utils
 Key: MAPREDUCE-7411
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7411
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
Reporter: PJ Fanning


Uptake of HADOOP-18469



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Created] (MAPREDUCE-7410) Expose API to get task ids and individual task report given task Id from org.apache.hadoop.mapreduce.Job

2022-10-06 Thread Ujjawal Kumar (Jira)
Ujjawal Kumar created MAPREDUCE-7410:


 Summary: Expose API to get task ids and individual task report 
given task Id from org.apache.hadoop.mapreduce.Job
 Key: MAPREDUCE-7410
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7410
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: jobhistoryserver, yarn
Reporter: Ujjawal Kumar
 Attachments: Screenshot 2022-10-06 at 4.46.48 PM.png

Currently org.apache.hadoop.mapreduce.Job exposes the getTaskReports(TaskType) 
API to fetch the task reports of either mappers or reducers. However, for MR 
jobs with a large number of tasks this causes OOM issues while fetching all 
task reports, as seen with JHS (HistoryClientService.getTaskReports). 
HistoryClientService also exposes a getTaskReport() API where a TaskId can be 
provided within the GetTaskReportRequest. org.apache.hadoop.mapreduce.Job can 
expose 2 APIs so that an individual task report can be fetched after listing 
the tasks from the client side:
 # Job.getTasks(TaskType) -> List<TaskID> - This would return the TaskIDs of 
all tasks of the given type to the client
 # Job.getTaskReport(TaskID) -> TaskReport - This would return the task report 
for a single task to the client

For JHS, since JobHistoryParser.parse already parses the full history file by 
default and maintains the list of tasks within JobHistoryParser.JobInfo's 
tasksMap, this info should be easy to get.

One additional thing that needs to be seen is whether this can be supported for 
requests which are redirected to MRClientService (within MRAppMaster) for 
running jobs.
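
A compile-only sketch of the proposed additions (hypothetical; none of these 
methods exist in the Hadoop API today):

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.mapreduce.TaskID;
import org.apache.hadoop.mapreduce.TaskReport;
import org.apache.hadoop.mapreduce.TaskType;

/** Hypothetical per-task lookup API, as proposed above. */
public interface TaskReportLookup {
  /** List the TaskIDs of all tasks of the given type. */
  List<TaskID> getTasks(TaskType type) throws IOException, InterruptedException;

  /** Fetch one task's report, avoiding a bulk getTaskReports() call. */
  TaskReport getTaskReport(TaskID taskId) throws IOException, InterruptedException;
}
{code}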



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org



[jira] [Resolved] (MAPREDUCE-7407) Avoid stopContainer() on dead node

2022-09-15 Thread Jira


 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-7407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved MAPREDUCE-7407.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Avoid stopContainer() on dead node
> --
>
> Key: MAPREDUCE-7407
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-7407
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>Affects Versions: 3.3.4
>Reporter: Ashutosh Gupta
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> If a container failed to launch earlier due to a terminated instance, it has 
> already been removed from the container hash map. Skipping the kill() for 
> CONTAINER_REMOTE_CLEANUP in that case avoids wasting 15 minutes per container 
> on retries/timeouts (see the sketch below).
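> A hedged sketch of the described guard (illustrative names; the actual logic 
> lives in the AM's container launcher):
> {code:java}
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.ConcurrentMap;
> 
> /** Sketch: skip the remote stopContainer() when the container is already gone. */
> class CleanupGuard {
>   private final ConcurrentMap<String, Object> containers = new ConcurrentHashMap<>();
> 
>   void cleanup(String containerId) {
>     Object container = containers.remove(containerId);
>     if (container == null) {
>       // the launch already failed (e.g. the node was terminated), so there is
>       // nothing to stop remotely; returning here avoids a 15-minute
>       // retry/timeout cycle per container
>       return;
>     }
>     // a real implementation would issue stopContainer(container) here
>   }
> }
> {code}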



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: mapreduce-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-dev-h...@hadoop.apache.org


