[jira] [Commented] (HADOOP-13986) UGI.UgiMetrics.renewalFailureTotal is not printable

2021-02-24 Thread Jim Huang (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-13986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290735#comment-17290735
 ] 

Jim Huang commented on HADOOP-13986:


Hi, 

I am running into this exact same error message with this version of Hadoop:
{code:java}
$ hadoop version
Hadoop 2.7.3.2.6.5.4-1
Subversion g...@github.com:hortonworks/hadoop.git -r 
3091053c59a62c82d82c9f778c48bde5ef0a89a1
Compiled by jenkins on 2018-05-16T11:28Z
Compiled with protoc 2.5.0
From source with checksum abed71da5bc89062f6f6711179f2058
This command was run using 
/usr/hdp/2.6.5.4-1/hadoop/hadoop-common-2.7.3.2.6.5.4-1.jar{code}
 

It seems this ticket has been open for a while.  Is there any workaround (or 
additional reference) for this issue while a patch is being planned?


For reference, I am also getting the same WARN messages (retries) and finally 
an ERROR:

 
{code:java}
21/02/25 05:09:01 WARN security.UserGroupInformation: Exception encountered 
while running the renewal command for f...@example.com. (TGT end 
time:148425260, renewalFailures: 
org.apache.hadoop.metrics2.lib.MutableGaugeInt@7b84d23f,renewalFailuresTotal: 
org.apache.hadoop.metrics2.lib.MutableGaugeLong@3e86532b)
ExitCodeException exitCode=1: kinit: Ticket expired while renewing credentials
21/02/25 05:09:01 ERROR security.UserGroupInformation: TGT is expired. Aborting 
renew thread for f...@example.com.
{code}
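For what it's worth, the root cause appears to be that the metrics classes do not override toString(), so the SLF4J placeholders fall back to Object's "ClassName@hashcode" form. Below is a minimal, self-contained sketch of the behavior, using a stand-in class (not the actual org.apache.hadoop.metrics2.lib.MutableGaugeInt), plus the obvious logging-side workaround of printing value() instead of the gauge object:

```java
// Stand-in illustration of HADOOP-13986: a gauge class without a
// toString() override prints as "ClassName@hashcode" when formatted,
// while its numeric value() prints normally.
public class GaugeToStringDemo {
    // Simplified stand-in for org.apache.hadoop.metrics2.lib.MutableGaugeInt.
    static class MutableGaugeInt {
        private int value;
        void incr() { value++; }
        int value() { return value; }
        // No toString() override, so Object.toString() is inherited.
    }

    public static void main(String[] args) {
        MutableGaugeInt renewalFailures = new MutableGaugeInt();
        renewalFailures.incr();
        // Not printable: prints something like "...MutableGaugeInt@1b6d3586"
        System.out.println("renewalFailures: " + renewalFailures);
        // Workaround at the logging call site: print the value explicitly
        System.out.println("renewalFailures: " + renewalFailures.value());
    }
}
```

So until a patch lands, a logging call that passes the gauge's value() rather than the gauge itself would print the counts.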
 

 

 

> UGI.UgiMetrics.renewalFailureTotal is not printable
> ---
>
> Key: HADOOP-13986
> URL: https://issues.apache.org/jira/browse/HADOOP-13986
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.8.0, 3.0.0-alpha2
>Reporter: Wei-Chiu Chuang
>Priority: Minor
>
> The metrics (renewalFailures and renewalFailuresTotal) in the  following code 
> snippet are not printable.
> {code:title=UserGroupInformation.java}
> metrics.renewalFailuresTotal.incr();
> final long tgtEndTime = tgt.getEndTime().getTime();
> LOG.warn("Exception encountered while running the renewal "
> + "command for {}. (TGT end time:{}, renewalFailures: {},"
> + "renewalFailuresTotal: {})", getUserName(), tgtEndTime,
> metrics.renewalFailures, metrics.renewalFailuresTotal, ie);
> {code}
> The output of the code is like the following:
> {quote}
> 2017-01-12 12:23:14,062 WARN  security.UserGroupInformation 
> (UserGroupInformation.java:run(1012)) - Exception encountered while running 
> the renewal command for f...@example.com. (TGT end time:148425260, 
> renewalFailures: 
> org.apache.hadoop.metrics2.lib.MutableGaugeInt@323aa7f9,renewalFailuresTotal: 
> org.apache.hadoop.metrics2.lib.MutableGaugeLong@c8af058)
> ExitCodeException exitCode=1: kinit: krb5_cc_get_principal: No credentials 
> cache file found
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-17546) Update Description of hadoop-http-auth-signature-secret in HttpAuthentication.md

2021-02-24 Thread Ravuri Sushma sree (Jira)
Ravuri Sushma sree created HADOOP-17546:
---

 Summary: Update Description of hadoop-http-auth-signature-secret 
in HttpAuthentication.md
 Key: HADOOP-17546
 URL: https://issues.apache.org/jira/browse/HADOOP-17546
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Ravuri Sushma sree
Assignee: Ravuri Sushma sree


The HttpAuthentication.md document says "The same secret should be used for all 
nodes in the cluster, ResourceManager, NameNode, DataNode and NodeManager", but 
the secret should be different for each service. This description has been 
updated in 
[core-default.xml|https://github.com/apache/hadoop/commit/d82009599a2e9f48050e0c41440b36c759ec068f#diff-268b9968a4db21ac6eeb7bcaef10e4db744d00ba53989fc7251bb3e8d9eac7df]
 but has to be updated in HttpAuthentication.md as well.
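For illustration only (the file path below is invented, not from the issue), a per-service secret is configured with the hadoop.http.authentication.signature.secret.file property, so each service can point at its own file:

```xml
<!-- Sketch: give each service its own secret file rather than sharing one.
     The property name is real; the file path is a made-up example. -->
<property>
  <name>hadoop.http.authentication.signature.secret.file</name>
  <value>/etc/security/http_secret_for_this_service</value>
</property>
```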






[jira] [Commented] (HADOOP-17545) Provide snapshot builds on Maven central

2021-02-24 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290702#comment-17290702
 ] 

Akira Ajisaka commented on HADOOP-17545:


Created a job to publish the snapshots: 
https://ci-hadoop.apache.org/job/Hadoop-branch-3.3-Commit/

> Provide snapshot builds on Maven central
> 
>
> Key: HADOOP-17545
> URL: https://issues.apache.org/jira/browse/HADOOP-17545
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Adam Roberts
>Priority: Minor
>
> Hey everyone, I'm looking to build the shaded Hadoop/Flink jar using the very 
> latest Hadoop code that isn't yet in a Maven repository, AFAIK (I'm looking at 
> [https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/]). It's 
> entirely possible that's not the right place...
>  
> Are there plans for, is anyone working on, or have I just missed, binaries 
> being available (say, a Hadoop 3.3-SNAPSHOT)?
>  
> I remember that was a thing when I worked on Spark; IIRC you could set a flag 
> to true in one of the pom.xmls to accept snapshots rather than published releases.
>  
> It's entirely possible I've just forgotten how to do it and can't find it well 
> documented anywhere, but I don't believe I should have to go through the steps 
> of setting up a Maven repository somewhere (I want to do the build in Docker, 
> and in the pom.xml I would love to just say: use Hadoop version 3.3-SNAPSHOT).
>  
> To give some context, I would like to build the Hadoop/Flink shaded jar using 
> the yet to be released Hadoop 3.3 branch's code, so I can then go ahead and 
> security scan that and test it out.
>  
> Thanks in advance!






[jira] [Commented] (HADOOP-17545) Provide snapshot builds on Maven central

2021-02-24 Thread Akira Ajisaka (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290698#comment-17290698
 ] 

Akira Ajisaka commented on HADOOP-17545:


The snapshot is available here: 
[https://repository.apache.org/content/groups/snapshots/org/apache/hadoop/].
 It seems that 3.3.1-SNAPSHOT is not published. I'm trying to publish the 
snapshot.
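For anyone following along, consuming a snapshot from that repository looks roughly like this in a pom.xml (the dependency version shown is illustrative):

```xml
<!-- Sketch: enable the Apache snapshots repository mentioned above and
     depend on a SNAPSHOT build of hadoop-client. Version is illustrative. -->
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <url>https://repository.apache.org/content/groups/snapshots/</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.1-SNAPSHOT</version>
  </dependency>
</dependencies>
```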

> Provide snapshot builds on Maven central
> 
>
> Key: HADOOP-17545
> URL: https://issues.apache.org/jira/browse/HADOOP-17545
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Adam Roberts
>Priority: Minor
>
> Hey everyone, I'm looking to build the shaded Hadoop/Flink jar using the very 
> latest Hadoop code that isn't yet in a Maven repository, AFAIK (I'm looking at 
> [https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/]). It's 
> entirely possible that's not the right place...
>  
> Are there plans for, is anyone working on, or have I just missed, binaries 
> being available (say, a Hadoop 3.3-SNAPSHOT)?
>  
> I remember that was a thing when I worked on Spark; IIRC you could set a flag 
> to true in one of the pom.xmls to accept snapshots rather than published releases.
>  
> It's entirely possible I've just forgotten how to do it and can't find it well 
> documented anywhere, but I don't believe I should have to go through the steps 
> of setting up a Maven repository somewhere (I want to do the build in Docker, 
> and in the pom.xml I would love to just say: use Hadoop version 3.3-SNAPSHOT).
>  
> To give some context, I would like to build the Hadoop/Flink shaded jar using 
> the yet to be released Hadoop 3.3 branch's code, so I can then go ahead and 
> security scan that and test it out.
>  
> Thanks in advance!






[jira] [Work logged] (HADOOP-17527) ABFS: Fix boundary conditions in InputStream seek and skip

2021-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17527?focusedWorklogId=557736&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-557736
 ]

ASF GitHub Bot logged work on HADOOP-17527:
---

Author: ASF GitHub Bot
Created on: 25/Feb/21 05:54
Start Date: 25/Feb/21 05:54
Worklog Time Spent: 10m 
  Work Description: sumangala-patki commented on a change in pull request 
#2698:
URL: https://github.com/apache/hadoop/pull/2698#discussion_r582563511



##
File path: 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsInputStreamStatistics.java
##
@@ -100,28 +100,31 @@ public void testSeekStatistics() throws IOException {
 AbfsOutputStream out = null;
 AbfsInputStream in = null;
 
+int readBufferSize = getConfiguration().getReadBufferSize();

Review comment:
   Modified to use the original buffer and a reduced readBufferSize.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 557736)
Time Spent: 4h 20m  (was: 4h 10m)

> ABFS: Fix boundary conditions in InputStream seek and skip
> --
>
> Key: HADOOP-17527
> URL: https://issues.apache.org/jira/browse/HADOOP-17527
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sumangala Patki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Modify the AbfsInputStream seek method to throw an EOF exception on a seek to 
> contentLength for a non-empty file. With this change, it will no longer be 
> possible for the input stream position (as obtained by the getPos() API) to be 
> moved to contentLength manually, except after reading the last byte.






[GitHub] [hadoop] sumangala-patki commented on a change in pull request #2698: HADOOP-17527. ABFS: Fix boundary conditions in InputStream seek and skip

2021-02-24 Thread GitBox


sumangala-patki commented on a change in pull request #2698:
URL: https://github.com/apache/hadoop/pull/2698#discussion_r582563511



##
File path: 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAbfsInputStreamStatistics.java
##
@@ -100,28 +100,31 @@ public void testSeekStatistics() throws IOException {
 AbfsOutputStream out = null;
 AbfsInputStream in = null;
 
+int readBufferSize = getConfiguration().getReadBufferSize();

Review comment:
   Modified to use the original buffer and a reduced readBufferSize.








[jira] [Work logged] (HADOOP-17527) ABFS: Fix boundary conditions in InputStream seek and skip

2021-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17527?focusedWorklogId=557735&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-557735
 ]

ASF GitHub Bot logged work on HADOOP-17527:
---

Author: ASF GitHub Bot
Created on: 25/Feb/21 05:53
Start Date: 25/Feb/21 05:53
Worklog Time Spent: 10m 
  Work Description: sumangala-patki commented on a change in pull request 
#2698:
URL: https://github.com/apache/hadoop/pull/2698#discussion_r582563181



##
File path: 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemRandomRead.java
##
@@ -402,6 +400,18 @@ public void testSkipAndAvailableAndPosition() throws 
Exception {
   inputStream.getPos());
   assertEquals(testFileLength - inputStream.getPos(),
   inputStream.available());
+
+  skipped = inputStream.skip(testFileLength + 1); //goes to last byte
+  assertEquals(1, inputStream.available());
+  bytesRead = inputStream.read(buffer);
+  assertEquals(1, bytesRead);
+  assertEquals(testFileLength, inputStream.getPos());

Review comment:
   getPos() will return contentLength after a read up to and including EOF; 
however, seek/skip to an invalid position is not supported, so getPos() after 
either of these operations will return a valid position
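For contrast with the AbfsInputStream semantics discussed above, here is a plain-JDK sketch (ByteArrayInputStream, not ABFS) of the standard boundary behavior, where a skip() past the end is simply clamped to the bytes remaining and read() at EOF returns -1:

```java
import java.io.ByteArrayInputStream;

// Plain-JDK boundary behavior, for comparison with the ABFS test quoted
// above: a skip() past the end is clamped to the remaining byte count,
// after which available() is 0 and read() reports EOF as -1.
public class SkipBoundaryDemo {
    public static void main(String[] args) {
        byte[] data = new byte[10];
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long skipped = in.skip(data.length + 1); // clamped to 10
        System.out.println(skipped);        // 10
        System.out.println(in.available()); // 0
        System.out.println(in.read());      // -1 (EOF)
    }
}
```

The ABFS test above asserts different semantics (skip stops at the last byte, leaving one byte available), which is exactly the boundary condition this PR is pinning down.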







Issue Time Tracking
---

Worklog Id: (was: 557735)
Time Spent: 4h 10m  (was: 4h)

> ABFS: Fix boundary conditions in InputStream seek and skip
> --
>
> Key: HADOOP-17527
> URL: https://issues.apache.org/jira/browse/HADOOP-17527
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.0
>Reporter: Sumangala Patki
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Modify the AbfsInputStream seek method to throw an EOF exception on a seek to 
> contentLength for a non-empty file. With this change, it will no longer be 
> possible for the input stream position (as obtained by the getPos() API) to be 
> moved to contentLength manually, except after reading the last byte.






[GitHub] [hadoop] sumangala-patki commented on a change in pull request #2698: HADOOP-17527. ABFS: Fix boundary conditions in InputStream seek and skip

2021-02-24 Thread GitBox


sumangala-patki commented on a change in pull request #2698:
URL: https://github.com/apache/hadoop/pull/2698#discussion_r582563181



##
File path: 
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemRandomRead.java
##
@@ -402,6 +400,18 @@ public void testSkipAndAvailableAndPosition() throws 
Exception {
   inputStream.getPos());
   assertEquals(testFileLength - inputStream.getPos(),
   inputStream.available());
+
+  skipped = inputStream.skip(testFileLength + 1); //goes to last byte
+  assertEquals(1, inputStream.available());
+  bytesRead = inputStream.read(buffer);
+  assertEquals(1, bytesRead);
+  assertEquals(testFileLength, inputStream.getPos());

Review comment:
   getPos() will return contentLength after a read up to and including EOF; 
however, seek/skip to an invalid position is not supported, so getPos() after 
either of these operations will return a valid position








[GitHub] [hadoop] Neilxzn commented on pull request #2719: YARN-10649. fix RMNodeImpl.updateExistContainers leak

2021-02-24 Thread GitBox


Neilxzn commented on pull request #2719:
URL: https://github.com/apache/hadoop/pull/2719#issuecomment-785600706


   > There is no regression test, so the error occurred.
   
   I will modify TestRMNodeTransitions.testDisappearingContainer to add one.






[GitHub] [hadoop] Hexiaoqiao commented on a change in pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

2021-02-24 Thread GitBox


Hexiaoqiao commented on a change in pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721#discussion_r582534061



##
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml
##
@@ -4352,6 +4352,17 @@
   
 
 
+
+  dfs.client.packet.recovery.max.times

Review comment:
   dfs.client.pipeline.recovery.max-retries?








[jira] [Commented] (HADOOP-17532) Yarn Job execution get failed when LZ4 Compression Codec is used

2021-02-24 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290663#comment-17290663
 ] 

L. C. Hsieh commented on HADOOP-17532:
--

I saw that lz4-java is excluded from org.apache.kafka in the patch. Does the 
other version come from it? Isn't that exclusion enough?

[~csun] What do you think? I remember we wanted it to be provided scope. I 
don't think we should make it compile scope now.
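To make the kind of exclusion being discussed concrete, a hedged pom.xml sketch follows; the kafka coordinates and versions here are illustrative, not taken from the patch:

```xml
<!-- Sketch: exclude the transitively pulled lz4 jar so that only a single
     lz4 version remains on the classpath. Versions are illustrative. -->
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>2.6.0</version>
  <exclusions>
    <exclusion>
      <groupId>net.jpountz.lz4</groupId>
      <artifactId>lz4</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```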

> Yarn Job execution get failed when LZ4 Compression Codec is used
> 
>
> Key: HADOOP-17532
> URL: https://issues.apache.org/jira/browse/HADOOP-17532
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Bhavik Patel
>Priority: Major
> Attachments: HADOOP-17532.001.patch, LZ4.png, lz4-test.jpg
>
>
> When we try to compress a file using the LZ4 compression codec, the YARN job 
> fails with the error message:
> {code:java}
> net.jpountz.lz4.LZ4Compressor.compress(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)V
> {code}






[jira] [Commented] (HADOOP-17532) Yarn Job execution get failed when LZ4 Compression Codec is used

2021-02-24 Thread Bhavik Patel (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290659#comment-17290659
 ] 

Bhavik Patel commented on HADOOP-17532:
---

Yes, but there is another version (1.2.0) of the same jar present, and due to 
that we are getting this error.

> Yarn Job execution get failed when LZ4 Compression Codec is used
> 
>
> Key: HADOOP-17532
> URL: https://issues.apache.org/jira/browse/HADOOP-17532
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Bhavik Patel
>Priority: Major
> Attachments: HADOOP-17532.001.patch, LZ4.png, lz4-test.jpg
>
>
> When we try to compress a file using the LZ4 compression codec, the YARN job 
> fails with the error message:
> {code:java}
> net.jpountz.lz4.LZ4Compressor.compress(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)V
> {code}






[GitHub] [hadoop] amahussein commented on a change in pull request #2722: MAPREDUCE-7320. organize test directories for ClusterMapReduceTestCase

2021-02-24 Thread GitBox


amahussein commented on a change in pull request #2722:
URL: https://github.com/apache/hadoop/pull/2722#discussion_r582374895



##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
##
@@ -229,6 +229,31 @@ public static int uniqueSequenceId() {
 return sequence.incrementAndGet();
   }
 
+  /**
+   * Creates a directory for the data/logs of the unit test.
+   * It first deletes the directory if it exists.
+   *
+   * @param testClass the unit test class.
+   * @param defaultTargetRootDir the directory where the class directory is
+   * created.
+   * @return the Path of the root directory.
+   */
+  public static Path setupTestRootDir(Class testClass,
+  String defaultTargetRootDir) {
+// setup the test root directory
+String targetTestDir =
+System.getProperty(SYSPROP_TEST_DATA_DIR, defaultTargetRootDir);
+Path testRootDirPath =
+new Path(targetTestDir, testClass.getSimpleName());
+System.setProperty(GenericTestUtils.SYSPROP_TEST_DATA_DIR,
+testRootDirPath.toString());

Review comment:
   Also, if I do not set this property, the diff required to work around it 
would be very large.








[GitHub] [hadoop] amahussein commented on a change in pull request #2722: MAPREDUCE-7320. organize test directories for ClusterMapReduceTestCase

2021-02-24 Thread GitBox


amahussein commented on a change in pull request #2722:
URL: https://github.com/apache/hadoop/pull/2722#discussion_r582373743



##
File path: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java
##
@@ -43,8 +45,21 @@
  * The DFS filesystem is formated before the testcase starts and after it ends.
  */
 public abstract class ClusterMapReduceTestCase {
+  private static final String TEST_ROOT_DEFAULT_PATH =
+  System.getProperty("test.build.data", "target/test-dir");
+  private static Path testRootDir;
+
   private MiniDFSCluster dfsCluster = null;
-  private MiniMRCluster mrCluster = null;
+  private MiniMRClientCluster mrCluster = null;
+
+  protected static void setupClassBase(Class testClass) throws Exception {
+// setup the test root directory
+testRootDir = GenericTestUtils.setupTestRootDir(testClass,
+TEST_ROOT_DEFAULT_PATH);
+System.setProperty(GenericTestUtils.SYSPROP_TEST_DATA_DIR,

Review comment:
   It will be set for the JVM executing the unit test. Each unit test will then 
have its own property value, so they should not overlap.








[GitHub] [hadoop] amahussein commented on a change in pull request #2722: MAPREDUCE-7320. organize test directories for ClusterMapReduceTestCase

2021-02-24 Thread GitBox


amahussein commented on a change in pull request #2722:
URL: https://github.com/apache/hadoop/pull/2722#discussion_r582373195



##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
##
@@ -229,6 +229,31 @@ public static int uniqueSequenceId() {
 return sequence.incrementAndGet();
   }
 
+  /**
+   * Creates a directory for the data/logs of the unit test.
+   * It first deletes the directory if it exists.
+   *
+   * @param testClass the unit test class.
+   * @param defaultTargetRootDir the directory where the class directory is
+   * created.
+   * @return the Path of the root directory.
+   */
+  public static Path setupTestRootDir(Class testClass,
+  String defaultTargetRootDir) {
+// setup the test root directory
+String targetTestDir =
+System.getProperty(SYSPROP_TEST_DATA_DIR, defaultTargetRootDir);
+Path testRootDirPath =
+new Path(targetTestDir, testClass.getSimpleName());
+System.setProperty(GenericTestUtils.SYSPROP_TEST_DATA_DIR,
+testRootDirPath.toString());

Review comment:
   There has to be a global parameter so that all the modules use the 
desired folder. The property is read everywhere by different modules, and 
without it they end up picking different directories.








[GitHub] [hadoop] amahussein commented on a change in pull request #2722: MAPREDUCE-7320. organize test directories for ClusterMapReduceTestCase

2021-02-24 Thread GitBox


amahussein commented on a change in pull request #2722:
URL: https://github.com/apache/hadoop/pull/2722#discussion_r582372117



##
File path: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java
##
@@ -43,8 +45,21 @@
  * The DFS filesystem is formated before the testcase starts and after it ends.
  */
 public abstract class ClusterMapReduceTestCase {
+  private static final String TEST_ROOT_DEFAULT_PATH =
+  System.getProperty("test.build.data", "target/test-dir");

Review comment:
   The point is that if the property "test.build.data" is set globally for the 
unit tests, then ClusterMapReduceTestCase.java should also read this 
value. Otherwise, unit tests would not consume the global properties set on 
the JVM, and the folders would be a mess.








[GitHub] [hadoop] jbrennan333 commented on a change in pull request #2722: MAPREDUCE-7320. organize test directories for ClusterMapReduceTestCase

2021-02-24 Thread GitBox


jbrennan333 commented on a change in pull request #2722:
URL: https://github.com/apache/hadoop/pull/2722#discussion_r582364631



##
File path: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java
##
@@ -43,8 +45,21 @@
  * The DFS filesystem is formated before the testcase starts and after it ends.
  */
 public abstract class ClusterMapReduceTestCase {
+  private static final String TEST_ROOT_DEFAULT_PATH =
+  System.getProperty("test.build.data", "target/test-dir");

Review comment:
   Ahh. I see it now.  They are different.  But I don't think you need to 
call getProperty.  Just pass "target/test-dir" as the default for 
setupTestRootDir.








[GitHub] [hadoop] jbrennan333 commented on a change in pull request #2722: MAPREDUCE-7320. organize test directories for ClusterMapReduceTestCase

2021-02-24 Thread GitBox


jbrennan333 commented on a change in pull request #2722:
URL: https://github.com/apache/hadoop/pull/2722#discussion_r582272551



##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
##
@@ -229,6 +229,31 @@ public static int uniqueSequenceId() {
 return sequence.incrementAndGet();
   }
 
+  /**
+   * Creates a directory for the data/logs of the unit test.
+   * It first deletes the directory if it exists.
+   *
+   * @param testClass the unit test class.
+   * @param defaultTargetRootDir the directory where the class directory is
+   * created.
+   * @return the Path of the root directory.
+   */
+  public static Path setupTestRootDir(Class testClass,
+  String defaultTargetRootDir) {
+// setup the test root directory
+String targetTestDir =
+System.getProperty(SYSPROP_TEST_DATA_DIR, defaultTargetRootDir);
+Path testRootDirPath =
+new Path(targetTestDir, testClass.getSimpleName());
+System.setProperty(GenericTestUtils.SYSPROP_TEST_DATA_DIR,
+testRootDirPath.toString());

Review comment:
   I'm a little uneasy about overwriting this property.   I think I see why 
you are doing it, so getTestDir will do the right thing, but I worry that it 
might lead to unexpected behavior because getTestDir is used everywhere.  At a 
minimum, I think the comment on the function should clearly state that this 
changes the property.

##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
##
@@ -245,6 +270,18 @@ public static File getTestDir() {
 return dir;
   }
 
+  /**
+   * Cleans-up the root directory from the property
+   * {@link #SYSPROP_TEST_DATA_DIR}.
+   *
+   * @return the absolute file of the test root directory.
+   */
+  public static File clearTestRootDir() {

Review comment:
   This is only called from setupTestRootDir(), so I don't think it needs 
to be a separate public function.  I would just do the fullyDelete() call 
in-line in setupTestRootDir().

##
File path: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java
##
@@ -43,8 +45,21 @@
  * The DFS filesystem is formated before the testcase starts and after it ends.
  */
 public abstract class ClusterMapReduceTestCase {
+  private static final String TEST_ROOT_DEFAULT_PATH =
+  System.getProperty("test.build.data", "target/test-dir");
+  private static Path testRootDir;
+
   private MiniDFSCluster dfsCluster = null;
-  private MiniMRCluster mrCluster = null;
+  private MiniMRClientCluster mrCluster = null;
+
+  protected static void setupClassBase(Class testClass) throws Exception {
+// setup the test root directory
+testRootDir = GenericTestUtils.setupTestRootDir(testClass,
+TEST_ROOT_DEFAULT_PATH);
+System.setProperty(GenericTestUtils.SYSPROP_TEST_DATA_DIR,

Review comment:
   I don't think we should set the property here if we are doing it in 
setupTestRootDir().

##
File path: 
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/src/test/java/org/apache/hadoop/mapred/ClusterMapReduceTestCase.java
##
@@ -43,8 +45,21 @@
  * The DFS filesystem is formated before the testcase starts and after it ends.
  */
 public abstract class ClusterMapReduceTestCase {
+  private static final String TEST_ROOT_DEFAULT_PATH =
+  System.getProperty("test.build.data", "target/test-dir");
+  private static Path testRootDir;
+
   private MiniDFSCluster dfsCluster = null;
-  private MiniMRCluster mrCluster = null;
+  private MiniMRClientCluster mrCluster = null;
+
+  protected static void setupClassBase(Class testClass) throws Exception {
+// setup the test root directory
+testRootDir = GenericTestUtils.setupTestRootDir(testClass,
+TEST_ROOT_DEFAULT_PATH);
+System.setProperty(GenericTestUtils.SYSPROP_TEST_DATA_DIR,

Review comment:
   One question: should we get the original value and then reset it in an 
afterClass method? Is every unit test run in a separate JVM instance? 
We wouldn't want a following test to get the wrong setting.
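
   The save-and-restore concern above can be sketched as follows. This is a minimal, self-contained illustration; the helper names are hypothetical, and `test.build.data` is assumed here to be the value of SYSPROP_TEST_DATA_DIR that the surrounding diff reads.

```java
// Hypothetical guard for the test-data system property: save the original
// value in a @BeforeClass-style hook, restore it in @AfterClass.
public class SyspropGuard {
  private static final String KEY = "test.build.data";
  private static String saved;

  public static void saveAndSet(String newValue) {
    saved = System.getProperty(KEY);       // remember the original setting
    System.setProperty(KEY, newValue);
  }

  public static void restore() {
    if (saved == null) {
      System.clearProperty(KEY);           // it was unset before this class ran
    } else {
      System.setProperty(KEY, saved);
    }
  }

  public static void main(String[] args) {
    saveAndSet("target/test-dir/Demo");
    System.out.println(System.getProperty(KEY));
    restore();
  }
}
```

   If surefire forks a fresh JVM per test class, the restore is redundant; when forks are reused, it prevents one class's directory setting from leaking into the next.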

##
File path: 
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/test/GenericTestUtils.java
##
@@ -229,6 +229,31 @@ public static int uniqueSequenceId() {
 return sequence.incrementAndGet();
   }
 
+  /**
+   * Creates a directory for the data/logs of the unit test.
+   * It first deletes the directory if it exists.
+   *
+   * @param testClass the unit test class.
+   * @param defaultTargetRootDir the directory where the class directory is

Review comment:
   This description is a little unclear. Maybe "the default relative path to 
use if SYSPROP_TEST_DATA_DIR is not set"?
   

##
File path: 
hadoop-mapreduce-project/hadoop-mapreduce-client/ha

[jira] [Commented] (HADOOP-17532) Yarn Job execution get failed when LZ4 Compression Codec is used

2021-02-24 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290239#comment-17290239
 ] 

L. C. Hsieh commented on HADOOP-17532:
--

Why remove {{provided}}? I think we intentionally made it 
provided scope.

> Yarn Job execution get failed when LZ4 Compression Codec is used
> 
>
> Key: HADOOP-17532
> URL: https://issues.apache.org/jira/browse/HADOOP-17532
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Bhavik Patel
>Priority: Major
> Attachments: HADOOP-17532.001.patch, LZ4.png, lz4-test.jpg
>
>
> When we try to compress a file using the LZ4 compression codec, the 
> YARN job fails with the error message:
> {code:java}
> net.jpountz.lz4.LZ4Compressorcompres(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)V
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[GitHub] [hadoop] amahussein opened a new pull request #2722: MAPREDUCE-7320. organize test directories for ClusterMapReduceTestCase

2021-02-24 Thread GitBox


amahussein opened a new pull request #2722:
URL: https://github.com/apache/hadoop/pull/2722


   [MAPREDUCE-7320: ClusterMapReduceTestCase does not clean 
directories](https://issues.apache.org/jira/browse/MAPREDUCE-7320)
   
   Running JUnit tests that extend ClusterMapReduceTestCase generates lots of 
directories and folders all over the place.
   
   This PR organizes the directories generated by the unit tests, 
cleaning them up at the beginning of the execution if necessary.
   
   - It touches `MiniYARNCluster.java` in order to change the base dir to 
`target/test-dir/$TEST_CLASS_NAME`
   - test classes affected
   
 - TestMRJobClient,
 - TestStreamingBadRecords,
 - TestClusterMapReduceTestCase,
 - TestBadRecords.
 - TestMRCJCJobClient,
 - TestJobName
   
   I tested the TestUnits that use `MiniYARNCluster.java` such as:
   
   ```bash
   Class
   MiniYARNCluster
   
   TestOSSMiniYarnCluster  (3 usages found)
   TestMRTimelineEventHandling  (4 usages found)
   TestJobHistoryEventHandler  (3 usages found)
   TestHadoopArchiveLogs  (3 usages found)
   TestHadoopArchiveLogsRunner  (3 usages found)
   TestDynamometerInfra  (3 usages found)
   TestDSTimelineV10
   TestDSTimelineV20
   TestDSTimelineV15
   TestUnmanagedAMLauncher  (3 usages found)
   TestApplicationMasterServiceProtocolForTimelineV2
   TestFederationRMFailoverProxyProvider  (3 usages found)
   TestHedgingRequestRMFailoverProxyProvider  (4 usages found)
   TestNoHaRMFailoverProxyProvider  (5 usages found)
   TestRMFailover  (4 usages found)
   TestAMRMClient
   TestAMRMClientPlacementConstraints
   TestAMRMProxy  (5 usages found)
   TestNMClient  (3 usages found)
   TestOpportunisticContainerAllocationE2E  (3 usages found)
   TestYarnClient  (3 usages found)
   TestYarnClientWithReservation  (12 usages found)
   TestYarnCLI  (7 usages found)
   TestContainerManagerSecurity  (2 usages found)
   TestDiskFailures  (2 usages found)
   TestMiniYarnCluster  (9 usages found)
   TestMiniYARNClusterForHA  (2 usages found)
   TestMiniYarnClusterNodeUtilization  (3 usages found)
   TestEncryptedShuffle
   ```
   
   
   MAPREDUCE-7320. fix TestEncryptedShuffle paths
   MAPREDUCE-7320. move cleaning up root directory to setup
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [hadoop] amahussein closed pull request #2705: MAPREDUCE-7320. cleanup test data after ClusterMapReduceTestCase

2021-02-24 Thread GitBox


amahussein closed pull request #2705:
URL: https://github.com/apache/hadoop/pull/2705


   






[GitHub] [hadoop] aajisaka commented on pull request #2719: YARN-10649. fix RMNodeImpl.updateExistContainers leak

2021-02-24 Thread GitBox


aajisaka commented on pull request #2719:
URL: https://github.com/apache/hadoop/pull/2719#issuecomment-785230019


   This is the report: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2719/1/artifact/out/report.html
   There is no regression test, so the error occurred.






[GitHub] [hadoop] Neilxzn commented on pull request #2719: YARN-10649. fix RMNodeImpl.updateExistContainers leak

2021-02-24 Thread GitBox


Neilxzn commented on pull request #2719:
URL: https://github.com/apache/hadoop/pull/2719#issuecomment-785181543


   cc @jojochuang Can you help me review this patch? It failed to build on 
Jenkins, but I can't find any errors about it.






[jira] [Updated] (HADOOP-17545) Provide snapshot builds on Maven central

2021-02-24 Thread Adam Roberts (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Roberts updated HADOOP-17545:
--
Description: 
Hey everyone, I'm looking to build the shaded Hadoop/Flink jar using the very 
latest Hadoop code that isn't yet in a Maven repository AFAIK (I'm looking at 
[https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/).] Entirely 
possible that's not the right place...

 

Are there plans or is anyone working on, or have I just missed, binaries being 
available (say, Hadoop 3.3-snapshot)?

 

I remember working on Spark and that was a thing, IIRC you could set a flag to 
true in any of the pom.xmls to accept snapshots and not published things.

 

It's entirely possible I've just forgotten how to do it and can't see it well 
documented anywhere, and I don't believe I have to go through the steps of 
setting up a Maven repository somewhere (I want to do the build in Docker, and 
in the pom.xml I would love to just say: use Hadoop version 3.3-snapshot).

 

To give some context, I would like to build the Hadoop/Flink shaded jar using 
the yet to be released Hadoop 3.3 branch's code, so I can then go ahead and 
security scan that and test it out.

 

Thanks in advance!

  was:
Hey everyone, I'm looking to build the shaded Hadoop/Flink jar using the very 
latest Hadoop code that isn't yet in an artifactory repository AFAIK (I'm 
looking at [https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/).] 

 

Are there plans or is anyone working on, or have I just missed, binaries being 
available (say, Hadoop 3.3-snapshot)?

 

I remember working on Spark and that was a thing, IIRC you could set a flag to 
true in any of the pom.xmls to accept snapshots and not published things.

 

It's entirely possible I've just forgotten how to do it and can't see it well 
documented anywhere, and I don't believe I have to go through the steps of 
setting up a Maven repository somewhere (I want to do the build in Docker, and 
in the pom.xml I would love to just say: use Hadoop version 3.3-snapshot).

 

To give some context, I would like to build the Hadoop/Flink shaded jar using 
the yet to be released Hadoop 3.3 branch's code, so I can then go ahead and 
security scan that and test it out.

 

Thanks in advance!


> Provide snapshot builds on Maven central
> 
>
> Key: HADOOP-17545
> URL: https://issues.apache.org/jira/browse/HADOOP-17545
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Adam Roberts
>Priority: Minor
>
> Hey everyone, I'm looking to build the shaded Hadoop/Flink jar using the very 
> latest Hadoop code that isn't yet in a Maven repository AFAIK (I'm looking at 
> [https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/).] Entirely 
> possible that's not the right place...
>  
> Are there plans or is anyone working on, or have I just missed, binaries 
> being available (say, Hadoop 3.3-snapshot)?
>  
> I remember working on Spark and that was a thing, IIRC you could set a flag 
> to true in any of the pom.xmls to accept snapshots and not published things.
>  
> It's entirely possible I've just forgotten how to do it and can't see it well 
> documented anywhere, and I don't believe I have to go through the steps of 
> setting up a Maven repository somewhere (I want to do the build in Docker, 
> and in the pom.xml I would love to just say: use Hadoop version 3.3-snapshot).
>  
> To give some context, I would like to build the Hadoop/Flink shaded jar using 
> the yet to be released Hadoop 3.3 branch's code, so I can then go ahead and 
> security scan that and test it out.
>  
> Thanks in advance!






[jira] [Created] (HADOOP-17545) Provide snapshot builds on Maven central

2021-02-24 Thread Adam Roberts (Jira)
Adam Roberts created HADOOP-17545:
-

 Summary: Provide snapshot builds on Maven central
 Key: HADOOP-17545
 URL: https://issues.apache.org/jira/browse/HADOOP-17545
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Reporter: Adam Roberts


Hey everyone, I'm looking to build the shaded Hadoop/Flink jar using the very 
latest Hadoop code that isn't yet in an artifactory repository AFAIK (I'm 
looking at [https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/).] 

 

Are there plans or is anyone working on, or have I just missed, binaries being 
available (say, Hadoop 3.3-snapshot)?

 

I remember working on Spark and that was a thing, IIRC you could set a flag to 
true in any of the pom.xmls to accept snapshots and not published things.

 

It's entirely possible I've just forgotten how to do it and can't see it well 
documented anywhere, and I don't believe I have to go through the steps of 
setting up a Maven repository somewhere (I want to do the build in Docker, and 
in the pom.xml I would love to just say: use Hadoop version 3.3-snapshot).

 

To give some context, I would like to build the Hadoop/Flink shaded jar using 
the yet to be released Hadoop 3.3 branch's code, so I can then go ahead and 
security scan that and test it out.

 

Thanks in advance!






[GitHub] [hadoop] qizhu-lucas opened a new pull request #2721: HDFS-15856: Make recover the pipeline in same packet exceed times for…

2021-02-24 Thread GitBox


qizhu-lucas opened a new pull request #2721:
URL: https://github.com/apache/hadoop/pull/2721


   … stream closed configurable.
   
   @jojochuang Could you help review this?
   Thanks.
   






[jira] [Comment Edited] (HADOOP-17532) Yarn Job execution get failed when LZ4 Compression Codec is used

2021-02-24 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289901#comment-17289901
 ] 

Takanobu Asanuma edited comment on HADOOP-17532 at 2/24/21, 12:53 PM:
--

Sorry, I'm not familiar with this topic.

 [~csun] / [~viirya] Could you take a look since it is introduced by 
HADOOP-17292?


was (Author: tasanuma0829):
Sorry, I'm not familiar with this topic.

Could you take a look since it is introduced by HADOOP-17292, [~csun] and 
[~viirya]?

> Yarn Job execution get failed when LZ4 Compression Codec is used
> 
>
> Key: HADOOP-17532
> URL: https://issues.apache.org/jira/browse/HADOOP-17532
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Bhavik Patel
>Priority: Major
> Attachments: HADOOP-17532.001.patch, LZ4.png, lz4-test.jpg
>
>
> When we try to compress a file using the LZ4 compression codec, the 
> YARN job fails with the error message:
> {code:java}
> net.jpountz.lz4.LZ4Compressorcompres(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)V
> {code}






[jira] [Commented] (HADOOP-17532) Yarn Job execution get failed when LZ4 Compression Codec is used

2021-02-24 Thread Takanobu Asanuma (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289901#comment-17289901
 ] 

Takanobu Asanuma commented on HADOOP-17532:
---

Sorry, I'm not familiar with this topic.

Could you take a look since it is introduced by HADOOP-17292, [~csun] and 
[~viirya]?

> Yarn Job execution get failed when LZ4 Compression Codec is used
> 
>
> Key: HADOOP-17532
> URL: https://issues.apache.org/jira/browse/HADOOP-17532
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Bhavik Patel
>Priority: Major
> Attachments: HADOOP-17532.001.patch, LZ4.png, lz4-test.jpg
>
>
> When we try to compress a file using the LZ4 compression codec, the 
> YARN job fails with the error message:
> {code:java}
> net.jpountz.lz4.LZ4Compressorcompres(Ljava/nio/ByteBuffer;Ljava/nio/ByteBuffer;)V
> {code}






[jira] [Commented] (HADOOP-17531) DistCp: Reduce memory usage on copying huge directories

2021-02-24 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289884#comment-17289884
 ] 

Steve Loughran commented on HADOOP-17531:
-

go for it. 

# The abfs listing speedup actually does >1 page of prefetch; we could consider 
that for S3A too. Maybe. The S3A ListRequestV2 class is recycled 
between requests and you can't kick off request 2 until async request 1 is in, 
so you'd still have only one thread fetching. We could just build up a list of 
>1 page worth of results if there's a mismatch between consumer and supplier.
# hdfs, webhdfs, and now s3a and abfs are the only stores where 
listStatusIterator() does more than wrap listStatus; for hdfs/webhdfs it's for 
keeping page size down (scale), for the cloud stores it's because list is so 
slow, and iteration can help swallow the cost.
# If listFiles(recursive) could be used for the scan, then we'd really see 
speedups on S3A.

Anyway, yes, distcp speedup where possible is good.

Note also that s3a and (soon) abfs RemoteIterator objects do/will implement 
IOStatisticsSource - you can collect stats on all their IO and performance. Log 
their toString() value at debug (see IOStatisticsLogging) and you can get 
summaries. 

ps:

# {{hadoop fs -ls}} uses listStatusIterator. 
# PoC of a higher performance copyFromLocal command for cloud storage; uses 
listFiles(path, recursive=true), picks off the largest files first (so they 
don't become stragglers), then randomises the rest to reduce shard throttling: 
https://github.com/steveloughran/cloudstore/blob/trunk/src/main/java/org/apache/hadoop/fs/tools/cloudup/Cloudup.java
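
The scale argument in point 2 - an iterator holds only one page of listing results at a time instead of materialising the whole listing - can be sketched with a self-contained paged iterator (hypothetical names, not the Hadoop RemoteIterator API):

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical paged iterator: holds at most one page of listing results
// in memory, fetching the next page lazily (an empty page means "done").
public class PagedIterator<T> implements Iterator<T> {
  private final Function<Integer, List<T>> fetchPage;
  private List<T> page;
  private int pageIndex = 0;
  private int pos = 0;

  public PagedIterator(Function<Integer, List<T>> fetchPage) {
    this.fetchPage = fetchPage;
    this.page = fetchPage.apply(0);
  }

  @Override
  public boolean hasNext() {
    while (pos >= page.size()) {
      if (page.isEmpty()) {
        return false;                  // an empty page ends the listing
      }
      page = fetchPage.apply(++pageIndex);
      pos = 0;
    }
    return true;
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return page.get(pos++);
  }

  public static void main(String[] args) {
    List<List<String>> pages = java.util.Arrays.asList(
        java.util.Arrays.asList("a", "b"), java.util.Arrays.asList("c"));
    PagedIterator<String> it = new PagedIterator<>(
        i -> i < pages.size() ? pages.get(i) : java.util.Collections.<String>emptyList());
    while (it.hasNext()) {
      System.out.println(it.next());
    }
  }
}
```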

> DistCp: Reduce memory usage on copying huge directories
> ---
>
> Key: HADOOP-17531
> URL: https://issues.apache.org/jira/browse/HADOOP-17531
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Priority: Critical
> Attachments: MoveToStackIterator.patch, gc-NewD-512M-3.8ML.log
>
>
> Presently DistCp uses a producer-consumer kind of setup while building the 
> listing; the input queue and output queue are both unbounded, thus the 
> listStatus result grows quite huge.
> Relevant code:
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L635
> This does a breadth-first traversal (uses a queue instead of the 
> earlier stack), so if you have files at lower depth, it will open up the 
> entire tree and then start processing.






[jira] [Commented] (HADOOP-17542) Avoid unsafe split and append on fields that might be IPv6 literals

2021-02-24 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289859#comment-17289859
 ] 

Steve Loughran commented on HADOOP-17542:
-

# needs to be submitted as a github PR, thanks
# Path is one of those critical-path classes. We are going to have to be so 
careful here -which means lots of tests for it

> Avoid unsafe split and append on fields that might be IPv6 literals
> ---
>
> Key: HADOOP-17542
> URL: https://issues.apache.org/jira/browse/HADOOP-17542
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.1.1
>Reporter: ANANDA G B
>Priority: Minor
>  Labels: ipv6
> Attachments: HADOOP-17542-HADOOP-11890-001.patch
>
>







[jira] [Updated] (HADOOP-17510) Hadoop prints sensitive Cookie information.

2021-02-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HADOOP-17510:
-
Fix Version/s: 3.2.3
   2.10.2
   3.1.5
   3.4.0
   3.3.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Merged. Thanks for review and patch!

> Hadoop prints sensitive Cookie information.
> ---
>
> Key: HADOOP-17510
> URL: https://issues.apache.org/jira/browse/HADOOP-17510
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 3.1.5, 2.10.2, 3.2.3
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.AuthCookieHandler#setAuthCookie
>  - prints cookie information in the log. Any sensitive information in cookies will 
> be logged, which needs to be avoided.
> LOG.trace("Setting token value to {} ({})", authCookie, oldCookie);
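
A minimal sketch of the kind of fix this calls for - masking the value before it reaches the logger. The helper below is illustrative only, not the committed patch:

```java
// Illustrative only - not the committed HADOOP-17510 patch: redact a cookie
// value before logging, keeping a short prefix for correlation.
public class CookieMasking {
  public static String mask(String value) {
    if (value == null || value.length() <= 4) {
      return "****";
    }
    return value.substring(0, 4) + "****";
  }

  public static void main(String[] args) {
    System.out.println(mask("hadoop.auth=token-value"));
  }
}
```

The trace statement would then pass the masked values, e.g. LOG.trace("Setting token value to {} ({})", mask(authCookie), mask(oldCookie)).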






[jira] [Assigned] (HADOOP-17510) Hadoop prints sensitive Cookie information.

2021-02-24 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reassigned HADOOP-17510:


Assignee: Renukaprasad C

> Hadoop prints sensitive Cookie information.
> ---
>
> Key: HADOOP-17510
> URL: https://issues.apache.org/jira/browse/HADOOP-17510
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Renukaprasad C
>Assignee: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.AuthCookieHandler#setAuthCookie
>  - prints cookie information in the log. Any sensitive information in cookies will 
> be logged, which needs to be avoided.
> LOG.trace("Setting token value to {} ({})", authCookie, oldCookie);






[jira] [Work logged] (HADOOP-17510) Hadoop prints sensitive Cookie information.

2021-02-24 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17510?focusedWorklogId=556810&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-556810
 ]

ASF GitHub Bot logged work on HADOOP-17510:
---

Author: ASF GitHub Bot
Created on: 24/Feb/21 09:29
Start Date: 24/Feb/21 09:29
Worklog Time Spent: 10m 
  Work Description: jojochuang merged pull request #2673:
URL: https://github.com/apache/hadoop/pull/2673


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
---

Worklog Id: (was: 556810)
Time Spent: 1h  (was: 50m)

> Hadoop prints sensitive Cookie information.
> ---
>
> Key: HADOOP-17510
> URL: https://issues.apache.org/jira/browse/HADOOP-17510
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.3.0
>Reporter: Renukaprasad C
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> org.apache.hadoop.security.authentication.client.AuthenticatedURL.AuthCookieHandler#setAuthCookie
>  - prints cookie information in the log. Any sensitive information in cookies will 
> be logged, which needs to be avoided.
> LOG.trace("Setting token value to {} ({})", authCookie, oldCookie);






[GitHub] [hadoop] jojochuang merged pull request #2673: HADOOP-17510. Hadoop prints sensitive Cookie information.

2021-02-24 Thread GitBox


jojochuang merged pull request #2673:
URL: https://github.com/apache/hadoop/pull/2673


   






[jira] [Commented] (HADOOP-17531) DistCp: Reduce memory usage on copying huge directories

2021-02-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289795#comment-17289795
 ] 

Ayush Saxena commented on HADOOP-17531:
---

Planning to proceed with a PR with the proposed solution in a day or two. It 
would be enabled by means of a config; if the config isn't set, the present 
flow is not affected, so s3 won't be impacted.

For HDFS or other filesystems, where listing isn't an issue but memory is, this 
can be enabled via the property. My present use cases are HDFS to HDFS and HDFS 
to s3; I will keep a follow-up Jira open to sort out the s3 side after that...

[~ste...@apache.org]/[~rajesh.balamohan] let me know if you folks have any 
objections.
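
For context on why the stack helps: a breadth-first queue's frontier grows to an entire level of the tree, while a depth-first stack holds at most the pending siblings along one path. A self-contained illustration on a synthetic complete tree (this is not the SimpleCopyListing code):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Synthetic illustration (not SimpleCopyListing): peak frontier size when
// walking a complete tree of a given branching factor and depth.
public class TraversalMemory {

  // Depth-first with an explicit stack: holds pending siblings of one path.
  public static int dfsPeak(int branching, int depth) {
    Deque<Integer> stack = new ArrayDeque<>();
    stack.push(depth);
    int peak = 1;
    while (!stack.isEmpty()) {
      int d = stack.pop();
      if (d > 0) {
        for (int i = 0; i < branching; i++) {
          stack.push(d - 1);
        }
      }
      peak = Math.max(peak, stack.size());
    }
    return peak;
  }

  // Breadth-first with a FIFO queue: the frontier grows to a whole level.
  public static int bfsPeak(int branching, int depth) {
    Deque<Integer> queue = new ArrayDeque<>();
    queue.add(depth);
    int peak = 1;
    while (!queue.isEmpty()) {
      int d = queue.poll();
      if (d > 0) {
        for (int i = 0; i < branching; i++) {
          queue.add(d - 1);
        }
      }
      peak = Math.max(peak, queue.size());
    }
    return peak;
  }

  public static void main(String[] args) {
    System.out.println("BFS peak: " + bfsPeak(10, 3));  // whole bottom level
    System.out.println("DFS peak: " + dfsPeak(10, 3));  // one path's siblings
  }
}
```

For branching factor 10 and depth 3, the queue peaks at 1000 entries while the stack peaks at 28.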

> DistCp: Reduce memory usage on copying huge directories
> ---
>
> Key: HADOOP-17531
> URL: https://issues.apache.org/jira/browse/HADOOP-17531
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Priority: Critical
> Attachments: MoveToStackIterator.patch, gc-NewD-512M-3.8ML.log
>
>
> Presently DistCp uses a producer-consumer kind of setup while building the 
> listing; the input queue and output queue are both unbounded, thus the 
> listStatus result grows quite huge.
> Relevant code:
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L635
> This does a breadth-first traversal (uses a queue instead of the 
> earlier stack), so if you have files at lower depth, it will open up the 
> entire tree and then start processing.






[GitHub] [hadoop] Neilxzn commented on pull request #2719: YARN-10649. fix RMNodeImpl.updateExistContainers leak

2021-02-24 Thread GitBox


Neilxzn commented on pull request #2719:
URL: https://github.com/apache/hadoop/pull/2719#issuecomment-784905670


   @bibinchundatt @macroadster  Can you please review this patch? thx






[GitHub] [hadoop] Neilxzn removed a comment on pull request #2719: YARN-10649. fix RMNodeImpl.updateExistContainers leak

2021-02-24 Thread GitBox


Neilxzn removed a comment on pull request #2719:
URL: https://github.com/apache/hadoop/pull/2719#issuecomment-784854345


   @bibinchundatt Can you please review this patch? thx






[GitHub] [hadoop] qizhu-lucas commented on pull request #2720: YARN-10650: Create dispatcher metrics interface, and apply to RM asyn…

2021-02-24 Thread GitBox


qizhu-lucas commented on pull request #2720:
URL: https://github.com/apache/hadoop/pull/2720#issuecomment-784904274


   @ericbadger  @jojochuang  @bibinchundatt  @jbrennan333 
   Could you help review this? It adds dispatcher event metrics covering 
event processing time and event counters.
   I think this is very helpful for big clusters.
   
   Thanks.






[GitHub] [hadoop] qizhu-lucas opened a new pull request #2720: YARN-10650: Create dispatcher metrics interface, and apply to RM asyn…

2021-02-24 Thread GitBox


qizhu-lucas opened a new pull request #2720:
URL: https://github.com/apache/hadoop/pull/2720


   …c dispatcher.
   
   Currently there are no detailed dispatcher metrics for event type counters or 
processing time. They would be very helpful for event monitoring on big clusters. 
   






[GitHub] [hadoop] qizhu-lucas closed pull request #2570: YARN-8557: Exclude lagged/unhealthy/decommissioned nodes in async all…

2021-02-24 Thread GitBox


qizhu-lucas closed pull request #2570:
URL: https://github.com/apache/hadoop/pull/2570


   


