[jira] [Created] (HADOOP-17905) Modify Text.ensureCapacity() to efficiently max out the backing array size
Peter Bacsko created HADOOP-17905:
-
Summary: Modify Text.ensureCapacity() to efficiently max out the backing array size
Key: HADOOP-17905
URL: https://issues.apache.org/jira/browse/HADOOP-17905
Project: Hadoop Common
Issue Type: Improvement
Reporter: Peter Bacsko
Assignee: Peter Bacsko

This is a continuation of HADOOP-17901. Right now we use a factor of 1.5x to increase the byte array if it's full. However, once the size reaches a certain point, the capacity only grows to (current size + length). This can cause performance issues if the textual data we intend to store is beyond this point. Instead, let's max out the array at the maximum possible size. Based on different sources, this is usually determined to be Integer.MAX_VALUE - 8 (see ArrayList, AbstractCollection, Hashtable, etc).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
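The idea can be sketched like this (a hypothetical helper, not the actual Hadoop patch; the overflow-safe subtraction idiom mirrors what the JDK's ArrayList uses): grow by 1.5x, but once that calculation overflows or exceeds the cap, jump straight to Integer.MAX_VALUE - 8.

```java
public class CappedGrowth {
    // Common VM-safe ceiling used by ArrayList, Hashtable, etc.:
    // some VMs reserve a few header words in arrays, so requesting
    // Integer.MAX_VALUE elements can fail with OutOfMemoryError.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // New capacity for a backing array of size `current` that must
    // hold at least `required` bytes.
    static int newCapacity(int current, int required) {
        int grown = current + (current >> 1); // 1.5x growth
        if (grown - required < 0) {           // overflow-safe: grown may have wrapped
            grown = required;
        }
        if (grown - MAX_ARRAY_SIZE > 0) {     // past the cap: max out instead
            grown = MAX_ARRAY_SIZE;
        }
        return grown;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(30720, 28672));                 // 46080
        System.out.println(newCapacity(2_000_000_000, 2_100_000_000)); // 2147483639
    }
}
```

The `grown - required < 0` form instead of `grown < required` is deliberate: when the 1.5x calculation overflows int, the subtraction still produces a usable sign, so the method degrades gracefully toward the cap instead of returning a negative size.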
[jira] [Created] (HADOOP-17901) Performance degradation in Text.append() after HADOOP-16951
Peter Bacsko created HADOOP-17901:
-
Summary: Performance degradation in Text.append() after HADOOP-16951
Key: HADOOP-17901
URL: https://issues.apache.org/jira/browse/HADOOP-17901
Project: Hadoop Common
Issue Type: Bug
Components: common
Reporter: Peter Bacsko
Assignee: Peter Bacsko

We discovered a serious performance degradation in {{Text.append()}}. The problem is that the logic which is intended to increase the size of the backing array does not work as intended. It's very difficult to spot, so I added extra logging to see what happens. Let's append 4096 bytes of textual data in a loop:

{noformat}
public static void main(String[] args) {
  Text text = new Text();
  String toAppend = RandomStringUtils.randomAscii(4096);

  for (int i = 0; i < 100; i++) {
    text.append(toAppend.getBytes(), 0, 4096);
  }
}
{noformat}

With some debug printouts, we can observe:

{noformat}
2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(251)) - length: 24576, len: 4096, utf8ArraySize: 4096, bytes.length: 30720
2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(253)) - length + (length >> 1): 36864
2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(254)) - length + len: 28672
2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:ensureCapacity(287)) - >>> enhancing capacity from 30720 to 36864
2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(251)) - length: 28672, len: 4096, utf8ArraySize: 4096, bytes.length: 36864
2021-09-08 13:35:29,528 INFO [main] io.Text (Text.java:append(253)) - length + (length >> 1): 43008
2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:append(254)) - length + len: 32768
2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:ensureCapacity(287)) - >>> enhancing capacity from 36864 to 43008
2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:append(251)) - length: 32768, len: 4096, utf8ArraySize: 4096, bytes.length: 43008
2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:append(253)) - length + (length >> 1): 49152
2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:append(254)) - length + len: 36864
2021-09-08 13:35:29,529 INFO [main] io.Text (Text.java:ensureCapacity(287)) - >>> enhancing capacity from 43008 to 49152
...
{noformat}

After a certain number of {{append()}} calls, subsequent capacity increments are small: the difference between two consecutive {{length + (length >> 1)}} values is always 6144 bytes. Because the size of the backing array trails the calculated value, the increment is also only 6144 bytes, which means new arrays are constantly created.

Suggested solution: don't calculate the capacity in advance based on length. Instead, pass the required minimum to {{ensureCapacity()}}, and make the increment depend on the actual size of the byte array when the desired capacity is larger.
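The suggested fix, as a simplified sketch (hypothetical class, not the actual Hadoop patch): {{ensureCapacity()}} receives only the required minimum and grows from the array's real length, so each step is a genuine 1.5x of the current allocation rather than 1.5x of the logical length.

```java
import java.util.Arrays;

public class GrowDemo {
    private byte[] bytes = new byte[0];
    private int length;

    // Grow from the actual backing-array length, not from `length`,
    // so successive grows are real 1.5x steps.
    private void ensureCapacity(int minCapacity) {
        if (bytes.length >= minCapacity) {
            return;
        }
        int newSize = Math.max(minCapacity, bytes.length + (bytes.length >> 1));
        bytes = Arrays.copyOf(bytes, newSize);
    }

    public void append(byte[] utf8, int start, int len) {
        ensureCapacity(length + len); // pass only the required minimum
        System.arraycopy(utf8, start, bytes, length, len);
        length += len;
    }

    public int capacity() { return bytes.length; }
    public int length()   { return length; }

    public static void main(String[] args) {
        GrowDemo t = new GrowDemo();
        byte[] chunk = new byte[4096];
        for (int i = 0; i < 100; i++) {
            t.append(chunk, 0, 4096);
        }
        System.out.println("length=" + t.length() + " capacity=" + t.capacity());
    }
}
```

With this shape, the number of reallocations for the 100-append scenario above drops to roughly log1.5 of the final size, instead of one reallocation per append once the array trails the precomputed target.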
Re: Article: Cost-Efficient Open Source Big Data Platform at Uber
Hi Akira,

From the article, it's not clear to me what they mean by "sophisticated features". It is true that the container assignment code path is very complicated, and understanding it takes quite a bit of time and effort. So in order to speed up container assignment in large clusters, it might be necessary to rewrite it, possibly losing certain features in the process - but what those might be is not elaborated. In any case, they didn't take this path and instead opted for multiple Hadoop clusters.

Since they didn't share profiling results or heat maps, we can only guess which part of the Capacity Scheduler is deemed slow or a possible bottleneck.

Peter

On Thu, Aug 12, 2021 at 9:48 AM Akira Ajisaka wrote:
> Hi folks,
>
> I read Uber's article
> https://eng.uber.com/cost-efficient-big-data-platform/. This article
> is very interesting to me, and now I have some questions.
>
> > For example, we identified that the Capacity Scheduler has some complex
> > logic that slows down task assignment. However, code changes to get rid of
> > those won’t be able to merge into Apache Hadoop trunk, since those
> > sophisticated features may be needed by other companies.
>
> - What are those sophisticated features in the Capacity Scheduler?
> - In the future, can we turn off the features with some flags in Apache
> Hadoop?
> - Are there any other examples like this?
>
> Thanks and regards,
> Akira
Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree
+1 (non-binding)

On Fri, Sep 20, 2019 at 8:01 AM Rakesh Radhakrishnan wrote:
> +1
>
> Rakesh
>
> On Fri, Sep 20, 2019 at 12:29 AM Aaron Fabbri wrote:
> > +1 (binding)
> >
> > Thanks to the Ozone folks for their efforts at maintaining good separation
> > with HDFS and common. I took a lot of heat for the unpopular opinion that
> > they should be separate, so I am glad the process has worked out well for
> > both codebases. It looks like my concerns were addressed and I appreciate
> > it. It is cool to see the evolution here.
> >
> > Aaron
> >
> > On Thu, Sep 19, 2019 at 3:37 AM Steve Loughran wrote:
> > > in that case,
> > >
> > > +1 from me (binding)
> > >
> > > On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton wrote:
> > > > > one thing to consider here as you are giving up your ability to make
> > > > > changes in hadoop-* modules, including hadoop-common, and their
> > > > > dependencies, in sync with your own code. That goes for filesystem
> > > > > contract tests.
> > > > >
> > > > > are you happy with that?
> > > >
> > > > Yes. I think we can live with it.
> > > >
> > > > Fortunately, the Hadoop parts which are used by Ozone (security + rpc)
> > > > are stable enough; we didn't need bigger changes until now (small
> > > > patches are already included in 3.1/3.2).
> > > >
> > > > I think it's better to use released Hadoop bits in Ozone anyway, and
> > > > worst (best?) case we can try to do more frequent patch releases from
> > > > Hadoop (if required).
> > > >
> > > > m.
Re: [VOTE] Move Submarine source code, documentation, etc. to a separate Apache Git repo
+1 (non-binding)

On Sat, Aug 24, 2019 at 4:06 AM Wangda Tan wrote:
> Hi devs,
>
> This is a voting thread to move the Submarine source code and documentation
> from the Hadoop repo to a separate Apache Git repo, based on the discussion at
> https://lists.apache.org/thread.html/e49d60b2e0e021206e22bb2d430f4310019a8b29ee5020f3eea3bd95@%3Cyarn-dev.hadoop.apache.org%3E
>
> Contributors who have permissions to push to the Hadoop Git repository will
> have permissions to push to the new Submarine repository.
>
> This voting thread will run for 7 days and will end on Aug 30th.
>
> Please let me know if you have any questions.
>
> Thanks,
> Wangda Tan
[jira] [Created] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener
Peter Bacsko created HADOOP-16238:
-
Summary: Add the possibility to set SO_REUSEADDR in IPC Server Listener
Key: HADOOP-16238
URL: https://issues.apache.org/jira/browse/HADOOP-16238
Project: Hadoop Common
Issue Type: Improvement
Reporter: Peter Bacsko
Assignee: Peter Bacsko

Currently we can't enable SO_REUSEADDR in the IPC Server. In some circumstances this would be desirable; see the explanation here:
https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-

Occasionally it also causes problems in the test case {{TestMiniMRClientCluster.testRestart}}:

{noformat}
2019-04-04 11:21:31,896 INFO [main] service.AbstractService (AbstractService.java:noteFailure(273)) - Service org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [test-host:35491] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: Problem binding to [test-host:35491] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
	at org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138)
	at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
	at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355)
	at org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127)
	at org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
	at org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312)
	at org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
	at org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73)
	at org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
{noformat}

At least for testing, having this socket option enabled is beneficial. We could enable it with a new property like {{ipc.server.reuseaddr}}.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
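As an illustration (plain JDK NIO API, not the Hadoop IPC Listener code itself): SO_REUSEADDR has to be set before bind(), and it lets a restarted server rebind a port whose previous socket is still lingering in TIME_WAIT, which is exactly the restart scenario the test above hits.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

public class ReuseAddrDemo {
    public static void main(String[] args) throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        // Must be set before bind(): allows rebinding an address whose
        // previous socket is still in TIME_WAIT after a restart.
        channel.setOption(StandardSocketOptions.SO_REUSEADDR, true);
        channel.bind(new InetSocketAddress(0)); // ephemeral port for the demo
        System.out.println("Bound to " + channel.getLocalAddress());
        channel.close();
    }
}
```

A config-driven variant would read a boolean property (the proposed {{ipc.server.reuseaddr}}) and only call setOption when it is true, preserving today's default behavior.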
Re: [VOTE] Release Apache Hadoop 3.2.0 - RC0
+1 (non-binding)

- Built from source at tag 3.2.0-rc0 (Ubuntu 18.10, JDK 1.8.0_191)
- Verified checksums of hadoop-3.2.0.tar.gz
- Installed on a 3-node physical cluster
- Ran teragen/terasort/teravalidate
- Ran distributed shell a couple of times
- Checked UIs (RM, NM, DN, JHS)

Peter

On Wed, Nov 28, 2018 at 5:17 PM Jason Lowe wrote:
> Thanks for driving this release, Sunil!
>
> +1 (binding)
>
> - Verified signatures and digests
> - Successfully performed a native build
> - Deployed a single-node cluster
> - Ran some sample jobs
>
> Jason
>
> On Fri, Nov 23, 2018 at 6:07 AM Sunil G wrote:
> > Hi folks,
> >
> > Thanks to all contributors who helped with this release [1]. I have created
> > the first release candidate (RC0) for Apache Hadoop 3.2.0.
> >
> > Artifacts for this RC are available here:
> > http://home.apache.org/~sunilg/hadoop-3.2.0-RC0/
> >
> > The RC tag in git is release-3.2.0-RC0.
> >
> > The maven artifacts are available via repository.apache.org at
> > https://repository.apache.org/content/repositories/orgapachehadoop-1174/
> >
> > This vote will run 7 days (5 weekdays), ending on Nov 30 at 11:59 pm PST.
> >
> > 3.2.0 contains 1079 [2] fixed JIRA issues since 3.1.0. The feature
> > additions below are the highlights of this release.
> >
> > 1. Node Attributes Support in YARN
> > 2. Hadoop Submarine project for running Deep Learning workloads on YARN
> > 3. Support service upgrade via YARN Service API and CLI
> > 4. HDFS Storage Policy Satisfier
> > 5. Support Windows Azure Storage - Blob file system in Hadoop
> > 6. Phase 3 improvements for S3Guard and Phase 5 improvements for S3A
> > 7. Improvements in Router-based HDFS federation
> >
> > Thanks to Wangda, Vinod, and Marton for helping me prepare the release.
> > I have done some testing with my pseudo cluster. My +1 to start.
> >
> > Regards,
> > Sunil
> >
> > [1]
> > https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E
> > [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.2.0)
> > AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status = Resolved
> > ORDER BY fixVersion ASC
[jira] [Created] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
Peter Bacsko created HADOOP-14982:
-
Summary: Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with kerberos in HA env
Key: HADOOP-14982
URL: https://issues.apache.org/jira/browse/HADOOP-14982
Project: Hadoop Common
Issue Type: Bug
Components: common
Reporter: Peter Bacsko
Assignee: Peter Bacsko

If HA is configured for the Resource Manager in a secure environment, the mapred client goes into a loop when the user is not authenticated with Kerberos.

{noformat}
[root@pb6sec-1 ~]# mapred job -list
17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
17/10/25 06:37:43 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 1 failover attempts. Trying to failover after sleeping for 160ms.
17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 2 failover attempts. Trying to failover after sleeping for 582ms.
17/10/25 06:37:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
17/10/25 06:37:44 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/10/25 06:37:44 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 3 failover attempts. Trying to failover after sleeping for 977ms.
17/10/25 06:37:45 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
17/10/25 06:37:45 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 4 failover attempts. Trying to failover after sleeping for 1667ms.
17/10/25 06:37:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm36
17/10/25 06:37:46 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
17/10/25 06:37:46 INFO retry.RetryInvocationHandler: java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination host is: "com.host2.redacted:8032; , while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 5 failover attempts. Trying to failover after sleeping for 2776ms.
17/10/25 06:37:49 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm25
17/10/25 06:37:49 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 6 failover attempts. Try
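The log loop above shows the root problem: a SASL/GSS failure is treated like a transient network error and retried with failover indefinitely. The behavior a fix needs can be sketched like this (a hypothetical standalone class, not Hadoop's actual RetryPolicy interface): walk the cause chain and fail fast when the error is an authentication failure, since no amount of failover can supply missing credentials.

```java
import java.io.IOException;
import javax.security.sasl.SaslException;

public class FailFastOnAuthError {
    public enum Action { FAIL, FAILOVER_AND_RETRY }

    // Walk the cause chain: a SaslException buried inside an IOException
    // means the client has no valid credentials, so retrying cannot help.
    public static Action shouldRetry(Exception e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            if (t instanceof SaslException) {
                return Action.FAIL;
            }
        }
        return Action.FAILOVER_AND_RETRY;
    }

    public static void main(String[] args) {
        Exception auth = new IOException(new SaslException("GSS initiate failed"));
        Exception net = new java.net.ConnectException("Connection refused");
        System.out.println(shouldRetry(auth)); // FAIL
        System.out.println(shouldRetry(net));  // FAILOVER_AND_RETRY
    }
}
```

Unwrapping the cause chain matters because, as the log shows, the SaslException arrives wrapped in an IOException ("Failed on local exception"), so a naive instanceof check on the top-level exception would miss it.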