[jira] [Created] (HADOOP-17905) Modify Text.ensureCapacity() to efficiently max out the backing array size

2021-09-09 Thread Peter Bacsko (Jira)
Peter Bacsko created HADOOP-17905:
-

 Summary: Modify Text.ensureCapacity() to efficiently max out the 
backing array size
 Key: HADOOP-17905
 URL: https://issues.apache.org/jira/browse/HADOOP-17905
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Peter Bacsko
Assignee: Peter Bacsko


This is a continuation of HADOOP-17901.

Right now we grow the backing byte array by a factor of 1.5 when it is full. 
However, once the size reaches a certain point, the increment is only (current 
size + length). This can cause performance problems if the textual data we 
intend to store grows beyond that point.

Instead, once that point is reached, let's grow the array straight to the 
maximum possible size. Based on various JDK sources, this is usually taken to 
be Integer.MAX_VALUE - 8 (see ArrayList, AbstractCollection, Hashtable, etc.).
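
A minimal sketch of what such a capped growth policy could look like 
(illustrative only, not the actual {{Text}} implementation; the field and 
constant names are assumptions following the JDK convention mentioned above):
{noformat}
// Sketch only: cap the growth of the backing array at the usual JDK limit.
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

private void ensureCapacity(int capacity) {
  if (capacity > bytes.length) {
    // Grow by 1.5x as before, but instead of creeping towards the limit in
    // small steps, jump straight to the maximum once the calculated size
    // passes MAX_ARRAY_SIZE.
    long newSize = Math.max((long) capacity,
        (long) bytes.length + (bytes.length >> 1));
    if (newSize > MAX_ARRAY_SIZE) {
      newSize = MAX_ARRAY_SIZE;
    }
    bytes = Arrays.copyOf(bytes, (int) newSize);
  }
}
{noformat}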






[jira] [Created] (HADOOP-17901) Performance degradation in Text.append() after HADOOP-16951

2021-09-08 Thread Peter Bacsko (Jira)
Peter Bacsko created HADOOP-17901:
-

 Summary: Performance degradation in Text.append() after 
HADOOP-16951
 Key: HADOOP-17901
 URL: https://issues.apache.org/jira/browse/HADOOP-17901
 Project: Hadoop Common
  Issue Type: Bug
  Components: common
Reporter: Peter Bacsko
Assignee: Peter Bacsko


We discovered a serious performance degradation in {{Text.append()}}.

The problem is that the logic that is supposed to grow the backing array does 
not work as intended. It is very difficult to spot, so I added extra logging to 
show what happens.

Let's add 4096 bytes of textual data in a loop:
{noformat}
public static void main(String[] args) {
  Text text = new Text();
  String toAppend = RandomStringUtils.randomAscii(4096);

  for (int i = 0; i < 100; i++) {
    text.append(toAppend.getBytes(), 0, 4096);
  }
}
{noformat}

With some debug printouts, we can observe:
{noformat}
2021-09-08 13:35:29,528 INFO  [main] io.Text (Text.java:append(251)) - length: 
24576,  len: 4096, utf8ArraySize: 4096, bytes.length: 30720
2021-09-08 13:35:29,528 INFO  [main] io.Text (Text.java:append(253)) - length + 
(length >> 1): 36864
2021-09-08 13:35:29,528 INFO  [main] io.Text (Text.java:append(254)) - length + 
len: 28672
2021-09-08 13:35:29,528 INFO  [main] io.Text (Text.java:ensureCapacity(287)) - 
>>> enhancing capacity from 30720 to 36864
2021-09-08 13:35:29,528 INFO  [main] io.Text (Text.java:append(251)) - length: 
28672,  len: 4096, utf8ArraySize: 4096, bytes.length: 36864
2021-09-08 13:35:29,528 INFO  [main] io.Text (Text.java:append(253)) - length + 
(length >> 1): 43008
2021-09-08 13:35:29,529 INFO  [main] io.Text (Text.java:append(254)) - length + 
len: 32768
2021-09-08 13:35:29,529 INFO  [main] io.Text (Text.java:ensureCapacity(287)) - 
>>> enhancing capacity from 36864 to 43008
2021-09-08 13:35:29,529 INFO  [main] io.Text (Text.java:append(251)) - length: 
32768,  len: 4096, utf8ArraySize: 4096, bytes.length: 43008
2021-09-08 13:35:29,529 INFO  [main] io.Text (Text.java:append(253)) - length + 
(length >> 1): 49152
2021-09-08 13:35:29,529 INFO  [main] io.Text (Text.java:append(254)) - length + 
len: 36864
2021-09-08 13:35:29,529 INFO  [main] io.Text (Text.java:ensureCapacity(287)) - 
>>> enhancing capacity from 43008 to 49152
...
{noformat}

After a certain number of {{append()}} calls, the subsequent capacity 
increments become small.

This is because {{length}} grows by only 4096 bytes per call, so the difference 
between two consecutive {{length + (length >> 1)}} values is always 6144 bytes 
(1.5 * 4096). Since the size of the backing array trails behind the calculated 
value, each increment is also just 6144 bytes, which means a new array is 
allocated and copied on almost every call.

Suggested solution: don't calculate the capacity in advance based on length. 
Instead, pass the required minimum to {{ensureCapacity()}}. The growth should 
then be based on the actual size of the byte array whenever the desired 
capacity is larger.
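
A minimal sketch of this approach (illustrative only and simplified; the real 
{{Text}} class has additional bookkeeping):
{noformat}
// Sketch only: append() passes just the required minimum...
public void append(byte[] utf8, int start, int len) {
  ensureCapacity(length + len);
  System.arraycopy(utf8, start, bytes, length, len);
  length += len;
}

// ...and ensureCapacity() grows relative to the actual array size, so the
// 1.5x factor is applied to bytes.length instead of to length.
private void ensureCapacity(int minCapacity) {
  if (minCapacity > bytes.length) {
    int newSize = Math.max(minCapacity, bytes.length + (bytes.length >> 1));
    bytes = Arrays.copyOf(bytes, newSize);
  }
}
{noformat}
This way the array grows geometrically relative to its current size, so the 
number of reallocations stays logarithmic in the total appended length.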






Re: Article: Cost-Efficient Open Source Big Data Platform at Uber

2021-08-18 Thread Peter Bacsko
Hi Akira,

From the article, it's not clear to me what they mean by "sophisticated
features". It is true that the container assignment code path is very
complicated, and understanding it takes quite a bit of time and effort. To
speed up container assignment on large clusters, it might be necessary to
rewrite that code, losing certain features in the process - but which features
those might be is not elaborated. In any case, they didn't take this path and
instead opted for multiple Hadoop clusters.

Since they didn't share profiling results or heat maps, we can only guess
what part of Capacity Scheduler is deemed slow or a possible bottleneck.

Peter

On Thu, Aug 12, 2021 at 9:48 AM Akira Ajisaka  wrote:

> Hi folks,
>
> I read Uber's article
> https://eng.uber.com/cost-efficient-big-data-platform/. This article
> is very interesting for me, and now I have some questions.
>
> > For example, we identified that the Capacity Scheduler has some complex
> logic that slows down task assignment. However, code changes to get rid of
> those won’t be able to merge into Apache Hadoop trunk, since those
> sophisticated features may be needed by other companies.
>
> - What are those sophisticated features in the Capacity Scheduler?
> - In the future, can we turn off the features by some flags in Apache
> Hadoop?
> - Are there any other examples like this?
>
> Thanks and regards,
> Akira
>
>
>


Re: [DISCUSS] Separate Hadoop Core trunk and Hadoop Ozone trunk source tree

2019-09-20 Thread Peter Bacsko
+1 (non-binding)

On Fri, Sep 20, 2019 at 8:01 AM Rakesh Radhakrishnan 
wrote:

> +1
>
> Rakesh
>
> On Fri, Sep 20, 2019 at 12:29 AM Aaron Fabbri  wrote:
>
> > +1 (binding)
> >
> > Thanks to the Ozone folks for their efforts at maintaining good
> separation
> > with HDFS and common. I took a lot of heat for the unpopular opinion that
> > they should be separate, so I am glad the process has worked out well
> for
> > both codebases. It looks like my concerns were addressed and I appreciate
> > it.  It is cool to see the evolution here.
> >
> > Aaron
> >
> >
> > On Thu, Sep 19, 2019 at 3:37 AM Steve Loughran
>  > >
> > wrote:
> >
> > > in that case,
> > >
> > > +1 from me (binding)
> > >
> > > On Wed, Sep 18, 2019 at 4:33 PM Elek, Marton  wrote:
> > >
> > > >  > one thing to consider here as you are giving up your ability to make
> > > >  > changes in hadoop-* modules, including hadoop-common, and their
> > > >  > dependencies, in sync with your own code. That goes for filesystem
> > > >  > contract tests.
> > > >  >
> > > >  > are you happy with that?
> > > >
> > > >
> > > > Yes. I think we can live with it.
> > > >
> > > > Fortunately, the Hadoop parts which are used by Ozone (security + RPC)
> > > > are stable enough that we haven't needed bigger changes until now
> > > > (small patches are already included in 3.1/3.2).
> > > >
> > > > I think it's better to use released Hadoop bits in Ozone anyway, and
> > > > worst (best?) case we can try to do more frequent patch releases from
> > > > Hadoop (if required).
> > > >
> > > >
> > > > m.
> > > >
> > > >
> > > >
> > >
> >
>


Re: [VOTE] Move Submarine source code, documentation, etc. to a separate Apache Git repo

2019-08-26 Thread Peter Bacsko
+1 (non-binding)

On Sat, Aug 24, 2019 at 4:06 AM Wangda Tan  wrote:

> Hi devs,
>
> This is a voting thread to move the Submarine source code and documentation
> from the Hadoop repo to a separate Apache Git repo, based on the discussion
> at:
> https://lists.apache.org/thread.html/e49d60b2e0e021206e22bb2d430f4310019a8b29ee5020f3eea3bd95@%3Cyarn-dev.hadoop.apache.org%3E
>
> Contributors who have permissions to push to Hadoop Git repository will
> have permissions to push to the new Submarine repository.
>
> This voting thread will run for 7 days and will end at Aug 30th.
>
> Please let me know if you have any questions.
>
> Thanks,
> Wangda Tan
>


[jira] [Created] (HADOOP-16238) Add the possibility to set SO_REUSEADDR in IPC Server Listener

2019-04-04 Thread Peter Bacsko (JIRA)
Peter Bacsko created HADOOP-16238:
-

 Summary: Add the possibility to set SO_REUSEADDR in IPC Server 
Listener
 Key: HADOOP-16238
 URL: https://issues.apache.org/jira/browse/HADOOP-16238
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Peter Bacsko
Assignee: Peter Bacsko


Currently we can't enable SO_REUSEADDR in the IPC Server. In some 
circumstances this would be desirable; see the explanation here:

[https://developer.ibm.com/tutorials/l-sockpit/#pitfall-3-address-in-use-error-eaddrinuse-]

The lack of this option also occasionally causes problems in the test case 
{{TestMiniMRClientCluster.testRestart}}:
{noformat}
2019-04-04 11:21:31,896 INFO [main] service.AbstractService 
(AbstractService.java:noteFailure(273)) - Service 
org.apache.hadoop.yarn.server.resourcemanager.AdminService failed in state 
STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: 
java.net.BindException: Problem binding to [test-host:35491] 
java.net.BindException: Address already in use; For more details see: 
http://wiki.apache.org/hadoop/BindException
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.net.BindException: 
Problem binding to [test-host:35491] java.net.BindException: Address already in 
use; For more details see: http://wiki.apache.org/hadoop/BindException
 at 
org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl.getServer(RpcServerFactoryPBImpl.java:138)
 at 
org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getServer(HadoopYarnProtoRPC.java:65)
 at org.apache.hadoop.yarn.ipc.YarnRPC.getServer(YarnRPC.java:54)
 at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.startServer(AdminService.java:178)
 at 
org.apache.hadoop.yarn.server.resourcemanager.AdminService.serviceStart(AdminService.java:165)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
 at 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1244)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at 
org.apache.hadoop.yarn.server.MiniYARNCluster.startResourceManager(MiniYARNCluster.java:355)
 at 
org.apache.hadoop.yarn.server.MiniYARNCluster.access$300(MiniYARNCluster.java:127)
 at 
org.apache.hadoop.yarn.server.MiniYARNCluster$ResourceManagerWrapper.serviceStart(MiniYARNCluster.java:493)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at 
org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
 at 
org.apache.hadoop.yarn.server.MiniYARNCluster.serviceStart(MiniYARNCluster.java:312)
 at 
org.apache.hadoop.mapreduce.v2.MiniMRYarnCluster.serviceStart(MiniMRYarnCluster.java:210)
 at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
 at 
org.apache.hadoop.mapred.MiniMRYarnClusterAdapter.restart(MiniMRYarnClusterAdapter.java:73)
 at 
org.apache.hadoop.mapred.TestMiniMRClientCluster.testRestart(TestMiniMRClientCluster.java:114)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62){noformat}
 

At least for testing, having this socket option enabled is beneficial. We 
could enable it with a new property like {{ipc.server.reuseaddr}}.
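
A minimal sketch of how the Listener could honour such a property (the property 
name is the one proposed above; {{bindAddress}}, {{port}} and {{backlogLength}} 
stand in for the Listener's existing fields, and the actual wiring inside the 
IPC {{Server.Listener}} may look different):
{noformat}
// Sketch only: read the proposed ipc.server.reuseaddr property and apply it
// to the listener's server socket before bind().
boolean reuseAddr = conf.getBoolean("ipc.server.reuseaddr", false);

ServerSocketChannel acceptChannel = ServerSocketChannel.open();
acceptChannel.configureBlocking(false);
acceptChannel.socket().setReuseAddress(reuseAddr);
acceptChannel.socket().bind(
    new InetSocketAddress(bindAddress, port), backlogLength);
{noformat}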






Re: [VOTE] Release Apache Hadoop 3.2.0 - RC0

2018-11-28 Thread Peter Bacsko
+1 (non-binding)

- Built from source at tag 3.2.0-rc0 (Ubuntu 18.10, JDK1.8.0_191)
- Verified checksums of hadoop-3.2.0.tar.gz
- Installed on a 3-node physical cluster
- Ran teragen/terasort/teravalidate
- Ran distributed shell a couple of times
- Checked UIs (RM, NM, DN, JHS)

Peter

On Wed, Nov 28, 2018 at 5:17 PM Jason Lowe  wrote:

> Thanks for driving this release, Sunil!
>
> +1 (binding)
>
> - Verified signatures and digests
> - Successfully performed a native build
> - Deployed a single-node cluster
> - Ran some sample jobs
>
> Jason
>
> On Fri, Nov 23, 2018 at 6:07 AM Sunil G  wrote:
>
> > Hi folks,
> >
> >
> >
> > Thanks to all contributors who helped in this release [1]. I have created
> >
> > first release candidate (RC0) for Apache Hadoop 3.2.0.
> >
> >
> > Artifacts for this RC are available here:
> >
> > http://home.apache.org/~sunilg/hadoop-3.2.0-RC0/
> >
> >
> >
> > RC tag in git is release-3.2.0-RC0.
> >
> >
> >
> > The maven artifacts are available via repository.apache.org at
> >
> > https://repository.apache.org/content/repositories/orgapachehadoop-1174/
> >
> >
> > This vote will run 7 days (5 weekdays), ending on Nov 30 at 11:59 pm PST.
> >
> >
> >
> > 3.2.0 contains 1079 [2] fixed JIRA issues since 3.1.0. Below feature
> > additions
> >
> > are the highlights of this release.
> >
> > 1. Node Attributes Support in YARN
> >
> > 2. Hadoop Submarine project for running Deep Learning workloads on YARN
> >
> > 3. Support service upgrade via YARN Service API and CLI
> >
> > 4. HDFS Storage Policy Satisfier
> >
> > 5. Support Windows Azure Storage - Blob file system in Hadoop
> >
> > 6. Phase 3 improvements for S3Guard and Phase 5 improvements for S3A
> >
> > 7. Improvements in Router-based HDFS federation
> >
> >
> >
> > Thanks to Wangda, Vinod, Marton for helping me in preparing the release.
> >
> > I have done some testing with my pseudo cluster. My +1 to start.
> >
> >
> >
> > Regards,
> >
> > Sunil
> >
> >
> >
> > [1]
> >
> >
> >
> https://lists.apache.org/thread.html/68c1745dcb65602aecce6f7e6b7f0af3d974b1bf0048e7823e58b06f@%3Cyarn-dev.hadoop.apache.org%3E
> >
> > [2] project in (YARN, HADOOP, MAPREDUCE, HDFS) AND fixVersion in (3.2.0)
> > AND fixVersion not in (3.1.0, 3.0.0, 3.0.0-beta1) AND status = Resolved
> > ORDER BY fixVersion ASC
> >
>


[jira] [Created] (HADOOP-14982) Clients using FailoverOnNetworkExceptionRetry can go into a loop if they're used without authenticating with Kerberos in HA env

2017-10-25 Thread Peter Bacsko (JIRA)
Peter Bacsko created HADOOP-14982:
-

 Summary: Clients using FailoverOnNetworkExceptionRetry can go into 
a loop if they're used without authenticating with Kerberos in HA env
 Key: HADOOP-14982
 URL: https://issues.apache.org/jira/browse/HADOOP-14982
 Project: Hadoop Common
  Issue Type: Bug
  Components: common
Reporter: Peter Bacsko
Assignee: Peter Bacsko


If HA is configured for the ResourceManager in a secure environment, the 
mapred client goes into a retry loop if the user is not authenticated with 
Kerberos.
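
The retry handler keeps failing over between the two RMs even though the 
SASL/GSS failure is not a transient network problem. A minimal sketch of a 
client-side fail-fast check (illustrative only, using the standard 
{{UserGroupInformation}} API; not a proposed patch):
{noformat}
// Sketch only: fail fast when security is enabled but no Kerberos credentials
// are available, instead of retrying the RPC indefinitely.
UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
if (UserGroupInformation.isSecurityEnabled() && !ugi.hasKerberosCredentials()) {
  throw new IOException("Security is enabled but no Kerberos credentials are "
      + "available - run kinit before using the mapred client");
}
{noformat}
Without such a check, the observed client output is an endless failover loop: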

{noformat}
[root@pb6sec-1 ~]# mapred job -list
17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm36
17/10/25 06:37:43 WARN ipc.Client: Exception encountered while connecting to 
the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.io.IOException: 
Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination 
host is: "com.host2.redacted:8032; , while invoking 
ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 1 
failover attempts. Trying to failover after sleeping for 160ms.
17/10/25 06:37:43 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm25
17/10/25 06:37:43 INFO retry.RetryInvocationHandler: java.net.ConnectException: 
Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 2 
failover attempts. Trying to failover after sleeping for 582ms.
17/10/25 06:37:44 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm36
17/10/25 06:37:44 WARN ipc.Client: Exception encountered while connecting to 
the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
17/10/25 06:37:44 INFO retry.RetryInvocationHandler: java.io.IOException: 
Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination 
host is: "com.host2.redacted:8032; , while invoking 
ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 3 
failover attempts. Trying to failover after sleeping for 977ms.
17/10/25 06:37:45 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm25
17/10/25 06:37:45 INFO retry.RetryInvocationHandler: java.net.ConnectException: 
Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 4 
failover attempts. Trying to failover after sleeping for 1667ms.
17/10/25 06:37:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm36
17/10/25 06:37:46 WARN ipc.Client: Exception encountered while connecting to 
the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
17/10/25 06:37:46 INFO retry.RetryInvocationHandler: java.io.IOException: 
Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host is: "host_redacted/IP_redacted"; destination 
host is: "com.host2.redacted:8032; , while invoking 
ApplicationClientProtocolPBClientImpl.getApplications over rm36 after 5 
failover attempts. Trying to failover after sleeping for 2776ms.
17/10/25 06:37:49 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
to rm25
17/10/25 06:37:49 INFO retry.RetryInvocationHandler: java.net.ConnectException: 
Call From host_redacted/IP_redacted to com.host.redacted:8032 failed on 
connection exception: java.net.ConnectException: Connection refused; For more 
details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking 
ApplicationClientProtocolPBClientImpl.getApplications over rm25 after 6 
failover attempts. Try