[
https://issues.apache.org/jira/browse/HDDS-9228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kirill Sizov updated HDDS-9228:
-------------------------------
Description:
h3. TL;DR:
*S3G writes all its responses byte-after-byte.*
h3. Details
This issue was discovered during a performance test run.
h4. Cluster configuration
3 master nodes, 5 datanodes.
Each machine has a 96-core CPU.
S3G instances are installed on master nodes (3 gateways).
h4. Test preparation
Before the test we uploaded 300,000 files to Ozone, 20 MB each.
h4. Test configuration
We ran two tests:
1. pure writes, no concurrent reads
2. pure reads, no concurrent writes
h4. Load generator
3 load generator nodes, each running 50 threads.
h4. Ozone configuration
The buckets were created with Erasure Coding RS-3-2-1024k.
h3. Results
We found that writes were 3 times faster than reads; moreover, reads caused ~70%
CPU usage.
Thread dumps and JFR showed the following stack traces for the HTTP threads:
Stacktrace:
{noformat}
"qtp2079179914-1055393" Id=1055393 RUNNABLE
    at org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(ResponseWriter.java:291)
    at org.glassfish.jersey.message.internal.CommittingOutputStream.write(CommittingOutputStream.java:215)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:77)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:125)
    at org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(WriterInterceptorExecutor.java:276)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1310)
    at org.apache.commons.io.IOUtils.copy(IOUtils.java:978)
    at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1282)
    at org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.lambda$get$0(ObjectEndpoint.java:382)
{noformat}
JFR:
{noformat}
Stack Trace                                                                                                           Count    Percentage
void org.eclipse.jetty.server.HttpOutput.write(int)                                                                   431146   39 %
void org.glassfish.jersey.servlet.internal.ResponseWriter$NonCloseableOutputStreamWrapper.write(int)                  431145   39 %
void org.glassfish.jersey.message.internal.CommittingOutputStream.write(int)                                          431145   39 %
void java.io.FilterOutputStream.write(int)                                                                            431145   39 %
void java.io.FilterOutputStream.write(byte[], int, int)                                                               431145   39 %
void org.glassfish.jersey.message.internal.WriterInterceptorExecutor$UnCloseableOutputStream.write(byte[], int, int)  431145   39 %
long org.apache.commons.io.IOUtils.copyLarge(InputStream, OutputStream, byte[])                                       431145   39 %
{noformat}
We can clearly see the transition {{FilterOutputStream.write(byte[], int, int)
-> FilterOutputStream.write(int)}}: every incoming array is written one byte at
a time rather than as a whole array.
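For context, the JDK's {{java.io.FilterOutputStream}} implements {{write(byte[], int, int)}} as a loop over {{write(int)}}, so any subclass that overrides neither method inherits byte-by-byte forwarding. A minimal, self-contained sketch (class and method names here are illustrative, not from the Ozone code base):

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

// Demonstrates that wrapping a stream in a plain FilterOutputStream
// turns one bulk write into N single-byte writes.
public class PerByteWriteDemo {

  // Returns how many write(int) calls one bulk write of `len` bytes triggers.
  static long countSingleByteWrites(int len) throws IOException {
    AtomicLong calls = new AtomicLong();
    OutputStream sink = new ByteArrayOutputStream();
    // write(byte[], int, int) is NOT overridden, so FilterOutputStream
    // forwards the array byte-by-byte via write(int).
    OutputStream wrapped = new FilterOutputStream(sink) {
      @Override
      public void write(int b) throws IOException {
        calls.incrementAndGet();
        super.write(b);
      }
    };
    wrapped.write(new byte[len], 0, len);  // one bulk call on the wrapper
    return calls.get();
  }

  public static void main(String[] args) throws IOException {
    System.out.println(countSingleByteWrites(8192));  // prints 8192
  }
}
```

This is exactly the shape seen in the JFR profile: one {{write(byte[], int, int)}} frame fanning out into hundreds of thousands of {{write(int)}} frames.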
The place in the code that creates {{FilterOutputStream}} is
{{org.apache.hadoop.ozone.s3.TracingFilter}}:
{code}
OutputStream out = responseContext.getEntityStream();
if (out != null) {
  responseContext.setEntityStream(new FilterOutputStream(out) {
    @Override
    public void close() throws IOException {
      super.close();
      finishAndClose(scope, span);
    }
  });
}
{code}
Removing this filter, or overriding the {{FilterOutputStream.write(byte[],
int, int)}} method, resolves the performance issue: we see 5x better
throughput and CPU usage around 12%.
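One possible fix, sketched below (this is an illustration, not the committed patch): keep the tracing wrapper but override the bulk write so arrays pass through in a single call. The class name is invented for the demo; in {{TracingFilter}} the same override would go into the anonymous subclass shown above.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: a FilterOutputStream that forwards bulk writes directly to the
// wrapped stream instead of inheriting the JDK's byte-by-byte loop.
public class BulkForwardingStream extends FilterOutputStream {

  public BulkForwardingStream(OutputStream out) {
    super(out);
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    out.write(b, off, len);  // one bulk call to the underlying stream
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    try (OutputStream s = new BulkForwardingStream(sink)) {
      s.write(new byte[1024], 0, 1024);
    }
    System.out.println(sink.size());  // prints 1024
  }
}
```

With this override in place, {{IOUtils.copyLarge}} hands its 8 KB buffers to the underlying Jetty stream in one call each, instead of 8192 calls each.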
> Poor S3G read performance
> -------------------------
>
> Key: HDDS-9228
> URL: https://issues.apache.org/jira/browse/HDDS-9228
> Project: Apache Ozone
> Issue Type: Bug
> Components: S3
> Affects Versions: 1.4.0
> Reporter: Kirill Sizov
> Priority: Critical
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)