[
https://issues.apache.org/jira/browse/HDDS-15183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ivan Andika updated HDDS-15183:
-------------------------------
Description:
This is just an idea.
Container replication currently still uses buffered writes, so the replicated
data still occupies the page cache. This can pollute the page cache, which might
trigger memory reclaim down the road and affect performance. We could maybe
avoid this by using DIRECT_IO, but replication currently goes through
TarContainerPacker, which has to tar and untar the container (since one
container contains multiple blocks, unlike the proposal in HDDS-12659 to put a
single container in a single file). Therefore, we cannot use DIRECT_IO cleanly.
We can consider using POSIX_FADV_DONTNEED
(https://man7.org/linux/man-pages/man2/posix_fadvise.2.html) after container
replication so that the OS can free these cached pages right away.
HDFS BlockSender and BlockReceiver appear to use this, so there is a
precedent.
Of course, we need to check whether the initial claim is correct and whether
this is actually beneficial, to avoid premature optimization. See
https://github.com/restic/restic/issues/4465 for the possible issues.
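One crude way to check the initial claim is to compare the system-wide "Cached" figure in /proc/meminfo before and after a replication on an otherwise idle Linux node. The sketch below only shows the parsing; the class and method names are hypothetical and not part of Ozone, and the figure covers the whole machine, so it is a rough signal at best.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Hypothetical helper, not part of Ozone: reads the page cache size on Linux. */
public final class PageCacheProbe {

  private PageCacheProbe() {
  }

  /**
   * Returns the value of the "Cached:" line in kB, or -1 if it is absent.
   * Matching on the line start avoids picking up "SwapCached:".
   */
  public static long parseCachedKb(List<String> meminfoLines) {
    for (String line : meminfoLines) {
      if (line.startsWith("Cached:")) {
        // Line looks like "Cached:   1234567 kB"
        String[] parts = line.trim().split("\\s+");
        return Long.parseLong(parts[1]);
      }
    }
    return -1;
  }

  /** Reads the current value from /proc/meminfo (Linux only). */
  public static long readCachedKb() throws IOException {
    return parseCachedKb(Files.readAllLines(Paths.get("/proc/meminfo")));
  }
}
```

Calling readCachedKb() before and after replicating a large container, with and without the DONTNEED advice, would give a first indication of whether the advice changes anything.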
{code:java}
import java.io.FileDescriptor;
import org.apache.hadoop.io.nativeio.NativeIO;
import org.apache.hadoop.io.nativeio.NativeIOException;

// Best-effort hint asking the OS to drop cached pages for the given range.
// LOG is assumed to be the enclosing class's logger.
private static void dontNeed(String identifier, FileDescriptor fd,
    long offset, long length) {
  try {
    NativeIO.POSIX.getCacheManipulator().posixFadviseIfPossible(
        identifier,
        fd,
        offset,
        length,
        NativeIO.POSIX.POSIX_FADV_DONTNEED);
  } catch (NativeIOException e) {
    // The advice is only an optimization, so a failure is not fatal.
    LOG.debug("Failed to advise DONTNEED for {}", identifier, e);
  }
}
{code}
{code:java}
try (FileInputStream input = new FileInputStream(file)) {
  IOUtils.copy(input, output, bufferSize);
  // Offset 0 with length 0 covers the whole file.
  dontNeed(file.getAbsolutePath(), input.getFD(), 0, 0);
}
{code}
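On the receiving side, the posix_fadvise man page notes that dirty pages are not discarded by POSIX_FADV_DONTNEED, so the data would need to be synced before the advice is given. Below is a minimal sketch of that ordering under stated assumptions: ReplicaWriteSketch, CacheAdvisor and copyAndDropCache are hypothetical names, and the advisor is a placeholder that could wrap a helper like dontNeed. How this would fit into the untar path of TarContainerPacker is not addressed here.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;

/** Hypothetical sketch, not Ozone code: write, sync, then advise DONTNEED. */
public final class ReplicaWriteSketch {

  /** Placeholder hook; a real one could delegate to NativeIO's fadvise call. */
  @FunctionalInterface
  public interface CacheAdvisor {
    void dontNeed(String identifier, FileDescriptor fd, long offset, long length);
  }

  private ReplicaWriteSketch() {
  }

  /** Copies the stream to target and returns the number of bytes written. */
  public static long copyAndDropCache(InputStream in, Path target,
      int bufferSize, CacheAdvisor advisor) throws IOException {
    long total = 0;
    try (FileOutputStream out = new FileOutputStream(target.toFile())) {
      byte[] buffer = new byte[bufferSize];
      int n;
      while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
        total += n;
      }
      // Flush file content to disk first: dirty pages are not dropped
      // by DONTNEED.
      out.getChannel().force(false);
      // Offset 0 with length 0 covers the whole file.
      advisor.dontNeed(target.toString(), out.getFD(), 0, 0);
    }
    return total;
  }
}
```

The extra sync has its own cost, so it would need to be part of the same benchmark that decides whether the advice is worthwhile at all.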
> Use POSIX fadvise for container replication
> -------------------------------------------
>
> Key: HDDS-15183
> URL: https://issues.apache.org/jira/browse/HDDS-15183
> Project: Apache Ozone
> Issue Type: Improvement
> Reporter: Ivan Andika
> Priority: Major