ArafatKhan2198 commented on a change in pull request #3242:
URL: https://github.com/apache/ozone/pull/3242#discussion_r835535700
##########
File path:
hadoop-ozone/tools/src/main/java/org/apache/hadoop/ozone/freon/ContentGenerator.java
##########
@@ -48,38 +49,100 @@
private final byte[] buffer;
- ContentGenerator(long keySize, int bufferSize) {
- this(keySize, bufferSize, bufferSize);
- }
+ /**
+ * Issue hsync after every write (cannot be used with hflush).
+ */
+ private final boolean hSync;
+
+ /**
+ * Issue hflush after every write (cannot be used with hsync).
+ */
+ private final boolean hFlush;
- ContentGenerator(long keySize, int bufferSize, int copyBufferSize) {
- this.keySize = keySize;
- this.bufferSize = bufferSize;
- this.copyBufferSize = copyBufferSize;
+ ContentGenerator(Builder objectBuild) {
+ this.keySize = objectBuild.keySize;
+ this.bufferSize = objectBuild.bufferSize;
+ this.copyBufferSize = objectBuild.copyBufferSize;
+ this.hSync = objectBuild.hSync;
+ this.hFlush = objectBuild.hFlush;
buffer = RandomStringUtils.randomAscii(bufferSize)
.getBytes(StandardCharsets.UTF_8);
}
- /**
- * Write the required bytes to the output stream.
- */
+
public void write(OutputStream outputStream) throws IOException {
- for (long nrRemaining = keySize;
- nrRemaining > 0; nrRemaining -= bufferSize) {
+ for (long nrRemaining = keySize; nrRemaining > 0;
+ nrRemaining -= bufferSize) {
int curSize = (int) Math.min(bufferSize, nrRemaining);
if (copyBufferSize == 1) {
for (int i = 0; i < curSize; i++) {
outputStream.write(buffer[i]);
+ flushOrSync(outputStream);
Review comment:
Historically, when a file was closed or flushed, the DataNode wrote the
file contents into the operating system using the normal close() and write()
system calls.
The data may not be immediately persisted to the underlying physical
storage; it may still reside in memory in the operating system's file cache.
This creates a window of vulnerability: if multiple DataNode machines fail
simultaneously (e.g. loss of power to a rack), previously written data may
be lost. To combat this problem, HDFS (as of Hadoop 2.0) introduced new
APIs that guarantee written data is immediately persisted to the underlying
physical storage. These APIs are described below.
> **hflush()**: a method on **FSDataOutputStream** that flushes all
outstanding data (i.e. the current unfinished packet) from the client into
the OS buffers on all DataNode replicas.
> **hsync()**: a method on **FSDataOutputStream** that flushes the data to
the DataNodes, like hflush(), but also forces the data to the underlying
physical storage via fsync (or equivalent).
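The flush-vs-sync distinction above can be illustrated with the plain JDK, without a Hadoop cluster. A minimal sketch (the class name and file handling are illustrative, not part of the PR): `FileChannel.force()` plays the role of hsync-style persistence, forcing bytes past the OS page cache to the device, whereas an ordinary write only hands the data to the OS, analogous to hflush.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public final class FlushOrSyncDemo {
  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("flush-demo", ".bin");
    try (RandomAccessFile raf = new RandomAccessFile(tmp.toFile(), "rw");
         FileChannel channel = raf.getChannel()) {
      // write() hands the bytes to the OS; like hflush(), they may
      // still sit in the operating system's file cache.
      channel.write(ByteBuffer.wrap("hello".getBytes()));
      // force(false) is the hsync() analogue: it asks the OS to push
      // the file data (not metadata) down to the physical storage.
      channel.force(false);
      System.out.println("persisted " + channel.size() + " bytes");
    }
    Files.delete(tmp);
  }
}
```

Even with `force()`, durability ultimately depends on the storage stack honoring fsync, which is the same caveat that applies to hsync() on the DataNodes.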
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]