[GitHub] [orc] guiyanakuang commented on a change in pull request #941: ORC-1030: Java Tools may have missed OrcFile.MAGIC during file recovery

GitBox Mon, 25 Oct 2021 23:25:17 -0700


guiyanakuang commented on a change in pull request #941:
URL: https://github.com/apache/orc/pull/941#discussion_r736188522




##########
File path: java/tools/src/test/org/apache/orc/tools/TestFileDump.java
##########
@@ -709,6 +714,79 @@ public void testIndexOf() {
     byte[] bytes = ("OO" + OrcFile.MAGIC).getBytes(StandardCharsets.UTF_8);
     byte[] pattern = OrcFile.MAGIC.getBytes(StandardCharsets.UTF_8);
 
-    assertEquals(FileDump.indexOf(bytes, pattern, 1), 2);
+    assertEquals(2, FileDump.indexOf(bytes, pattern, 1));
+  }
+
+  @Test
+  public void testRecover() throws Exception {
+    TypeDescription schema = getMyRecordType();
+    Writer writer = OrcFile.createWriter(testFilePath,
+        OrcFile.writerOptions(conf)
+            .fileSystem(fs)
+            .setSchema(schema));
+    Random r1 = new Random(1);
+    String[] words = new String[]{"It", "was", "the", "best", "of", "times,",
+        "it", "was", "the", "worst", "of", "times,", "it", "was", "the", "age",
+        "of", "wisdom,", "it", "was", "the", "age", "of", "foolishness,", "it",
+        "was", "the", "epoch", "of", "belief,", "it", "was", "the", "epoch",
+        "of", "incredulity,", "it", "was", "the", "season", "of", "Light,",
+        "it", "was", "the", "season", "of", "Darkness,", "it", "was", "the",
+        "spring", "of", "hope,", "it", "was", "the", "winter", "of", "despair,",
+        "we", "had", "everything", "before", "us,", "we", "had", "nothing",
+        "before", "us,", "we", "were", "all", "going", "direct", "to",
+        "Heaven,", "we", "were", "all", "going", "direct", "the", "other",
+        "way"};
+    VectorizedRowBatch batch = schema.createRowBatch(1000);
+    for(int i=0; i < 21000; ++i) {
+      appendMyRecord(batch, r1.nextInt(), r1.nextLong(),
+          words[r1.nextInt(words.length)]);
+      if (batch.size == batch.getMaxSize()) {
+        writer.addRowBatch(batch);
+        batch.reset();
+      }
+    }
+    if (batch.size > 0) {
+      writer.addRowBatch(batch);
+    }
+    writer.close();
+
+    long fileSize = fs.getFileStatus(testFilePath).getLen();
+    byte[] bytes = new byte[1024];
+    Path corruptedFilePath = new Path("corruptedFile.orc");
+
+    try {
+      FSDataInputStream fdis = fs.open(testFilePath);
+      FileStatus fileStatus = fs.getFileStatus(testFilePath);
+      FSDataOutputStream fdos = fs.create(corruptedFilePath, true,
+          conf.getInt("io.file.buffer.size", 4096),
+          fileStatus.getReplication(),
+          fileStatus.getBlockSize());
+      long remaining = fileSize;
+
+      while (remaining > 0) {
+        int toRead = (int) Math.min(DEFAULT_BLOCK_SIZE, remaining);
+        byte[] data = new byte[toRead];
+        long startPos = fileSize - remaining;
+        fdis.readFully(startPos, data, 0, toRead);
+        fdos.write(data);
+        remaining = remaining - toRead;
+      }
+      fdos.write(bytes);
+      fdos.write(OrcFile.MAGIC.getBytes(StandardCharsets.UTF_8));
+      fdos.write(bytes);
+      fdis.close();
+      fdos.close();

Review comment:
   I tried `fs.append(testFilePath);`, but it looks like the local filesystem
   doesn't support append. Or is there some configuration I haven't set that
   would make it work?
   ```
   java.io.IOException: Not supported
        at org.apache.hadoop.fs.ChecksumFileSystem.append(ChecksumFileSystem.java:358)
        at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1333)
        at org.apache.orc.tools.TestFileDump.testRecover(TestFileDump.java:756)
   ```
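
For a test that only ever runs against local disk, one possible workaround is to skip the Hadoop `FileSystem` API for the append step and write the trailing bytes with `java.nio`. The following is only a minimal sketch under that assumption: `AppendMagicSketch` and `appendPaddedMagic` are hypothetical names that are not part of ORC or Hadoop, and the literal `"ORC"` stands in for `OrcFile.MAGIC`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendMagicSketch {

  // Hypothetical helper (not an ORC or Hadoop API): appends `padding` zero
  // bytes, then the magic string, then `padding` zero bytes to an existing file.
  static void appendPaddedMagic(Path file, String magic, int padding)
      throws IOException {
    byte[] magicBytes = magic.getBytes(StandardCharsets.UTF_8);
    // A new byte[] is zero-filled, so only the magic needs to be copied in.
    byte[] tail = new byte[2 * padding + magicBytes.length];
    System.arraycopy(magicBytes, 0, tail, padding, magicBytes.length);
    // APPEND writes at the current end of the file instead of truncating it.
    Files.write(file, tail, StandardOpenOption.APPEND);
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("corrupted", ".orc");
    try {
      // Stand-in for a real ORC file body.
      Files.write(file, "ORC-body".getBytes(StandardCharsets.UTF_8));
      long before = Files.size(file);
      appendPaddedMagic(file, "ORC", 1024); // "ORC" assumed equal to OrcFile.MAGIC
      System.out.println("appended bytes: " + (Files.size(file) - before));
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

This ties the test to the local filesystem, so copying the file through `fs.open` and `fs.create` as in the diff above stays the more portable choice. Another route that may be worth trying (untested here) is appending through the raw local filesystem underneath the checksum wrapper, since the trace shows the exception coming from `ChecksumFileSystem.append`.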




