[GitHub] jcmcote commented on a change in pull request #1500: DRILL-6820: Msgpack format reader

GitBox Tue, 06 Nov 2018 15:59:45 -0800

jcmcote commented on a change in pull request #1500: DRILL-6820: Msgpack format 
reader
URL: https://github.com/apache/drill/pull/1500#discussion_r231339350


 ##########
 File path: 
contrib/format-msgpack/src/main/java/org/apache/drill/exec/store/msgpack/MsgpackSchema.java
 ##########
 @@ -0,0 +1,114 @@
+package org.apache.drill.exec.store.msgpack;
+
+import java.io.FileNotFoundException;
+import java.io.IOException;
+
+import org.apache.commons.io.IOUtils;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.exception.SchemaChangeRuntimeException;
+import org.apache.drill.exec.proto.UserBitShared.SerializedField;
+import org.apache.drill.exec.proto.UserBitShared.SerializedField.Builder;
+import org.apache.drill.exec.record.MaterializedField;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.security.AccessControlException;
+
+import com.google.protobuf.TextFormat;
+import com.google.protobuf.TextFormat.ParseException;
+
+public class MsgpackSchema {
+  public static final String SCHEMA_FILE_NAME = ".schema.proto";
+
+  @SuppressWarnings("unused")
+  private static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(MsgpackSchema.class);
+
+  private DrillFileSystem fileSystem;
+
+  public MsgpackSchema(DrillFileSystem fileSystem) {
+    this.fileSystem = fileSystem;
+  }
+
+  public MaterializedField load(Path schemaLocation) throws 
AccessControlException, FileNotFoundException, IOException {
+    MaterializedField previousMapField = null;
+    if (schemaLocation != null && fileSystem.exists(schemaLocation)) {
+      try (FSDataInputStream in = fileSystem.open(schemaLocation)) {
+        String schemaData = IOUtils.toString(in);
+        Builder newBuilder = SerializedField.newBuilder();
+        try {
+          TextFormat.merge(schemaData, newBuilder);
+        } catch (ParseException e) {
+          throw new DrillRuntimeException("Failed to read schema file: " + 
schemaLocation, e);
+        }
+        SerializedField read = newBuilder.build();
+        previousMapField = MaterializedField.create(read);
+      }
+    }
+    return previousMapField;
+  }
+
+  public void save(MaterializedField mapField, Path schemaLocation) throws 
IOException {
 
 Review comment:
   No it does not handle multi-thread cases. Right now I turn on learning mode 
and submit queries of the form
   select * from dfs.root.`dir/aSingleFile.mp`
   I can scan more files but only one at a time. I then copy that schema file 
in all directories container that type of data. That's fine for my purpose 
right now. It's a bit manual and not ready for prime time as it is.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

[GitHub] jcmcote commented on a change in pull request #1500: DRILL-6820: Msgpack format reader

Reply via email to