n3nash commented on a change in pull request #2453: URL: https://github.com/apache/hudi/pull/2453#discussion_r559015240
##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/SerializableSchema.java
##########
@@ -62,12 +62,17 @@ private void readObject(ObjectInputStream in) throws IOException {
   // create a public write method for unit test
   public void writeObjectTo(ObjectOutputStream out) throws IOException {
-    out.writeUTF(schema.toString());
+    // Note: writeUTF cannot support string length > 64K. So use writeObject which has small overhead (relatively).
+    out.writeObject(schema.toString());
   }

   // create a public read method for unit test
   public void readObjectFrom(ObjectInputStream in) throws IOException {
-    schema = new Schema.Parser().parse(in.readUTF());
+    try {
+      schema = new Schema.Parser().parse(in.readObject().toString());

Review comment:
   Can we do the following:

   int length = in.readInt();
   byte[] value = new byte[length];
   in.readFully(value);
   String schemaStr = new String(value, "UTF-8");

   This will ensure UTF-8 encoding/decoding.

##########
File path: hudi-common/src/main/java/org/apache/hudi/common/config/SerializableSchema.java
##########
@@ -62,12 +62,17 @@ private void readObject(ObjectInputStream in) throws IOException {
   // create a public write method for unit test
   public void writeObjectTo(ObjectOutputStream out) throws IOException {
-    out.writeUTF(schema.toString());
+    // Note: writeUTF cannot support string length > 64K. So use writeObject which has small overhead (relatively).
+    out.writeObject(schema.toString());

Review comment:
   Instead of this, can we do:

   byte[] data = schema.toString().getBytes("UTF-8");
   out.writeInt(data.length);
   out.write(data);

   to ensure we don't lose the UTF-8 encoding? The length prefix is needed so that the read side can call readInt() before readFully().
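
For reference, here is a minimal, self-contained sketch of the length-prefixed approach suggested in the two comments above. It is not the code from this PR: the class name `LengthPrefixedStringDemo` and the helpers `writeString`, `readString`, and `roundTrip` are hypothetical, and a plain `String` stands in for `schema.toString()` so the example runs without Avro on the classpath. It round-trips a string whose UTF-8 form is well over the 65535-byte limit of `writeUTF`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Hudi.
public class LengthPrefixedStringDemo {

    // Write a 4-byte length followed by the UTF-8 bytes of the string.
    public static void writeString(ObjectOutputStream out, String s) throws IOException {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(data.length);
        out.write(data);
    }

    // Read the length prefix, then exactly that many bytes, and decode as UTF-8.
    public static String readString(ObjectInputStream in) throws IOException {
        int length = in.readInt();
        byte[] data = new byte[length];
        in.readFully(data);
        return new String(data, StandardCharsets.UTF_8);
    }

    // Serialize to an in-memory buffer and read the value back.
    public static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                writeString(out, s);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return readString(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a large schema string: 70,000 two-byte characters,
        // i.e. 140,000 bytes in UTF-8, which writeUTF would reject.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 70_000; i++) {
            sb.append('é');
        }
        String large = sb.toString();
        System.out.println(roundTrip(large).equals(large));
    }
}
```

Using `StandardCharsets.UTF_8` instead of the `"UTF-8"` string literal avoids the checked `UnsupportedEncodingException`; otherwise the read and write paths mirror the snippets in the review comments.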