muehlburger commented on a change in pull request #971:
URL: https://github.com/apache/systemml/pull/971#discussion_r445350487



##########
File path: src/main/java/org/apache/sysds/runtime/io/FrameWriterProto.java
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.sysds.runtime.io;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.Arrays;
+import java.util.Iterator;
+
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.sysds.conf.ConfigurationManager;
+import org.apache.sysds.protobuf.SysdsProtos;
+import org.apache.sysds.runtime.DMLRuntimeException;
+import org.apache.sysds.runtime.matrix.data.FrameBlock;
+import org.apache.sysds.runtime.util.HDFSTool;
+
+public class FrameWriterProto extends FrameWriter {
+       @Override
+       public void writeFrameToHDFS(FrameBlock src, String fname, long rlen, long clen)
+               throws IOException, DMLRuntimeException {
+               // prepare file access
+               JobConf job = new JobConf(ConfigurationManager.getCachedJobConf());
+               Path path = new Path(fname);
+
+               // if the file already exists on HDFS, remove it.
+               HDFSTool.deleteFileIfExistOnHDFS(fname);
+
+               // validity check frame dimensions
+               if(src.getNumRows() != rlen || src.getNumColumns() != clen) {
+                       throw new IOException("Frame dimensions mismatch with metadata: " + src.getNumRows() + "x"
+                               + src.getNumColumns() + " vs " + rlen + "x" + clen + ".");
+               }
+
+               writeProtoFrameToHDFS(path, job, src, rlen, clen);
+       }
+
+       protected void writeProtoFrameToHDFS(Path path, JobConf jobConf, FrameBlock src, long rlen, long clen)
+               throws IOException {
+               FileSystem fileSystem = IOUtilFunctions.getFileSystem(path, jobConf);
+               writeProtoFrameToFile(path, fileSystem, src, 0, (int) rlen);
+               IOUtilFunctions.deleteCrcFilesFromLocalFileSystem(fileSystem, path);
+       }
+
+       protected void writeProtoFrameToFile(Path path, FileSystem fileSystem, FrameBlock src, int lowerRowBound,
+               int upperRowBound) throws IOException {
+               // what about > 2GB Protobuf Messages?

Review comment:
       "Protobuf has a hard limit of 2GB, because many implementations use 32-bit signed arithmetic." (see [StackOverflow question](https://stackoverflow.com/questions/34128872/google-protobuf-maximum-size#:~:text=Protobuf%20has%20a%20hard%20limit,manually%20if%20you%20need%20to.)).
   
   I'm not sure whether this applies to us, since we use Java rather than C++. That said, the Java implementation also encodes serialized message sizes as 32-bit signed ints, so the limit most likely holds there too.
   Since every frame row is defined as a message in [Frame.proto](https://github.com/apache/systemml/blob/0daa501240b14d8eb6cff07a6b7b38017dec0414/src/main/resources/protobuf/Frame.proto#L29), this would mean the 2GB limit applies to each individual row of a frame, not to the frame as a whole.
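   If we want to fail fast instead of producing a corrupt file, one option is a guard before serializing each row. The sketch below is a hypothetical, standalone illustration of such a check (the class name `RowSizeGuard`, the method, and the size estimate passed in are all my own, not part of the PR); it only demonstrates the 2GB bound, which follows from message sizes being 32-bit signed:

```java
public class RowSizeGuard {
    // Many protobuf implementations (including Java's) track serialized
    // message sizes as 32-bit signed ints, so a single message must stay
    // below 2^31 - 1 bytes (~2 GiB).
    static final long PROTOBUF_HARD_LIMIT = Integer.MAX_VALUE;

    // Hypothetical guard: given an estimated serialized size for one frame
    // row, reject it before attempting to serialize an oversized message.
    static void checkRowSize(long estimatedBytes, int rowIndex) {
        if (estimatedBytes > PROTOBUF_HARD_LIMIT) {
            throw new IllegalArgumentException("Row " + rowIndex
                + " would serialize to " + estimatedBytes
                + " bytes, exceeding the protobuf 2GB per-message limit.");
        }
    }

    public static void main(String[] args) {
        checkRowSize(1_000_000L, 0); // a 1 MB row passes silently
        try {
            checkRowSize(3L * 1024 * 1024 * 1024, 1); // a 3 GiB row is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

   Whether such a check belongs in `writeProtoFrameToFile` or whether oversized rows should instead be split into multiple messages is a design question for the PR.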




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
