abdullah alamoudi has posted comments on this change. Change subject: Continue Cleaninig Up File References and Splits ......................................................................
Patch Set 4: (60 comments) https://asterix-gerrit.ics.uci.edu/#/c/1359/4/asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/stream/factory/LocalFSInputStreamFactory.java File asterixdb/asterix-external-data/src/main/java/org/apache/asterix/external/input/stream/factory/LocalFSInputStreamFactory.java: Line 53: protected Pair<String, String>[] inputFileSplits; > Seems that a FileSplit would not be too bad here, as it provides a better d Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/algebricks/algebricks-tests/src/test/java/org/apache/hyracks/algebricks/tests/pushruntime/PushRuntimeTest.java File hyracks-fullstack/algebricks/algebricks-tests/src/test/java/org/apache/hyracks/algebricks/tests/pushruntime/PushRuntimeTest.java: Line 627: public FileSplit createTempFile(NodeControllerService ncs) throws IOException { > Not sure I understand the need for temp files here. This thing was used to create the files where the output will go and get references to them for comparing with expected output. I just replaced the mechanism to create the files to ensure those files go to the IO device of the output node. I am going to rename the method to create them... https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/FileSplit.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/FileSplit.java: Line 27: * A node and a managed path inside an IO device. Used to identify a file/dir across the cluster. > The comment doesn't seem to be correct, a subclass supports unmanaged files Done Line 40: public FileSplit(String node, String path) { > Should be private. Done Line 46: public String toString() { > Would be sufficient to implement this in the subclasses. Done Line 64: public File getFile(IIOManager ioManager) throws HyracksDataException { > Would be sufficient to implement this in the subclasses. Done Line 75: public FileReference getFileRef(IIOManager ioManager) throws HyracksDataException { > Would be sufficient to implement this in the subclasses. Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/IIOManager.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/IIOManager.java: Line 66: * @param relativePath > s/relativePath/path/ Done Line 72: * A file reference based on the mounting point of {@code ioDeviceId} and the passed {@code relativePath} > s/relativePath/path/ Done Line 78: public FileReference getFileRef(String path) throws HyracksDataException; > Could we call this "resolvePath" to make it clear that some work is done he Done Line 87: public FileReference absToFileRef(String path) throws HyracksDataException; > The existence of this method seems to break the IOManager abstraction. I actually marked it as deprecated. It will stay until we have no more callers for it. as for the resolve, I think it should stay for support of the use case where the logic of device assignment is done in the NC side. https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/MappedFileSplit.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/MappedFileSplit.java: Line 28: public class MappedFileSplit extends FileSplit { > Having a "MappedFileSplit" and an "UnmanagedFileSplit" seems inconsistent. Why? This MappedFileSplit provides an additional information which is the IO device number. It is used to enable CC to decide the IO device. https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/UnmanagedFileSplit.java File hyracks-fullstack/hyracks/hyracks-api/src/main/java/org/apache/hyracks/api/io/UnmanagedFileSplit.java: Line 31: public UnmanagedFileSplit(String node, String path) { > empty line .. Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-control/hyracks-control-nc/src/main/java/org/apache/hyracks/control/nc/io/IOManager.java File hyracks-fullstack/hyracks/hyracks-control/hyracks-control-nc/src/main/java/org/apache/hyracks/control/nc/io/IOManager.java: Line 216: public synchronized FileReference getFile(String path, int ioDevice) throws HyracksDataException { > s/getFile/getFileReference/ Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/file/AbstractFileWriteOperatorDescriptor.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/file/AbstractFileWriteOperatorDescriptor.java: Line 35: private final IOManager ioManager; > s/IOManager/IIOManager/ Done Line 39: FileWriteOperator(IOManager ioManager, int index) { > s/IOManager/IIOManager/ Done Line 92: protected abstract IRecordWriter createRecordWriter(IOManager ioManager, FileSplit fileSplit, int index) > s/IOManager/IIOManager/ Done Line 98: return new DeserializedOperatorNodePushable(ctx, new FileWriteOperator((IOManager) ctx.getIOManager(), > No cast required. Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/file/LineFileWriteOperatorDescriptor.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/file/LineFileWriteOperatorDescriptor.java: Line 41: return new FileOutputStream((File) args[0]); > indentation. Done Line 42: } catch (Exception e) { > s/Exception/FileNotFoundException|SecurityException/ Done Line 66: protected IRecordWriter createRecordWriter(IOManager ioManager, FileSplit fileSplit, int index) > s/IOManager/IIOManager/ Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/file/RecordWriter.java File hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/file/RecordWriter.java: Line 71: if (columns == null) { > Indentation. Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/btree/AbstractBTreeOperatorTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/btree/AbstractBTreeOperatorTest.java: Line 269: "data/tpch0.002/orders-part2.tbl") }; > single line Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/AbstractRTreeOperatorTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/AbstractRTreeOperatorTest.java: Line 217: "data/orders-with-locations-part1.txt") }; > Single line Done Line 328: "data/orders-with-locations-part2.txt") }; > Single line Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/RTreeSecondaryIndexInsertOperatorTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/RTreeSecondaryIndexInsertOperatorTest.java: Line 116: createTempFile(nc1) }); > Single line Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/RTreeSecondaryIndexScanOperatorTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/RTreeSecondaryIndexScanOperatorTest.java: Line 99: createTempFile(nc1) }); > single line Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/RTreeSecondaryIndexSearchOperatorTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/RTreeSecondaryIndexSearchOperatorTest.java: Line 112: createTempFile(nc1) }); > single line Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/RTreeSecondaryIndexStatsOperatorTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/am/rtree/RTreeSecondaryIndexStatsOperatorTest.java: Line 64: createTempFile(nc1) }); > single line Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/AbstractIntegrationTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/AbstractIntegrationTest.java: Line 234: String fileName = "f" + aInteger.getAndIncrement() + ".tmp"; > Is it necessary to create temp files in the IOManager? Not sure if it is necessary but it is convenient. It is used as an output destination where we can compare with the expected output. https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/ReplicateOperatorTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/ReplicateOperatorTest.java: Line 111: expectedResultsFileNames[i] = "data" + File.separator + "device0" + File.separator + inputFileName; > Seems that we don't need File.separator here if we use '/' in inputFileName Agree and I noticed it. The problem is that it is everywhere and hence decided to leave it for another change. Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/ScanPrintTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/ScanPrintTest.java: Line 87: "data/tpch0.001/orders.tbl") }; > single line Done Line 126: "data/tpch0.001/orders.tbl") }; > single line Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/TPCHCustomerOptimizedHybridHashJoinTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/TPCHCustomerOptimizedHybridHashJoinTest.java: Line 90: new FileSplit(NC1_ID, path) }); > single line? Done Line 100: "data/tpch0.001/customer4.tbl") }; > single line Done Line 148: "data/tpch0.001/customer3.tbl") }; > single line Done Line 181: IConnectorDescriptor ordJoinConn = new MToNBroadcastConnectorDescriptor(spec); > Why is the type of the connector changed here? because the input operator is running on a different NC and hence a one to one connector doesn't work. The reason is that it worked before is that it was running on NC1 while accessing a file in NC2 because of the use of the absolute path Line 198: "data/tpch0.001/customer3.tbl") }; > single line Done Line 202: "data/tpch0.001/orders1.tbl") }; > single line Done Line 232: IConnectorDescriptor ordJoinConn = new MToNBroadcastConnectorDescriptor(spec); > Why is the type of the connector changed here? Same as above https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/TPCHCustomerOrderHashJoinTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/TPCHCustomerOrderHashJoinTest.java: Line 74: "data/tpch0.001/customer.tbl") }; > single line Done Line 83: "data/tpch0.001/orders.tbl") }; > single line Done Line 135: IConnectorDescriptor ordJoinConn = new MToNBroadcastConnectorDescriptor(spec); > Why is the type of the connector changed here? same as the other changed connectors Line 153: "data/tpch0.001/customer.tbl") }; > single line Done Line 162: "data/tpch0.001/orders.tbl") }; > single line Done Line 218: IConnectorDescriptor ordJoinConn = new MToNBroadcastConnectorDescriptor(spec); > Why is the type of the connector changed here? same as the other connectors Line 236: "data/tpch0.001/customer.tbl") }; > single line Done Line 245: "data/tpch0.001/orders.tbl") }; > single line Done Line 301: IConnectorDescriptor ordJoinConn = new MToNBroadcastConnectorDescriptor(spec); > Why is the type of the connector changed here? same as the other connectors. It only worked before because of the use of the absolute path and so source and destination were running on the same node. it is not the case after fixing this Line 319: "data/tpch0.001/customer.tbl") }; > single line Done Line 328: "data/tpch0.001/orders.tbl") }; > single line Done Line 385: IConnectorDescriptor ordJoinConn = new MToNBroadcastConnectorDescriptor(spec); > Why is the type of the connector changed here? Done Line 411: "data/tpch0.001/orders.tbl") }; > single line Done Line 472: IConnectorDescriptor ordJoinConn = new MToNBroadcastConnectorDescriptor(spec); > Why is the type of the connector changed here? Done Line 490: "data/tpch0.001/customer.tbl") }; > single line Done Line 499: "data/tpch0.001/orders.tbl") }; > single line Done https://asterix-gerrit.ics.uci.edu/#/c/1359/4/hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/TPCHCustomerOrderNestedLoopJoinTest.java File hyracks-fullstack/hyracks/hyracks-examples/hyracks-integration-tests/src/test/java/org/apache/hyracks/tests/integration/TPCHCustomerOrderNestedLoopJoinTest.java: Line 128: "data/tpch0.001/customer.tbl") }; > single line Done Line 137: "data/tpch0.001/orders.tbl") }; > single line Done Line 185: IConnectorDescriptor ordJoinConn = new MToNBroadcastConnectorDescriptor(spec); > Why is the type of the connector changed here? As others :) Line 342: IConnectorDescriptor custJoinConn = new OneToOneConnectorDescriptor(spec); > Why is the type of the connector changed here? As others -- To view, visit https://asterix-gerrit.ics.uci.edu/1359 To unsubscribe, visit https://asterix-gerrit.ics.uci.edu/settings Gerrit-MessageType: comment Gerrit-Change-Id: I166af8f9b3a2257f94d7b05db94888fb7cb4c79e Gerrit-PatchSet: 4 Gerrit-Project: asterixdb Gerrit-Branch: master Gerrit-Owner: abdullah alamoudi <bamou...@gmail.com> Gerrit-Reviewer: Jenkins <jenk...@fulliautomatix.ics.uci.edu> Gerrit-Reviewer: Murtadha Hubail <hubail...@gmail.com> Gerrit-Reviewer: Till Westmann <ti...@apache.org> Gerrit-Reviewer: abdullah alamoudi <bamou...@gmail.com> Gerrit-HasComments: Yes