andrewpalumbo commented on a change in pull request #394: URL: https://github.com/apache/mahout/pull/394#discussion_r414073687
##########
File path: community/community-engines/flink-batch/src/main/scala/org/apache/mahout/flinkbindings/drm/CheckpointedFlinkDrm.scala
##########
@@ -62,17 +62,16 @@ class CheckpointedFlinkDrm[K: ClassTag:TypeInformation](val ds: DrmDataSet[K],
     // this is extra I/O for each cache call. this needs to be moved somewhere where it is called
     // only once. Possibly FlinkDistributedEngine.
-    GlobalConfiguration.loadConfiguration(mahoutHome + "/conf/flink-config.yaml")
+    GlobalConfiguration.loadConfiguration(mahoutHome + "/conf/")

Review comment:
Getting a failure here with `mvn clean package install`; the trace is at the end of this comment.

We've moved the `Flink` module to community so that any member of the community can make use of it, see what we ran into, etc. Our main issue was Flink's greedy execution of code rewritten by the optimizer, which overflowed memory mid-expression when that should not have been necessary. E.g. `X:= AX.tX * B.tB - c`, when put through the optimizer, will come out as `A:= SelfSq(X)AB.tB - c`. Something similar was happening, I can't remember exactly, but the second `B.tB` would be expanded into memory. I may be way off.

It was the same for iterative algorithms, though: the DAG would be recomputed at each iteration for certain expressions that were meant to be checkpointed and cached. These checkpoints would evaluate eagerly and crash iterative algorithms like DSSVD and DSPCA.

I'm unsure of the current state of Flink Batch, but it was our simple, mutual understanding that its lack of a `cache` and of `checkpointing` kept the Mahout DSL from ever being used to its fullest, if at all.

```
org.apache.mahout.flinkbindings.DrmLikeOpsSuite *** ABORTED ***
  org.apache.flink.configuration.IllegalConfigurationException: The Flink config file '/Users/colleenpalumbo/sandbox/mahout/conf/flink-conf.yaml' (/Users/colleenpalumbo/sandbox/mahout/conf) does not exist.
  at org.apache.flink.configuration.GlobalConfiguration.loadConfiguration(GlobalConfiguration.java:124)
  at org.apache.flink.configuration.GlobalConfiguration.loadConfiguration(GlobalCon
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
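A note on the trace: the exception names `flink-conf.yaml` inside the `conf` directory, while the line removed by this diff pointed at `flink-config.yaml`. That suggests, though I have not confirmed it against the Flink version in use, that the directory form of `loadConfiguration` expects a file with that exact name. A minimal sketch of a pre-flight check follows, using only `java.nio.file`; `FlinkConfCheck` and `resolve` are hypothetical names for illustration and are not part of Mahout or Flink.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not part of Mahout or Flink.
object FlinkConfCheck {
  // File name taken from the exception message in the trace above.
  val ExpectedFileName = "flink-conf.yaml"

  /** Right(path) if confDir contains a regular file named flink-conf.yaml,
    * otherwise Left with a short explanation. */
  def resolve(confDir: String): Either[String, Path] = {
    val candidate = Paths.get(confDir).resolve(ExpectedFileName)
    if (Files.isRegularFile(candidate)) Right(candidate)
    else Left(s"expected $candidate to exist")
  }
}
```

Calling something like `FlinkConfCheck.resolve(mahoutHome + "/conf/")` before handing the directory to Flink would surface a misnamed config file with a clearer message than the aborted suite does.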