andrewpalumbo commented on a change in pull request #394:
URL: https://github.com/apache/mahout/pull/394#discussion_r414073687



##########
File path: community/community-engines/flink-batch/src/main/scala/org/apache/mahout/flinkbindings/drm/CheckpointedFlinkDrm.scala
##########
@@ -62,17 +62,16 @@ class CheckpointedFlinkDrm[K: ClassTag:TypeInformation](val ds: DrmDataSet[K],
 
   // this is extra I/O for each cache call.  this needs to be moved somewhere where it is called
   // only once.  Possibly FlinkDistributedEngine.
-  GlobalConfiguration.loadConfiguration(mahoutHome + "/conf/flink-config.yaml")
+  GlobalConfiguration.loadConfiguration(mahoutHome + "/conf/")

Review comment:
   Getting a failure here with `mvn clean package install`. We've moved the `Flink` module to `community` so that any member of the community can make use of it, see what we ran into, etc. Our main issue was Flink's greedy execution of code rewritten by the optimizer, and overflowing memory mid-expression, which should not have been necessary:
   
   E.g., `X := AX.tX * B.tB - c`, when put through the optimizer, comes out as `A := SelfSq(X)AB.tB - c`.
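   To make the kind of rewrite described above concrete, here is a toy sketch of an optimizer pass that fuses a product of the form `x.t * x` into a single self-square node. The node names (`Mat`, `Transpose`, `Mul`, `SelfSq`) are hypothetical and only echo the `SelfSq` notation used in the comment; this is not Mahout's optimizer, which works on its own logical operators.

```scala
// Toy expression tree, for illustration only (NOT Mahout's optimizer).
sealed trait Expr
final case class Mat(name: String) extends Expr
final case class Transpose(e: Expr) extends Expr      // e.t
final case class Mul(a: Expr, b: Expr) extends Expr   // a * b
final case class SelfSq(e: Expr) extends Expr         // fused e.t * e

object ToyOptimizer {
  // Bottom-up rewrite: rewrite children first, then fuse `x.t * x` into SelfSq(x).
  def rewrite(e: Expr): Expr = e match {
    case Mul(a, b) =>
      (rewrite(a), rewrite(b)) match {
        case (Transpose(x), y) if x == y => SelfSq(y)
        case (ra, rb)                    => Mul(ra, rb)
      }
    case Transpose(x) => Transpose(rewrite(x))
    case SelfSq(x)    => SelfSq(rewrite(x))
    case m: Mat       => m
  }

  // Render an expression in the comment's `X.t` style.
  def show(e: Expr): String = e match {
    case Mat(n)       => n
    case Transpose(x) => show(x) + ".t"
    case Mul(a, b)    => s"(${show(a)} * ${show(b)})"
    case SelfSq(x)    => s"SelfSq(${show(x)})"
  }
}
```

   In this toy the rule only fires when `x.t` and `x` are adjacent operands of the same product, so how the expression is parenthesized decides whether a fusion happens at all.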
   
   Something similar was happening (I can't remember exactly), but the second `B.tB` would be expanded into memory. I may be way off. It was the same for iterative algorithms, though: for certain expressions that were meant to be checkpointed and cached, the DAG would be recomputed at each iteration. These checkpoints would evaluate eagerly and crash iterative algorithms like DSSVD and DSPCA.
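   The recomputation problem described above can be illustrated with a toy model of a lazily evaluated plan node: an iterative loop that reads an un-cached node re-runs its whole lineage every iteration, while a cached (checkpointed) node is evaluated once and reused. `Node` and `IterDemo` are hypothetical and only model the idea; they are not Flink's or Mahout's API.

```scala
// Toy model of a lazily evaluated plan node (hypothetical, not a real API).
final class Node[A](compute: () => A, cached: Boolean) {
  private var count = 0              // how many times `compute` actually ran
  private var memo: Option[A] = None // holds the result once, if caching is on

  def evaluations: Int = count

  def value: A = memo match {
    case Some(v) => v                // cached: reuse the materialized result
    case None =>
      count += 1                     // un-cached: re-run the lineage
      val v = compute()
      if (cached) memo = Some(v)
      v
  }
}

object IterDemo {
  /** Runs `iters` iterations that each read the same expensive node;
    * returns how many times that node was actually evaluated. */
  def run(cached: Boolean, iters: Int): Int = {
    val expensive = new Node(() => (1 to 1000).map(_.toLong).sum, cached)
    for (_ <- 1 to iters) expensive.value
    expensive.evaluations
  }
}
```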
   
   
   I'm unsure of the current state of Flink Batch, but it was our simple, mutual understanding that its lack of `cache` and `checkpointing` is what kept the Mahout DSL from ever being used to its fullest, if at all.
   
   
   
   ```
   org.apache.mahout.flinkbindings.DrmLikeOpsSuite *** ABORTED ***
     org.apache.flink.configuration.IllegalConfigurationException: The Flink config file '/Users/colleenpalumbo/sandbox/mahout/conf/flink-conf.yaml' (/Users/colleenpalumbo/sandbox/mahout/conf) does not exist.
     at org.apache.flink.configuration.GlobalConfiguration.loadConfiguration(GlobalConfiguration.java:124)
     at org.apache.flink.configuration.GlobalConfiguration.loadConfiguration(GlobalCon
   ```
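   The trace shows that Flink's `GlobalConfiguration.loadConfiguration(configDir)` treats its argument as a directory and expects a file named `flink-conf.yaml` inside it. A minimal sketch of a guard, assuming we only want to call Flink when that file is present: `FlinkConfLocator` and `resolveConfDir` are hypothetical names, not part of Mahout or Flink, and the call site in the trailing comment is an untested assumption.

```scala
import java.nio.file.{Files, Path, Paths}

// Hypothetical helper, not part of Mahout or Flink. Flink's
// GlobalConfiguration.loadConfiguration(configDir) expects a directory and
// reads "flink-conf.yaml" from it, throwing IllegalConfigurationException
// when that file is missing (as in the trace above).
object FlinkConfLocator {
  // File name Flink looks for inside the configuration directory.
  val FlinkConfFileName = "flink-conf.yaml"

  /** Returns `mahoutHome/conf` only if it actually contains flink-conf.yaml. */
  def resolveConfDir(mahoutHome: String): Option[Path] = {
    val dir = Paths.get(mahoutHome, "conf")
    if (Files.isRegularFile(dir.resolve(FlinkConfFileName))) Some(dir) else None
  }
}

// Possible call site (assumption; needs flink-core on the classpath):
//   val conf = FlinkConfLocator.resolveConfDir(mahoutHome) match {
//     case Some(dir) => GlobalConfiguration.loadConfiguration(dir.toString)
//     case None      => new Configuration() // fall back to Flink defaults
//   }
```

   Resolving this once, for example in a `lazy val` on a singleton such as the `FlinkDistributedEngine` suggested in the diff comment, would also avoid repeating the file-system check on every cache call.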




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

