Has anyone implemented a way to track the performance of a data model? We currently have an algorithm that does record linkage and emits statistics on matches, non-matches, and partial matches, with reason codes explaining why a record did not match accurately. That way, we will know if something goes wrong down the line. All of this is written to CSV files in directories partitioned by datetime, with a Hive table on top, so we can run analytical queries and even do charting if need be. The whole process is very manual, so I was wondering if there is a package, software, built-in module, etc. that would do this automatically. Since we are using CDH, it would be great if these graphs could be integrated into Cloudera Manager too.
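To make the layout concrete, here is a rough sketch of the kind of output described above. It is not our actual code: the outcome labels, the reason codes, the `stats.csv` file name, and the `dt=YYYY-MM-DD-HH` directory naming are all made up for illustration. It counts linkage outcomes by reason code and writes them as a CSV file under a datetime-partitioned directory.

```python
import csv
import os
import tempfile
from collections import Counter
from datetime import datetime


def summarize(results):
    """Count (outcome, reason_code) pairs; a missing reason code becomes ''."""
    return Counter((r["outcome"], r.get("reason_code") or "") for r in results)


def write_partition(base_dir, run_time, counts):
    """Write counts to <base_dir>/dt=<YYYY-MM-DD-HH>/stats.csv; return the path."""
    # Hypothetical partition naming in the key=value style Hive uses.
    part_dir = os.path.join(base_dir, "dt=" + run_time.strftime("%Y-%m-%d-%H"))
    os.makedirs(part_dir, exist_ok=True)
    path = os.path.join(part_dir, "stats.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # One row per (outcome, reason_code), sorted for stable output.
        for (outcome, reason), n in sorted(counts.items()):
            writer.writerow([outcome, reason, n])
    return path


# Hypothetical linkage results, invented for this example.
sample_results = [
    {"outcome": "match"},
    {"outcome": "match"},
    {"outcome": "partial", "reason_code": "NAME_MISMATCH"},
    {"outcome": "non_match", "reason_code": "NO_CANDIDATE"},
]

if __name__ == "__main__":
    counts = summarize(sample_results)
    # The run time is an arbitrary example value.
    print(write_partition(tempfile.mkdtemp(), datetime(2016, 1, 1, 12), counts))
```

A Hive external table partitioned by `dt` could sit on top of a layout like this, although each new partition still has to be registered with the metastore before queries can see it.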
Any advice is welcome.

Thanks,
Ben