[ https://issues.apache.org/jira/browse/SYSTEMML-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951780#comment-15951780 ]

Mike Dusenberry edited comment on SYSTEMML-1451 at 3/31/17 11:20 PM:
---------------------------------------------------------------------

It would also be nice to include a restructuring of the algorithm scripts in 
this effort.  For example, if we are going to work on the performance testing, 
then we should be testing the exact algorithms sitting in 
{{scripts/algorithms}}.  If we're going to performance test those algorithms in 
an automated fashion, we should restructure them so that they contain DML 
functions for {{train}}, {{predict}}, {{eval}}, {{init}} (for any 
initialization), and {{generate_dummy_data}}.  That way, all of the logic 
needed for an algorithm would be co-located and callable as functions with a 
simple, uniform API.  This would be in contrast to the current command-line 
driven way of running the scripts, with its non-uniform API and separate data 
generation.  The added benefit is, of course, that the algorithms would become 
easier to use.
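
To make that concrete, here is a rough sketch of what one restructured script could look like.  The algorithm (a closed-form ridge regression), the {{lambda}} argument, and the MSE metric in {{eval}} are purely illustrative placeholders; the real signatures would be worked out per algorithm.

{code}
# Hypothetical sketch of a restructured algorithm script built around the
# proposed function API.  The algorithm and signatures below are placeholders.

init = function(Integer D) return (Matrix[Double] w) {
  # Any initialization an algorithm needs; here, an all-zeros weight vector.
  w = matrix(0, rows=D, cols=1)
}

generate_dummy_data = function(Integer N, Integer D)
    return (Matrix[Double] X, Matrix[Double] y) {
  # Random features, plus targets from a known linear model with small noise.
  X = rand(rows=N, cols=D, min=0, max=1, pdf="uniform")
  w_true = rand(rows=D, cols=1, min=-1, max=1, pdf="uniform")
  y = X %*% w_true + rand(rows=N, cols=1, min=-0.01, max=0.01, pdf="uniform")
}

train = function(Matrix[Double] X, Matrix[Double] y, Double lambda)
    return (Matrix[Double] w) {
  # Closed-form ridge regression: w = (X'X + lambda*I)^-1 X'y.
  A = t(X) %*% X + lambda * diag(matrix(1, rows=ncol(X), cols=1))
  b = t(X) %*% y
  w = solve(A, b)
}

predict = function(Matrix[Double] X, Matrix[Double] w)
    return (Matrix[Double] y_pred) {
  y_pred = X %*% w
}

eval = function(Matrix[Double] y_pred, Matrix[Double] y) return (Double mse) {
  # Mean squared error as a stand-in for whatever metric each algorithm needs.
  mse = sum((y_pred - y)^2) / nrow(y)
}

# A performance driver (or a user) could then call every algorithm the same way:
[X, y] = generate_dummy_data(10000, 100)
w = train(X, y, 0.01)
y_pred = predict(X, w)
mse = eval(y_pred, y)
print("MSE: " + mse)
{code}

With that structure, the performance suite could simply loop over the scripts and invoke the same handful of entry points for each algorithm.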


was (Author: mwdus...@us.ibm.com):
It would also be nice to include a restructuring of the algorithm scripts in 
this as well.  For example, if we are going to work on the performance testing, 
then we should be testing the exact algorithms sitting in 
{{scripts/algorithms}}.  If we're going to performance test those algorithms in 
an automated fashion, we should restructure them so that they contain DML 
functions for {{train}}, {{predict}}, {{eval}}, {{init}} (for any 
initialization), and {{generate_dummy_data}}.  That way, the performance suite 
could run each algorithm one-by-one in a coherent manner.  The added benefit 
is, of course, that the algorithms will become easier to use.

> Automate performance testing and reporting
> ------------------------------------------
>
>                 Key: SYSTEMML-1451
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1451
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Infrastructure, Test
>            Reporter: Nakul Jindal
>              Labels: gsoc2017, mentor, performance, reporting, testing
>
> As part of a release (and in general), performance tests are run for SystemML.
> Currently, running and reporting on these performance tests is a manual 
> process. There are helper scripts, but the process is largely manual.
> The aim of this GSoC 2017 project is to automate performance testing and its 
> reporting.
> These are the tasks that this entails:
> 1. Automate running of the performance tests, including generation of test 
> data
> 2. Detect and report any errors
> 3. Record performance benchmarking information
> 4. Automatically compare this performance to the previous version to check for 
> performance regressions
> 5. Automatically compare to Spark MLlib, R?, Julia?
> 6. Prepare a report with all the information about failed jobs, performance 
> information, and performance against other comparable projects/algorithms 
> (plotted, or in plain text in CSV, PDF, or another common format)
> 7. Create scripts to automatically run this process on a cloud provider that 
> spins up machines, runs the tests, saves the reports, and spins down the 
> machines.
> 8. Create a web application to do this interactively without dropping down 
> into a shell.
> As part of this project, the student will need to know scripting (in Bash, 
> Python, etc.). The project may also involve changing error-reporting and 
> performance-reporting code in SystemML.
> Rating - Medium (for the amount of work)
> Mentor - [~nakul02] (Other co-mentors will join in)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
