Build failed in Jenkins: SystemML-DailyTest #951

2017-04-25 Thread jenkins
See 

Changes:

[Arvind Surve] [SYSTEMML-1440] Automate Release Artifact verification

--
[...truncated 27687 lines...]
17/04/25 15:54:53 INFO storage.ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/04/25 15:54:53 INFO executor.Executor: Finished task 0.0 in stage 261.0 (TID 271). 2024 bytes result sent to driver
17/04/25 15:54:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 261.0 (TID 271) in 7 ms on localhost (executor driver) (1/1)
17/04/25 15:54:53 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 261.0, whose tasks have all completed, from pool default
17/04/25 15:54:53 INFO scheduler.DAGScheduler: ResultStage 261 (collectAsList at MLContextTest.java:1325) finished in 0.008 s
17/04/25 15:54:53 INFO server.ServerConnector: Stopped ServerConnector@7b9c2b43{HTTP/1.1}{0.0.0.0:4040}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@66c51362{/stages/stage/kill,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@cd597d8{/jobs/job/kill,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@580de8cf{/api,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@58c3df4a{/,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4ec497cf{/static,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@c4b18fe{/executors/threadDump/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@48190aef{/executors/threadDump,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@1fd1380d{/executors/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@56129bfb{/executors,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@33bef51f{/environment/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@276e0e26{/environment,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@59d20b07{/storage/rdd/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@60c0a122{/storage/rdd,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@4a5bd3ff{/storage/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@59e4e4c7{/storage,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2dc28d30{/stages/pool/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@13e73acc{/stages/pool,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@76acb014{/stages/stage/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@59e5a6e1{/stages/stage,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@6c6a4e3e{/stages/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@3d3b02b7{/stages,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@74bb7ba2{/jobs/job/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@86b78d{/jobs/job,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@2a1f73f5{/jobs/json,null,UNAVAILABLE}
17/04/25 15:54:53 INFO handler.ContextHandler: Stopped o.s.j.s.ServletContextHandler@fb8ce2c{/jobs,null,UNAVAILABLE}
17/04/25 15:54:53 INFO ui.SparkUI: Stopped Spark web UI at http://169.54.146.43:4040
17/04/25 15:54:53 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/04/25 15:54:53 INFO memory.MemoryStore: MemoryStore cleared
17/04/25 15:54:53 INFO storage.BlockManager: BlockManager stopped
17/04/25 15:54:53 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
17/04/25 15:54:53 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/04/25 15:54:53 INFO spark.SparkContext: Successfully stopped SparkContext
Running org.apache.sysml.test.integration.mlcontext.MLContextTest
Tests run: 177, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.312 sec - in org.apache.sysml.test.integration.mlcontext.MLContextTest
17/04/25 15:54:53 INFO util.ShutdownHookManager: Shutdown hook called
17/04/25 15:54:53 INFO ut

MLContext scratch space cleanup

2017-04-25 Thread Matthias Boehm
A recent issue, described in SYSTEMML-1466, made me think about the cleanup
semantics of our temporary scratch_space when coming through the new
MLContext API. For our main compilation chain (hadoop/spark_submit), the
semantics are very clear: we delete the entire script-specific directory
before and after execution. For MLContext, however, it is not as easy:
temporary variables are potentially handed out as results, yet we still
need the cleanup because otherwise temporary writes fail. Checking for
existing files is not an option either, as this might even lead to
incorrect results. Could somebody please clarify the current cleanup
semantics and point me to the relevant code?
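
For reference, the batch (hadoop/spark_submit) semantics described above
boil down to something like the following sketch -- illustrative Python
only, with made-up names; this is not the actual SystemML code:

    import shutil

    def run_with_scratch_cleanup(scratch_dir, execute_script):
        # batch semantics: wipe the script-specific scratch directory
        # before execution ...
        shutil.rmtree(scratch_dir, ignore_errors=True)
        try:
            execute_script(scratch_dir)
        finally:
            # ... and again after execution
            shutil.rmtree(scratch_dir, ignore_errors=True)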

Regards,
Matthias


Re: Please reply ASAP : Regarding incubator systemml/breast_cancer project

2017-04-25 Thread dusenberrymw
Hi Aishwarya,

Unfortunately this mailing list removes all images, so I can't view your 
screenshot.  I'm assuming that it is the same issue with the missing 
SparkContext `sc` object, but please let me know if it is a different issue.  
This sounds like it could be an issue with multiple kernels installed in 
Jupyter.  When you start the notebook, can you check whether multiple kernels 
are listed in the "Kernel" -> "Change Kernel" menu?  If so, please try one of 
the other kernels to see if Jupyter is starting by default with a non-Spark 
kernel.  Also, is it possible that you have more than one instance of the 
Jupyter server running?  That is, for this scenario we start Jupyter itself 
directly via pyspark using the command sent previously, whereas usually Jupyter 
is just started with `jupyter notebook`.  In the latter case, PySpark (and 
thus `sc`) would *not* be available (unless you've set up special PySpark 
kernels separately).  In summary, can you (1) check for other kernels via the 
menus, and (2) check for other running Jupyter servers that are non-PySpark?
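
Both checks can also be run from a terminal with the standard Jupyter
commands (output will vary with your environment):

    # list the installed kernelspecs
    jupyter kernelspec list

    # list the currently running notebook servers
    jupyter notebook list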

As for the other inquiry, great question!  When training models, it's quite 
useful to track the loss and other metrics (e.g. accuracy) on *both* the 
training and validation sets.  The reasoning is that it allows for a more 
holistic view of the overall learning process, such as evaluating whether any 
overfitting or underfitting is occurring.  For example, say that you train a 
model and achieve an accuracy of 80% on the validation set.  Is this good?  Is 
this the best that can be done?  Without also tracking performance on the 
training set, it can be difficult to answer these questions.  Say that you then 
measure the performance on the training set and find that the model achieves 
100% accuracy on that data.  That would be a good indication that your model is 
overfitting the training set, and that a combination of more data, 
regularization, and a smaller model may help raise the generalization 
performance, i.e. the performance on the validation set and on future real 
examples on which you wish to make predictions.  If, on the other hand, the 
model achieved 82% accuracy on the training set, this could be a good 
indication that the model is underfitting, and that a combination of a more 
expressive model and better data could be helpful.  In summary, tracking 
performance on both the training and validation datasets can be useful for 
determining ways to improve the overall learning process.
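
To make that diagnostic concrete, here is a minimal Python sketch (the
0.1 gap and 0.9 accuracy thresholds, and the metric values, are
illustrative assumptions, not values from the notebook):

    # Compare train vs. validation accuracy for a rough fit diagnosis.
    def diagnose(train_acc, val_acc, gap=0.1):
        if train_acc - val_acc > gap:
            # large train/val gap -> memorizing the training set
            return "likely overfitting: more data, regularization, or a smaller model"
        if train_acc < 0.9:
            # low accuracy even on data the model has already seen
            return "likely underfitting: a more expressive model or better data"
        return "train/val performance roughly in balance"

    print(diagnose(train_acc=1.00, val_acc=0.80))  # the overfitting example above
    print(diagnose(train_acc=0.82, val_acc=0.80))  # the underfitting example above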


- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 25, 2017, at 8:47 AM, Aishwarya Chaurasia  
> wrote:
> 
> We had another query, sir. We read the entire MachineLearning.ipynb code.
> In it, the training samples and the validation samples have both been
> evaluated separately and their respective losses and accuracies obtained.
> Why are the training samples being evaluated again if they were used to
> train the model in the first place? Shouldn't only the validation data
> frames be evaluated to find out the loss and accuracy?
> 
> Thank you
> 
> On 25-Apr-2017 4:00 PM, "Aishwarya Chaurasia" 
> wrote:
> 
>> Hello sir,
>> 
>> The NameError is occurring again, sir. Why does it keep resurfacing?
>> 
>> Attaching the screenshot of the error.
>> 
>>> On 25-Apr-2017 2:50 AM,  wrote:
>>> 
>>> Hi Aishwarya,
>>> 
>>> For the error message, that just means that the SystemML jar isn't being
>>> found.  Can you add a `--driver-class-path 
>>> $SYSTEMML_HOME/target/SystemML.jar`
>>> to the invocation of Jupyter?  I.e. `PYSPARK_PYTHON=python3
>>> PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook"
>>> pyspark  --jars $SYSTEMML_HOME/target/SystemML.jar --driver-class-path
>>> $SYSTEMML_HOME/target/SystemML.jar`. There was a PySpark bug that was
>>> supposed to have been fixed in Spark 2.x, but it's possible that it is
>>> still an issue.
>>> 
>>> As for the output, the notebook will create SystemML `Matrix` objects for
>>> all of the weights and biases of the trained models.  To save, please
>>> convert each one to a DataFrame, e.g. `Wc1.toDF()`, repeating this for
>>> each matrix, and then simply save the DataFrames.  This could be done
>>> all at once like this for a SystemML Matrix object `Wc1`:
>>> `Wc1.toDF().write.save("path/to/save/Wc1.parquet", format="parquet")`.
>>> Just repeat for each matrix returned by the "Train" code for the
>>> algorithms.  At that point, you will have a set of saved DataFrames
>>> representing a trained SystemML model, and these can be used in downstream
>>> classification tasks in a similar manner to the "Eval" sections.
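
To spell out the save-and-reload flow above as runnable code -- a minimal
sketch for the pyspark shell (Spark 2.x, where the `spark` session is
predefined; the matrix name `Wc1` and the path are placeholders):

    # convert a trained SystemML Matrix to a Spark DataFrame and save it
    Wc1.toDF().write.save("path/to/save/Wc1.parquet", format="parquet")

    # later, load the saved DataFrame back for downstream classification
    Wc1_df = spark.read.load("path/to/save/Wc1.parquet", format="parquet")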
>>> 
>>> -Mike
>>> 
>>> --
>>> 
>>> Mike Dusenberry
>>> GitHub: github.com/dusenberrymw
>>> LinkedIn: linkedin.com/in/mikedusenberry
>>> 
>>> Sent from my iPhone.
>>> 
>>> 
>>>> On Apr 24, 2017, at 3:07 AM, Aishwarya Chaurasia <
>>>> aishwarya2...@gmail.com> wrote:
>>>>
>>>> Furthermore:
>>>> What 

Re: Please reply ASAP : Regarding incubator systemml/breast_cancer project

2017-04-25 Thread Aishwarya Chaurasia
We had another query, sir. We read the entire MachineLearning.ipynb code.
In it, the training samples and the validation samples have both been
evaluated separately and their respective losses and accuracies obtained.
Why are the training samples being evaluated again if they were used to
train the model in the first place? Shouldn't only the validation data
frames be evaluated to find out the loss and accuracy?

Thank you

On 25-Apr-2017 4:00 PM, "Aishwarya Chaurasia" 
wrote:

> Hello sir,
>
> The NameError is occurring again, sir. Why does it keep resurfacing?
>
> Attaching the screenshot of the error.
>
> On 25-Apr-2017 2:50 AM,  wrote:
>
>> Hi Aishwarya,
>>
>> For the error message, that just means that the SystemML jar isn't being
>> found.  Can you add a `--driver-class-path 
>> $SYSTEMML_HOME/target/SystemML.jar`
>> to the invocation of Jupyter?  I.e. `PYSPARK_PYTHON=python3
>> PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook"
>> pyspark  --jars $SYSTEMML_HOME/target/SystemML.jar --driver-class-path
>> $SYSTEMML_HOME/target/SystemML.jar`. There was a PySpark bug that was
>> supposed to have been fixed in Spark 2.x, but it's possible that it is
>> still an issue.
>>
>> As for the output, the notebook will create SystemML `Matrix` objects for
>> all of the weights and biases of the trained models.  To save, please
>> convert each one to a DataFrame, e.g. `Wc1.toDF()`, repeating this for
>> each matrix, and then simply save the DataFrames.  This could be done
>> all at once like this for a SystemML Matrix object `Wc1`:
>> `Wc1.toDF().write.save("path/to/save/Wc1.parquet", format="parquet")`.
>> Just repeat for each matrix returned by the "Train" code for the
>> algorithms.  At that point, you will have a set of saved DataFrames
>> representing a trained SystemML model, and these can be used in downstream
>> classification tasks in a similar manner to the "Eval" sections.
>>
>> -Mike
>>
>> --
>>
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>>
>> Sent from my iPhone.
>>
>>
>> > On Apr 24, 2017, at 3:07 AM, Aishwarya Chaurasia <
>> > aishwarya2...@gmail.com> wrote:
>> >
>> > Furthermore:
>> > What is the output of MachineLearning.ipynb you're obtaining, sir?
>> > We are actually nearing our deadline for our problem.
>> > Thanks a lot.
>> >
>> > On 24-Apr-2017 2:58 PM, "Aishwarya Chaurasia" 
>> > wrote:
>> >
>> > Hello sir,
>> >
>> > Thanks a lot for replying, sir. But unfortunately it did not work.
>> > Although the NameError did not appear this time, another error came
>> > about:
>> >
>> > https://paste.fedoraproject.org/paste/TUMtSIb88Q73FYekwJmM7V5M1UNdIGYhyRLivL9gydE=
>> >
>> > This error was obtained after executing the second block of code of
>> > MachineLearning.py in terminal. ( ml = MLContext(sc) )
>> >
>> > We have installed only the bleeding-edge version of SystemML, and the
>> > installation was done correctly. We are in a fix now. :/
>> > Kindly look into the matter asap.
>> >
>> > On 24-Apr-2017 12:15 PM, "Mike Dusenberry" 
>> > wrote:
>> >
>> > Hi Aishwarya,
>> >
>> > Glad to hear that the preprocessing stage was successful!  As for the
>> > `MachineLearning.ipynb` notebook, here is a general guide:
>> >
>> >   - The `MachineLearning.ipynb` notebook essentially (1) loads in the
>> >   training and validation DataFrames from the preprocessing step, (2)
>> >   converts them to normalized & one-hot encoded SystemML matrices for
>> >   consumption by the ML algorithms, and (3) explores training a couple
>> >   of models.
>> >   - To run, you'll need to start Jupyter in the context of PySpark via
>> >   `PYSPARK_PYTHON=python3 PYSPARK_DRIVER_PYTHON=jupyter
>> >   PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --jars
>> >   $SYSTEMML_HOME/target/SystemML.jar`.  Note that if you have installed
>> >   SystemML with pip from PyPI (`pip3 install systemml`), this will
>> >   install our 0.13 release, and the `--jars
>> >   $SYSTEMML_HOME/target/SystemML.jar` will not be necessary.  If you
>> >   have instead installed a bleeding-edge version of SystemML locally
>> >   (git clone locally, maven build, `pip3 install -e src/main/python` as
>> >   listed in `projects/breast_cancer/README.md`), the `--jars
>> >   $SYSTEMML_HOME/target/SystemML.jar` part *is* necessary.  We are
>> >   about to release 0.14, and for this project, I *would* recommend
>> >   using a bleeding-edge install.
>> >   - Once Jupyter has been started in the context of PySpark, the `sc`
>> >   SparkContext object should be available.  Please let me know if you
>> >   continue to see this issue.
>> >   - The "Read in train & val data" section simply reads in the training
>> >   and validation data generated in the preprocessing stage.  Be sure
>> >   that the `size` setting is the same as the preprocessing size.  The
>> >   percentage `p` setting determines whether the full or sampled
>> >   DataFrames are loaded.  If you set `p 

Re: Please reply ASAP : Regarding incubator systemml/breast_cancer project

2017-04-25 Thread Aishwarya Chaurasia
Hello sir,

The NameError is occurring again, sir. Why does it keep resurfacing?

Attaching the screenshot of the error.

On 25-Apr-2017 2:50 AM,  wrote:

> Hi Aishwarya,
>
> For the error message, that just means that the SystemML jar isn't being
> found.  Can you add a `--driver-class-path $SYSTEMML_HOME/target/SystemML.jar`
> to the invocation of Jupyter?  I.e. `PYSPARK_PYTHON=python3
> PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook"
> pyspark  --jars $SYSTEMML_HOME/target/SystemML.jar --driver-class-path
> $SYSTEMML_HOME/target/SystemML.jar`. There was a PySpark bug that was
> supposed to have been fixed in Spark 2.x, but it's possible that it is
> still an issue.
>
> As for the output, the notebook will create SystemML `Matrix` objects for
> all of the weights and biases of the trained models.  To save, please
> convert each one to a DataFrame, e.g. `Wc1.toDF()`, repeating this for
> each matrix, and then simply save the DataFrames.  This could be done
> all at once like this for a SystemML Matrix object `Wc1`:
> `Wc1.toDF().write.save("path/to/save/Wc1.parquet", format="parquet")`.
> Just repeat for each matrix returned by the "Train" code for the
> algorithms.  At that point, you will have a set of saved DataFrames
> representing a trained SystemML model, and these can be used in downstream
> classification tasks in a similar manner to the "Eval" sections.
>
> -Mike
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Apr 24, 2017, at 3:07 AM, Aishwarya Chaurasia <
> > aishwarya2...@gmail.com> wrote:
> >
> > Furthermore:
> > What is the output of MachineLearning.ipynb you're obtaining, sir?
> > We are actually nearing our deadline for our problem.
> > Thanks a lot.
> >
> > On 24-Apr-2017 2:58 PM, "Aishwarya Chaurasia" 
> > wrote:
> >
> > Hello sir,
> >
> > Thanks a lot for replying, sir. But unfortunately it did not work.
> > Although the NameError did not appear this time, another error came
> > about:
> >
> > https://paste.fedoraproject.org/paste/TUMtSIb88Q73FYekwJmM7V5M1UNdIGYhyRLivL9gydE=
> >
> > This error was obtained after executing the second block of code of
> > MachineLearning.py in terminal. ( ml = MLContext(sc) )
> >
> > We have installed only the bleeding-edge version of SystemML, and the
> > installation was done correctly. We are in a fix now. :/
> > Kindly look into the matter asap.
> >
> > On 24-Apr-2017 12:15 PM, "Mike Dusenberry" 
> > wrote:
> >
> > Hi Aishwarya,
> >
> > Glad to hear that the preprocessing stage was successful!  As for the
> > `MachineLearning.ipynb` notebook, here is a general guide:
> >
> >   - The `MachineLearning.ipynb` notebook essentially (1) loads in the
> >   training and validation DataFrames from the preprocessing step, (2)
> >   converts them to normalized & one-hot encoded SystemML matrices for
> >   consumption by the ML algorithms, and (3) explores training a couple
> >   of models.
> >   - To run, you'll need to start Jupyter in the context of PySpark via
> >   `PYSPARK_PYTHON=python3 PYSPARK_DRIVER_PYTHON=jupyter
> >   PYSPARK_DRIVER_PYTHON_OPTS="notebook" pyspark --jars
> >   $SYSTEMML_HOME/target/SystemML.jar`.  Note that if you have installed
> >   SystemML with pip from PyPI (`pip3 install systemml`), this will
> >   install our 0.13 release, and the `--jars
> >   $SYSTEMML_HOME/target/SystemML.jar` will not be necessary.  If you
> >   have instead installed a bleeding-edge version of SystemML locally
> >   (git clone locally, maven build, `pip3 install -e src/main/python` as
> >   listed in `projects/breast_cancer/README.md`), the `--jars
> >   $SYSTEMML_HOME/target/SystemML.jar` part *is* necessary.  We are
> >   about to release 0.14, and for this project, I *would* recommend
> >   using a bleeding-edge install.
> >   - Once Jupyter has been started in the context of PySpark, the `sc`
> >   SparkContext object should be available.  Please let me know if you
> >   continue to see this issue.
> >   - The "Read in train & val data" section simply reads in the training
> >   and validation data generated in the preprocessing stage.  Be sure
> >   that the `size` setting is the same as the preprocessing size.  The
> >   percentage `p` setting determines whether the full or sampled
> >   DataFrames are loaded.  If you set `p = 1`, the full DataFrames will
> >   be used.  If you instead would prefer to use the smaller sampled
> >   DataFrames while getting started, please set it to the same value as
> >   used in the preprocessing to generate the smaller sampled DataFrames
> >   (see the sketch below).
> >   - The `Extract X & Y matrices` section splits each of the train and
> >   validation DataFrames into effectively X & Y matrices (still as
> >   DataFrame types), with X containing the images, and Y containing the
> >   labels.
> >   - The `Convert to SystemML Matrices` section passes the X & Y
> >   DataFrames into a SystemML script that performs
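
A minimal sketch of the "Read in train & val data" step described in that
list (the file names here are assumptions for illustration and the
notebook's actual paths may differ; `spark` is the predefined PySpark
session):

    # use the same settings as in the preprocessing stage
    size = 64   # image size from preprocessing (assumed value)
    p = 0.01    # sampling fraction; p = 1 selects the full DataFrames

    if p == 1:
        train = spark.read.load("data/train_{}.parquet".format(size))
        val = spark.read.load("data/val_{}.parquet".format(size))
    else:
        train = spark.read.load("data/train_{}_sample_{}.parquet".format(size, p))
        val = spark.read.load("data/val_{}_sample_{}.parquet".format(size, p))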