Re: New Google Summer of Code 2017 Student - Krishna Kalyan

2017-05-10 Thread dusenberrymw
Welcome, Krishna!  Looking forward to working with you!  For a bit of my 
background related to the project, I've been heavily focused on deep learning 
by building a DML library for DL (in `scripts/nn`) and working on an applied DL 
project (in `projects/breast_cancer`).  I've also worked on the engine 
optimizer a bit, added a few new built-in ops to the engine, and run the perf 
tests previously.

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On May 6, 2017, at 3:18 AM, Arvind Surve <ac...@yahoo.com.INVALID> wrote:
> 
> Welcome Krishna
>  Arvind Surve | Spark Technology Center  | http://www.spark.tc/
> 
>  From: Niketan Pansare <npan...@us.ibm.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Friday, May 5, 2017 3:45 PM
> Subject: Re: New Google Summer of Code 2017 Student - Krishna Kalyan
> 
> Welcome Krishna !!
> 
> 
> Krishna Kalyan ---05/05/2017 03:36:59 PM---Thank you so much, Looking forward 
> to work with every one in this community. Thank you for all
> 
> From: Krishna Kalyan <krishnakaly...@gmail.com>
> To: Nakul Jindal <naku...@gmail.com>
> Cc: dev@systemml.incubator.apache.org
> Date: 05/05/2017 03:36 PM
> Subject: Re: New Google Summer of Code 2017 Student - Krishna Kalyan
> 
> 
> 
> Thank you so much,
> Looking forward to working with everyone in this community. Thank you for all
> the feedback and this amazing opportunity.
> 
> Regards,
> Krishna
> 
> 
> 
> 
> 
> On May 5, 2017 19:05, "Nakul Jindal" <naku...@gmail.com> wrote:
> 
> Hi All,
> 
> Let us all welcome Krishna Kalyan as a student of Google Summer of Code to
> work on SystemML.
> He will be working on automating the performance testing process of
> SystemML.
> 
> His project proposal is attached and the JIRA tracking his project can be
> found at https://issues.apache.org/jira/browse/SYSTEMML-1451
> 
> He has already been active with the community (https://www.mail-archive.com/
> dev@systemml.incubator.apache.org/msg01209.html) since January.
> 
> @Krishna - Even though I am officially the mentor, I encourage you to
> address questions to various members of the community with issues you
> encounter throughout the project. Dig through Pull Requests and discussions
> to figure out who is familiar with which components.
> 
> (I can help a bit with my background - I have worked on the DML grammar and
> ANTLR parser layer previously and am working on the GPU backend now. I also
> ran the perf tests and am somewhat familiar with the work needed to
> automate it.)
> 
> Welcome!
> 
> -Nakul
> 
> 
> 
> 
> 


Re: [DISCUSS] Remove old MLContext API

2017-05-01 Thread dusenberrymw
+1

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On May 1, 2017, at 5:13 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
> 
> 
> 
> Hi all,
> 
> The old MLContext API (org.apache.sysml.api.MLContext, org.apache.sysml.api
> .MLContextProxy, org.apache.sysml.api.MLMatrix, org.apache.sysml.api.
> MLOutput and org.apache.sysml.api.MLBlock) has been deprecated for a while.
> I would recommend removing it from our source code. Please email back if
> you have concerns or objections.
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar


Re: [NOTICE] New Apache SystemML Committer and PPMC Member

2017-05-01 Thread dusenberrymw
Welcome, Felix!

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On May 1, 2017, at 4:23 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
> 
> Congratulations Felix !!
> 
> 
> Luciano Resende ---05/01/2017 04:21:30 PM---Welcome Felix. On Mon, May 1, 
> 2017 at 4:18 PM, Arvind Surve <ac...@yahoo.com.invalid>
> 
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@systemml.incubator.apache.org, Arvind Surve <ac...@yahoo.com>
> Date: 05/01/2017 04:21 PM
> Subject: Re: [NOTICE] New Apache SystemML Committer and PPMC Member
> 
> 
> 
> 
> Welcome Felix.
> 
> On Mon, May 1, 2017 at 4:18 PM, Arvind Surve <ac...@yahoo.com.invalid>
> wrote:
> 
> > I would like to welcome Felix Schueler as a new
> > Committer and PPMC member of Apache SystemML.
> >
> > Thanks for all your work, and welcome !!!
> >
> >  Arvind Surve | Spark Technology Center  | http://www.spark.tc/
> 
> 
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
> 
> 
> 


Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)

2017-04-28 Thread dusenberrymw
+1  Grabbed the tar binary and the tar source and tested various local scripts 
in Scala & Python 2 + 3, and those ran fine.  However, I did run the MNIST 
LeNet demo on both our 0.13 release and this 0.14 candidate, and I noticed a 
regression in 0.14.  For the same script run back to back, the 0.14 candidate 
took longer, and looking into the stats, on 0.13 there were 864 Spark 
instructions executed, while on this 0.14 there were 2513 Spark instructions 
executed.   This also brought the `sp_mapmm` and `sp_sel+` instructions into 
the top 10 heavy hitters.  This could be related to the issue that I am seeing 
in SYSTEMML-1561.

Regardless, I'm still fine with releasing this, since the deep learning support 
is still experimental for 0.14.  For our upcoming 1.0 release, all engine bugs 
and issues related to deep learning need to be fixed.  Most of these bugs are 
generally applicable to all algorithms, so it is to the benefit of the project 
to fix them.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 28, 2017, at 10:37 AM, Arvind Surve <ac...@yahoo.com.INVALID> wrote:
> 
> +1
> Completed the following verifications:
>   - License and Notice validations
>   - Binary runtime validations
>   - Source code compilation and runtime validations
>   - Python scripts validations using Python 2
> 
>  Arvind Surve | Spark Technology Center  | http://www.spark.tc/
> 
>  From: Glenn Weidner <gweid...@us.ibm.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Monday, April 24, 2017 9:30 PM
> Subject: Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)
> 
> +1
> 
> Successfully ran Linear Regression, Logistic Regression, Naive Bayes, SVM in
> Python notebooks with Spark 2.0.2 (in cloud environment) and Spark 2.1 (on 
> local test cluster) after pip install of RC4 python artifact
> systemml-0.14.0-incubating-python.tgz. Also ran Linear Regression Conjugate 
> Gradient in Scala notebooks.
> 
> Regards,
> Glenn
> 
> Matthias Boehm ---04/24/2017 02:02:12 AM---+1 I ran large-scale experiments 
> on Spark 2.1 for L2SVM, GLM, MLogreg,
> 
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 04/24/2017 02:02 AM
> Subject: Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)
> 
> 
> 
> +1
> 
> I ran large-scale experiments on Spark 2.1 for L2SVM, GLM, MLogreg,
> LinregCG, LinregDS, and PCA over scaled versions of MNIST and ImageNet (up
> to 1TB, with uncompressed and compressed linear algebra) without any
> issues.
> 
> Compared to previous experiments with SystemML 0.11 and Spark 1.6, I've
> seen substantial performance improvements of >2x for iterative algorithms
> with RDD operations in the inner loop over out-of-core datasets.
> 
> Regards,
> Matthias
> 
> On Wed, Apr 19, 2017 at 4:17 PM, Arvind Surve <ac...@yahoo.com.invalid>
> wrote:
> 
>> Please vote on releasing the following candidate as Apache SystemML
>> version 0.14.0-incubating!
>> 
>> The vote is open for at least 72 hours and passes if a majority of at
>> least 3 +1 PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache SystemML 0.14.0-incubating
>> [ ] -1 Do not release this package because ...
>> 
>> To learn more about Apache SystemML, please see
>> http://systemml.apache.org/
>> 
>> The tag to be voted on is v0.14.0-incubating-rc4
>> (8bdcf106ca9bd04c0f68924ad5827eb7d7d54952):
>> https://github.com/apache/incubator-systemml/commit/8bdcf106ca9bd04c0f68924ad5827eb7d7d54952
>> 
>> The release artifacts can be found at:
>> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.14.0-incubating-rc4/
>> 
>> The maven release artifacts, including signatures, digests, etc. can
>> be found at:
>> https://repository.apache.org/content/repositories/orgapachesystemml-1021/org/apache/systemml/systemml/0.14.0-incubating/
>> 
>> === Apache Incubator release policy ===
>> Please find below the guide to release management during incubation:
>> http://incubator.apache.org/guides/releasemanagement.html
>> 
>> === How can I help test this release? ===
>> If you are a SystemML user, you can help us test this release by taking
>> an existing Algorithm or workload and running on this release candidate,
>> then reporting any regressions.
>> 
>> === What justifies a -1 vote for this release? ===
>> -1 votes should only occur for significant stop-ship bugs or legal
>> related issues (e.g. wrong license, missing header files, etc). Minor bugs
>> or regressions should not block this release.
>> 
>>  -Arvind
>>  Arvind Surve | Spark Technology Center  | http://www.spark.tc/
> 
> 
> 
> 
> 


Re: Build passed/failed messages for pull requests

2017-04-28 Thread dusenberrymw
I would prefer option 2.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 28, 2017, at 12:40 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:
> 
> My preference is option 3.
> 
> Thanks,
> Glenn
> 
> 
> Arvind Surve ---04/28/2017 11:09:48 AM---Agree, these messages are 
> distractions.  Arvind Surve | Spark Technology Center  | http://www.spark.
> 
> From: Arvind Surve <ac...@yahoo.com.INVALID>
> To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org>
> Date: 04/28/2017 11:09 AM
> Subject: Re: Build passed/failed messages for pull requests
> 
> 
> 
> 
> Agree, these messages are distractions.
>  Arvind Surve | Spark Technology Center  | http://www.spark.tc/
> 
>  From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Friday, April 28, 2017 11:05 AM
> Subject: Re: Build passed/failed messages for pull requests
>   
> as I commented on one of these github comments, I'm strongly against 
> these kinds of unnecessary messages because they distract from the actual 
> discussions. I already had to change my notification settings 
> accordingly - essentially I'm not watching SystemML's PR activity any 
> more.
> 
> Regards,
> Matthias
> 
> On 4/28/2017 10:42 AM, Deron Eriksson wrote:
> > Hi,
> >
> > When a pull request is created or another commit is pushed to that pull
> > request, a build including running our test suite is performed (Jenkins at
> > https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/).
> > This is the same model that other projects such as Apache Spark use
> > (Jenkins at
> > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/).
> >
> > A few days ago, automated build passed/failed pull request messages were
> > introduced to our pull requests, following the same type of Spark model.
> > A) SystemML example: https://github.com/apache/incubator-systemml/pull/442
> > B) Spark example: https://github.com/apache/spark/pull/17765
> >
> > Personally I like these messages because for contributors that do pull
> > requests, it automatically tells them the status of the build for their
> > pull requests and gives them a direct link to the build/test results. An
> > opposing viewpoint would be that these messages are somewhat like spam.
> >
> > So we should make a public decision on the mailing list what to do about
> > these automated build status messages.
> >
> > Some options:
> > (1) keep the automated messages exactly as they are
> > (2) keep the automated messages, but consolidate the two messages into one
> > (such as "Build successful" and "Refer to this link...").
> > (3) get rid of the automated messages
> >
> > I like (2). Any other opinions or options?
> >
> > Thoughts?
> >
> > Deron
> >
> >
> 
> 
>   
> 
> 


Re: Please reply ASAP : Regarding incubator systemml/breast_cancer project

2017-04-27 Thread dusenberrymw
Hi Aishwarya,

Yes, it is quite strange that Jupyter isn't running on the PySpark kernel even 
though it's being started in that manner.  The good news is that we do use this 
everyday, so once we find the root issue with your Jupyter, it should work 
great!  Let's try temporarily removing all of the existing Jupyter/IPython 
settings & kernels and basically start fresh.  Assuming you are on OS X / macOS 
or Linux, can you do the following? (Please double check the exact paths, as 
I'm typing on a phone.)

* Stop Jupyter, and make sure that it is not running.
* Temporarily remove the Jupyter kernels.  First, you will need to see where 
they are installed, and then just rename that path.
`jupyter kernelspec list`
# look at paths above.  For example, on macOS, it may be located at 
~/Library/Jupyter/kernels, and thus to move it, you would use the following. 
Update this as needed for the exact paths listed above 
`mv ~/Library/Jupyter/kernels ~/Library/Jupyter/kernels_OLD`
* Temporarily remove the Jupyter & IPython settings:
`mv ~/.jupyter ~/.jupyter_OLD`
`mv ~/.ipython ~/.ipython_OLD`
* Make sure Jupyter is up to date:
`pip3 install -U ipython jupyter`

After that, please ensure that Jupyter is not running, then start it in the 
context of PySpark as sent previously.  Once Jupyter is started this time, 
there should only be one kernel listed, and `sc` should be available.
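
If it is easier, the "set aside the old settings" steps above could also be scripted.  Here is a minimal sketch in Python, assuming the macOS kernel location and an `_OLD` suffix (both are assumptions and should be adjusted to whatever `jupyter kernelspec list` reports); the helper names `set_aside` and `reset_jupyter_settings` are hypothetical:

```python
from pathlib import Path
from typing import List, Optional


def set_aside(path: Path, suffix: str = "_OLD") -> Optional[Path]:
    """Rename `path` to `path + suffix` if it exists; return the new path."""
    if not path.exists():
        return None  # nothing to move
    backup = path.with_name(path.name + suffix)
    if backup.exists():
        raise FileExistsError(f"{backup} already exists; rename it first")
    path.rename(backup)
    return backup


def reset_jupyter_settings(home: Path) -> List[Path]:
    """Set aside Jupyter kernels and Jupyter/IPython settings under `home`."""
    candidates = [
        home / "Library" / "Jupyter" / "kernels",  # assumed macOS location
        home / ".jupyter",
        home / ".ipython",
    ]
    moved = []
    for path in candidates:
        backup = set_aside(path)
        if backup is not None:
            moved.append(backup)
    return moved
```

Calling `reset_jupyter_settings(Path.home())` with Jupyter stopped would rename whichever of those folders exist and return the backup paths, so nothing is deleted and everything can be renamed back later.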

Can you try that?

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 26, 2017, at 2:13 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> 
> wrote:
> 
> Hi sir,
> The sc NameError persists.
> 
> (1) There is only one jupyter server running. And that was started with the
> pyspark command in the previous mail.
> (2) Two kernels are appearing in the change kernel option - Python3 and
> Python2. Tried with both of them and the result is the same.
> 
> How is jupyter not being able to run on the pyspark kernel when we have
> started the notebook with the pyspark command only?
> 
> Is it possible to create a .py file of MachineLearning.ipynb like was done
> with preprocessing.ipynb with explicitly creating a SparkContext() ?
> 
>> On 25-Apr-2017 11:57 PM, <dusenberr...@gmail.com> wrote:
>> 
>> Hi Aishwarya,
>> 
>> Unfortunately this mailing list removes all images, so I can't view your
>> screenshot.  I'm assuming that it is the same issue with the missing
>> SparkContext `sc` object, but please let me know if it is a different
>> issue.  This sounds like it could be an issue with multiple kernels
>> installed in Jupyter.  When you start the notebook, can you see if there
>> are multiple kernels listed in the "Kernel" -> "Change Kernel" menu?  If
>> so, please try one of the other kernels to see if Jupyter is starting by
>> default with a non-spark kernel.  Also, is it possible that you have more
>> than one instance of the Jupyter server running?  I.e. for this scenario,
>> we start Jupyter itself directly via pyspark using the command sent
>> previously, whereas usually Jupyter can just be started with `jupyter
>> notebook`.  In the latter case, PySpark (and thus `sc`) would *not* be
>> available (unless you've set up special PySpark kernels separately).  In
>> summary, can you (1) check for other kernels via the menus, and (2) check
>> for other running Jupyter servers that are non-PySpark?
>> 
>> As for the other inquiry, great question!  When training models, it's
>> quite useful to track the loss and other metrics (e.g. accuracy) from
>> *both* the training and validation sets.  The reasoning is that it allows
>> for a more holistic view of the overall learning process, such as
>> evaluating whether any overfitting or underfitting is occurring.  For
>> example, say that you train a model and achieve an accuracy of 80% on the
>> validation set.  Is this good?  Is this the best that can be done?  Without
>> also tracking performance on the training set, it can be difficult to make
>> these decisions.  Say that you then measure the performance on the training
>> set and find that the model achieves 100% accuracy on that data.  That
>> might be a good indication that your model is overfitting the training set,
>> and that a combination of more data, regularization, and a smaller model
>> may be helpful in raising the generalization performance, i.e. the
>> performance on the validation set and future real examples on which you
>> wish to make predictions.  If on the other hand, the model achieved an 82%
>> on the training set, this could be a good indication that the model is
>> underfitting, and that a combination of a more expre

Re: Please reply ASAP : Regarding incubator systemml/breast_cancer project

2017-04-25 Thread dusenberrymw
Hi Aishwarya,

Unfortunately this mailing list removes all images, so I can't view your 
screenshot.  I'm assuming that it is the same issue with the missing 
SparkContext `sc` object, but please let me know if it is a different issue.  
This sounds like it could be an issue with multiple kernels installed in 
Jupyter.  When you start the notebook, can you see if there are multiple 
kernels listed in the "Kernel" -> "Change Kernel" menu?  If so, please try one 
of the other kernels to see if Jupyter is starting by default with a non-spark 
kernel.  Also, is it possible that you have more than one instance of the 
Jupyter server running?  I.e. for this scenario, we start Jupyter itself 
directly via pyspark using the command sent previously, whereas usually Jupyter 
can just be started with `jupyter notebook`.  In the latter case, PySpark (and 
thus `sc`) would *not* be available (unless you've set up special PySpark 
kernels separately).  In summary, can you (1) check for other kernels via the 
menus, and (2) check for other running Jupyter servers that are non-PySpark?

As for the other inquiry, great question!  When training models, it's quite 
useful to track the loss and other metrics (e.g. accuracy) from *both* the 
training and validation sets.  The reasoning is that it allows for a more 
holistic view of the overall learning process, such as evaluating whether any 
overfitting or underfitting is occurring.  For example, say that you train a 
model and achieve an accuracy of 80% on the validation set.  Is this good?  Is 
this the best that can be done?  Without also tracking performance on the 
training set, it can be difficult to make these decisions.  Say that you then 
measure the performance on the training set and find that the model achieves 
100% accuracy on that data.  That might be a good indication that your model is 
overfitting the training set, and that a combination of more data, 
regularization, and a smaller model may be helpful in raising the 
generalization performance, i.e. the performance on the validation set and 
future real examples on which you wish to make predictions.  If, on the other 
hand, the model achieved 82% accuracy on the training set, this could be a good 
indication that the model is underfitting, and that a combination of a more 
expressive model and better data could be helpful.  In summary, tracking 
performance on both the training and validation datasets can be useful for 
determining ways in which to improve the overall learning process.
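
To make that reasoning concrete, here is a rough heuristic in Python.  It is only a sketch: the function name `diagnose_fit` and the thresholds `target_acc` and `gap_tol` are illustrative assumptions, not values taken from the notebook.

```python
def diagnose_fit(train_acc: float, val_acc: float,
                 target_acc: float = 0.90, gap_tol: float = 0.05) -> str:
    """Compare training and validation accuracy with illustrative thresholds."""
    gap = train_acc - val_acc
    if gap > gap_tol:
        # Training accuracy is far above validation accuracy.
        return "overfitting"
    if train_acc < target_acc:
        # The model does not even fit the data it was trained on.
        return "underfitting"
    return "ok"


print(diagnose_fit(1.00, 0.80))  # overfitting
print(diagnose_fit(0.82, 0.80))  # underfitting
```

With only the 80% validation number, the two cases would look identical; the training number is what separates them.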


- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 25, 2017, at 8:47 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> 
> wrote:
> 
> We had another query, sir. We read the entire MachineLearning.ipynb code.
> in it the training samples and the validation samples have both been
> evaluated separately and their respective losses and accuracies obtained.
> Why are the training samples being evaluated again if they were used to
> train the model in the first place? Shouldn't only the validation data
> frames be evaluated to find out the loss and accuracy?
> 
> Thank you
> 
> On 25-Apr-2017 4:00 PM, "Aishwarya Chaurasia" <aishwarya2...@gmail.com>
> wrote:
> 
>> Hello sir,
>> 
>> The NameError is occurring again sir. Why does it keep resurfacing?
>> 
>> Attaching the screenshot of the error.
>> 
>>> On 25-Apr-2017 2:50 AM, <dusenberr...@gmail.com> wrote:
>>> 
>>> Hi Aishwarya,
>>> 
>>> For the error message, that just means that the SystemML jar isn't being
>>> found.  Can you add a `--driver-class-path 
>>> $SYSTEMML_HOME/target/SystemML.jar`
>>> to the invocation of Jupyter?  I.e. `PYSPARK_PYTHON=python3
>>> PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook"
>>> pyspark  --jars $SYSTEMML_HOME/target/SystemML.jar --driver-class-path
>>> $SYSTEMML_HOME/target/SystemML.jar`. There was a PySpark bug that was
>>> supposed to have been fixed in Spark 2.x, but it's possible that it is
>>> still an issue.
>>> 
>>> As for the output, the notebook will create SystemML `Matrix` objects for
>>> all of the weights and biases of the trained models.  To save, please
>>> convert each one to a DataFrame, i.e. `Wc1.toDF()` and repeated for each
>>> matrix, and then simply save the DataFrames.  This could be done all at
>>> once like this for a SystemML Matrix object `Wc1`:
>>> `Wc1.toDF().write.save("path/to/save/Wc1.parquet", format="parquet")`.
>>> Just repeat for each matrix returned by the "Train" code for the
>>> algorithms.  At that point, you will have a set of saved DataFrames
>

Evaluate a scalar DAG during compilation

2017-04-24 Thread dusenberrymw
During compilation, is it possible to evaluate a scalar sub-DAG of scalar 
operations in which all leaf nodes are literals to allow for replacement with a 
literal?  For example, in our `nn` library, our convolution and pooling layers 
have to pass around the spatial dimensions (height and width) of the images 
that are stretched out into rows of the input/output matrices.  These output 
dimensions are computed within the forward functions of the above layers as 
small scalar equations.  From a mathematical standpoint, these sizes can be 
determined at compile time, and it is nice to have these size equations in DML 
(vs. hiding them inside the engine within built-in functions).  However, we do 
not currently evaluate these expressions during compilation, and thus we are 
left with unknown sizes even during recompilation.  This naturally leads to max 
memory estimates and thus often leads to unnecessary distributed runtime ops 
rather than simple CP ones.
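
To illustrate the kind of rewrite I have in mind, here is a small Python sketch of constant folding over a scalar expression tree.  This is not engine code: the `Lit`/`Var`/`Op` node classes and the `fold` function are hypothetical stand-ins, the example uses a typical convolution output-size equation with made-up sizes, and it is a plain tree walk (a real DAG pass would memoize shared nodes).

```python
import math
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Lit:
    value: float


@dataclass(frozen=True)
class Var:
    name: str  # value unknown at compile time


@dataclass(frozen=True)
class Op:
    name: str  # one of "+", "-", "*", "/", "floor"
    args: Tuple["Node", ...]


Node = Union[Lit, Var, Op]

_FUNCS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "floor": lambda a: float(math.floor(a)),
}


def fold(node: Node) -> Node:
    """Replace every subtree whose leaves are all literals with one literal."""
    if not isinstance(node, Op):
        return node
    args = tuple(fold(a) for a in node.args)
    if all(isinstance(a, Lit) for a in args):
        return Lit(_FUNCS[node.name](*(a.value for a in args)))
    return Op(node.name, args)


# Hout = floor((Hin + 2*pad - Hf) / stride) + 1, with Hin=28, pad=2, Hf=5, stride=1
padded = Op("+", (Lit(28), Op("*", (Lit(2), Lit(2)))))
span = Op("-", (padded, Lit(5)))
quotient = Op("/", (span, Lit(1)))
hout = Op("+", (Op("floor", (quotient,)), Lit(1)))
print(fold(hout))  # Lit(value=28.0)
```

If any leaf is a `Var`, only the all-literal subtrees collapse and the rest of the expression is left intact, which is the behavior I would expect from such a pass.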

Thoughts?


-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.



Re: function default parameters

2017-04-21 Thread dusenberrymw
Yeah we should adopt the syntax that R and Python both use, in which default 
arguments are defined in the function definition.  

Primitive types such as ints and strings can be set in the function definition, 
and more complex types such as matrices can simply use a null value as the 
default in the function definition, followed by an actual assignment within the 
function body.

In R:
```
f <- function(x=3)
  x

f()  # 3
f(2)  # 2
```

```
f <- function(x=NULL) {
  if (is.null(x))
    x = matrix(4, 1, 10)
  x
}

f()  # matrix of 4's
f(matrix(2, 5, 12))  # matrix of 2's
```

Same thing in Python, except it uses `None` instead of `NULL`:
```
def f(x=3):
  return x

f()  # 3
f(2)  # 2
```

```
def f(x=None):
  if x is None:
    x = [1,2,3]
  return x

f()  # list [1,2,3]
f([4,5,6])  # list [4,5,6]
```
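
For functions with many arguments, the caller would then bind only the parameters it wants to override by name.  A minimal Python sketch, assuming a hypothetical function `linreg_settings` with hypothetical parameter names and default values:

```python
def linreg_settings(icpt=0, reg=1e-6, tol=1e-6, maxi=100):
    """Hypothetical algorithm signature with a default for every argument."""
    return {"icpt": icpt, "reg": reg, "tol": tol, "maxi": maxi}


# Override a single parameter by name; the other three keep their defaults.
settings = linreg_settings(maxi=20)
print(settings["maxi"], settings["icpt"])  # 20 0
```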


--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 21, 2017, at 5:40 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
> 
> BTW, that is assuming our algorithms have been converted to functions.
> Deron
> 
> 
> On Fri, Apr 21, 2017 at 5:37 PM, Deron Eriksson <deroneriks...@gmail.com>
> wrote:
> 
>> Thank you Matthias. I highly agree with your idea about having a default
>> specification similar to R WRT the function signatures for default values.
>> 
>> This becomes a significant issue for some of our algorithms, where they
>> might take in 10 arguments but default values should typically be used
>> for  6+ or 7+ of the arguments.
>> 
>> Deron
>> 
>> 
>> On Fri, Apr 21, 2017 at 5:25 PM, Matthias Boehm <mboe...@googlemail.com>
>> wrote:
>> 
>>> well, for arguments passed into dml scripts there is of course ifdef($b,
>>> 2)
>>> but for functions there is indeed no good support. At runtime level we
>>> still support default parameters for scalar arguments at the tail of the
>>> parameter list but I guess at one point the corresponding parser support
>>> was discontinued.
>>> 
>>> I personally would like a default specification similar to R in the
>>> function signature with the corresponding function calls that bind values
>>> to a subset of parameters.
>>> 
>>> Regards,
>>> Matthias
>>> 
>>> On Fri, Apr 21, 2017 at 4:18 PM, Deron Eriksson <deroneriks...@gmail.com>
>>> wrote:
>>> 
>>>> Is there a way to set default parameter values using DML? I believe
>>> both R
>>>> and Python offer this capability.
>>>> 
>>>> The only solution I could come up with using DML is to pass in a
>>> variable
>>>> that is NaN and cast this to a string and use this string in an if
>>>> conditional statement.
>>>> 
>>>> addone = function(double b) return (double a) {
>>>>   c = ''+b;
>>>>   if (c == 'NaN') {
>>>>     b = 2.0
>>>>   }
>>>>   a = b + 1;
>>>> }
>>>> 
>>>> z=0.0/0.0;
>>>> x = addone(z);
>>>> print(x);
>>>> y = addone(4.0);
>>>> print(y);
>>>> 
>>>> Is there a cleaner way to accomplish this, or is DML lacking this R
>>>> feature?
>>>> 
>>>> Deron
>>>> 
>>>> --
>>>> Deron Eriksson
>>>> Spark Technology Center
>>>> http://www.spark.tc/
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Deron Eriksson
>> Spark Technology Center
>> http://www.spark.tc/
>> 
>> 
> 
> 
> -- 
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/


Re: Regarding incubator systemml/breast_cancer project

2017-04-19 Thread dusenberrymw
Hi Aishwarya,

Looks like you've just encountered an out of memory error on one of the 
executors.  Therefore, you just need to adjust the `spark.executor.memory` and 
`spark.driver.memory` settings with higher amounts of RAM.  What is your 
current setup?  I.e. are you using a cluster of machines, or a single machine?  
We generally use a large driver on one machine, and then a single large 
executor on each other machine.  I would give a sizable amount of memory to the 
driver, and about half the possible memory on the executors so that the Python 
processes have enough memory as well.  PySpark has JVM and Python components, 
and the Spark memory settings only pertain to the JVM side, thus the need to 
save about half the executor memory for the Python side.
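
As a rough sketch of that rule of thumb in Python (the helper name `suggest_spark_memory` and the 64 GB / 32 GB figures are hypothetical and not from our setup; only the two Spark setting names come from above):

```python
def suggest_spark_memory(worker_ram_gb: int, driver_ram_gb: int,
                         jvm_fraction: float = 0.5) -> dict:
    """Give the executor JVM a fraction of each worker's RAM.

    The remainder is left for the Python worker processes on that machine.
    """
    executor_gb = max(1, int(worker_ram_gb * jvm_fraction))
    return {
        "spark.driver.memory": f"{driver_ram_gb}g",
        "spark.executor.memory": f"{executor_gb}g",
    }


conf = suggest_spark_memory(worker_ram_gb=64, driver_ram_gb=32)
for key, value in conf.items():
    print(f"{key}={value}")
```

The resulting values would then be passed to Spark however you normally set configuration, using amounts that fit your machines.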

Thanks!

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 19, 2017, at 5:53 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> 
> wrote:
> 
> Hello sir,
> 
> We also wanted to ensure that the spark-submit command we're using is the
> correct one for running 'preprocess.py'.
> Command :  /home/new/sparks/bin/spark-submit preprocess.py
> 
> 
> Thank you.
> Aishwarya Chaurasia.
> 
> On 19-Apr-2017 3:55 PM, "Aishwarya Chaurasia" <aishwarya2...@gmail.com>
> wrote:
> 
> Hello sir,
> On running the file preprocess.py we are getting the following error :
> 
> https://paste.fedoraproject.org/paste/IAvqiiyJChSC0V9eeETe2F5M1UNdIG
> YhyRLivL9gydE=
> 
> Can you please help us by looking into the error and kindly tell us the
> solution for it.
> Thanks a lot.
> Aishwarya Chaurasia
> 
> 
>> On 19-Apr-2017 12:43 AM, <dusenberr...@gmail.com> wrote:
>> 
>> Hi Aishwarya,
>> 
>> Certainly, here is some more detailed information about `preprocess.py`:
>> 
>>  * The preprocessing Python script is located at
>> https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/preprocess.py.
>> Note that this is different from the library module at
>> https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/breastcancer/preprocessing.py.
>>  * This script is used to preprocess a set of histology slide images,
>> which are `.svs` files in our case, and `.tiff` files in your case.
>>  * Lines 63-79 contain "settings" such as the output image sizes, folder
>> paths, etc.  Of particular interest, line 72 has the folder path for the
>> original slide images that should be commonly accessible from all machines
>> being used, and lines 74-79 contain the names of the output DataFrames that
>> will be saved.
>>  * Line 82 performs the actual preprocessing and creates a Spark
>> DataFrame with the following columns: slide number, tumor score, molecular
>> score, sample.  The "sample" in this case is the actual small, chopped-up
>> section of the image that has been extracted and flattened into a row
>> Vector.  For test images without labels (`training=false`), only the slide
>> number and sample will be contained in the DataFrame (i.e. no labels).
>> This calls the `preprocess(...)` function located on line 371 of
>> https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/breastcancer/preprocessing.py,
>> which is a different file.
>>  * Line 87 simply saves the above DataFrame to HDFS with the name from
>> line 74.
>>  * Line 93 splits the above DataFrame row-wise into separate "training"
>> and "validation" DataFrames, based on the split percentage from line 70
>> (`train_frac`).  This is performed so that downstream machine learning
>> tasks can learn from the training set, and validate performance and
>> hyperparameter choices on the validation set.  These DataFrames will start
>> with the same columns as the above DataFrame.  If `add_row_indices` from
>> line 69 is true, then an additional row index column (`__INDEX`) will be
>> prepended.  This is useful for SystemML in downstream machine learning
>> tasks as it gives the DataFrame row numbers like a real matrix would have,
>> and SystemML is built to operate on matrices.
>>  * Lines 97 & 98 simply save the training and validation DataFrames using
>> the names defined on lines 76 & 78.
>>  * Lines 103-137 create smaller train and validation DataFrames by taking
>> small row-wise samples of the full train and validation DataFrames.  The
>> percentage of the sample is defined on line 111 (`p=0.01` for a 1%
>> sample).  This is generally useful for quicker downstream tasks without
>> having to load in the larger DataFrames, assuming 

Re: Regarding incubator systemml/breast_cancer project

2017-04-18 Thread dusenberrymw
Hi Aishwarya,

Certainly, here is some more detailed information about `preprocess.py`:

  * The preprocessing Python script is located at 
https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/preprocess.py.
Note that this is different from the library module at 
https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/breastcancer/preprocessing.py.
  * This script is used to preprocess a set of histology slide images, which 
are `.svs` files in our case, and `.tiff` files in your case.
  * Lines 63-79 contain "settings" such as the output image sizes, folder 
paths, etc.  Of particular interest, line 72 has the folder path for the 
original slide images that should be commonly accessible from all machines 
being used, and lines 74-79 contain the names of the output DataFrames that 
will be saved.
  * Line 82 performs the actual preprocessing and creates a Spark DataFrame 
with the following columns: slide number, tumor score, molecular score, sample. 
 The "sample" in this case is the actual small, chopped-up section of the image 
that has been extracted and flattened into a row Vector.  For test images 
without labels (`training=false`), only the slide number and sample will be 
contained in the DataFrame (i.e. no labels).  This calls the `preprocess(...)` 
function located on line 371 of 
https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/breastcancer/preprocessing.py,
 which is a different file.
  * Line 87 simply saves the above DataFrame to HDFS with the name from line 74.
  * Line 93 splits the above DataFrame row-wise into separate "training" and 
"validation" DataFrames, based on the split percentage from line 70 
(`train_frac`).  This is performed so that downstream machine learning tasks 
can learn from the training set, and validate performance and hyperparameter 
choices on the validation set.  These DataFrames will start with the same 
columns as the above DataFrame.  If `add_row_indices` from line 69 is true, 
then an additional row index column (`__INDEX`) will be prepended.  This is 
useful for SystemML in downstream machine learning tasks as it gives the 
DataFrame row numbers like a real matrix would have, and SystemML is built to 
operate on matrices.
  * Lines 97 & 98 simply save the training and validation DataFrames using the 
names defined on lines 76 & 78.
  * Lines 103-137 create smaller train and validation DataFrames by taking 
small row-wise samples of the full train and validation DataFrames.  The 
percentage of the sample is defined on line 111 (`p=0.01` for a 1% sample).  
This is generally useful for quicker downstream tasks without having to load in 
the larger DataFrames, assuming you have a large amount of data.  For us, we 
have ~7TB of data, so having 1% sampled DataFrames is useful for quicker 
downstream tests.  Once again, the same columns from the larger train and 
validation DataFrames will be used.
  * Lines 146 & 147 simply save these sampled train and validation DataFrames.
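
To make the order of the steps above concrete, here is a minimal plain-Python sketch of the row-wise split, the prepended `__INDEX` column, and the 1% sample.  It is not the Spark code in `preprocess.py`: the helper names, the field names (`slide_num`, `tumor_score`, `sample`), the toy rows, and the 1-based numbering are all assumptions made for illustration.

```python
import random


def split_rows(rows, train_frac, seed=0):
    """Randomly assign each row to the training or validation list."""
    rng = random.Random(seed)
    train, val = [], []
    for row in rows:
        (train if rng.random() < train_frac else val).append(row)
    return train, val


def add_row_index(rows, start=1):
    """Prepend an `__INDEX` field to each row (1-based here by assumption)."""
    return [{"__INDEX": i, **row} for i, row in enumerate(rows, start=start)]


def sample_rows(rows, p, seed=0):
    """Keep each row independently with probability p."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p]


# Toy stand-in for the full DataFrame: one dict per extracted image sample.
full_rows = [
    {"slide_num": i // 10, "tumor_score": i % 3 + 1, "sample": [float(i)] * 4}
    for i in range(1000)
]

# Split first, then index each split separately, then take small samples.
train_rows, val_rows = split_rows(full_rows, train_frac=0.8)
train_indexed = add_row_index(train_rows)
val_indexed = add_row_index(val_rows)
train_sample = sample_rows(train_indexed, p=0.01)
val_sample = sample_rows(val_indexed, p=0.01)
```

Because the index is added after the split, each of the training and validation sets gets its own contiguous row numbering, and the sampled sets keep the index values of the rows they were drawn from.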

As a summary, after running `preprocess.py`, you will be left with the 
following saved DataFrames in HDFS:
  * Full DataFrame
  * Training DataFrame
  * Validation DataFrame
  * Sampled training DataFrame
  * Sampled validation DataFrame

As for visualization, you may visualize a "sample" (i.e. small, chopped-up 
section of original image) from a DataFrame by using the 
`breastcancer.visualization.visualize_sample(...)` function.  You will need to 
do this after creating the DataFrames.  Here is a snippet to visualize the 
first row sample in a DataFrame, where `df` is one of the DataFrames from above:

```
from breastcancer.visualization import visualize_sample
visualize_sample(df.first().sample)
```

Please let me know if you have any additional questions.

Thanks!

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 15, 2017, at 4:38 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> 
> wrote:
> 
> Hello sir,
> Can you please elaborate more on what output we would be getting because we
> tried executing the preprocess.py file using spark submit it keeps on
> adding the tiles in rdd and while running the visualisation.py file it
> isn't showing any output. Can you please help us out asap stating the
> output we will be getting and the sequence of execution of files.
> Thank you.
> 
>> On 07-Apr-2017 5:54 AM, <dusenberr...@gmail.com> wrote:
>> 
>> Hi Aishwarya,
>> 
>> Thanks for sharing more info on the issue!
>> 
>> To facilitate easier usage, I've updated the preprocessing code by pulling
>> out most of the logic into a `breastcancer/preprocessing.py` module,
>> leaving just the execution in the `Preprocessing.ipynb` notebook.  There is
>> also a `pr

Re: [VOTE] Apache SystemML 0.14.0-incubating (RC3)

2017-04-17 Thread dusenberrymw
+1 and please call it `branch-0.14`.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 17, 2017, at 8:50 AM, Arvind Surve <ac...@yahoo.com.INVALID> wrote:
> 
> I will create the next RC (RC4) for SystemML 0.14 in a day or two and create a 
> branch.
>  Arvind Surve | Spark Technology Center  | http://www.spark.tc/
> 
>  From: Niketan Pansare <npan...@us.ibm.com>
> To: dev@systemml.incubator.apache.org 
> Cc: Arvind Surve <ac...@yahoo.com>
> Sent: Sunday, April 16, 2017 11:57 AM
> Subject: Re: [VOTE] Apache SystemML 0.14.0-incubating (RC3)
> 
>> we should create a 0.14 branch along with it to unblock ongoing
>> development
> 
> +1
> 
>> On Apr 15, 2017, at 9:27 PM, Matthias Boehm <mboe...@googlemail.com> wrote:
>> 
>> I think SYSTEMML-1518 and SYSTEMML-1520 require a new RC and I agree that
>> we should create a 0.14 branch along with it to unblock ongoing
>> development. I'm happy to backport any additional fixes into this branch
>> until we have a solid release candidate.
>> 
>> Regards,
>> Matthias
>> 
>> On Thu, Apr 13, 2017 at 5:34 PM, Arvind Surve <ac...@yahoo.com.invalid>
>> wrote:
>> 
>>> Please vote on releasing the following candidate as Apache SystemML
>>> version 0.14.0-incubating !
>>> 
>>> The vote is open for at least 72 hours and passes if a majority of at
>>> least 3 +1 PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache SystemML 0.14.0-incubating
>>> [ ] -1 Do not release this package because ...
>>> 
>>> To learn more about Apache SystemML, please see http://systemml.apache.
>>> org/
>>> 
>>> The tag to be voted on is v0.14.0-incubating-rc3 (
>>> fe6d887420143277aa8930cbea6d43a460ae7789)
>>> 
>>> https://github.com/apache/incubator-systemml/commit/
>>> fe6d887420143277aa8930cbea6d43a460ae7789
>>> 
>>> 
>>> The release artifacts can be found at :
>>> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.
>>> 14.0-incubating-rc3/
>>> 
>>> The maven release artifacts, including signatures, digests, etc. can
>>> be found at:
>>> https://repository.apache.org/content/repositories/
>>> orgapachesystemml-1020/org/apache/systemml/systemml/0.14.0-incubating/
>>> 
>>> =
>>> == Apache Incubator release policy ==
>>> =
>>> Please find below the guide to release management during incubation:
>>> http://incubator.apache.org/guides/releasemanagement.html
>>> 
>>> ===
>>> == How can I help test this release? ==
>>> ===
>>> If you are a SystemML user, you can help us test this release by taking
>>> an existing Algorithm or workload and running on this release candidate,
>>> then
>>> reporting any regressions.
>>> 
>>> 
>>> == What justifies a -1 vote for this release? ==
>>> 
>>> -1 votes should only occur for significant stop-ship bugs or legal
>>> related issues (e.g. wrong license, missing header files, etc). Minor bugs
>>> or regressions should not block this release.
>>> -Arvind  Arvind Surve | Spark Technology Center  | http://www.spark.tc/
> 
> 


Re: Regarding incubator systemml/breast_cancer project

2017-04-06 Thread dusenberrymw
Hi Aishwarya,

Thanks for sharing more info on the issue!

To facilitate easier usage, I've updated the preprocessing code by pulling out 
most of the logic into a `breastcancer/preprocessing.py` module, leaving just 
the execution in the `Preprocessing.ipynb` notebook.  There is also a 
`preprocess.py` script with the same contents as the notebook for use with 
`spark-submit`.  The choice of the notebook or the script is just a matter of 
convenience, as they both import from the same `breastcancer/preprocessing.py` 
package.  

As part of the updates, I've added an explicit SparkSession parameter (`spark`) 
to the `preprocess(...)` function, and updated the body to use this 
SparkSession object rather than the older SparkContext `sc` object.  
Previously, the `preprocess(...)` function accessed the `sc` object that was 
pulled in from the enclosing scope, which would work while all of the code was 
colocated within the notebook, but not if the code was extracted and imported.  
The explicit parameter now allows for the code to be imported.
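
As a minimal sketch of why the implicit `sc` lookup breaks once the code is imported, the following plain-Python example loads two tiny "modules" from source strings.  `FakeSession`, `make_rows`, and `load_module` are hypothetical stand-ins for illustration only, not part of Spark or of the `breastcancer` package.

```python
import types


class FakeSession:
    """Hypothetical stand-in for a Spark session object."""

    def make_rows(self, items):
        return list(items)


# Module-style code that expects a name `sc` to exist in its own globals.
IMPLICIT_SRC = """
def preprocess(items):
    return sc.make_rows(items)
"""

# Module-style code that receives the session as an explicit parameter.
EXPLICIT_SRC = """
def preprocess(spark, items):
    return spark.make_rows(items)
"""


def load_module(name, source):
    """Create a module object whose globals are separate from the caller's."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


# Defined in the caller's scope, like a variable provided by a notebook kernel.
sc = FakeSession()

implicit = load_module("implicit_mod", IMPLICIT_SRC)
try:
    implicit.preprocess([1, 2, 3])
    implicit_error = None
except NameError as err:
    # The module's globals do not contain `sc`, so the lookup fails.
    implicit_error = err

explicit = load_module("explicit_mod", EXPLICIT_SRC)
explicit_result = explicit.preprocess(sc, [1, 2, 3])
```

A function resolves global names in the module where it was defined, not in the caller's scope, so only the explicit-parameter version works after being moved into a separate module.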

Can you please try again with the latest updates?  We are currently using Spark 
2.x with Python 3.  If you use the notebook, the pyspark kernel should have a 
`spark` object available that can be supplied to the functions (as is done now 
in the notebook), and if you use the `preprocess.py` script with 
`spark-submit`, the `spark` object will be created explicitly by the script.

For a bit of context to others, Aishwarya initially reached out to find out if 
our breast cancer project could be applied to TIFF images, rather than the SVS 
images we are currently using (the answer is "yes" so long as they are "generic 
tiled TIFF" images, according to the OpenSlide documentation), and then followed 
up with Spark issues related to the preprocessing code.  This conversation has 
been promptly moved to the mailing list so that others in the community can 
benefit.


Thanks!

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 6, 2017, at 5:09 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> 
> wrote:
> 
> Hey,
> 
> The object sc is already defined in pyspark and yet this name error keeps
> occurring. We are using spark 2.*
> 
> Here is the link to error that we are getting :
> https://paste.fedoraproject.org/paste/89iQODxzpNZVbSfgwocH8l5M1UNdIGYhyRLivL9gydE=


Re: Java compiler for code generation

2017-04-03 Thread dusenberrymw
Using Janino sounds like a great idea.  As for the footprint size for Java-only 
execution modes, it might make sense to do an audit of our current dependencies 
to see if anything can be removed to make up for the additional amount.  Then 
we could just use it in all scenarios without worry.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Mar 31, 2017, at 9:25 PM, Matthias Boehm <mboe...@googlemail.com> wrote:
> 
> that is a good question. Yes, if we want to enable code generation in such
> a scenario it would also need Janino, which increases our footprint by
> roughly 0.6MB.
> 
> Btw, Janino fits much better into such an in-memory deployment because it
> compiles classes in-memory without the need to write class files into a
> local working directory. The same could be done for
> javax.tools.JavaCompiler, but would require a custom in-memory
> JavaFileManager.
> 
> Regards,
> Matthias
> 
> On Fri, Mar 31, 2017 at 9:14 PM, Berthold Reinwald <reinw...@us.ibm.com>
> wrote:
> 
>> Sounds like a good idea.
>> 
>> Wrt codegen, in a pure Java scoring environment w/o Spark and Hadoop, will
>> the dependency on Janino still be there (that question applies to JDK as
>> well), and what is the footprint?
>> 
>> Regards,
>> Berthold Reinwald
>> IBM Almaden Research Center
>> office: (408) 927 2208; T/L: 457 2208
>> e-mail: reinw...@us.ibm.com
>> 
>> 
>> 
>> From:   Matthias Boehm <mboe...@googlemail.com>
>> To: dev@systemml.incubator.apache.org
>> Date:   03/31/2017 08:17 PM
>> Subject:Java compiler for code generation
>> 
>> 
>> 
>> Hi all,
>> 
>> currently, our new code generator for operator fusion, uses the
>> programmatic javax.tools.JavaCompiler, which is Java's standard API for
>> compilation. Despite a plan cache that mitigates unnecessary compilation
>> and recompilation overheads, we still see significant end-to-end overhead
>> especially for small input data.
>> 
>> Moving forward, I'd like to switch to Janino
>> (org.codehaus.janino.SimpleCompiler), which is a fast in-memory Java
>> compiler with restricted language support. The advantages are
>> 
>> (1) Reduced compilation overhead: On end-to-end scenarios for L2SVM, GLM,
>> and MLogreg, Janino improved total javac compilation time from 2.039 to
>> 0.195 (14 operators), from 8.134 to 0.411 (82 operators), and from 4.854
>> to
>> 0.283 (46 operators), respectively. At the same time, there was no
>> measurable impact on runtime efficiency, but even slightly reduced JIT
>> compilation overhead.
>> 
>> (2) Removed JDK requirement: Using the standard javax.tools.JavaCompiler
>> requires the existence of a JDK, while Janino only requires a JRE, which
>> means it makes it easier to apply code generation by default.
>> 
>> However, I'm raising this here as Janino would add another explicit
>> dependency (with BSD license). Fortunately, Spark also uses Janino for
>> whole-stage-codegen. So we should be able to mark Janino as provided
>> library. The only issue is a pure Hadoop environment, where we still want
>> to use code generation for CP operations. To simplify the build, I could
>> imagine using the javax.tools.JavaCompiler for hadoop execution types, but
>> Janino by default.
>> 
>> If you have any concerns, please let me know by Monday; otherwise I'd like
>> to push this change into our upcoming 0.14 release.
>> 
>> 
>> Regards,
>> Matthias
>> 
>> 
>> 
>> 
>> 


Re: UDFs Within Expressions

2017-03-30 Thread dusenberrymw
Great, we should definitely add this to the 1.0 release in order to allow for 
more expressivity in our DML, and to allow for the cleanup of existing DML that 
has had to code around this, such as the `nn` library.

I will add a JIRA (or search for one) and tag it for 1.0. 

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Mar 29, 2017, at 4:18 PM, Matthias Boehm <mboe...@googlemail.com> wrote:
> 
> Well, this would indeed be a very useful extension - I've actually seen
> many use cases, where new users ran into issues with simple expressions
> like X[i,i] = foo(). In the general case, the problem with UDFs is that
> they can have - in contrast to builtin functions - multiple returns. These
> multiple returns would translate to HOPs with multiple outputs, which in
> turn cannot be represented with our current HOP DAG representation because
> a HOP represents an operation and the characteristics of a single output,
> but potentially with many consumers. This is also the reason why builtin
> functions with multiple outputs (i.e., lu, eigen, qr) are internally mapped
> to FunctionOps with the same restrictions.
> 
> However, we could actually allow UDFs with a single output in expressions.
> This would require a generalization of how results variables are bound but
> should not take too much effort. Additionally, it would require a full pass
> through the compiler to remove any assumptions that FunctionOps always
> appear as DAG root nodes. Bottom line: We could realistically add it to our
> feature list for the 1.0 release.
> 
> Regards,
> Matthias
> 
>> On Wed, Mar 29, 2017 at 3:55 PM, <dusenberr...@gmail.com> wrote:
>> 
>> Currently, it is not possible to use UDFs within an expression.  I.e. I'd
>> like to be able to use something like `out = (-1/2) *
>> util::my_function(x)`.  This would of course extend to more elaborate
>> expressions.  Also, note that we *are* able to use built-in functions
>> within expressions.
>> 
>> I think it would be good to allow for this.  Are there any issues that
>> would make this difficult?
>> 
>> -Mike
>> 
>> --
>> 
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>> 
>> Sent from my iPhone.
>> 
>> 


Re: Release cadence

2017-03-07 Thread dusenberrymw
+1 for immediately starting work on SystemML 1.0 as our next release.

At this point, the project and our users will benefit most from a thorough 
cleanup, as it will make the project simpler to use and easier to maintain.  
Simplicity will allow users and maintainers to regain focus on ML research and 
products, which is a win for the entire community.  We should create a solid 
list of items that we, and the rest of the community, want to address for the 
1.0 release and make sure that they are indeed completed.  At the same time, we 
should ensure that we don't drag out the release process.

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Mar 6, 2017, at 10:14 AM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> +1 for SystemML 1.0 as the next release.
> 
> On Sat, Mar 4, 2017 at 10:23 AM, Deron Eriksson <deroneriks...@gmail.com>
> wrote:
> 
>> Personally I would like the next release to be 1.0. We have been an
>> incubator project since November 2015 and I believe that after over 1,000
>> commits since then that the project is about ready for a 1.0 release.
>> 
>> I agree with Matthias that we need to make a decision regarding this topic.
>> For new issues and fixed issues in JIRA, we need to be able to assign the
>> correct version, or else someone potentially needs to go through and fix
>> the version numbers, as Glenn has been doing. Additionally, it would be
>> nice to do some of the 1.0 code updates (such as removing the old
>> MLContext) now rather than waiting additional months. Also I would like to
>> be able to correctly identify our next version in the online documentation.
>> 
>> 
> How about just make SystemML Next and change the release name when we do
> the release ?
> 
> 
> 
>> Deron
>> 
>> 
>> On Sat, Mar 4, 2017 at 12:47 AM, Matthias Boehm <mboe...@googlemail.com>
>> wrote:
>> 
>>> thanks Arvind for bringing some structure to the release process. I
>> think a
>>> fixed cadence of 2 months is useful as it makes upcoming releases more
>>> predictable for devs and users.
>>> 
>>> However, we're discussing a major 1.0 release for a while now. I think it
>>> would be useful to come to an agreement if we go for 1.0 in April or not.
>>> There are some pending changes such as removing the old MLContext,
>> removing
>>> the file-based transform, isolating the matrix block library, and some
>>> language changes that should only be addressed in a major release as they
>>> break backwards compatibility. Right now, we can't touch these changes
>>> without knowing the target release.
>>> 
>>> Personally, I don't see a good reason why we should wait. Postponing this
>>> major release just creates unnecessary overhead in maintaining these old
>>> components that will be removed eventually. Since we cut RC for 0.13 on
>> Feb
>>> 20, I think having an RC around April 20 would be a good target for this
>>> 1.0 release.
>>> 
>>> 
>>> Regards,
>>> Matthias
>>> 
>>> 
>>> On Fri, Mar 3, 2017 at 5:44 PM, Arvind Surve <ac...@yahoo.com.invalid>
>>> wrote:
>>> 
>>>> Based on last couple of release cycles, we will continue with 2 months
>>>> release cycles. We will do first RC build by end of first week of second
>>>> month.
>>>> We will plan on releasing next release by end of April 2017. We will
>> have
>>>> RC build on ~April 6th.  -Arvind
>>>> Arvind Surve | Spark Technology Center  | http://www.spark.tc/
>>>> 
>>>>  From: Acs S <ac...@yahoo.com.INVALID>
>>>> To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.
>>>> apache.org>
>>>> Sent: Monday, January 9, 2017 11:41 AM
>>>> Subject: Re: Release cadence
>>>> 
>>>> We need to release SystemML on more frequent basis to get community
>>>> engaged. It will provide us more feedback on functionality we add. While
>>>> releasing SystemML on monthly basis is challenge due to longer phase of
>>>> validation process we need to find a way to be quicker.
>>>> I can propose options to get closer to monthly release if acceptable.
>>>> Make every two releases available on monthly basis and third on two
>>> months
>>>> basis. This cycle will continue.
>>>> 1. Do minimal testing on two releases (minor releases) and release them
>>> on
>>>> monthly basis. Pe

Re: Dropping Java 6 and 7 support

2017-03-07 Thread dusenberrymw
+1

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Mar 7, 2017, at 10:49 AM, Niketan Pansare <npan...@us.ibm.com> wrote:
> 
> +1
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 
> Berthold Reinwald---03/06/2017 11:16:19 PM---+1 on removing java 6 and 7. 
> Regards,
> 
> From: Berthold Reinwald/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 03/06/2017 11:16 PM
> Subject: Re: Dropping Java 6 and 7 support
> 
> 
> 
> 
> +1 on removing java 6 and 7.
> 
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail: reinw...@us.ibm.com
> 
> 
> 
> From:   Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date:   03/06/2017 10:58 PM
> Subject:Dropping Java 6 and 7 support
> 
> 
> 
> Hi all,
> 
> I'd like to drop the support for Java 6 and 7 in our SystemML 1.0 release.
> Our build still refers to a java compliance level 6, which has not been
> changed for more than 5 years now. Spark >= 1.5 anyway requires Java 7 and
> there has been some discussion on removing Java 7 as well because it
> reached end of life in April 2015. Moving to Java 8 would allow us to
> modernize the code base going forward and the 1.0 release would be the
> perfect time for this change.
> 
> Regards,
> Matthias
> 
> 
> 
> 
> 
> 
> 


Re: [DISCUSS] SystemML Graduation

2017-03-03 Thread dusenberrymw
+1

Thanks for bringing up this topic, Luciano.  I definitely think it is the right 
time to start discussing graduation.  The past 16 months have shown a sustained 
and growing level of commitment to the project, with several exciting new areas 
of development that the community is continuing to work on.  As a community, 
we've grown to value and embrace the Apache process, and it's allowed us to 
hold effective public discussions on code, branding, etc., to the benefit of 
the project. 

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Mar 3, 2017, at 5:41 PM, Nakul Jindal <naku...@gmail.com> wrote:
> 
> +1
> 
> Thank you Luciano for starting this discussion and the guidance you've
> provided on this project.
> In addition to the aforementioned accomplishments of the project, the
> roadmap (which has been on the mailing list) also directs us towards making
> continued healthy progress.
> 
> Nakul Jindal
> 
> 
>> On Fri, Mar 3, 2017 at 5:00 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:
>> 
>> +1
>> 
>> Thank you Luciano for starting the discussion and for all the guidance
>> you've provided from the beginning of the project. I agree that the Apache
>> SystemML community has grown and achieved many exciting things during
>> incubation. For example, today we completed our fifth release of Apache
>> SystemML after releasing previous version in February. Graduating to a
>> top-level project will be another important accomplishment and help
>> continue momentum with developers and users.
>> 
>> Regards,
>> Glenn
>> 
>> 
>> Deron Eriksson ---03/03/2017 12:02:35 PM---+1 Thank you for starting this
>> important discussion Luciano, and thank you for
>> 
>> From: Deron Eriksson <deroneriks...@gmail.com>
>> To: dev@systemml.incubator.apache.org
>> Date: 03/03/2017 12:02 PM
>> Subject: Re: [DISCUSS] SystemML Graduation
>> --
>> 
>> 
>> 
>> +1
>> 
>> Thank you for starting this important discussion Luciano, and thank you for
>> all the guidance that you have provided us regarding the Apache Incubator,
>> the Apache Software Foundation, and open-source software development! I'd
>> also like to thank Henry for all the great assistance and hard work since
>> becoming an additional mentor for the project.
>> 
>> I believe that we may indeed be ready to graduate to a top level project
>> due both to our technical efforts and our community efforts. Since we
>> became an incubator project, in terms of code we have consistently
>> demonstrated a high level of excellent activity from a wide range of
>> contributors. We have 1,065 commits since we became an incubator project
>> and have closed 391 pull requests in that time. Additionally, over time, we
>> have all learned many best practices and Apache guidelines, for example how
>> to properly validate our source releases in terms of content and licenses.
>> We have also learned the processes involved with topics such as JIRA,
>> GitHub, Git, Subversion, and software releases, and how to interact with
>> groups such as Apache infrastructure to effectively develop open-source
>> software following the Apache way.
>> 
>> I think everyone on the SystemML project has also worked hard to build an
>> open community around the project. We have open discussions on technical
>> matters, especially in the area of pull requests, and these discussions
>> demonstrate a consistent ability to reach consensus while allowing
>> respectful disagreement. I believe our mailing list could be used more
>> frequently, since it offers a more centralized location for discussions
>> (compared to pull request discussions), which could be an additional way to
>> help the community. However, we do have important discussions on the
>> mailing list, for example in regards to questions from users, and
>> communication on the mailing list is positive and encouraging to community
>> growth.
>> 
>> Deron
>> 
>> 
>> On Thu, Mar 2, 2017 at 5:14 PM, Luciano Resende <luckbr1...@gmail.com>
>> wrote:
>> 
>>> It has been an exciting 16 months so far, and the project has
>> accomplished
>>> 4 official Apache Releases and is currently requesting the IPMC to
>> approve
>>> the 5th release. We have voted 3 new committers and PPMC members and
>>> welcomed a ne

Re: [VOTE] Apache SystemML 0.13.0-incubating (RC2)

2017-02-27 Thread dusenberrymw
+1

Installed the Python package using the URL and ran a quick sanity test using 
MLContext.  The Python package is installed correctly, and the JAR is 
seamlessly installed in the background as desired.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 23, 2017, at 1:46 PM, Nakul Jindal <naku...@gmail.com> wrote:
> 
> +1
> 
> Basic sanity tests pass on Mac.
> 
> On Thu, Feb 23, 2017 at 1:14 PM, Deron Eriksson <deroneriks...@gmail.com>
> wrote:
> 
>> +1
>> 
>> Performed the following validations for artifacts at
>> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.
>> 13.0-incubating-rc2/
>> :
>> 
>> 1. -bin.tgz/-bin.zip contain disclaimer, license, notice
>> 2. -bin.tgz/-bin.zip licenses reference all included dependencies with
>> correct licenses
>> 3. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar contains
>> disclaimer, license, notice
>> 3. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar contains antlr
>> runtime and wink classes
>> 4. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar license references
>> antlr runtime and wink
>> 5. -python.tgz contains disclaimer, license, notice
>> 6. -python.tgz license references antlr runtime and wink with correct
>> licenses
>> 7. -python.tgz systemml/systemml-java/systemml-0.13.0-incubating.jar
>> contains disclaimer, license, notice
>> 8. -python.tgz systemml/systemml-java/systemml-0.13.0-incubating.jar
>> contains antlr runtime and wink classes
>> 9. -python.tgz systemml/systemml-java/systemml-0.13.0-incubating.jar
>> license references antlr runtime and wink
>> 10. -src.tgz/-src.zip contain disclaimer, license, notice
>> 11. -src.tgz/-src.zip licenses reference all included projects (jquery,
>> etc) with correct licenses
>> 12. -src.tgz/-src.zip contain no binaries (dll, exe, pdb, lib)
>> 13. -src.tgz/-src.zip build project artifacts (mvn clean package -P
>> distribution)
>> 14. -src.tgz/-src.zip SystemML jar runs (hello world)
>> 15. -src.tgz/-src.zip test suite runs (mvn verify)
>> 16. -bin.tgz/-bin.zip runStandaloneSystemML.sh (hello world)
>> 17. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit
>> 2.0.2
>> (hello world)
>> 18. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit
>> 2.1.0
>> (hello world)
>> 19. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar hadoop 2.7 (hello
>> world)
>> 20. -bin.tgz/-bin.zip runStandaloneSystemML.sh (univar stats, haberman
>> data)
>> 21. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit
>> 2.0.2
>> (univar stats, generated data)
>> 22. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit
>> 2.1.0
>> (univar stats, generated data)
>> 23. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar hadoop 2.7
>> default
>> exec mode (univar stats, generated data)
>> 24. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar hadoop 2.7 hadoop
>> exec mode (univar stats, generated data)
>> 25. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar MLContext
>> spark-shell 2.0.2 (univar stats, haberman data)
>> 26. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar MLContext
>> spark-shell 2.1.0 (univar stats, haberman data)
>> 
>> 
>> 
>> On Wed, Feb 22, 2017 at 7:23 PM, Arvind Surve <ac...@yahoo.com.invalid>
>> wrote:
>> 
>>> Please vote on releasing the following candidate as Apache SystemML
>>> version 0.13.0-incubating !
>>> 
>>> The vote is open for at least 72 hours and passes if a majority of at
>>> least 3 +1 PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache SystemML 0.13.0-incubating
>>> [ ] -1 Do not release this package because ...
>>> 
>>> To learn more about Apache SystemML, please see http://systemml.apache.
>>> org/
>>> 
>>> The tag to be voted on is v0.13.0-incubating-rc2 (
>>> ff3e741694e507f64a6b52ee71638bddecabe7af)
>>> 
>>> https://github.com/apache/incubator-systemml/commit/
>>> ff3e741694e507f64a6b52ee71638bddecabe7af
>>> 
>>> The release artifacts can be found at :
>>> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.
>>> 13.0-incubating-rc2/
>>> 
>>> The maven release artifacts, including signatures, digests, etc. can
>>> be found at:
>>> 
>>> https://repository.apache.org/content/repositories/

Re: Proposal to add 'accuracy test suite' before 1.0 release

2017-02-17 Thread dusenberrymw
There is also the possibility of writing the correctness tests completely in 
DML itself, thus allowing an ML researcher / data scientist to easily create 
the tests. For example, the SystemML-NN library has a full test suite written 
entirely in DML in the `nn/test/` directory (i.e. no Java tests) that tests 
mathematical correctness of gradients, as well as general correctness of 
various layers as needed.
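
As an illustration of what a gradient correctness test involves, here is a minimal Python sketch of a finite-difference gradient check.  It is not taken from the `nn/test/` suite: the toy scalar function, the step size, and the tolerance are assumptions chosen only for this example.

```python
def numerical_grad(f, x, h=1e-5):
    """Central-difference estimate of df/dx at a scalar x."""
    return (f(x + h) - f(x - h)) / (2 * h)


def relative_error(a, b):
    """Relative difference between two values, guarded against division by zero."""
    denom = max(abs(a), abs(b), 1e-12)
    return abs(a - b) / denom


def forward(x):
    """Toy 'layer' used for the check: f(x) = x^3 + 2x."""
    return x ** 3 + 2 * x


def backward(x):
    """Hand-written analytical gradient of forward: 3x^2 + 2."""
    return 3 * x ** 2 + 2


def check_gradient(x, tol=1e-6):
    """Return True if the analytical and numerical gradients agree within tol."""
    return relative_error(backward(x), numerical_grad(forward, x)) < tol


results = [check_gradient(x) for x in (-2.0, 0.5, 3.0)]
```

The same pattern carries over to a real layer: perturb each input, recompute the forward pass, and compare the resulting estimate against the gradient returned by the backward function.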

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 17, 2017, at 5:46 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
> 
> +1 for creating tests for the main algorithm scripts. This would be a great
> addition to the project.
> 
> Note that the creation of tests (junit) typically requires some Java skills
> (and knowledge of ml algorithms) whereas a new algorithm script typically
> requires R/Python skills. Therefore, testing of algorithms probably
> requires some focused coordination between 'data scientists' and
> 'developers' to occur for this to happen smoothly for new algorithms.
> 
> Deron
> 
> 
>> On Fri, Feb 17, 2017 at 5:28 PM, <dusenberr...@gmail.com> wrote:
>> 
>> +1 for testing our actual (vs simplified test version) scripts against
>> some metric of choice.  This will allow us to (1) ensure that each script
>> does not have a showstopper bug (engine bug), and (2) that this script is
>> still producing a reasonable mathematical result (math bug).
>> 
>> -Mike
>> 
>> --
>> 
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>> 
>> Sent from my iPhone.
>> 
>> 
>>> On Feb 17, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
>>> 
>>> For now, I have updated our python mllearn tests to compare the
>> prediction of our algorithm to that of scikit-learn:
>> https://github.com/apache/incubator-systemml/blob/
>> master/src/main/python/tests/test_mllearn_numpy.py#L81
>>> 
>>> The test now uses scikit-learn predictions as the baseline and computes
>>> the scores (accuracy score for classifiers and r2 score for regressors). If
>>> the score is greater than 95%, the test passes. Though using this approach,
>>> we do not measure the generalization capability of our algorithm, we at
>>> least ensure that our algorithm performs no worse than scikit-learn under
>>> the default settings. We can make the testing even more rigorous later. The
>>> next step would be to enable these python tests through jenkins.
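
The pass rule described above can be sketched as follows. This is an assumed, standard-library-only illustration and not the code in `test_mllearn_numpy.py`; the helper names `agreement_accuracy`, `agreement_r2`, and `passes` are hypothetical stand-ins for the scikit-learn scoring functions the test uses.

```python
def agreement_accuracy(baseline, predicted):
    """Fraction of predicted labels that match the baseline labels."""
    if len(baseline) != len(predicted):
        raise ValueError("length mismatch")
    matches = sum(1 for b, p in zip(baseline, predicted) if b == p)
    return matches / len(baseline)


def agreement_r2(baseline, predicted):
    """Coefficient of determination of `predicted` relative to `baseline`."""
    mean = sum(baseline) / len(baseline)
    ss_res = sum((b - p) ** 2 for b, p in zip(baseline, predicted))
    ss_tot = sum((b - mean) ** 2 for b in baseline)
    if ss_tot == 0:
        raise ValueError("baseline is constant; r2 is undefined")
    return 1.0 - ss_res / ss_tot


def passes(score, threshold=0.95):
    """Pass rule from the message: the score must be greater than 95%."""
    return score > threshold
```

Note that with a strict "greater than" rule, a classifier that disagrees with the baseline on exactly 1 of 20 samples scores 0.95 and does not pass.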
>>> 
>>> Thanks,
>>> 
>>> Niketan Pansare
>>> IBM Almaden Research Center
>>> E-mail: npansar At us.ibm.com
>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>>> 
>>> Matthias Boehm ---02/17/2017 11:54:02 AM---Yes, this has been discussed
>> a couple of times now, most recently in SYSTEMML-546. It takes quite s
>>> 
>>> From: Matthias Boehm <mboe...@googlemail.com>
>>> To: dev@systemml.incubator.apache.org
>>> Date: 02/17/2017 11:54 AM
>>> Subject: Re: Proposal to add 'accuracy test suite' before 1.0 release
>>> 
>>> 
>>> 
>>> 
>>> Yes, this has been discussed a couple of times now, most recently in
>>> SYSTEMML-546. It takes quite some effort though to create a
>>> sophisticated algorithm-level test suite as done for GLM. So by all
>>> means, please, go ahead and add these tests.
>>> 
>>> However, I would not impose any constraints on the contribution of new
>>> algorithms in that regard, or similarly on tests with simplified
>>> algorithms because it would raise the bar too high.
>>> 
>>> Regards,
>>> Matthias
>>> 
>>> 
>>>> On 2/17/2017 10:48 AM, Niketan Pansare wrote:
>>>> 
>>>> 
>>>> Hi all,
>>>> 
>>>> We currently test the correctness of individual runtime operators using our
>>>> integration tests but not the "released" algorithms. To be fair, we do test
>>>> a subset of "simplified" algorithms on synthetic datasets and compare the
>>>> accuracy with R. Also, we are testing a subset of released algorithms using
>>>> our Python tests, but their intended purpose is only to test the integration
>>>> of the APIs:
>>>> Simplified algorithms:
>>>> https://github.com/apache/incubator-systemml/tree/master/src/test/scripts/applications
>>>> Released algorithms:

Re: Removal of workaround flags

2017-02-16 Thread dusenberrymw
Yeah I want us to look heavily into this problem in the context of deep 
learning algorithms.  I think we should plan on having first-class support for 
DL in our 1.0 release, including efficient (distributed SGD) training (+GPUs) 
and efficient distributed scoring.  Nice thing too is that when we achieve 
this, we'll end up benefiting most of our existing algorithms as well.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 15, 2017, at 12:22 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
> 
> Hi Matthias,
> 
> I am OK with removing this flag, but would prefer that we keep the JIRA open 
> until we are sure that caching is not a bottleneck. I have noticed that the 
> gradients turn sparse as we execute more iterations. Also, cache release 
> time is dependent on the memory budget. Here are the statistics running Lenet 
> on MNIST using 
> https://github.com/apache/incubator-systemml/tree/master/scripts/staging/SystemML-NN/examples
> 
> With 20G driver memory, the statistics after running 10 epochs are as follows:
> Epoch: 10, Iter: 700, Train Loss: 0.20480149054528493, Train Accuracy: 
> 0.984375, Val Loss: 0.026928755962383588, Val Accuracy: 0.9922
> Epoch: 10, Iter: 800, Train Loss: 0.20165772217976913, Train Accuracy: 1.0, 
> Val Loss: 0.027878978005867083, Val Accuracy: 0.9922
> 17/02/14 16:06:58 INFO DMLScript: SystemML Statistics:
> Total elapsed time: 12687.863 sec.
> Total compilation time: 2.168 sec.
> Total execution time: 12685.694 sec.
> Number of compiled Spark inst: 147.
> Number of executed Spark inst: 4.
> Cache hits (Mem, WB, FS, HDFS): 1096424/0/0/2.
> Cache writes (WB, FS, HDFS): 603950/15/8.
> Cache times (ACQr/m, RLS, EXP): 3.704/0.336/61.831/1.242 sec.
> HOP DAGs recompiled (PRED, SB): 0/154885.
> HOP DAGs recompile time: 28.663 sec.
> Functions recompiled: 1.
> Functions recompile time: 0.024 sec.
> Spark ctx create time (lazy): 1.009 sec.
> Spark trans counts (par,bc,col):0/0/2.
> Spark trans times (par,bc,col): 0.000/0.000/3.433 secs.
> Total JIT compile time: 44.711 sec.
> Total JVM GC count: 7459.
> Total JVM GC time: 166.26 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) train 12138.979 sec 1
> -- 2) conv2d_bias_add 10876.708 sec 17362
> -- 3) conv2d_backward_filter 421.303 sec 17200
> -- 4) sel+ 239.660 sec 25881
> -- 5) update 226.687 sec 68800
> -- 6) update_nesterov 223.775 sec 68800
> -- 7) maxpooling_backward 136.709 sec 17200
> -- 8) conv2d_backward_data 134.315 sec 8600
> -- 9) ba+* 118.897 sec 51762
> -- 10) relu_maxpooling 112.283 sec 17362
> -- 11) relu_backward 107.483 sec 34400
> -- 12) uack+ 89.258 sec 34400
> -- 13) r' 74.304 sec 43000
> -- 14) +* 57.193 sec 34400
> -- 15) * 16.493 sec 95178
> -- 16) rand 16.038 sec 8613
> -- 17) / 8.352 sec 86492
> -- 18) rangeReIndex 6.628 sec 17208
> -- 19) + 3.054 sec 96528
> -- 20) uark+ 2.219 sec 43241
> -- 21) sp_csvrblk 2.183 sec 2
> -- 22) rmvar 1.517 sec 1451571
> -- 23) write 1.250 sec 9
> -- 24) - 1.059 sec 86486
> -- 25) createvar 1.026 sec 587259
> -- 26) exp 0.663 sec 17281
> -- 27) *2 0.361 sec 2
> -- 28) uasqk+ 0.277 sec 320
> -- 29) log 0.200 sec 160
> -- 30) uarmax 0.191 sec 17281
> 
> With 5G driver memory, the statistics after running 10 epochs are as follows:
> Epoch: 10, Iter: 700, Train Loss: 0.19313544015858036, Train Accuracy: 1.0, 
> Val Loss: 0.025943927403263182, Val Accuracy: 0.993
> Epoch: 10, Iter: 800, Train Loss: 0.1883995965207449, Train Accuracy: 1.0, 
> Val Loss: 0.0260796819319468, Val Accuracy: 0.9916
> 17/02/14 20:16:40 INFO DMLScript: SystemML Statistics:
> Total elapsed time: 13886.763 sec.
> Total compilation time: 2.148 sec.
> Total execution time: 13884.615 sec.
> Number of compiled Spark inst: 147.
> Number of executed Spark inst: 4.
> Cache hits (Mem, WB, FS, HDFS): 1096422/0/2/2.
> Cache writes (WB, FS, HDFS): 603868/2176/8.
> Cache times (ACQr/m, RLS, EXP): 3.883/0.343/271.757/1.312 sec.
> HOP DAGs recompiled (PRED, SB): 0/154885.
> HOP DAGs recompile time: 28.290 sec.
> Functions recompiled: 1.
> Functions recompile time: 0.023 sec.
> Spark ctx create time (lazy): 0.981 sec.
> Spark trans counts (par,bc,col):0/0/2.
> Spark trans times (par,bc,col): 0.000/0.000/3.501 secs.
> Total JIT compile time: 45.131 sec.
> Total JVM GC count: 7605.
> Total JVM GC time: 157.716 sec.
> Heavy hitter instructions (name, time, count):
> -- 1) train 13301.811 sec 1
> -- 2) conv2d_bias_add 11890.291 sec 17362
> -- 3) conv2d_backward_filter 416.645 sec 17200
> -- 4) ba+* 252.966 sec 51762
> -- 5) sel+ 237.334 sec 25881
> -- 6) update 228.261 sec 68800
> -- 7) update_nesterov 225.383 sec 68800
> -- 8) m
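
To compare the two statistics dumps quoted above, the cache release time can be expressed as a share of total execution time. The sketch below is an illustration only: it assumes the four values on the "Cache times (ACQr/m, RLS, EXP)" line are acquire-read, acquire-modify, release, and export, in that order, and the helper names are hypothetical. The numbers are copied from the two runs in the message.

```python
import re


def parse_cache_times(line):
    """Parse a 'Cache times (ACQr/m, RLS, EXP): a/b/c/d sec.' statistics line.

    Assumption: the values are acquire-read, acquire-modify, release, export.
    """
    match = re.search(r"Cache times \(ACQr/m, RLS, EXP\):\s*([\d./]+) sec", line)
    if match is None:
        raise ValueError("not a cache-times line")
    acq_read, acq_modify, release, export = (
        float(value) for value in match.group(1).split("/")
    )
    return {
        "acq_read": acq_read,
        "acq_modify": acq_modify,
        "release": release,
        "export": export,
    }


# Values copied from the two runs quoted in the message above.
RUNS = {
    "20G": {
        "line": "Cache times (ACQr/m, RLS, EXP): 3.704/0.336/61.831/1.242 sec.",
        "exec_sec": 12685.694,
    },
    "5G": {
        "line": "Cache times (ACQr/m, RLS, EXP): 3.883/0.343/271.757/1.312 sec.",
        "exec_sec": 13884.615,
    },
}


def release_share(run):
    """Cache release time as a fraction of total execution time."""
    return parse_cache_times(run["line"])["release"] / run["exec_sec"]


if __name__ == "__main__":
    for name, run in RUNS.items():
        print(f"{name}: release share = {release_share(run):.2%}")
```

Under that reading, release time is about 0.5% of execution time with 20G of driver memory and about 2.0% with 5G, which matches the observation that cache release time depends on the memory budget.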

Re: Namespace handling w/ imports

2017-02-13 Thread dusenberrymw
Thanks, Matthias for bringing this up.  As Glenn pointed out, the full file 
path as the namespace is needed so that we can effectively build 
libraries/packages for SystemML, rather than just single-file scripts.  If you 
truncate the namespace down to just the name of the specific file, then you 
prevent the ability to build a library in which the same file name is used in 
multiple folders.

Another note on the example presented: assuming that you are running the 
`mnist_lenet-train.dml` script, the `train` and `predict` functions are defined 
in `mnist_lenet.dml`, which is imported, so those functions would not be in the 
default namespace.
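
The collision argument above can be sketched in a few lines of Python. This is only an illustration of the idea and not how the SystemML parser resolves namespaces; the helper names and the second `dropout.dml` path are hypothetical.

```python
import posixpath


def short_namespace(path):
    """Namespace derived from the file name only, e.g. 'dropout'."""
    return posixpath.splitext(posixpath.basename(path))[0]


def full_namespace(path):
    """Namespace derived from the full (normalized) file path."""
    return posixpath.normpath(path)


def register(files, namespace_of):
    """Map namespace -> source file and collect any collisions."""
    registry, collisions = {}, []
    for path in files:
        namespace = namespace_of(path)
        if namespace in registry:
            collisions.append((namespace, registry[namespace], path))
        else:
            registry[namespace] = path
    return registry, collisions


# The second dropout.dml is a made-up file used to show the clash.
FILES = [
    "nn/layers/dropout.dml",
    "mylib/experimental/dropout.dml",
    "mnist_lenet.dml",
]
```

With `short_namespace`, the two `dropout.dml` files map to the same namespace and one is reported as a collision; with `full_namespace`, all three files get distinct namespaces.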

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 12, 2017, at 10:15 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:
> 
> Use of source filenames instead of default namespace helped address various 
> issues and tasks under https://issues.apache.org/jira/browse/SYSTEMML-590 
> that were encountered when creating the SystemML-NN script library. Unit 
> tests were also added to cover different import scenarios. As I recall, 
> function name conflicts could potentially occur between independent source 
> files when the global default namespace was used. It also helped simplify 
> calling dml-bodied functions when a file was imported by another.
> 
> Thanks,
> Glenn
> 
> 
> Matthias Boehm ---02/12/2017 12:30:35 AM---While debugging our mnist_lenet 
> script, I encountered an issue with our namespace handling with imp
> 
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 02/12/2017 12:30 AM
> Subject: Namespace handling w/ imports
> 
> 
> 
> 
> While debugging our mnist_lenet script, I encountered an issue with our 
> namespace handling with imports. Here is the related function call graph 
> (after inlining):
> 
> FUNCTION CALL GRAPH
> --MAIN PROGRAM
> .\mnist_lenet.dml::train
> --.\nn/layers/dropout.dml::forward
> --.\mnist_lenet.dml::predict
> 
> but it should read as follows
> 
> FUNCTION CALL GRAPH
> --MAIN PROGRAM
> .defaultNS::train
> --dropout::forward
> --.defaultNS::predict
> 
> The namespace handling was changed a while ago. So my question is: was 
> there a necessity to encode the filenames in the namespace or is this 
> just a bug?
> 
> 
> Regards,
> Matthias
> 
> 
> 
> 


Re: Pull Request Reviews

2017-02-13 Thread dusenberrymw
Thanks, Deron, for bringing up this topic!  PRs, and the associated 
discussions, are a critical part of any modern, successful open source project. 
As Deron stated, anyone in the community should feel free to review PRs -- we 
want your thoughts and opinions and greatly appreciate your help!

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 3, 2017, at 6:55 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
> 
> Hi,
> 
> Reviewing pull requests is a great way to contribute to the success of
> SystemML. If you are involved in any way with SystemML, please consider
> reviewing pull requests. Everyone can review pull requests, and it is a
> great way to gain experience with the project.
> 
> Thanks!
> Deron
> 
> 
> Username PRs Reviewed
> mboehm7 134
> dusenberrymw 112
> deroneriksson 110
> niketanpansare 40
> gweidner 31
> shirisht 26
> akchinSTC 25
> nakul02 23
> bertholdreinwald 15
> lresende 12
> frreiss 12
> fschueler 9
> Wenpei 7
> asurve 5
> iyounus 4
> MechCoder 3
> MadisonJMyers 3
> oza 2
> fmakari 2
> rightwaitforyou 2
> ethanyxu 1
> ckadner 1
> petro-rudenko 1
> hsaputra 1
> FelixNeutatz 1
> nishi-t 0
> sandeep-n 0
> romeokienzler 0
> tgamal 0
> taasawat 0
> sourav-mazumder 0
> kevin-bates 0
> kakal 0
> GrapeBaBa 0
> objectadjective 0
> nmanchev 0
> jodersky 0
> jdyer1 0
> gmlewis 0
> aloknsingh 0
> akunft 0
> ahmaurya 0
> 
> 
> -- 
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/


Re: Removal of workaround flags

2017-02-13 Thread dusenberrymw
Thanks for bringing up the topic.  Our deep learning scripts (i.e. algorithms 
with several intermediate transformations) have shown cache release times to be 
a major bottleneck, thus leading to the creation of SYSTEMML-1140.  
Specifically, what did you use to attempt to reproduce 1140?


-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 12, 2017, at 12:30 AM, Matthias Boehm <mboe...@googlemail.com> wrote:
> 
> SYSTEMML-1140


Re: Remove documentation for old MLContext API

2017-02-02 Thread dusenberrymw
+1 for removing that old documentation.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 2, 2017, at 3:54 PM, fschue...@posteo.de wrote:
> 
> As a step to deprecate the old MLContext API, I suggest removing its 
> documentation for the next release (together with a deprecation of the actual 
> API so that we can remove it in 1.0).
> 
> Currently the section about the old API is placed in between up-to-date 
> documentation and makes it pretty confusing to see what is old and what is 
> new.
> 
> Any objections? Alternatively we could put it all the way to the end or in a 
> separate document.
> 
> -Felix


Re: February Podling Report

2017-02-01 Thread dusenberrymw
LGTM. Thanks, Deron!

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Feb 1, 2017, at 2:33 AM, Matthias Boehm <mboe...@googlemail.com> wrote:
> 
> optionally, we could include the following paper that we presented at CIDR'17 
> in January.
> 
> Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski, Shirish 
> Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product Optimization 
> and Operator Fusion for Large-Scale Machine Learning, CIDR 2017.
> 
> Regards,
> Matthias
> 
>> On 2/1/2017 7:30 AM, Deron Eriksson wrote:
>> Hi,
>> 
>> I posted our SystemML podling report for February to:
>> https://wiki.apache.org/incubator/February2017
>> 
>> Please feel free to make any additions or modifications, such as individual
>> efforts to help build our project community. If you don't have write access
>> to the wiki, please request write access or ask Mike, Luciano, or me to
>> make any additions or modifications.
>> 
>> Thanks,
>> Deron
>> 


Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2)

2017-01-31 Thread dusenberrymw
+1

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 31, 2017, at 4:12 PM, Berthold Reinwald <reinw...@us.ibm.com> wrote:
> 
> +1
> 
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail: reinw...@us.ibm.com
> 
> 
> 
> From: Glenn Weidner/Silicon Valley/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 01/31/2017 08:55 AM
> Subject: Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2)
> 
> 
> 
> Yes (used same data as in 
> https://github.com/apache/incubator-systemml/tree/master/src/main/python/tests
> ).
> 
> +1
> 
> Thanks,
> Glenn
> 
> 
> Berthold Reinwald---01/31/2017 08:36:24 AM---Thanks, Glenn. Did you run 
> LinearRegression, etc. in the Python Notebook? Regards,
> 
> From: Berthold Reinwald/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 01/31/2017 08:36 AM
> Subject: Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2)
> 
> 
> 
> Thanks, Glenn. Did you run LinearRegression, etc. in the Python Notebook?
> 
> 
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail: reinw...@us.ibm.com
> 
> 
> 
> From: Glenn Weidner/Silicon Valley/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 01/28/2017 11:55 AM
> Subject: Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2)
> 
> 
> 
> Verified python artifact functionality in python 2.7 notebook with new 
> spark 1.6 instance via:
> 
> !pip install 
> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.12.0-incubating-rc2/systemml-0.12.0-incubating-python.tgz
> 
> 
> 
> Successfully ran LinearRegression, LogisticRegression, NaiveBayes, SVM.
> 
> Regards,
> Glenn
> 
> Glenn Weidner---01/27/2017 05:46:55 PM---Thank you Matthias! I definitely 
> agree with bringing the S, M, L scenarios to within a few days.
> 
> From: Glenn Weidner/Silicon Valley/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 01/27/2017 05:46 PM
> Subject: Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2)
> 
> 
> 
> Thank you Matthias! I definitely agree with bringing the S, M, L scenarios 
> to within a few days.
> 
> Yes, for m-svm, the classes argument was default of 150 whereas maxiter 
> was set to 3 (instead of 20). I ran tests with both 0.11 and 0.12 RC1/RC2 
> on same cluster for comparison and will share results separately.
> 
> Thanks,
> Glenn
> 
> 
> Matthias Boehm ---01/27/2017 03:45:47 PM---Thanks Glenn. Could you please 
> also share the measurements (maybe in a jira).
> 
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 01/27/2017 03:45 PM
> Subject: Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2)
> 
> 
> 
> Thanks Glenn. Could you please also share the measurements (maybe in a 
> jira).
> 
> Furthermore, seeing that you ran only a subset of multinomial 
> experiments makes me wonder if you used the current default 
> configuration of 150 classes. In the recent past, we usually ran this 
> perftest with a more reasonable number of about 20; the 150-class default 
> significantly impacts performance because broadcast constraints are 
> exceeded. Given the goal of a fast release process, we might want to update 
> the perftest to bring scenarios S, M, and L to something like 2 days.
> 
> 
> Regards,
> Matthias
> 
>> On 1/28/2017 12:10 AM, Glenn Weidner wrote:
>> Successfully completed performance testing including medium and large data
>> sets for Binomial, Clustering, Multinomial (subset), Regression, and
>> Statistics tests.
>> 
>> Regards,
>> Glenn
>> 
>> 
>> 
>> 
>> From: Arvind Surve <ac...@yahoo.com.INVALID>
>> To: Dev <dev@systemml.incubator.apache.org>
>> Date: 01/26/2017 02:50 PM
>> Subject: [VOTE] Apache SystemML 0.12.0-incubating (RC2)
>> 
>> 
>> 
>> Please vote on releasing the following candidate as Apache SystemML
>> version 0.12.0-incubating !
>> 
>> The vote is open for at least 72 hours and passes if a majority of at
>> least 3 +1 PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache SystemML 0.12.0-incubating
>> [ ] -1 Do not release this package because ...
>> 
>> To learn more about Apache SystemML, please see http://systemml.apache.org/
>> 
>> The tag to be voted on is v0.12.0-incubating-rc2
>> (d96a17f64cef7f251d9592679ecdee7ac17feb04)
>> 
>> 
> https://github.com/apache/incuba

Re: Jira Notifications

2017-01-25 Thread dusenberrymw
I was under the impression that the issues mailing list contained all the 
general JIRA notifications.  From what you said, it sounds like that may not be 
the case anymore.  Perhaps we should open a ticket with Infrastructure?

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 23, 2017, at 3:04 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> Few questions about Jira Notifications
> 
> 1- When are notifications being sent (at least when they get created ?)
> 2- Which list ? (I searched dev and issues for an example and didn't find
> SYSTEMML-1191)
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


Status of `mlpipeline_test` branch

2017-01-25 Thread dusenberrymw
Hi all,

On our Git repo, there is currently a `mlpipeline_test` branch.  Is this still 
needed?  If not, I would like to delete it.

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.



Re: [DISCUSS] Roadmap SystemML 1.0

2017-01-16 Thread dusenberrymw
Yeah using the target release would be good. Actually, with that in mind, I 
believe that we have been marking closed issues since the 0.11 release as 
targeting an upcoming "1.0" release, but it would probably be more correct to 
update those to "0.12" since we decided to release 0.12. In addition, we should 
set the target of the Spark 2.x support issue to "0.13".

As for the roadmap, it would be good to update the website with a high-level 
overview, with links to associated JIRA issues.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 16, 2017, at 7:35 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> Instead of Epic, we could use the target release ? Also, we have a roadmap
> page on the site and we should keep that up to date, or get rid of that and
> use roadmap on jira.
> 
>> On Mon, Jan 16, 2017 at 6:20 PM <dusenberr...@gmail.com> wrote:
>>
>> Now that we've had some discussion here, it would be good to transfer this
>> discussion into a JIRA epic, containing sub tasks. That way, we can
>> properly track our progress on these items and facilitate contributions
>> from the community.  Note that some of the sub tasks may already exist as
>> individual issues.
>>
>> Would anyone in the community like to volunteer for creating these issues?
>>
>> - Mike
>>
>> --
>>
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>>
>> Sent from my iPhone.
>>
>>
>>>> On Jan 4, 2017, at 6:00 PM, dusenberr...@gmail.com wrote:
>>>
>>> Overall, this is a good list of items that should be worked on,
>>> particularly because it contains several user-facing items.  However, to
>>> echo what Luciano said, I'm also concerned about the timeline.  At this
>>> stage, I agree that we need to release more often, and with a more
>>> user-oriented "product" focus as a guide for timelines.  I.e. we should
>>> orient our release timelines around items that focus on the "product" of
>>> allowing the user to work on a wide range of ML problems in a simple and
>>> easy manner on top of Spark.
>>>
>>> With that in mind, I agree that a focus on a subset of (1) and (2) would
>>> be good for an immediate release, with a particular focus on Spark 2.0
>>> support as a priority.
>>>
>>> How about we aim for a February 1st release date for the initial items?
>>>
>>> -Mike
>>>
>>> --
>>>
>>> Mike Dusenberry
>>> GitHub: github.com/dusenberrymw
>>> LinkedIn: linkedin.com/in/mikedusenberry
>>>
>>> Sent from my iPhone.
>>>
>>>
>>>> On Jan 3, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
>>>>
>>>> Hi Matthias,
>>>>
>>>> Thanks for the detailed roadmap.
>>>>
>>>> +1 for all the items with few modifications.
>>>>
>>>> 1) APIs and Language:
>>>> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
>>>>>> Ensure Python and Scala MLContext have same API capability.
>>>>
>>>> * Remove old MLContext
>>>> * Consolidate MLContext and JMLC
>>>> * Full support for Scala/Python DSLs
>>>>>> +1 for Python DSL except for push-down of loop structures and functions.
>>>>
>>>> * Remove old file-based transform
>>>> * Scala/Python wrappers for all existing algorithms
>>>> * Data converters (additional formats: e.g., libsvm; performance)
>>>>
>>>> 2) Updated Dependencies:
>>>> * Spark 2.0 support
>>>> * Matrix block library (isolated jar)
>>>>
>>>> 3) Compiler/Runtime Features:
>>>> * GPU support (full compiler and runtime support)
>>>>>> Can we break this down into phases:
>> https://issues.apache.o

Re: Broken Website Menu On iOS

2017-01-16 Thread dusenberrymw
Awesome!  Thanks, Jeremy (& Dexter)!  I just discovered it, so there's not an 
issue created yet -- can you create one?

Thanks!

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 16, 2017, at 6:40 PM, Jeremy Anderson <jer...@objectadjective.com> 
> wrote:
> 
> Dexter and I will pick this up. Is there an issue for this already?
> 
> ...
> 
> Jeremy Anderson
> 
> Github: https://github.com/objectadjective
> Twitter: https://twitter.com/ObjectAdjective
> LinkedIn: http://www.linkedin.com/in/objectadjective
> 
>> On 16 January 2017 at 18:27, <dusenberr...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> It appears that the main website drop-down menus (Community, Apache) are
>> broken on iOS browsers (iPhone).  By "broken", I mean that it is not
>> possible to click on the down-arrow to expand those drop-down menus.
>> 
>> 1. Can someone check if this is also the case on Android browsers?  In
>> Chrome with mobile rendering?
>> 2. Would someone like to volunteer to fix this?
>> 
>> -Mike
>> 
>> --
>> 
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>> 
>> Sent from my iPhone.
>> 
>> 


Broken Website Menu On iOS

2017-01-16 Thread dusenberrymw
Hi all,

It appears that the main website drop-down menus (Community, Apache) are broken 
on iOS browsers (iPhone).  By "broken", I mean that it is not possible to click 
on the down-arrow to expand those drop-down menus.

1. Can someone check if this is also the case on Android browsers?  In Chrome 
with mobile rendering?
2. Would someone like to volunteer to fix this?

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.



Re: [DISCUSS] Roadmap SystemML 1.0

2017-01-16 Thread dusenberrymw
Now that we've had some discussion here, it would be good to transfer this 
discussion into a JIRA epic, containing sub tasks. That way, we can properly 
track our progress on these items and facilitate contributions from the 
community.  Note that some of the sub tasks may already exist as individual 
issues.

Would anyone in the community like to volunteer for creating these issues?

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 4, 2017, at 6:00 PM, dusenberr...@gmail.com wrote:
> 
> Overall, this is a good list of items that should be worked on, particularly 
> because it contains several user-facing items.  However, to echo what Luciano 
> said, I'm also concerned about the timeline.  At this stage, I agree that we 
> need to release more often, and with a more user-oriented "product" focus as 
> a guide for timelines.  I.e. we should orient our release timelines around 
> items that focus on the "product" of allowing the user to work on a wide 
> range of ML problems in a simple and easy manner on top of Spark.
> 
> With that in mind, I agree that a focus on a subset of (1) and (2) would be 
> good for an immediate release, with a particular focus on Spark 2.0 support 
> as a priority.
> 
> How about we aim for a February 1st release date for the initial items?
> 
> -Mike
> 
> --
> 
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
> 
> Sent from my iPhone.
> 
> 
>> On Jan 3, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
>> 
>> Hi Matthias,
>> 
>> Thanks for the detailed roadmap. 
>> 
>> +1 for all the items with few modifications.
>> 
>> 1) APIs and Language:
>> * Cleanup new MLContext (matrix/frame data types, move tests, etc)
>> >> Ensure Python and Scala MLContext have same API capability.
>> 
>> * Remove old MLContext
>> * Consolidate MLContext and JMLC
>> * Full support for Scala/Python DSLs
>> >> +1 for Python DSL except for push-down of loop structures and functions. 
>> 
>> * Remove old file-based transform
>> * Scala/Python wrappers for all existing algorithms
>> * Data converters (additional formats: e.g., libsvm; performance)
>> 
>> 2) Updated Dependencies:
>> * Spark 2.0 support
>> * Matrix block library (isolated jar)
>> 
>> 3) Compiler/Runtime Features:
>> * GPU support (full compiler and runtime support)
>> >> Can we break this down into phases: 
>> >> https://issues.apache.org/jira/browse/SYSTEMML-445 ? We can discuss the 
>> >> timeline of the phases in the JIRA.
>> 
>> * Compressed linear algebra v2
>> * Code generation (automatic operator fusion)
>> * Extended parfor (full spark exploitation, micro-batch support)
>> * Scale-up architecture (large dense blocks, numa)?
>> 
>> 4) Tools
>> * Extended stats (task locality, shuffle, etc)
>> * Cloud resource advisor (extended resource optimizer)?
>> 
>> 5) Algorithms
>> * Graduate "staging" algorithms (robustness/performance)
>> * Perftest: include all algorithms into automated performance tests
>> >> via spark-submit + via Scala/Python wrappers
>> 
>> * Simplify usage decision trees, random forest, mlogreg, msvm 
>> (preprocessing, label representation, etc)
>> >> + command-line variable naming. For example: maxi, maxiter, etc.
>> 
>> Thanks,
>> 
>> Niketan Pansare
>> IBM Almaden Research Center
>> E-mail: npansar At us.ibm.com
>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>> 
>> Matthias Boehm ---01/03/2017 02:44:39 PM---Yes indeed, most of (3) and (4) 
>> can be done incrementally. For (5), some of the changes might also
>> 
>> From: Matthias Boehm <mboe...@googlemail.com>
>> To: dev@systemml.incubator.apache.org
>> Date: 01/03/2017 02:44 PM
>> Subject: Re: [DISCUSS] Roadmap SystemML 1.0
>> 
>> 
>> 
>> 
>> Yes indeed, most of (3) and (4) can be done incrementally. For (5), some 
>> of the changes might also modify the signature of algorithms (i.e., 
>> parameters and required input data) but it would help, for example with 
>> decision trees, as users no longer need to dummy code their inputs.
>> 
>> Generally, I'm fine with making (3), (4), and part of (5) optional and 
>> let the "must-have" features from (1) and (2) determine the timeline.
>> 
>> Regards,
>> Matthias
>> 
>> On 1/3/2017 11:27 PM, Lucian

Re: SystemML Branch for any fixes related to Spark 1.6x

2017-01-13 Thread dusenberrymw
Well, I think the final consensus in the community was that the 0.12 release 
would be the final line that supports Spark 1.6.x.  All future releases will be 
on Spark 2.x.  The idea of supporting both simultaneously was considered, but 
ultimately it was agreed that it just wouldn't be sustainable.

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 13, 2017, at 3:46 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> The changes here are related to 1.x spark releases, right? So the idea here
> is that this becomes a dev stream for Spark 1.6 support and you guys can
> have 0.13, 0.14, 0.15, as required from this branch.
> 
> If you guys want to change, I don't have any objections, please go ahead
> and change.
> 
>> On Fri, Jan 13, 2017 at 1:55 PM, <dusenberr...@gmail.com> wrote:
>> 
>> Thanks, Luciano for creating the branch. Could we rename it to
>> "branch-0.12" to better reflect that any changes that are added would only
>> apply to future bug fix releases on the 0.12.x line?  This would be more in
>> line with the naming scheme that Spark uses for its branches, and should
>> cause less confusion.
>> 
>> --
>> 
>> Mike Dusenberry
>> GitHub: github.com/dusenberrymw
>> LinkedIn: linkedin.com/in/mikedusenberry
>> 
>> Sent from my iPhone.
>> 
>> 
>>> On Jan 13, 2017, at 1:50 PM, Luciano Resende <luckbr1...@gmail.com>
>> wrote:
>>> 
>>> We have created the following branch to track Spark 1.6 fixes :
>>> origin/branch-systemml-spark-1.6
>>> 
>>> Note that fixes that go into master and also affect 1.6 should be
>>> cherry-picked to the 1.6 branch as well.
>>> 
>>> As for checking out, you will need to do something like the steps below
>>> (your preference might change some steps)
>>> 
>>> git checkout -b branch-systemml-spark-1.6 origin/branch-systemml-spark-1.6
>>> git branch --set-upstream-to origin/branch-systemml-spark-1.6 branch-systemml-spark-1.6
>>> 
>>> this last one is like:
>>> 
>>> git branch --set-upstream-to origin/my_remote_branch my_local_branch
>>> 
>>> For creating dev branches for 1.6, first go to your local 1.6 branch and
>>> continue with your regular steps such as git checkout -b JIRA-222
>>> 
>>> And good luck !!!
>>> 
>>> --
>>> Luciano Resende
>>> http://twitter.com/lresende1975
>>> http://lresende.blogspot.com/
>> 
> 
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


Re: SystemML Branch for any fixes related to Spark 1.6x

2017-01-13 Thread dusenberrymw
Thanks, Luciano for creating the branch. Could we rename it to "branch-0.12" to 
better reflect that any changes that are added would only apply to future bug 
fix releases on the 0.12.x line?  This would be more in line with the naming 
scheme that Spark uses for its branches, and should cause less confusion. 

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 13, 2017, at 1:50 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> We have created the following branch to track Spark 1.6 fixes :
> origin/branch-systemml-spark-1.6
> 
> Note that fixes that go into master and also affect 1.6 should be
> cherry-picked to the 1.6 branch as well.
> 
> As for checking out, you will need to do something like the steps below
> (your preference might change some steps)
> 
> git checkout -b branch-systemml-spark-1.6 origin/branch-systemml-spark-1.6
> git branch --set-upstream-to origin/branch-systemml-spark-1.6 branch-systemml-spark-1.6
> 
> this last one is like:
> 
> git branch --set-upstream-to origin/my_remote_branch my_local_branch
> 
> For creating dev branches for 1.6, first go to your local 1.6 branch and
> continue with your regular steps such as git checkout -b JIRA-222
> 
> And good luck !!!
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
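
The cherry-pick step mentioned in the quoted message could look roughly like
the sketch below. It runs in a throwaway repository so it is safe to try. Only
the branch name comes from the thread; the file names, commit messages, and
example identity are made up for illustration, and using `-x` (which records
the original commit hash in the new commit message) is an assumption, not
something the thread prescribes.

```shell
set -e
# Throwaway repository so the sketch never touches a real clone.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Example Dev"

# Initial commit, then a maintenance branch (name borrowed from the thread).
echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch branch-systemml-spark-1.6

# A fix lands on the main line first.
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix that also affects Spark 1.6"
fix_sha=$(git rev-parse HEAD)

# Port the fix to the maintenance branch; -x appends the source commit hash.
git checkout -q branch-systemml-spark-1.6
git cherry-pick -x "$fix_sha" >/dev/null

# Show the resulting commit message, including the "cherry picked from" line.
git log -1 --format=%B
```

In a real clone the maintenance branch would already track its remote, so only
the checkout and cherry-pick lines would apply, followed by a push.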


Re: GSoc 2017

2017-01-12 Thread dusenberrymw
Yeah helping to build out our Python DSL into a full-out replacement for the 
current "DML" language would be great, and we'd be quite supportive!

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 12, 2017, at 2:58 PM, fschue...@posteo.de wrote:
> 
> Hi Krishna,
> 
> cool to see that you're interested in SystemML!
> 
> From your list I personally think that a) and d) would be well suited for 
> projects, especially a good python DSL is a high priority.
> 
> We will apply as an organization to GSoC once organization applications are 
> open (Jan. 19th) and I think we will find mentors for at least a) and d). If 
> you already want to take a look at what is currently there, I suggest looking 
> at our python APIs and documentation. If you want to take on the DSL project 
> it might also be a good idea to look into the DML documentation and related 
> papers to see what we need to support.
> 
> The proposals will probably circulate on the mailinglist, too, so keep an eye 
> on that :)
> 
> -Felix
> 
> Am 12.01.2017 23:13 schrieb Krishna Kalyan:
>> Hello All,
>> Thank you for your wonderful replies.
>> Tasks that I am interested in:
>> a) Support for Python DSLs
>> b) Python wrappers for all existing algorithms
>> c) GPU support
>> d) Perftest : automated performance tests of algorithms
>> I am also willing to work on the tasks that the SystemML community thinks are
>> important.
>> Regards,
>> Krishna
>> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberr...@gmail.com>
>> wrote:
>>> Hi Krishna!  Welcome, and thanks for your interest!
>>> We would definitely be excited to collaborate with you on a GSOC project.
>>> We've started another thread to discuss possible new proposals, and we
>>> would also be quite interested in any particular proposal that you might
>>> like to generate tailored towards your interests.  Copied from the other
>>> thread, some possible ideas could include: building out a full ML demo to
>>> solve a real, large-scale problem that would benefit from a distributed
>>> approach; overall performance improvements that address a full class, or
>>> wider area, of ML algorithms, rather than a single, specific script;
>>> infrastructure for [performance] testing, and identification of wide areas
>>> of improvement; helping with building out fully-featured, clean,
>>> well-tested DSLs in Python & Scala (we've started, but it would be good to
>>> continue stressing them -- we could even aim to replace DML with the DSLs);
>>> etc.  Overall, we want to improve the ability of the user to work on a wide
>>> range of large-scale, distributed ML problems in a simple and easy manner
>>> on top of Spark.
>>> In the meantime, you could explore our recent open issues [1] and even
>>> begin discussions or contributions on any of the items.  You could also
>>> view our recent roadmap discussion thread on the mailing list, starting
>>> with the first email [2]:
>>> [1]:
>>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
>>> [2]:
>>> http://mail-archives.apache.org/mod_mbox/incubator-systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c-bad740599...@gmail.com%3E
>>> - Mike
>>> --
>>> Michael W. Dusenberry
>>> GitHub: github.com/dusenberrymw
>>> LinkedIn: linkedin.com/in/mikedusenberry
>>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1...@gmail.com>
>>> wrote:
>>> > As some folks have described on this thread, it would be great to get you
>>> > familiarized with SystemML.
>>> >
>>> > In parallel, I would look for a mentor from the active committer list and
>>> > start working on a project proposal which could be based on the recent
>>> > Roadmap discussion [1].
>>> >
>>> > If you are looking for some guidance on how Apache participates in GSOC,
>>> > take a look at the following resources [2] and [3], and don't hesitate to
>>> > ask questions here.
>>> >
>>> >
>>> > [1]
>>> > https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg01199.html
>>> > [2] http://community.apache.org/gsoc.html
>>> > [3]
>>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help-
>>> > 

Re: Time To Merge Spark 2.0 Support PR

2017-01-09 Thread dusenberrymw
Let's cut a 0.12 branch tomorrow, and then submit it for the release process on 
Friday. Thoughts?

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 9, 2017, at 12:40 PM, Arvind Surve <ac...@yahoo.com.INVALID> wrote:
> 
> ok, Thanks
> 
>  Arvind Surve | Spark Technology Center | http://www.spark.tc/
> 
>  From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Monday, January 9, 2017 12:33 PM
> Subject: Re: Time To Merge Spark 2.0 Support PR
> 
>> On Mon, Jan 9, 2017 at 12:28 PM, <dusenberr...@gmail.com> wrote:
>> 
>> Right, so we can cut a 0.12 release branch now, and then release from
>> that, while work moves forward on the master branch, including 2.0 support.
>> 
>> 
> Exactly, 0.12 release will come from a branch that we will create and Spark
> 2.0 support gets merged into Master.
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
> 
> 
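
The branching model agreed on in this thread (cut a release branch, release
from it, keep master moving) can be sketched as below, again in a throwaway
repository. The branch name `branch-0.12`, the tag name, and the file names are
illustrative assumptions, not names the thread had settled on at this point.

```shell
set -e
# Throwaway repository standing in for the project repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "dev@example.com"   # hypothetical identity
git config user.name "Example Dev"

echo "0.12 work" > notes.txt
git add notes.txt
git commit -q -m "Last commit intended for 0.12"
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Cut the release branch at the current tip of the main line.
git branch branch-0.12

# Development continues on the main line, e.g. the Spark 2.0 support merge.
echo "spark 2.0 support" > spark2.txt
git add spark2.txt
git commit -q -m "Merge Spark 2.0 support"

# The release candidate is tagged on the release branch, not the main line.
git tag v0.12.0-incubating-rc1 branch-0.12   # hypothetical tag name

# Compare what the release candidate and the main line contain.
release_files=$(git ls-tree --name-only v0.12.0-incubating-rc1)
main_files=$(git ls-tree --name-only "$main_branch")
echo "release candidate: $release_files"
echo "main line: $main_files"
```

The point of the layout is that the release candidate never sees the Spark 2.0
commit, while later bug fixes can still be cherry-picked onto the release
branch.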


Re: Time To Merge Spark 2.0 Support PR

2017-01-09 Thread dusenberrymw
Right, so we can cut a 0.12 release branch now, and then release from that, 
while work moves forward on the master branch, including 2.0 support.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 9, 2017, at 12:26 PM, Acs S <ac...@yahoo.com.INVALID> wrote:
> 
> As I already mentioned we need to have a proper way for Spark 1.6 users to use 
> SystemML Python DSL. Pip Install Artifact was missing from SystemML 0.11 
> release and it needs to be added in SystemML 0.12 release.
> Arvind Surve | Spark Technology Center | http://www.spark.tc
>  From: "dusenberr...@gmail.com" <dusenberr...@gmail.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Monday, January 9, 2017 12:18 PM
> Subject: Re: Time To Merge Spark 2.0 Support PR
> 
> Just to be clear, instead of creating a branch to merge the 2.0 support, we 
> will want to merge the 2.0 support into the master branch.
> 
> --
> 
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
> 
> Sent from my iPhone.
> 
> 
>> On Jan 9, 2017, at 12:02 PM, Acs S <ac...@yahoo.com.INVALID> wrote:
>> 
>> Based on the discussion thread we will start creating a SystemML release based on 
>> Spark 2.0.
>> There are a bunch of activities that need to be completed and we need volunteers for 
>> most of them.
>> 
>> Activity | Volunteer
>> 1. Create a branch based on SystemML 0.12 release to merge Spark 2.0 code | Luciano
>> 2. Get Spark 2.0 PR merged to this new branch. | Glenn
>> 3. Do build changes to have both Spark 1.6 and 2.0 builds for release and PR. | (Someone needs to work with Alan)
>> 4. Setup Spark 2.0 cluster (One of the Almaden cluster updated with Spark 2.0) |
>> 5. Create Release Candidate | Glenn, Deron, Arvind
>> 6. Performance Testing |
>> 7. Notebook testing | Arvind
>> 8. Python DSL verification (2.x and 3.x) |
>> 9. Scala DSL verification |
>> 10. Artifacts verification |
>> 11. Documentation update. |
>> 
>> -Arvind Surve | Spark Technology Center | http://www.spark.tc
>> 
>>   From: Niketan Pansare <npan...@us.ibm.com>
>> To: dev@systemml.incubator.apache.org 
>> Sent: Friday, January 6, 2017 1:12 PM
>> Subject: Re: Time To Merge Spark 2.0 Support PR
>> 
>> I am fine with creating a branch for Spark 1.6 support and merging Spark 2.0 
>> PR then. Like Luciano said, we can create a release 0.12 from our Spark 
>> 1.6 branch. 
>> 
>> Overriding previous release is common practice for pip installer, however 
>> pypi does maintain the history of releases. Once a release candidate 0.12 is 
>> created, the user can install SystemML python package in three ways:
>> 1. From source by checking out the branch and executing: mvn package -P 
>> distribution, followed by pip install 
>> target/systemml-0.12.0-incubating-python.tgz
>> 2. From Apache site, pip install 
>> http://www.apache.org/dyn/closer.lua/incubator/systemml/0.12.0-incubating/systemml-0.12.0-incubating-python.tgz
>> 3. From pypi by specifying the version, pip install -I 
>> systemml_incubating==0.12
>> 
>> As long as we ensure that version of the python package on pypi matches our 
>> release version and we document the Spark support in our release notes, 
>> there should not be any confusion on usage :)
>> 
>> Thanks,
>> 
>> Niketan Pansare
>> IBM Almaden Research Center
>> E-mail: npansar At us.ibm.com
>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>> 
>> Acs S ---01/06/2017 12:57:53 PM---I would agree to create a branch and add 
>> Spark 2.0 to it, while still releasing SystemML 0.12 releas
>> 
>> From: Acs S <ac...@yahoo.com.INVALID>
>> To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org>
>> Date: 01/06/2017 12:57 PM
>> Subject: Re: Time To Merge Spark 2.0 Support PR
>> 
>> 
>> 
>> I would agree to create a branch and add Spark 2.0 to it, while still 
>> releasing SystemML 0.12 release with Pip Install Artifact.
>> Regarding 

Re: Time To Merge Spark 2.0 Support PR

2017-01-09 Thread dusenberrymw
Just to be clear, instead of creating a branch to merge the 2.0 support, we 
will want to merge the 2.0 support into the master branch.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 9, 2017, at 12:02 PM, Acs S <ac...@yahoo.com.INVALID> wrote:
> 
> Based on the discussion thread we will start creating a SystemML release based on 
> Spark 2.0.
> There are a bunch of activities that need to be completed and we need volunteers for 
> most of them.
> 
> Activity | Volunteer
> 1. Create a branch based on SystemML 0.12 release to merge Spark 2.0 code | Luciano
> 2. Get Spark 2.0 PR merged to this new branch. | Glenn
> 3. Do build changes to have both Spark 1.6 and 2.0 builds for release and PR. | (Someone needs to work with Alan)
> 4. Setup Spark 2.0 cluster (One of the Almaden cluster updated with Spark 2.0) |
> 5. Create Release Candidate | Glenn, Deron, Arvind
> 6. Performance Testing |
> 7. Notebook testing | Arvind
> 8. Python DSL verification (2.x and 3.x) |
> 9. Scala DSL verification |
> 10. Artifacts verification |
> 11. Documentation update. |
> 
> -Arvind Surve | Spark Technology Center | http://www.spark.tc
> 
>  From: Niketan Pansare <npan...@us.ibm.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Friday, January 6, 2017 1:12 PM
> Subject: Re: Time To Merge Spark 2.0 Support PR
> 
> I am fine with creating a branch for Spark 1.6 support and merging Spark 2.0 
> PR then. Like Luciano said, we can create a release 0.12 from our Spark 1.6 
> branch. 
> 
> Overriding previous release is common practice for pip installer, however 
> pypi does maintain the history of releases. Once a release candidate 0.12 is 
> created, the user can install SystemML python package in three ways:
> 1. From source by checking out the branch and executing: mvn package -P 
> distribution, followed by pip install 
> target/systemml-0.12.0-incubating-python.tgz
> 2. From Apache site, pip install 
> http://www.apache.org/dyn/closer.lua/incubator/systemml/0.12.0-incubating/systemml-0.12.0-incubating-python.tgz
> 3. From pypi by specifying the version, pip install -I 
> systemml_incubating==0.12
> 
> As long as we ensure that version of the python package on pypi matches our 
> release version and we document the Spark support in our release notes, there 
> should not be any confusion on usage :)
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 
> Acs S ---01/06/2017 12:57:53 PM---I would agree to create a branch and add 
> Spark 2.0 to it, while still releasing SystemML 0.12 releas
> 
> From: Acs S <ac...@yahoo.com.INVALID>
> To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org>
> Date: 01/06/2017 12:57 PM
> Subject: Re: Time To Merge Spark 2.0 Support PR
> 
> 
> 
> I would agree to create a branch and add Spark 2.0 to it, while still 
> releasing SystemML 0.12 release with Pip Install Artifact.
> Regarding comment from Mike, that new SystemML release will update PyPI 
> package. Shouldn't it be tagged with version #? Otherwise every release will 
> override previous one. Niketan, any comments?
> -Arvind
> 
>  From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Friday, January 6, 2017 12:52 PM
> Subject: Re: Time To Merge Spark 2.0 Support PR
>   
> +1 on moving to Spark 2.x - I think we delayed this way too long now and 
> there will always be some awesome feature that we'd want to support on 
> older Spark versions too.
> 
> Regards,
> Matthias
> 
>> On 1/6/2017 9:41 PM, Mike Dusenberry wrote:
>> Well to be fair, a user can still use the Python DSL with the SystemML 0.11
>> release by using `pip install -e src/main/python`.  We just didn't place a
>> separate Python binary on the release website.  Keep in mind as well that
>> once we release the next release with Spark 2.x support, a Spark 1.6 user will
>> not be able to use `pip install systemml` anyway, as that PyPI package will
>> have been updated to the latest Spark 2.0 release.

Re: Parfor semantics

2016-11-22 Thread dusenberrymw
Also for some context, we're aiming to use this for remote hyperparameter 
tuning over a large dataset.  Specifically, each remote process would train a 
separate model over the full dataset using a mini-batch SGD approach.  Has the 
`parfor` construct been used for this purpose before?

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Nov 22, 2016, at 2:01 PM, Matthias Boehm <mboe...@googlemail.com> wrote:
> 
> that's a good catch - thanks Felix. It would be great if you could modify 
> rewriteSetExecutionStategy and rewriteSetFusedDataPartitioningExecution in 
> OptimizerConstrained to handle the respective Spark execution types. Thanks.
> 
> Regards,
> Matthias
> 
>> On 11/22/2016 7:54 PM, fschue...@posteo.de wrote:
>> The constrained optimizer doesn't seem to know about a REMOTE_SPARK
>> execution mode and either sets CP or REMOTE_MR. I can open a jira for
>> that and provide a fix.
>> 
>> Felix
>> 
>> Am 22.11.2016 02:07 schrieb Matthias Boehm:
>>> yes, this came up several times - initially we only supported opt=NONE
>>> where users had to specify all other parameters. Meanwhile, there is a
>>> so-called "constrained optimizer" that does the same as the rule-based
>>> optimizer but respects any given parameters. Please try something like
>>> this:
>>> 
>>> parfor (i in 1:10, opt=CONSTRAINED, par=10, mode=REMOTE_SPARK) {
>>> // some code here
>>> }
>>> 
>>> 
>>> Regards,
>>> Matthias
>>> 
>>>> On 11/22/2016 12:33 AM, fschue...@posteo.de wrote:
>>>> While debugging some ParFor code it became clear that the parameters for
>>>> parfor can be easily overwritten by the optimizer.
>>>> One example is when I write:
>>>> 
>>>> ```
>>>> parfor (i in 1:10, par=10, mode=REMOTE_SPARK) {
>>>>// some code here
>>>> }
>>>> ```
>>>> 
>>>> Depending on the data size and cluster resources, the optimizer
>>>> (OptimizerRuleBased.java, line 844) will recognize that the work can be
>>>> done locally and overwrite it to local execution. This might be valid
>>>> and definitely works (in my case) but kind of contradicts what I want
>>>> SystemML to do.
>>>> I wonder if we should disable this optimization in case a concrete
>>>> execution mode is given and go with the mode that is provided.
>>>> 
>>>> Felix
>>>> 
>>>> 
>> 
>> 


Re: [DRAFT] November monthly report

2016-11-01 Thread dusenberrymw
Looks good. We should also include the VLDB paper award. 

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Nov 1, 2016, at 4:43 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
> 
> Hello,
> 
> Here is a draft of the November monthly report due tomorrow that Felix and
> I put together. Feedback is welcome.
> 
> Deron
> 
> 
> 
> SystemML
> 
> SystemML provides declarative large-scale machine learning (ML) that aims at
> flexible specification of ML algorithms and automatic generation of hybrid
> runtime plans ranging from single node, in-memory computations, to distributed
> computations running on Apache Hadoop MapReduce and Apache Spark.
> 
> SystemML has been incubating since 2015-11-02.
> 
> Three most important issues to address in the move towards graduation:
> 
> - Grow SystemML community: increase mailing list activity,
>   increase adoption of SystemML for scalable machine learning, encourage
>   data scientists to adopt DML and PyDML algorithm scripts, respond to
>   user feedback to ensure SystemML meets the requirements of real-world
>   situations, write papers, and present talks about SystemML.
> - Continue to produce releases.
> - Increase the diversity of our project's contributors and committers.
> 
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware of?
> 
> NONE.
> 
> How has the community developed since the last report?
> Our mailing list from August through October had 375 messages on a wide range
> of topics. We have gained 4 new contributors to the main project since August
> 1st. Our website has been redesigned with the help of several design engineers
> and we have commits from 3 new contributors to the website project. On GitHub,
> the project has been starred 417 times and forked 156 times.
> 
> Niketan Pansare gave a talk with the title "Apache SystemML - Declarative
> Machine Learning at Scale" on October 7th in the CS graduate seminar at UC
> Merced. Matthias Boehm gave a talk on "Compressed Linear Algebra for
> Large-Scale Machine Learning" at TU Dresden on August 30th. We presented the
> papers "Compressed Linear Algebra for Large-Scale Machine Learning" (research
> paper + poster) and "SystemML: Declarative Machine Learning on Spark" (industry
> paper) at VLDB'16, gave two 90 minute tutorials at the BOSS'16 workshop,
> co-located with VLDB'16, and our paper "SPOOF: Sum-Product Optimization and
> Operator Fusion for Large-Scale Machine Learning" has been accepted at CIDR'17.
> 
> How has the project developed since the last report?
> The main project has had 213 commits since August 1. The website project has
> had 51 commits since August 1. Since August 1, 241 issues have been reported
> on our JIRA site and 137 issues have been resolved or closed. 79 pull requests
> have been created since August 1, and 72 pull requests have been closed.
> 
> Date of last release:
> 
> 2016-06-15 (version 0.10.0-incubating)
> 
> When were the last committers or PMC members elected?
> 
> 2016-05-07 Glenn Weidner
> 2016-05-07 Faraz Makari Manshadi
> 
> 


Re: rc3 source-release.zip artifact

2016-10-31 Thread dusenberrymw
Should we include a README with the release artifacts that describes what each 
one is?

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 29, 2016, at 10:55 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:
> 
> In my opinion it can be removed.
> 
> Thanks,
> Glenn
> 
> Deron Eriksson ---10/20/2016 01:36:04 PM---The 0.11.0 rc3 artifacts are 
> located at: https://dist.apache.org/repos/dist/dev/incubator/systemml/0
> 
> From: Deron Eriksson <deroneriks...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 10/20/2016 01:36 PM
> Subject: rc3 source-release.zip artifact
> 
> 
> 
> 
> The 0.11.0 rc3 artifacts are located at:
> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.11.0-incubating-rc3/
> 
> I see the following artifact:
> systemml-0.11.0-incubating-source-release.zip
> 
> I do not recognize this artifact. Can anyone tell me what this artifact is?
> Can it be removed?
> 
> Deron
> 
> 
> 


Re: [DISCUSS] Adding tensorboard-like functionality to SystemML

2016-10-28 Thread dusenberrymw
Visualization is a good topic to bring up for the project. I would like to add 
another possible option of using TensorBoard directly. I have not looked into 
the file format used for TensorBoard, but it may be possible to simply adopt 
that format and write our stats to that type of file. That would allow 
us to reuse that project without having to write our own. 

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 28, 2016, at 8:13 AM, Niketan Pansare <npan...@us.ibm.com> wrote:
> 
> Hi Matthias,
> 
> Thanks for your feedback.
> 
> There is a tradeoff between keeping a feature in-house until it is stable, 
> v/s continually getting community feedback as the work is getting done via PR 
> and discussions. I am for the latter as it encourages community feedback as 
> well as participation.
> 
> I agree that our goal should be to complete the features you mentioned asap 
> and yes, we are working hard towards making the GPU backend, the deep 
> learning built-in functions and the algorithm wrappers (ones that are already 
> added) to be 'non-experimental' in the 1.0 release :) ... Also, like you 
> hinted, it is important to explicitly mark the experimental features in the 
> documentation to avoid the 'bad impression'. The Python DSL will remain 
> experimental until there is more interest from the community. I am fine with 
> deleting the debugger since it is rarely used, if at all.
> 
> Keeping inline with the Apache guidelines, this discussion is to allow 
> community to decide on whether SystemML community should consider adding new 
> visualization functionality (since this feature is user facing). If there is 
> no interest, we can either postpone or discard this discussion :)
> 
> Thanks,
> 
> Niketan.
> 
>> On Oct 28, 2016, at 1:24 AM, Matthias Boehm <mboe...@googlemail.com> wrote:
>> 
>> Thanks for putting this together Niketan. However, could we please 
>> postpone this discussion after our 1.0 release? Right now, I'm concerned 
>> to see that we're adding many experimental features without really 
>> getting them done. This includes for example, the GPU backend, the new 
>> MLContext API, the Python DSL, the deep learning builtin functions, the 
>> Scala algorithm wrappers, the old Spark debugger interface, and 
>> compressed linear algebra. I think we should finish these features first 
>> before moving on. If we're not careful about that, it would quickly 
>> create a very bad impression for new users.
>> 
>> Regards,
>> Matthias
>> 
>>> On 10/28/2016 1:20 AM, Niketan Pansare wrote:
>>> 
>>> 
>>> Hi all,
>>> 
>>> To give everyone context, I am working on a new deep learning API for SystemML
>>> that is backed by the NN library (
>>> https://github.com/apache/incubator-systemml/tree/master/scripts/staging/SystemML-NN/nn
>>> ). This API allows the users to express their model using Caffe
>>> specification and perform fit/predict similar to scikit-learn APIs. I have
>>> created a sample notebook explaining the usage of the API:
>>> https://github.com/niketanpansare/incubator-systemml/blob/1b655ebeec6cdffd66b282eadc4810ecfd39e4f2/samples/jupyter-notebooks/Barista-API-Demo.ipynb
>>> . This API also allows the user to load and store pre-trained models. See
>>> https://github.com/niketanpansare/model_zoo/tree/master/caffe/vision/vgg/ilsvrc12
>>> 
>>> As part of this API, I added a mini-tensorboard like functionality (see
>>> step 6 and 7) using matplotlib. If there is enough interest, we can extend
>>> and standardize the visualization functionality across all other algorithms.
>>> Here are some initial discussion points:
>>> 1. Primary visualization mechanism (Jupyter or a standalone app or both =>
>>> former is useful for cloud offering such as DSX and latter provides the
>>> design team more creative control)
>>> 2. What to plot for each algorithm (data scientists and algorithms
>>> developers will help us here).
>>> 3. Standardize UI (if we decide to go with Jupyter, we need to extend the
>>> code in _visualize method:
>>> https://github.com/niketanpansare/incubator-systemml/blob/1b655ebeec6cdffd66b282eadc4810ecfd39e4f2/src/main/python/systemml/mllearn/estimators.py#L621
>>> )
>>> 4. Primary APIs to target (python, scala, command-line or all)
>>> 
>>> Thanks,
>>> 
>>> Niketan Pansare
>>> IBM Almaden Research Center
>>> E-mail: npansar At us.ibm.com
>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>>> 
>> 
> 


Re: [VOTE] Apache SystemML 0.11.0-incubating (RC4)

2016-10-28 Thread dusenberrymw
+1

I've been running large scale use-case tests and things ran well on rc3 and on 
this rc4. 

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 28, 2016, at 4:08 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> Of course, my +1.
> 
> On Mon, Oct 24, 2016 at 4:11 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
> 
>> Please vote on releasing the following candidate as Apache SystemML
>> version 0.11.0-incubating !
>> 
>> The vote is open for at least 72 hours and passes if a majority of at
>> least 3 +1 PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache SystemML 0.11.0-incubating
>> [ ] -1 Do not release this package because ...
>> 
>> To learn more about Apache SystemML, please see
>> http://systemml.apache.org/
>> 
>> The tag to be voted on is v0.11.0-incubating-rc4 (
>> 6937683b01a13458990e698b0cf04f4f6ccecde3)
>> 
>> https://github.com/apache/incubator-systemml/tree/6937683b01a13458990e698b0cf04f4f6ccecde3
>> 
>> The release artifacts can be found at :
>> 
>> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.11.0-incubating-rc4/
>> 
>> The maven release artifacts, including signatures, digests, etc. can be
>> found at:
>> 
>> https://repository.apache.org/content/repositories/orgapachesystemml-1010/
>> 
>> 
>> =====================================
>> == Apache Incubator release policy ==
>> =====================================
>> Please find below the guide to release management during incubation:
>> http://incubator.apache.org/guides/releasemanagement.html
>> 
>> =======================================
>> == How can I help test this release? ==
>> =======================================
>> If you are a SystemML user, you can help us test this release by taking an
>> existing Algorithm or workload and running on this release candidate, then
>> reporting any regressions.
>> 
>> ================================================
>> == What justifies a -1 vote for this release? ==
>> ================================================
>> -1 votes should only occur for significant stop-ship bugs or legal
>> related issues (e.g. wrong license, missing header files, etc). Minor bugs
>> or regressions should not block this release.
>> 
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>> 
> 
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


Re: Couple of questions on website contents

2016-10-26 Thread dusenberrymw
Overall, the new website looks awesome! Good job everyone!

One concerning issue though is that the site is currently fairly broken on 
mobile.  Can we update the site so that it renders properly on both desktop and 
mobile?

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 26, 2016, at 12:39 AM, Dexter Lesaca <dexter.les...@gmail.com> wrote:
> 
> The site looks really good everyone! Thanks to everyone for churning out
> this awesome site! Thanks for your diligence during the design and development
> process!
> 
> Luciano, your fix is just fine for now until a more encompassing solution
> is designed which we should arrive at soon.
> 
> 
> 
> On Wed, Oct 26, 2016 at 9:20 AM Luciano Resende <luckbr1...@gmail.com>
> wrote:
> 
> I made that change, as I think we need to be able to list all available
> mailing lists, but I didn't want to use the obsolete docs page.
> 
> Thinking more about this, maybe a meet in the middle approach is to use the
> full details on the community page, and revert the front page to focus on
> the dev list ?
> 
> Thoughts?
> 
>> On Wednesday, October 26, 2016, Jason Azares <jason.aza...@gmail.com> wrote:
>> 
>> Hi Deron,
>> 
>> Thanks for publishing the updates, and the site looks great! One thing I
>> noticed is that the "Subscribe to Our Mailing Lists" section does not
>> reflect what the design team originally had. I'm not sure if you were aware
>> of this discrepancy.
>> 
>> [image: Inline image 1]
>> 
>> On Tue, Oct 25, 2016 at 9:53 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
>> 
>>> Hi Luciano,
>>> 
>>> Since the current website updates are major improvements, I have gone ahead
>>> and published the new updates. I think we can now start publishing more
>>> frequently since important parts of the codebase have stabilized.
>>> 
>>> Deron
>>> 
>>> 
>>> On Tue, Oct 25, 2016 at 5:40 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
>>> 
>>>> Hi Luciano,
>>>> 
>>>> Several updates to the website were merged today. I think we're at the
>>>> point where we can publish the new website updates. Do you agree?
>>>> 
>>>> Deron
>>>> 
>>>> 
>>>> On Tue, Oct 25, 2016 at 11:02 AM, Jason Azares <jason.aza...@gmail.com> wrote:
>>>> 
>>>>> Hi Luciano,
>>>>> 
>>>>> Initial page:
>>>>>> - What's the intention of the section just above the social banner? I
>>>>>> noticed it was actually a copy of a section from the community page, but it
>>>>>> looks like the content was duplicated and not extracted to a banner, and I
>>>>>> have changed the one in community to what I think better clarifies the
>>>>>> mailing list, but I am not sure if that's the same intent of the banner on
>>>>>> the initial page.
>>>>> 
>>>>> 
>>>>> Thanks for bringing this point to our attention. The content on the initial
>>>>> page is different from the community page. We wanted to have a call to
>>>>> action to get users to subscribe to the mailing list. We are currently
>>>>> designing this section and will send a pull request once completed.
>>>>> 
>>>>> Navigation Menu:
>>>>>> - The community navigation seems to have gone wild with a few duplications.
>>>>>> We have source code and github links, which are both the same. We also have
>>>>>> the community get involved link that includes a list of committers using
>>>>>> the new design format, but there is also a link to project committers that
>>>>>> include the old page listing all committers.
>>>>> 
>>>>> 
>>>>> Dexter is currently working to resolve this issue. He will send his
>>>>> updates once they are finished.
>>>>> 

Re: SystemML Medium Blog

2016-10-25 Thread dusenberrymw
+1 This sounds like a great idea! It would be nice to include blogs with 
tutorials, fun quick tips and tricks, full case studies, example use cases, 
etc. 

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 25, 2016, at 1:43 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> On Tue, Oct 25, 2016 at 7:32 PM, Madison Myers <madisonjmy...@gmail.com>
> wrote:
> 
>> Hey everyone,
>> 
>> Just a thought on expanding visibility of SystemML: I know lots of us have
>> written some blogs and articles on SystemML and I think it would be great
>> to get these all in the same spot (and also write more)! I've started a
>> SystemML Medium blog for this and would love to:
>> 
>> 1. republish existing blogs on Medium
>> 2. have volunteers write new blogs
>> 
>> The idea would be to have these be linked directly from the website. If you
>> wouldn't mind, I'd love your feedback! If you're up for me republishing
>> articles that you've already written on the SystemML medium account (the
>> author's name will still be yours), please let me know! Also, if you have
>> ideas on topics that the SystemML community should be writing on and/or are
>> up for writing an article or two, let me know as well!
>> 
>> Luciano, do you see any issues from an Apache standpoint?
>> 
>> Thanks!
>> Madison
> 
> 
> +1, just make sure any republishing is done by the blog authors or with their
> explicit permission archived on this mailing list.
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


Re: [VOTE] SystemML New Logo Ideas

2016-10-25 Thread dusenberrymw
+1 that sounds great to me. 

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 25, 2016, at 10:45 AM, Madison Myers <madisonjmy...@gmail.com> wrote:
> 
> I agree!
> +1 to using both. I think, like you suggested, that using #1 for headers
> and #4 for other uses sounds fantastic.
> 
> On Tue, Oct 25, 2016 at 10:36 AM, Jason Azares <jason.aza...@gmail.com>
> wrote:
> 
>> Hey guys,
>> 
>> Branding wise, we also feel that #1 and #4 are the best choices. It's great
>> that we're all on the same page. To answer the question of pros and cons of
>> each logo, here is a quick list:
>> 
>> Logo 1:
>> 
>> 
>>   - More versatile because of its scalability; we think logo 4 will be
>>     hard to discern once sized down; logo 1 looks cleaner in website
>>     headers with text
>>   - Relevant because it has a matrix bracket
>>   - It's a simplified version of the robot. Think of it as the Batman
>>     signal, and the robot is Batman.
>> 
>> Logo 4:
>> 
>> 
>>   - More original because it has a personality
>>   - Diverse in the actions it can perform because it can move, animate,
>>     and be customized based on intent and use
>>   - The robot is kind of cute and approachable
>> 
>> Our suggestion is to use both. Logo 1 is the simplified version of the
>> robot. Logo 4 is the personification of the logo used to explain concepts.
>> 
>> We'd love to hear your thoughts!
>> 
>> Regards,
>> Jason and the design team
>> 
>> P.S. In general, here are our guidelines for creating a great logo:
>> 
>>   - *original* - something that stands out from competitors
>>   - *relevant* - reflects the brand's mission and values
>>   - *versatile* - look good in black and white, in different colors and
>>   sizes depending on context (e.g. billboards, websites, t-shirts, toys,
>>   business cards, etc)
>>   - *memorable* - easily recognizable everywhere (e.g. mickey mouse, nike)
>>   - *timeless* - not just based on what's currently popular
>> 
>> 
>> 
>>> On Tue, Oct 25, 2016 at 9:47 AM, <dusenberr...@gmail.com> wrote:
>>> 
>>> Looks like there is a large amount of support for both #1 and #4. Design
>>> team, could you provide some more thoughts on the pros and cons for each,
>>> and perhaps any thoughts on ways the icons could be used in various
>>> project materials?
>>> 
>>> --
>>> 
>>> Mike Dusenberry
>>> GitHub: github.com/dusenberrymw
>>> LinkedIn: linkedin.com/in/mikedusenberry
>>> 
>>> Sent from my iPhone.
>>> 
>>> 
>>>> On Oct 25, 2016, at 9:41 AM, Acs S <ac...@yahoo.com.INVALID> wrote:
>>>> 
>>>> I like #4 as well.
>>>> +1 on #4.
>>>> 
>>>> -Arvind
>>>> 
>>>> From: Berthold Reinwald <reinw...@us.ibm.com>
>>>> To: dev@systemml.incubator.apache.org
>>>> Sent: Monday, October 24, 2016 12:34 AM
>>>> Subject: Re: [VOTE] SystemML New Logo Ideas
>>>> 
>>>> +1 on #4.
>>>> 
>>>> Regards,
>>>> Berthold Reinwald
>>>> IBM Almaden Research Center
>>>> office: (408) 927 2208; T/L: 457 2208
>>>> e-mail: reinw...@us.ibm.com
>>>> 
>>>> 
>>>> 
>>>> From: Luciano Resende <luckbr1...@gmail.com>
>>>> To: dev@systemml.incubator.apache.org
>>>> Date: 10/21/2016 04:37 PM
>>>> Subject: Re: [VOTE] SystemML New Logo Ideas
>>>> 
>>>> 
>>>> 
>>>> On Fri, Oct 21, 2016 at 11:27 AM, Frederick R Reiss <frre...@us.ibm.com>
>>>> wrote:
>>>> 
>>>>> These are awesome! I'm more a fan of option #4 myself.
>>>>> 
>>>>> 
>>>> I like option #4 myself as well.
>>>> 
>>>> 
>>>> --
>>>> Luciano Resende
>>>> http://twitter.com/lresende1975
>>>> http://lresende.blogspot.com/
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> 
> 
> -- 
> *Madison J. Myers*
> *UC Berkeley, Master of Information & Data Science '17*
> 
> *King's College London, MA Political Science '14*
> *New York University, BA Political Science '12*
> 
>   -
>  LinkedIn <http://linkedin.com/in/madisonjmyers>


Re: [VOTE] SystemML New Logo Ideas

2016-10-25 Thread dusenberrymw
Looks like there is a large amount of support for both #1 and #4. Design team, 
could you provide some more thoughts on the pros and cons for each, and perhaps 
any thoughts on ways the icons could be used in various project materials?

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 25, 2016, at 9:41 AM, Acs S <ac...@yahoo.com.INVALID> wrote:
> 
> I like #4 as well.
> +1 on #4.
> 
> -Arvind
> 
>  From: Berthold Reinwald <reinw...@us.ibm.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Monday, October 24, 2016 12:34 AM
> Subject: Re: [VOTE] SystemML New Logo Ideas
> 
> +1 on #4.
> 
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail: reinw...@us.ibm.com
> 
> 
> 
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 10/21/2016 04:37 PM
> Subject: Re: [VOTE] SystemML New Logo Ideas
> 
> 
> 
> On Fri, Oct 21, 2016 at 11:27 AM, Frederick R Reiss <frre...@us.ibm.com>
> wrote:
> 
>> These are awesome! I'm more a fan of option #4 myself.
>> 
>> 
> I like option #4 myself as well.
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
> 
> 
> 
> 
> 


Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)

2016-10-22 Thread dusenberrymw
+1

I finished running some test jobs in my large scale scenario on this release 
candidate, and I think it is good to go.  Specifically, my scenario involved 
large numerical DataFrames, MLContext, matrices, DML, and multiple script 
invocations involving the various intermediate outputs.

One option would be to release this candidate as 0.11, and then follow up with 
a 0.11.1 release containing any bug fixes. This might make sense for edge-case 
bugs that don't impact normal usage.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 22, 2016, at 2:14 AM, Berthold Reinwald <reinw...@us.ibm.com> wrote:
> 
> -1.
> 
> Transformencode throws an unnecessary error if strings do not comply with 
> the field requirements specified in RFC 4180. Arvind has a fix on the way 
> which should be included in the release. 
> 
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail: reinw...@us.ibm.com
> 
> 
> 
> From: dusenberr...@gmail.com
> To: dev@systemml.incubator.apache.org
> Date: 10/21/2016 11:25 AM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)
> 
> 
> 
> Okay I found out that the error I was encountering occurred due to passing 
> in a DataFrame with an explicit row index column ("__INDEX") that contained 
> incorrect row indices. Basically, I had taken a large DataFrame with the 
> row index column  and sampled from it, without updating the row indices. 
> Thus, I was effectively left with sparse row indices -- i.e. I may have 
> had rows 2, 18, 587, 398678, etc. The current DataFrame conversion code 
> appears to not yet be able to handle sparse row indices and thus threw an 
> exception. When I correctly re-indexed the sampled DataFrame with dense 
> row indices, everything worked as expected. Of course, our conversion code 
> automatically adds row indices to a given DataFrame during conversion if 
> the user does not supply them explicitly. However, it can save a bit of 
> time on repeated usage if it is done explicitly in one prior batch job.
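
A minimal sketch of the re-indexing step described above, written in plain
Python rather than Spark so that it runs on its own. The helper name
`densify_row_index`, the sample values, and the assumption that row indices
are 1-based doubles are illustrative only and are not taken from the SystemML
conversion code.

```python
def densify_row_index(rows, index_key="__INDEX"):
    """Return copies of `rows` with consecutive 1-based row indices.

    Hypothetical helper: rows are plain dicts standing in for DataFrame rows.
    The relative order implied by the old (sparse) indices is preserved.
    """
    # Sort by the old sparse index so the original row order is kept.
    ordered = sorted(rows, key=lambda r: r[index_key])
    # Assign dense indices 1.0, 2.0, ... (stored as doubles by assumption).
    return [{**r, index_key: float(i)} for i, r in enumerate(ordered, start=1)]


# Rows left over after sampling from a larger indexed data set.
sampled = [
    {"__INDEX": 587.0, "x": 0.3},
    {"__INDEX": 2.0, "x": 0.1},
    {"__INDEX": 398678.0, "x": 0.4},
    {"__INDEX": 18.0, "x": 0.2},
]

dense = densify_row_index(sampled)
for row in dense:
    print(row)
```

On a real Spark DataFrame the same idea would be applied once, in a prior
batch job, before the frame is handed to the conversion code.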
> 
> I don't think this should block this release, and we should instead think 
> about this for the next release.  I've created SYSTEMML-1053 to track this 
> issue.
> 
> I'm running a few more tests, and then I'll respond today with a vote. 
> 
> -Mike
> 
> --
> 
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
> 
> Sent from my iPhone.
> 
> 
>> On Oct 20, 2016, at 10:48 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:
>> 
>> Similar release-process steps executed successfully on Windows.
>> 
>> Performance test suite for large data still running; reviewing of
>> available log files in-progress.
>> 
>> Thanks,
>> Glenn
>> 
>> Nakul Jindal ---10/20/2016 02:43:06 PM---Basic sanity tests pasts on 
> MacOS following the process here: http://apache.github.io/incubator-syst
>> 
>> From: Nakul Jindal <naku...@gmail.com>
>> To: dev@systemml.incubator.apache.org
>> Date: 10/20/2016 02:43 PM
>> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)
>> 
>> 
>> 
>> 
>> Basic sanity tests pass on MacOS following the process here:
>> http://apache.github.io/incubator-systemml/release-process.html#all-binaries-execute
>> 
>> (The in-memory jar was removed by [SYSTEMML-741])
>> 
>> +1
>> 
>> Nakul Jindal
>> 
>> 
>>> On Thu, Oct 20, 2016 at 12:18 PM, <dusenberr...@gmail.com> wrote:
>>> 
>>> Okay I've been testing the release candidate on a large-scale problem, and
>>> I'm currently running into a "java.lang.NegativeArraySizeException" in
>>> the SparseBlockMCSR that I do not believe was present previously. I'm
>>> currently investigating, and will post again soon.
>>> 
>>> On another note, I successfully ran all of the Python tests on both Python
>>> 2.7 and 3.5.
>>> 
>>> -Mike
>>> 
>>> --
>>> 
>>> Mike Dusenberry
>>> GitHub: github.com/dusenberrymw
>>> LinkedIn: linkedin.com/in/mikedusenberry
>>> 
>>> Sent from my iPhone.
>>> 
>>> 
>>>> On Oct 19, 2016, at 2:46 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:
>>>> 
>>>> Yes - that is correct for test cases involving ID column for
>>> DataFrameVectorFrameConversionTest, DataFrameVectorScriptTest,
>>> MLContextTest. The four failures for MLContextFrameTest were slightly
>>

Re: [VOTE] SystemML New Logo Ideas

2016-10-21 Thread dusenberrymw
I like all of these options!  I'll give a +1 for #1 as the main logo, and I 
also think it would be great to make use of the rest of the designs throughout 
the website and other project materials.

Thanks!!

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 21, 2016, at 1:01 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
> 
> All the logos are awesome, thanks design team !! 
> 
> I vote for #4.
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 
> Deron Eriksson ---10/21/2016 12:57:25 PM---Given the overwhelming support for 
> #1, I give my +1 to #1. Deron
> 
> From: Deron Eriksson <deroneriks...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 10/21/2016 12:57 PM
> Subject: Re: [VOTE] SystemML New Logo Ideas
> 
> 
> 
> 
> Given the overwhelming support for #1, I give my +1 to #1.
> 
> Deron
> 
> 
> On Fri, Oct 21, 2016 at 12:46 PM, Jason Azares <jason.aza...@gmail.com>
> wrote:
> 
> > I vote for #1
> >
> > On Fri, Oct 21, 2016 at 12:27 PM, Matthias Boehm <mboe...@googlemail.com>
> > wrote:
> >
> > > ha, that's interesting - thanks for the pointer Deron, I wasn't expecting
> > > this at all. Somehow my eyes always ignored this.
> > >
> > > Regards,
> > > Matthias
> > >
> > >
> > > On 10/21/2016 9:22 PM, Deron Eriksson wrote:
> > >
> > >> I think they all look fantastic. My untrained eye likes the features of 3
> > >> and 4 but I completely defer to the judgements of others here since I have
> > >> no training in design and the multitude of considerations involved such as
> > >> scalability.
> > >>
> > >> I believe the logo trademark is an official requirement of the ASF (
> > >> http://www.apache.org/foundation/marks/pmcs.html#graphics), although I
> > >> don't know how strict this is.
> > >>
> > >> Deron
> > >>
> > >>
> > >> On Fri, Oct 21, 2016 at 12:15 PM, Matthias Boehm <
> > mboe...@googlemail.com>
> > >> wrote:
> > >>
> > >> Thanks for these proposals. For all the options, I'd prefer to remove
> > the
> > >>> TM - it's just a little odd for an open source project with no
> > intentions
> > >>> to register a trademark. I know, the new Spark logo has it too but it's
> > >>> probably a different context, especially since there are discussions to
> > >>> add
> > >>> SPARC support in Spark 2.1 ;-)
> > >>>
> > >>> Regards,
> > >>> Matthias
> > >>>
> > >>>
> > >>> On 10/21/2016 8:47 PM, Dexter Lesaca wrote:
> > >>>
> > >>> +1 for 1
> > >>>>
> > >>>> On Fri, Oct 21, 2016 at 11:44 AM Jeremy Anderson <
> > >>>> jer...@objectadjective.com>
> > >>>> wrote:
> > >>>>
> > >>>> +1 on option 1 as well.
> > >>>>
> > >>>>>
> > >>>>> For the 4 options, I think it's important that full logo with name
> > and
> > >>>>> mark, scales well. I'm concerned detail will get lost with the other
> > 3,
> > >>>>> at
> > >>>>> small sizes. I would love to use all of the simple and isometric
> > >>>>> versions.
> > >>>>> They make a great family.
> > >>>>>
> > >>>>> ...
> > >>>>>
> > >>>>> Jeremy Anderson
> > >>>>> https://twitter.com/ObjectAdjective
> > >>>>> http://www.linkedin.com/in/objectadjective
> > >>>>>
> > >>>>> On 21 October 2016 at 11:27, Frederick R Reiss <frre...@us.ibm.com>
> > >>>>> wrote:
> > >>>>>
> > >>>>> These are awesome! I'm more a fan of option #4 myself.
> > >>>>>
> > >>>>>>
> > >>>>>> Fred
> > >>>>>>
> > >>>>>> [image: Inactive hide details for Renee Mascarinas ---10/21/2016
> > >>>>>> 11:19:01
> > >>>>>&g

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)

2016-10-20 Thread dusenberrymw
Okay I've been testing the release candidate on a large-scale problem, and I'm 
currently running into a "java.lang.NegativeArraySizeException" in the 
SparseBlockMCSR that I do not believe was present previously. I'm currently 
investigating, and will post again soon.

On another note, I successfully ran all of the Python tests on both Python 2.7 
and 3.5.

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 19, 2016, at 2:46 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:
> 
> Yes - that is correct for test cases involving ID column for 
> DataFrameVectorFrameConversionTest, DataFrameVectorScriptTest, MLContextTest. 
> The four failures for MLContextFrameTest were slightly different and involve 
> a similar fix as done for FrameConverterTest under [SYSTEMML-568] where 
> FrameRDDConverterUtils.csvToRowRDD used to incorporate schema information 
> when converting to JavaRDD.
> 
> Thanks,
> Glenn
> 
> Matthias Boehm ---10/19/2016 12:36:04 PM---Glenn, all these issues were only 
> caused by wrong tests that used an invalid ID schema or populated
> 
> From: Matthias Boehm <mboe...@googlemail.com>
> To: dev@systemml.incubator.apache.org
> Date: 10/19/2016 12:36 PM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)
> 
> 
> 
> 
> Glenn, all these issues were only caused by wrong tests that used an 
> invalid ID schema or populated this column incorrectly, right? If so, 
> then I think it's fine to release. However, if we touch it anyway, we 
> should globally change the ID schema from double to long, which is more 
> intuitive when created by hand.
> 
> Regards,
> Matthias
> 
> On 10/19/2016 8:30 PM, Deron Eriksson wrote:
> > OK, so I think it's my understanding that for the 'src' release for rc3,
> > the pom is using Spark 1.4 and the test suite passes for Spark 1.4, so this
> > issue being discussed regarding test cases on Spark 1.6 is not a blocker
> > for this release since the 'src' release builds and all tests pass.
> >
> > If this is not correct, could someone please correct me?
> >
> > Deron
> >
> >
> > On Wed, Oct 19, 2016 at 11:17 AM, Luciano Resende <luckbr1...@gmail.com>
> > wrote:
> >
> >> if tests are consistently failing, then we should cancel the RC and either
> >> fix the test or mark it as @ignored.
> >>
> >> Intermittent fails might be ok, but it's a community decision.
> >>
> >> On Wed, Oct 19, 2016 at 10:50 AM, Deron Eriksson <deroneriks...@gmail.com>
> >> wrote:
> >>
> >>> I believe that for an Apache release, our test suite is supposed to pass
> >>> (although I'm pretty sure random test fails can be ignored).
> >>>
> >>> See 2.1 of Release Check List here:
> >>> http://incubator.apache.org/guides/releasemanagement.html#check-list
> >>>
> >>> "2.1 Build is successful including automated tests.
> >>> The expanded source archive is expected to build and pass tests."
> >>>
> >>> Luciano, do you happen to know if some test failures are acceptable since
> >>> our test suite is so enormous (6300+ tests)?
> >>>
> >>> Deron
> >>>
> >>>
> >>>
> >>> On Wed, Oct 19, 2016 at 3:24 AM, Glenn Weidner <gweid...@us.ibm.com>
> >>> wrote:
> >>>
> >>>> It's a nice-to-have but not a release blocker.
> >>>>
> >>>> Thanks,
> >>>> Glenn
> >>>>
> >>>> From: Niketan Pansare/Almaden/IBM@IBMUS
> >>>> To: dev@systemml.incubator.apache.org
> >>>> Date: 10/18/2016 05:38 PM
> >>>> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)
> >>>> --
> >>>>
> >>>>
> >>>>
> >>>> Glenn: Would you prefer to have
> >>>> https://github.com/apache/incubator-systemml/pull/269 in the 0.11
> >>>> release?
> >>>>
> >>>> Thanks,
>

Re: UX Research

2016-10-18 Thread dusenberrymw
This is awesome!  I really like the storyboards as they describe the types of 
scenarios in which SystemML would be really useful.  We should continue to work 
on making sure all of these are successful stories for the project.

The website layout analysis is great too -- we definitely need to get to the 
point where new users understand the project as quickly as possible.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 18, 2016, at 11:10 AM, Madison Myers <madisonjmy...@gmail.com> wrote:
> 
> +1 Luciano
> 
> On Tue, Oct 18, 2016 at 8:52 AM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
> 
>> Great guys !!! I think having the UX roadmap published is fine, and we
>> could just create a roadmap page where we have sections for development and
>> ux and fixing SYSTEMML-972 (particularly SYSTEMML-974) will make it much
>> simpler to add more contents to the website.
>> 
>> 
>> https://issues.apache.org/jira/browse/SYSTEMML-972
>> https://issues.apache.org/jira/browse/SYSTEMML-974
>> 
>> On Mon, Oct 17, 2016 at 5:42 PM, Jeremy Anderson
>> <jer...@objectadjective.com> wrote:
>> 
>>> Thanks Madison and Felix. To your point Felix, I think you're on the nose.
>>> I am hoping actionable items for both design and dev will emerge from user
>>> research. Ideally, I'd love to see a clear direction and roadmap for the
>>> future of SystemML begin to take shape. This thread is a great start, but
>>> it might also be helpful to start a UX roadmap wiki page. Who can I reach
>>> out to for access to publish to the wiki?
>>> 
>>> Jeremy
>>> 
>>> ...
>>> 
>>> Jeremy Anderson
>>> https://twitter.com/ObjectAdjective
>>> http://www.linkedin.com/in/objectadjective
>>> 
>>>> On 17 October 2016 at 16:33, <fschue...@posteo.de> wrote:
>>>> 
>>>> Jeremy and others, thanks for the detailed presentation!
>>>> The storyboards look great and it would be nice to see SystemML getting to
>>>> a point where those scenarios just work!
>>>> 
>>>> From the point of development I wonder how much is adding new features
>>>> (that enhance user experience) versus making it more stable/reliable and
>>>> compiling/editing resources and documentation.
>>>> It seems to me that what's currently missing are the easy entry points
>>>> both in documentation and user interfaces (APIs, notebooks, quickstart
>>>> guides, ...) that are so perfectly depicted in those storyboards.
>>>> 
>>>> I hope to see an outcome of actionable items for developers from this UX
>>>> research that we can manifest in concrete Jiras to work on.
>>>> 
>>>> Felix
>>>> 
>>>> 
>>>> 
>>>> 
>>>> Am 18.10.2016 00:39 schrieb Jeremy Anderson:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I began working with a few designers on UX research for SystemML. We
>>>>> synthesized some of our early findings to share with everyone. From some
>>>>> of the pain points that surfaced in our research, we began storyboarding
>>>>> user scenarios and looking for ways we might be able to improve user
>>>>> experience and increase adoption. I wanted to start a discussion around
>>>>> this UX and research. Here's a link to the research we've synthesized so
>>>>> far. I'd love input/thoughts from everyone.
>>>>> 
>>>>> https://drive.google.com/file/d/0B2__Aw0kKn-uTWJ4S0ZvcHhhTE0/view
>>>>> 
>>>>> Cheers,
>>>>> 
>>>>> Jeremy
>>>>> 
>>>>> ...
>>>>> 
>>>>> Jeremy Anderson
>>>>> https://twitter.com/ObjectAdjective
>>>>> http://www.linkedin.com/in/objectadjective
>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>> 
> 
> 
> 
> -- 
> *Madison J. Myers*
> *UC Berkeley, Master of Information & Data Science '17*
> 
> *King's College London, MA Political Science '14*
> *New York University, BA Political Science '12*
> 
>   -
>  LinkedIn <http://linkedin.com/in/madisonjmyers>


Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-05 Thread dusenberrymw
+1 for SYSTEMML-951

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 5, 2016, at 1:17 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> 
> as the Python DSL is still in experimental status, I don't think that 
> SYSTEMML-1013 is blocking the release. However, there is one more 
> nice-to-have performance feature I'd like to include: SYSTEMML-951 (right 
> indexing via lookup). If nobody objects, we could cut tomorrow once 951 is 
> in; if somebody gets a chance to look into 1013 then we could include this 
> too.
> 
> Regards,
> Matthias 
> 
> Acs S ---10/05/2016 12:19:57 PM---Imran has opened Jira 1013.  -Arvind
> 
> From: Acs S <ac...@yahoo.com.INVALID>
> To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org>
> Date: 10/05/2016 12:19 PM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> 
> 
> 
> 
> Imran has opened Jira 1013. 
> -Arvind
> 
>  From: Matthias Boehm <mbo...@us.ibm.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Tuesday, October 4, 2016 5:43 PM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
>   
> ok, SYSTEMML-1009 has been resolved too. 
> 
> Regards,
> Matthias
> 
> Acs S ---10/04/2016 05:30:52 PM---There is one more issue I am aware of:   
> Imran facing max recursion issue in toNumPyArray().Not sure
> 
> From: Acs S <ac...@yahoo.com.INVALID>
> To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org>
> Date: 10/04/2016 05:30 PM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> 
> 
> 
> There is one more issue I am aware of: Imran is facing a max recursion issue 
> in toNumPyArray(). Not sure if Imran has opened a Jira or not, but we need a 
> resolution for it.
> -Arvind
>  From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@systemml.incubator.apache.org 
> Sent: Tuesday, October 4, 2016 5:03 PM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
>   
> Ok, so looks like we are down to waiting on SYSTEMML-1009.
> 
> On Tue, Oct 4, 2016 at 4:44 PM, <dusenberr...@gmail.com> wrote:
> 
> > The Python test failure issue has been resolved in SYSTEMML-1005. From my
> > end, we are ready to go.
> >
> > -Mike
> >
> > --
> >
> > Mike Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > Sent from my iPhone.
> >
> >
> > > On Oct 4, 2016, at 2:02 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> > >
> > > apart from the recently resolved SYSTEMML-1004 and SYSTEMML-1008, there
> > is one more performance fix I'd like to get in: SYSTEMML-1009.
> > >
> > > Regards,
> > > Matthias
> > >
> > > Luciano Resende ---10/04/2016 12:29:12 PM---Mike, are these Python
> > failures still blocking the next RC ? Please let me know, as I am waiting
> > for
> > >
> > > From: Luciano Resende <luckbr1...@gmail.com>
> > > To: dev@systemml.incubator.apache.org
> > > Date: 10/04/2016 12:29 PM
> > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> > >
> > >
> > >
> > >
> > > Mike, are these Python failures still blocking the next RC ? Please let
> > me
> > > know, as I am waiting for the green light to cut the RC2.
> > >
> > > On Mon, Oct 3, 2016 at 9:41 AM, <dusenberr...@gmail.com> wrote:
> > >
> > > > Yeah I can confirm that all of those issues are now resolved, which is
> > > > great!  However, I'm seeing a test failure in the Python mllearn tests
> > > > today that I want to look into before we cut.
> > > >
> > > > -Mike
> > > >
> > > > --
> > > >
> > > > Mike Dusenberry
> > > > GitHub: github.com/dusenberrymw
> > > > LinkedIn: linkedin.com/in/mikedusenberry
> > > >
> > > > Sent from my iPhone.
> > > >
> > > >
> > > > > On Oct 2, 2016, at 8:35 PM, Matthias Boehm <mbo...@us.ibm.com>
> > wrote:
> > > > >
> > > > > yes, I just closed them - I left them open for Mike to confirm, but
> > we
> > > > resolved all known issues yesterday together. We should be good to go.
> > > > >
> > > > > Regards,
> > > > > Matthias
> > > > >
> > > > > Luciano Resende ---10/02/2016 08:30:37 PM---I still see the following
&g

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-04 Thread dusenberrymw
The Python test failure issue has been resolved in SYSTEMML-1005. From my end, 
we are ready to go.

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 4, 2016, at 2:02 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> 
> apart from the recently resolved SYSTEMML-1004 and SYSTEMML-1008, there is 
> one more performance fix I'd like to get in: SYSTEMML-1009.
> 
> Regards,
> Matthias
> 
> Luciano Resende ---10/04/2016 12:29:12 PM---Mike, are these Python failures 
> still blocking the next RC ? Please let me know, as I am waiting for
> 
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 10/04/2016 12:29 PM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> 
> 
> 
> 
> Mike, are these Python failures still blocking the next RC ? Please let me
> know, as I am waiting for the green light to cut the RC2.
> 
> On Mon, Oct 3, 2016 at 9:41 AM, <dusenberr...@gmail.com> wrote:
> 
> > Yeah I can confirm that all of those issues are now resolved, which is
> > great!  However, I'm seeing a test failure in the Python mllearn tests
> > today that I want to look into before we cut.
> >
> > -Mike
> >
> > --
> >
> > Mike Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > Sent from my iPhone.
> >
> >
> > > On Oct 2, 2016, at 8:35 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> > >
> > > yes, I just closed them - I left them open for Mike to confirm, but we
> > resolved all known issues yesterday together. We should be good to go.
> > >
> > > Regards,
> > > Matthias
> > >
> > > Luciano Resende ---10/02/2016 08:30:37 PM---I still see the following
> > jiras, which were mentioned on this thread, open: https://issues.apache.or
> > >
> > > From: Luciano Resende <luckbr1...@gmail.com>
> > > To: dev@systemml.incubator.apache.org
> > > Date: 10/02/2016 08:30 PM
> > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> > >
> > >
> > >
> > >
> > > I still see the following jiras, which were mentioned on this thread, open:
> > >
> > > https://issues.apache.org/jira/browse/SYSTEMML-993
> > > https://issues.apache.org/jira/browse/SYSTEMML-994
> > > https://issues.apache.org/jira/browse/SYSTEMML-995
> > >
> > > Did folks forget to close the jiras ? Or are there things that still need
> > > to be handled here ?
> > >
> > >
> > > On Sat, Oct 1, 2016 at 2:41 PM, Matthias Boehm <mbo...@us.ibm.com>
> > wrote:
> > >
> > > > ok the blocking issues SYSTEMML-993, 994, and 995 have been resolved -
> > > > from my perspective we're ready to cut a new RC.
> > > >
> > > > Regards,
> > > > Matthias
> > > >
> > > > From: Matthias Boehm/Almaden/IBM@IBMUS
> > > > To: dev@systemml.incubator.apache.org
> > > > Date: 09/29/2016 10:44 PM
> > > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> > > > --
> > > >
> > > >
> > > >
> > > > just a quick update: SYSTEMML-969 has been resolved too. The open issues
> > > > are SYSTEMML-993, SYSTEMML-994, and the new SYSTEMML-995. We should be
> > > > able to resolve them by tomorrow to give everybody a chance of testing a
> > > > new RC over the weekend.
> > > >
> > > > Regards,
> > > > Matthias
> > > >
> > > > Acs S ---09/29/2016 05:31:23 PM---SYSTEMML-964 being addressed (I added
> > > > changes and with UTF support Matthias added he reverted change
> > > >
> > > > From: Acs S <ac...@yahoo.com.INVALID>
> > > > To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org>
> > > > Date: 09/29/2016 05:31 PM
> > > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)

2016-10-03 Thread dusenberrymw
Yeah I can confirm that all of those issues are now resolved, which is great!  
However, I'm seeing a test failure in the Python mllearn tests today that I 
want to look into before we cut. 

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Oct 2, 2016, at 8:35 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> 
> yes, I just closed them - I left them open for Mike to confirm, but we 
> resolved all known issues yesterday together. We should be good to go.
> 
> Regards,
> Matthias 
> 
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 10/02/2016 08:30 PM
> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> 
> 
> 
> 
> I still see the following jiras, which were mentioned on this thread, open:
> 
> https://issues.apache.org/jira/browse/SYSTEMML-993
> https://issues.apache.org/jira/browse/SYSTEMML-994
> https://issues.apache.org/jira/browse/SYSTEMML-995
> 
> Did folks forget to close the JIRAs? Or are there things that still need
> to be handled here?
> 
> 
> On Sat, Oct 1, 2016 at 2:41 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
> 
> > ok the blocking issues SYSTEMML-993, 994, and 995 have been resolved -
> > from my perspective we're ready to cut a new RC.
> >
> > Regards,
> > Matthias
> >
> > From: Matthias Boehm/Almaden/IBM@IBMUS
> > To: dev@systemml.incubator.apache.org
> > Date: 09/29/2016 10:44 PM
> > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> > --
> >
> >
> >
> > just a quick update: SYSTEMML-969 has been resolved too. The open issues
> > are SYSTEMML-993, SYSTEMML-994, and the new SYSTEMML-995. We should be able
> > to resolve them by tomorrow to give everybody a chance of testing a new RC
> > over the weekend.
> >
> > Regards,
> > Matthias
> >
> > From: Acs S <ac...@yahoo.com.INVALID>
> > To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org
> > >
> > Date: 09/29/2016 05:31 PM
> > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> > --
> >
> >
> >
> > SYSTEMML-964 being addressed (I added changes and with UTF support
> > Matthias added he reverted changes)
> >
> > -Arvind
> >
> > From: "dusenberr...@gmail.com" <dusenberr...@gmail.com>
> > To: dev@systemml.incubator.apache.org
> > Sent: Thursday, September 29, 2016 2:31 PM
> > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
> >
> > I've also opened SYSTEMML-993 that relates to poor performance for vector
> > DataFrame conversions, as well as SYSTEMML-994 for GC OOM on SystemML
> > matrix to frame conversions that would both be good to work on.
> >
> > --
> >
> > Mike Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > Sent from my iPhone.
> >
> >
> > > On Sep 29, 2016, at 12:32 PM, Luciano Resende <luckbr1...@gmail.com>
> > wrote:
> > >
> > >> On Thu, Sep 29, 2016 at 11:11 AM, Matthias Boehm <mbo...@us.ibm.com>
> > wrote:
> > >>
> > >> SYSTEMML-968 has been resolved too but we're still waiting for
> > >> SYSTEMML-964. Furthermore, there is also a nice-to-have feature we want to
> > >> get in: SYSTEMML-969 (extended dataframe - frame converter).
> > >>
> > >> Regards,
> > >> Matthias
> > > Great progress !!!
> > >
> > > Matthias, please let us know when these issues get resolved and I will
> > > work on RC2.
> > >
> > > --
> > > Luciano Resende
> > > http://twitter.com/lresende1975
> > > http://lresende.blogspot.com/
> >
> >
> >
> >
> >
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
> 
> 


Re: [DISCUSS] Apache SystemML Release 1.0.0

2016-08-24 Thread dusenberrymw
Yes I'm also in favor of moving to a 1.0 version for our upcoming release 
targeting the Spark 1.x series. Since we'll also be subsequently releasing a 
version targeting the Spark 2.x series, I would also like to suggest that we 
name that version 2.0. This version naming scheme would allow us to easily 
associate a SystemML version with the Spark series that it targets, thus 
reducing confusion for a user. Rather than view a 2.0 version as a successor to 
1.0, let's view it instead as simply a naming scheme that corresponds to the 
targeted version of Spark.

So, 1.0 would be our upcoming release targeting Spark 1.x, and 2.0 would be our 
upcoming release targeting Spark 2.x. 
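
As a rough illustration of the naming scheme proposed above, the mapping from a SystemML version to the Spark series it targets could be sketched as follows. The function name `spark_series_for` and the rule that the major version number alone identifies the Spark series are assumptions made for this sketch, not something the project has adopted.

```python
def spark_series_for(systemml_version: str) -> str:
    """Return the Spark series a SystemML version would target under the
    proposed scheme, where the major version mirrors the Spark series."""
    major = systemml_version.split(".")[0]
    # Pre-1.0 versions (e.g. 0.11.0-incubating) carry no Spark series meaning.
    if not major.isdigit() or int(major) < 1:
        raise ValueError(f"no Spark series implied by version {systemml_version!r}")
    return f"{major}.x"


print(spark_series_for("1.0.0"))  # SystemML 1.0.0 would target Spark 1.x
print(spark_series_for("2.0"))    # SystemML 2.0 would target Spark 2.x
```

Under this reading, a user only needs the first digit of the SystemML version to know which Spark line to install.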

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Aug 24, 2016, at 4:53 PM, Frederick R Reiss <frre...@us.ibm.com> wrote:
> 
> I would favor declaring a 1.0 release. Having two digits in the minor release 
> is a bit awkward, and the project has progressed enough in terms of 
> functionality and stability to warrant a major release number bump.
> 
> Fred
> 
> From: Luciano Resende <luckbr1...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 08/24/2016 11:19 AM
> Subject: [DISCUSS] Apache SystemML Release 1.0.0
> 
> 
> 
> 
> With the decision to have sort of two code streams, one to support Spark 1.x
> and another to support 2.x, I was wondering whether we should call the next
> 1.x release our SystemML 1.0.0 release.
> 
> Thoughts ?
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
> 
> 
> 


Re: [DISCUSS] SystemML with Spark 2.0 support and roadmap

2016-08-24 Thread dusenberrymw
I think this is a great idea so that we can simplify the official release and 
reduce confusion for potential users.  Certainly we can still retain the 
ability to build the extra artifacts locally, just like Spark currently does.

I would also like to suggest that we move away from the current Standalone 
package that is designed to be used with Java, and instead move to simply using 
Spark in local mode for all "standalone" applications. Since running Spark 
locally on a laptop consists of simply downloading a release binary and running 
it, without any installation, I think this is a much cleaner way now. This 
would allow us to immediately move to the goal of only releasing a single JAR 
file, as that same JAR file could be used in Spark locally, Spark on a cluster, 
and Hadoop on a cluster.  Then we could just release the single JAR file and a 
folder of scripts as our official release. All other special artifacts could be 
kept as "download and build" artifacts.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Aug 23, 2016, at 5:22 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> On Tue, Aug 23, 2016 at 3:51 PM, Deron Eriksson <deroneriks...@gmail.com>
> wrote:
> 
>> To simplify release candidate validation, I would like to propose that the
>> distribution profile only builds the following 7 (out of the current
>> included 10) artifacts:
>> 
>> systemml-0.11.0-incubating-SNAPSHOT-javadoc.jar
>> systemml-0.11.0-incubating-SNAPSHOT-sources.jar
>> systemml-0.11.0-incubating-SNAPSHOT-src.tar.gz
>> systemml-0.11.0-incubating-SNAPSHOT-src.zip
>> systemml-0.11.0-incubating-SNAPSHOT-standalone.tar.gz (rename w/o
>> "-standalone")
>> systemml-0.11.0-incubating-SNAPSHOT-standalone.zip (rename w/o
>> "-standalone")
>> systemml-0.11.0-incubating-SNAPSHOT.jar
>> 
>> The following could still be built using maven profiles but would not be in
>> the distribution profile:
>> 
>> systemml-0.11.0-incubating-SNAPSHOT-standalone.jar
>> systemml-0.11.0-incubating-SNAPSHOT.tar.gz (also rename)
>> systemml-0.11.0-incubating-SNAPSHOT.zip (also rename)
>> 
>> This would decrease the number of our artifacts by 30% which means that we
>> can validate the release faster, and the release candidate will also be
>> more likely to pass external validation/voting.
>> 
>> Deron
> +1
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
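
The arithmetic behind the proposed reduction can be checked with a short sketch. The artifact names are copied from Deron's proposal above; the variable names are only illustrative.

```python
# Artifacts that would stay in the distribution profile (from the proposal).
kept = [
    "systemml-0.11.0-incubating-SNAPSHOT-javadoc.jar",
    "systemml-0.11.0-incubating-SNAPSHOT-sources.jar",
    "systemml-0.11.0-incubating-SNAPSHOT-src.tar.gz",
    "systemml-0.11.0-incubating-SNAPSHOT-src.zip",
    "systemml-0.11.0-incubating-SNAPSHOT-standalone.tar.gz",
    "systemml-0.11.0-incubating-SNAPSHOT-standalone.zip",
    "systemml-0.11.0-incubating-SNAPSHOT.jar",
]

# Artifacts that would only be built via other Maven profiles.
dropped = [
    "systemml-0.11.0-incubating-SNAPSHOT-standalone.jar",
    "systemml-0.11.0-incubating-SNAPSHOT.tar.gz",
    "systemml-0.11.0-incubating-SNAPSHOT.zip",
]

total = len(kept) + len(dropped)
reduction = len(dropped) / total
print(f"{total} -> {len(kept)} artifacts, a {reduction:.0%} reduction")
```

Dropping 3 of 10 artifacts gives the 30% figure quoted in the proposal.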


Re: Preview tag, was Re: [2/2] incubator-systemml git commit: Preparing SystemML development version 0.11.0-incubating-SNAPSHOT.

2016-08-17 Thread dusenberrymw
Thanks, Luciano for pointing this out. As you mentioned, the intent was 
definitely just to tag a commit that was known to be stable on the Spark 1.x 
line. I've deleted the existing tag, and created a new "spark-1.x-stable" tag 
simply pointing to a previous commit that was tested on Spark 1.x. 

Thanks!

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Aug 17, 2016, at 11:18 AM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> -1
> 
> Sorry Folks, this isn't a voted release and thus creating a tag without
> SNAPSHOT is not valid. Please delete this tag.
> 
> If what is wanted is to have a stable point in the codebase where folks can
> go back if a release is needed for 1.x, then just create a branch/tag with
> a descriptive name (e.g. spark_1.x_stable).
> 
> If you actually want a release, there is a need to follow the Apache
> Release vote process (e.g. see
> https://www.mail-archive.com/dev%40spark.apache.org/msg14223.html for Spark
> preview release vote)
> 
> Thanks
> 
> 
>> On Wed, Aug 17, 2016 at 1:21 PM, <dusenberr...@apache.org> wrote:
>> 
>> Preparing SystemML development version 0.11.0-incubating-SNAPSHOT.
>> 
>> 
>> Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo
>> Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/commit/b6bde0d4
>> Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/tree/b6bde0d4
>> Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/diff/b6bde0d4
>> 
>> Branch: refs/heads/master
>> Commit: b6bde0d4599d551cf1dc903c72662888abc22787
>> Parents: 05b6da0
>> Author: Mike Dusenberry <mwdus...@us.ibm.com>
>> Authored: Wed Aug 17 10:17:52 2016 -0700
>> Committer: Mike Dusenberry <mwdus...@us.ibm.com>
>> Committed: Wed Aug 17 10:17:52 2016 -0700
>> 
>> --
>> pom.xml | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>> --
>> 
>> 
>> http://git-wip-us.apache.org/repos/asf/incubator-systemml/blob/b6bde0d4/pom.xml
>> --
>> diff --git a/pom.xml b/pom.xml
>> index aba8808..a4c66a1 100644
>> --- a/pom.xml
>> +++ b/pom.xml
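>> @@ -25,7 +25,7 @@
>>          <version>18</version>
>>      </parent>
>>      <groupId>org.apache.systemml</groupId>
>> -    <version>0.11.0-incubating-preview</version>
>> +    <version>0.11.0-incubating-SNAPSHOT</version>
>>      <artifactId>systemml</artifactId>
>>      <packaging>jar</packaging>
>>      <name>SystemML</name>
>> @@ -41,7 +41,7 @@
>>          <connection>scm:git:g...@github.com:apache/incubator-systemml</connection>
>>          <developerConnection>scm:git:https://git-wip-us.apache.org/repos/asf/incubator-systemml</developerConnection>
>>          <url>https://git-wip-us.apache.org/repos/asf?p=incubator-systemml.git</url>
>> -        <tag>0.11.0-incubating-preview</tag>
>> +        <tag>HEAD</tag>
>>      </scm>
>>      <issueManagement>
>>          <system>JIRA</system>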
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


Re: [DISCUSS] Migration to Spark 2.0.0

2016-08-17 Thread dusenberrymw
Yes, I think this approach sounds great.  To that end, I created a new tag 
"0.11.0-incubating-preview" that points to a specific commit that contains new 
features that will be in the 0.11 release with specific support for the Spark 
1.x line.


- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Aug 16, 2016, at 4:44 PM, Frederick R Reiss <frre...@us.ibm.com> wrote:
> 
> I think the approach Glenn proposes here is fine.
> 
> Fred
> 
> From: Deron Eriksson <deroneriks...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 08/16/2016 02:41 PM
> Subject: Re: [DISCUSS] Migration to Spark 2.0.0
> 
> 
> 
> 
> Hi Glenn,
> 
> I am fine with this approach. If this approach is taken, I would like to
> set the documentation version in _config.yml to 0.10.x before the project
> is tagged (I recently set it to 0.11).
> 
> Deron
> 
> 
> On Thu, Aug 11, 2016 at 3:40 PM, Glenn Weidner <gweid...@us.ibm.com> wrote:
> 
> > I would like to propose an alternative to supporting Spark 2.0 and Spark
> > 1.x within a single stream.
> >
> > 1) Capture snapshot and establish label of current Apache SystemML master
> > which includes new features added since 0.10.0 release.
> >
> > 2) After step 1 completed, enable master to move forward with support for
> > Spark 2.x only.
> >
> > This is similar to what Fred initially proposed except step 1 would not
> > involve a separate release. The 0.11 release of Apache SystemML would be
> > compatible for Spark 2.0 and Scala 2.11.
> >
> > Thanks,
> > Glenn
> >
> > From: Glenn Weidner/Silicon Valley/IBM@IBMUS
> > To: dev@systemml.incubator.apache.org
> > Date: 08/08/2016 03:33 PM
> > Subject: Re: [DISCUSS] Migration to Spark 2.0.0
> > --
> >
> >
> >
> > As a preliminary experiment in an attempt to compile against both Spark 2.0.0
> > and Spark 1.6.2 from the same code base, I made another set of changes for
> > comparison against the previously proposed changes for [SYSTEMML-776].
> > This experimental set can be viewed here:
> >
> > https://github.com/gweidner/incubator-systemml/commit/0611f0c197e4a0e816b3325093168bc5162d62c0
> >
> > This compiles against Spark 2.0.0 and Spark 1.6.2 except for fit/transform
> > overrides in LogisticRegression.scala due to:
> > SPARK-14500 Accept Dataset[] instead of DataFrame in MLlib APIs
> >
> > Detailed code comments and suggestions to try out can be made in the
> > branch commit instead of this mail thread.
> >
> > Thanks,
> > Glenn
> >
> > From: Deron Eriksson <deroneriks...@gmail.com>
> > To: dev@systemml.incubator.apache.org
> > Date: 08/05/2016 02:02 PM
> > Subject: Re: [DISCUSS] Migration to Spark 2.0.0
> > --
> >
> >
> >
> > I am open to the idea of supporting Spark 2 and Spark<2 concurrently if
> > someone shows that it can be accomplished with minimal inconvenience.
> >
> > However, I would lean towards Fred's approach (Spark 1.6 release followed
> > shortly by a Spark 2 release). If possible, I want to be able to focus most
> > of our efforts towards the future rather than the past.
> >
> > Deron
> >
> >
> > On Thu, Aug 4, 2016 at 10:59 AM, Luciano Resende <luckbr1...@gmail.com>
> > wrote:
> >
> > > That was going to be my suggestion... In Zeppelin, we just introduced
> > > support for different versions of scala and added support for spark 2.0
> > > based on profiles and a bit of reflections...
> > >
> > > Do we have to do anything related to Scala versions as well ?
> > >
> > > On Thursday, August 4, 2016, Matthias Boehm <mbo...@us.ibm.com> wrote:
> > >
> > > > I would recommend to start an in

0.10 Maintenance Branch

2016-07-29 Thread dusenberrymw
Hi all,

Just FYI, I created a new "branch-0.10" branch to track any bug fixes that we 
would like to eventually release in a 0.10.1 release. Moving forward, please 
push any future bug fixes that would be applicable to the 0.10 series to this 
branch, in addition to the master branch. Additionally, please run tests on 
both branches.  I've started with a bug fix to our existing Python API that 
prevented usage in Python 3. 

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.



Re: Build failed in Jenkins: SystemML-DailyTest #340

2016-06-25 Thread dusenberrymw
Just FYI, I pushed a hotfix for the RAT failures, which were due to a couple of 
new Jupyter notebooks I added yesterday.

Thanks!
-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jun 25, 2016, at 12:31 AM, jenk...@spark.tc wrote:
> 
> See <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/340/changes>
> 
> Changes:
> 
> [Glenn Weidner] [SYSTEMML-771] Fix warnings in CsplineCG.dml and CsplineDS.dml
> 
> [mwdusenb] [SYSTEMML-618] SystemML-NN: Adding an MNIST softmax classifier 
> example,
> 
> [mwdusenb] [SYSTEMML-618] SystemML-NN: Updating the MNIST softmax classifier
> 
> [mwdusenb] [SYSTEMML-618] SystemML-NN: Adding an MNIST "LeNet" neural net 
> example,
> 
> [Matthias Boehm] [SYSTEMML-556] Simplified json meta data string 
> construction, for apis
> 
> [Matthias Boehm] [SYSTEMML-630] Fix robustness csv frame readers (count num 
> columns)
> 
> --
> [...truncated 300 lines...]
> [INFO] Copying core-1.1.2.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/core-1.1.2.jar>
> [INFO] Copying jetty-util-6.1.26.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jetty-util-6.1.26.jar>
> [INFO] Copying jackson-core-2.4.4.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jackson-core-2.4.4.jar>
> [INFO] Copying test-interface-1.0.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/test-interface-1.0.jar>
> [INFO] Copying snappy-java-1.1.1.7.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/snappy-java-1.1.1.7.jar>
> [INFO] Copying hamcrest-core-1.3.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/hamcrest-core-1.3.jar>
> [INFO] Copying uncommons-maths-1.2.2a.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/uncommons-maths-1.2.2a.jar>
> [INFO] Copying jsp-api-2.1.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jsp-api-2.1.jar>
> [INFO] Copying jersey-server-1.9.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jersey-server-1.9.jar>
> [INFO] Copying pyrolite-4.4.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/pyrolite-4.4.jar>
> [INFO] Copying compress-lzf-1.0.3.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/compress-lzf-1.0.3.jar>
> [INFO] Copying xmlenc-0.52.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/xmlenc-0.52.jar>
> [INFO] Copying zookeeper-3.4.5.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/zookeeper-3.4.5.jar>
> [INFO] Copying jasper-runtime-5.5.23.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jasper-runtime-5.5.23.jar>
> [INFO] Copying hadoop-hdfs-2.4.1.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/hadoop-hdfs-2.4.1.jar>
> [INFO] Copying antlr4-runtime-4.3.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/antlr4-runtime-4.3.jar>
> [INFO] Copying curator-framework-2.4.0.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/curator-framework-2.4.0.jar>
> [INFO] Copying jodd-core-3.6.3.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jodd-core-3.6.3.jar>
> [INFO] Copying commons-net-2.2.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/commons-net-2.2.jar>
> [INFO] Copying json4s-ast_2.10-3.2.10.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/json4s-ast_2.10-3.2.10.jar>
> [INFO] Copying commons-lang3-3.3.2.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/commons-lang3-3.3.2.jar>
> [INFO] Copying py4j-0.8.2.1.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/py4j-0.8.2.1.jar>
> [INFO] Copying stream-2.7.0.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/stream-2.7.0.jar>
> [INFO] Copying hadoop-mapreduce-client-shuffle-2.4.1.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/hadoop-mapreduce-client-shuffle-2.4.1.jar>
> [INFO] Copying slf4j-api-1.7.10.jar to 
> <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/slf4j-api-1.7.10.jar>

Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2)

2016-06-01 Thread dusenberrymw
+1

Tested the main JAR with a PySpark Jupyter notebook. 

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jun 1, 2016, at 12:16 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
> 
> +1, but please note following findings:
> 
> 1. Is the *source-release.zip artifact unnecessary, since we have
> src.tar.gz and src.zip artifacts? Also, it contains the Hadoop binaries.
> So, it can't be used as the "source release" artifact.
> 2. No standalone uberjar is present (I am happy with this since no one to
> my knowledge is using it and the LICENSE/NOTICE may need updating. I would
> like to remove this artifact forever.)
> 3. No in-memory jar is present (I am happy with this too since this
> artifact is not very lightweight as it was probably initially meant to be.)
> 
> Deron
> 
> 
> 
> 
> 
> On Wed, Jun 1, 2016 at 10:01 AM, Frederick R Reiss <frre...@us.ibm.com>
> wrote:
> 
>> +1
>> 
>> Sent from my iPhone using IBM Verse
>> 
>> On Jun 1, 2016, 9:31:36 AM, reinw...@us.ibm.com wrote:
>> 
>> From: reinw...@us.ibm.com
>> To: dev@systemml.incubator.apache.org
>> Cc:
>> Date: Jun 1, 2016 9:31:36 AM
>> Subject: Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2)
>> 
>> 
>>   +1
>>  Regards,
>>  Berthold Reinwald
>>  IBM Almaden Research Center
>>  office: (408) 927 2208; T/L: 457 2208
>>  e-mail: reinw...@us.ibm.com
>>  From:   Shirish Tatikonda
>>  To: dev@systemml.incubator.apache.org
>>  Date:   06/01/2016 12:47 AM
>>  Subject:Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2)
>>  +1
>>>  On Jun 1, 2016 12:40 AM, "Matthias Boehm"  wrote:
>>> +1, but if there is a third rc, let us please create a branch or cut the
>>> release as of today to ensure no new features are leaking in.
>>> 
>>> Regards,
>>> Matthias
>>> 
>>> From: Luciano Resende
>>> To: dev@systemml.incubator.apache.org
>>> Date: 05/31/2016 10:05 PM
>>> Subject: [VOTE] Apache SystemML 0.10.0-incubating (RC2)
>>> --
>>> 
>>> 
>>> 
>>> Please vote on releasing the following candidate as Apache SystemML version
>>> 0.10.0-incubating !
>>> 
>>> The vote is open for at least 72 hours and will close on Saturday,
>>> Wednesday 25 and passes if a majority of at least 3 +1 PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache SystemML 0.10.0-incubating
>>> [ ] -1 Do not release this package because ...
>>> 
>>> To learn more about Apache SystemML, please see
>>> http://systemml.apache.org/
>>> 
>>> The tag to be voted on is v0.10.0-incubating-rc2
>>> (3d5f9b11741f6d6ecc6af7cbaa1069cde32be838)
>> 
>> https://github.com/apache/incubator-systemml/tree/3d5f9b11741f6d6ecc6af7cbaa1069cde32be838
>>> 
>>> The release artifacts can be found at :
>> 
>> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.10.0-incubating-rc2/
>>> 
>>> The maven release artifacts, including signatures, digests, etc. can be
>>> found at:
>> 
>> https://repository.apache.org/content/repositories/orgapachesystemml-1006/
>>> 
>>> 
>>> =
>>> == Apache Incubator release policy ==
>>> =
>>> Please find below the guide to release management during incubation:
>>> http://incubator.apache.org/guides/releasemanagement.html
>>> 
>>> ===
>>> == How can I help test this release? ==
>>> ===
>>> If you are a SystemML user, you can help us test this release by taking an
>>> existing Algorithm or workload and running on this release candidate, then
>>> reporting any regressions.
>>> 
>>> 
>>> == What justifies a -1 vote for this release? ==
>>> 
>>> -1 votes should only occur for significant stop-ship bugs or legal related
>>> issues (e.g. wrong license, missing header files, etc). Minor bugs or
>>> regressions should not block this release.
>>> 
>>> --
>>> Luciano Resende
>>> http://twitter.com/lresende1975
>>> http://lresende.blogspot.com/
>> 


Re: Discussion on GPU backend

2016-05-25 Thread dusenberrymw
In my opinion, the problem with using a separate branch with longer-term work, 
rather than smaller PRs into the master, is that after several commits, say 10 
or 20, it becomes much more difficult to rebase without running into nasty 
merge conflicts, especially when those conflicts are on an intermediate commit 
so one would have to remember what the code looked like at that point in time 
to properly fix the conflicts. To me, this invites issues such as duplicated 
code and slower progress.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On May 25, 2016, at 9:01 AM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> On Wed, May 25, 2016 at 6:03 AM, Berthold Reinwald <reinw...@us.ibm.com>
> wrote:
> 
>> the discussion is less about (1), (2), or (3). As practiced so far, (3) is
>> the way to go.
>> 
>> The question is about (A) or (B). Curious what the Apache suggested
>> practice is.
> Apache is keen on fostering open collaboration, so specifically about
> branching, having a SystemML branch that is used for
> collaboration/experimentation is probably preferable, as it gives
> visibility to others in the community and enables iterative development through
> review of small patches, while shielding the trunk from issues these experiments
> can cause.
> 
> I would just recommend to avoid making the branch stale, and keep rebasing
> it with latest master, which will make integration much easier in the
> future.
> 
> 
> 
> -- 
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


Re: Discussion on GPU backend

2016-05-25 Thread dusenberrymw
Yeah to do this in the most "Apache Way (TM)", as well as to maintain sanity, 
we should definitely use JIRA issues (ideally actual "sub tasks") and PRs to 
split up major features. It would also be great to split it up into chunks of 
varying complexity that do not block others, so that we could gather more 
contributors of various SystemML experience levels. The JIRA issues should be 
used to divvy up tasks, and PRs should be used to propose an implementation for 
that task, which would be followed by the usual comments from other 
contributors. 

As for a few other best practices with PRs, the PRs should also be merged with 
a "Closes #172." line appended to the end, where the number reflects the GitHub 
PR number, so that the conversations on a PR are linked to the final merged 
commit. Also, any necessary rebasing on a PR should be done by simply 
overwriting that PR branch (which exists on the contributor's fork of 
SystemML), which allows GitHub to keep the same PR open, and thus the entire 
conversation can be followed. 
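
To make the "Closes #172." convention above concrete, here is a minimal sketch of a check that a merge commit message ends with such a line. The helper name `closed_pr_number` and the exact pattern are assumptions for illustration; this is not an existing project tool.

```python
import re

# Hypothetical pattern: a final line of the form "Closes #<number>."
CLOSES_LINE = re.compile(r"^Closes #(\d+)\.$")


def closed_pr_number(commit_message: str):
    """Return the PR number from a trailing 'Closes #N.' line, or None."""
    lines = commit_message.strip().splitlines()
    if not lines:
        return None
    match = CLOSES_LINE.match(lines[-1].strip())
    return int(match.group(1)) if match else None


message = "Example change to the engine\n\nCloses #172."
print(closed_pr_number(message))
```

A committer could run a check like this before pushing, so that every merged commit stays linked to its PR conversation.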

Excited about the GPU work!

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On May 25, 2016, at 8:08 AM, Niketan Pansare <npan...@us.ibm.com> wrote:
> 
> Thanks Berthold and Matthias for your suggestions. It is important to note 
> whether we go with (A) or (B), the initial PR will be squashed in one commit 
> and individual commits by external contributor will be lost in the process. 
> However, since we are planning to go with option (3), the impact won't be too 
> severe.
> 
> Matthias: Here are my thoughts regarding the unknowns for GPU backend:
> 1. Handling of native libraries:
> Both JCuda and Nvidia provide shared libraries/DLL for most OS/platforms 
> along with installation instructions.
> 
> For deployment:
> As per the previous email, the native libraries will be treated as an 
> external dependency, just like hadoop/spark. For example: if someone 
> executes: "hadoop jar SystemML.jar -f test.dml -exec hybrid_spark", she will 
> get a "Class Not Found" exception. In similar fashion, if the user does not 
> include JCu*.jar or provide the native libraries (JCu*.dll/so or CUDA or CuDNN) 
> and supplies the "-accelerator" flag, a "Class not found" or "Cannot load .." 
> exception will be thrown, respectively. If the user does not supply the 
> "-accelerator" flag, SystemML will proceed with normal execution as it does today. 
> 
> For dev:
> We are planning to host the jcu*.jar files in a Maven repository. Once that's 
> done, the "system" scope in the pom will be replaced by the "provided" scope and 
> the jcu*.jars will be deleted from the PR. As with deployment, it is the 
> responsibility of the developer to install the native libraries if she intends 
> to work on the GPU backend.
> 
> For testing:
> The user can set the environment variable "CUDA_PATH" and set the TEST_GPU flag 
> to enable GPU tests (please see 
> https://github.com/apache/incubator-systemml/pull/165/files#diff-bcda036e4c3ff62cb2648acbbd19f61aR113).
> The PR will be accompanied by additional tests which will be enabled only 
> when TEST_GPU is set. Having the TEST_GPU flag allows users without an Nvidia 
> GPU to run the integration tests. As with deployment, it is the responsibility 
> of the developer to install native libraries for testing with the TEST_GPU flag. 
> 
> The first version will not contain custom native kernels. 
> 
> 2. I can add the summary of the performance comparisons in the PR :)
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 
> From: Berthold Reinwald/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 05/25/2016 06:03 AM
> Subject: Re: Discussion on GPU backend
> 
> 
> 
> 
> the discussion is less about (1), (2), or (3). As practiced so far, (3) is 
> the way to go.
> 
> The question is about (A) or (B). Curious what the Apache suggested 
> practice is.
> 
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail: reinw...@us.ibm.com
> 
> 
> 
> From:   Matthias Boehm/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date:   05/24/2016 09:10 PM
> Subject:Re: Discussion on GPU backend
> 
> 
> 
> Generally, I think we should really stick to (3) as done in the past, 
> i.e., bring up major features in the roadmap discussions, create jira 
> epics
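
The test gating Niketan describes in the quoted message (GPU tests run only when TEST_GPU is set and CUDA_PATH points at an installation) could be sketched roughly as below. This is a hypothetical Python analogue, not the project's actual test code: it assumes both names are read as environment variables and that any non-empty value enables the flag.

```python
import os


def gpu_tests_enabled(env=None) -> bool:
    """Return True only if both TEST_GPU and CUDA_PATH are set and non-empty.

    Assumption for this sketch: both are plain environment variables.
    """
    env = os.environ if env is None else env
    return bool(env.get("TEST_GPU")) and bool(env.get("CUDA_PATH"))


if gpu_tests_enabled():
    print("GPU tests would run")
else:
    print("GPU tests would be skipped")
```

In a Python test suite, a predicate like this could be passed to `unittest.skipUnless` so that machines without an Nvidia GPU still run the remaining tests.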

Re: Draft - May 2016 SystemML Incubator Podling Report

2016-05-03 Thread dusenberrymw
I might add that we are preparing to release our next version very soon. 
Otherwise, LGTM. Thanks, Deron!

-Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On May 2, 2016, at 1:11 PM, Niketan Pansare <npan...@us.ibm.com> wrote:
> 
> Hi Deron,
> 
> Thanks for writing the draft. I also presented SystemML at Rice University. 
> Can you please add it to the report ?
> 
> Link to the event on Rice CS calendar: 
> https://calendar.google.com/calendar/render?eid=MTdqZnJmZHZqM2ExNWlkbWtwa2czZXFzYmcgZnBoYmd1b3JsbzM2azJ0MWk4djk5ODcwbWtAZw=America/Chicago=true=xml#eventpage_6
> 
> Thanks,
> 
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> 
> From: Deron Eriksson <deroneriks...@gmail.com>
> To: dev@systemml.incubator.apache.org
> Date: 05/02/2016 12:36 PM
> Subject: Draft - May 2016 SystemML Incubator Podling Report
> 
> 
> 
> 
> Hi,
> 
> 
> I created a draft for the May 2016 SystemML podling report that is due this
> Wednesday. Please provide feedback if you'd like anything updated. For PMC
> members, if the issue is private related to the project, please use the
> private mailing list for discussion.
> 
> Thanks!
> 
> Deron
> 
> 
> 
> SystemML
> 
> SystemML provides declarative large-scale machine learning (ML) that aims at
> flexible specification of ML algorithms and automatic generation of hybrid
> runtime plans ranging from single node, in-memory computations, to
> distributed computations running on Apache Hadoop MapReduce and Apache
> Spark.
> 
> SystemML has been incubating since 2015-11-02.
> 
> Three most important issues to address in the move towards graduation:
> 
>  - Grow SystemML community: increase mailing list activity,
>    increase adoption of SystemML for scalable machine learning, encourage
>    data scientists to adopt DML and PyDML algorithm scripts, respond to
>    user feedback to ensure SystemML meets the requirements of real-world
>    situations, write papers, and present talks about SystemML.
>  - Continue to produce releases.
>  - Increase the diversity of our project's contributors and committers.
> 
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware of?
> 
>  NONE.
> 
> How has the community developed since the last report?
> 
>  Our mailing list from February through April had 199 messages involving
>  topics such as algorithms, DML functionality, usability, and bug fixes. In
>  addition, we have had many discussions on our JIRA site and in pull request
>  conversations. Fred Reiss presented at Spark Summit East on February 17 about
>  SystemML internals. Berthold Reinwald spoke at the Spark Technology Center on
>  March 9 about scalable machine learning with SystemML. Niketan Pansare spoke
>  on April 28 at Datapalooza in Austin about declarative machine learning at
>  scale with SystemML. Researchers in Germany are working to add Flink as an
>  additional SystemML backend. On GitHub, the project has been starred 267
>  times and forked 92 times.
> 
> How has the project developed since the last report?
> 
>  We produced our first Apache release, version 0.9.0-incubating. Numerous
>  additions have been made to the project, including core functionality,
>  usability improvements, and documentation. The project has had 204 commits
>  since February 1. In the same time frame, 155 new issues have been reported
>  on our JIRA site and 77 issues have been resolved. 114 pull requests opened
>  since February 1 have been closed.
> 
> Date of last release:
> 
>  2016-02-15 (version 0.9.0-incubating)
> 
> When were the last committers or PMC members elected?
> 
>  NONE
> 
> 


Deprecate `ppred(...)` Built-in Function

2016-05-03 Thread dusenberrymw
Hi all,

The `ppred(...)` built-in function (`ppred(X, 0, ">")`) is no longer necessary 
as relational comparison operators are supported natively in the language (`X > 
0`) and follow R's semantics.

SYSTEMML-657 has been created to track the deprecation of this function, and is 
currently open if anyone would like to take it on.  We'd like to add deprecation 
warnings to the parser and documentation, and replace all current uses of the 
function in our DML scripts. 
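
As a rough illustration of the last step (replacing existing uses in DML 
scripts), here is a minimal Python sketch that rewrites `ppred(A, B, "op")` 
calls into `(A op B)`.  It is not part of SystemML; the helper names 
(`rewrite_ppred`, `_split_args`, etc.) are hypothetical, the set of accepted 
operators is an assumption, and it does not try to skip comments or string 
literals that happen to contain `ppred(`.

```python
import re

PPRED_CALL = re.compile(r"\bppred\s*\(")
SIMPLE_TOKEN = re.compile(r"^[\w.]+$")
# Assumption: the comparison operators accepted as the third argument.
VALID_OPS = {">", "<", ">=", "<=", "==", "!="}


def _find_close(text, start):
    """Index of the ')' closing a '(' opened just before `start`, or -1."""
    depth, quote = 1, None
    for i in range(start, len(text)):
        ch = text[i]
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
    return -1


def _split_args(text):
    """Split on top-level commas, ignoring commas inside (), [] and quotes."""
    args, current, depth, quote = [], [], 0, None
    for ch in text:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    args.append("".join(current).strip())
    return args


def _wrap(expr):
    """Parenthesize compound operands so operator precedence is preserved."""
    if SIMPLE_TOKEN.match(expr):
        return expr
    if expr.startswith("(") and _find_close(expr, 1) == len(expr) - 1:
        return expr  # already fully parenthesized
    return f"({expr})"


def rewrite_ppred(line):
    """Rewrite every ppred(lhs, rhs, "op") call in `line` as (lhs op rhs)."""
    match = PPRED_CALL.search(line)
    if match is None:
        return line
    close = _find_close(line, match.end())
    if close == -1:
        return line  # unbalanced parentheses: leave the line untouched
    args = _split_args(line[match.end():close])
    rest = rewrite_ppred(line[close + 1:])
    if len(args) != 3 or args[2].strip("\"'") not in VALID_OPS:
        return line[:close + 1] + rest  # unexpected call shape: keep as-is
    lhs = _wrap(rewrite_ppred(args[0]))
    rhs = _wrap(rewrite_ppred(args[1]))
    op = args[2].strip("\"'")
    return f"{line[:match.start()]}({lhs} {op} {rhs}){rest}"
```

For example, `rewrite_ppred('Y = ppred(X, 0, ">")')` yields `Y = (X > 0)`.  
Any rewritten script should still be reviewed and re-run through the existing 
tests before committing.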


Thanks!

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.



Re: remove castAsScalar?

2016-04-22 Thread dusenberrymw
Yeah, those both sound great.  Even if we may have to support old DML code 
outside the project, we can certainly aim to keep our DML code as modern and 
clean as possible. 
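
To make the replace-and-warn idea from Deron's list (quoted below) concrete, 
here is a minimal Python sketch, not the actual SystemML parser change.  The 
function name `modernize_dml` and its `filename` parameter are hypothetical; 
the warning text is the one Deron suggested.

```python
import re
import warnings

# Matches a call to the deprecated built-in, e.g. "castAsScalar (" or "castAsScalar(".
CAST_AS_SCALAR = re.compile(r"\bcastAsScalar\s*\(")


def modernize_dml(source, filename="<dml>"):
    """Replace castAsScalar( with as.scalar( and warn once per affected line."""
    out = []
    for lineno, line in enumerate(source.splitlines(keepends=True), start=1):
        new_line, count = CAST_AS_SCALAR.subn("as.scalar(", line)
        if count:
            warnings.warn(
                f"{filename}:{lineno}: castAsScalar has been deprecated, "
                "please replace with as.scalar",
                DeprecationWarning,
                stacklevel=2,
            )
        out.append(new_line)
    return "".join(out)
```

Running this over the scripts in the project would cover item 1, while the 
warning mirrors what item 2 asks the engine to log for DML outside the project.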

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Apr 22, 2016, at 11:29 AM, Deron Eriksson <deroneriks...@gmail.com> wrote:
> 
> In that case, perhaps I could create JIRAs to:
> 1) replace all castAsScalar's in the project with as.scalar's
> 2) if castAsScalar is used in a DML file, issue a log warning such as
> 'castAsScalar has been deprecated, please replace with as.scalar'
> 3) update docs to say castAsScalar has been deprecated.
> 
> That way, we maintain backwards compatibility with older DML outside the
> project while replacing the castAsScalar's in the project.
> 
> Deron
> 
> 
> 
>> On Thu, Apr 21, 2016 at 5:42 PM, Matthias Boehm <mbo...@us.ibm.com> wrote:
>> 
>> Let's be careful not to unnecessarily break backwards compatibility. How
>> about we collect all instances of language builtin functions that we want
>> to remove and clean them up with our 1.0 release later this year? There are
>> other instances like ppred that do not exist in R and are meanwhile redundant
>> in DML (but still heavily used).
>> 
>> Regards,
>> Matthias
>> 
>> From: Deron Eriksson <deroneriks...@gmail.com>
>> To: dev@systemml.incubator.apache.org
>> Date: 04/21/2016 05:33 PM
>> Subject: remove castAsScalar?
>> --
>> 
>> 
>> 
>> Hi,
>> 
>> In the ongoing discussion concerning printing a matrix (at
>> https://github.com/apache/incubator-systemml/pull/120), I noticed that
>> castAsScalar was introduced to the language as a mistake. It has been
>> replaced by as.scalar but castAsScalar has been kept around until now for
>> historical reasons. Since it is redundant and we are an open source
>> project, can we now go ahead and remove it, since having two ways to
>> accomplish the same thing (as.scalar and castAsScalar) can be confusing to
>> new users?
>> 
>> Deron
>> 
>> 
>> 


Re: [VOTE] Release SystemML 0.9.0-incubating (RC2)

2016-01-28 Thread dusenberrymw
FYI, the issue was due to a leftover Git cache from before the addition of the 
`.gitattributes` file that fixed these line endings.  The cache on the machine 
used for cutting the release still contained the files with the Windows-style 
line-endings.  Since the `.gitattributes` file is present, these incorrect 
line-endings would not have made their way back into the official repo, but 
when building the release distributions locally, they were simply copied over.  
The solution was to instruct Git to remove its local cache as follows, which 
may be beneficial for everyone to perform:

- `git rm --cached -r .`
- `git reset --hard`

Just note that this will remove any changes that have not yet been committed or 
stashed locally.
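
If it helps, a quick way to check a working tree for leftover Windows-style 
line endings before building a release is sketched below in Python.  This is 
only an illustration, not part of our build; the function names are 
hypothetical, and treating any file with a NUL byte in its first 8000 bytes as 
binary is a heuristic I am assuming here.

```python
import os


def has_crlf(data: bytes) -> bool:
    """True if the byte string contains a Windows-style CRLF line ending."""
    return b"\r\n" in data


def looks_binary(data: bytes) -> bool:
    """Heuristic (assumption): a NUL byte near the start means a binary file."""
    return b"\0" in data[:8000]


def find_crlf_files(root="."):
    """Return a sorted list of text files under `root` that contain CRLF."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: skip it
            if not looks_binary(data) and has_crlf(data):
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    for path in find_crlf_files("."):
        print(path)
```

An empty result after running the two Git commands above would suggest the 
checkout no longer carries the stale line endings.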


- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 28, 2016, at 5:07 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
> 
> Vote canceled, as it seems we have been hit by end of line problems again.
> New vote coming up shortly.
> 
> On Thu, Jan 28, 2016 at 2:02 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
> 
>> Please vote on releasing the following candidate as Apache SystemML
>> version 0.9.0!
>> 
>> The vote is open for at least 72 hours and passes if a majority of at
>> least 3 +1 PMC votes are cast.
>> 
>> [ ] +1 Release this package as Apache SystemML 0.9.0
>> [ ] -1 Do not release this package because ...
>> 
>> To learn more about Apache SystemML, please see
>> http://systemml.apache.org/
>> 
>> The tag to be voted on is v0.9.0-rc2
>> (6da9d60db4a5a7adfcc943d954f41153e496866f)
>> 
>> 
>> https://github.com/apache/incubator-systemml/tree/6da9d60db4a5a7adfcc943d954f41153e496866f
>> 
>> The release files, including signatures, digests, etc. can be found at:
>> 
>> https://repository.apache.org/content/repositories/orgapachesystemml-1002/
>> 
>> The distribution is also available at:
>> 
>> http://people.apache.org/~lresende/systemml/0.9.0-rc2/
>> 
>> =
>> == Apache Incubator release policy ==
>> =
>> Please find below the guide to release management during incubation:
>> http://incubator.apache.org/guides/releasemanagement.html
>> 
>> ===
>> == How can I help test this release? ==
>> ===
>> If you are a SystemML user, you can help us test this release by taking
>> an existing Algorithm or workload and running on this release candidate,
>> then reporting any regressions.
>> 
>> 
>> == What justifies a -1 vote for this release? ==
>> 
>> -1 votes should only occur for significant stop-ship bugs or legal
>> related issues (e.g. wrong license, missing header files, etc). Minor bugs
>> or regressions should not block this release.
>> 
>> 
>> --
>> Luciano Resende
>> http://people.apache.org/~lresende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
> 
> 
> 
> -- 
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


Future Release Package Naming & Structure

2016-01-25 Thread dusenberrymw
Hi all,

A discussion regarding the release package structure started on pull request 54 
[https://github.com/apache/incubator-systemml/pull/54].  Currently, we have a 
"distributed" release for running SystemML on a cluster* using Spark or Hadoop, 
as well as a "standalone" release for running SystemML on a single node with 
Java (no Spark or Hadoop installation necessary).  Given this, two questions 
were raised during the discussion:

  1. Should we name our releases as "*-cluster" and "*-standalone", or just 
distinguish the standalone version as "*" and "*-standalone"?
  2. Should we maintain the two separate releases ("distributed" and 
"standalone"), or should we move to have one single release with one JAR that 
works in all environments and execution modes?

The consensus was that there are pros and cons for each option, and that this 
discussion would be more appropriate for the mailing list.

Thoughts?

Thanks,
- Mike

* Yes, SystemML can still be run in single node execution mode even on Spark or 
Hadoop.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.



Re: [VOTE] Release SystemML 0.9-incubating (RC1)

2016-01-20 Thread dusenberrymw
+1

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.


> On Jan 20, 2016, at 3:39 AM, Frederick R Reiss <frre...@us.ibm.com> wrote:
> 
> 
> +1
> 
> Sent from my iPhone
> 
>> On Jan 20, 2016, at 11:06 AM, Shirish Tatikonda <shirish.tatiko...@gmail.com> wrote:
>> 
>> +1
>> 
>> 
>> 
>> On Tue, Jan 19, 2016 at 9:46 PM, Luciano Resende <luckbr1...@gmail.com>
>> wrote:
>> 
>>> Please vote on releasing the following candidate as Apache SystemML version
>>> 0.9.0!
>>> 
>>> The vote is open for at least 72 hours and will close on Saturday, January
>>> 23 and passes if a majority of at least 3 +1 PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache SystemML 0.9.0
>>> [ ] -1 Do not release this package because ...
>>> 
>>> To learn more about Apache SystemML, please see
>>> http://systemml.apache.org/
>>> 
>>> The tag to be voted on is v0.9.0-rc1
>>> (3e7e5cf6ca697ec247a7dc4e005a7f7b1cb18856)
>>> https://github.com/apache/incubator-systemml/tree/3e7e5cf6ca697ec247a7dc4e005a7f7b1cb18856
>>> 
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://repository.apache.org/content/repositories/orgapachesystemml-1001/
>>> 
>>> 
>>> =
>>> == Apache Incubator release policy ==
>>> =
>>> Please find below the guide to release management during incubation:
>>> http://incubator.apache.org/guides/releasemanagement.html
>>> 
>>> ===
>>> == How can I help test this release? ==
>>> ===
>>> If you are a SystemML user, you can help us test this release by taking an
>>> existing Algorithm or workload and running on this release candidate, then
>>> reporting any regressions.
>>> 
>>> 
>>> == What justifies a -1 vote for this release? ==
>>> 
>>> -1 votes should only occur for significant stop-ship bugs or legal related
>>> issues (e.g. wrong license, missing header files, etc). Minor bugs or
>>> regressions should not block this release.
>>> 
>>> --
>>> Luciano Resende
>>> http://people.apache.org/~lresende
>>> http://twitter.com/lresende1975
>>> http://lresende.blogspot.com/