Re: New Google Summer of Code 2017 Student - Krishna Kalyan
Welcome, Krishna! Looking forward to working with you! For a bit of my background related to the project, I've been heavily focused on deep learning by building a DML library for DL (in `scripts/nn`) and working on an applied DL project (in `projects/breast_cancer`). I've also worked on the engine optimizer a bit, added a few new built-in ops to the engine, and run the perf tests previously. -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On May 6, 2017, at 3:18 AM, Arvind Surve <ac...@yahoo.com.INVALID> wrote: > > Welcome Krishna > Arvind Surve | Spark Technology Center | http://www.spark.tc/ > > From: Niketan Pansare <npan...@us.ibm.com> > To: dev@systemml.incubator.apache.org > Sent: Friday, May 5, 2017 3:45 PM > Subject: Re: New Google Summer of Code 2017 Student - Krishna Kalyan > > Welcome Krishna !! > > > Krishna Kalyan ---05/05/2017 03:36:59 PM---Thank you so much, Looking forward > to work with every one in this community. Thank you for all > > From: Krishna Kalyan <krishnakaly...@gmail.com> > To: Nakul Jindal <naku...@gmail.com> > Cc: dev@systemml.incubator.apache.org > Date: 05/05/2017 03:36 PM > Subject: Re: New Google Summer of Code 2017 Student - Krishna Kalyan > > > > Thank you so much, > Looking forward to work with every one in this community. Thank you for all > the feedback and this amazing opportunity. > > Regards, > Krishna > > > > > > On May 5, 2017 19:05, "Nakul Jindal" <naku...@gmail.com> wrote: > > Hi All, > > Let us all welcome Krishna Kalyan as a student of Google Summer of Code to > work on SystemML. > He will be working on automating the performance testing process of > SystemML. > > His project proposal is attached and the JIRA tracking his project can be > found at https://issues.apache.org/jira/browse/SYSTEMML-1451 > > He has already been active with the community (https://www.mail-archive.com/ > dev@systemml.incubator.apache.org/msg01209.html) since January. 
> > @Krishna - Even though I am officially the mentor, I encourage you to > address questions to various members of the community with issues you > encounter throughout the project. Dig through Pull Requests and discussions > to figure out who is familiar with which components. > > (I can help a bit with my background - I have worked on the DML grammar and > ANTLR parser layer previously and am working on the GPU backend now. I also > ran the perf tests and am somewhat familiar with the work needed to > automate it.) > > Welcome! > > -Nakul > > > > > 
Re: [DISCUSS] Remove old MLContext API
+1 -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On May 1, 2017, at 5:13 PM, Niketan Pansare <npan...@us.ibm.com> wrote: > > > > Hi all, > > The old MLContext API (org.apache.sysml.api.MLContext, org.apache.sysml.api > .MLContextProxy, org.apache.sysml.api.MLMatrix, org.apache.sysml.api. > MLOutput and org.apache.sysml.api.MLBlock) has been deprecated for a while. > I would recommend removing it from our source code. Please email back if > you have concerns or objections. > > Thanks, > > Niketan Pansare > IBM Almaden Research Center > E-mail: npansar At us.ibm.com > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
Re: [NOTICE] New Apache SystemML Committer and PPMC Member
Welcome, Felix! -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On May 1, 2017, at 4:23 PM, Niketan Pansare <npan...@us.ibm.com> wrote: > > Congratulations Felix !! > > > Luciano Resende ---05/01/2017 04:21:30 PM---Welcome Felix. On Mon, May 1, > 2017 at 4:18 PM, Arvind Surve <ac...@yahoo.com.invalid> > > From: Luciano Resende <luckbr1...@gmail.com> > To: dev@systemml.incubator.apache.org, Arvind Surve <ac...@yahoo.com> > Date: 05/01/2017 04:21 PM > Subject: Re: [NOTICE] New Apache SystemML Committer and PPMC Member > > > > > Welcome Felix. > > On Mon, May 1, 2017 at 4:18 PM, Arvind Surve <ac...@yahoo.com.invalid> > wrote: > > > I would like to welcome Felix Schueler as a new > > Committer and PPMC member of Apache SystemML. > > > > Thanks for all your work, and welcome !!! > > > > Arvind Surve | Spark Technology Center | http://www.spark.tc/ > > > > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/ > > >
Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4)
+1 Grabbed the tar binary and the tar source and tested various local scripts in Scala & Python 2 + 3, and those ran fine. However, I did run the MNIST LeNet demo on both our 0.13 release and this 0.14 candidate, and I noticed a regression in 0.14. For the same script run back to back, the 0.14 candidate took longer, and looking into the stats, on 0.13 there were 864 Spark instructions executed, while on this 0.14 there were 2513 Spark instructions executed. This also brought the `sp_mapmm` and `sp_sel+` instructions into the top 10 heavy hitters. This could be related to the issue that I am seeing in SYSTEMML-1561. Regardless, I'm still fine with releasing this, since the deep learning support is still experimental for 0.14. For our upcoming 1.0 release, all engine bugs and issues related to deep learning need to be fixed. Most of these bugs are generally applicable to all algorithms, so it is in the benefit of the project to fix them. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Apr 28, 2017, at 10:37 AM, Arvind Surve <ac...@yahoo.com.INVALID> wrote: > > +1 > Completed following verifications - License and Notice validations - > Binary runtime validations- Source code compilation and runtime > validations - Python scripts validations using Python 2 Arvind Surve | > Spark Technology Center | http://www.spark.tc/ > > From: Glenn Weidner <gweid...@us.ibm.com> > To: dev@systemml.incubator.apache.org > Sent: Monday, April 24, 2017 9:30 PM > Subject: Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4) > > +1 > > Successfully ran Linear Regression, Logistic Regression, Naive Bayes, SVM in > Python notebooks with Spark 2.0.2 (in cloud environment) and Spark 2.1 (on > local test cluster) after pip install of RC4 python artifact > systemml-0.14.0-incubating-python.tgz. Also ran Linear Regression Conjugate > Gradient in Scala notebooks. 
> > Regards, > Glenn > > Matthias Boehm ---04/24/2017 02:02:12 AM---+1 I ran large-scale experiments > on Spark 2.1 for L2SVM, GLM, MLogreg, > > From: Matthias Boehm <mboe...@googlemail.com> > To: dev@systemml.incubator.apache.org > Date: 04/24/2017 02:02 AM > Subject: Re: [VOTE] Apache SystemML 0.14.0-incubating (RC4) > > > > +1 > > I ran large-scale experiments on Spark 2.1 for L2SVM, GLM, MLogreg, > LinregCG, LinregDS, and PCA over scaled versions of MNIST and ImageNet (up > to 1TB, with uncompressed and compressed linear algebra) without any > issues. > > Compared to previous experiments with SystemML 0.11 and Spark 1.6, I've > seen substantial performance improvements of >2x for iterative algorithms > with RDD operations in the inner loop over out-of-core datasets. > > Regards, > Matthias > > On Wed, Apr 19, 2017 at 4:17 PM, Arvind Surve <ac...@yahoo.com.invalid> > wrote: > >> Please vote on releasing the following candidate as Apache SystemML >> version 0.14.0-incubating ! >> The vote is open for at least 72 hours and passes if a majority of at >> least 3 +1 PMC votes are cast. >> [ ] +1 Release this package as Apache SystemML 0.14.0-incubating[ ] -1 Do >> not release this package because ... >> To learn more about Apache SystemML, please see http://systemml.apache. >> org/ >> The tag to be voted on is v0.14.0-incubating-rc4 ( >> 8bdcf106ca9bd04c0f68924ad5827eb7d7d54952) >> https://github.com/apache/incubator-systemml/commit/ >> 8bdcf106ca9bd04c0f68924ad5827eb7d7d54952 >> >> The release artifacts can be found at :https://dist.apache.org/ >> repos/dist/dev/incubator/systemml/0.14.0-incubating-rc4/ >> The maven release artifacts, including signatures, digests, etc. 
can >> be found at:https://repository.apache.org/content/repositories/ >> orgapachesystemml-1021/org/apache/systemml/systemml/0.14.0-incubating/ >> === Apache Incubator release policy >> ===Please find below the guide to >> release management during incubation:http://incubator.apache.org/guides/ >> releasemanagement.html >> = How can I help test this >> release? =If you are a SystemML >> user, you can help us test this release by taking an existing Algorithm or >> workload and running on this release candidate, thenreporting any >> regressions. >> == What justifies a -1 >> vote for this release? ==-1 >> votes should only occur for significant stop-ship bugs or legal >> related issues (e.g. wrong license, missing header files, etc). Minor bugs >> or regressions should not block this release. >> -Arvind >> Arvind Surve | Spark Technology Center | http://www.spark.tc/ > > > > >
Re: Build passed/failed messages for pull requests
I would prefer option 2. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Apr 28, 2017, at 12:40 PM, Glenn Weidner <gweid...@us.ibm.com> wrote: > > My preference is option 3. > > Thanks, > Glenn > > > Arvind Surve ---04/28/2017 11:09:48 AM---Agree, these messages are > distractions. Arvind Surve | Spark Technology Center | http://www.spark. > > From: Arvind Surve <ac...@yahoo.com.INVALID> > To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org> > Date: 04/28/2017 11:09 AM > Subject: Re: Build passed/failed messages for pull requests > > > > > Agree, these messages are distractions. > Arvind Surve | Spark Technology Center | http://www.spark.tc/ > > From: Matthias Boehm <mboe...@googlemail.com> > To: dev@systemml.incubator.apache.org > Sent: Friday, April 28, 2017 11:05 AM > Subject: Re: Build passed/failed messages for pull requests > > as I commented on one of these github comments, I'm strongly against > these kind of unnecessary messages because they distract from the actual > discussions. I already had to change my notification settings > accordingly - essentially I'm not watching SystemML's PR activity any > more. > > Regards, > Matthias > > On 4/28/2017 10:42 AM, Deron Eriksson wrote: > > Hi, > > > > When a pull request is created or another commit is pushed to that pull > > request, a build including running our test suite is performed (Jenkins at > > https://sparktc.ibmcloud.com/jenkins/job/SystemML-PullRequestBuilder/). > > This is the same model that other projects such as Apache Spark use > > (Jenkins at > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/). > > > > A few days ago, automated build passed/failed pull request messages were > > introduced to our pull requests, following the same type of Spark model. 
> > A) SystemML example: https://github.com/apache/incubator-systemml/pull/442 > > B) Spark example: https://github.com/apache/spark/pull/17765 > > > > Personally I like these messages because for contributors that do pull > > requests, it automatically tells them the status of the build for their > > pull requests and gives them a direct link to the build/test results. An > > opposing viewpoint would be that these messages are somewhat like spam. > > > > So we should make a public decision on the mailing list what to do about > > these automated build status messages. > > > > Some options: > > (1) keep the automated messages exactly as they are > > (2) keep the automated messages, but consolidate the two messages into one > > (such as "Build successful" and "Refer to this link..."). > > (3) get rid of the automated messages > > > > I like (2). Any other opinions or options? > > > > Thoughts? > > > > Deron > > > > > > > > >
Re: Please reply ASAP : Regarding incubator systemml/breast_cancer project
Hi Aishwarya,

Yes, it is quite strange that Jupyter isn't running on the PySpark kernel even though it's being started in that manner. The good news is that we do use this every day, so once we find the root issue with your Jupyter, it should work great! Let's try temporarily removing all of the existing Jupyter/IPython settings & kernels and basically start fresh. Assuming you are on OS X / macOS or Linux, can you do the following? (Please double check the exact paths, as I'm typing on a phone.)

* Stop Jupyter, and make sure that it is not running.
* Temporarily remove the Jupyter kernels. First, you will need to see where they are installed, and then just rename that path. Run `jupyter kernelspec list` and note the paths it prints. For example, on macOS, the kernels may be located at ~/Library/Jupyter/kernels, and thus to move them, you would use the following. Update this as needed for the exact paths listed by that command: `mv ~/Library/Jupyter/kernels ~/Library/Jupyter/kernels_OLD`
* Temporarily remove the Jupyter & IPython settings: `mv ~/.jupyter ~/.jupyter_OLD` `mv ~/.ipython ~/.ipython_OLD`
* Make sure Jupyter is up to date: `pip3 install -U ipython jupyter`

After that, please ensure that Jupyter is not running, then start it in the context of PySpark as sent previously. Once Jupyter is started this time, there should only be one kernel listed, and `sc` should be available. Can you try that?

-- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone.
> On Apr 26, 2017, at 2:13 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> > wrote: > > Hi sir, > The sc NameError persists. > > (1) There is only one jupyter server running. And that was started with the > pyspark command in the previous mail. > (2) Two kernels are appearing in the change kernel option - Python3 and > Python2. Tried with both of them and the result is the same. 
> > How is jupyter not being able to run on the pyspark kernel when we have > started the notebook with the pyspark command only? > > Is it possible to create a .py file of MachineLearning.ipynb like was done > with preprocessing.ipynb with explicitly creating a SparkContext() ? > >> On 25-Apr-2017 11:57 PM, <dusenberr...@gmail.com> wrote: >> >> Hi Aishwarya, >> >> Unfortunately this mailing list removes all images, so I can't view your >> screenshot. I'm assuming that it is the same issue with the missing >> SparkContext `sc` object, but please let me know if it is a different >> issue. This sounds like it could be an issue with multiple kernels >> installed in Jupyter. When you start the notebook, can you see if there >> are multiple kernels listed in the "Kernel" -> "Change Kernel" menu? If >> so, please try one of the other kernels to see if Jupyter is starting by >> default with a non-spark kernel. Also, is it possible that you have more >> than one instance of the Jupyter server running? I.e. for this scenario, >> we start Jupyter itself directly via pyspark using the command sent >> previously, whereas usually Jupyter can just be started with `jupyter >> notebook`. In the latter case, PySpark (and thus `sc`) would *not* be >> available (unless you've set up special PySpark kernels separately). In >> summary, can you (1) check for other kernels via the menus, and (2) check >> for other running Jupyter servers that are non-PySpark? >> >> As for the other inquiry, great question! When training models, it's >> quite useful to track the loss and other metrics (i.e. accuracy) from >> *both* the training and validation sets. The reasoning is that it allows >> for a more holistic view of the overall learning process, such as >> evaluating whether any overfitting or underfitting is occurring. For >> example, say that you train a model and achieve an accuracy of 80% on the >> validation set. Is this good? Is this the best that can be done? 
Without >> also tracking performance on the training set, it can be difficult to make >> these decisions. Say that you then measure the performance on the training >> set and find that the model achieves 100% accuracy on that data. That >> might be a good indication that your model is overfitting the training set, >> and that a combination of more data, regularization, and a smaller model >> may be helpful in raising the generalization performance, i.e. the >> performance on the validation set and future real examples on which you >> wish to make predictions. If on the other hand, the model achieved an 82% >> on the training set, this could be a good indication that the model is >> underfitting, and that a combination of a more expre
Re: Please reply ASAP : Regarding incubator systemml/breast_cancer project
Hi Aishwarya, Unfortunately this mailing list removes all images, so I can't view your screenshot. I'm assuming that it is the same issue with the missing SparkContext `sc` object, but please let me know if it is a different issue. This sounds like it could be an issue with multiple kernels installed in Jupyter. When you start the notebook, can you see if there are multiple kernels listed in the "Kernel" -> "Change Kernel" menu? If so, please try one of the other kernels to see if Jupyter is starting by default with a non-spark kernel. Also, is it possible that you have more than one instance of the Jupyter server running? I.e. for this scenario, we start Jupyter itself directly via pyspark using the command sent previously, whereas usually Jupyter can just be started with `jupyter notebook`. In the latter case, PySpark (and thus `sc`) would *not* be available (unless you've set up special PySpark kernels separately). In summary, can you (1) check for other kernels via the menus, and (2) check for other running Jupyter servers that are non-PySpark? As for the other inquiry, great question! When training models, it's quite useful to track the loss and other metrics (i.e. accuracy) from *both* the training and validation sets. The reasoning is that it allows for a more holistic view of the overall learning process, such as evaluating whether any overfitting or underfitting is occurring. For example, say that you train a model and achieve an accuracy of 80% on the validation set. Is this good? Is this the best that can be done? Without also tracking performance on the training set, it can be difficult to make these decisions. Say that you then measure the performance on the training set and find that the model achieves 100% accuracy on that data. That might be a good indication that your model is overfitting the training set, and that a combination of more data, regularization, and a smaller model may be helpful in raising the generalization performance, i.e. 
the performance on the validation set and future real examples on which you wish to make predictions. If on the other hand, the model achieved an 82% on the training set, this could be a good indication that the model is underfitting, and that a combination of a more expressive model and better data could be helpful. In summary, tracking performance on both the training and validation datasets can be useful for determining ways in which to improve the overall learning process. - Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Apr 25, 2017, at 8:47 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> > wrote: > > We had another query, sir. We read the entire MachineLearning.ipynb code. > in it the training samples and the validation samples have both been > evaluated separately and their respective losses and accuracies obtained. > Why are the training samples being evaluated again if they were used to > train the model in the first place? Shouldn't only the validation data > frames be evaluated to find out the loss and accuracy? > > Thank you > > On 25-Apr-2017 4:00 PM, "Aishwarya Chaurasia" <aishwarya2...@gmail.com> > wrote: > >> Hello sir, >> >> The NameError is occuring again sir. Why does it keep resurfacing? >> >> Attaching the screenshot of the error. >> >>> On 25-Apr-2017 2:50 AM, <dusenberr...@gmail.com> wrote: >>> >>> Hi Aishwarya, >>> >>> For the error message, that just means that the SystemML jar isn't being >>> found. Can you add a `--driver-class-path >>> $SYSTEMML_HOME/target/SystemML.jar` >>> to the invocation of Jupyter? I.e. `PYSPARK_PYTHON=python3 >>> PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook" >>> pyspark --jars $SYSTEMML_HOME/target/SystemML.jar --driver-class-path >>> $SYSTEMML_HOME/target/SystemML.jar`. There was a PySpark bug that was >>> supposed to have been fixed in Spark 2.x, but it's possible that it is >>> still an issue. 
>>> >>> As for the output, the notebook will create SystemML `Matrix` objects for >>> all of the weights and biases of the trained models. To save, please >>> convert each one to a DataFrame, i.e. `Wc1.toDF()` and repeated for each >>> matrix, and then simply save the DataFrames. This could be done all at >>> once like this for a SystemML Matrix object `Wc1`: >>> `Wc1.toDF().write.save("path/to/save/Wc1.parquet", format="parquet")`. >>> Just repeat for each matrix returned by the "Train" code for the >>> algorithms. At that point, you will have a set of saved DataFrames >
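The training-versus-validation reasoning in this thread can be made concrete with a minimal Python sketch. The function name `diagnose_fit` and both thresholds (`gap_tol`, `target`) are assumptions chosen purely for illustration; they are not part of SystemML or the breast_cancer project, and sensible values depend on the task.

```python
def diagnose_fit(train_acc, val_acc, gap_tol=0.05, target=0.90):
    """Classify a (training, validation) accuracy pair.

    gap_tol and target are hypothetical thresholds for illustration only.
    """
    gap = train_acc - val_acc
    if gap > gap_tol:
        # Training accuracy far above validation accuracy: likely overfitting.
        return "overfitting"
    if train_acc < target:
        # Both accuracies are close together but low: likely underfitting.
        return "underfitting"
    return "ok"


# The two scenarios discussed in the thread, both with 80% validation accuracy.
print(diagnose_fit(1.00, 0.80))  # overfitting
print(diagnose_fit(0.82, 0.80))  # underfitting
```

Neither label can be produced from the validation number alone, which is why both sets are evaluated.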
Evaluate a scalar DAG during compilation
During compilation, is it possible to evaluate a sub-DAG of scalar operations in which all leaf nodes are literals, so that it can be replaced with a single literal? For example, in our `nn` library, our convolution and pooling layers have to pass around the spatial dimensions (height and width) of the images that are stretched out into rows of the input/output matrices. These output dimensions are computed within the forward functions of the above layers as small scalar equations. From a mathematical standpoint, these sizes can be determined at compile time, and it is nice to have these size equations in DML (vs. hiding them inside the engine within built-in functions). However, we do not currently evaluate these expressions during compilation, and thus we are left with unknown sizes even during recompilation. This naturally leads to max memory estimates and thus often leads to unnecessary distributed runtime ops rather than simple CP ones. Thoughts? -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone.
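As a rough illustration of the idea, here is a small Python sketch of bottom-up constant folding over a scalar expression DAG. It is not SystemML's compiler code: the node classes (`Lit`, `Var`, `Op`), the `fold` function, and `conv_out_dim` are all hypothetical names, and the size equation used is the usual convolution output-size formula, floor((size + 2*pad - filt) / stride + 1), which may not match the exact expression in the `nn` library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A scalar literal leaf."""
    value: float


@dataclass(frozen=True)
class Var:
    """A scalar whose value is unknown at compile time."""
    name: str


@dataclass(frozen=True)
class Op:
    """A scalar operation over child nodes."""
    name: str
    args: tuple


SCALAR_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "floor": math.floor,
}


def fold(node):
    """Fold bottom-up: an Op whose inputs all reduce to literals becomes a Lit."""
    if not isinstance(node, Op):
        return node
    args = tuple(fold(arg) for arg in node.args)
    if all(isinstance(arg, Lit) for arg in args):
        return Lit(SCALAR_OPS[node.name](*(arg.value for arg in args)))
    return Op(node.name, args)


def conv_out_dim(size, pad, filt, stride):
    """Build the DAG for floor((size + 2*pad - filt) / stride + 1)."""
    padded = Op("+", (size, Op("*", (Lit(2), pad))))
    steps = Op("/", (Op("-", (padded, filt)), stride))
    return Op("floor", (Op("+", (steps, Lit(1))),))


# All leaves are literals, so the whole sub-DAG collapses to one literal.
known = fold(conv_out_dim(Lit(28), Lit(2), Lit(5), Lit(1)))
print(known)  # Lit(value=28)

# With an unknown input size, only the literal-only part (2 * pad) is folded.
partial = fold(conv_out_dim(Var("Hin"), Lit(2), Lit(5), Lit(1)))
print(isinstance(partial, Op))  # True
```

Once the output dimension is a literal, a size-propagation pass could in principle use it for memory estimates instead of falling back to unknown sizes.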
Re: function default parameters
Yeah we should adopt the syntax that R and Python both use, in which default arguments are defined in the function definition. Primitive types such as ints and strings can be set in the function definition, and more complex types such as matrices can simply use a null value as the default in the function definition, followed by an actual assignment within the function body.

In R:

```
f <- function(x=3) x
f()   # 3
f(2)  # 2
```

```
f <- function(x=NULL) {
  if (is.null(x)) x = matrix(4, 1, 10)
  x
}
f()                  # matrix of 4's
f(matrix(2, 5, 12))  # matrix of 2's
```

Same thing in Python, except it uses `None` instead of `NULL`:

```
def f(x=3):
    return x

f()   # 3
f(2)  # 2
```

```
def f(x=None):
    if x is None:
        x = [1,2,3]
    return x

f()         # list [1,2,3]
f([4,5,6])  # list [4,5,6]
```

-- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone.
> On Apr 21, 2017, at 5:40 PM, Deron Eriksson <deroneriks...@gmail.com> wrote: > > BTW, that is assuming our algorithms have been converted to functions. > Deron > > > On Fri, Apr 21, 2017 at 5:37 PM, Deron Eriksson <deroneriks...@gmail.com> > wrote: > >> Thank you Matthias. I highly agree with your idea about having a default >> specification similar to R WRT the function signatures for default values. >> >> This becomes a significant issue for some of our algorithms, where they >> might take in 10 arguments but default values should typically be used >> for 6+ or 7+ of the arguments. >> >> Deron >> >> >> On Fri, Apr 21, 2017 at 5:25 PM, Matthias Boehm <mboe...@googlemail.com> >> wrote: >> >>> well, for arguments passed into dml scripts there is of course ifdef($b, >>> 2) >>> but for functions there is indeed no good support. At runtime level we >>> still support default parameters for scalar arguments at the tail of the >>> parameter list but I guess at one point the corresponding parser support >>> was discontinued. 
>>> >>> I personally would like a default specification similar to R in the >>> function signature with the corresponding function calls that bind values >>> to a subset of parameters. >>> >>> Regards, >>> Matthias >>> >>> On Fri, Apr 21, 2017 at 4:18 PM, Deron Eriksson <deroneriks...@gmail.com> >>> wrote: >>> >>>> Is there a way to set default parameter values using DML? I believe >>> both R >>>> and Python offer this capability. >>>> >>>> The only solution I could come up with using DML is to pass in a >>> variable >>>> that is NaN and cast this to a string and use this string in an if >>>> conditional statement. >>>> >>>> addone = function(double b) return (double a) { >>>>c = ''+b; >>>>if (c == 'NaN') { >>>>b = 2.0 >>>>} >>>>a = b + 1; >>>> } >>>> >>>> z=0.0/0.0; >>>> x = addone(z); >>>> print(x); >>>> y = addone(4.0); >>>> print(y); >>>> >>>> Is there a cleaner way to accomplish this, or is DML lacking this R >>>> feature? >>>> >>>> Deron >>>> >>>> -- >>>> Deron Eriksson >>>> Spark Technology Center >>>> http://www.spark.tc/ >>>> >>> >> >> >> >> -- >> Deron Eriksson >> Spark Technology Center >> http://www.spark.tc/ >> >> > > > -- > Deron Eriksson > Spark Technology Center > http://www.spark.tc/
Re: Regarding incubator systemml/breast_cancer project
Hi Aishwarya, Looks like you've just encountered an out of memory error on one of the executors. Therefore, you just need to adjust the `spark.executor.memory` and `spark.driver.memory` settings with higher amounts of RAM. What is your current setup? I.e. are you using a cluster of machines, or a single machine? We generally use a large driver on one machine, and then a single large executor on each other machine. I would give a sizable amount of memory to the driver, and about half the possible memory on the executors so that the Python processes have enough memory as well. PySpark has JVM and Python components, and the Spark memory settings only pertain to the JVM side, thus the need to save about half the executor memory for the Python side. Thanks! - Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Apr 19, 2017, at 5:53 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> > wrote: > > Hello sir, > > We also wanted to ensure that the spark-submit command we're using is the > correct one for running 'preprocess.py'. > Command : /home/new/sparks/bin/spark-submit preprocess.py > > > Thank you. > Aishwarya Chaurasia. > > On 19-Apr-2017 3:55 PM, "Aishwarya Chaurasia" <aishwarya2...@gmail.com> > wrote: > > Hello sir, > On running the file preprocess.py we are getting the following error : > > https://paste.fedoraproject.org/paste/IAvqiiyJChSC0V9eeETe2F5M1UNdIG > YhyRLivL9gydE= > > Can you please help us by looking into the error and kindly tell us the > solution for it. > Thanks a lot. > Aishwarya Chaurasia > > >> On 19-Apr-2017 12:43 AM, <dusenberr...@gmail.com> wrote: >> >> Hi Aishwarya, >> >> Certainly, here is some more detailed information about`preprocess.py`: >> >> * The preprocessing Python script is located at >> https://github.com/apache/incubator-systemml/blob/master/ >> projects/breast_cancer/preprocess.py. 
Note that this is different than >> the library module at https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/breastcancer/preprocessing.py. >> * This script is used to preprocess a set of histology slide images, >> which are `.svs` files in our case, and `.tiff` files in your case. >> * Lines 63-79 contain "settings" such as the output image sizes, folder >> paths, etc. Of particular interest, line 72 has the folder path for the >> original slide images that should be commonly accessible from all machines >> being used, and lines 74-79 contain the names of the output DataFrames that >> will be saved. >> * Line 82 performs the actual preprocessing and creates a Spark >> DataFrame with the following columns: slide number, tumor score, molecular >> score, sample. The "sample" in this case is the actual small, chopped-up >> section of the image that has been extracted and flattened into a row >> Vector. For test images without labels (`training=false`), only the slide >> number and sample will be contained in the DataFrame (i.e. no labels). >> This calls the `preprocess(...)` function located on line 371 of >> https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/breastcancer/preprocessing.py, which is a >> different file. >> * Line 87 simply saves the above DataFrame to HDFS with the name from >> line 74. >> * Line 93 splits the above DataFrame row-wise into separate "training" >> and "validation" DataFrames, based on the split percentage from line 70 >> (`train_frac`). This is performed so that downstream machine learning >> tasks can learn from the training set, and validate performance and >> hyperparameter choices on the validation set. These DataFrames will start >> with the same columns as the above DataFrame. If `add_row_indices` from >> line 69 is true, then an additional row index column (`__INDEX`) will be >> prepended. 
This is useful for SystemML in downstream machine learning >> tasks as it gives the DataFrame row numbers like a real matrix would have, >> and SystemML is built to operate on matrices. >> * Lines 97 & 98 simply save the training and validation DataFrames using >> the names defined on lines 76 & 78. >> * Lines 103-137 create smaller train and validation DataFrames by taking >> small row-wise samples of the full train and validation DataFrames. The >> percentage of the sample is defined on line 111 (`p=0.01` for a 1% >> sample). This is generally useful for quicker downstream tasks without >> having to load in the larger DataFrames, assuming
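The memory advice at the top of this reply (a sizable driver, and roughly half of each executor machine's memory left for the Python processes) can be sketched as a small helper that builds a `spark-submit` command line. The helper name `spark_submit_args` and the 32 GB / 64 GB figures are assumptions for illustration; `--driver-memory` and `--executor-memory` are the standard `spark-submit` options corresponding to `spark.driver.memory` and `spark.executor.memory`.

```python
def spark_submit_args(script, driver_gb, executor_machine_gb):
    """Build spark-submit arguments that leave half of each executor
    machine's memory for the Python side of PySpark."""
    # The Spark memory settings only cover the JVM side, so hand the JVM
    # about half of what the machine has and leave the rest for Python.
    executor_gb = executor_machine_gb // 2
    return [
        "spark-submit",
        "--driver-memory", f"{driver_gb}g",
        "--executor-memory", f"{executor_gb}g",
        script,
    ]


# Hypothetical sizes: a 32 GB driver and executor machines with 64 GB each.
args = spark_submit_args("preprocess.py", driver_gb=32, executor_machine_gb=64)
print(" ".join(args))
# spark-submit --driver-memory 32g --executor-memory 32g preprocess.py
```

The right numbers depend on the actual machines, which is why the reply asks about the current setup first.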
Re: Regarding incubator systemml/breast_cancer project
Hi Aishwarya,

Certainly, here is some more detailed information about `preprocess.py`:

* The preprocessing Python script is located at https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/preprocess.py. Note that this is different from the library module at https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/breastcancer/preprocessing.py.
* This script is used to preprocess a set of histology slide images, which are `.svs` files in our case, and `.tiff` files in your case.
* Lines 63-79 contain "settings" such as the output image sizes, folder paths, etc. Of particular interest, line 72 has the folder path for the original slide images, which should be accessible from all machines being used, and lines 74-79 contain the names of the output DataFrames that will be saved.
* Line 82 performs the actual preprocessing and creates a Spark DataFrame with the following columns: slide number, tumor score, molecular score, sample. The "sample" in this case is the actual small, chopped-up section of the image that has been extracted and flattened into a row Vector. For test images without labels (`training=False`), the DataFrame will contain only the slide number and sample (i.e. no labels). This calls the `preprocess(...)` function located on line 371 of https://github.com/apache/incubator-systemml/blob/master/projects/breast_cancer/breastcancer/preprocessing.py, which is a different file.
* Line 87 simply saves the above DataFrame to HDFS with the name from line 74.
* Line 93 splits the above DataFrame row-wise into separate "training" and "validation" DataFrames, based on the split percentage from line 70 (`train_frac`). This is performed so that downstream machine learning tasks can learn from the training set, and validate performance and hyperparameter choices on the validation set. These DataFrames will start with the same columns as the above DataFrame. If `add_row_indices` from line 69 is true, then an additional row index column (`__INDEX`) will be prepended. This is useful for SystemML in downstream machine learning tasks as it gives the DataFrame row numbers like a real matrix would have, and SystemML is built to operate on matrices.
* Lines 97 & 98 simply save the training and validation DataFrames using the names defined on lines 76 & 78.
* Lines 103-137 create smaller train and validation DataFrames by taking small row-wise samples of the full train and validation DataFrames. The percentage of the sample is defined on line 111 (`p=0.01` for a 1% sample). This is generally useful for quicker downstream tasks without having to load in the larger DataFrames, assuming you have a large amount of data. For us, we have ~7TB of data, so having 1% sampled DataFrames is useful for quicker downstream tests. Once again, the same columns from the larger train and validation DataFrames will be used.
* Lines 146 & 147 simply save these sampled train and validation DataFrames.

As a summary, after running `preprocess.py`, you will be left with the following saved DataFrames in HDFS:

* Full DataFrame
* Training DataFrame
* Validation DataFrame
* Sampled training DataFrame
* Sampled validation DataFrame

As for visualization, you may visualize a "sample" (i.e. small, chopped-up section of original image) from a DataFrame by using the `breastcancer.visualization.visualize_sample(...)` function. You will need to do this after creating the DataFrames. Here is a snippet to visualize the first row sample in a DataFrame, where `df` is one of the DataFrames from above:

```
from breastcancer.visualization import visualize_sample
visualize_sample(df.first().sample)
```

Please let me know if you have any additional questions.

Thanks!

- Mike

-- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone.
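For readers following the split, row-index, and sampling steps in the walkthrough above, here is a minimal pure-Python sketch of the same logic. It is not the code in `preprocess.py`, which operates on Spark DataFrames; the helper names `split_rows`, `add_row_indices`, and `sample_rows`, the toy rows, the 1-based index, the 10% sampling rate, and the seed are all assumptions made purely for illustration.

```python
import random


def split_rows(rows, train_frac, seed=None):
    """Shuffle rows and cut them into (train, validation) lists.

    Hypothetical helper: an exact cut, whereas a Spark-style random split
    would only approximate `train_frac`.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def add_row_indices(rows, start=1):
    """Prepend a row index to every row, mimicking an `__INDEX` column.

    Starting at 1 is an assumption here, chosen because matrices in
    SystemML are indexed from 1.
    """
    return [(i, *row) for i, row in enumerate(rows, start=start)]


def sample_rows(rows, p, seed=None):
    """Keep each row independently with probability `p`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < p]


# Toy rows shaped like (slide_num, tumor_score, molecular_score, sample).
rows = [(n, 1 + n % 3, 0.1 * n, [n] * 4) for n in range(1, 101)]

train, val = split_rows(rows, train_frac=0.8, seed=42)
train_idx = add_row_indices(train)
val_idx = add_row_indices(val)
train_sample = sample_rows(train_idx, p=0.1, seed=42)

print(len(train_idx), len(val_idx), len(train_sample))
```

The index is added after the split so that each output has its own contiguous row numbering, which is what a matrix-oriented consumer would expect.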
> On Apr 15, 2017, at 4:38 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> > wrote: > > Hello sir, > Can you please elaborate more on what output we would be getting because we > tried executing the preprocess.py file using spark submit it keeps on > adding the tiles in rdd and while running the visualisation.py file it > isn't showing any output. Can you please help us out asap stating the > output we will be getting and the sequence of execution of files. > Thank you. > >> On 07-Apr-2017 5:54 AM, <dusenberr...@gmail.com> wrote: >> >> Hi Aishwarya, >> >> Thanks for sharing more info on the issue! >> >> To facilitate easier usage, I've updated the preprocessing code by pulling >> out most of the logic into a `breastcancer/preprocessing.py` module, >> leaving just the execution in the `Preprocessing.ipynb` notebook. There is >> also a `pr
Re: [VOTE] Apache SystemML 0.14.0-incubating (RC3)
+1 and please call it `branch-0.14`. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Apr 17, 2017, at 8:50 AM, Arvind Surve <ac...@yahoo.com.INVALID> wrote: > > I will create next RC (RC4) for SystemML 0.14 in day or two and create a > branch. > Arvind Surve | Spark Technology Center | http://www.spark.tc/ > > From: Niketan Pansare <npan...@us.ibm.com> > To: dev@systemml.incubator.apache.org > Cc: Arvind Surve <ac...@yahoo.com> > Sent: Sunday, April 16, 2017 11:57 AM > Subject: Re: [VOTE] Apache SystemML 0.14.0-incubating (RC3) > >> we should create a 0.14 branch along with it to unblock ongoing >> development > > +1 > >> On Apr 15, 2017, at 9:27 PM, Matthias Boehm <mboe...@googlemail.com> wrote: >> >> I think SYSTEMML-1518 and SYSTEMML-1520 require a new RC and I agree that >> we should create a 0.14 branch along with it to unblock ongoing >> development. I'm happy to backport any additional fixes into this branch >> until we have a solid release candidate. >> >> Regards, >> Matthias >> >> On Thu, Apr 13, 2017 at 5:34 PM, Arvind Surve <ac...@yahoo.com.invalid> >> wrote: >> >>> Please vote on releasing the following candidate as Apache SystemML >>> version 0.14.0-incubating ! >>> >>> The vote is open for at least 72 hours and passes if a majority of at >>> least 3 +1 PMC votes are cast. >>> >>> [ ] +1 Release this package as Apache SystemML 0.14.0-incubating >>> [ ] -1 Do not release this package because ... >>> >>> To learn more about Apache SystemML, please see http://systemml.apache. >>> org/ >>> >>> The tag to be voted on is v0.14.0-incubating-rc3 ( >>> fe6d887420143277aa8930cbea6d43a460ae7789) >>> >>> https://github.com/apache/incubator-systemml/commit/ >>> fe6d887420143277aa8930cbea6d43a460ae7789 >>> >>> >>> The release artifacts can be found at : >>> https://dist.apache.org/repos/dist/dev/incubator/systemml/0. 
>>> 14.0-incubating-rc3/ >>> >>> The maven release artifacts, including signatures, digests, etc. can >>> be found at: >>> https://repository.apache.org/content/repositories/ >>> orgapachesystemml-1020/org/apache/systemml/systemml/0.14.0-incubating/ >>> >>> = >>> == Apache Incubator release policy == >>> = >>> Please find below the guide to release management during incubation: >>> http://incubator.apache.org/guides/releasemanagement.html >>> >>> === >>> == How can I help test this release? == >>> === >>> If you are a SystemML user, you can help us test this release by taking >>> an existing Algorithm or workload and running on this release candidate, >>> then >>> reporting any regressions. >>> >>> >>> == What justifies a -1 vote for this release? == >>> >>> -1 votes should only occur for significant stop-ship bugs or legal >>> related issues (e.g. wrong license, missing header files, etc). Minor bugs >>> or regressions should not block this release. >>> -Arvind Arvind Surve | Spark Technology Center | http://www.spark.tc/ > >
Re: Regarding incubator systemml/breast_cancer project
Hi Aishwarya,

Thanks for sharing more info on the issue!

To facilitate easier usage, I've updated the preprocessing code by pulling out most of the logic into a `breastcancer/preprocessing.py` module, leaving just the execution in the `Preprocessing.ipynb` notebook. There is also a `preprocess.py` script with the same contents as the notebook for use with `spark-submit`. The choice of the notebook or the script is just a matter of convenience, as they both import from the same `breastcancer/preprocessing.py` module.

As part of the updates, I've added an explicit SparkSession parameter (`spark`) to the `preprocess(...)` function, and updated the body to use this SparkSession object rather than the older SparkContext `sc` object. Previously, the `preprocess(...)` function accessed the `sc` object that was pulled in from the enclosing scope, which would work while all of the code was colocated within the notebook, but not if the code was extracted and imported. The explicit parameter now allows for the code to be imported.

Can you please try again with the latest updates? We are currently using Spark 2.x with Python 3. If you use the notebook, the pyspark kernel should have a `spark` object available that can be supplied to the functions (as is done now in the notebook), and if you use the `preprocess.py` script with `spark-submit`, the `spark` object will be created explicitly by the script.

For a bit of context to others, Aishwarya initially reached out to find out if our breast cancer project could be applied to TIFF images, rather than the SVS images we are currently using (the answer is "yes" so long as they are "generic tiled TIFF" images, according to the OpenSlide documentation), and then followed up with Spark issues related to the preprocessing code. This conversation has been promptly moved to the mailing list so that others in the community can benefit.

Thanks!
-Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Apr 6, 2017, at 5:09 AM, Aishwarya Chaurasia <aishwarya2...@gmail.com> > wrote: > > Hey, > > The object sc is already defined in pyspark and yet this name error keeps > occurring. We are using spark 2.* > > Here is the link to error that we are getting : > https://paste.fedoraproject.org/paste/89iQODxzpNZVbSfgwocH8l5M1UNdIGYhyRLivL9gydE=
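To illustrate why the explicit `spark` parameter described in the reply above avoids the name error reported in the quoted message, here is a small pure-Python sketch. `FakeSession`, `preprocess_implicit`, and `preprocess_explicit` are hypothetical stand-ins, not PySpark classes or the project's actual functions; the point is only the difference between relying on a name from the enclosing scope and receiving it as an argument.

```python
class FakeSession:
    """Hypothetical stand-in for a session object; not part of PySpark."""

    def load(self, slide_nums):
        return [f"slide-{n}" for n in slide_nums]


def preprocess_implicit(slide_nums):
    # Looks up a global named `sc` at call time. This only works if the
    # module defining this function happens to have such a global, as a
    # notebook cell would, but an imported module would not.
    return sc.load(slide_nums)  # noqa: F821


def preprocess_explicit(spark, slide_nums):
    # The session is passed in, so the function works wherever it lives.
    return spark.load(slide_nums)


try:
    preprocess_implicit([1, 2])
    implicit_error = None
except NameError as err:
    implicit_error = str(err)

result = preprocess_explicit(FakeSession(), [1, 2])
print(implicit_error)
print(result)
```

Passing the session explicitly also makes the function easier to test, since a caller can supply any object with the expected methods.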
Re: Java compiler for code generation
Using Janino sounds like a great idea. As for the footprint size for Java-only execution modes, it might make sense to do an audit of our current dependencies to see if anything can be removed to make up for the additional amount. Then we could just use it in all scenarios without worry. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Mar 31, 2017, at 9:25 PM, Matthias Boehm <mboe...@googlemail.com> wrote: > > that is a good question. Yes, if we want to enable code generation in such > a scenario it would also need Janino, which increases our footprint by > roughly 0.6MB. > > Btw, Janino fits much better into such an in-memory deployment because it > compiles classes in-memory without the need to write class files into a > local working directory. The same could be done for > javax.tools.JavaCompiler, but would require to custom in-memory > JavaFileManager. > > Regards, > Matthias > > On Fri, Mar 31, 2017 at 9:14 PM, Berthold Reinwald <reinw...@us.ibm.com> > wrote: > >> Sounds like a good idea. >> >> Wrt codegen, in a pure Java scoring environment w/o Spark and Hadoop, will >> the dependency on Janino still be there (that question applies to JDK as >> well), and what is the footprint? >> >> Regards, >> Berthold Reinwald >> IBM Almaden Research Center >> office: (408) 927 2208; T/L: 457 2208 >> e-mail: reinw...@us.ibm.com >> >> >> >> From: Matthias Boehm <mboe...@googlemail.com> >> To: dev@systemml.incubator.apache.org >> Date: 03/31/2017 08:17 PM >> Subject:Java compiler for code generation >> >> >> >> Hi all, >> >> currently, our new code generator for operator fusion, uses the >> programmatic javax.tools.JavaCompiler, which is Java's standard API for >> compilation. Despite a plan cache that mitigates unnecessary compilation >> and recompilation overheads, we still see significant end-to-end overhead >> especially for small input data. 
>> >> Moving forward, I'd like to switch to Janino >> (org.codehaus.janino.SimpleCompiler), which is a fast in-memory Java >> compiler with restricted language support. The advantages are >> >> (1) Reduced compilation overhead: On end-to-end scenarios for L2SVM, GLM, >> and MLogreg, Janino improved total javac compilation time from 2.039 to >> 0.195 (14 operators), from 8.134 to 0.411 (82 operators), and from 4.854 >> to >> 0.283 (46 operators), respectively. At the same time, there was no >> measurable impact on runtime efficiency, but even slightly reduced JIT >> compilation overhead. >> >> (2) Removed JDK requirement: Using the standard javax.tools.JavaCompiler >> requires the existence of a JDK, while Janino only requires a JRE, which >> means it makes it easier to apply code generation by default. >> >> However, I'm raising this here as Janino would add another explicit >> dependency (with BSD license). Fortunately, Spark also uses Janino for >> whole-stage-codegen. So we should be able to mark Janino as provided >> library. The only issue is a pure Hadoop environment, where we still want >> to use code generation for CP operations. To simplify the build, I could >> imagine using the javax.tools.JavaCompiler for hadoop execution types, but >> Janino by default. >> >> If you have any concerns, please let me know by Monday; otherwise I'd like >> to push this change into our upcoming 0.14 release. >> >> >> Regards, >> Matthias >> >> >> >> >>
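As a quick check on the compilation timings quoted in the original proposal above, the ratios can be worked out directly. This sketch only does arithmetic on the numbers in the message; the message does not state the time unit, so the code treats the values as unitless and reports only ratios. The dictionary layout and variable names are assumptions for illustration.

```python
# (javac total, Janino total, number of generated operators), as quoted.
timings = {
    "L2SVM": (2.039, 0.195, 14),
    "GLM": (8.134, 0.411, 82),
    "MLogreg": (4.854, 0.283, 46),
}

speedups = {}
for name, (javac_total, janino_total, n_ops) in timings.items():
    # Ratio of total compile time with javax.tools.JavaCompiler vs. Janino.
    speedups[name] = javac_total / janino_total
    print(f"{name}: {speedups[name]:.1f}x less compile time over {n_ops} operators")
```

On these figures the reduction is roughly 10x for L2SVM, 20x for GLM, and 17x for MLogreg.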
Re: UDFs Within Expressions
Great, we should definitely add this to the 1.0 release in order to allow for more expressivity in our DML, and to allow for the cleanup of existing DML that has had to code around this, such as the `nn` library. I will add a JIRA (or search for one) and tag it for 1.0. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Mar 29, 2017, at 4:18 PM, Matthias Boehm <mboe...@googlemail.com> wrote: > > Well, this would indeed be a very useful extension - I've actually seen > many use cases, where new users ran into issues with simple expressions > like X[i,i] = foo(). In the general case, the problem with UDFs is that > they can have - in contrast to builtin functions - multiple returns. These > multiple returns would translate to HOPs with multiple outputs, which in > turn cannot be represented with our current HOP DAG representation because > a HOP represents an operation and the characteristics of a single output, > but potentially with many consumers. This is also the reason why builtin > functions with multiple outputs (i.e., lu, eigen, qr) are internally mapped > to FunctionOps with the same restrictions. > > However, we could actually allow UDFs with a single output in expressions. > This would require a generalization of how results variables are bound but > should not take too much effort. Additional it would require a full pass > through the compiler to remove any assumptions that FunctionOps always > appear as DAG root nodes. Bottom line: We could realistically add it to our > feature list for the 1.0 release. > > Regards, > Matthias > >> On Wed, Mar 29, 2017 at 3:55 PM, <dusenberr...@gmail.com> wrote: >> >> Currently, it is not possible to use UDFs within an expression. I.e. I'd >> like to be able to use something like `out = (-1/2) * >> util::my_function(x)`. This would of course extend to more elaborate >> expressions. 
Also, note that we *are* able to use built-in functions >> within expressions. >> >> I think it would be good to allow for this. Are there any issues that >> would make this difficult? >> >> -Mike >> >> -- >> >> Mike Dusenberry >> GitHub: github.com/dusenberrymw >> LinkedIn: linkedin.com/in/mikedusenberry >> >> Sent from my iPhone. >> >>
Re: Release cadence
+1 for immediately starting work on SystemML 1.0 as our next release. At this point, the project and our users will benefit most from a thorough cleanup, as it will make the project simpler to use and easier to maintain. Simplicity will allow users and maintainers to regain focus on ML research and products, which is a win for the entire community. We should create a solid list of items that we, and the rest of the community, want to address for the 1.0 release and make sure that they are indeed completed. At the same time, we should ensure that we don't drag out the release process. -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Mar 6, 2017, at 10:14 AM, Luciano Resende <luckbr1...@gmail.com> wrote: > > +1 for SystemML 1.0 as the next release. > > On Sat, Mar 4, 2017 at 10:23 AM, Deron Eriksson <deroneriks...@gmail.com> > wrote: > >> Personally I would like the next release to be 1.0. We have been an >> incubator project since November 2015 and I believe that after over 1,000 >> commits since then that the project is about ready for a 1.0 release. >> >> I agree with Matthias that we need to make a decision regarding this topic. >> For new issues and fixed issues in JIRA, we need to be able to assign the >> correct version, or else someone potentially needs to go through and fix >> the version numbers, as Glenn has been doing. Additionally, it would be >> nice to do some of the 1.0 code updates (such as removing the old >> MLContext) now rather than waiting additional months. Also I would like to >> be able to correctly identify our next version in the online documentation. >> >> > How about just make SystemML Next and change the release name when we do > the release ? > > > >> Deron >> >> >> On Sat, Mar 4, 2017 at 12:47 AM, Matthias Boehm <mboe...@googlemail.com> >> wrote: >> >>> thanks Arvind for bringing some structure to the release process. 
I >> think a >>> fixed cadence of 2 months is useful as it makes upcoming releases more >>> predictable for devs and users. >>> >>> However, we're discussing a major 1.0 release for a while now. I think it >>> would be useful to come to an agreement if we go for 1.0 in April or not. >>> There are some pending changes such as removing the old MLContext, >> removing >>> the file-based transform, isolating the matrix block library, and some >>> language changes that should only be addresses in a major release as they >>> break backwards compatibility. Right now, we can't touch these changes >>> without knowing the target release. >>> >>> Personally, I don't see a good reason why we should wait. Postponing this >>> major release just creates unnecessary overhead in maintaining these old >>> components that will be removed eventually. Since we cut RC for 0.13 on >> Feb >>> 20, I think having an RC around April 20 would be a good target for this >>> 1.0 release. >>> >>> >>> Regards, >>> Matthias >>> >>> >>> On Fri, Mar 3, 2017 at 5:44 PM, Arvind Surve <ac...@yahoo.com.invalid> >>> wrote: >>> >>>> Based on last couple of release cycles, we will continue with 2 months >>>> release cycles.We will do first RC build by end of first week of second >>>> month. >>>> We will plan on releasing next release by end of April 2017.We will >> have >>>> RC build on ~April 6th. -Arvind >>>> Arvind Surve | Spark Technology Center | http://www.spark.tc/ >>>> >>>> From: Acs S <ac...@yahoo.com.INVALID> >>>> To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator. >>>> apache.org> >>>> Sent: Monday, January 9, 2017 11:41 AM >>>> Subject: Re: Release cadence >>>> >>>> We need to release SystemML on more frequent basis to get community >>>> engaged. It will provide us more feedback on functionality we add.While >>>> releasing SystemML on monthly basis is challenge due to longer phase of >>>> validation process we need to find a way to be quicker. 
>>>> I can propose options to get closer to monthly release if acceptable. >>>> Make every two releases available on monthly basis and third on two >>> months >>>> basis. This cycle will continue. >>>> 1. Do minimal testing on two releases (minor releases) and release them >>> on >>>> monthly basis. Pe
Re: Dropping Java 6 and 7 support
+1 -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Mar 7, 2017, at 10:49 AM, Niketan Pansare <npan...@us.ibm.com> wrote: > > +1 > > Thanks, > > Niketan Pansare > IBM Almaden Research Center > E-mail: npansar At us.ibm.com > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar > > Berthold Reinwald---03/06/2017 11:16:19 PM---+1 on removing java 6 and 7. > Regards, > > From: Berthold Reinwald/Almaden/IBM@IBMUS > To: dev@systemml.incubator.apache.org > Date: 03/06/2017 11:16 PM > Subject: Re: Dropping Java 6 and 7 support > > > > > +1 on removing java 6 and 7. > > Regards, > Berthold Reinwald > IBM Almaden Research Center > office: (408) 927 2208; T/L: 457 2208 > e-mail: reinw...@us.ibm.com > > > > From: Matthias Boehm <mboe...@googlemail.com> > To: dev@systemml.incubator.apache.org > Date: 03/06/2017 10:58 PM > Subject:Dropping Java 6 and 7 support > > > > Hi all, > > I'd like to drop the support for Java 6 and 7 in our SystemML 1.0 release. > Our build still refers to a java compliance level 6, which has not been > changed for more than 5 years now. Spark >= 1.5 anyway requires Java 7 and > there has been some discussion on removing Java 7 as well because it > reached end of life in April 2015. Moving to Java 8 would allow us to > modernize the code base going forward and the 1.0 release would be the > perfect time for this change. > > Regards, > Matthias > > > > > > >
Re: [DISCUSS] SystemML Graduation
+1 Thanks for bringing up this topic, Luciano. I definitely think it is the right time to start discussing graduation. The past 16 months have shown a sustained and growing level of commitment to the project, with several exciting new areas of development that the community is continuing to work on. As a community, we've grown to value and embrace the Apache process, and it's allowed us to hold effective public discussions on code, branding, etc., to the benefit of the project. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Mar 3, 2017, at 5:41 PM, Nakul Jindal <naku...@gmail.com> wrote: > > +1 > > Thank you Luciano for starting this discussion and the guidance you've > provided on this project. > In addition to the aforementioned accomplishments of the project, the > roadmap (which has been on the mailing list) also directs us towards making > continued healthy progress. > > Nakul Jindal > > >> On Fri, Mar 3, 2017 at 5:00 PM, Glenn Weidner <gweid...@us.ibm.com> wrote: >> >> +1 >> >> Thank you Luciano for starting the discussion and for all the guidance >> you've provided from the beginning of the project. I agree that the Apache >> SystemML community has grown and achieved many exciting things during >> incubation. For example, today we completed our fifth release of Apache >> SystemML after releasing previous version in February. Graduating to a >> top-level project will be another important accomplishment and help >> continue momentum with developers and users. 
>> >> Regards, >> Glenn >> >> >> From: Deron Eriksson <deroneriks...@gmail.com> >> To: dev@systemml.incubator.apache.org >> Date: 03/03/2017 12:02 PM >> Subject: Re: [DISCUSS] SystemML Graduation >> -- >> >> >> >> +1 >> >> Thank you for starting this important discussion Luciano, and thank you for >> all the guidance that you have provided us regarding the Apache Incubator, >> the Apache Software Foundation, and open-source software development! I'd >> also like to thank Henry for all the great assistance and hard work since >> becoming an additional mentor for the project. >> >> I believe that we may indeed be ready to graduate to a top level project >> due both to our technical efforts and our community efforts. Since we >> became an incubator project, in terms of code we have consistently >> demonstrated a high level of excellent activity from a wide range of >> contributors. We have 1,065 commits since we became an incubator project >> and have closed 391 pull requests in that time. Additionally, over time, we >> have all learned many best practices and Apache guidelines, for example how >> to properly validate our source releases in terms of content and licenses. >> We have also learned the processes involved with topics such as JIRA, >> GitHub, Git, Subversion, and software releases, and how to interact with >> groups such as Apache infrastructure to effectively develop open-source >> software following the Apache way. >> >> I think everyone on the SystemML project has also worked hard to build an >> open community around the project. 
We have open discussions on technical >> matters, especially in the area of pull requests, and these discussions >> demonstrate a consistent ability to reach consensus while allowing >> respectful disagreement. I believe our mailing list could be used more >> frequently, since it offers a more centralized location for discussions >> (compared to pull request discussions), which could be an additional way to >> help the community. However, we do have important discussions on the >> mailing list, for example in regards to questions from users, and >> communication on the mailing list is positive and encouraging to community >> growth. >> >> Deron >> >> >> On Thu, Mar 2, 2017 at 5:14 PM, Luciano Resende <luckbr1...@gmail.com> >> wrote: >> >>> It has been an exciting 16 months so far, and the project has >> accomplished >>> 4 official Apache Releases and is currently requesting the IPMC to >> approve >>> the 5th release. We have voted 3 new committers and PPMC members and >>> welcomed a ne
Re: [VOTE] Apache SystemML 0.13.0-incubating (RC2)
+1 Installed the Python package using the URL and ran a quick sanity test using MLContext. The Python package is installed correctly, and the JAR is seamlessly installed in the background as desired. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Feb 23, 2017, at 1:46 PM, Nakul Jindal <naku...@gmail.com> wrote: > > +1 > > Basic sanity tests pass on Mac. > > On Thu, Feb 23, 2017 at 1:14 PM, Deron Eriksson <deroneriks...@gmail.com> > wrote: > >> +1 >> >> Performed the following validations for artifacts at >> https://dist.apache.org/repos/dist/dev/incubator/systemml/0. >> 13.0-incubating-rc2/ >> : >> >> 1. -bin.tgz/-bin.zip contain disclaimer, license, notice >> 2. -bin.tgz/-bin.zip licenses reference all included dependencies with >> correct licenses >> 3. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar contains >> disclaimer, license, notice >> 3. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar contains antlr >> runtime and wink classes >> 4. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar license references >> antlr runtime and wink >> 5. -python.tgz contains disclaimer, license notice >> 6. -python.tgz license references antlr runtime and wink with correct >> licenses >> 7. -python.tgz systemml/systemml-java/systemml-0.13.0-incubating.jar >> contains disclaimer, license, notice >> 8. -python.tgz systemml/systemml-java/systemml-0.13.0-incubating.jar >> contains antlr runtime and wink classes >> 9. -python.tgz systemml/systemml-java/systemml-0.13.0-incubating.jar >> license references antlr runtime and wink >> 10. -src.tgz/-src.zip contain disclaimer, license, notice >> 11. -src.tgz/-src.zip licenses reference all included projects (jquery, >> etc) with correct licenses >> 12. -src.tgz/-src.zip contain no binaries (dll, exe, pdb, lib) >> 13. -src.tgz/-src.zip build project artifacts (mvn clean package -P >> distribution) >> 14. 
-src.tgz/-src.zip SystemML jar runs (hello world) >> 15. -src.tgz/-src.zip test suite runs (mvn verify) >> 16. -bin.tgz/-bin.zip runStandaloneSystemML.sh (hello world) >> 17. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit >> 2.0.2 >> (hello world) >> 18. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit >> 2.1.0 >> (hello world) >> 19. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar hadoop 2.7 (hello >> world) >> 20. -bin.tgz/-bin.zip runStandaloneSystemML.sh (univar stats, haberman >> data) >> 21. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit >> 2.0.2 >> (univar stats, generated data) >> 22. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar spark-submit >> 2.1.0 >> (univar stats, generated data) >> 23. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar hadoop 2.7 >> default >> exec mode (univar stats, generated data) >> 24. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar hadoop 2.7 hadoop >> exec mode (univar stats, generated data) >> 25. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar MLContext >> spark-shell 2.0.2 (univar stats, haberman data) >> 26. -bin.tgz/-bin.zip lib/systemml-0.13.0-incubating.jar MLContext >> spark-shell 2.1.0 (univar stats, haberman data) >> >> >> >> On Wed, Feb 22, 2017 at 7:23 PM, Arvind Surve <ac...@yahoo.com.invalid> >> wrote: >> >>> Please vote on releasing the following candidate as Apache SystemML >>> version 0.13.0-incubating ! >>> >>> The vote is open for at least 72 hours and passes if a majority of at >>> least 3 +1 PMC votes are cast. >>> >>> [ ] +1 Release this package as Apache SystemML 0.13.0-incubating >>> [ ] -1 Do not release this package because ... >>> >>> To learn more about Apache SystemML, please see http://systemml.apache. 
>>> org/ >>> >>> The tag to be voted on is v0.13.0-incubating-rc2 ( >>> ff3e741694e507f64a6b52ee71638bddecabe7af) >>> >>> https://github.com/apache/incubator-systemml/commit/ >>> ff3e741694e507f64a6b52ee71638bddecabe7af >>> >>> The release artifacts can be found at : >>> https://dist.apache.org/repos/dist/dev/incubator/systemml/0. >>> 13.0-incubating-rc2/ >>> >>> The maven release artifacts, including signatures, digests, etc. can >>> be found at: >>> >>> https://repository.apache.org/content/repositories/
Re: Proposal to add 'accuracy test suite' before 1.0 release
There is also the possibility of writing the correctness tests completely in DML itself, thus allowing an ML researcher / data scientist to easily create the tests. For example, the SystemML-NN library has a full test suite written entirely in DML in the `nn/test/` directory (i.e. no Java tests) that tests mathematical correctness of gradients, as well as general correctness of various layers as needed. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Feb 17, 2017, at 5:46 PM, Deron Eriksson <deroneriks...@gmail.com> wrote: > > +1 for creating tests for the main algorithm scripts. This would be a great > addition to the project. > > Note that the creation of tests (junit) typically requires some Java skills > (and knowledge of ml algorithms) whereas a new algorithm script typically > requires R/Python skills. Therefore, testing of algorithms probably > requires some focused coordination between 'data scientists' and > 'developers' to occur for this to happen smoothly for new algorithms. > > Deron > > >> On Fri, Feb 17, 2017 at 5:28 PM, <dusenberr...@gmail.com> wrote: >> >> +1 for testing our actual (vs simplified test version) scripts against >> some metric of choice. This will allow us to (1) ensure that each script >> does not have a showstopper bug (engine bug), and (2) that this script is >> still producing a reasonable mathematical result (math bug). >> >> -Mike >> >> -- >> >> Mike Dusenberry >> GitHub: github.com/dusenberrymw >> LinkedIn: linkedin.com/in/mikedusenberry >> >> Sent from my iPhone. 
>> >> >>> On Feb 17, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote: >>> >>> For now, I have updated our python mllearn tests to compare the >> prediction of our algorithm to that of scikit-learn: >> https://github.com/apache/incubator-systemml/blob/master/src/main/python/tests/test_mllearn_numpy.py#L81 >>> >>> The test now uses scikit-learn predictions as the baseline and computes >> the scores (accuracy score for classifiers and r2 score for regressors). If >> the score is greater than 95%, the test passes. Though with this approach >> we do not measure the generalization capability of our algorithm, we at >> least ensure that our algorithm performs no worse than scikit-learn under >> the default settings. We can make the testing even more rigorous later. The next >> step would be to enable these python tests through jenkins. >>> >>> Thanks, >>> >>> Niketan Pansare >>> IBM Almaden Research Center >>> E-mail: npansar At us.ibm.com >>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar >>> >>> From: Matthias Boehm <mboe...@googlemail.com> >>> To: dev@systemml.incubator.apache.org >>> Date: 02/17/2017 11:54 AM >>> Subject: Re: Proposal to add 'accuracy test suite' before 1.0 release >>> >>> >>> >>> >>> Yes, this has been discussed a couple of times now, most recently in >>> SYSTEMML-546. It takes quite some effort though to create a >>> sophisticated algorithm-level test suite as done for GLM. So by all >>> means, please, go ahead and add these tests. >>> >>> However, I would not impose any constraints on the contribution of new >>> algorithms in that regard, or similarly on tests with simplified >>> algorithms because it would raise the bar too high. 
>>> >>> Regards, >>> Matthias >>> >>> >>>> On 2/17/2017 10:48 AM, Niketan Pansare wrote: >>>> >>>> >>>> Hi all, >>>> >>>> We currently test the correctness of individual runtime operators >> using our >>>> integration tests but not the "released" algorithms. To be fair, we do >> test >>>> a subset of "simplified" algorithms on synthetic datasets and compare >> the >>>> accuracy with R. Also, we are testing subset of released algorithms >> using >>>> our Python tests, but it's intended purpose is to only test the >> integration >>>> of the APIs: >>>> Simplified algorithms: >>>> https://github.com/apache/incubator-systemml/tree/ >> master/src/test/scripts/applications >>>> Released algorithms:
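For the baseline comparison Niketan describes earlier in this thread (score our predictions against scikit-learn's predictions and pass above 95%), a dependency-free sketch of the check might look like the following. The two scoring functions are plain-Python stand-ins written for this example, not calls into scikit-learn, and `agrees_with_baseline` along with all of the sample predictions are hypothetical.

```python
def accuracy_score(y_true, y_pred):
    """Fraction of positions where the two label lists agree."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination of y_pred against y_true."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def agrees_with_baseline(baseline, candidate, score_fn, threshold=0.95):
    """Treat the baseline predictions as ground truth and apply the cutoff."""
    return score_fn(baseline, candidate) > threshold


# Classifier case: made-up baseline labels and two made-up candidates.
baseline_labels = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0] * 2
same_labels = list(baseline_labels)
two_wrong = list(baseline_labels)
two_wrong[0] = 1
two_wrong[1] = 0

classifier_ok = agrees_with_baseline(baseline_labels, same_labels, accuracy_score)
classifier_bad = agrees_with_baseline(baseline_labels, two_wrong, accuracy_score)

# Regressor case: made-up baseline values and a close candidate.
baseline_values = [1.0, 2.0, 3.0, 4.0]
close_values = [1.1, 1.9, 3.0, 4.1]
regressor_ok = agrees_with_baseline(baseline_values, close_values, r2_score)

print(classifier_ok, classifier_bad, regressor_ok)
```

As the thread notes, a check like this measures agreement with the baseline rather than generalization, so it catches regressions but says little about model quality on its own.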
Re: Removal of workaround flags
Yeah I want us to look heavily into this problem in the context of deep learning algorithms. I think we should plan on having first-class support for DL in our 1.0 release, including efficient (distributed SGD) training (+GPUs) and efficient distributed scoring. Nice thing too is that when we achieve this, we'll end up benefiting most of our existing algorithms as well. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Feb 15, 2017, at 12:22 PM, Niketan Pansare <npan...@us.ibm.com> wrote: > > Hi Matthias, > > I am OK with removing this flag, but would prefer that we keep the JIRA open > until we are sure that caching is not a bottleneck. I have noticed that the > gradients turns to sparse as we execute more iterations. Also, cache release > time is dependent on the memory budget. Here are the statistics running Lenet > on MNIST using > https://github.com/apache/incubator-systemml/tree/master/scripts/staging/SystemML-NN/examples > > With 20G driver memory, the statistics after running 10 epochs are as follows: > Epoch: 10, Iter: 700, Train Loss: 0.20480149054528493, Train Accuracy: > 0.984375, Val Loss: 0.026928755962383588, Val Accuracy: 0.9922 > Epoch: 10, Iter: 800, Train Loss: 0.20165772217976913, Train Accuracy: 1.0, > Val Loss: 0.027878978005867083, Val Accuracy: 0.9922 > 17/02/14 16:06:58 INFO DMLScript: SystemML Statistics: > Total elapsed time: 12687.863 sec. > Total compilation time: 2.168 sec. > Total execution time: 12685.694 sec. > Number of compiled Spark inst: 147. > Number of executed Spark inst: 4. > Cache hits (Mem, WB, FS, HDFS): 1096424/0/0/2. > Cache writes (WB, FS, HDFS): 603950/15/8. > Cache times (ACQr/m, RLS, EXP): 3.704/0.336/61.831/1.242 sec. > HOP DAGs recompiled (PRED, SB): 0/154885. > HOP DAGs recompile time: 28.663 sec. > Functions recompiled: 1. > Functions recompile time: 0.024 sec. > Spark ctx create time (lazy): 1.009 sec. > Spark trans counts (par,bc,col):0/0/2. 
> Spark trans times (par,bc,col): 0.000/0.000/3.433 secs. > Total JIT compile time: 44.711 sec. > Total JVM GC count: 7459. > Total JVM GC time: 166.26 sec. > Heavy hitter instructions (name, time, count): > -- 1) train 12138.979 sec 1 > -- 2) conv2d_bias_add 10876.708 sec 17362 > -- 3) conv2d_backward_filter 421.303 sec 17200 > -- 4) sel+ 239.660 sec 25881 > -- 5) update 226.687 sec 68800 > -- 6) update_nesterov 223.775 sec 68800 > -- 7) maxpooling_backward 136.709 sec 17200 > -- 8) conv2d_backward_data 134.315 sec 8600 > -- 9) ba+* 118.897 sec 51762 > -- 10) relu_maxpooling 112.283 sec 17362 > -- 11) relu_backward 107.483 sec 34400 > -- 12) uack+ 89.258 sec 34400 > -- 13) r' 74.304 sec 43000 > -- 14) +* 57.193 sec 34400 > -- 15) * 16.493 sec 95178 > -- 16) rand 16.038 sec 8613 > -- 17) / 8.352 sec 86492 > -- 18) rangeReIndex 6.628 sec 17208 > -- 19) + 3.054 sec 96528 > -- 20) uark+ 2.219 sec 43241 > -- 21) sp_csvrblk 2.183 sec 2 > -- 22) rmvar 1.517 sec 1451571 > -- 23) write 1.250 sec 9 > -- 24) - 1.059 sec 86486 > -- 25) createvar 1.026 sec 587259 > -- 26) exp 0.663 sec 17281 > -- 27) *2 0.361 sec 2 > -- 28) uasqk+ 0.277 sec 320 > -- 29) log 0.200 sec 160 > -- 30) uarmax 0.191 sec 17281 > > With 5G driver memory, the statistics after running 10 epochs are as follows: > Epoch: 10, Iter: 700, Train Loss: 0.19313544015858036, Train Accuracy: 1.0, > Val Loss: 0.025943927403263182, Val Accuracy: 0.993 > Epoch: 10, Iter: 800, Train Loss: 0.1883995965207449, Train Accuracy: 1.0, > Val Loss: 0.0260796819319468, Val Accuracy: 0.9916 > 17/02/14 20:16:40 INFO DMLScript: SystemML Statistics: > Total elapsed time: 13886.763 sec. > Total compilation time: 2.148 sec. > Total execution time: 13884.615 sec. > Number of compiled Spark inst: 147. > Number of executed Spark inst: 4. > Cache hits (Mem, WB, FS, HDFS): 1096422/0/2/2. > Cache writes (WB, FS, HDFS): 603868/2176/8. > Cache times (ACQr/m, RLS, EXP): 3.883/0.343/271.757/1.312 sec. 
> HOP DAGs recompiled (PRED, SB): 0/154885. > HOP DAGs recompile time: 28.290 sec. > Functions recompiled: 1. > Functions recompile time: 0.023 sec. > Spark ctx create time (lazy): 0.981 sec. > Spark trans counts (par,bc,col):0/0/2. > Spark trans times (par,bc,col): 0.000/0.000/3.501 secs. > Total JIT compile time: 45.131 sec. > Total JVM GC count: 7605. > Total JVM GC time: 157.716 sec. > Heavy hitter instructions (name, time, count): > -- 1) train 13301.811 sec 1 > -- 2) conv2d_bias_add 11890.291 sec 17362 > -- 3) conv2d_backward_filter 416.645 sec 17200 > -- 4) ba+* 252.966 sec 51762 > -- 5) sel+ 237.334 sec 25881 > -- 6) update 228.261 sec 68800 > -- 7) update_nesterov 225.383 sec 68800 > -- 8) m
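To compare the two runs above, the cache release time can be pulled out of the statistics text and related to the total execution time. The following is a minimal sketch that assumes the statistics are available as a string in the format shown in this thread; the function names and dictionary keys are made up for the example and are not part of SystemML.

```python
import re

# Patterns for the two statistics lines as they appear in the quoted output.
CACHE_TIMES = re.compile(
    r"Cache times \(ACQr/m, RLS, EXP\):\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) sec"
)
EXEC_TIME = re.compile(r"Total execution time:\s*([\d.]+) sec")


def parse_cache_times(stats_text):
    """Return the four cache timings (in seconds) as a dict."""
    match = CACHE_TIMES.search(stats_text)
    if match is None:
        raise ValueError("no 'Cache times' line found")
    acquire_read, acquire_modify, release, export = (float(g) for g in match.groups())
    return {
        "acquire_read": acquire_read,
        "acquire_modify": acquire_modify,
        "release": release,
        "export": export,
    }


def release_share(stats_text):
    """Fraction of the total execution time spent in cache release."""
    match = EXEC_TIME.search(stats_text)
    if match is None:
        raise ValueError("no 'Total execution time' line found")
    total = float(match.group(1))
    return parse_cache_times(stats_text)["release"] / total


# Values copied from the 20G and 5G runs quoted above.
run_20g = (
    "Total execution time: 12685.694 sec.\n"
    "Cache times (ACQr/m, RLS, EXP): 3.704/0.336/61.831/1.242 sec."
)
run_5g = (
    "Total execution time: 13884.615 sec.\n"
    "Cache times (ACQr/m, RLS, EXP): 3.883/0.343/271.757/1.312 sec."
)

for name, run in (("20G", run_20g), ("5G", run_5g)):
    print(name, parse_cache_times(run)["release"], f"{release_share(run):.2%}")
```

Reading the release time next to the total execution time makes the effect of the driver memory budget easier to see than scanning the raw statistics.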
Re: Namespace handling w/ imports
Thanks, Matthias for bringing this up. As Glenn pointed out, the full file path as the namespace is needed so that we can effectively build libraries/packages for SystemML, rather than just single-file scripts. If you truncate the namespace down to just the name of the specific file, then you prevent the ability to build a library in which the same file name is used in multiple folders. Another note with the example presented. Assuming that you are running the `mnist_lenet-train.dml` script, the `train` and `predict` functions are defined in `mnist_lenet.dml`, which is imported, so those functions would not be in the default namespace. -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Feb 12, 2017, at 10:15 PM, Glenn Weidner <gweid...@us.ibm.com> wrote: > > Use of source filenames instead of default namespace helped address various > issues and tasks under https://issues.apache.org/jira/browse/SYSTEMML-590 > that were encountered when creating the SystemML-NN script library. Unit > tests were also added to cover different import scenarios. As I recall, > function name conflicts could potentially occur between independent source > files when global default namespace used. It also helped simplify calling > dml-bodied functions when a file was imported by another. > > Thanks, > Glenn > > > Matthias Boehm ---02/12/2017 12:30:35 AM---While debugging our mnist_lenet > script, I encountered an issue with our namespace handling with imp > > From: Matthias Boehm <mboe...@googlemail.com> > To: dev@systemml.incubator.apache.org > Date: 02/12/2017 12:30 AM > Subject: Namespace handling w/ imports > > > > > While debugging our mnist_lenet script, I encountered an issue with our > namespace handling with imports. 
Here is the related function call graph > (after inlining): > > FUNCTION CALL GRAPH > --MAIN PROGRAM > .\mnist_lenet.dml::train > --.\nn/layers/dropout.dml::forward > --.\mnist_lenet.dml::predict > > but it should read as follows > > FUNCTION CALL GRAPH > --MAIN PROGRAM > .defaultNS::train > --dropout::forward > --.defaultNS::predict > > The namespace handling was changed a while ago. So my question is: was > there a necessity to encode the filenames in the namespace or is this > just a bug? > > > Regards, > Matthias > > > >
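The library argument made in this thread can be illustrated with a small sketch: if the namespace is reduced to the bare file name, two files with the same name in different folders collide, whereas the full path keeps them apart. This is a plain-Python illustration, not SystemML's parser logic; the second path (`experimental/layers/dropout.dml`) and all function names are hypothetical.

```python
import posixpath


def namespace_full_path(import_path):
    """Namespace derived from the (normalized) full file path."""
    return posixpath.normpath(import_path)


def namespace_basename(import_path):
    """Namespace derived from the file name only, without the .dml suffix."""
    name = posixpath.basename(import_path)
    return name[:-4] if name.endswith(".dml") else name


def find_collisions(import_paths, namespace_fn):
    """Map each namespace that is claimed by more than one file to those files."""
    seen = {}
    for path in import_paths:
        seen.setdefault(namespace_fn(path), []).append(path)
    return {ns: paths for ns, paths in seen.items() if len(paths) > 1}


# The first path comes from the call graph above; the second is a made-up
# example of the same file name appearing in another folder of a library.
imports = ["./nn/layers/dropout.dml", "./experimental/layers/dropout.dml"]

print(find_collisions(imports, namespace_basename))
print(find_collisions(imports, namespace_full_path))
```

With file-name namespaces both files map to `dropout`, so a call such as `dropout::forward` would be ambiguous; with full paths the two stay distinct.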
Re: Pull Request Reviews
Thanks, Deron, for bringing up this topic! PRs, and the associated discussions, are a critical part of any modern, successful open source project. As Deron stated, anyone in the community should feel free to review PRs -- we want your thoughts and opinions and greatly appreciate your help!

- Mike

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.

> On Feb 3, 2017, at 6:55 PM, Deron Eriksson <deroneriks...@gmail.com> wrote:
>
> Hi,
>
> Reviewing pull requests is a great way to contribute to the success of
> SystemML. If you are involved in any way with SystemML, please consider
> reviewing pull requests. Everyone can review pull requests, and it is a
> great way to gain experience with the project.
>
> Thanks!
> Deron
>
>
> Username            PRs Reviewed
> mboehm7             134
> dusenberrymw        112
> deroneriksson       110
> niketanpansare      40
> gweidner            31
> shirisht            26
> akchinSTC           25
> nakul02             23
> bertholdreinwald    15
> lresende            12
> frreiss             12
> fschueler           9
> Wenpei              7
> asurve              5
> iyounus             4
> MechCoder           3
> MadisonJMyers       3
> oza                 2
> fmakari             2
> rightwaitforyou     2
> ethanyxu            1
> ckadner             1
> petro-rudenko       1
> hsaputra            1
> FelixNeutatz        1
> nishi-t             0
> sandeep-n           0
> romeokienzler       0
> tgamal              0
> taasawat            0
> sourav-mazumder     0
> kevin-bates         0
> kakal               0
> GrapeBaBa           0
> objectadjective     0
> nmanchev            0
> jodersky            0
> jdyer1              0
> gmlewis             0
> aloknsingh          0
> akunft              0
> ahmaurya            0
>
>
> --
> Deron Eriksson
> Spark Technology Center
> http://www.spark.tc/
Re: Removal of workaround flags
Thanks for bringing up the topic. Our deep learning scripts (i.e. algorithms with several intermediate transformations) have shown cache release times to be a major bottleneck, thus leading to the creation of SYSTEMML-1140. Specifically, what did you use to attempt to reproduce 1140? -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Feb 12, 2017, at 12:30 AM, Matthias Boehm <mboe...@googlemail.com> wrote: > > SYSTEMML-1140
Re: Remove documentation for old MLContext API
+1 for removing that old documentation. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Feb 2, 2017, at 3:54 PM, fschue...@posteo.de wrote: > > As a step to deprecate the old MLContext API, I suggest to remove its > documentation for the next release (together with a deprecation of the actual > API so that we can remove it in 1.0). > > Currently the section about the old API is placed in between up-to-date > documentation and makes it pretty confusing to see what is old and what is > new. > > Any objections? Alternatively we could put it all the way to the end or in a > separate document. > > -Felix
Re: February Podling Report
LGTM. Thanks, Deron!

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.

> On Feb 1, 2017, at 2:33 AM, Matthias Boehm <mboe...@googlemail.com> wrote:
>
> Optionally, we could include the following paper that we presented at
> CIDR'17 in January.
>
> Tarek Elgamal, Shangyu Luo, Matthias Boehm, Alexandre V. Evfimievski,
> Shirish Tatikonda, Berthold Reinwald, Prithviraj Sen: SPOOF: Sum-Product
> Optimization and Operator Fusion for Large-Scale Machine Learning, CIDR 2017.
>
> Regards,
> Matthias
>
>> On 2/1/2017 7:30 AM, Deron Eriksson wrote:
>> Hi,
>>
>> I posted our SystemML podling report for February to:
>> https://wiki.apache.org/incubator/February2017
>>
>> Please feel free to make any additions or modifications, such as individual
>> efforts to help build our project community. If you don't have write access
>> to the wiki, please request write access or ask Mike, Luciano, or me to
>> make any additions or modifications.
>>
>> Thanks,
>> Deron
>>
Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2)
+1 -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 31, 2017, at 4:12 PM, Berthold Reinwald <reinw...@us.ibm.com> wrote: > > +1 > > Regards, > Berthold Reinwald > IBM Almaden Research Center > office: (408) 927 2208; T/L: 457 2208 > e-mail: reinw...@us.ibm.com > > > > From: Glenn Weidner/Silicon Valley/IBM@IBMUS > To: dev@systemml.incubator.apache.org > Date: 01/31/2017 08:55 AM > Subject:Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2) > > > > Yes (used same data as in > https://github.com/apache/incubator-systemml/tree/master/src/main/python/tests > ). > > +1 > > Thanks, > Glenn > > > Berthold Reinwald---01/31/2017 08:36:24 AM---Thanks, Glenn. Did you run > LinearRegression, etc. in the Python Notebook? Regards, > > From: Berthold Reinwald/Almaden/IBM@IBMUS > To: dev@systemml.incubator.apache.org > Date: 01/31/2017 08:36 AM > Subject: Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2) > > > > Thanks, Glenn. Did you run LinearRegression, etc. in the Python Notebook? > > > Regards, > Berthold Reinwald > IBM Almaden Research Center > office: (408) 927 2208; T/L: 457 2208 > e-mail: reinw...@us.ibm.com > > > > From: Glenn Weidner/Silicon Valley/IBM@IBMUS > To: dev@systemml.incubator.apache.org > Date: 01/28/2017 11:55 AM > Subject:Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2) > > > > Verified python artifact functionality in python 2.7 notebook with new > spark 1.6 instance via: > > !pip install > https://dist.apache.org/repos/dist/dev/incubator/systemml/0.12.0-incubating-rc2/systemml-0.12.0-incubating-python.tgz > > > > Successfully ran LinearRegression, LogisticRegression, NaiveBayes, SVM. > > Regards, > Glenn > > Glenn Weidner---01/27/2017 05:46:55 PM---Thank you Matthias! I definitely > agree with bringing the S, M, L scenarios to within a few days. 
> > From: Glenn Weidner/Silicon Valley/IBM@IBMUS > To: dev@systemml.incubator.apache.org > Date: 01/27/2017 05:46 PM > Subject: Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2) > > > > Thank you Matthias! I definitely agree with bringing the S, M, L scenarios > > to within a few days. > > Yes, for m-svm, the classes argument was default of 150 whereas maxiter > was set to 3 (instead of 20). I ran tests with both 0.11 and 0.12 RC1/RC2 > on same cluster for comparison and will share results separately. > > Thanks, > Glenn > > > Matthias Boehm ---01/27/2017 03:45:47 PM---Thanks Glenn. Could you please > also share the measurements (maybe in a jira). > > From: Matthias Boehm <mboe...@googlemail.com> > To: dev@systemml.incubator.apache.org > Date: 01/27/2017 03:45 PM > Subject: Re: [VOTE] Apache SystemML 0.12.0-incubating (RC2) > > > > Thanks Glenn. Could you please also share the measurements (maybe in a > jira). > > Furthermore, seeing that you ran only a subset of multinomial > experiments, makes me wonder if you used the current default > configuration of 150 classes? In the recent past, we usually ran this > perftest with a reasonable number of about 20 which significantly > impacts performance because broadcast constraints are exceeded. Given > the goal of a fast release process, we might want to update the perftest > to bring scenarions S, M, and L to something like 2 days. > > > Regards, > Matthias > >> On 1/28/2017 12:10 AM, Glenn Weidner wrote: >> Successfully completed performance testing including medium and large > data >> sets for Binomial, Clustering, Multinomial (subset), Regression, and >> Statistics tests. >> >> Regards, >> Glenn >> >> >> >> >> From: Arvind Surve <ac...@yahoo.com.INVALID> >> To: Dev <dev@systemml.incubator.apache.org> >> Date: 01/26/2017 02:50 PM >> Subject: [VOTE] Apache SystemML 0.12.0-incubating (RC2) >> >> >> >> Please vote on releasing the following candidate as Apache SystemML >> version 0.12.0-incubating ! 
>> >> The vote is open for at least 72 hours and passes if a majority of at >> least 3 +1 PMC votes are cast. >> >> [ ] +1 Release this package as Apache SystemML 0.12.0-incubating >> [ ] -1 Do not release this package because ... >> >> To learn more about Apache SystemML, please see > http://systemml.apache.org/ >> >> The tag to be voted on is v0.12.0-incubating-rc2 >> (d96a17f64cef7f251d9592679ecdee7ac17feb04) >> >> > https://github.com/apache/incuba
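The voting rule quoted above (open for at least 72 hours, passing with a majority of at least three +1 PMC votes) can be written down as a small check. This is a sketch of one reading of that sentence, in which "majority" means more +1 than -1 binding votes; the function name and parameters are assumptions made for the example, and the authoritative rules are those of the Apache release policy.

```python
def vote_passes(binding_votes, hours_open, min_plus_ones=3, min_hours=72):
    """Return True if a release vote passes under the rule quoted in the thread.

    `binding_votes` is a list of +1 / 0 / -1 values cast by PMC members only;
    non-binding votes are assumed to have been filtered out by the caller.
    """
    plus = sum(1 for v in binding_votes if v == 1)
    minus = sum(1 for v in binding_votes if v == -1)
    open_long_enough = hours_open >= min_hours
    return open_long_enough and plus >= min_plus_ones and plus > minus


print(vote_passes([1, 1, 1], hours_open=72))
print(vote_passes([1, 1], hours_open=100))
```

Keeping the thresholds as parameters makes it explicit which parts of the rule come from the vote announcement and which are just defaults of this sketch.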
Re: Jira Notifications
I was under the impression that the issues mailing list contained all the general JIRA notifications. From what you said, it sounds like that may not be the case anymore. Perhaps we should open a ticket with Infrastructure? -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 23, 2017, at 3:04 PM, Luciano Resende <luckbr1...@gmail.com> wrote: > > Few questions about Jira Notifications > > 1- When are notifications being Sent (at least when they get created ?) > 2- Which list ? (I search dev and issues for an example and didn't find > SYSTEMML-1191) > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/
Status of `mlpipeline_test` branch
Hi all, On our Git repo, there is currently a `mlpipeline_test` branch. Is this still needed? If not, I would like to delete it. -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone.
Re: [DISCUSS] Roadmap SystemML 1.0
Yeah using the target release would be good. Actually, with that in mind, I believe that we have been marking closed issues since the 0.11 release as targeting an upcoming "1.0" release, but it would probably be more correct to update those to "0.12" since we decided to release 0.12. In addition, we should set the target of the Spark 2.x support issue to "0.13". As for the roadmap, it would be good to update the website with a high-level overview, with links to associated JIRA issues. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 16, 2017, at 7:35 PM, Luciano Resende <luckbr1...@gmail.com> wrote: > > Instead of Epic, we could use the target release ? Also, we have a roadmap > page on the site and we should keep that up to date, or get rid of that and > use roadmap on jira. > >> On Mon, Jan 16, 2017 at 6:20 PM <dusenberr...@gmail.com> wrote: >> >> Now that we've had some discussion here, it would be good to transfer this >> discussion into a JIRA epic, containing sub tasks. That way, we can >> properly track our progress on these items and facilitate contributions >> from the community. Note that some of the sub tasks may already exist as >> individual issues. >> >> >> >> Would anyone in the community like to volunteer for creating these issues? >> >> >> >> - Mike >> >> >> >> -- >> >> >> >> Mike Dusenberry >> >> GitHub: github.com/dusenberrymw >> >> LinkedIn: linkedin.com/in/mikedusenberry >> >> >> >> Sent from my iPhone. >> >> >> >> >> >>>> On Jan 4, 2017, at 6:00 PM, dusenberr...@gmail.com wrote: >>> >>> >> >>> Overall, this is a good list of items that should be worked on, >> particularly because it contains several user-facing items. However, to >> echo what Luciano said, I'm also concerned about the timeline. At this >> stage, I agree that we need to release more often, and with a more >> user-oriented "product" focus as a guide for timelines. I.e. 
we should >> orient our release timelines around items that focus on the "product" of >> allowing the user to work on a wide range of ML problems in a simple and >> easy manner on top of Spark. >> >>> >> >>> With that in mind, I agree that a focus on a subset of (1) and (2) would >> be good for an immediate release, with a particular focus on Spark 2.0 >> support as a priority. >> >>> >> >>> How about we aim for a February 1st release date for the initial items? >> >>> >> >>> -Mike >> >>> >> >>> -- >> >>> >> >>> Mike Dusenberry >> >>> GitHub: github.com/dusenberrymw >> >>> LinkedIn: linkedin.com/in/mikedusenberry >> >>> >> >>> Sent from my iPhone. >> >>> >> >>> >> >>>> On Jan 3, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote: >> >>>> >> >>>> Hi Matthias, >> >>>> >> >>>> Thanks for the detailed roadmap. >> >>>> >> >>>> +1 for all the items with few modifications. >> >>>> >> >>>> 1) APIs and Language: >> >>>> * Cleanup new MLContext (matrix/frame data types, move tests, etc) >> >>>>>> Ensure Python and Scala MLContext have same API capability. >> >>>> >> >>>> * Remove old MLContext >> >>>> * Consolidate MLContext and JMLC >> >>>> * Full support for Scala/Python DSLs >> >>>>>> +1 for Python DSL except for push-down of loop structures and >> functions. >> >>>> >> >>>> * Remove old file-based transform >> >>>> * Scala/Python wrappers for all existing algorithms >> >>>> * Data converters (additional formats: e.g., libsvm; performance) >> >>>> >> >>>> 2) Updated Dependencies: >> >>>> * Spark 2.0 support >> >>>> * Matrix block library (isolated jar) >> >>>> >> >>>> 3) Compiler/Runtime Features: >> >>>> * GPU support (full compiler and runtime support) >> >>>>>> Can we break this down into phases: >> https://issues.apache.o
Re: Broken Website Menu On iOS
Awesome! Thanks, Jeremy (& Dexter)! I just discovered it, so there's not an issue created yet -- can you create one? Thanks! -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 16, 2017, at 6:40 PM, Jeremy Anderson <jer...@objectadjective.com> > wrote: > > Dexter and I will pick this up. Is there an issue for this already? > > ... > > Jeremy Anderson > > Github: https://github.com/objectadjective > Twitter: https://twitter.com/ObjectAdjective > LinkedIn: http://www.linkedin.com/in/objectadjective > >> On 16 January 2017 at 18:27, <dusenberr...@gmail.com> wrote: >> >> Hi all, >> >> It appears that the main website drop-down menus (Community, Apache) are >> broken on iOS browsers (iPhone). By "broken", I mean that it is not >> possible to click on the down-arrow to expand those drop-down menus. >> >> 1. Can someone check if this is also the case on Android browsers? In >> Chrome with mobile rendering? >> 2. Would someone like to volunteer to fix this? >> >> -Mike >> >> -- >> >> Mike Dusenberry >> GitHub: github.com/dusenberrymw >> LinkedIn: linkedin.com/in/mikedusenberry >> >> Sent from my iPhone. >> >>
Broken Website Menu On iOS
Hi all, It appears that the main website drop-down menus (Community, Apache) are broken on iOS browsers (iPhone). By "broken", I mean that it is not possible to click on the down-arrow to expand those drop-down menus. 1. Can someone check if this is also the case on Android browsers? In Chrome with mobile rendering? 2. Would someone like to volunteer to fix this? -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone.
Re: [DISCUSS] Roadmap SystemML 1.0
Now that we've had some discussion here, it would be good to transfer this discussion into a JIRA epic, containing sub tasks. That way, we can properly track our progress on these items and facilitate contributions from the community. Note that some of the sub tasks may already exist as individual issues. Would anyone in the community like to volunteer for creating these issues? - Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 4, 2017, at 6:00 PM, dusenberr...@gmail.com wrote: > > Overall, this is a good list of items that should be worked on, particularly > because it contains several user-facing items. However, to echo what Luciano > said, I'm also concerned about the timeline. At this stage, I agree that we > need to release more often, and with a more user-oriented "product" focus as > a guide for timelines. I.e. we should orient our release timelines around > items that focus on the "product" of allowing the user to work on a wide > range of ML problems in a simple and easy manner on top of Spark. > > With that in mind, I agree that a focus on a subset of (1) and (2) would be > good for an immediate release, with a particular focus on Spark 2.0 support > as a priority. > > How about we aim for a February 1st release date for the initial items? > > -Mike > > -- > > Mike Dusenberry > GitHub: github.com/dusenberrymw > LinkedIn: linkedin.com/in/mikedusenberry > > Sent from my iPhone. > > >> On Jan 3, 2017, at 4:17 PM, Niketan Pansare <npan...@us.ibm.com> wrote: >> >> Hi Matthias, >> >> Thanks for the detailed roadmap. >> >> +1 for all the items with few modifications. >> >> 1) APIs and Language: >> * Cleanup new MLContext (matrix/frame data types, move tests, etc) >> >> Ensure Python and Scala MLContext have same API capability. 
>> >> * Remove old MLContext >> * Consolidate MLContext and JMLC >> * Full support for Scala/Python DSLs >> >> +1 for Python DSL except for push-down of loop structures and functions. >> >> * Remove old file-based transform >> * Scala/Python wrappers for all existing algorithms >> * Data converters (additional formats: e.g., libsvm; performance) >> >> 2) Updated Dependencies: >> * Spark 2.0 support >> * Matrix block library (isolated jar) >> >> 3) Compiler/Runtime Features: >> * GPU support (full compiler and runtime support) >> >> Can we break this down into phases: >> >> https://issues.apache.org/jira/browse/SYSTEMML-445 ? We can discuss the >> >> timeline of the phases in the JIRA. >> >> * Compressed linear algebra v2 >> * Code generation (automatic operator fusion) >> * Extended parfor (full spark exploitation, micro-batch support) >> * Scale-up architecture (large dense blocks, numa)? >> >> 4) Tools >> * Extended stats (task locality, shuffle, etc) >> * Cloud resource advisor (extended resource optimizer)? >> >> 5) Algorithms >> * Graduate "staging" algorithms (robustness/performance) >> * Perftest: include all algorithms into automated performance tests >> >> via spark-submit + via Scala/Python wrappers >> >> * Simplify usage decision trees, random forest, mlogreg, msvm >> (preprocessing, label representation, etc) >> >> + command-line variable naming. For example: maxi, maxiter, etc. >> >> Thanks, >> >> Niketan Pansare >> IBM Almaden Research Center >> E-mail: npansar At us.ibm.com >> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar >> >> Matthias Boehm ---01/03/2017 02:44:39 PM---Yes indeed, most of (3) and (4) >> can be done incrementally. For (5), some of the changes might also >> >> From: Matthias Boehm <mboe...@googlemail.com> >> To: dev@systemml.incubator.apache.org >> Date: 01/03/2017 02:44 PM >> Subject: Re: [DISCUSS] Roadmap SystemML 1.0 >> >> >> >> >> Yes indeed, most of (3) and (4) can be done incrementally. 
For (5), some >> of the changes might also modify the signature of algorithms (i.e., >> parameters and required input data) but it would help, for example with >> decision trees, as users no longer need to dummy code their inputs. >> >> Generally, I'm fine with making (3), (4), and part of (5) optional and >> let the "must-have" features from (1) and (2) determine the timeline. >> >> Regards, >> Matthias >> >> On 1/3/2017 11:27 PM, Lucian
Re: SystemML Branch for any fixes related to Spark 1.6x
Well, I think the final consensus in the community was that the 0.12 release would be the final line that supports Spark 1.6.x. All future releases will be on Spark 2.x. The idea of supporting both simultaneously was considered, but ultimately it was agreed that it just wouldn't be sustainable. -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 13, 2017, at 3:46 PM, Luciano Resende <luckbr1...@gmail.com> wrote: > > The changes here are related to 1.x spark releases, right? So the idea here > is that this becomes a dev stream for Spark 1.6 support and you guys can > have 0.13, 0.14, 0.15, as required from this branch. > > If you guys want to change, I don't have any objections, please go ahead > and change. > >> On Fri, Jan 13, 2017 at 1:55 PM, <dusenberr...@gmail.com> wrote: >> >> Thanks, Luciano for creating the branch. Could we rename it to >> "branch-0.12" to better reflect that any changes that are added would only >> apply to future bug fix releases on the 0.12.x line? This would be more in >> line with the naming scheme that Spark uses for its branches, and should >> cause less confusion. >> >> -- >> >> Mike Dusenberry >> GitHub: github.com/dusenberrymw >> LinkedIn: linkedin.com/in/mikedusenberry >> >> Sent from my iPhone. >> >> >>> On Jan 13, 2017, at 1:50 PM, Luciano Resende <luckbr1...@gmail.com> >> wrote: >>> >>> We have created the following branch to track Spark 1.6 fixes : >>> origin/branch-systemml-spark-1.6 >>> >>> Note that, fixes that go into master, and are also affecting 1.6, they >>> should be cherry-picked to the 1.6 branch as well. 
>>>
>>> As for checking out, you will need to do something like the steps below
>>> (your preference might change some steps)
>>>
>>> git checkout -b branch-systemml-spark-1.6 origin/branch-systemml-spark-1.6
>>> git branch --set-upstream-to origin/branch-systemml-spark-1.6 branch-systemml-spark-1.6
>>>
>>> this last one is like:
>>>
>>> git branch --set-upstream-to origin/my_remote_branch my_local_branch
>>>
>>> For creating dev branches for 1.6, first go to your local 1.6 branch and
>>> continue with your regular steps such as git checkout -b JIRA-222
>>>
>>> And good luck !!!
>>>
>>> --
>>> Luciano Resende
>>> http://twitter.com/lresende1975
>>> http://lresende.blogspot.com/
>>
>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
Re: SystemML Branch for any fixes related to Spark 1.6x
Thanks, Luciano, for creating the branch. Could we rename it to "branch-0.12" to better reflect that any changes that are added would only apply to future bug fix releases on the 0.12.x line? This would be more in line with the naming scheme that Spark uses for its branches, and should cause less confusion.

--

Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.

> On Jan 13, 2017, at 1:50 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
>
> We have created the following branch to track Spark 1.6 fixes:
> origin/branch-systemml-spark-1.6
>
> Note that fixes that go into master and also affect 1.6 should be
> cherry-picked to the 1.6 branch as well.
>
> As for checking out, you will need to do something like the steps below
> (your preference might change some steps)
>
> git checkout -b branch-systemml-spark-1.6 origin/branch-systemml-spark-1.6
> git branch --set-upstream-to origin/branch-systemml-spark-1.6 branch-systemml-spark-1.6
>
> this last one is like:
>
> git branch --set-upstream-to origin/my_remote_branch my_local_branch
>
> For creating dev branches for 1.6, first go to your local 1.6 branch and
> continue with your regular steps such as git checkout -b JIRA-222
>
> And good luck !!!
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
Re: GSoc 2017
Yeah helping to build out our Python DSL into a full-out replacement for the current "DML" language would be great, and we'd be quite supportive! -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 12, 2017, at 2:58 PM, fschue...@posteo.de wrote: > > Hi Krishna, > > cool to see that you're interested in SystemML! > > From your list I personally think that a) and d) would be well suited for > projects, especially a good python DSL is a high priority. > > We will apply as an organization to GSoC once organization applications are > open (Jan. 19th) and I think we will find mentors for at least a) and d). If > you already want to take a look at what is currently there, I suggest to look > at our python APIs and documentation. If you want to take on the DSL project > it might also be a good idea to look into the DML documentation and related > papers to see what we need to support. > > The proposals will probably circulate on the mailinglist, too, so keep an eye > on that :) > > -Felix > > Am 12.01.2017 23:13 schrieb Krishna Kalyan: >> Hello All, >> Thank you for your wonderful replies. >> Tasks that I am interested in: >> a) Support for Python DSLs >> b) Python wrappers for all existing algorithms >> c) GPU support >> d) Perftest : automated performance tests of algorithms >> I am also willing to work on the tasks that SystemML community think are >> important. >> Regards, >> Krishna >> On Fri, Jan 6, 2017 at 10:14 PM, Mike Dusenberry <dusenberr...@gmail.com> >> wrote: >>> Hi Krishna! Welcome, and thanks for your interest! >>> We would definitely be excited to collaborate with you on a GSOC project. >>> We've started another thread to discuss possible new proposals, and we >>> would also be quite interested in any particular proposal that you might >>> like to generate tailored towards your interests. 
Copied from the other >>> thread, some possible ideas could include: building out a full ML demo to >>> solve a real, large-scale problem that would benefit from a distributed >>> approach; overall performance improvements that address a full class, or >>> wider area, of ML algorithms, rather than a single, specific script; >>> infrastructure for [performance] testing, and identification of wide areas >>> of improvement; helping with building out fully-featured, clean, >>> well-tested DSLs in Python & Scala (we've started, but it would be good to >>> continue stressing them -- we could even aim to replace DML with the DSLs); >>> etc. Overall, we want to improve the ability of the user to work on a wide >>> range of large-scale, distributed ML problems in a simple and easy manner >>> on top of Spark. >>> In the meantime, you could explore our recent open issues [1] and even >>> begin discussions or contributions on any of the items. You could also >>> view our recent roadmap discussion thread on the mailing list, starting >>> with the first email [2]: >>> [1]: >>> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SYSTEMML%20AND% >>> 20resolution%20%3D%20Unresolved%20ORDER%20BY%20updated%20DESC%2C% >>> 20priority%20DESC >>> [2]: >>> http://mail-archives.apache.org/mod_mbox/incubator- >>> systemml-dev/201701.mbox/%3C9eb780f0-ff28-c702-117c- >>> bad740599...@gmail.com%3E >>> - Mike >>> -- >>> Michael W. Dusenberry >>> GitHub: github.com/dusenberrymw >>> LinkedIn: linkedin.com/in/mikedusenberry >>> On Fri, Jan 6, 2017 at 12:34 PM, Luciano Resende <luckbr1...@gmail.com> >>> wrote: >>> > As some folks have described on this thread, it would be great to get you >>> > familiarized with SystemML. >>> > >>> > In parallel, I would look for a mentor from the active committer list and >>> > start working on a project proposal which could be based on the recent >>> > Roadmap discussion [1]. 
>>> > >>> > If you are looking for some guidance on how Apache participate on GSOC, >>> > take a look at the following resources [2] and [3], and don't hesitate to >>> > ask questions here. >>> > >>> > >>> > [1] >>> > https://www.mail-archive.com/dev@systemml.incubator.apache.o >>> > rg/msg01199.html >>> > [2] http://community.apache.org/gsoc.html >>> > [3] >>> > http://www.slideshare.net/luckbr1975/how-mentoring-can-help- >>> >
Re: Time To Merge Spark 2.0 Support PR
Let's cut a 0.12 branch tomorrow, and then submit it for the release process on Friday. Thoughts? -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 9, 2017, at 12:40 PM, Arvind Surve <ac...@yahoo.com.INVALID> wrote: > > ok, Thanks > > Arvind Surve | Spark Technology Center | http://www.spark.tc/ > > From: Luciano Resende <luckbr1...@gmail.com> > To: dev@systemml.incubator.apache.org > Sent: Monday, January 9, 2017 12:33 PM > Subject: Re: Time To Merge Spark 2.0 Support PR > >> On Mon, Jan 9, 2017 at 12:28 PM, <dusenberr...@gmail.com> wrote: >> >> Right, so we can cut a 0.12 release branch now, and then release from >> that, while work moves forward on the master branch, including 2.0 support. >> >> > Exactly, the 0.12 release will come from a branch that we will create, and Spark > 2.0 support gets merged into master. > > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/ > >
Re: Time To Merge Spark 2.0 Support PR
Right, so we can cut a 0.12 release branch now, and then release from that, while work moves forward on the master branch, including 2.0 support. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 9, 2017, at 12:26 PM, Acs S <ac...@yahoo.com.INVALID> wrote: > > As I already mentioned, we need to have a proper way for Spark 1.6 users to use > the SystemML Python DSL. The Pip Install Artifact was missing from the SystemML 0.11 > release and it needs to be added in the SystemML 0.12 release. > Arvind Surve | Spark Technology Center | http://www.spark.tc > From: "dusenberr...@gmail.com" <dusenberr...@gmail.com> > To: dev@systemml.incubator.apache.org > Sent: Monday, January 9, 2017 12:18 PM > Subject: Re: Time To Merge Spark 2.0 Support PR > > Just to be clear, instead of creating a branch to merge the 2.0 support, we > will want to merge the 2.0 support into the master branch. > > -- > > Mike Dusenberry > GitHub: github.com/dusenberrymw > LinkedIn: linkedin.com/in/mikedusenberry > > Sent from my iPhone. > > >> On Jan 9, 2017, at 12:02 PM, Acs S <ac...@yahoo.com.INVALID> wrote: >> >> Based on the discussion thread, we will start creating a SystemML release based on >> Spark 2.0. >> There are a bunch of activities that need to be completed, and we need volunteers for >> most of them. >> Activity (Volunteer): >> 1. Create a branch based on SystemML 0.12 release to merge Spark 2.0 code (Luciano) >> 2. Get Spark 2.0 PR merged to this new branch. (Glenn) >> 3. Do build changes to have both Spark 1.6 and 2.0 builds for release and PR. (Someone needs to work with Alan) >> 4. Setup Spark 2.0 cluster (One of the Almaden cluster updated with Spark 2.0) >> 5. Create Release Candidate (Glenn, Deron, Arvind) >> 6. Performance Testing >> 7. Notebook testing (Arvind) >> 8. Python DSL verification (2.x and 3.x) >> 9. Scala DSL verification >> 10. Artifacts verification >> 11. Documentation update. 
>> >> -Arvind SurveSpark Technology Centerhttp://www.spark.tc >> >> From: Niketan Pansare <npan...@us.ibm.com> >> To: dev@systemml.incubator.apache.org >> Sent: Friday, January 6, 2017 1:12 PM >> Subject: Re: Time To Merge Spark 2.0 Support PR >> >> I am fine with creating a branch for Spark 1.6 support and merging Spark 2.0 >> PR then. Like Luciano said, we can creating a release 0.12 from our Spark >> 1.6 branch. >> >> Overriding previous release is common practice for pip installer, however >> pypi does maintain the history of releases. Once a release candidate 0.12 is >> created, the user can install SystemML python package in three ways: >> 1. From source by checking out the branch and executing: mvn package -P >> distribution, followed by pip install >> target/systemml-0.12.0-incubating-python.tgz >> 2. From Apache site, pip install >> http://www.apache.org/dyn/closer.lua/incubator/systemml/0.12.0-incubating/systemml-0.12.0-incubating-python.tgz >> 3. From pypi by specifying the version, pip install -I >> systemml_incubating==0.12 >> >> As long as we ensure that version of the python package on pypi matches our >> release version and we document the Spark support in our release notes, >> there should not be any confusion on usage :) >> >> Thanks, >> >> Niketan Pansare >> IBM Almaden Research Center >> E-mail: npansar At us.ibm.com >> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar >> >> Acs S ---01/06/2017 12:57:53 PM---I would agree to create a branch and add >> Spark 2.0 to it, while still releasing SystemML 0.12 releas >> >> From: Acs S <ac...@yahoo.com.INVALID> >> To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org> >> Date: 01/06/2017 12:57 PM >> Subject: Re: Time To Merge Spark 2.0 Support PR >> >> >> >> I would agree to create a branch and add Spark 2.0 to it, while still >> releasing SystemML 0.12 release with Pip Install Artifact. >> Regarding
Re: Time To Merge Spark 2.0 Support PR
Just to be clear, instead of creating a branch to merge the 2.0 support, we will want to merge the 2.0 support into the master branch. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jan 9, 2017, at 12:02 PM, Acs S <ac...@yahoo.com.INVALID> wrote: > > Based on the discussion thread, we will start creating a SystemML release based on > Spark 2.0. > There are a bunch of activities that need to be completed, and we need volunteers for > most of them. > Activity (Volunteer): > 1. Create a branch based on SystemML 0.12 release to merge Spark 2.0 code (Luciano) > 2. Get Spark 2.0 PR merged to this new branch. (Glenn) > 3. Do build changes to have both Spark 1.6 and 2.0 builds for release and PR. (Someone needs to work with Alan) > 4. Setup Spark 2.0 cluster (One of the Almaden cluster updated with Spark 2.0) > 5. Create Release Candidate (Glenn, Deron, Arvind) > 6. Performance Testing > 7. Notebook testing (Arvind) > 8. Python DSL verification (2.x and 3.x) > 9. Scala DSL verification > 10. Artifacts verification > 11. Documentation update. > > -Arvind Surve | Spark Technology Center | http://www.spark.tc > > From: Niketan Pansare <npan...@us.ibm.com> > To: dev@systemml.incubator.apache.org > Sent: Friday, January 6, 2017 1:12 PM > Subject: Re: Time To Merge Spark 2.0 Support PR > > I am fine with creating a branch for Spark 1.6 support and merging the Spark 2.0 > PR then. Like Luciano said, we can create release 0.12 from our Spark 1.6 > branch. > > Overriding the previous release is common practice for the pip installer; however, > pypi does maintain the history of releases. Once a release candidate 0.12 is > created, the user can install the SystemML python package in three ways: > 1. From source by checking out the branch and executing: mvn package -P > distribution, followed by pip install > target/systemml-0.12.0-incubating-python.tgz > 2. 
From Apache site, pip install > http://www.apache.org/dyn/closer.lua/incubator/systemml/0.12.0-incubating/systemml-0.12.0-incubating-python.tgz > 3. From pypi by specifying the version, pip install -I > systemml_incubating==0.12 > > As long as we ensure that the version of the python package on pypi matches our > release version and we document the Spark support in our release notes, there > should not be any confusion on usage :) > > Thanks, > > Niketan Pansare > IBM Almaden Research Center > E-mail: npansar At us.ibm.com > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar > > Acs S ---01/06/2017 12:57:53 PM---I would agree to create a branch and add > Spark 2.0 to it, while still releasing SystemML 0.12 releas > > From: Acs S <ac...@yahoo.com.INVALID> > To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org> > Date: 01/06/2017 12:57 PM > Subject: Re: Time To Merge Spark 2.0 Support PR > > > > I would agree to create a branch and add Spark 2.0 to it, while still > releasing SystemML 0.12 release with Pip Install Artifact. > Regarding the comment from Mike that a new SystemML release will update the PyPI > package: shouldn't it be tagged with a version #? Otherwise every release will > override the previous one. Niketan, any comments? > -Arvind > > From: Matthias Boehm <mboe...@googlemail.com> > To: dev@systemml.incubator.apache.org > Sent: Friday, January 6, 2017 12:52 PM > Subject: Re: Time To Merge Spark 2.0 Support PR > > +1 on moving to Spark 2.x - I think we delayed this way too long now and > there will always be some awesome feature that we'd want to support on > older Spark versions too. > > Regards, > Matthias > >> On 1/6/2017 9:41 PM, Mike Dusenberry wrote: >> Well to be fair, a user can still use the Python DSL with the SystemML 0.11 >> release by using `pip install -e src/main/python`. We just didn't place a >> separate Python binary on the release website. 
Keep in mind as well that >> once we release the next release with Spark 2.x support, a Spark 1.6 user will >> not be able to use `pip install systemml` anyway, as that PyPI package will >> have been updated to the latest Spark 2.0 release.
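The version-pinning point in this thread (an unpinned `pip install` picks up whatever release is newest, while a pinned one keeps a Spark 1.6 user on 0.12) could be sketched roughly as follows. This is only an illustration: `pip_install_args` is a hypothetical helper, not part of SystemML, and the package name and version are simply the ones quoted in the thread.

```python
import sys


def pip_install_args(package, version=None, ignore_installed=False):
    """Build an argument list for `python -m pip install ...`.

    Hypothetical helper for illustration; it only builds the command,
    it does not run it.
    """
    args = [sys.executable, "-m", "pip", "install"]
    if ignore_installed:
        # -I (--ignore-installed) reinstalls even if a version is present.
        args.append("-I")
    # Pin with == when a version is given; otherwise pip resolves the latest.
    args.append(f"{package}=={version}" if version else package)
    return args


# Pinned install, as in option 3 of the thread.
pinned = pip_install_args("systemml_incubating", "0.12", ignore_installed=True)
# Unpinned install, which would follow the newest published release.
latest = pip_install_args("systemml")
```

The resulting list could be handed to `subprocess.run` if one actually wanted to execute it; here the point is only that the pinned form carries the `==0.12` specifier.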
Re: Parfor semantics
Also for some context, we're aiming to use this for remote hyperparameter tuning over a large dataset. Specifically, each remote process would train a separate model over the full dataset using a mini-batch SGD approach. Has the `parfor` construct been used for this purpose before? -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Nov 22, 2016, at 2:01 PM, Matthias Boehm <mboe...@googlemail.com> wrote: > > that's a good catch - thanks Felix. It would be great if you could modify > rewriteSetExecutionStategy and rewriteSetFusedDataPartitioningExecution in > OptimizerConstrained to handle the respective Spark execution types. Thanks. > > Regards, > Matthias > >> On 11/22/2016 7:54 PM, fschue...@posteo.de wrote: >> The constrained optimizer doesn't seem to know about a REMOTE_SPARK >> execution mode and either sets CP or REMOTE_MR. I can open a jira for >> that and provide a fix. >> >> Felix >> >> On 22.11.2016 02:07, Matthias Boehm wrote: >>> yes, this came up several times - initially we only supported opt=NONE >>> where users had to specify all other parameters. Meanwhile, there is a >>> so-called "constrained optimizer" that does the same as the rule-based >>> optimizer but respects any given parameters. Please try something like >>> this: >>> >>> parfor (i in 1:10, opt=CONSTRAINED, par=10, mode=REMOTE_SPARK) { >>> // some code here >>> } >>> >>> >>> Regards, >>> Matthias >>> >>>> On 11/22/2016 12:33 AM, fschue...@posteo.de wrote: >>>> While debugging some ParFor code it became clear that the parameters for >>>> parfor can be easily overwritten by the optimizer. 
>>>> One example is when I write: >>>> >>>> ``` >>>> parfor (i in 1:10, par=10, mode=REMOTE_SPARK) { >>>>// some code here >>>> } >>>> ``` >>>> >>>> Depending on the data size and cluster resources, the optimizer >>>> (OptimizerRuleBased.java, line 844) will recognize that the work can be >>>> done locally and overwrite it to local execution. This might be valid >>>> and definitely works (in my case) but kind of contradicts what I want >>>> SystemML to do. >>>> I wonder if we should disable this optimization in case a concrete >>>> execution mode is given and go with the mode that is provided. >>>> >>>> Felix >>>> >>>> >> >>
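The difference discussed in this thread, between an optimizer that may overwrite user-given `parfor` parameters and a "constrained" one that respects them, can be pictured with a small Python sketch. This is a conceptual illustration only and not SystemML's actual optimizer code; the function name and the dictionary layout are assumptions made up for the example.

```python
def resolve_parfor_params(user_params, optimizer_choice, constrained=True):
    """Combine optimizer-chosen parfor settings with user-specified ones.

    Hypothetical illustration: with constrained=True, any parameter the
    user set explicitly wins; with constrained=False, the optimizer's
    choice is used as-is, which mirrors the overwriting behavior
    described in the thread.
    """
    resolved = dict(optimizer_choice)
    if constrained:
        # Only explicitly given (non-None) user values override the optimizer.
        resolved.update({k: v for k, v in user_params.items() if v is not None})
    return resolved


# The user asked for par=10 and mode=REMOTE_SPARK ...
user = {"par": 10, "mode": "REMOTE_SPARK"}
# ... but the optimizer decided the work fits locally (values are made up).
optimizer = {"par": 4, "mode": "LOCAL"}

respected = resolve_parfor_params(user, optimizer, constrained=True)
overwritten = resolve_parfor_params(user, optimizer, constrained=False)
```

In the constrained case the user's `mode` and `par` survive; in the unconstrained case the optimizer's local plan replaces them.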
Re: [DRAFT] November monthly report
Looks good. We should also include the VLDB paper award. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Nov 1, 2016, at 4:43 PM, Deron Eriksson <deroneriks...@gmail.com> wrote: > > Hello, > > Here is a draft of the November monthly report due tomorrow that Felix and > I put together. Feedback is welcome. > > Deron > > > > SystemML > > SystemML provides declarative large-scale machine learning (ML) that aims at > flexible specification of ML algorithms and automatic generation of hybrid > runtime plans ranging from single node, in-memory computations, to > distributed > computations running on Apache Hadoop MapReduce and Apache Spark. > > SystemML has been incubating since 2015-11-02. > > Three most important issues to address in the move towards graduation: > > - Grow SystemML community: increase mailing list activity, > increase adoption of SystemML for scalable machine learning, encourage > data scientists to adopt DML and PyDML algorithm scripts, respond to > user feedback to ensure SystemML meets the requirements of real-world > situations, write papers, and present talks about SystemML. > - Continue to produce releases. > - Increase the diversity of our project's contributors and committers. > > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be aware > of? > > NONE. > > How has the community developed since the last report? > Our mailing list from August through October had 375 messages on a wide > range > of topics. We have gained 4 new contributors to the main project since > August > 1st. Our website has been redesigned with the help of several design > engineers > and we have commits from 3 new contributors to the website project. On > GitHub, > the project has been starred 417 times and forked 156 times. > > Niketan Pansare gave a talk with the title "Apache SystemML - Declarative > Machine Learning at Scale" on October 7th in the CS graduate seminar at UC > Merced. 
Matthias Boehm gave a talk on "Compressed Linear Algebra for Large- > Scale Machine Learning" at TU Dresden on August 30th. We presented the > papers > "Compressed Linear Algebra for Large-Scale Machine Learning" (research > paper + > poster) and "SystemML: Declarative Machine Learning on Spark" (industry > paper) > at VLDB'16, gave two 90 minute tutorials at the BOSS'16 workshop, > co-located > with VLDB'16, and our paper "SPOOF: Sum-Product Optimization and Operator > Fusion for Large- Scale Machine Learning" has been accepted at CIDR'17. > > How has the project developed since the last report? > The main project has had 213 commits since August 1. The website project > has > had 51 commits since August 1. Since August 1, 241 issues have been > reported > on our JIRA site and 137 issues have been resolved or closed. 79 pull > requests > have been created since August 1, and 72 pull requests have been closed. > > Date of last release: > > 2016-06-15 (version 0.10.0-incubating) > > When were the last committers or PMC members elected? > > 2016-05-07 Glenn Weidner > 2016-05-07 Faraz Makari Manshadi > >
Re: rc3 source-release.zip artifact
Should we include a README with the release artifacts that describes what each one is? -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 29, 2016, at 10:55 PM, Glenn Weidner <gweid...@us.ibm.com> wrote: > > In my opinion it can be removed. > > Thanks, > Glenn > > Deron Eriksson ---10/20/2016 01:36:04 PM---The 0.11.0 rc3 artifacts are > located at: https://dist.apache.org/repos/dist/dev/incubator/systemml/0 > > From: Deron Eriksson <deroneriks...@gmail.com> > To: dev@systemml.incubator.apache.org > Date: 10/20/2016 01:36 PM > Subject: rc3 source-release.zip artifact > > > > > The 0.11.0 rc3 artifacts are located at: > https://dist.apache.org/repos/dist/dev/incubator/systemml/0.11.0-incubating-rc3/ > > I see the following artifact: > systemml-0.11.0-incubating-source-release.zip > > I do not recognize this artifact. Can anyone tell me what this artifact is? > Can it be removed? > > Deron > > >
Re: [DISCUSS] Adding tensorboard-like functionality to SystemML
Visualization is a good topic to bring up for the project. I would like to add another possible option of using TensorBoard directly. I have not looked into the file format used for TensorBoard, but it may be possible to simply adopt that format and write our stats to that type of file. That would allow us to reuse that project without having to write our own. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 28, 2016, at 8:13 AM, Niketan Pansare <npan...@us.ibm.com> wrote: > > Hi Matthias, > > Thanks for your feedback. > > There is a tradeoff between keeping a feature in-house until it is stable > vs. continually getting community feedback as the work is getting done via PRs > and discussions. I am for the latter as it encourages community feedback as > well as participation. > > I agree that our goal should be to complete the features you mentioned asap > and yes, we are working hard towards making the GPU backend, the deep > learning built-in functions and the algorithm wrappers (ones that are already > added) 'non-experimental' in the 1.0 release :) ... Also, like you > hinted, it is important to explicitly mark the experimental features in the > documentation to avoid the 'bad impression'. The Python DSL will remain > experimental until there is more interest from the community. I am fine with > deleting the debugger since it is rarely used, if at all. > > In keeping with the Apache guidelines, this discussion is to allow the > community to decide whether SystemML should consider adding new > visualization functionality (since this feature is user facing). If there is > no interest, we can either postpone or discard this discussion :) > > Thanks, > > Niketan. > >> On Oct 28, 2016, at 1:24 AM, Matthias Boehm <mboe...@googlemail.com> wrote: >> >> Thanks for putting this together Niketan. However, could we please >> postpone this discussion until after our 1.0 release? 
Right now, I'm concerned >> to see that we're adding many experimental features without really >> getting them done. This includes for example, the GPU backend, the new >> MLContext API, the Python DSL, the deep learning builtin functions, the >> Scala algorithm wrappers, the old Spark debugger interface, and >> compressed linear algebra. I think we should finish these features first >> before moving on. If we're not careful about that, it would quickly >> create a very bad impression for new users. >> >> Regards, >> Matthias >> >>> On 10/28/2016 1:20 AM, Niketan Pansare wrote: >>> >>> >>> Hi all, >>> >>> To give every context, I am working on a new deep learning API for SystemML >>> that is backed by the NN library ( >>> https://github.com/apache/incubator-systemml/tree/master/scripts/staging/SystemML-NN/nn >>> ). This API allows the users to express their model using Caffe >>> specification and perform fit/predict similar to scikit-learn APIs. I have >>> created a sample notebook explaining the usage of the API: >>> https://github.com/niketanpansare/incubator-systemml/blob/1b655ebeec6cdffd66b282eadc4810ecfd39e4f2/samples/jupyter-notebooks/Barista-API-Demo.ipynb >>> . This API also allows the user to load and store pre-trained models. See >>> https://github.com/niketanpansare/model_zoo/tree/master/caffe/vision/vgg/ilsvrc12 >>> >>> As part of this API, I added a mini-tensorboard like functionality (see >>> step 6 and 7) using matplotlib. If there is enough interest, we can extend >>> and standardize the visualization functionality across all over algorithms. >>> Here are some initial discussion points: >>> 1. Primary visualization mechanism (Jupyter or a standalone app or both => >>> former is useful for cloud offering such as DSX and latter provides the >>> design team more creative control) >>> 2. What to plot for each algorithm (data scientists and algorithms >>> developers will help us here). >>> 3. 
Standardize UI (if we decide to go with Jupyter, we need to extend the >>> code in _visualize method: >>> https://github.com/niketanpansare/incubator-systemml/blob/1b655ebeec6cdffd66b282eadc4810ecfd39e4f2/src/main/python/systemml/mllearn/estimators.py#L621 >>> ) >>> 4. Primary APIs to target (python, scala, command-line or all) >>> >>> Thanks, >>> >>> Niketan Pansare >>> IBM Almaden Research Center >>> E-mail: npansar At us.ibm.com >>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar >>> >> >
Re: [VOTE] Apache SystemML 0.11.0-incubating (RC4)
+1 I've been running large scale use-case tests and things ran well on rc3 and on this rc4. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 28, 2016, at 4:08 PM, Luciano Resende <luckbr1...@gmail.com> wrote: > > Off course, my +1. > > On Mon, Oct 24, 2016 at 4:11 PM, Luciano Resende <luckbr1...@gmail.com> > wrote: > >> Please vote on releasing the following candidate as Apache SystemML >> version 0.11.0-incubating ! >> >> The vote is open for at least 72 hours and passes if a majority of at >> least 3 +1 PMC votes are cast. >> >> [ ] +1 Release this package as Apache SystemML 0.11.0-incubating >> [ ] -1 Do not release this package because ... >> >> To learn more about Apache SystemML, please see >> http://systemml.apache.org/ >> >> The tag to be voted on is v0.11.0-incubating-rc4 ( >> 6937683b01a13458990e698b0cf04f4f6ccecde3) >> >> https://github.com/apache/incubator-systemml/tree/ >> 6937683b01a13458990e698b0cf04f4f6ccecde3 >> >> The release artifacts can be found at : >> >> https://dist.apache.org/repos/dist/dev/incubator/systemml/0. >> 11.0-incubating-rc4/ >> >> The maven release artifacts, including signatures, digests, etc. can be >> found at: >> >> https://repository.apache.org/content/repositories/orgapachesystemml-1010/ >> >> >> = >> == Apache Incubator release policy == >> = >> Please find below the guide to release management during incubation: >> http://incubator.apache.org/guides/releasemanagement.html >> >> === >> == How can I help test this release? == >> === >> If you are a SystemML user, you can help us test this release by taking an >> existing Algorithm or workload and running on this release candidate, then >> reporting any regressions. >> >> >> == What justifies a -1 vote for this release? == >> >> -1 votes should only occur for significant stop-ship bugs or legal >> related issues (e.g. wrong license, missing header files, etc). 
Minor bugs >> or regressions should not block this release. >> >> -- >> Luciano Resende >> http://twitter.com/lresende1975 >> http://lresende.blogspot.com/ >> > > > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/
Re: Couple of questions on website contents
Overall, the new website looks awesome! Good job everyone! One concerning issue though is that the site is currently fairly broken on mobile. Can we update the site so that it renders properly on both desktop and mobile? -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 26, 2016, at 12:39 AM, Dexter Lesaca <dexter.les...@gmail.com> wrote: > > The site looks really good everyone! Thanks to everyone for churning out > this awesome site! Thanks for your diligence during the design and development > process! > > Luciano, your fix is just fine for now until a more encompassing solution > is designed, which we should arrive at soon. > > > > On Wed, Oct 26, 2016 at 9:20 AM Luciano Resende <luckbr1...@gmail.com> > wrote: > > I made that change, as I think we need to be able to list all available > mailing lists, but I didn't want to use the obsolete docs page. > > Thinking more about this, maybe a meet-in-the-middle approach is to use the > full details on the community page, and revert the front page to focus on > the dev list? > > Thoughts? > >> On Wednesday, October 26, 2016, Jason Azares <jason.aza...@gmail.com> wrote: >> >> Hi Deron, >> >> Thanks for publishing the updates, and the site looks great! One thing I >> noticed is that the "Subscribe to Our Mailing Lists" section does not >> reflect what the design team originally had. I'm not sure if you were aware >> of this discrepancy. >> >> [image: Inline image 1] >> >> On Tue, Oct 25, 2016 at 9:53 PM, Deron Eriksson <deroneriks...@gmail.com> wrote: >> >>> Hi Luciano, >>> >>> Since the current website updates are major improvements, I have gone ahead >>> and published the new updates. I think we can now start publishing more >>> frequently since important parts of the codebase have stabilized. 
>>> >>> Deron >>> >>> >>> On Tue, Oct 25, 2016 at 5:40 PM, Deron Eriksson <deroneriks...@gmail.com> >>> wrote: >>> >>>> Hi Luciano, >>>> >>>> Several updates to the website were merged today. I think we're at the >>>> point where we can publish the new website updates. Do you agree? >>>> >>>> Deron >>>> >>>> >>>> On Tue, Oct 25, 2016 at 11:02 AM, Jason Azares <jason.aza...@gmail.com> >>>> wrote: >>>> >>>>> Hi Luciano, >>>>> >>>>> Initial page: >>>>>> - What's the intention of the section just above the social banner? I >>>>>> noticed it was actually a copy of a section from the community page, but it >>>>>> looks like the content was duplicated and not extracted to a banner, and I >>>>>> have changed the one in community to what I think better clarifies the >>>>>> mailing list, but I am not sure if that's the same intent of the banner on >>>>>> the initial page. >>>>> >>>>> >>>>> Thanks for bringing this point to our attention. The content on the initial >>>>> page is different from the community page. We wanted to have a call to >>>>> action to get users to subscribe to the mailing list. We are currently >>>>> designing this section and will send a pull request once completed. >>>>> >>>>> Navigation Menu: >>>>>> - The community navigation seems to have gone wild with a few duplications. >>>>>> We have source code and github links, which are both the same. We also have >>>>>> the community get involved link that includes a list of committers using >>>>>> the new design format, but there is also a link to project committers that >>>>>> includes the old page listing all committers. >>>>> >>>>> >>>>> Dexter is currently working to resolve this issue. He will send his updates >>>>> once they are finished. >>>>>
Re: SystemML Medium Blog
+1 This sounds like a great idea! It would be nice to include blogs with tutorials, fun quick tips and tricks, full case studies, example use cases, etc. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 25, 2016, at 1:43 PM, Luciano Resende <luckbr1...@gmail.com> wrote: > > On Tue, Oct 25, 2016 at 7:32 PM, Madison Myers <madisonjmy...@gmail.com> > wrote: > >> Hey everyone, >> >> Just a thought on expanding visibility of SystemML: I know lots of us have >> written some blogs and articles on SystemML and I think it would be great >> to get these all in the same spot (and also write more)! I've started a >> SystemML Medium blog for this and would love to: >> >> 1. republish existing blogs on Medium >> 2. have volunteers write new blogs >> >> The idea would be to have these be linked directly from the website. If you >> wouldn't mind, I'd love your feedback! If you're up for me republishing >> articles that you've already written on the SystemML medium account (the >> author's name will still be yours), please let me know! Also, if you have >> ideas on topics that the SystemML community should be writing on and/or are >> up for writing an article or two, let me know as well! >> >> Luciano, do you see any issues from an Apache standpoint? >> >> Thanks! >> Madison > > > +1, just make sure "republish" are done by the blog authors or with their > explicit permission archived on this mailing list. > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/
Re: [VOTE] SystemML New Logo Ideas
+1 that sounds great to me. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 25, 2016, at 10:45 AM, Madison Myers <madisonjmy...@gmail.com> wrote: > > I agree! > +1 to using both. I think, like you suggested, that using #1 for headers > and #4 for other uses sounds fantastic. > > On Tue, Oct 25, 2016 at 10:36 AM, Jason Azares <jason.aza...@gmail.com> > wrote: > >> Hey guys, >> >> Branding wise, we also feel that #1 and #4 are the best choices. It's great >> that we're all on the same page. To answer the question of pros and cons of >> each logo, here is a quick list: >> >> Logo 1: >> >> >> - More versatile because of its scalability; We think logo 4 will be >> hard to discern once sized down; Logo 1 looks cleaner in website >> headers >> with text >> - Relevant because it has a matrix bracket >> - It's a simplified version of the robot. Think of it as the batman >> signal and the robot is batman. >> >> Logo 4: >> >> >> - More original because it has a personality >> - Diverse in the actions it can perform because it can move, animate, >> and be customized based on intent and use >> - The robot is kind of cute and approachable >> >> Our suggestion is to use both. Logo 1 is the simplified version of the >> robot. Logo 4 is the personification of the logo used to explain concepts. >> >> We'd love to hear your thoughts! >> >> Regards, >> Jason and the design team >> >> P.S. In general, here are our guidelines for creating a great logo: >> >> - *original* - something that stands out from competitors >> - *relevant* - reflects the brand's mission and values >> - *versatile* - look good in black and white, in different colors and >> sizes depending on context (e.g. billboards, websites, t-shirts, toys, >> business cards, etc) >> - *memorable* - easily recognizable everywhere (e.g. 
Mickey Mouse, Nike) >> - *timeless* - not just based on what's currently popular >> >> >> >>> On Tue, Oct 25, 2016 at 9:47 AM, <dusenberr...@gmail.com> wrote: >>> >>> Looks like there is a large amount of support for both #1 and #4. Design >>> team, could you provide some more thoughts on the pros and cons for each, >>> and perhaps any thoughts on ways the icons could be used in various project >>> materials? >>> >>> -- >>> >>> Mike Dusenberry >>> GitHub: github.com/dusenberrymw >>> LinkedIn: linkedin.com/in/mikedusenberry >>> >>> Sent from my iPhone. >>> >>> >>>> On Oct 25, 2016, at 9:41 AM, Acs S <ac...@yahoo.com.INVALID> wrote: >>>> >>>> I like #4 as well. >>>> +1 on #4. >>>> >>>> -Arvind >>>> >>>> From: Berthold Reinwald <reinw...@us.ibm.com> >>>> To: dev@systemml.incubator.apache.org >>>> Sent: Monday, October 24, 2016 12:34 AM >>>> Subject: Re: [VOTE] SystemML New Logo Ideas >>>> >>>> +1 on #4. >>>> >>>> Regards, >>>> Berthold Reinwald >>>> IBM Almaden Research Center >>>> office: (408) 927 2208; T/L: 457 2208 >>>> e-mail: reinw...@us.ibm.com >>>> >>>> >>>> >>>> From: Luciano Resende <luckbr1...@gmail.com> >>>> To: dev@systemml.incubator.apache.org >>>> Date: 10/21/2016 04:37 PM >>>> Subject: Re: [VOTE] SystemML New Logo Ideas >>>> >>>> >>>> >>>> On Fri, Oct 21, 2016 at 11:27 AM, Frederick R Reiss <frre...@us.ibm.com> >>>> wrote: >>>> >>>>> These are awesome! I'm more a fan of option #4 myself. >>>>> >>>>> >>>> I like option #4 myself as well. >>>> >>>> >>>> -- >>>> Luciano Resende >>>> http://twitter.com/lresende1975 >>>> http://lresende.blogspot.com/ >>>> >>>> >>>> >>>> >>>> >>> >> > > > > -- > *Madison J. Myers* > *UC Berkeley, Master of Information & Data Science '17* > > *King's College London, MA Political Science '14* > *New York University, BA Political Science '12* > > - > LinkedIn <http://linkedin.com/in/madisonjmyers>
Re: [VOTE] SystemML New Logo Ideas
Looks like there is a large amount of support for both #1 and #4. Design team, could you provide some more thoughts on the pros and cons for each, and perhaps any thoughts on ways the icons could be used in various project materials? -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 25, 2016, at 9:41 AM, Acs S <ac...@yahoo.com.INVALID> wrote: > > I like #4 as well. > +1 on #4. > > -Arvind > > From: Berthold Reinwald <reinw...@us.ibm.com> > To: dev@systemml.incubator.apache.org > Sent: Monday, October 24, 2016 12:34 AM > Subject: Re: [VOTE] SystemML New Logo Ideas > > +1 on #4. > > Regards, > Berthold Reinwald > IBM Almaden Research Center > office: (408) 927 2208; T/L: 457 2208 > e-mail: reinw...@us.ibm.com > > > > From: Luciano Resende <luckbr1...@gmail.com> > To: dev@systemml.incubator.apache.org > Date: 10/21/2016 04:37 PM > Subject: Re: [VOTE] SystemML New Logo Ideas > > > > On Fri, Oct 21, 2016 at 11:27 AM, Frederick R Reiss <frre...@us.ibm.com> > wrote: > >> These are awesome! I'm more a fan of option #4 myself. >> >> > I like option #4 myself as well. > > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/ > > > > >
Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)
+1 I finished running some test jobs in my large-scale scenario on this release candidate, and I think it is good to go. Specifically, my scenario involved large numerical DataFrames, MLContext, matrices, DML, and multiple script invocations involving the various intermediate outputs. One option would be to release this candidate as 0.11, and then follow up with a 0.11.1 release containing any bug fixes. This might make sense for edge-case bugs that don't impact normal usage. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 22, 2016, at 2:14 AM, Berthold Reinwald <reinw...@us.ibm.com> wrote: > > -1. > > Transformencode throws an unnecessary error if strings do not comply with > the field requirements specified in RFC 4180. Arvind has a fix on the way > which should be included in the release. > > Regards, > Berthold Reinwald > IBM Almaden Research Center > office: (408) 927 2208; T/L: 457 2208 > e-mail: reinw...@us.ibm.com > > > > From: dusenberr...@gmail.com > To: dev@systemml.incubator.apache.org > Date: 10/21/2016 11:25 AM > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3) > > > > Okay, I found out that the error I was encountering occurred due to passing > in a DataFrame with an explicit row index column ("__INDEX") that contained > incorrect row indices. Basically, I had taken a large DataFrame with the > row index column and sampled from it, without updating the row indices. > Thus, I was effectively left with sparse row indices -- i.e. I may have > had rows 2, 18, 587, 398678, etc. The current DataFrame conversion code > appears to not yet be able to handle sparse row indices and thus threw an > exception. When I correctly re-indexed the sampled DataFrame with dense > row indices, everything worked as expected. Of course, our conversion code > automatically adds row indices to a given DataFrame during conversion if > the user does not supply them explicitly. 
However, it can save a bit of > time on repeated usage if it is done explicitly in one prior batch job. > > I don't think this should block this release, and we should instead think > about this for the next release. I've created SYSTEMML-1053 to track this > issue. > > I'm running a few more tests, and then I'll respond today with a vote. > > -Mike > > -- > > Mike Dusenberry > GitHub: github.com/dusenberrymw > LinkedIn: linkedin.com/in/mikedusenberry > > Sent from my iPhone. > > >> On Oct 20, 2016, at 10:48 PM, Glenn Weidner <gweid...@us.ibm.com> wrote: >> >> Similar release-process steps executed successfully on Windows. >> >> Performance test suite for large data still running; reviewing of > available log files in-progress. >> >> Thanks, >> Glenn >> >> Nakul Jindal ---10/20/2016 02:43:06 PM---Basic sanity tests pasts on > MacOS following the process here: http://apache.github.io/incubator-syst >> >> From: Nakul Jindal <naku...@gmail.com> >> To: dev@systemml.incubator.apache.org >> Date: 10/20/2016 02:43 PM >> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3) >> >> >> >> >> Basic sanity tests pasts on MacOS following the process here: >> > http://apache.github.io/incubator-systemml/release-process.html#all-binaries-execute > >> >> (The in-memory jar was removed by [SYSTEMML-741]) >> >> +1 >> >> Nakul Jindal >> >> >>> On Thu, Oct 20, 2016 at 12:18 PM, <dusenberr...@gmail.com> wrote: >>> >>> Okay I've been testing the release candidate on a large-scale problem, > and >>> I'm currently running into a "java.lang.NegativeArraySizeException" in >>> the SparseBlockMCSR that I do not believe was present previously. I'm >>> currently investigating, and will post again soon. >>> >>> On another note, I successfully ran all of the Python tests on both > Python >>> 2.7 and 3.5. >>> >>> -Mike >>> >>> -- >>> >>> Mike Dusenberry >>> GitHub: github.com/dusenberrymw >>> LinkedIn: linkedin.com/in/mikedusenberry >>> >>> Sent from my iPhone. 
>>> >>> >>>> On Oct 19, 2016, at 2:46 PM, Glenn Weidner <gweid...@us.ibm.com> > wrote: >>>> >>>> Yes - that is correct for test cases involving ID column for >>> DataFrameVectorFrameConversionTest, DataFrameVectorScriptTest, >>> MLContextTest. The four failures for MLContextFrameTest were slightly >>
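The re-indexing step described in the RC3 thread above (sampling from a DataFrame that carries an explicit "__INDEX" column leaves sparse row indices such as 2, 18, 587, 398678) can be sketched in plain Python. This is only an illustration of the idea, not the Spark or SystemML conversion code: `reindex_dense` is a hypothetical helper, rows are modeled as dictionaries rather than DataFrame rows, and the assumption that dense indices start at 1 is ours.

```python
def reindex_dense(rows, index_col="__INDEX"):
    """Return copies of `rows` with consecutive row indices starting at 1.

    `rows` is a list of dicts. The relative order implied by the existing
    (possibly sparse) index values is preserved; the input is not modified.
    The 1-based start is an assumption made for this sketch.
    """
    ordered = sorted(rows, key=lambda row: row[index_col])
    return [
        dict(row, **{index_col: new_id})
        for new_id, row in enumerate(ordered, start=1)
    ]


# A sampled data set whose row indices were never updated after sampling.
sampled = [
    {"__INDEX": 587, "x": 0.3},
    {"__INDEX": 2, "x": 0.1},
    {"__INDEX": 398678, "x": 0.4},
    {"__INDEX": 18, "x": 0.2},
]

dense = reindex_dense(sampled)
print([row["__INDEX"] for row in dense])
```

Doing this once, in a prior batch job, corresponds to the "explicit" option mentioned in the thread; the alternative is to drop the index column and let the conversion add row indices itself.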
Re: [VOTE] SystemML New Logo Ideas
I like all of these options! I'll give a +1 for #1 as the main logo, and I also think it would be great to make use of the rest of the designs throughout the website and other project materials. Thanks!! -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 21, 2016, at 1:01 PM, Niketan Pansare <npan...@us.ibm.com> wrote: > > All the logos are awesome, thanks design team !! > > I vote for #4. > > Thanks, > > Niketan Pansare > IBM Almaden Research Center > E-mail: npansar At us.ibm.com > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar > > Deron Eriksson ---10/21/2016 12:57:25 PM---Given the overwhelming support for > #1, I give my +1 to #1. Deron > > From: Deron Eriksson <deroneriks...@gmail.com> > To: dev@systemml.incubator.apache.org > Date: 10/21/2016 12:57 PM > Subject: Re: [VOTE] SystemML New Logo Ideas > > > > > Given the overwhelming support for #1, I give my +1 to #1. > > Deron > > > On Fri, Oct 21, 2016 at 12:46 PM, Jason Azares <jason.aza...@gmail.com> > wrote: > > > I vote for #1 > > > > On Fri, Oct 21, 2016 at 12:27 PM, Matthias Boehm <mboe...@googlemail.com> > > wrote: > > > > > ha, that's interesting - thanks for the pointer Deron, I wasn't expecting > > > this at all. Somehow my eyes always ignored this. > > > > > > Regards, > > > Matthias > > > > > > > > > On 10/21/2016 9:22 PM, Deron Eriksson wrote: > > > > > >> I think they all look fantastic. My untrained eye likes the features of > > 3 > > >> and 4 but I completely defer to the judgements of others here since I > > have > > >> no training in design and the multitude of considerations involved such > > as > > >> scalability. > > >> > > >> I believe the logo trademark is an official requirement of the ASF ( > > >> http://www.apache.org/foundation/marks/pmcs.html#graphics), although I > > >> don't know how strict this is. 
> > >> > > >> Deron > > >> > > >> > > >> On Fri, Oct 21, 2016 at 12:15 PM, Matthias Boehm < > > mboe...@googlemail.com> > > >> wrote: > > >> > > >> Thanks for these proposals. For all the options, I'd prefer to remove > > the > > >>> TM - it's just a little odd for an open source project with no > > intentions > > >>> to register a trademark. I know, the new Spark logo has it too but it's > > >>> probably a different context, especially since there are discussions to > > >>> add > > >>> SPARC support in Spark 2.1 ;-) > > >>> > > >>> Regards, > > >>> Matthias > > >>> > > >>> > > >>> On 10/21/2016 8:47 PM, Dexter Lesaca wrote: > > >>> > > >>> +1 for 1 > > >>>> > > >>>> On Fri, Oct 21, 2016 at 11:44 AM Jeremy Anderson < > > >>>> jer...@objectadjective.com> > > >>>> wrote: > > >>>> > > >>>> +1 on option 1 as well. > > >>>> > > >>>>> > > >>>>> For the 4 options, I think it's important that full logo with name > > and > > >>>>> mark, scales well. I'm concerned detail will get lost with the other > > 3, > > >>>>> at > > >>>>> small sizes. I would love to use all of the simple and isometric > > >>>>> versions. > > >>>>> They make a great family. > > >>>>> > > >>>>> ... > > >>>>> > > >>>>> Jeremy Anderson > > >>>>> https://twitter.com/ObjectAdjective > > >>>>> http://www.linkedin.com/in/objectadjective > > >>>>> > > >>>>> On 21 October 2016 at 11:27, Frederick R Reiss <frre...@us.ibm.com> > > >>>>> wrote: > > >>>>> > > >>>>> These are awesome! I'm more a fan of option #4 myself. > > >>>>> > > >>>>>> > > >>>>>> Fred > > >>>>>> > > >>>>>> [image: Inactive hide details for Renee Mascarinas ---10/21/2016 > > >>>>>> 11:19:01 > > >>>>>&g
Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3)
Okay I've been testing the release candidate on a large-scale problem, and I'm currently running into a "java.lang.NegativeArraySizeException" in the SparseBlockMCSR that I do not believe was present previously. I'm currently investigating, and will post again soon. On another note, I successfully ran all of the Python tests on both Python 2.7 and 3.5. -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 19, 2016, at 2:46 PM, Glenn Weidner <gweid...@us.ibm.com> wrote: > > Yes - that is correct for test cases involving ID column for > DataFrameVectorFrameConversionTest, DataFrameVectorScriptTest, MLContextTest. > The four failures for MLContextFrameTest were slightly different and involve > similar fix as done for FrameConverterTest under [SYSTEMML-568] where > FrameRDDConverterUtils.csvToRowRDDused to incorporate schema information when > converting to JavaRDD. > > Thanks, > Glenn > > Matthias Boehm ---10/19/2016 12:36:04 PM---Glenn, all these issues were only > caused by wrong tests that used an invalid ID schema or populated > > From: Matthias Boehm <mboe...@googlemail.com> > To: dev@systemml.incubator.apache.org > Date: 10/19/2016 12:36 PM > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3) > > > > > Glenn, all these issues were only caused by wrong tests that used an > invalid ID schema or populated this column incorrectly, right? If so, > then I think it's fine to release. However, if we touch it anyway, we > should globally change the ID schema from double to long, which is more > intuitive when created by hand. 
> > Regards, > Matthias > > On 10/19/2016 8:30 PM, Deron Eriksson wrote: > > OK, so I think it's my understanding that for the 'src' release for rc3, > > the pom is using Spark 1.4 and the test suite passes for Spark 1.4, so this > > issue being discussed regarding test cases on Spark 1.6 is not a blocker > > for this release since the 'src' release builds and all tests pass. > > > > If this is not correct, could someone please correct me? > > > > Deron > > > > > > On Wed, Oct 19, 2016 at 11:17 AM, Luciano Resende <luckbr1...@gmail.com> > > wrote: > > > >> if tests are consistently failing, then we should cancel the RC and either > >> fix the test or mark it as @ignored. > >> > >> Intermittent fails might be ok, but it's a community decision. > >> > >> On Wed, Oct 19, 2016 at 10:50 AM, Deron Eriksson <deroneriks...@gmail.com> > >> wrote: > >> > >>> I believe that for an Apache release, our test suite is supposed to pass > >>> (although I'm pretty sure random test fails can be ignored). > >>> > >>> See 2.1 of Release Check List here: > >>> http://incubator.apache.org/guides/releasemanagement.html#check-list > >>> > >>> "2.1 Build is successful including automated tests. > >>> The expanded source archive is expected to build and pass tests." > >>> > >>> Luciano, do you happen to know if some test failures are acceptable since > >>> our test suite is so enormous (6300+ tests)? > >>> > >>> Deron > >>> > >>> > >>> > >>> On Wed, Oct 19, 2016 at 3:24 AM, Glenn Weidner <gweid...@us.ibm.com> > >>> wrote: > >>> > >>>> It's a nice-to-have but not a release blocker. 
> >>>> > >>>> Thanks, > >>>> Glenn > >>>> > >>>> [image: Inactive hide details for Niketan Pansare---10/18/2016 05:38:26 > >>>> PM---Glenn: Would you prefer to have https://github.com/apache/] > >> Niketan > >>>> Pansare---10/18/2016 05:38:26 PM---Glenn: Would you prefer to have > >>>> https://github.com/apache/incubator-systemml/pull/269 in 0.11 releas > >>>> > >>>> From: Niketan Pansare/Almaden/IBM@IBMUS > >>>> To: dev@systemml.incubator.apache.org > >>>> Date: 10/18/2016 05:38 PM > >>>> Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC3) > >>>> -- > >>>> > >>>> > >>>> > >>>> Glenn: Would you prefer to have > >>>> *https://github.com/apache/incubator-systemml/pull/269* > >>>> <https://github.com/apache/incubator-systemml/pull/269> in 0.11 > >> release > >>> ? > >>>> > >>>> Thanks, >
Re: UX Research
This is awesome! I really like the storyboards as they describe the types of scenarios in which SystemML would be really useful. We should continue to work on making sure all of these are successful stories for the project. The website layout analysis is great too -- we definitely need to get to the point where new users understand the project as quickly as possible. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 18, 2016, at 11:10 AM, Madison Myers <madisonjmy...@gmail.com> wrote: > > +1 Luciano > > On Tue, Oct 18, 2016 at 8:52 AM, Luciano Resende <luckbr1...@gmail.com> > wrote: > >> Great guys !!! I think having the UX roadmap published is fine, and we >> could just create a roadmap page where we have sections for development and >> ux and fixing SYSTEMML-972 (particularly SYSTEMML-974) will make it much >> simpler to add more contents to the website. >> >> >> https://issues.apache.org/jira/browse/SYSTEMML-972 >> https://issues.apache.org/jira/browse/SYSTEMML-974 >> >> On Mon, Oct 17, 2016 at 5:42 PM, Jeremy Anderson < >> jer...@objectadjective.com >>> wrote: >> >>> Thanks Madison and Felix. To your point Felix, I think you're on the >> nose. >>> I am hoping actionable items for both design and dev will emerge from >> user >>> research. Ideally, I'd love to see a clear direction and roadmap for the >>> future of SystemML begin to take shape. This thread is a great start, but >>> it might also be helpful to start a UX roadmap wiki page. Who can I reach >>> out to for access to publish to the wiki? >>> >>> Jeremy >>> >>> ... >>> >>> Jeremy Anderson >>> https://twitter.com/ObjectAdjective >>> http://www.linkedin.com/in/objectadjective >>> >>>> On 17 October 2016 at 16:33, <fschue...@posteo.de> wrote: >>>> >>>> Jeremy and others, thanks for the detailed presentation! 
>>>> The storyboards look great and it would be nice to see SystemML getting >>> to >>>> a point where those scenarios just work! >>>> >>>> From the point of development I wonder how much is adding new features >>>> (that enhance user experience) versus making it more stable/reliable >> and >>>> compiling/editing resources and documentation. >>>> It seems to me that what's currently missing are the easy entry points >>>> both in documentation and user interfaces (API's, notebooks, quickstart >>>> guides, ...) that are so perfectly depicted in those storyboards. >>>> >>>> I hope to see an outcome of actionable items for developers from this >> UX >>>> research that we can manifest in concrete Jiras to work on. >>>> >>>> Felix >>>> >>>> >>>> >>>> >>>> Am 18.10.2016 00:39 schrieb Jeremy Anderson: >>>> >>>>> Hi all, >>>>> >>>>> I began working with a few designers on UX research for SystemML. We >>>>> synthesized some of our early findings to share with everyone. From >> some >>>>> of >>>>> the pain points that surfaced in our research, we began storyboarding >>> user >>>>> scenarios and look for ways we might be able to improve user >> experience >>>>> and >>>>> increase adoption. I wanted to start a discussion around this UX and >>>>> research. Here's a link to the research we've synthesized so. I'd love >>>>> input/thoughts from everyone. >>>>> >>>>> https://drive.google.com/file/d/0B2__Aw0kKn-uTWJ4S0ZvcHhhTE0/view >>>>> >>>>> Cheers, >>>>> >>>>> Jeremy >>>>> >>>>> ... >>>>> >>>>> Jeremy Anderson >>>>> https://twitter.com/ObjectAdjective >>>>> http://www.linkedin.com/in/objectadjective >>>>> >>>> >>> >> >> >> >> -- >> Luciano Resende >> http://twitter.com/lresende1975 >> http://lresende.blogspot.com/ >> > > > > -- > *Madison J. Myers* > *UC Berkeley, Master of Information & Data Science '17* > > *King's College London, MA Political Science '14* > *New York University, BA Political Science '12* > > - > LinkedIn <http://linkedin.com/in/madisonjmyers>
Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
+1 for SYSTEMML-951 -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 5, 2016, at 1:17 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > > as the Python DSL is still in experimental status, I don't think that > SYSTEMML-1013 is blocking the release. However, there is one more > nice-to-have performance feature I'd like to include: SYSTEMML-951 (right > indexing via lookup). If nobody objects, we could cut tomorrow once 951 is > in; if somebody get's a chance to look into 1013 then we could include this > too. > > Regards, > Matthias > > Acs S ---10/05/2016 12:19:57 PM---Imran has opened Jira 1013. -Arvind > > From: Acs S <ac...@yahoo.com.INVALID> > To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org> > Date: 10/05/2016 12:19 PM > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > > > > Imran has opened Jira 1013. > -Arvind > > From: Matthias Boehm <mbo...@us.ibm.com> > To: dev@systemml.incubator.apache.org > Sent: Tuesday, October 4, 2016 5:43 PM > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > ok, SYSTEMML-1009 has been resolved too. > > Regards, > Matthias > > Acs S ---10/04/2016 05:30:52 PM---There is one more issue I am aware of: > Imran facing max recursion issue in toNumPyArray().Not sure > > From: Acs S <ac...@yahoo.com.INVALID> > To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org> > Date: 10/04/2016 05:30 PM > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > > > There is one more issue I am aware of: Imran facing max recursion issue in > toNumPyArray().Not sure if Imran has opened Jira or not, but we need > resolution for it. > -Arvind > From: Luciano Resende <luckbr1...@gmail.com> > To: dev@systemml.incubator.apache.org > Sent: Tuesday, October 4, 2016 5:03 PM > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > Ok, so looks like we are down to waiting on SYSTEMML-1009. 
> > On Tue, Oct 4, 2016 at 4:44 PM, <dusenberr...@gmail.com> wrote: > > > The Python test failure issue has been resolved in SYSTEMML-1005. From my > > end, we are ready to go. > > > > -Mike > > > > -- > > > > Mike Dusenberry > > GitHub: github.com/dusenberrymw > > LinkedIn: linkedin.com/in/mikedusenberry > > > > Sent from my iPhone. > > > > > > > On Oct 4, 2016, at 2:02 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > > > > > > apart from the recently resolved SYSTEMML-1004 and SYSTEMML-1008, there > > is one more performance fix I'd like to get in: SYSTEMML-1009. > > > > > > Regards, > > > Matthias > > > > > > Luciano Resende ---10/04/2016 12:29:12 PM---Mike, are these Python > > failures still blocking the next RC ? Please let me know, as I am waiting > > for > > > > > > From: Luciano Resende <luckbr1...@gmail.com> > > > To: dev@systemml.incubator.apache.org > > > Date: 10/04/2016 12:29 PM > > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > > > > > > > > > > > > > > Mike, are these Python failures still blocking the next RC ? Please let > > me > > > know, as I am waiting for the green light to cut the RC2. > > > > > > On Mon, Oct 3, 2016 at 9:41 AM, <dusenberr...@gmail.com> wrote: > > > > > > > Yeah I can confirm that all of those issues are now resolved, which is > > > > great! However, I'm seeing a test failure in the Python mllearn tests > > > > today that I want to look into before we cut. > > > > > > > > -Mike > > > > > > > > -- > > > > > > > > Mike Dusenberry > > > > GitHub: github.com/dusenberrymw > > > > LinkedIn: linkedin.com/in/mikedusenberry > > > > > > > > Sent from my iPhone. > > > > > > > > > > > > > On Oct 2, 2016, at 8:35 PM, Matthias Boehm <mbo...@us.ibm.com> > > wrote: > > > > > > > > > > yes, I just closed them - I left them open for Mike to confirm, but > > we > > > > resolved all known issues yesterday together. We should be good to go. 
> > > > > > > > > > Regards, > > > > > Matthias > > > > > > > > > > Luciano Resende ---10/02/2016 08:30:37 PM---I still see the following &g
Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
The Python test failure issue has been resolved in SYSTEMML-1005. From my end, we are ready to go. -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 4, 2016, at 2:02 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > > apart from the recently resolved SYSTEMML-1004 and SYSTEMML-1008, there is > one more performance fix I'd like to get in: SYSTEMML-1009. > > Regards, > Matthias > > Luciano Resende ---10/04/2016 12:29:12 PM---Mike, are these Python failures > still blocking the next RC ? Please let me know, as I am waiting for > > From: Luciano Resende <luckbr1...@gmail.com> > To: dev@systemml.incubator.apache.org > Date: 10/04/2016 12:29 PM > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > > > > Mike, are these Python failures still blocking the next RC ? Please let me > know, as I am waiting for the green light to cut the RC2. > > On Mon, Oct 3, 2016 at 9:41 AM, <dusenberr...@gmail.com> wrote: > > > Yeah I can confirm that all of those issues are now resolved, which is > > great! However, I'm seeing a test failure in the Python mllearn tests > > today that I want to look into before we cut. > > > > -Mike > > > > -- > > > > Mike Dusenberry > > GitHub: github.com/dusenberrymw > > LinkedIn: linkedin.com/in/mikedusenberry > > > > Sent from my iPhone. > > > > > > > On Oct 2, 2016, at 8:35 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > > > > > > yes, I just closed them - I left them open for Mike to confirm, but we > > resolved all known issues yesterday together. We should be good to go. 
> > > > > > Regards, > > > Matthias > > > > > > Luciano Resende ---10/02/2016 08:30:37 PM---I still see the following > > jiras, which were mentioned on this thread, open: https://issues.apache.or > > > > > > From: Luciano Resende <luckbr1...@gmail.com> > > > To: dev@systemml.incubator.apache.org > > > Date: 10/02/2016 08:30 PM > > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > > > > > > > > > > > > > > I still see the following jiras, which were mentioned on this thread, > > open: > > > > > > https://issues.apache.org/jira/browse/SYSTEMML-993 > > > https://issues.apache.org/jira/browse/SYSTEMML-994 > > > https://issues.apache.org/jira/browse/SYSTEMML-995 > > > > > > Did folks forgot to clode the jiras ? Or are there things that still need > > > to be handled here ? > > > > > > > > > On Sat, Oct 1, 2016 at 2:41 PM, Matthias Boehm <mbo...@us.ibm.com> > > wrote: > > > > > > > ok the blocking issues SYSTEMML-993, 994, and 995 have been resolved - > > > > from my perspective we're ready to cut a new RC. > > > > > > > > Regards, > > > > Matthias > > > > > > > > [image: Inactive hide details for Matthias Boehm---09/29/2016 10:44:51 > > > > PM---just a quick update: SYSTEMML-969 has been resolved too. > > Th]Matthias > > > > Boehm---09/29/2016 10:44:51 PM---just a quick update: SYSTEMML-969 has > > been > > > > resolved too. The open issues are SYSTEMML-993, SYSTEMML- > > > > > > > > From: Matthias Boehm/Almaden/IBM@IBMUS > > > > To: dev@systemml.incubator.apache.org > > > > Date: 09/29/2016 10:44 PM > > > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > > > -- > > > > > > > > > > > > > > > > just a quick update: SYSTEMML-969 has been resolved too. The open > > issues > > > > are SYSTEMML-993, SYSTEMML-994, and the new SYSTEMML-995. We should be > > able > > > > to resolve them by tomorrow to give everybody a chance of testing a > > new RC > > > > over the weekend. 
> > > > > > > > Regards, > > > > Matthias > > > > > > > > Acs S ---09/29/2016 05:31:23 PM---SYSTEMML-964 being addressed (I added > > > > changes and with UTF support Matthias added he reverted change > > > > > > > > From: Acs S <ac...@yahoo.com.INVALID> > > > > To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator. > > apache.org > > > > > > > > > Date: 09/29/2016 05:31 PM > > > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1)
Yeah I can confirm that all of those issues are now resolved, which is great! However, I'm seeing a test failure in the Python mllearn tests today that I want to look into before we cut. -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Oct 2, 2016, at 8:35 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > > yes, I just closed them - I left them open for Mike to confirm, but we > resolved all known issues yesterday together. We should be good to go. > > Regards, > Matthias > > Luciano Resende ---10/02/2016 08:30:37 PM---I still see the following jiras, > which were mentioned on this thread, open: https://issues.apache.or > > From: Luciano Resende <luckbr1...@gmail.com> > To: dev@systemml.incubator.apache.org > Date: 10/02/2016 08:30 PM > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > > > > I still see the following jiras, which were mentioned on this thread, open: > > https://issues.apache.org/jira/browse/SYSTEMML-993 > https://issues.apache.org/jira/browse/SYSTEMML-994 > https://issues.apache.org/jira/browse/SYSTEMML-995 > > Did folks forgot to clode the jiras ? Or are there things that still need > to be handled here ? > > > On Sat, Oct 1, 2016 at 2:41 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: > > > ok the blocking issues SYSTEMML-993, 994, and 995 have been resolved - > > from my perspective we're ready to cut a new RC. > > > > Regards, > > Matthias > > > > [image: Inactive hide details for Matthias Boehm---09/29/2016 10:44:51 > > PM---just a quick update: SYSTEMML-969 has been resolved too. Th]Matthias > > Boehm---09/29/2016 10:44:51 PM---just a quick update: SYSTEMML-969 has been > > resolved too. 
The open issues are SYSTEMML-993, SYSTEMML- > > > > From: Matthias Boehm/Almaden/IBM@IBMUS > > To: dev@systemml.incubator.apache.org > > Date: 09/29/2016 10:44 PM > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > -- > > > > > > > > just a quick update: SYSTEMML-969 has been resolved too. The open issues > > are SYSTEMML-993, SYSTEMML-994, and the new SYSTEMML-995. We should be able > > to resolve them by tomorrow to give everybody a chance of testing a new RC > > over the weekend. > > > > Regards, > > Matthias > > > > Acs S ---09/29/2016 05:31:23 PM---SYSTEMML-964 being addressed (I added > > changes and with UTF support Matthias added he reverted change > > > > From: Acs S <ac...@yahoo.com.INVALID> > > To: "dev@systemml.incubator.apache.org" <dev@systemml.incubator.apache.org > > > > > Date: 09/29/2016 05:31 PM > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > -- > > > > > > > > SYSTEMML-964 being addressed (I added changes and with UTF support > > Matthias added he reverted changes) > > > > -Arvind > > > > From: "dusenberr...@gmail.com" <dusenberr...@gmail.com> > > To: dev@systemml.incubator.apache.org > > Sent: Thursday, September 29, 2016 2:31 PM > > Subject: Re: [VOTE] Apache SystemML 0.11.0-incubating (RC1) > > > > I've also opened SYSTEMML-993 that relates to poor performance for vector > > DataFrame conversions, as well as SYSTEMML-994 for GC OOM on SystemML > > matrix to frame conversions that would both be good to work on. > > > > -- > > > > Mike Dusenberry > > GitHub: github.com/dusenberrymw > > LinkedIn: linkedin.com/in/mikedusenberry > > > > Sent from my iPhone. > > > > > > > On Sep 29, 2016, at 12:32 PM, Luciano Resende <luckbr1...@gmail.com> > > wrote: > > > > > >> On Thu, Sep 29, 2016 at 11:11 AM, Matthias Boehm <mbo...@us.ibm.com> > > wrote: > > >> > > >> SYSTEMML-968 has been resolved too but we're still waiting for > > >> SYSTEMML-964. 
Furthermore, there is also a nice-to-have feature we want > > to > > >> get it in: SYSTEMML-969 (extended dataframe - frame converter). > > >> > > >> Regards, > > >> Matthias > > > Great progress !!! > > > > > > Matthias, please let us know when these issues get resolved and I will > > work > > > on RC2. > > > > > > -- > > > Luciano Resende > > > *http://twitter.com/lresende1975* <http://twitter.com/lresende1975> > > > *http://lresende.blogspot.com/* <http://lresende.blogspot.com/> > > > > > > > > > > > > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/ > >
Re: [DISCUSS] Apache SystemML Release 1.0.0
Yes, I'm also in favor of moving to a 1.0 version for our upcoming release targeting the Spark 1.x series. Since we'll also be subsequently releasing a version targeting the Spark 2.x series, I would also like to suggest that we name that version 2.0. This version naming scheme would allow us to easily associate a SystemML version with the Spark series that it targets, thus reducing confusion for a user. Rather than view a 2.0 version as a successor to 1.0, let's view it instead as simply a naming scheme that corresponds to the targeted version of Spark. So, 1.0 would be our upcoming release targeting Spark 1.x, and 2.0 would be our upcoming release targeting Spark 2.x. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Aug 24, 2016, at 4:53 PM, Frederick R Reiss <frre...@us.ibm.com> wrote: > > I would favor declaring a 1.0 release. Having two digits in the minor release > is a bit awkward, and the project has progressed enough in terms of > functionality and stability to warrant a major release number bump. > > Fred > > From: Luciano Resende <luckbr1...@gmail.com> > To: dev@systemml.incubator.apache.org > Date: 08/24/2016 11:19 AM > Subject: [DISCUSS] Apache SystemML Release 1.0.0 > > > > > With the decision to have sort of two code streams, one to support 1.x and > another to support 2.x, I was wondering whether we should call the next 1.x > release our SystemML 1.0.0 release. > > Thoughts? > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/ > > >
Re: [DISCUSS] SystemML with Spark 2.0 support and roadmap
I think this is a great idea so that we can simplify the official release and reduce confusion for potential users. Certainly we can still retain the ability to build the extra artifacts locally, just as Spark currently does. I would also like to suggest that we move away from the current Standalone package that is designed to be used with Java, and instead move to simply using Spark in local mode for all "standalone" applications. Since running Spark locally on a laptop consists of simply downloading a release binary and running it, without any installation, I think this is now a much cleaner approach. This would allow us to immediately move to the goal of only releasing a single JAR file, as that same JAR file could be used in Spark locally, Spark on a cluster, and Hadoop on a cluster. Then we could just release the single JAR file and a folder of scripts as our official release. All other special artifacts could be kept as "download and build" artifacts. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. 
> On Aug 23, 2016, at 5:22 PM, Luciano Resende <luckbr1...@gmail.com> wrote: > > On Tue, Aug 23, 2016 at 3:51 PM, Deron Eriksson <deroneriks...@gmail.com> > wrote: > >> To simplify release candidate validation, I would like to propose that the >> distribution profile only builds the following 7 (out of the current >> included 10) artifacts: >> >> systemml-0.11.0-incubating-SNAPSHOT-javadoc.jar >> systemml-0.11.0-incubating-SNAPSHOT-sources.jar >> systemml-0.11.0-incubating-SNAPSHOT-src.tar.gz >> systemml-0.11.0-incubating-SNAPSHOT-src.zip >> systemml-0.11.0-incubating-SNAPSHOT-standalone.tar.gz (rename w/o >> "-standalone") >> systemml-0.11.0-incubating-SNAPSHOT-standalone.zip (rename w/o >> "-standalone") >> systemml-0.11.0-incubating-SNAPSHOT.jar >> >> The following could still be built using maven profiles but would not be in >> the distribution profile: >> >> systemml-0.11.0-incubating-SNAPSHOT-standalone.jar >> systemml-0.11.0-incubating-SNAPSHOT.tar.gz (also rename) >> systemml-0.11.0-incubating-SNAPSHOT.zip (also rename) >> >> This would decrease the number of our artifacts by 30% which means that we >> can validate the release faster, and the release candidate will also be >> more likely to pass external validation/voting. >> >> Deron > +1 > > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/
Re: Preview tag, was Re: [2/2] incubator-systemml git commit: Preparing SystemML development version 0.11.0-incubating-SNAPSHOT.
Thanks, Luciano for pointing this out. As you mentioned, the intent was definitely just to tag a commit that was known to be stable on the Spark 1.x line. I've deleted the existing tag, and created a new "spark-1.x-stable" tag simply pointing to a previous commit that was tested on Spark 1.x. Thanks! -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Aug 17, 2016, at 11:18 AM, Luciano Resende <luckbr1...@gmail.com> wrote: > > -1 > > Sorry Folks, this isn't a voted release and thus creating a tag without > SNAPSHOT is not valid. Please delete this tag. > > If what is wanted is to have a stable point in the codebase where folks can > go back if a release is needed for 1.x, then just create a branch/tag with > a descriptive name (e.g. spark_1.x_stable). > > If you actually want a release, there is a need to follow the Apache > Release vote process (e.g. see > https://www.mail-archive.com/dev%40spark.apache.org/msg14223.html for Spark > preview release vote) > > Thanks > > >> On Wed, Aug 17, 2016 at 1:21 PM, <dusenberr...@apache.org> wrote: >> >> Preparing SystemML development version 0.11.0-incubating-SNAPSHOT. 
>> >> >> Project: http://git-wip-us.apache.org/repos/asf/incubator-systemml/repo >> Commit: http://git-wip-us.apache.org/repos/asf/incubator-systemml/ >> commit/b6bde0d4 >> Tree: http://git-wip-us.apache.org/repos/asf/incubator-systemml/ >> tree/b6bde0d4 >> Diff: http://git-wip-us.apache.org/repos/asf/incubator-systemml/ >> diff/b6bde0d4 >> >> Branch: refs/heads/master >> Commit: b6bde0d4599d551cf1dc903c72662888abc22787 >> Parents: 05b6da0 >> Author: Mike Dusenberry <mwdus...@us.ibm.com> >> Authored: Wed Aug 17 10:17:52 2016 -0700 >> Committer: Mike Dusenberry <mwdus...@us.ibm.com> >> Committed: Wed Aug 17 10:17:52 2016 -0700 >> >> -- >> pom.xml | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> -- >> >> >> http://git-wip-us.apache.org/repos/asf/incubator-systemml/ >> blob/b6bde0d4/pom.xml >> -- >> diff --git a/pom.xml b/pom.xml >> index aba8808..a4c66a1 100644 >> --- a/pom.xml >> +++ b/pom.xml >> @@ -25,7 +25,7 @@ >>18 >> >>org.apache.systemml >> - 0.11.0-incubating-preview >> + 0.11.0-incubating-SNAPSHOT >>systemml >>jar >>SystemML >> @@ -41,7 +41,7 @@ >>scm:git:g...@github.com:apache/incubator- >> systemml >>scm:git:h >> ttps://git-wip-us.apache.org/repos/asf/incubator-systemml> developerConnection> >>https://git-wip-us.apache.org/repos/asf?p= >> incubator-systemml.git >> - 0.11.0-incubating-preview >> + HEAD >> >> >>JIRA > > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/
Re: [DISCUSS] Migration to Spark 2.0.0
Yes, I think this approach sounds great. To that end, I created a new tag "0.11.0-incubating-preview" that points to a specific commit that contains new features that will be in the 0.11 release with specific support for the Spark 1.x line. - Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Aug 16, 2016, at 4:44 PM, Frederick R Reiss <frre...@us.ibm.com> wrote: > > I think the approach Glenn proposes here is fine. > > Fred > > Deron Eriksson ---08/16/2016 02:41:51 PM---Hi Glenn, I am fine with this > approach. If this approach is taken, I would like to > > From: Deron Eriksson <deroneriks...@gmail.com> > To: dev@systemml.incubator.apache.org > Date: 08/16/2016 02:41 PM > Subject: Re: [DISCUSS] Migration to Spark 2.0.0 > > > > > Hi Glenn, > > I am fine with this approach. If this approach is taken, I would like to > set the documentation version in _config.yml to 0.10.x before the project > is tagged (I recently set it to 0.11). > > Deron > > > On Thu, Aug 11, 2016 at 3:40 PM, Glenn Weidner <gweid...@us.ibm.com> wrote: > > > I would like to propose an alternative to supporting Spark 2.0 and Spark > > 1.x within single stream. > > > > 1) Capture snapshot and establish label of current Apache SystemML master > > which includes new features added since 0.10.0 release. > > > > 2) After step 1 completed, enable master to move forward with support for > > Spark 2.x only. > > > > This is similar to what Fred initially proposed except step 1 would not > > involve a separate release. The 0.11 release of Apache SystemML would be > > compatible for Spark 2.0 and Scala 2.11. 
> > > > Thanks, > > Glenn > > > > [image: Inactive hide details for Glenn Weidner---08/08/2016 03:33:43 > > PM---As a preliminary experiment in attempt to compile against bo]Glenn > > Weidner---08/08/2016 03:33:43 PM---As a preliminary experiment in attempt > > to compile against both Spark 2.0.0 and Spark 1.6.2 from same > > > > From: Glenn Weidner/Silicon Valley/IBM@IBMUS > > To: dev@systemml.incubator.apache.org > > Date: 08/08/2016 03:33 PM > > Subject: Re: [DISCUSS] Migration to Spark 2.0.0 > > -- > > > > > > > > As a preliminary experiment in attempt to compile against both Spark 2.0.0 > > and Spark 1.6.2 from same code base, I made another set of changes for > > comparison against previous proposed changes for [SYSTEMML-776]. > > This experimental set can be viewed here: > > > > *https://github.com/gweidner/incubator-systemml/commit/0611f0c197e4a0e816b3325093168bc5162d62c0* > > <https://github.com/gweidner/incubator-systemml/commit/0611f0c197e4a0e816b3325093168bc5162d62c0> > > > > This compiles against Spark 2.0.0 and Spark 1.6.2 except for fit/transform > > overrides in LogisticRegression.scala due to: > > SPARK-14500 Accept Dataset[] instead of DataFrame in MLlib APIs > > > > Detailed code comments and suggestions to try out can be made in the > > branch commit instead of this mail thread. > > > > Thanks, > > Glenn > > > > Deron Eriksson ---08/05/2016 02:02:10 PM---I am open to the idea of > > supporting Spark 2 and Spark<2 concurrently if someone shows that it can be > > > > From: Deron Eriksson <deroneriks...@gmail.com> > > To: dev@systemml.incubator.apache.org > > Date: 08/05/2016 02:02 PM > > Subject: Re: [DISCUSS] Migration to Spark 2.0.0 > > -- > > > > > > > > I am open to the idea of supporting Spark 2 and Spark<2 concurrently if > > someone shows that it can be accomplished with minimal inconvenience. > > > > However, I would lean towards Fred's approach (Spark 1.6 release followed > > shortly by a Spark 2 release). 
If possible, I want to be able to focus most > > of our efforts towards the future rather than the past. > > > > Deron > > > > > > On Thu, Aug 4, 2016 at 10:59 AM, Luciano Resende <luckbr1...@gmail.com> > > wrote: > > > > > That was going to be my suggestion... In Zeppelin, we just introduced > > > support for different versions of scala and added support for spark 2.0 > > > based on profiles and a bit of reflections... > > > > > > Do we have to do anything related to Scala versions as well ? > > > > > > On Thursday, August 4, 2016, Matthias Boehm <mbo...@us.ibm.com> wrote: > > > > > > > I would recommend to start an in
0.10 Maintenance Branch
Hi all, Just FYI, I created a new "branch-0.10" branch to track any bug fixes that we would like to eventually release in a 0.10.1 release. Moving forward, please push any future bug fixes that would be applicable to the 0.10 series to this branch, in addition to the master branch. Additionally, please run tests on both branches. I've started with a bug fix to our existing Python API that prevented usage in Python 3. - Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone.
Re: Build failed in Jenkins: SystemML-DailyTest #340
Just FYI, I pushed a hotfix for the RAT failures, which were due to a couple of new Jupyter notebooks I added yesterday. Thanks! -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jun 25, 2016, at 12:31 AM, jenk...@spark.tc wrote: > > See <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/340/changes> > > Changes: > > [Glenn Weidner] [SYSTEMML-771] Fix warnings in CsplineCG.dml and CsplineDS.dml > > [mwdusenb] [SYSTEMML-618] SystemML-NN: Adding an MNIST softmax classifier > example, > > [mwdusenb] [SYSTEMML-618] SystemML-NN: Updating the MNIST softmax classifier > > [mwdusenb] [SYSTEMML-618] SystemML-NN: Adding an MNIST "LeNet" neural net > example, > > [Matthias Boehm] [SYSTEMML-556] Simplified json meta data string > construction, for apis > > [Matthias Boehm] [SYSTEMML-630] Fix robustness csv frame readers (count num > columns) > > -- > [...truncated 300 lines...] > [INFO] Copying core-1.1.2.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/core-1.1.2.jar> > [INFO] Copying jetty-util-6.1.26.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jetty-util-6.1.26.jar> > [INFO] Copying jackson-core-2.4.4.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jackson-core-2.4.4.jar> > [INFO] Copying test-interface-1.0.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/test-interface-1.0.jar> > [INFO] Copying snappy-java-1.1.1.7.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/snappy-java-1.1.1.7.jar> > [INFO] Copying hamcrest-core-1.3.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/hamcrest-core-1.3.jar> > [INFO] Copying uncommons-maths-1.2.2a.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/uncommons-maths-1.2.2a.jar> > [INFO] Copying jsp-api-2.1.jar to > 
<https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jsp-api-2.1.jar> > [INFO] Copying jersey-server-1.9.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jersey-server-1.9.jar> > [INFO] Copying pyrolite-4.4.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/pyrolite-4.4.jar> > [INFO] Copying compress-lzf-1.0.3.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/compress-lzf-1.0.3.jar> > [INFO] Copying xmlenc-0.52.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/xmlenc-0.52.jar> > [INFO] Copying zookeeper-3.4.5.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/zookeeper-3.4.5.jar> > [INFO] Copying jasper-runtime-5.5.23.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jasper-runtime-5.5.23.jar> > [INFO] Copying hadoop-hdfs-2.4.1.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/hadoop-hdfs-2.4.1.jar> > [INFO] Copying antlr4-runtime-4.3.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/antlr4-runtime-4.3.jar> > [INFO] Copying curator-framework-2.4.0.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/curator-framework-2.4.0.jar> > [INFO] Copying jodd-core-3.6.3.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/jodd-core-3.6.3.jar> > [INFO] Copying commons-net-2.2.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/commons-net-2.2.jar> > [INFO] Copying json4s-ast_2.10-3.2.10.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/json4s-ast_2.10-3.2.10.jar> > [INFO] Copying commons-lang3-3.3.2.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/commons-lang3-3.3.2.jar> > [INFO] Copying py4j-0.8.2.1.jar to > 
<https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/py4j-0.8.2.1.jar> > [INFO] Copying stream-2.7.0.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/stream-2.7.0.jar> > [INFO] Copying hadoop-mapreduce-client-shuffle-2.4.1.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/hadoop-mapreduce-client-shuffle-2.4.1.jar> > [INFO] Copying slf4j-api-1.7.10.jar to > <https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/ws/target/lib/slf4j-api-1.7.10.jar>
Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2)
+1 Tested the main JAR with a PySpark Jupyter notebook. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Jun 1, 2016, at 12:16 PM, Deron Eriksson <deroneriks...@gmail.com> wrote: > > +1, but please note following findings: > > 1. Is the *source-release.zip artifact unnecessary, since we have > src.tar.gz and src.zip artifacts? Also, it contains the Hadoop binaries. > So, it can't be used as the "source release" artifact. > 2. No standalone uberjar is present (I am happy with this since no one to > my knowledge is using it and the LICENSE/NOTICE may need updating. I would > like to remove this artifact forever.) > 3. No in-memory jar is present (I am happy with this too since this > artifact is not very lightweight as it was probably initially meant to be.) > > Deron > > > > > > On Wed, Jun 1, 2016 at 10:01 AM, Frederick R Reiss <frre...@us.ibm.com> > wrote: > >> +1 >> >> Sent from my iPhone using IBM Verse >> >> On Jun 1, 2016, 9:31:36 AM, reinw...@us.ibm.com wrote: >> >> From: reinw...@us.ibm.com >> To: dev@systemml.incubator.apache.org >> Cc: >> Date: Jun 1, 2016 9:31:36 AM >> Subject: Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2) >> >> >> +1 >> Regards, >> Berthold Reinwald >> IBM Almaden Research Center >> office: (408) 927 2208; T/L: 457 2208 >> e-mail: reinw...@us.ibm.com >> From: Shirish Tatikonda >> To: dev@systemml.incubator.apache.org >> Date: 06/01/2016 12:47 AM >> Subject:Re: [VOTE] Apache SystemML 0.10.0-incubating (RC2) >> +1 >>> On Jun 1, 2016 12:40 AM, "Matthias Boehm" wrote: >>> +1, but if there is a third rc, let us please create a branch or cut >> the >>> release as of today to ensure no new features are leaking in. 
>>> >>> Regards, >>> Matthias >>> >>> [image: Inactive hide details for Luciano Resende ---05/31/2016 >> 10:05:48 >>> PM---Please vote on releasing the following candidate as Apach]Luciano >>> Resende ---05/31/2016 10:05:48 PM---Please vote on releasing the >> following >>> candidate as Apache SystemML version 0.10.0-incubating ! >>> >>> From: Luciano Resende >>> To: dev@systemml.incubator.apache.org >>> Date: 05/31/2016 10:05 PM >>> Subject: [VOTE] Apache SystemML 0.10.0-incubating (RC2) >>> -- >>> >>> >>> >>> Please vote on releasing the following candidate as Apache SystemML >> version >>> 0.10.0-incubating ! >>> >>> The vote is open for at least 72 hours and will close on Saturday, >>> Wednesday 25 and passes if a majority of at least 3 +1 PMC votes are >> cast. >>> >>> [ ] +1 Release this package as Apache SystemML 0.10.0-incubating >>> [ ] -1 Do not release this package because ... >>> >>> To learn more about Apache SystemML, please see >>> http://systemml.apache.org/ >>> >>> The tag to be voted on is v0.10.0-incubating-rc2 >>> (3d5f9b11741f6d6ecc6af7cbaa1069cde32be838) >> >> https://github.com/apache/incubator-systemml/tree/3d5f9b11741f6d6ecc6af7cbaa1069cde32be838 >>> >>> The release artifacts can be found at : >> >> https://dist.apache.org/repos/dist/dev/incubator/systemml/0.10.0-incubating-rc2/ >>> >>> The maven release artifacts, including signatures, digests, etc. can be >>> found at: >> >> https://repository.apache.org/content/repositories/orgapachesystemml-1006/ >>> >>> >>> = >>> == Apache Incubator release policy == >>> = >>> Please find below the guide to release management during incubation: >>> http://incubator.apache.org/guides/releasemanagement.html >>> >>> === >>> == How can I help test this release? == >>> === >>> If you are a SystemML user, you can help us test this release by taking >> an >>> existing Algorithm or workload and running on this release candidate, >> then >>> reporting any regressions. 
>>> >>> >>> == What justifies a -1 vote for this release? == >>> >>> -1 votes should only occur for significant stop-ship bugs or legal >> related >>> issues (e.g. wrong license, missing header files, etc). Minor bugs or >>> regressions should not block this release. >>> >>> -- >>> Luciano Resende >>> http://twitter.com/lresende1975 >>> http://lresende.blogspot.com/ >>
Re: Discussion on GPU backend
In my opinion, the problem with using a separate branch for longer-term work, rather than smaller PRs into master, is that after several commits, say 10 or 20, it becomes much more difficult to rebase without running into nasty merge conflicts, especially when those conflicts are on an intermediate commit so one would have to remember what the code looked like at that point in time to properly fix the conflicts. To me, this invites issues such as duplicated code and slower progress. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On May 25, 2016, at 9:01 AM, Luciano Resende <luckbr1...@gmail.com> wrote: > > On Wed, May 25, 2016 at 6:03 AM, Berthold Reinwald <reinw...@us.ibm.com> > wrote: > >> the discussion is less about (1), (2), or (3). As practiced so far, (3) is >> the way to go. >> >> The question is about (A) or (B). Curious what the Apache suggested >> practice is. > Apache is keen on fostering open collaboration, so specifically about > branching, having a SystemML branch that is used for > collaboration/experimentation is probably preferable, as it gives > visibility to others on the community, enables iterative development through > review of small patches, while shielding the trunk from issues these experiments > can cause. > > I would just recommend to avoid making the branch stale, and keep rebasing > it with latest master, which will make integration much easier in the > future. > > > > -- > Luciano Resende > http://twitter.com/lresende1975 > http://lresende.blogspot.com/
Re: Discussion on GPU backend
Yeah to do this in the most "Apache Way (TM)", as well as to maintain sanity, we should definitely use JIRA issues (ideally actual "sub tasks") and PRs to split up major features. It would also be great to split it up into chunks of varying complexity that do not block others, so that we could gather more contributors of various SystemML experience levels. The JIRA issues should be used to divvy up tasks, and PRs should be used to propose an implementation for that task, which would be followed by the usual comments from other contributors. As for a few other best practices with PRs, the PRs should also be merged with a "Closes #172." line appended to the end, where the number reflects the GitHub PR number, so that the conversations on a PR are linked to the final merged commit. Also, any necessary rebasing on a PR should be done by simply overwriting that PR branch (which exists on the contributor's fork of SystemML), which allows GitHub to keep the same PR open, and thus the entire conversation can be followed. Excited about the GPU work! -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On May 25, 2016, at 8:08 AM, Niketan Pansare <npan...@us.ibm.com> wrote: > > Thanks Berthold and Matthias for your suggestions. It is important to note > whether we go with (A) or (B), the initial PR will be squashed in one commit > and individual commits by external contributor will be lost in the process. > However, since we are planning to go with option (3), the impact won't be too > severe. > > Matthias: Here are my thoughts regarding the unknowns for GPU backend: > 1. Handling of native libraries: > Both JCuda and Nvidia provide shared libraries/DLL for most OS/platforms > along with installation instructions. > > For deployment: > As per the previous email, the native libraries will be treated as an > external dependency, just like hadoop/spark. 
For example: if someone > executes: "hadoop jar SystemML.jar -f test.dml -exec hybrid_spark", she will > get a "Class Not Found" exception. In similar fashion, if the user does not > include JCu*.jar or provide native libraries (JCu*.dll/so or CUDA or CuDNN) > and supplies the "-accelerator" flag, a "Class not found" or "Cannot load .." > exception will be thrown respectively. If the user does not supply the "-accelerator" > flag, SystemML will proceed with normal execution as it does today. > > For dev: > We are planning to host jcu*.jar in a Maven repository. Once that's > done, the "system" scope in pom will be replaced by "provided" scope and the > jcu*.jars will be deleted from PR. Like deployment, it is the responsibility of > the developer to install native libraries if she intends to work on GPU > backend. > > For testing: > The user can set the environment variable "CUDA_PATH" and set TEST_GPU flag > to enable GPU tests (Please see > https://github.com/apache/incubator-systemml/pull/165/files#diff-bcda036e4c3ff62cb2648acbbd19f61aR113). > The PR will be accompanied by additional tests which will be enabled only > when TEST_GPU is set. Having TEST_GPU flag allows users without Nvidia GPU to > run the integration test. Like deployment, it is the responsibility of the > developer to install native libraries for testing with TEST_GPU flag. > > The first version will not contain custom native kernels. > > 2. I can add the summary of the performance comparisons in the PR :) > > Thanks, > > Niketan Pansare > IBM Almaden Research Center > E-mail: npansar At us.ibm.com > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar > > Berthold Reinwald---05/25/2016 06:03:55 AM---the discussion is less about > (1), (2), or (3). As practiced so far, (3) is the way to go. 
> > From: Berthold Reinwald/Almaden/IBM@IBMUS > To: dev@systemml.incubator.apache.org > Date: 05/25/2016 06:03 AM > Subject: Re: Discussion on GPU backend > > > > > the discussion is less about (1), (2), or (3). As practiced so far, (3) is > the way to go. > > The question is about (A) or (B). Curious was the Apache suggested > practice is. > > Regards, > Berthold Reinwald > IBM Almaden Research Center > office: (408) 927 2208; T/L: 457 2208 > e-mail: reinw...@us.ibm.com > > > > From: Matthias Boehm/Almaden/IBM@IBMUS > To: dev@systemml.incubator.apache.org > Date: 05/24/2016 09:10 PM > Subject:Re: Discussion on GPU backend > > > > Generally, I think we should really stick to (3) as done in the past, > i.e., bring up major features in the roadmap discussions, create jira > epics
Re: Draft - May 2016 SystemML Incubator Podling Report
I might add that we are preparing to release our next version very soon. Otherwise, LGTM. Thanks, Deron! -Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On May 2, 2016, at 1:11 PM, Niketan Pansare <npan...@us.ibm.com> wrote: > > Hi Deron, > > Thanks for writing the draft. I also presented SystemML at Rice University. > Can you please add it to the report ? > > Link to the event on Rice CS calendar: > https://calendar.google.com/calendar/render?eid=MTdqZnJmZHZqM2ExNWlkbWtwa2czZXFzYmcgZnBoYmd1b3JsbzM2azJ0MWk4djk5ODcwbWtAZw=America/Chicago=true=xml#eventpage_6 > > Thanks, > > Niketan Pansare > IBM Almaden Research Center > E-mail: npansar At us.ibm.com > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar > > Deron Eriksson ---05/02/2016 12:36:06 PM---Hi, I created a draft for the May > 2016 SystemML podling report that is due this > > From: Deron Eriksson <deroneriks...@gmail.com> > To: dev@systemml.incubator.apache.org > Date: 05/02/2016 12:36 PM > Subject: Draft - May 2016 SystemML Incubator Podling Report > > > > > Hi, > > > I created a draft for the May 2016 SystemML podling report that is due this > Wednesday. Please provide feedback if you'd like anything updated. For PMC > members, if the issue is private related to the project, please use the > private mailing list for discussion. > > Thanks! > > Deron > > > > SystemML > > > SystemML provides declarative large-scale machine learning (ML) that aims at > > flexible specification of ML algorithms and automatic generation of hybrid > > runtime plans ranging from single node, in-memory computations, to > > distributed computations running on Apache Hadoop MapReduce and Apache > > Spark. > > > SystemML has been incubating since 2015-11-02. 
> > > Three most important issues to address in the move towards graduation: > > > - Grow SystemML community: increase mailing list activity, > >increase adoption of SystemML for scalable machine learning, encourage > >data scientists to adopt DML and PyDML algorithm scripts, respond to > >user feedback to ensure SystemML meets the requirements of real-world > >situations, write papers, and present talks about SystemML. > > - Continue to produce releases. > > - Increase the diversity of our project's contributors and committers. > > > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be > > aware of? > > > NONE. > > > How has the community developed since the last report? > > > Our mailing list from February through April had 199 messages involving > > topics such as algorithms, DML functionality, usability, and bug fixes. In > > addition, we have had many discussions on our JIRA site and in pull > request > > conversations. Fred Reiss presented at Spark Summit East on February 17 > about > > SystemML internals. Berthold Reinwald spoke at the Spark Technology > Center on > > March 9 about scalable machine learning with SystemML. Niketan Pansare > spoke > > on April 28 at Datapalooza in Austin about declarative machine learning at > > scale with SystemML. Researchers in Germany are working to add Flink as an > > additional SystemML backend. On GitHub, the project has been starred 267 > > times and forked 92 times. > > > How has the project developed since the last report? > > > We produced our first Apache release, version 0.9.0-incubating. Numerous > > additions have been made to the project, including core functionality, > > usability improvements, and documentation. The project has had 204 commits > > since February 1. In the same time frame, 155 new issues have been > reported > > on our JIRA site and 77 issues have been resolved. 114 pull requests > opened > > since Febrary 1 have been closed. 
> > > Date of last release: > > > 2016-02-15 (version 0.9.0-incubating) > > > When were the last committers or PMC members elected? > > > NONE > > >
Deprecate `ppred(...)` Built-in Function
Hi all, The `ppred(...)` built-in function (`ppred(X, 0, ">")`) is no longer necessary as relational comparison operators are supported natively in the language (`X > 0`) and follow R's semantics. SYSTEMML-657 has been created to track the deprecation of this function, and is currently open if anyone would like to take it on. We'd like to add deprecation warnings to the parser and documentation, and replace all current uses of the function in our DML scripts. Thanks! - Mike -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone.
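For the script clean-up mentioned above (replacing existing `ppred` uses with infix operators), a rough sketch of an automated rewrite might look like the following. It is an illustration under stated assumptions, not project tooling: `rewrite_ppred` and `PPRED_CALL` are hypothetical names, the operator is assumed to be a double-quoted string, and only simple calls whose operands contain no commas or parentheses are rewritten. Anything more complex is left untouched for manual review.

```python
import re

# Hypothetical helper, not part of SystemML.
# Matches ppred(<lhs>, <rhs>, "<op>") where neither operand contains
# commas or parentheses; nested or indexed operands are deliberately skipped.
PPRED_CALL = re.compile(
    r'\bppred\(\s*([^(),]+?)\s*,\s*([^(),]+?)\s*,\s*"(<=|>=|==|!=|<|>)"\s*\)'
)


def rewrite_ppred(dml_source):
    """Replace simple ppred(...) calls with a parenthesized infix comparison."""
    return PPRED_CALL.sub(
        lambda m: "({} {} {})".format(m.group(1), m.group(3), m.group(2)),
        dml_source,
    )


if __name__ == "__main__":
    print(rewrite_ppred('Y = ppred(X, 0, ">")'))
```

The result is wrapped in parentheses so that operator precedence in the surrounding expression does not change. Calls the pattern cannot handle, such as `ppred(f(a), 0, ">")`, pass through unchanged.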
Re: remove castAsScalar?
Yeah those both sound great. Even if we have to possibly support old DML code outside the project, we can certainly aim to keep our DML code as modern and clean as possible. -- Mike Dusenberry GitHub: github.com/dusenberrymw LinkedIn: linkedin.com/in/mikedusenberry Sent from my iPhone. > On Apr 22, 2016, at 11:29 AM, Deron Eriksson <deroneriks...@gmail.com> wrote: > > In that case, perhaps I could create JIRAs to: > 1) replace all castAsScalar's in the project with as.scalar's > 2) if castAsScalar is used in a DML file, issue a log warning such as > 'castAsScalar has been deprecated, please replace with as.scalar' > 3) update docs to say castAsScalar has been deprecated. > > That way, we maintain backwards compatibility with older DML outside the > project while replacing the castAsScalar's in the project. > > Deron > > > >> On Thu, Apr 21, 2016 at 5:42 PM, Matthias Boehm <mbo...@us.ibm.com> wrote: >> >> Let's be careful not to unnecessarily break backwards compatibility. How >> about we collect all instances of language builtin functions that we want >> to remove and clean them up with our 1.0 release later this year? There are >> other instances like ppred that do not exist in R and meanwhile redundant >> in DML (but still heavily used). >> >> Regards, >> Matthias >> >> [image: Inactive hide details for Deron Eriksson ---04/21/2016 05:33:56 >> PM---Hi, In the ongoing discussion concerning printing a matrix]Deron >> Eriksson ---04/21/2016 05:33:56 PM---Hi, In the ongoing discussion >> concerning printing a matrix (at >> >> From: Deron Eriksson <deroneriks...@gmail.com> >> To: dev@systemml.incubator.apache.org >> Date: 04/21/2016 05:33 PM >> Subject: remove castAsScalar? >> -- >> >> >> >> Hi, >> >> In the ongoing discussion concerning printing a matrix (at >> https://github.com/apache/incubator-systemml/pull/120), I noticed that >> castAsScalar was introduced to the language as a mistake. 
It has been >> replaced by as.scalar but castAsScalar has been kept around until now for >> historical reasons. Since it is redundant and we are an open source >> project, can we now go ahead and remove it, since having two ways to >> accomplish the same thing (as.scalar and castAsScalar) can be confusing to >> new users? >> >> Deron >> >> >>
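The log warning proposed in step 2 of the quoted plan could be prototyped outside the parser with a small scan over DML source, sketched below. This is only an approximation under stated assumptions: `find_cast_as_scalar` is a hypothetical name, the real warning would live in the SystemML parser, and this line-based scan does not distinguish code from comments or string literals. The warning text is the one suggested in the thread.

```python
import re

# Hypothetical lint helper, not part of SystemML.
CAST_AS_SCALAR = re.compile(r"\bcastAsScalar\s*\(")

WARNING_TEXT = "castAsScalar has been deprecated, please replace with as.scalar"


def find_cast_as_scalar(dml_source):
    """Return one warning string per source line that calls castAsScalar."""
    warnings = []
    for lineno, line in enumerate(dml_source.splitlines(), start=1):
        if CAST_AS_SCALAR.search(line):
            warnings.append("line {}: {}".format(lineno, WARNING_TEXT))
    return warnings


if __name__ == "__main__":
    sample = "x = castAsScalar(M[1,1])\ny = as.scalar(M[1,2])"
    for warning in find_cast_as_scalar(sample):
        print(warning)
```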
Re: [VOTE] Release SystemML 0.9.0-incubating (RC2)
FYI, the issue was due to a leftover Git cache from before the addition of the `.gitattributes` file that fixed these line endings. The cache on the machine used for cutting the release still contained the files with the Windows-style line-endings. Since the `.gitattributes` file is present, these incorrect line-endings would not have made their way back into the official repo, but when building the release distributions locally, they were simply copied over.

The solution was to instruct Git to remove its local cache as follows, which may be beneficial for everyone to perform:

- `git rm --cached -r .`
- `git reset --hard`

Just note that this will remove any changes that have not yet been committed or stashed locally.

- Mike

--
Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.

> On Jan 28, 2016, at 5:07 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
>
> Vote canceled, as it seems we have been hit by end of line problems again.
> New vote coming up shortly.
>
> On Thu, Jan 28, 2016 at 2:02 PM, Luciano Resende <luckbr1...@gmail.com>
> wrote:
>
>> Please vote on releasing the following candidate as Apache SystemML
>> version 0.9.0!
>>
>> The vote is open for at least 72 hours and passes if a majority of at
>> least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache SystemML 0.9.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache SystemML, please see
>> http://systemml.apache.org/
>>
>> The tag to be voted on is v0.9.0-rc2
>> (6da9d60db4a5a7adfcc943d954f41153e496866f)
>>
>> https://github.com/apache/incubator-systemml/tree/6da9d60db4a5a7adfcc943d954f41153e496866f
>>
>> The release files, including signatures, digests, etc. can be found at:
>>
>> https://repository.apache.org/content/repositories/orgapachesystemml-1002/
>>
>> The distribution is also available at:
>>
>> http://people.apache.org/~lresende/systemml/0.9.0-rc2/
>>
>> == Apache Incubator release policy ==
>>
>> Please find below the guide to release management during incubation:
>> http://incubator.apache.org/guides/releasemanagement.html
>>
>> == How can I help test this release? ==
>>
>> If you are a SystemML user, you can help us test this release by taking
>> an existing Algorithm or workload and running on this release candidate,
>> then reporting any regressions.
>>
>> == What justifies a -1 vote for this release? ==
>>
>> -1 votes should only occur for significant stop-ship bugs or legal
>> related issues (e.g. wrong license, missing header files, etc). Minor bugs
>> or regressions should not block this release.
>>
>> --
>> Luciano Resende
>> http://people.apache.org/~lresende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
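The reply in this thread traces the canceled vote to files with Windows-style line endings being copied into the locally built release distributions. As a complement to the Git cache reset described there, a source tree could be scanned for such files before a release is cut. The following is only a minimal sketch under stated assumptions: the function name `find_crlf_files` and the `skip_dirs` parameter are hypothetical and are not part of SystemML or its release tooling.

```python
import os


def find_crlf_files(root, skip_dirs=(".git",)):
    """Return sorted paths (relative to root) of files that contain CRLF.

    Hypothetical helper for illustration only. Files are read as raw
    bytes, so binary files that happen to contain the byte pair
    b"\r\n" are reported as well and may need to be filtered out.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                if b"\r\n" in fh.read():
                    hits.append(os.path.relpath(path, root))
    return sorted(hits)


if __name__ == "__main__":
    # Scan the current directory and list any offending files.
    for rel_path in find_crlf_files("."):
        print(rel_path)
```

An empty result only says that no CRLF sequences were found in the working tree; it does not replace the `git rm --cached -r .` and `git reset --hard` steps given in the message above.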
Future Release Package Naming & Structure
Hi all,

A discussion regarding the release package structure started on pull request 54 [https://github.com/apache/incubator-systemml/pull/54]. Currently, we have a "distributed" release for running SystemML on a cluster* using Spark or Hadoop, as well as a "standalone" release for running SystemML on a single node with Java (no Spark or Hadoop installation necessary). Given this, two questions were raised during the discussion:

1. Should we name our releases as "*-cluster" and "*-standalone", or just distinguish the standalone version as "*" and "*-standalone"?
2. Should we maintain the two separate releases ("distributed" and "standalone"), or should we move to have one single release with one JAR that works in all environments and execution modes?

The consensus was that there are pros and cons for each option, and that this discussion would be more appropriate for the mailing list.

Thoughts?

Thanks,

- Mike

* Yes, SystemML can still be run in single node execution mode even on Spark or Hadoop.

--
Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.
Re: [VOTE] Release SystemML 0.9-incubating (RC1)
+1

--
Mike Dusenberry
GitHub: github.com/dusenberrymw
LinkedIn: linkedin.com/in/mikedusenberry

Sent from my iPhone.

> On Jan 20, 2016, at 3:39 AM, Frederick R Reiss <frre...@us.ibm.com> wrote:
>
> +1
>
> Sent from my iPhone
>
> On Jan 20, 2016, at 11:06 AM, Shirish Tatikonda
> <shirish.tatiko...@gmail.com> wrote:
>
>> +1
>>
>> On Tue, Jan 19, 2016 at 9:46 PM, Luciano Resende <luckbr1...@gmail.com>
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache SystemML
>>> version 0.9.0!
>>>
>>> The vote is open for at least 72 hours and will close on Saturday,
>>> January 23 and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache SystemML 0.9.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache SystemML, please see
>>> http://systemml.apache.org/
>>>
>>> The tag to be voted on is v0.9.0-rc1
>>> (3e7e5cf6ca697ec247a7dc4e005a7f7b1cb18856)
>>>
>>> https://github.com/apache/incubator-systemml/tree/3e7e5cf6ca697ec247a7dc4e005a7f7b1cb18856
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>>
>>> https://repository.apache.org/content/repositories/orgapachesystemml-1001/
>>>
>>> == Apache Incubator release policy ==
>>>
>>> Please find below the guide to release management during incubation:
>>> http://incubator.apache.org/guides/releasemanagement.html
>>>
>>> == How can I help test this release? ==
>>>
>>> If you are a SystemML user, you can help us test this release by taking
>>> an existing Algorithm or workload and running on this release candidate,
>>> then reporting any regressions.
>>>
>>> == What justifies a -1 vote for this release? ==
>>>
>>> -1 votes should only occur for significant stop-ship bugs or legal
>>> related issues (e.g. wrong license, missing header files, etc). Minor
>>> bugs or regressions should not block this release.
>>>
>>> --
>>> Luciano Resende
>>> http://people.apache.org/~lresende
>>> http://twitter.com/lresende1975
>>> http://lresende.blogspot.com/