Out of memory error and problem generating heap dump

2015-11-13 Thread Deron Eriksson
Hello,

I'm running into an out-of-memory issue when I attempt to use the
Kmeans.dml algorithm on a 1M-row matrix of generated test data. I am trying
to generate a heap dump in order to help diagnose the problem but so far I
haven't been able to correctly generate a heap dump file. I was wondering
if anyone has any advice regarding the out-of-memory issue and creating a
heap dump to help diagnose the problem.

I set up a 4-node Hadoop cluster (on Red Hat Enterprise Linux Server
release 6.6 (Santiago)) with HDFS and YARN to try out SystemML in Hadoop
batch mode. The master node has NameNode, SecondaryNameNode, and
ResourceManager daemons running on it. The 3 other nodes have DataNode and
NodeManager daemons running on them.

I'm trying out the Kmeans.dml algorithm. To begin, I generated test data
using the genRandData4Kmeans.dml script with 100K rows via:

hadoop jar system-ml-0.8.0/SystemML.jar -f genRandData4Kmeans.dml -nvargs
nr=10 nf=100 nc=10 dc=10.0 dr=1.0 fbf=100.0 cbf=100.0 X=Xsmall.mtx
C=Csmall.mtx Y=Ysmall.mtx YbyC=YbyCsmall.mtx

Next, I ran Kmeans.dml against the Xsmall.mtx 100K-row matrix via:

hadoop jar system-ml-0.8.0/SystemML.jar -f
system-ml-0.8.0/algorithms/Kmeans.dml -nvargs X=Xsmall.mtx k=5

This ran perfectly.

However, next I increased the amount of test data to 1M rows, which
resulted in matrix data of about 3GB in size:

hadoop jar system-ml-0.8.0/SystemML.jar -f genRandData4Kmeans.dml -nvargs
nr=100 nf=100 nc=10 dc=10.0 dr=1.0 fbf=100.0 cbf=100.0 X=X.mtx C=C.mtx
Y=Y.mtx YbyC=YbyC.mtx
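As a rough sanity check on these sizes (an editorial sketch, not part of the original message), the dense in-memory footprint of X can be estimated as rows x columns x 8 bytes per double. This assumes nf=100 is the column count and that X is dense. The ~3GB on-disk figure is presumably larger because of the text-based matrix format.

```python
def dense_matrix_bytes(rows: int, cols: int, bytes_per_value: int = 8) -> int:
    """In-memory size of a dense rows x cols matrix of 8-byte doubles
    (ignores object headers and any sparse representation)."""
    return rows * cols * bytes_per_value


if __name__ == "__main__":
    # Assumption: nf=100 is the number of columns of X.
    for rows in (100_000, 1_000_000):
        size = dense_matrix_bytes(rows, 100)
        print(f"{rows} rows: {size} bytes = {size / 2**30:.2f} GiB")
```

For 1M rows this gives 800,000,000 bytes (about 0.75 GiB), which is already close to the 1 GB container limit reported in the logs before any JVM or intermediate overhead is counted.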

I ran Kmeans.dml against the 1M-row X.mtx matrix via:

hadoop jar system-ml-0.8.0/SystemML.jar -f
system-ml-0.8.0/algorithms/Kmeans.dml -nvargs X=X.mtx k=5

In my console, I received a number of error messages such as:

Error: Java heap space
15/11/13 14:48:58 INFO mapreduce.Job: Task Id :
attempt_1447452404596_0006_m_23_1, Status : FAILED
Error: GC overhead limit exceeded

Next, I attempted to generate a heap dump. Additionally, I added some
settings so that I could look at memory usage remotely using JConsole.

I added the following lines to my hadoop-env.sh files on each node:

export HADOOP_NAMENODE_OPTS="-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/hadoop2/heapdumps-dfs/
-Dcom.sun.management.jmxremote.port=
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false ${HADOOP_NAMENODE_OPTS}"

export HADOOP_DATANODE_OPTS="-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/hadoop2/heapdumps-dfs/
-Dcom.sun.management.jmxremote.port=
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false ${HADOOP_DATANODE_OPTS}"

I added the following to my yarn-env.sh files on each node:

export YARN_RESOURCEMANAGER_OPTS="-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/hadoop2/heapdumps-yarn/
-Dcom.sun.management.jmxremote.port=9998
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false
${YARN_RESOURCEMANAGER_OPTS}"

export YARN_NODEMANAGER_OPTS="-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/hadoop2/heapdumps-yarn/
-Dcom.sun.management.jmxremote.port=9998
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false ${YARN_NODEMANAGER_OPTS}"

Additionally, I modified the bin/hadoop file:

HADOOP_OPTS="$HADOOP_OPTS -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/hadoop2/heapdumps/
-Dcom.sun.management.jmxremote.port=9997
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.local.only=false"

I was able to look at my Java processes remotely in real-time using
JConsole. I did not see where the out-of-memory error was happening.

Next, I examined the error logs on the 4 nodes. I searched for FATAL
entries with the following:

$ pwd
/home/hadoop2/hadoop-2.6.2/logs
$ grep -R FATAL *

On the slave nodes, I had log messages such as the following, which seem to
indicate the error occurred for the YARN process (NodeManager).

userlogs/application_1447377156841_0006/container_1447377156841_0006_01_07/syslog:2015-11-12
17:53:22,581 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running
child : java.lang.OutOfMemoryError: GC overhead limit exceeded
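The `grep -R FATAL *` step can also be scripted. The sketch below is a hypothetical helper (not a Hadoop tool) that walks the log directory and marks lines coming from `org.apache.hadoop.mapred.YarnChild`. YarnChild is the main class of MapReduce task JVMs, so those entries point at the map/reduce task processes rather than at the NodeManager daemon itself.

```python
import os
from typing import Iterator, Tuple


def find_log_lines(root: str, needle: str = "FATAL") -> Iterator[Tuple[str, int, str]]:
    """Recursively yield (path, line_number, line) for lines containing `needle`,
    similar to running `grep -R FATAL *` inside `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, start=1):
                        if needle in line:
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # skip unreadable files


if __name__ == "__main__":
    # Log directory taken from the message above.
    for path, lineno, line in find_log_lines("/home/hadoop2/hadoop-2.6.2/logs"):
        # YarnChild is the entry point of MapReduce task JVMs.
        tag = "  <-- task JVM" if "YarnChild" in line else ""
        print(f"{path}:{lineno}: {line}{tag}")
```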

Does anyone have any advice regarding what is causing this error or how I
can go about generating a heap dump so I can help diagnose the issue?

Thank you,

Deron


Re: Out of memory error and problem generating heap dump

2015-11-16 Thread Deron Eriksson
Hello Matthias,

Thank you for the help! I'm still running into issues, so I was wondering if
you have any further guidance. I think the main question I have is whether I am
setting memory and garbage collection options in the right place, since
it's a multi-node and multi-JVM environment.

With regard to your point (1):
I updated my mapred-site.xml mapreduce.map.java.opts property to
"-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/hadoop2/heapdumps-map/" and my
mapreduce.reduce.java.opts property to "-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/hadoop2/heapdumps-reduce/".

When I reran Kmeans.dml with the 1M-row matrix, I got the same errors at this
point, but the log messages now provided further useful information. The
physical memory usage appeared to be fine, but the virtual memory limit was exceeded:
15/11/16 10:36:05 INFO mapreduce.Job: Task Id :
attempt_1447698794207_0001_m_15_2, Status : FAILED
Container [pid=63900,containerID=container_1447698794207_0001_01_72] is
running beyond virtual memory limits. Current usage: 165.6 MB of 1 GB
physical memory used; 3.7 GB of 2.1 GB virtual memory used. Killing
container.
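The figures in this message can be read mechanically. The container was killed for exceeding its virtual limit, and 2.1 GB is exactly 1 GB x 2.1, which matches Hadoop's default `yarn.nodemanager.vmem-pmem-ratio` of 2.1 applied to the default 1024 MB `mapreduce.map.memory.mb`. The following is a minimal parsing sketch. It assumes the message keeps the "X MB of Y GB physical memory used; ..." wording shown here.

```python
import re

# Usage figures in a NodeManager "running beyond virtual memory limits" message.
_USAGE = re.compile(
    r"([\d.]+) (MB|GB) of ([\d.]+) (MB|GB) physical memory used; "
    r"([\d.]+) (MB|GB) of ([\d.]+) (MB|GB) virtual memory used"
)
_TO_MB = {"MB": 1.0, "GB": 1024.0}


def parse_container_usage(message: str) -> dict:
    """Return physical/virtual usage and limits in MB, plus the implied ratio."""
    text = " ".join(message.split())  # undo line wrapping in pasted logs
    m = _USAGE.search(text)
    if m is None:
        raise ValueError("no container memory figures found")
    pmem_used, pmem_limit, vmem_used, vmem_limit = (
        float(m.group(i)) * _TO_MB[m.group(i + 1)] for i in (1, 3, 5, 7)
    )
    return {
        "pmem_used_mb": pmem_used,
        "pmem_limit_mb": pmem_limit,
        "vmem_used_mb": vmem_used,
        "vmem_limit_mb": vmem_limit,
        # Virtual limit = physical limit * yarn.nodemanager.vmem-pmem-ratio.
        "implied_vmem_pmem_ratio": round(vmem_limit / pmem_limit, 2),
    }


if __name__ == "__main__":
    msg = ("Current usage: 165.6 MB of 1 GB physical memory used; "
           "3.7 GB of 2.1 GB virtual memory used. Killing container.")
    print(parse_container_usage(msg))
```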


Next I looked at points (2) and (3):
I updated mapreduce.map.java.opts to "-server -Xmx2g -Xms2g -Xmn200m
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/hadoop2/heapdumps-map/" and
mapreduce.reduce.java.opts to "-server -Xmx2g -Xms2g -Xmn200m
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/home/hadoop2/heapdumps-reduce/".

This resulted in the same errors as before.
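For reference, the java.opts values described above can be rendered in mapred-site.xml form. This minimal sketch uses Python's standard `xml.etree.ElementTree` and produces only the two properties discussed, not a complete mapred-site.xml. Every JVM flag keeps its leading `-`; a value such as `XX:+HeapDumpOnOutOfMemoryError` without the dash would not be parsed as a JVM option.

```python
import xml.etree.ElementTree as ET

# JVM flags from the message above; each flag must start with "-".
TASK_OPTS = "-server -Xmx2g -Xms2g -Xmn200m -XX:+HeapDumpOnOutOfMemoryError"


def mapred_site(props: dict) -> str:
    """Render a minimal <configuration> element in mapred-site.xml format."""
    conf = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(conf, encoding="unicode")


if __name__ == "__main__":
    print(mapred_site({
        "mapreduce.map.java.opts":
            TASK_OPTS + " -XX:HeapDumpPath=/home/hadoop2/heapdumps-map/",
        "mapreduce.reduce.java.opts":
            TASK_OPTS + " -XX:HeapDumpPath=/home/hadoop2/heapdumps-reduce/",
    }))
```

These options only shape the task JVM. The memory YARN reserves for each container is requested separately through `mapreduce.map.memory.mb` and `mapreduce.reduce.memory.mb`, so a `-Xmx2g` heap inside the default 1 GB container is inconsistent.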

Am I setting the memory and garbage collection options in the right place
to the right JVMs?

Each node has 12GB RAM and about 60GB (of 144GB) free HD space.

Thanks!
Deron




On Fri, Nov 13, 2015 at 4:17 PM,  wrote:

> Hi Deron,
>
> couple of things to try out:
>
> 1) Task Configuration: please double-check your configuration; if the errors
> are coming from the individual map/reduce tasks, please change
> 'mapreduce.map.java.opts' and 'mapreduce.reduce.java.opts' in your
> mapred-site.xml. The name node / data node configurations don't have any
> effect on the actual tasks.
> 2) Recommended mem config: Normally, we recommend a configuration of -Xmx2g
> -Xms2g -Xmn200m for map/reduce tasks (if this still allows a task/core) w/
> an io.sort.mb of 384 MB for an hdfs blocksize of 128MB. Note the -Xmn
> parameter, which fixes the size of the young generation; this size also
> affects additional memory overheads - if set to 10% of your max heap, we
> guarantee that your tasks will not run out of memory.
> 3) GC Overhead: You're not getting the OOM because you actually ran out of
> memory but because you spent too much time on garbage collection (because
> you are close to the mem limits). If you're running OpenJDK, it's usually a
> good idea to specify the '-server' flag. If this does not help, you might
> want to increase the number of threads for garbage collection.
> 4) Explain w/ memory estimates: Finally, there is always a possibility of
> bugs too. If the configuration changes above do not solve the problem,
> please run it with "-explain recompile_hops" and subsequently "-explain
> recompile_runtime" which will give you the memory estimates - things to
> look for are broadcast-based operators where the size of vectors exceeds the
> budgets of your tasks and instructions that generate large outputs.
>
>
> Regards,
> Matthias
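Points (2) and (3) of Matthias's reply above can be captured in a small helper that derives the task JVM flags from a chosen max heap. This is a sketch under stated assumptions. The young generation is taken as 10% of the heap, which only approximately matches -Xmn200m for -Xmx2g. `-XX:ParallelGCThreads` is the HotSpot flag assumed here for "increase the number of threads for garbage collection"; the thread does not name a specific flag.

```python
from typing import Optional


def task_jvm_opts(max_heap_mb: int, young_fraction: float = 0.10,
                  gc_threads: Optional[int] = None) -> str:
    """Build task JVM flags: fixed heap (-Xms = -Xmx) and a young generation
    of roughly `young_fraction` of the max heap."""
    young_mb = int(max_heap_mb * young_fraction)
    opts = ["-server", f"-Xmx{max_heap_mb}m", f"-Xms{max_heap_mb}m", f"-Xmn{young_mb}m"]
    if gc_threads is not None:
        # HotSpot flag controlling the number of parallel GC threads.
        opts.append(f"-XX:ParallelGCThreads={gc_threads}")
    return " ".join(opts)


if __name__ == "__main__":
    print(task_jvm_opts(2048))
```

`task_jvm_opts(2048)` yields `-server -Xmx2048m -Xms2048m -Xmn204m`, close to the recommended -Xmx2g -Xms2g -Xmn200m.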

Re: Out of memory error and problem generating heap dump

2015-11-17 Thread Deron Eriksson
Hello,

Thank you for the help, Matthias. Explicitly bumping up
"mapreduce.map.memory.mb" and "mapreduce.reduce.memory.mb" in
mapred-site.xml took care of the memory issues that I had been hitting with
Kmeans in Hadoop batch mode.

Deron


On Mon, Nov 16, 2015 at 4:24 PM,  wrote:

> well, I think you're on the right track but your cluster configuration
> still has a couple of issues.
>
> The error tells us that you're not actually running out of memory but your
> tasks are killed by the node managers because you are exceeding the
> allocated virtual container memory. So here are a couple of things to
> check:
>
> 1) Consistent container configuration: You already modified JVM options for
> map/reduce tasks (e.g., mapreduce.map.java.opts). View them as
> configurations of your actual processes. In addition, you have to ensure
> that you request consistent container resources for these tasks. Please
> double-check 'mapreduce.map.memory.mb' and 'mapreduce.reduce.memory.mb' in
> mapred-site.xml (the MapReduce AM requests container resources
> according to these configurations, which also need to cover JVM overheads)
> - I usually configure them conservatively to 1.5x the max heap
> configuration of my tasks.
>
> 2) Virtual memory configuration: Also, please ensure that you allow a
> sufficiently large ratio between allocated virtual and physical memory.
> Overcommitting virtual memory is fine. Please check the following property
> in yarn-site.xml: 'yarn.nodemanager.vmem-pmem-ratio' - I usually
> configure this to something between 2 and 5. If this does not solve your
> problem, you can also prevent your task processes from being killed in these
> situations by setting 'yarn.nodemanager.vmem-check-enabled' to false.
>
> Regards,
> Matthias
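Matthias's two container rules above can be turned into arithmetic. This minimal sketch assumes a 1.5x container-to-heap factor (his stated rule of thumb) and the Hadoop default vmem-pmem ratio of 2.1 unless overridden. It computes values only and does not edit any configuration file.

```python
import math


def container_sizes(max_heap_mb: int, overhead_factor: float = 1.5,
                    vmem_pmem_ratio: float = 2.1) -> dict:
    """Derive container settings from a task's -Xmx (in MB).

    overhead_factor: container size relative to the max heap, to cover
                     JVM overheads beyond the heap.
    vmem_pmem_ratio: yarn.nodemanager.vmem-pmem-ratio (Hadoop 2.x default 2.1).
    """
    container_mb = math.ceil(max_heap_mb * overhead_factor)
    return {
        "mapreduce.map.memory.mb": container_mb,
        "mapreduce.reduce.memory.mb": container_mb,
        # The NodeManager kills a container whose virtual memory exceeds this
        # (unless yarn.nodemanager.vmem-check-enabled is false).
        "virtual_limit_mb": container_mb * vmem_pmem_ratio,
    }


if __name__ == "__main__":
    for ratio in (2.1, 5):
        print(ratio, container_sizes(2048, vmem_pmem_ratio=ratio))
```

For -Xmx2g (2048 MB), this requests 3072 MB containers with a virtual limit of roughly 6451 MB at the default ratio, which is above the 3.7 GB reported in the earlier log. With a ratio of 5, the limit becomes 15360 MB.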

SystemML-config.xml in distributed Hadoop environment

2015-11-17 Thread Deron Eriksson
Hello,

The SystemML binary release comes with a SystemML configuration file
(SystemML-config.xml) in its root directory. Are all the property
name/values in this file the recommended SystemML configuration settings
when running on a Hadoop cluster? Are any of these properties of particular
relevance when increasing performance for the cluster?

For example, I have a 4-node cluster with 3 data nodes. Should I change
 to be 2x the number of data nodes, so change from 10 to 6?

Also, with regards to , what is being optimized and how does this
affect performance?

Thanks!
Deron


Re: SystemML-config.xml in distributed Hadoop environment

2015-11-18 Thread Deron Eriksson
Thank you, Niketan. That information is very useful.

Deron


On Wed, Nov 18, 2015 at 8:25 AM, Niketan Pansare  wrote:

> Hi Deron,
>
> Please see the below answers:
>
> Are all the property
> name/values in this file the recommended SystemML configuration settings
> when running on a Hadoop cluster?
> Yes, but some depend on the size of the cluster (for example, the number of
> reducers), so the user might need to modify them accordingly.
>
> Are any of these properties of particular
> relevance when increasing performance for the cluster?
> Yes. Going back to the "number of reducers" example, on a 100-node
> cluster, using the default of 10 reducers would underutilize the
> cluster.
>
> For example, I have a 4-node cluster with 3 data nodes. Should I change
>  to be 2x the number of data nodes, so change from 10 to 6?
> 2x nodes is a good rule of thumb for the number of reducers for "MR"
> backend. I verified this in the performance experiments.
>
> Also, with regards to , what is being optimized and how does this
> affect performance?
>  is a tuning flag for SystemML's runtime optimizer. I would
> recommend using the default optlevel. Here is the documentation:
> * Optimization Types for Compilation
> *
> * O0 STATIC - Decisions for scheduling operations on CP/MR are based on
> * predefined set of rules, which check if the dimensions are below a
> * fixed/static threshold (OLD Method of choosing between CP and MR).
> * The optimization scope is LOCAL, i.e., per statement block.
> * Advanced rewrites like constant folding, common subexpression elimination,
> * or *inter* procedural analysis are NOT applied.
> *
> * O1 MEMORY_BASED - Every operation is scheduled on CP or MR, solely
> * based on the amount of memory required to perform that operation.
> * It does NOT take the execution time into account.
> * The optimization scope is LOCAL, i.e., per statement block.
> * Advanced rewrites like constant folding, common subexpression elimination,
> * or *inter* procedural analysis are NOT applied.
> *
> * O2 MEMORY_BASED - Every operation is scheduled on CP or MR, solely
> * based on the amount of memory required to perform that operation.
> * It does NOT take the execution time into account.
> * The optimization scope is LOCAL, i.e., per statement block.
> * All advanced rewrites are applied. This is the default optimization
> * level of SystemML.
> *
> * O3 GLOBAL TIME_MEMORY_BASED - Operation scheduling on CP or MR as well as
> * many other rewrites of data flow properties such as block size, partitioning,
> * replication, vectorization, etc. are done with the optimization objective of
> * minimizing execution time under hard memory constraints per operation and
> * execution context. The optimization scope is GLOBAL, i.e., program-wide.
> * All advanced rewrites are applied. This optimization level requires more
> * optimization time but has higher optimization potential.
> *
> * O4 DEBUG MODE - All optimizations, global and local, which interfere with
> * breakpoints are NOT applied. This optimization level is REQUIRED for the
> * compiler running in debug mode.
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> Phone (office): (408) 927 1740
> E-mail: npan...@us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
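Niketan's rule of thumb (2x the number of nodes for the MR backend) applied to Deron's 3 data nodes gives 6. The element names in SystemML-config.xml are not reproduced in this archive, so the sketch below takes the element name as a parameter rather than guessing it. It is a generic ElementTree edit, not a SystemML tool.

```python
import xml.etree.ElementTree as ET


def reducers_for(data_nodes: int) -> int:
    """Rule of thumb from the reply above: 2x the number of nodes (MR backend)."""
    return 2 * data_nodes


def set_config_value(xml_text: str, tag: str, value: str) -> str:
    """Return xml_text with the text of the first <tag> element set to value.

    Note: ElementTree's default parser drops XML comments and the
    XML declaration, so keep a backup of the original file.
    """
    root = ET.fromstring(xml_text)
    elem = root if root.tag == tag else root.find(f".//{tag}")
    if elem is None:
        raise KeyError(f"<{tag}> not found")
    elem.text = value
    return ET.tostring(root, encoding="unicode")
```

To use it, pass the reducer element's actual name from your copy of SystemML-config.xml as `tag`, with `str(reducers_for(3))` as the value.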


Re: Next SystemML Release

2015-11-20 Thread Deron Eriksson
Hello,

Now that SystemML is an Apache Incubator project, the packages need to be
updated, right? (com.ibm.bi.dml.* to org.apache.systemml.*) Since the
project just moved to its new repo, it would probably be a good time to
make this update since this would minimize impact on developers. This
affects all Java files in the project.

Any thoughts?

Deron


On Thu, Nov 19, 2015 at 11:58 PM, Shirish Tatikonda <
shirish.tatiko...@gmail.com> wrote:

> Luciano,
>
> First week of December sounds good. That should give sufficient time to fix
> a few known issues.
>
> Shirish
>
> On Thu, Nov 19, 2015 at 4:54 PM, Luciano Resende 
> wrote:
>
> > I would like to get our first official Apache release soon, which will
> > incorporate minor bug fixes and anything else post our 0.8.0 release. Any
> > other big enhancement we should target for this release?
> >
> > How about targeting to cut a release in the first week of December?
> >
> > --
> > Luciano Resende
> > http://people.apache.org/~lresende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
>
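The package move proposed above amounts to rewriting every `package` and `import` reference to the old prefix. This rough sketch assumes `org.apache.sysml` as the target, which is one of the names discussed in this thread; pass another name via `new`. It matches the old prefix only as a whole package name and edits .java files in place. Moving the directory tree (for example with `git mv`) is left as a separate step.

```python
import os
import re

OLD_PREFIX = "com.ibm.bi.dml"
NEW_PREFIX = "org.apache.sysml"  # one of the candidate names in this thread


def rename_packages(source: str, old: str = OLD_PREFIX, new: str = NEW_PREFIX) -> str:
    """Replace `old` with `new` wherever it appears as a whole package prefix."""
    # Not preceded by a word char or dot, not followed by a word char,
    # so e.g. com.ibm.bi.dmlx or foo.com.ibm.bi.dml are left alone.
    pattern = re.compile(r"(?<![\w.])" + re.escape(old) + r"(?!\w)")
    return pattern.sub(lambda _m: new, source)


def rewrite_tree(root: str, old: str = OLD_PREFIX, new: str = NEW_PREFIX) -> int:
    """Rewrite all .java files under `root` in place; return how many changed."""
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".java"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            updated = rename_packages(text, old, new)
            if updated != text:
                with open(path, "w", encoding="utf-8") as f:
                    f.write(updated)
                changed += 1
    return changed
```

Non-Java files that mention class names, such as configuration files and scripts, may need the same rewrite; this sketch does not touch them.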


Re: Next SystemML Release

2015-11-20 Thread Deron Eriksson
With regard to package refactoring, I think short package names are great,
unless there is an Apache expectation for the full project name to be in
the package. Luciano, are you aware of any standards regarding this? I
would highly favor org.apache.sysml over org.apache.dml, since (1) sysml
reflects the project name, and (2) DML has a strongly entrenched meaning
in the database world.

Deron


On Fri, Nov 20, 2015 at 11:53 AM, Matthias Boehm  wrote:

> first week of December for our 0.8.1 release sounds good to me.
>
> Apart from bug fixes and performance features, the focus should be mostly
> on increasing the robustness of our Spark backend. Since 0.8.0, we already
> added important features like "partitioned broadcasts" (in order to
> overcome the 2GB broadcast limitation) and "improved guarded collect" (to
> overcome OOM situations on RDD collect). There are still some open issues
> w/ regard to (1) potential OOMs, (2) repeated lazy evaluation of common
> subexpressions, and (3) data converter utils for external formats.
> Resolving these known issues would be a nice cut for this minor release.
>
> Regarding the package refactoring, I agree that we should do this right
> away; I would, however, prefer a short package name such as
> 'org.apache.dml' or 'org.apache.sysml'.
>
> Regards,
> Matthias


Process for updating SystemML website

2015-11-23 Thread Deron Eriksson
Hi,

I believe this is the current process for SystemML website updates. Is this
correct, Luciano? Is there any other information that committers should be
aware of?

(1) Clone website Git repo (for raw project - md files, etc.)
git clone
https://git-wip-us.apache.org/repos/asf/incubator-systemml-website.git

(2) Check out website SVN project (for generated site - html files, etc.)
svn co https://svn.apache.org/repos/asf/incubator/systemml/site
incubator-systemml-website-site

(3) Start Jekyll in raw project directory and specify SVN project as site
target directory
jekyll serve -d ../incubator-systemml-website-site/

(4) Make site updates

(5) Review and commit raw project to Git

(6) Review and commit generated site to SVN, which publishes site
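Steps (1) to (3) are mechanical and could be scripted. The sketch below simply wraps the commands listed above with `subprocess`. It defaults to a dry run that only prints them, assumes `git`, `svn`, and `jekyll` are on the PATH, and assumes `git clone` creates the `incubator-systemml-website` directory. Steps (4) to (6) remain manual review-and-commit steps.

```python
import subprocess
from typing import List, Optional

GIT_REPO = "https://git-wip-us.apache.org/repos/asf/incubator-systemml-website.git"
SVN_SITE = "https://svn.apache.org/repos/asf/incubator/systemml/site"
RAW_DIR = "incubator-systemml-website"  # assumed git clone target directory
SITE_DIR = "incubator-systemml-website-site"


def setup_commands() -> List[List[str]]:
    """Steps (1)-(3): clone the raw project, check out the SVN site,
    and run Jekyll with the SVN working copy as its destination."""
    return [
        ["git", "clone", GIT_REPO],
        ["svn", "co", SVN_SITE, SITE_DIR],
        ["jekyll", "serve", "-d", f"../{SITE_DIR}/"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        # Jekyll must run inside the raw project directory.
        cwd: Optional[str] = RAW_DIR if cmd[0] == "jekyll" else None
        print(("(dry run) " if dry_run else "") + " ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run(setup_commands())
```

Note that `jekyll serve` keeps running until interrupted, so it is the last command executed.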


Deron


Permission denied when commit to website SVN repo

2015-11-23 Thread Deron Eriksson
Hi,

I was able to commit and push website project updates to the raw Git
project (
https://git-wip-us.apache.org/repos/asf/incubator-systemml-website.git),
but I am unable to commit my generated site updates to the SVN project (
https://svn.apache.org/repos/asf/incubator/systemml/site).

Luciano, could you see if this is a permissions issue, and if you have the
ability, add other committers and me to the list of people who can commit
to the website SVN project?

Thanks!
Deron


Re: Permission denied when commit to website SVN repo

2015-11-23 Thread Deron Eriksson
Fantastic! Thank you.

On Mon, Nov 23, 2015 at 2:51 PM, Luciano Resende 
wrote:

> On Mon, Nov 23, 2015 at 1:26 PM, Deron Eriksson 
> wrote:
>
> > Hi,
> >
> > I was able to commit and push website project updates to the raw Git
> > project (
> > https://git-wip-us.apache.org/repos/asf/incubator-systemml-website.git),
> > but I am unable to commit my generated site updates to the SVN project (
> > https://svn.apache.org/repos/asf/incubator/systemml/site).
> >
> > Luciano, could you see if this is a permissions issue, and if you have
> the
> > ability, add other committers and me to the list of people who can commit
> > to the website SVN project?
> >
> > Thanks!
> > Deron
> >
>
> I have pushed a PR to the infrastructure team, and as soon as it's merged the
> access issues will be solved.
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: December 2015 Report

2015-12-02 Thread Deron Eriksson
Hi Luciano,

Here are a few ideas... Perhaps others can add to or modify this? Luciano,
could you generate numbers for the user count on the dev list and the
message count on the dev list since we became an Apache Incubator project?

* A list of the three most important issues to address in the move towards
graduation.
1. Grow SystemML community: active mailing list, promote developer
involvement in codebase, increase adoption of SystemML for scalable machine
learning, encourage data scientists to use DML and PyDML algorithm scripts,
respond to user feedback to ensure SystemML meets the requirements of
real-world situations, write papers, and present SystemML at conferences.
2. Core library improvements, including Apache Spark integration.
3. Improved SystemML documentation to lower the learning curve.


* Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of?
We expect that the SystemML Apache JIRA site will be available shortly, as
detailed in INFRA-10714.


* How has the community developed since the last report?
Subscriptions: dev@ - ?(Luciano, can you report the number of users on the
dev@systemml.incubator.apache.org mailing list?)
Message count: dev@ - ?(Luciano, can you report the number of messages on
the dev@systemml.incubator.apache.org mailing list?)

Could someone please mention any conference presentations or papers?


* How has the project developed since the last report?
Since becoming an Apache Incubator project on 2015-11-02, there have been
51 commits to the project (determined via git log --pretty=oneline
--since=2015-11-02 | wc -l)


Deron



On Tue, Dec 1, 2015 at 5:12 PM, Luciano Resende 
wrote:

> On Mon, Nov 30, 2015 at 8:53 AM, Luciano Resende 
> wrote:
>
> > Any volunteers for this month's report?
> >
> > Please take a look at the links below for report examples:
> > http://wiki.apache.org/incubator/November2015
> > http://wiki.apache.org/incubator/December2015
> >
> >
> > Once we review it here, I can update the wiki page.
> >
>
>
> Anyone volunteering to handle this ?
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: SystemML Committer Git Guide

2015-12-02 Thread Deron Eriksson
Thank you for creating this, Mike. It's great to get us all using the same
standard approach with regards to Git and SystemML. Great job!

Deron


On Wed, Dec 2, 2015 at 12:38 PM, Mike Dusenberry 
wrote:

> Hi all,
>
>
> Here is a quick committer guide to using Git with SystemML, located in the
> GitHub Gist at the following link, and reproduced below.
>
>
>   • https://gist.github.com/dusenberrymw/78eb31b101c1b1b236e5
> ---
>
>
> # SystemML Git Guide
>
>
> ## Setup Git repo locally
> * Fork Apache SystemML to your personal GitHub account by browsing to [
> https://github.com/apache/incubator-systemml] and clicking "Fork".
> * Clone your personal GitHub fork of Apache SystemML:
>   * `git clone g...@github.com:USERNAME/incubator-systemml.git` // assuming the use of SSH keys with GitHub
> * Add GitHub (read-only mirror) and Apache-owned (committer writeable) Git
> repositories as remotes:
>   * `cd incubator-systemml`
>   * `git remote add apache-github https://github.com/apache/incubator-systemml.git`
>   * `git remote add apache https://git-wip-us.apache.org/repos/asf/incubator-systemml.git`
> * Add a Git alias for checking out GitHub pull requests locally:
>   * Install alias globally by placing the following in `~/.gitconfig`
> ```
> [alias]
> pr = "!f() { git fetch ${2:-apache-github} pull/$1/head:pr-$1 && git checkout pr-$1; }; f"
> ```
>   * Look at the pull request on GitHub to determine the pull request number,
> indicated as "#4", for example.
>   * Check out locally:
>     * `git pr 4`
>
>
> ## PR flow
> * Create local branch for feature(s):
>   * `git checkout -b SYSML--My_Awesome_Feature`
> * Make commits on `SYSML--My_Awesome_Feature` branch.
> * Push the `SYSML--My_Awesome_Feature` branch to your personal GitHub
> fork of SystemML:
>   * `git checkout SYSML--My_Awesome_Feature`
>   * First push of this branch:
>     * `git push --set-upstream origin SYSML--My_Awesome_Feature`
>   * Future pushes of this branch:
>     * `git push`
> * Open a new pull request by browsing to the
> `SYSML--My_Awesome_Feature` branch on your personal GitHub fork of
> SystemML and clicking "New pull request".
>
>
> ## Merging (manually) without merge commits
> * Update your local `SYSML--My_Awesome_Feature` branch with the latest
> commits in the Apache repo by *rebasing*:
>   * `git checkout SYSML--My_Awesome_Feature`
>   * `git pull --rebase apache master`
> * Update your local `master` branch with the latest commits in the Apache
> repo:
>   * `git checkout master`
>   * `git pull apache master`
> * Move the commits from your local `SYSML--My_Awesome_Feature` branch
> to the local `master` branch.  Note: This will **not create merge commits**
> since both branches are fully updated from the Apache repo.
>   * `git checkout master`
>   * `git merge SYSML--My_Awesome_Feature`
>     * Note: This should result in a "fast-forward" merge.
> * Push to the Apache repo:
>   * `git push apache master`
>
>
> ## Merging (script)
> * WIP
>
>
> ## Tricks
> * If you add the phrase "Closes #4." to the end of a commit message and
> then push to Apache, GitHub will automatically close pull request 4, and
> the commit will contain a link to that pull request.
>
>
> ---
>
>
> Cheers!
>
>
> - Mike
>
>
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry


Incubator logo on project website

2015-12-07 Thread Deron Eriksson
On the incubator-general mailing list, there was a discussion regarding
podling branding, and I saw it mentioned that we should include the Apache
Incubator logo on the project website (this is a SHOULD and not a MUST).

As we update the website, we should keep in mind the branding and website
guidelines, as described here:

http://incubator.apache.org/guides/branding.html
http://incubator.apache.org/guides/sites.html

Deron


Re: API documentation for SystemML

2015-12-07 Thread Deron Eriksson
Hi Sourav,

One way to generate Javadocs for the entire SystemML project is "mvn
javadoc:javadoc".

Unfortunately, classes such as MatrixCharacteristics and RDDConverterUtils
currently have very minimal API documentation. We are hoping to address
this in the near future. However, you may find that the following
documentation link could be of assistance in getting started, given your
interest in Scala:

http://apache.github.io/incubator-systemml/mlcontext-programming-guide.html

Deron


On Mon, Dec 7, 2015 at 1:58 PM, Sourav Mazumder  wrote:

> Hi,
>
> Is there any Scala/Java API documentation available for classes like
>
> MatrixCharacteristics, RDDConverterUtils ?
>
> What I need to understand is what helper utilities are available
> and the details of their signatures/APIs.
>
> Regards,
>
> Sourav
>


Re: API documentation for SystemML

2015-12-07 Thread Deron Eriksson
0).count == 0, "Expected 1-based ratings file")
> val nnz = matRDD.count
> val numRows = matRDD.map(_.i).max
> val numCols = matRDD.map(_.j).max
> val coordinateMatrix = new CoordinateMatrix(matRDD, numRows, numCols)
> val mc = new MatrixCharacteristics(numRows, numCols, 1000, 1000, nnz)
> val binBlocks = RDDConverterUtilsExt.coordinateMatrixToBinaryBlock(
>   new JavaSparkContext(sc), coordinateMatrix, mc, true)
>
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
>
> From: Deron Eriksson 
> To: dev@systemml.incubator.apache.org
> Date: 12/07/2015 02:50 PM
> Subject: Re: API documentation for SystemML
> --
>
>
>
> Hi Sourav,
>
> One way to generate Javadocs for the entire SystemML project is "mvn
> javadoc:javadoc".
>
> Unfortunately, classes such as MatrixCharacteristics and RDDConverterUtils
> currently have very minimal API documentation. We are hoping to address
> this in the near future. However, you may find that the following
> documentation link could be of assistance in getting started, given your
> interest in Scala:
>
> http://apache.github.io/incubator-systemml/mlcontext-programming-guide.html
>
> Deron
>
>
> On Mon, Dec 7, 2015 at 1:58 PM, Sourav Mazumder <sourav.mazumde...@gmail.com>
> wrote:
>
> > Hi,
> >
> > Is there any Scala/Java API documentation available for classes like
> >
> > MatrixCharacteristics, RDDConverterUtils ?
> >
> > What I need to understand is what helper utilities are available
> > and the details of their signatures/APIs.
> >
> > Regards,
> >
> > Sourav
> >
>
>
>


Link from old GitHub project to incubator GitHub project

2015-12-08 Thread Deron Eriksson
I noticed that at the top of our old GitHub project page (
https://github.com/SparkTC/systemml), there is a link that points to the
Apache project website. Down below, in the README, there is a link that
points to the new GitHub project page (
https://github.com/apache/incubator-systemml).

If users do not scroll down to view the README, they might not realize
that this is the old repository, so they might fork this repo rather than
the new incubator repo.

So perhaps these two links on the old GitHub project page should be
switched to avoid people forking the old project? This occurred to me
because I saw a question a couple days ago from a user who had forked the
old repo.

Any thoughts?

Deron


Apache JIRA project key for SystemML

2015-12-08 Thread Deron Eriksson
Hi,

There is a discussion at https://issues.apache.org/jira/browse/INFRA-10714
regarding the Apache JIRA project key for SystemML. The original key listed
in the request is SYSML, which gives a URL of
https://issues.apache.org/jira/browse/SYSML. However, Mike and I both feel
that a project key of SYSTEMML, giving a URL of
https://issues.apache.org/jira/browse/SYSTEMML, would be better, since that
URL would accurately reflect the project name.

Does anyone else have a preference with regard to this?

Deron


DML transform() function

2015-12-09 Thread Deron Eriksson
Hi,

I'm working on updating the online docs for the DML transform() function
since a couple things didn't copy over in the conversion to markdown.
However, I've run into an issue when I execute the transform() example. In
summary, is the "scale" transformation no longer allowed, and "bin" is
allowed?

I did the following:

I created data.csv:

zipcode,district,sqft,numbedrooms,numbathrooms,floors,view,saleprice,askingprice
95141,south,3002,6,3,2,FALSE,929,934
NA,west,1373,,1,3,FALSE,695,698
91312,south,NA,6,2,2,FALSE,902,
94555,NA,1835,3,,3,,888,892
95141,west,2770,5,2.5,,TRUE,812,816
95141,east,2833,6,2.5,2,TRUE,927,
96334,NA,1339,6,3,1,FALSE,672,675
96334,south,2742,6,2.5,2,FALSE,872,876
96334,north,2195,5,2.5,2,FALSE,799,803

I created data.csv.mtd:

{
"data_type": "frame",
"format": "csv",
"sep": ",",
"header": true,
"na.strings": [ "NA", "" ]
}

I created data.spec.json:

{
"omit": [ "zipcode" ]
   ,"impute":
[ { "name": "district", "method": "constant", "value": "south" }
 ,{ "name": "numbedrooms" , "method": "constant", "value": 2 }
 ,{ "name": "numbathrooms", "method": "constant", "value": 1 }
 ,{ "name": "floors"  , "method": "constant", "value": 1 }
 ,{ "name": "view", "method": "global_mode" }
 ,{ "name": "askingprice" , "method": "global_mean" }
]

,"recode":
[ "zipcode", "district", "numbedrooms", "numbathrooms", "floors",
"view" ]

,"bin":
[ { "name": "saleprice"  , "method": "equi-width", "numbins": 3 }
 ,{ "name": "sqft"   , "method": "equi-width", "numbins": 4 }
]

,"dummycode":
[ "district", "numbathrooms", "floors", "view", "saleprice", "sqft" ]

,"scale":
[ { "name": "sqft", "method": "mean-subtraction" }
 ,{ "name": "saleprice", "method": "z-score" }
 ,{ "name": "askingprice", "method": "z-score" }
]
}

I executed the following DML:

D = read("data.csv");
tfD = transform(target=D,
transformSpec="data.spec.json",
transformPath="example-transform");
s = sum(tfD);
print("Sum = " + s);

This generated the following error:

java.lang.IllegalArgumentException: Invalid transformations on column ID 3.
A column can not be binned and scaled.

So, I removed the "scale" from data.spec.json:

{
"omit": [ "zipcode" ]
   ,"impute":
[ { "name": "district", "method": "constant", "value": "south" }
 ,{ "name": "numbedrooms" , "method": "constant", "value": 2 }
 ,{ "name": "numbathrooms", "method": "constant", "value": 1 }
 ,{ "name": "floors"  , "method": "constant", "value": 1 }
 ,{ "name": "view", "method": "global_mode" }
 ,{ "name": "askingprice" , "method": "global_mean" }
]

,"recode":
[ "zipcode", "district", "numbedrooms", "numbathrooms", "floors",
"view" ]

,"bin":
[ { "name": "saleprice"  , "method": "equi-width", "numbins": 3 }
 ,{ "name": "sqft"   , "method": "equi-width", "numbins": 4 }
]

,"dummycode":
[ "district", "numbathrooms", "floors", "view", "saleprice", "sqft" ]

}

This generated:

java.lang.RuntimeException: Encountered "NA" in column ID "3", when
expecting a numeric value. Consider adding "NA" to na.strings, along with
an appropriate imputation method.

So, I set "sqft" to be "global_mean" in the "impute" section of the spec.

{
"omit": [ "zipcode" ]
   ,"impute":
[ { "name": "district", "method": "constant", "value": "south" }
 ,{ "name": "numbedrooms" , "method": "constant", "value": 2 }
 ,{ "name": "numbathrooms", "method": "constant", "value": 1 }
 ,{ "name": "floors"  , "method": "constant", "value": 1 }
 ,{ "name": "view", "method": "global_mode" }
 ,{ "name": "askingprice" , "method": "global_mean" }
 ,{ "name": "sqft", "method": "global_mean" }
]

,"recode":
[ "zipcode", "district", "numbedrooms", "numbathrooms", "floors",
"view" ]

,"bin":
[ { "name": "saleprice"  , "method": "equi-width", "numbins": 3 }
 ,{ "name": "sqft"   , "method": "equi-width", "numbins": 4 }
]

,"dummycode":
[ "district", "numbathrooms", "floors", "view", "saleprice", "sqft" ]

}

This allowed the DML to execute successfully.

So, is "scale" not allowed anymore? And "bin" is allowed (despite the
message saying it isn't allowed)?

Thank you,
Deron


Re: DML transform() function

2015-12-10 Thread Deron Eriksson
Hi Shirish,

Thank you for the explanation. That clears everything up. I misinterpreted
the meaning of the error message.

I'll add that useful table to the documentation.

Deron
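Shirish's compatibility table (quoted below in this reply) can be turned into a small pre-flight check for a transform spec before running transform(). This is a minimal Python sketch, assuming the spec has already been loaded into a dict (for example with json.load) and that the spec sections map onto the table's abbreviations as shown; find_conflicts, column_transforms, and the mapping dicts are hypothetical names, not SystemML's own validation code.

```python
from itertools import combinations

# Combinations marked "x" (invalid) in the table. The table is symmetric,
# so each invalid pair is stored once as an unordered frozenset.
INVALID_PAIRS = {
    frozenset({"OMIT", "MVI"}),
    frozenset({"RCD", "BIN"}),
    frozenset({"RCD", "SCL"}),
    frozenset({"BIN", "SCL"}),
    frozenset({"DCD", "SCL"}),
}

# Assumed mapping from transform-spec sections to the table's abbreviations.
SECTION_CODES = {
    "omit": "OMIT",
    "impute": "MVI",
    "recode": "RCD",
    "bin": "BIN",
    "dummycode": "DCD",
    "scale": "SCL",
}


def column_transforms(spec):
    """Map each column name to the set of transformation codes applied to it."""
    applied = {}
    for section, code in SECTION_CODES.items():
        for entry in spec.get(section, []):
            # Entries are plain column names or objects with a "name" field.
            name = entry["name"] if isinstance(entry, dict) else entry
            applied.setdefault(name, set()).add(code)
    return applied


def find_conflicts(spec):
    """Return sorted (column, code_a, code_b) triples for invalid combinations."""
    conflicts = []
    for name, codes in column_transforms(spec).items():
        for a, b in combinations(sorted(codes), 2):
            if frozenset({a, b}) in INVALID_PAIRS:
                conflicts.append((name, a, b))
    return sorted(conflicts)


spec = {
    "bin": [{"name": "sqft", "method": "equi-width", "numbins": 4}],
    "dummycode": ["sqft"],
    "scale": [{"name": "sqft", "method": "mean-subtraction"}],
}
print(find_conflicts(spec))  # [('sqft', 'BIN', 'SCL'), ('sqft', 'DCD', 'SCL')]
```

Applied to the first data.spec.json in this thread, the check flags both sqft and saleprice (binned and dummycoded, but also scaled), while the final spec without the "scale" section produces an empty list.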


On Wed, Dec 9, 2015 at 5:59 PM, Shirish Tatikonda <
shirish.tatiko...@gmail.com> wrote:

> Hi Deron,
>
> As the error said, "A column can not be binned and scaled." No column can
> be subjected to both *binning* and *scaling* because it does not make
> sense. *Binning* turns a scale column with continuous values into a
> categorical column. On the other hand, *scaling* can only be done on
> continuous values.
>
> The error *does not* mean that *scaling* is not supported. We do support
> *scaling*.
>
> At some point, I wanted to add the following table (which is currently
> present in Java code as comments) to our documentation to indicate
> transformations that can be used *simultaneously* on a single column. While
> you are at it, could you make sure it is added to the documentation?
>
> x indicates the combination is invalid.
> * indicates the combination is allowed.
> - indicates the combination is not applicable.
>
>   OMIT MVI RCD BIN DCD SCL
> OMIT -  x   *   *   *   *
> MVI  x  -   *   *   *   *
> RCD  *  *   -   x   *   x
> BIN  *  *   x   -   *   x
> DCD  *  *   *   *   -   x
> SCL  *  *   x   x   x   -
>
> OMIT = Missing value handling by *omitting *rows
> MVI  = Missing value handling by *imputation*
> RCD  = Recoding
> BIN  = Binning
> DCD  = Dummycoding
> SCL  = Scaling
>
> Let me know if you have any further questions.
>
> Thank you,
> Shirish
>
>
> On Wed, Dec 9, 2015 at 4:53 PM, Deron Eriksson 
> wrote:
>
> > Hi,
> >
> > I'm working on updating the online docs for the DML transform() function
> > since a couple things didn't copy over in the conversion to markdown.
> > However, I've run into an issue when I execute the transform() example. In
> > summary, is the "scale" transformation no longer allowed, and "bin" is
> > allowed?
> >
> > I did the following:
> >
> > I created data.csv:
> >
> > zipcode,district,sqft,numbedrooms,numbathrooms,floors,view,saleprice,askingprice
> > 95141,south,3002,6,3,2,FALSE,929,934
> > NA,west,1373,,1,3,FALSE,695,698
> > 91312,south,NA,6,2,2,FALSE,902,
> > 94555,NA,1835,3,,3,,888,892
> > 95141,west,2770,5,2.5,,TRUE,812,816
> > 95141,east,2833,6,2.5,2,TRUE,927,
> > 96334,NA,1339,6,3,1,FALSE,672,675
> > 96334,south,2742,6,2.5,2,FALSE,872,876
> > 96334,north,2195,5,2.5,2,FALSE,799,803
> >
> > I created data.csv.mtd:
> >
> > {
> > "data_type": "frame",
> > "format": "csv",
> > "sep": ",",
> > "header": true,
> > "na.strings": [ "NA", "" ]
> > }
> >
> > I created data.spec.json:
> >
> > {
> > "omit": [ "zipcode" ]
> >,"impute":
> > [ { "name": "district", "method": "constant", "value": "south" }
> >  ,{ "name": "numbedrooms" , "method": "constant", "value": 2 }
> >  ,{ "name": "numbathrooms", "method": "constant", "value": 1 }
> >  ,{ "name": "floors"  , "method": "constant", "value": 1 }
> >  ,{ "name": "view", "method": "global_mode" }
> >  ,{ "name": "askingprice" , "method": "global_mean" }
> > ]
> >
> > ,"recode":
> > [ "zipcode", "district", "numbedrooms", "numbathrooms", "floors",
> > "view" ]
> >
> > ,"bin":
> > [ { "name": "saleprice"  , "method": "equi-width", "numbins": 3 }
> >  ,{ "name": "sqft"   , "method": "equi-width", "numbins": 4 }
> > ]
> >
> > ,"dummycode":
> > [ "district", "numbathrooms", "floors", "view", "saleprice", "sqft" ]
> >
> > ,"scale":
> > [ { "name": "sqft", "method": "mean-subtraction" }
> >  ,{ "name": "saleprice", &q

Re: SystemML github mirror is behind one commit

2015-12-11 Thread Deron Eriksson
Hi,

The SystemML GitHub mirror repo is missing the latest commit again compared
with the Apache SystemML git repo. Would it be possible for someone to look
at this? Please let me know if I should create a JIRA.

Thank you
Deron

On Sun, Dec 6, 2015 at 4:31 AM, Daniel Gruno  wrote:

> On 12/06/2015 05:36 AM, Luciano Resende wrote:
> > Looks like the SystemML github mirror is missing one commit compared to
> > Apache SystemML git repository.
> >
> > git log --oneline master..apache/master
> > 41d9d2c Remove copyrights from license, add license where needed
> >
> > What's the best way to fix this issue? Should I create a JIRA?
> >
> > Thank you.
> >
> > --
> > Luciano Resende
> > http://people.apache.org/~lresende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
>
> Hi Luciano et al,
> I've done a manual update of it now. This was likely missed because of
> the size of the commit. When a commit gets pushed, the sync process
> waits N seconds, then fetches the updates. If all the updates haven't
> been fully processed yet by the git backend, not all updates are fetched
> (and how is the poor system supposed to know that), so it happily pushes
> whatever it gets on to GitHub. I suppose we could increase the waiting
> period and see if that helps.
>
> With regards,
> Daniel.
>
>


Re: SystemML github mirror is behind one commit

2015-12-11 Thread Deron Eriksson
Thank you Geoffrey for the rapid response and the useful information! It is
greatly appreciated.

Deron


On Fri, Dec 11, 2015 at 10:54 AM, Geoffrey Corey  wrote:

> Looks like the git mirror is having some issues and locking up when
> syncing some of the svn->git mirrors. I have triggered a manual sync now,
> and am working on fixing up the mirror.
>
> In the future, you can try a self-service fix by committing something like
> a whitespace change (has to contain a change of some sort), and if that
> doesn't work, you should open a JIRA, as INFRA gets a lot of emails and
> there's a chance it will unintentionally get drowned in the noise.
>
> On Fri, Dec 11, 2015 at 10:42 AM, Deron Eriksson 
> wrote:
>
>> Hi,
>>
>> The SystemML GitHub mirror repo is missing the latest commit again
>> compared with the Apache SystemML git repo. Would it be possible for
>> someone to look at this? Please let me know if I should create a JIRA.
>>
>> Thank you
>> Deron
>>
>> On Sun, Dec 6, 2015 at 4:31 AM, Daniel Gruno 
>> wrote:
>>
>>> On 12/06/2015 05:36 AM, Luciano Resende wrote:
>>> > Looks like the SystemML github mirror is missing one commit compared to
>>> > Apache SystemML git repository.
>>> >
>>> > git log --oneline master..apache/master
>>> > 41d9d2c Remove copyrights from license, add license where needed
>>> >
>>> > What's the best way to fix this issue? Should I create a JIRA?
>>> >
>>> > Thank you.
>>> >
>>> > --
>>> > Luciano Resende
>>> > http://people.apache.org/~lresende
>>> > http://twitter.com/lresende1975
>>> > http://lresende.blogspot.com/
>>>
>>> Hi Luciano et al,
>>> I've done a manual update of it now. This was likely missed because of
>>> the size of the commit. When a commit gets pushed, the sync process
>>> waits N seconds, then fetches the updates. If all the updates haven't
>>> been fully processed yet by the git backend, not all updates are fetched
>>> (and how is the poor system supposed to know that), so it happily pushes
>>> whatever it gets on to GitHub. I suppose we could increase the waiting
>>> period and see if that helps.
>>>
>>> With regards,
>>> Daniel.
>>>
>>>
>>
>


Closing pull requests

2015-12-13 Thread Deron Eriksson
Hi Luciano and others,

I just merged my first pull request from another user into SystemML.
Previously, before pushing to Apache master, I've been doing a "commit
--amend" to add a "Closes #[PR-NUM]." to the end of the commit message so
as to let asfgit close the pull request. However, because of the sync
issues from Apache to GitHub (2 of my last 5 commits seemed to hang the
propagation), I decided to hold off on the "commit --amend" to try to keep
things as simple as possible to avoid any kind of sync issue.

So, the PR merged cleanly and the update shows up on the SystemML project
on GitHub as expected. However, the PR is not closed because asfgit didn't
close it and I don't have permissions to close it.

So, at this stage, does the user close the pull request, or is it possible
for me to have permissions to close the pull request?

Thank you,
Deron
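The "Closes #[PR-NUM]." step can be scripted so that an amended commit message always ends with exactly one closing line. This is a minimal Python sketch; add_closes_trailer is a hypothetical helper, and putting the phrase on its own final line (rather than appending it to the last sentence) is an assumption. The returned text could be written to a file and passed to `git commit --amend -F <file>`.

```python
def add_closes_trailer(message, pr_number):
    """Return the commit message ending with a single 'Closes #N.' line."""
    trailer = f"Closes #{pr_number}."
    body = message.rstrip()
    if body.endswith(trailer):
        # Already present: avoid adding a duplicate trailer.
        return body + "\n"
    # Separate the trailer from the message body with a blank line.
    return f"{body}\n\n{trailer}\n"


print(add_closes_trailer("Fix typo in DML language reference", 4), end="")
```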


parser.Token class and related exceptions

2015-12-15 Thread Deron Eriksson
I noticed the project contains the org.apache.sysml.parser.Token class,
which is an old JavaCC class that I believe is now obsolete since ANTLR is
used. The Token class itself is only referenced by
org.apache.sysml.parser.ParseException. The
org.apache.sysml.parser.DMLParseException (subclass of ParseException)
references the currentToken (Token) field of ParseException.

If the currentToken, expectedTokenSequences, and tokenImage fields are
removed from ParseException, the only code that requires further removal is
the small amount of code in DMLParseException that references currentToken,
which would seem to indicate that these fields aren't used and can be
removed. If those fields are removed, ParseException has no fields.

So, I was wondering the following:
(1) Can we remove the Token class and the three fields from ParseException
and remove the currentToken code from DMLParseException?
(2) Does ParseException have a different meaning and is it used differently
than DMLParseException, so they should be kept as separate classes, or can
they be combined into a single class?

Deron


DML example on main SystemML website

2015-12-16 Thread Deron Eriksson
Hi,

I think the main SystemML website at http://systemml.incubator.apache.org/
needs to be updated so that the DML example is an actual algorithm or at
least a fragment of an algorithm.

Does anyone have a recommendation for a short, concise example that shows
the power of DML?

Thanks!
Deron


Re: Open tasks: Integration with MLPipeline

2015-12-16 Thread Deron Eriksson
Hi Tsuyoshi,

We are still having some issues getting our JIRA project data imported into
Apache JIRA. Please see: https://issues.apache.org/jira/browse/INFRA-10714

We are hoping that the 3 missing fields will import correctly very soon. If
not, we will manually handle the issue so that we can begin using JIRA
again to track our issues. Thank you for your patience!

Deron


On Wed, Dec 16, 2015 at 5:16 PM, Tsuyoshi Ozawa <
ozawa.tsuyo...@lab.ntt.co.jp> wrote:

> Hi Niketan,
>
> The jira for SystemML seems to be open now:
> https://issues.apache.org/jira/browse/SYSTEMML
>
> Do you mind creating issues for the tasks? We can avoid assignee conflicts
> by using JIRA.
>
> Thanks,
> - Tsuyoshi
>
>
> -Original Message-
> From: Glenn Weidner [mailto:gweid...@us.ibm.com]
> Sent: Tuesday, December 15, 2015 5:28 AM
> To: dev@systemml.incubator.apache.org
> Cc: npan...@us.ibm.com
> Subject: Re: Open tasks: Integration with MLPipeline
>
> Hi,
>
> I'm interested in working on item 4:
>
> 4. Add MLPipeline wrappers for existing scripts.
> - Refer to
> https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
> to pick the algorithm and
> http://apache.github.io/incubator-systemml/algorithms-reference.html to
> understand the assumptions as well as parameters to the given algorithm.
> - A good algorithm to start with is L2SVM:
>
> http://apache.github.io/incubator-systemml/algorithms-classification.html#binary-class-support-vector-machines
>
> https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/l2-svm.dml
>
> https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/l2-svm-predict.dml
>
> Thanks,
> Glenn
>
>
>
> From: Niketan Pansare/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Cc: "Tatsuya Nishiyama" 
> Date: 12/03/2015 02:32 PM
> Subject: Open tasks: Integration with MLPipeline
>
> 
>
>
>
>
>
>
> Hi all,
>
> In this email, I list the open tasks related to integration with
> MLPipeline.
> This allows external developers to contribute to the SystemML project until
> our JIRA server is up and running.
>
> 1. Make the existing Logistic regression wrapper more robust:
> - Extend the wrapper or the DML script to handle zero-based labels (either
> throw an error or support zero-based labels).
>
> 2. Improve the performance of the Logistic regression wrapper:
> - Profile the wrapper to find potential bottlenecks. The candidates for
> bottlenecks are RDDConverterUtilsExt.vectorDataFrameToBinaryBlock and lines
> 153-158 in LogisticRegressionModel.
>
> 3. Perform detailed performance analysis of the converter utils.
> - Also explore the usability aspect of these utils.
>
> 4. Add MLPipeline wrappers for existing scripts.
> - Refer to
> https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
> to pick the algorithm and
> http://apache.github.io/incubator-systemml/algorithms-reference.html to
> understand the assumptions as well as parameters to the given algorithm.
> - A good algorithm to start with is L2SVM:
>
> http://apache.github.io/incubator-systemml/algorithms-classification.html#binary-class-support-vector-machines
>
> https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/l2-svm.dml
>
> https://github.com/apache/incubator-systemml/blob/master/scripts/algorithms/l2-svm-predict.dml
>
> 5. Add the documentation for MLPipeline wrappers to
> http://apache.github.io/incubator-systemml/index.html
>
> References:
> 1. Existing Logistic regression wrappers:
>
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegression.java
>
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/api/ml/LogisticRegressionModel.java
>
> 2. Converter utils:
>
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/instructions/spark/utils/RDDConverterUtilsExt.java
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
>
>
>
>
>


Re: DML example on main SystemML website

2015-12-16 Thread Deron Eriksson
That example is perfect. Concise and powerful. Thank you Fred.

Deron


On Wed, Dec 16, 2015 at 4:16 PM, Frederick R Reiss 
wrote:

> We can use the Poisson nonnegative matrix factorization example from last
> week's webcast:
>
>    i = 0
>    while(i < max_iterations) {
>        H = (H * (t(W) %*% (V/(W%*%H + epsilon)))) / t(colSums(W))
>        W = (W * ((V/(W%*%H + epsilon)) %*% t(H))) / t(rowSums(H))
>        i = i + 1;
>    }
>
>
> Sound ok to everyone?
>
> Fred
>
>
>
> From: Deron Eriksson 
> To: dev@systemml.incubator.apache.org
> Date: 12/16/2015 04:02 PM
> Subject: DML example on main SystemML website
> --
>
>
>
> Hi,
>
> I think the main SystemML website at http://systemml.incubator.apache.org/
> needs to be updated so that the DML example is an actual algorithm or at
> least a fragment of an algorithm.
>
> Does anyone have a recommendation for a short, concise example that shows
> the power of DML?
>
> Thanks!
> Deron
>
>
>
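Fred's PNMF loop quoted above can be traced in plain Python to show what each DML line computes. This is a minimal sketch, assuming dense list-of-lists matrices, a strictly positive random initialization of W and H (the snippet does not show how they are initialized), and epsilon inside the denominator; matmul, transpose, pnmf, and poisson_loss are hypothetical helper names, not SystemML functions.

```python
import math
import random


def matmul(A, B):
    """Dense product of matrices stored as lists of rows (n x k times k x m)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def pnmf(V, rank, max_iterations=100, epsilon=1e-9, seed=0):
    """Poisson NMF by multiplicative updates; returns W (n x rank), H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # Assumption: strictly positive random initialization.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(max_iterations):
        # H = (H * (t(W) %*% (V/(W%*%H + epsilon)))) / t(colSums(W))
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + epsilon) for j in range(m)] for i in range(n)]
        num = matmul(transpose(W), R)
        col_sums_w = [sum(W[i][k] for i in range(n)) for k in range(rank)]
        H = [[H[k][j] * num[k][j] / col_sums_w[k] for j in range(m)]
             for k in range(rank)]
        # W = (W * ((V/(W%*%H + epsilon)) %*% t(H))) / t(rowSums(H))
        WH = matmul(W, H)
        R = [[V[i][j] / (WH[i][j] + epsilon) for j in range(m)] for i in range(n)]
        num = matmul(R, transpose(H))
        row_sums_h = [sum(H[k]) for k in range(rank)]
        W = [[W[i][k] * num[i][k] / row_sums_h[k] for k in range(rank)]
             for i in range(n)]
    return W, H


def poisson_loss(V, W, H):
    """Generalized KL divergence D(V || WH); the updates should not increase it."""
    WH = matmul(W, H)
    total = 0.0
    for v_row, x_row in zip(V, WH):
        for v, x in zip(v_row, x_row):
            total += x - v
            if v > 0:
                total += v * math.log(v / x)
    return total


V = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [4.0, 1.0, 0.0]]
W, H = pnmf(V, rank=2, max_iterations=50)
print(poisson_loss(V, W, H))
```

The DML divides H (rank x m) by the column vector t(colSums(W)) and W (n x rank) by the row vector t(rowSums(H)); in the sketch those broadcasts become the per-row division by col_sums_w[k] and the per-column division by row_sums_h[k].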


Re: DML example on main SystemML website

2015-12-17 Thread Deron Eriksson
I updated the DML example on the main site to be the PNMF example from Fred
and Matthias.

I think a DML cookbook is a fantastic idea. I can begin a
cookbook based on the examples from Shirish. Additionally, I believe Alok
Singh has several functions as a result of his algorithm development work
that could be useful in a cookbook. I like Mike's suggestion of keeping
things simple. Such a cookbook would probably not have a lot of structure
at first but would gain structure over time as code snippets can be grouped
and sorted by common functionality as their numbers increase.

Deron


On Thu, Dec 17, 2015 at 12:55 AM, Berthold Reinwald 
wrote:

> Creating a DML cookbook sounds very useful ... especially for new data
> scientists starting to pick up DML. It shows vectorization avoiding loops,
> and people are familiar with the semantics. There may be more nuggets like
> that throughout the DML algorithms. And the list will grow in the course
> of time.
>
> Regards,
> Berthold Reinwald
> IBM Almaden Research Center
> office: (408) 927 2208; T/L: 457 2208
> e-mail: reinw...@us.ibm.com
>
>
>
> From:   Shirish Tatikonda 
> To: dev@systemml.incubator.apache.org
> Date:   12/16/2015 10:03 PM
> Subject: Re: DML example on main SystemML website
>
>
>
> Deron,
>
> Along with such a complete algorithm, we could also include one/two common
> and useful DML snippets. We could also create a "DML Cookbook" with such
> snippets and keep adding more over time.
> Some example snippets are below -- note that I created them quite a while
> back, and they may need revision and testing.
>
> *Classifier Performance*
>
> # Confusion matrix: rows = true label, columns = predicted label;
> # label 1 is treated as the positive class
> cm = table(truthLabels, predictedLabels)
> TP = as.scalar(cm[1,1])
> TN = as.scalar(cm[2,2])
> FP = as.scalar(cm[2,1])
> FN = as.scalar(cm[1,2])
>
> accuracy = (TP+TN)/nrow(truthLabels)
> precision = TP / (TP+FP)
> recall = TP / (TP+FN)
>
> print("Accuracy = " + accuracy + ", Precision = " + precision + ", Recall = " + recall);
>
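The metrics in the *Classifier Performance* snippet can be checked with a small Python sketch. It assumes binary labels coded 1 (positive) and 2 (negative) and follows the layout of DML's table(truthLabels, predictedLabels), where rows index the true label and columns the predicted label; under that layout a false positive (true 2, predicted 1) lands in cm[2,1] and a false negative in cm[1,2]. confusion_metrics is a hypothetical name.

```python
def confusion_metrics(truth, predicted):
    """Accuracy, precision and recall for labels in {1, 2}; 1 is the positive class."""
    # cm[t-1][p-1] counts examples with true label t and predicted label p,
    # matching the layout of DML's table(truthLabels, predictedLabels).
    cm = [[0, 0], [0, 0]]
    for t, p in zip(truth, predicted):
        cm[t - 1][p - 1] += 1
    tp = cm[0][0]  # true 1, predicted 1
    tn = cm[1][1]  # true 2, predicted 2
    fn = cm[0][1]  # true 1, predicted 2: cm[1,2] in DML
    fp = cm[1][0]  # true 2, predicted 1: cm[2,1] in DML
    accuracy = (tp + tn) / len(truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall


print(confusion_metrics([1, 2, 1], [1, 2, 2]))
```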
> *Covariance matrix*
>
> A = read("input.mtx");
> N = nrow(A);
> # column means
> mu = colSums(A)/N;
> # Covariance matrix
> C = (t(A) %*% A)/(N-1) - (N/(N-1))*t(mu) %*% mu;
>
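The covariance snippet relies on the one-pass identity C = (t(A) A - N t(mu) mu)/(N-1). A Python sketch comparing it with the textbook two-pass formula (explicitly centered columns) shows the two agree; all function names are hypothetical. The one-pass form avoids a second pass over the data but can lose precision through cancellation when column means are large relative to the spread.

```python
def column_means(A):
    n = len(A)
    return [sum(row[j] for row in A) / n for j in range(len(A[0]))]


def covariance_one_pass(A):
    """C = (t(A) %*% A)/(N-1) - (N/(N-1)) * t(mu) %*% mu, as in the DML snippet."""
    n, d = len(A), len(A[0])
    mu = column_means(A)
    return [[sum(row[i] * row[j] for row in A) / (n - 1)
             - (n / (n - 1)) * mu[i] * mu[j] for j in range(d)]
            for i in range(d)]


def covariance_two_pass(A):
    """Textbook sample covariance computed from explicitly centered columns."""
    n, d = len(A), len(A[0])
    mu = column_means(A)
    return [[sum((row[i] - mu[i]) * (row[j] - mu[j]) for row in A) / (n - 1)
             for j in range(d)] for i in range(d)]


A = [[1.0, 2.0], [3.0, 5.0], [4.0, 4.0]]
print(covariance_one_pass(A))
print(covariance_two_pass(A))
```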
> *Select rows satisfying a predicate*
>
> ind = diag(ppred(A[,1], thresh, ">"));
> ind = removeEmpty(target=ind, margin="rows");
> result = ind %*% A;
>
>
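The *Select rows satisfying a predicate* snippet uses a selection matrix: ppred builds a 0/1 indicator, diag turns it into an N x N diagonal matrix, removeEmpty drops its all-zero rows, and multiplying by A keeps only the flagged rows. A plain-Python sketch of the same steps (select_rows is a hypothetical name); the dense N x N matrix is only for illustration and costs O(N^2) memory here.

```python
def select_rows(A, thresh):
    """Keep the rows of A whose first column exceeds thresh, via a selection matrix."""
    n, d = len(A), len(A[0])
    # ind = diag(ppred(A[,1], thresh, ">")): n x n diagonal 0/1 matrix
    ind = [[1 if i == j and A[i][0] > thresh else 0 for j in range(n)]
           for i in range(n)]
    # ind = removeEmpty(target=ind, margin="rows"): drop all-zero rows
    ind = [row for row in ind if any(row)]
    # result = ind %*% A
    return [[sum(row[k] * A[k][j] for k in range(n)) for j in range(d)]
            for row in ind]


print(select_rows([[5, 1], [1, 2], [7, 3]], 2))
```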
> *Center and Scale columns*
>
> A = read("input.mtx");
> N = nrow(A);
> cm = colMeans(A);
> cvars = colSums(A^2);
> cvars = (cvars - N*(cm^2))/(N-1);
> Ascaled = (A-cm)/sqrt(cvars);
>
>
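A Python sketch of the *Center and Scale columns* snippet, with N = nrow(A) and the same variance identity (sum of squares minus N times the squared mean, divided by N - 1). center_and_scale is a hypothetical name; like the DML, it divides by zero for a constant column.

```python
import math


def center_and_scale(A):
    """Center each column to mean 0 and scale it to unit sample variance."""
    n, d = len(A), len(A[0])                                # N = nrow(A)
    cm = [sum(row[j] for row in A) / n for j in range(d)]   # colMeans(A)
    sq = [sum(row[j] ** 2 for row in A) for j in range(d)]  # colSums(A^2)
    cvars = [(sq[j] - n * cm[j] ** 2) / (n - 1) for j in range(d)]
    return [[(row[j] - cm[j]) / math.sqrt(cvars[j]) for j in range(d)]
            for row in A]


print(center_and_scale([[1, 10], [2, 20], [3, 30]]))
```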
> *Random shuffling of rows*
>
> N = nrow(A);
> s = sample(N, N, replace=FALSE);
> tab = table(seq(1, N), s);
> result = tab %*% A;
>
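The *Random shuffling of rows* snippet builds a permutation matrix: table(seq(1, N), s) has a 1 at (i, s[i]), so row i of the product is row s[i] of A. A Python sketch with 0-based indices; shuffle_rows is a hypothetical name, and random.Random.sample stands in for DML's sampling without replacement.

```python
import random


def shuffle_rows(A, seed=None):
    """Permute the rows of A by multiplying with a random permutation matrix."""
    n, d = len(A), len(A[0])
    rng = random.Random(seed)
    s = rng.sample(range(n), n)  # like sample(N, N, replace=FALSE), but 0-based
    # tab = table(seq(1, N), s): P[i][s[i]] = 1 and every other entry is 0
    P = [[1 if j == s[i] else 0 for j in range(n)] for i in range(n)]
    # result = tab %*% A, so row i of the result is row s[i] of A
    return [[sum(P[i][k] * A[k][j] for k in range(n)) for j in range(d)]
            for i in range(n)]


A = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(shuffle_rows(A, seed=42))
```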
> On Wed, Dec 16, 2015 at 5:59 PM, Deron Eriksson 
> wrote:
>
> > That example is perfect. Concise and powerful. Thank you Fred.
> >
> > Deron
> >
> >
> > On Wed, Dec 16, 2015 at 4:16 PM, Frederick R Reiss 
> > wrote:
> >
> > > We can use the Poisson nonnegative matrix factorization example from
> > > last week's webcast:
> > >
> > >    i = 0
> > >    while(i < max_iterations) {
> > >        H = (H * (t(W) %*% (V/(W%*%H + epsilon)))) / t(colSums(W))
> > >        W = (W * ((V/(W%*%H + epsilon)) %*% t(H))) / t(rowSums(H))
> > >        i = i + 1;
> > >    }
> > >
> > >
> > > Sound ok to everyone?
> > >
> > > Fred
> > >
> > >
> > >
> > > From: Deron Eriksson 
> > > To: dev@systemml.incubator.apache.org
> > > Date: 12/16/2015 04:02 PM
> > > Subject: DML example on main SystemML website
> > > --
> > >
> > >
> > >
> > > Hi,
> > >
> > > I think the main SystemML website at http://systemml.incubator.apache.org/
> > > needs to be updated so that the DML example is an actual algorithm or at
> > > least a fragment of an algorithm.
> > >
> > > Does anyone have a recommendation for a short, concise example that shows
> > > the power of DML?
> > >
> > > Thanks!
> > > Deron
> > >
> > >
> > >
> >
>
>


Re: [DISCUSS] Project Roadmap

2015-12-30 Thread Deron Eriksson
Hi,

I would like to suggest some documentation/usability/code tasks for the
2016 SystemML roadmap. The primary focus of these goals is to lower the
barrier to entry to SystemML for these groups: (1) Users without a data
science/ML background who want to try SystemML, (2) Data scientists who
want to run, modify, and create DML/PyDML scripts, (3) Developers who want
to contribute code to the project, and (4) members of the Spark community who
want to use the MLContext API or Spark Batch Mode.

Tasks:

* Non-mathematical practical description of the purpose of each algorithm
and real-world examples of problems that each algorithm solves.

* Examples showing the conversion of real-world data sets (Wikipedia
database, images, log files, Twitter messages, etc.) to matrix-based
representations for use in SystemML.

* Working one-line examples of invoking each algorithm on an existing small
data set (The user can copy/paste this single line and it runs). This means
creating working example data files so that the user doesn't need to. These
data files can be in the SystemML project, in another project, or they can
be deployed to a web server and SystemML can read the data sets from URLs.

* DML Cookbook to give script writers the DML building blocks they need.

* DML Language Reference completely up-to-date.

* PyDML Language Reference converted to markdown, clean mirror of DML
Language Reference, and up-to-date.

* Document DML algorithm best practices in the programming guide (especially
how to write algorithms that scale efficiently).

* Structure documentation to more clearly indicate the ways to invoke
SystemML.

* Identify heavily used classes/methods (run test suite with a profiler)
and ensure these classes/methods have Javadocs and are efficient.

* Create printMatrix() function to allow a user doing prototyping to see a
matrix or a subset of a matrix in the console rather than having to write
to a file and open the file to see the result.

* If a DML function doesn't return a value, don't require an lvalue when
calling the function.

* Spark Batch Mode clearly documented.

* Very thoroughly Javadoc the MLContext API (MLContext and related
classes/methods) since it is a programmatic interface with enormous
potential for the Spark community.

* Address differences in data representations between Spark (RDD/DataFrame)
and SystemML (binary block). Determine solution to give best performance
when working on a large distributed data set while optimizing the
capabilities of Spark and SystemML. Is DataFrame-to-binary-block conversion
needed or is it possible to use a single format and avoid the data
conversion cost?

* Enhanced Spark integration, for instance ML Pipeline integration via Java
or Scala algorithm wrappers.

* Ensure documentation allows a user to download SystemML and run a 'Hello
World' DML example and an actual algorithm in 5 minutes or less.

* IDE tools such as a DML editor that supports code completion.

* Promote SystemML in the user community:
  (1) activity on mailing lists
  (2) talks at conferences
  (3) academic papers
  (4) blog posts
  (5) post information to forums such as Stack Overflow

Deron


On Mon, Dec 21, 2015 at 3:09 AM, Matthias Boehm  wrote:

> From my perspective, our roadmap for 2016 should cover the following
> SystemML engine extensions with regard to runtime (R), optimizer (O), as
> well as language and tools (L). Each sub-bullet in the following list will
> be further broken down into multiple JIRAs.
>
> R1) Extended Scale-Up Backend
> * Support for large dense matrix blocks >16GB
> * Extended multi-threaded operations (e.g., ctable, aggregate)
> * NUMA-awareness (partitioning and multi-threaded operations)
> * Extended update-in-place support
>
> R2) Generalized Matrix Block Library
> * Investigation interface design (abstraction)
> * Boolean matrices and operations
> * Different types of sparse matrix blocks
> * Additional physical runtime operators
>
> R3) HW Accelerators / Low-Level Optimizations
> * Exploit GPU BLAS libraries (integration)
> * Custom GPU kernels for complex operator patterns
> * Low-level optimizations (source code gen, compression)
>
> O1) Global Program Optimization
> * Global data flow optimization (rewrites, holistic)
> * Code motion (for cse, program block merge)
> * Advanced loop vectorization (common patterns)
> * Advanced function inlining (inlining multi-block functions)
> * Extended inter-procedural analysis (independent constant propagation)
>
> O2) Cost Model
> * Update memory budgets wrt Spark 1.6 dynamic memory management
> * Extended runtime cost model for Spark (incl lazy evaluation)
> * Extended execution type selection based on FLOPs
>
> O3) Dynamic Rewrites
> * Extended matrix mult chain opt (sparsity, rewrites, ops)
> * Rewrites exploiting additional statistics (e.g., min/max)
>
> O4) Optimizer Support R2/R3
> * Extended memory estimates for R2/R3
> * Type inference for matrix operations
> * Extended cost model and operator selection
>
> L1) Extended 

Re: POC Eclipse IDE for DML

2016-01-01 Thread Deron Eriksson
Hi Nakul,

This is very cool! I see great value in IDE tools such as an Eclipse editor
for DML/PyDML. Highlighting syntax errors helps people catch syntax
mistakes before running the script, and IDE code completion is a great
productivity boost.

Also, I like how clean and readable the grammar is. I don't know how others
feel, but it is so clean that, if fleshed out, it might be a useful
addition to the DML Language Reference. It does things such as enumerate
the built-in functions, which is handy when trying to wrap your mind around
the language. The Dml.g4 file is very powerful and useful but I have a
harder time understanding the language syntax when looking at it since
things like built-in functions aren't included in it. I wonder if there is
a way to integrate some of your work into Dml.g4?

I haven't done Xtext before. Do you have any advice to get started if I'd
like to try this out in my Eclipse (for instance, to look at the semicolon
issue)?
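As a rough illustration of the optional-semicolon issue, here is a minimal Python sketch of a hand-written recursive-descent parser, in the spirit of Fred's option 3 in the quoted message below. It is a toy, not Xtext or Dml.g4: it assumes a hypothetical language with only `name = expression` statements, `+`/`-` operators, integers, and identifiers, and it treats newlines as ordinary whitespace. After each statement it simply consumes zero or more `;` tokens, so terminators are optional.

```python
import re

# Toy tokens (hypothetical language): integers, identifiers, and = + - ;
# Leading whitespace, including newlines, is skipped.
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[=+\-;])")


def tokenize(src):
    pos, tokens = 0, []
    while True:
        match = TOKEN_RE.match(src, pos)
        if match is None:
            if src[pos:].strip():
                raise SyntaxError(f"unexpected input at offset {pos}")
            return tokens
        tokens.append(match.group(1))
        pos = match.end()


def is_identifier(tok):
    return tok is not None and (tok[0].isalpha() or tok[0] == "_")


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return tok

    def program(self):
        statements = []
        while self.peek() is not None:
            statements.append(self.statement())
        return statements

    def statement(self):
        name = self.take()
        if not is_identifier(name):
            raise SyntaxError(f"expected identifier, got {name!r}")
        if self.take() != "=":
            raise SyntaxError(f"expected '=' after {name!r}")
        expr = self.expression()
        # The terminator is optional: swallow zero or more ';' tokens.
        while self.peek() == ";":
            self.take()
        return (name, expr)

    def expression(self):
        # Left-associative chain of + and -; stops at any other token.
        node = self.operand()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.operand())
        return node

    def operand(self):
        tok = self.take()
        if tok.isdigit():
            return int(tok)
        if is_identifier(tok):
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(src):
    return Parser(tokenize(src)).program()


print(parse("x = 1; y = x + 2"))  # [('x', 1), ('y', ('+', 'x', 2))]
print(parse("x = 1\ny = x + 2"))  # same result without semicolons
```

Because newlines are skipped here, `x = 1` followed by `- 2` on the next line parses greedily as `x = 1 - 2`. Ambiguities of this kind are one possible source of difficulty when expressing optional terminators in a declarative grammar.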

Deron



On Mon, Dec 21, 2015 at 2:44 PM, Nakul Jindal  wrote:

> Hi,
>
> I've been trying to build a Proof of Concept IDE for DML using Xtext.
>
> https://github.com/nakul02/sysml.dml
>
> Grammar File :
>
> https://github.com/nakul02/sysml.dml/blob/master/sysml.dml/src/sysml/Dml.xtext
>
> Here are some screenshots:
> http://imgur.com/a/ZJyg7
>
> The files shown are from the algorithms folder.
> https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
>
> (There was no particular reason to have chosen these files over others)
>
> For this POC, my only goal was to get syntax highlighting to work in
> Eclipse.
> SystemML DML grammar is written in ANTLRv4 whereas Xtext uses ANTLRv3. Some
> amount of work was needed on the grammar before it could be used for
> Xtext.
>
> From the screenshots, you can see that Eclipse thinks there are syntactic
> errors in the files. This has to do with statements ending in an optional
> semicolon.
>
> I only supplied Xtext with a grammar file from which to generate the IDE. I
> have not yet worked on the generated code to add any other features.
> I could not come up with an LL(*) grammar that would let me express optional
> semicolons at the end of statements. For the time being, the grammar
> requires semicolons at the end of statements (as you can see in the grammar
> file :
>
> https://github.com/nakul02/sysml.dml/blob/master/sysml.dml/src/sysml/Dml.xtext
> )
>
> I discussed this problem with Fred(@frreiss) and he suggested 3 ways of
> dealing with this (assuming no way of modifying the grammar to fix this):
>
> 1. Support a subset of DML in the Xtext-based tools
> 2. Modify the xtext generated tools to fix this
> 3. Write an eclipse plugin from scratch with complete control over lexer
> and parser (among other components)
>
> Option 1 requires the least amount of work.
> I am not sure between options 2 & 3, which would be more work.
>
>
> Is this useful to anyone?
> If so, are there any thoughts or suggestions on how to approach the
> optional semicolon problem?
>
>
> Nakul Jindal
> https://github.com/nakul02/
>


Re: [DISCUSS] Project Roadmap

2016-01-01 Thread Deron Eriksson
Hi Matthias,

I agree about the JIRA situation. I am writing notes in a notebook to keep
track of what I am working on. I was hopeful that Alan's comment from Dec
17th would help with https://issues.apache.org/jira/browse/INFRA-10714. If
the missing fields can't be imported automatically, perhaps we can manually
update the missing fields and move forward.

As for specific algorithms, I'm not well-versed enough in the ML field to
make worthwhile suggestions without significant research, but I can think
of some qualities that I would look for in candidate algorithms:

* Is there a strong demand/need for the algorithm? For example, in a very
general sense, if a Google search is done for the algorithm, do we get back
10 hits or 10 million hits? Of course, in the case of a novel algorithm, no
results would be returned because the world doesn't even know that it needs
this algorithm yet.

* How much effort is required for the algorithm implementation? 10 lines of
DML vs 1000 lines of DML makes a huge difference.

* Does a high-performance, scalable version already exist, such as in Spark
MLlib? If a great implementation already exists, perhaps another algorithm
should be chosen, unless a strong need is seen for customizability of that
algorithm.

* Is an algorithm so ubiquitous that a toolkit of DML algorithms would seem
incomplete without it?

* Does an open-source R implementation already exist? If so, it could serve
as a useful starting point to a DML implementation for distributed
computing.

* Personally, this interests me... Does the algorithm solve an interesting
problem that generates results that can be presented in a way that has
sensory impact? This is a "wow" factor. Imagine presenting results at a
conference and having the audience murmur because they're impressed.
Pictures and graphs make a compelling case.
 (1) For example, think about something like facial recognition. What if an
algorithm powered applications that let you run queries such as these:
 Find pictures of me.
 Find pictures of people that look like me.
 Where do I look most similar to people on the planet?
 If I exercise, what will I look like in 20 years?
 How old do I currently look?
 What historical figure do I look most like?
 What movie character do I look most like?
  To me, answers to questions like these have a certain "wow" about them
because of the visuals that can be tied to them.
 (2) As another example, I saw Fred do a presentation regarding Poisson
Nonnegative Matrix Factorization and thought the graphical presentation of
the results was amazing and compelling. His graphs conveyed both the
accuracy and scalability of the DML algorithm, in addition to SystemML's
customizability applied to a real-world business case. It also showcased
the power of DML utilizing a very compact piece of code.

Deron



On Thu, Dec 31, 2015 at 9:36 AM, Matthias Boehm  wrote:

> That's a good point Deron - we will incorporate these tasks into the road
> map. Additionally, we should also include a list of new algorithms. Any
> suggestions?
>
> Furthermore, I'd like to have all JIRAs created by mid January. If the
> infra ticket is not resolved by then, I would rather start with a clean
> JIRA than waiting for this any longer.
>
> Regards
> Matthias
>
>
> From: Deron Eriksson 
> To: dev@systemml.incubator.apache.org
> Date: 12/30/2015 05:58 PM
> Subject: Re: [DISCUSS] Project Roadmap
> --
>
>
>
> Hi,
>
> I would like to suggest some documentation/usability/code tasks for the
> 2016 SystemML roadmap. The primary focus of these goals is to lower the
> barrier to entry to SystemML for these groups: (1) Users without a data
> science/ML background who want to try SystemML, (2) Data scientists who
> want to run, modify, and create DML/PyDML scripts, (3) Developers who want
> to contribute code to the project, and (4) Spark community who want to use
> the MLContext API or Spark Batch Mode.
>
> Tasks:
>
> * Non-mathematical practical description of the purpose of each algorithm
> and real-world examples of problems that each algorithm solves.
>
> * Examples showing the conversion of real-world data sets (Wikipedia
> database, images, log files, Twitter messages, etc) to matrix-based
> representations for use in SystemML.
>
> * Working one-line examples of invoking each algorithm on an existing small
> data set (The user can copy/paste this single line and it runs). This means
> creating working example data files so that the user doesn

Re: Cleanup SparkTC/systemml repository

2016-01-02 Thread Deron Eriksson
Hi Matthias,

I agree. If I do a Google search for "systemml", the old GitHub site comes
up on the first page of Google results and the new GitHub site doesn't,
which is very confusing to new users.

Deron






On Sat, Jan 2, 2016 at 12:11 PM, Matthias Boehm  wrote:

>
> Hi all,
>
> I'd like to delete our old SparkTC/systemml repository because it's causing
> unnecessary confusion and it's anyway outdated. For example, even
> "developerWorks Open" is still referring to the old repository.
>
> @Luciano: Could you please delete the SparkTC/systemml repository if nobody
> objects by this Friday? Thanks.
>
>
> Regards,
> Matthias
>


Re: POC Eclipse IDE for DML

2016-01-02 Thread Deron Eriksson
Hi Matthias,

Thank you for the clarification with regard to the parser grammar and
built-in functions. To me, as a relative newbie, in one way, the built-in
functions seem similar to java.lang.* classes in Java, but in another way,
they seem almost even more intrinsic than that.

I guess what I was mainly trying to get at is that Dml.g4 is fantastic in
terms of flexibility, especially when considered in conjunction with the
supporting referenced Java classes, but Nakul's grammar for IDE support is
able to convey the essence of the language capabilities (such as built-in
functions) in one place.

Also, consider the following. The icdf() function is not in Dml.g4 since it
is a built-in function. It is handled in the
parser.ParameterizedBuiltinFunctionExpression class. On the other hand,
as.scalar() is in Dml.g4 and in parser.BuiltinFunctionExpression. However,
castAsScalar() is not in Dml.g4 but is handled similarly in
parser.BuiltinFunctionExpression.

As a result of this, I am now confused. If I am trying to understand what I
can do with the language, do I look in Dml.g4 (such as for as.scalar()), do
I look in BuiltinFunctionExpression (such as for as.scalar() and
castAsScalar()), or do I look in ParameterizedBuiltinFunctionExpression
(such as for icdf())?

Perhaps as.scalar() needs to be removed from the grammar and as.scalar()
and castAsScalar() need to be condensed to one function and I will be less
confused. :-)

Maybe I am overthinking things. The DML Language Reference does contain a
tremendous amount of useful information with regard to the language
capabilities.
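To make Matthias's point (quoted below) more concrete, here is a hypothetical Python sketch of the general design he describes. It is not SystemML's actual parser or BuiltinFunctionExpression logic: the grammar level only knows keywords and a generic `name(args)` call form, and a separate lookup step maps names to built-ins afterwards. Because `t` is not a keyword, a user-defined function named `t` is syntactically valid. Giving the user-defined function precedence over the built-in is an assumption of this sketch, as are the keyword and built-in sets.

```python
import re

# Reserved words belong to the grammar; they can never name a function.
KEYWORDS = {"if", "else", "while", "for", "function", "return"}

# Built-ins live in an ordinary table consulted after parsing (hypothetical subset).
BUILTINS = {
    "t": lambda m: [list(col) for col in zip(*m)],  # transpose
    "sum": lambda m: sum(sum(row) for row in m),    # sum of all cells
}

# Grammar level: an identifier (dots allowed, e.g. as.scalar) followed by '(...)'.
CALL_RE = re.compile(r"\s*([A-Za-z_][\w.]*)\s*\((.*)\)\s*$")


def parse_call(src):
    match = CALL_RE.match(src)
    if match is None:
        raise SyntaxError(f"not a function call: {src!r}")
    name = match.group(1)
    if name in KEYWORDS:
        raise SyntaxError(f"keyword {name!r} cannot be used as a function name")
    return name, match.group(2).strip()


def resolve(name, user_functions):
    # Semantic level: in this sketch, user-defined functions shadow built-ins.
    if name in user_functions:
        return user_functions[name]
    if name in BUILTINS:
        return BUILTINS[name]
    raise NameError(f"unknown function {name!r}")


m = [[1, 2], [3, 4]]
name, arg = parse_call("t(m)")
print(resolve(name, {})(m))                          # [[1, 3], [2, 4]]
print(resolve(name, {"t": lambda x: "user t"})(m))   # user t
```

The design choice this illustrates is that adding a new built-in only means adding a table entry, with no grammar change, while keywords stay reserved at the syntax level.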

Deron




On Sat, Jan 2, 2016 at 12:25 PM, Matthias Boehm  wrote:

> just to clarify, our parser grammar does not include builtin functions
> because they are -- in contrast to keywords and language constructs -- not
> part of the DML/PyDML syntax. This is important for both maintainability
> and flexibility. For example, it allows you to define a variable or
> user-defined function with the name 't' although there exists a builtin
> function t() for transpose operations.
>
> Regards,
> Matthias
>
>
>
> From: Deron Eriksson 
> To: dev@systemml.incubator.apache.org
> Date: 01/01/2016 11:38 AM
> Subject: Re: POC Eclipse IDE for DML
> --
>
>
>
> Hi Nakul,
>
> This is very cool! I see great value in IDE tools such as an Eclipse editor
> for DML/PyDML. Highlighting syntax errors helps people catch syntax
> mistakes before running the script, and IDE code completion is a great
> productivity boost.
>
> Also, I like how clean and readable the grammar is. I don't know how others
> feel, but it is so clean that, if fleshed out, it might be a useful
> addition to the DML Language Reference. It does things such as enumerate
> the built-in functions, which is handy when trying to wrap your mind around
> the language. The Dml.g4 file is very powerful and useful but I have a
> harder time understanding the language syntax when looking at it since
> things like built-in functions aren't included in it. I wonder if there is
> a way to integrate some of your work into Dml.g4?
>
> I haven't done Xtext before. Do you have any advice to get started if I'd
> like to try this out in my Eclipse (for instance, to look at the semicolon
> issue)?
>
> Deron
>
>
>
> On Mon, Dec 21, 2015 at 2:44 PM, Nakul Jindal  wrote:
>
> > Hi,
> >
> > I've been trying to build a Proof of Concept IDE for DML using Xtext.
> >
> > https://github.com/nakul02/sysml.dml
> >
> > Grammar File :
> >
> > https://github.com/nakul02/sysml.dml/blob/master/sysml.dml/src/sysml/Dml.xtext
> >
> > Here are some screenshots:
> > http://imgur.com/a/ZJyg7
> >
> > The files shown are from the algorithms folder.
> > https://github.com/apache/incubator-systemml/tree/master/scripts/algorithms
> >
> > (There was no particular reason to have chosen these files over others)
> >
> > For this POC, my only goal was to get syntax highlighting to work in
> > Eclipse.
> > SystemML DML grammar is written in ANTLRv4 whereas Xtext uses ANTLRv3. Some
> > amount of work was needed on the grammar before it could be used for
> > Xtext.
> >
> > From the screenshots, you can see that Eclipse thinks there are syntactic
> > errors in the files. This has to do with statements ending in an optional
> > semicolon.
> >
> > I only supplied xt

Re: January 2016 Report

2016-01-04 Thread Deron Eriksson
Hello,

This is Deron Eriksson from the SystemML Apache Incubator project. I think
Luciano Resende (lresende) (who filed the last monthly report) may be on
vacation for another week, and the Apache Incubator status report is due
this Wednesday Jan 6th. I logged on to
https://wiki.apache.org/incubator/January2016 but I see it listed as
"Immutable Page" for me.

Would it be possible for a couple SystemML project committers to be given
edit capabilities to the report page, such as Fred Reiss (freiss), Deron
Eriksson (deron), and Mike Dusenberry (dusenberrymw)? Where can I make such
a request? If we can't request edit capabilities, should the status
information be emailed to a particular address?

Thank you!
Deron





On Wed, Dec 30, 2015 at 2:05 PM, Marvin Humphrey  wrote:

> Greetings, {podling} developers!
>
> This is a reminder that your report is due next Wednesday, January
> 6th.  Details below.
>
> Best,
>
> Marvin Humphrey, Report Manager for January, on behalf of the
> Incubator PMC
>
> ---
>
> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 20 January 2016, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, January 6th).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
>
> This should be appended to the Incubator Wiki page at:
>
> http://wiki.apache.org/incubator/January2016
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>
>


Re: January 2016 Report

2016-01-04 Thread Deron Eriksson
Hello Marvin,

My Incubator wiki login is: DeronEriksson
Mike's Incubator wiki login is: MikeDusenberry

Thank you!
Deron


On Mon, Jan 4, 2016 at 3:35 PM, Marvin Humphrey 
wrote:

> On Mon, Jan 4, 2016 at 3:24 PM, Deron Eriksson 
> wrote:
>
> > Would it be possible for a couple SystemML project committers to be given
> > edit capabilities to the report page, such as Fred Reiss (freiss), Deron
> > Eriksson (deron), and Mike Dusenberry (dusenberrymw)? Where can I make such
> > a request? If we can't request edit capabilities, should the status
> > information be emailed to a particular address?
>
> The wiki is protected by a whitelist of contributors as an anti-spam
> measure. What we need to know are your logins for the Incubator wiki,
> which are not the same as your Apache IDs.  Those look like Apache IDs
> above. Can you please supply us with your Incubator wiki logins?
>
> Marvin Humphrey
>


January 2016 SystemML Incubator Podling Report - Draft

2016-01-05 Thread Deron Eriksson
Hi,

The January 2016 SystemML Incubator Podling Report is due tomorrow (Jan
6th). Here is a rough draft that I threw together. Feel free to
change/comment/etc. I tried to stress that we need our JIRA server up.
Luciano, Mike, and I have access to edit the Apache Incubator Wiki page (
https://wiki.apache.org/incubator/January2016). If any other
committer needs edit access to the Apache Incubator Wiki, please see the
thread on the systemml dev and general incubator mailing lists.

Deron




SystemML

SystemML provides declarative large-scale machine learning (ML) that aims at
flexible specification of ML algorithms and automatic generation of hybrid
runtime plans ranging from single node, in-memory computations, to
distributed computations such as Apache Hadoop MapReduce and Apache Spark.

SystemML has been incubating since 2015-11-02.

Three most important issues to address in the move towards graduation:

  1. Grow SystemML community: increase mailing list activity,
 increase adoption of SystemML for scalable machine learning, encourage
 data scientists to adopt DML and PyDML algorithm scripts, respond to
 user feedback to ensure SystemML meets the requirements of real-world
 situations, write papers, and present talks about SystemML.

  2. Core library improvements, including Apache Spark integration.

  3. Produce a release

Any issues that the Incubator PMC or ASF Board wish/need to be
aware of?

  The community is blocked by INFRA-10714. Not having the JIRA server
  data in a correct state for two months is affecting our ability to manage
  issues and makes it very difficult to delegate issues to new users to grow
  our community. We will manually handle the missing fields if the JIRA data
  can't be imported.

How has the community developed since the last report?

  Users have asked several excellent questions on the dev list, and existing
  committers are actively helping these users. The project is receiving pull
  requests from contributors, committers have discussed these pull requests
  with their contributors, and contributions have been merged into the
  project.
  Matthias Boehm has presented talks regarding the SystemML Optimizer at
  TU Dresden, HTW Dresden, and TU Berlin.

How has the project developed since the last report?

  Numerous core library improvements have been made to the project. Additional
  documentation has been created to help new users. The test suite has been
  refactored to increase maintainability and performance.

Date of last release:

  NONE


SystemML Git Notes

2016-01-05 Thread Deron Eriksson
Hi,

I've created a GitHub Gist page with some of the git-related information
that I've found most useful recently, similar to Mike's "SystemML Git
Guide" Gist page. This page deals primarily with information that is useful
for a committer handling pull requests or doing things such as updating the
SystemML documentation or website. I'll add to it as I learn to handle more
complicated situations with git. A lot of git commands are very powerful,
so it's good to have a good understanding of what they mean before
executing them.

This page is located at:
https://gist.github.com/deroneriksson/e0d6d0634f3388f0df5e

Deron


Re: January 2016 SystemML Incubator Podling Report - Draft

2016-01-06 Thread Deron Eriksson
Thank you Matthias. I have modified the report draft, as seen below.

The report is due today, so if anyone has any other comments, please let me
know.

Deron


SystemML

SystemML provides declarative large-scale machine learning (ML) that aims at
flexible specification of ML algorithms and automatic generation of hybrid
runtime plans ranging from single node, in-memory computations, to
distributed computations such as Apache Hadoop MapReduce and Apache Spark.

SystemML has been incubating since 2015-11-02.

Three most important issues to address in the move towards graduation:

  1. Grow SystemML community: increase mailing list activity,
 increase adoption of SystemML for scalable machine learning, encourage
 data scientists to adopt DML and PyDML algorithm scripts, respond to
 user feedback to ensure SystemML meets the requirements of real-world
 situations, write papers, and present talks about SystemML.

  2. Core library improvements, including Apache Spark integration.

  3. Produce a release

Any issues that the Incubator PMC or ASF Board wish/need to be
aware of?

  The community has been blocked by INFRA-10714 since November.
  Beginning Jan 6th, we are manually updating the missing fields in JIRA so
  that we can properly manage project issues and delegate issues to new
  users to grow our community.

How has the community developed since the last report?

  Users have asked several excellent questions on the dev list, and existing
  committers are actively helping these users. The project is receiving pull
  requests from contributors, committers have discussed these pull requests
  with their contributors, and contributions have been merged into the
  project.
  Matthias Boehm has presented talks regarding the SystemML Optimizer at
  TU Dresden, HTW Dresden, and TU Berlin. Fred Reiss presented a talk on
  Dec 8th.

How has the project developed since the last report?

  Numerous core library improvements have been made to the project. Additional
  documentation has been created to help new users. The test suite has been
  refactored to increase maintainability and performance. Contributions have
  been made by non-IBM contributors. We are developing our 2016 roadmap,
  as seen in the "[DISCUSS] Project Roadmap" thread on the mailing list:
  https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00153.html

Date of last release:

  NONE


On Tue, Jan 5, 2016 at 5:03 PM, Matthias Boehm  wrote:

> looks good; just a couple of potential additions:
>
> 1) Other talks (e.g., by Fred, Dec 8)
> 2) Developing our 2016 roadmap (incl link to dev list thread)
> 3) External (non-IBM) contributions.
>
> Regards,
> Matthias
>
>
> From: Deron Eriksson 
> To: dev@systemml.incubator.apache.org
> Date: 01/05/2016 02:57 PM
> Subject: January 2016 SystemML Incubator Podling Report - Draft
> --
>
>
>
> Hi,
>
> The January 2016 SystemML Incubator Podling Report is due tomorrow (Jan
> 6th). Here is a rough draft that I threw together. Feel free to
> change/comment/etc. I tried to stress that we need our JIRA server up.
> Luciano, Mike, and I have access to edit the Apache Incubator Wiki page (
> https://wiki.apache.org/incubator/January2016). If any other
> committer needs edit access to the Apache Incubator Wiki, please see the
> thread on the systemml dev and general incubator mailing lists.
>
> Deron
>
>
> 
>
> SystemML
>
> SystemML provides declarative large-scale machine learning (ML) that aims at
> flexible specification of ML algorithms and automatic generation of hybrid
> runtime plans ranging from single node, in-memory computations, to
> distributed computations such as Apache Hadoop MapReduce and Apache Spark.
>
> SystemML has been incubating since 2015-11-02.
>
> Three most important issues to address in the move towards graduation:
>
>  1. Grow SystemML community: increase mailing list activity,
> increase adoption of SystemML for scalable machine learning, encourage
> data scientists to adopt DML and PyDML algorithm scripts, respond to
> user feedback to ensure SystemML meets the requirements of real-world
> situations, write papers, and present talks about SystemML.
>
>  2. Core library improvements, including Apache Spark integration.
>
>  3. Produce a release
>
> Any issues that the Incubator PMC or ASF Board wish/need to be
> aware of?
>
>  The community is blocked by INFRA-10714. Not having the JIRA server

Re: SystemML Git Notes

2016-01-06 Thread Deron Eriksson
Good point. Maybe we can create a similar page containing some guidelines
for project committers. It would probably be even nicer to have a similar
project page regarding best practices for beginners to contribute to the
project (forking, cloning, branching, branch naming, etc).


On Wed, Jan 6, 2016 at 3:11 PM, Luciano Resende 
wrote:

> On Tue, Jan 5, 2016 at 4:05 PM, Deron Eriksson 
> wrote:
>
> > Hi,
> >
> > I've created a GitHub Gist page with some of the git-related information
> > that I've found most useful recently, similar to Mike's "SystemML Git
> > Guide" Gist page. This page deals primarily with information that is useful
> > for a committer handling pull requests or doing things such as updating the
> > SystemML documentation or website. I'll add to it as I learn to handle more
> > complicated situations with git. A lot of git commands are very powerful,
> > so it's good to have a good understanding of what they mean before
> > executing them.
> >
> > This page is located at:
> > https://gist.github.com/deroneriksson/e0d6d0634f3388f0df5e
> >
> > Deron
> >
>
> This type of information is very useful, but creating it as a gist and not
> as part of the website will make it non-discoverable very soon.
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: Cleanup SparkTC/systemml repository

2016-01-06 Thread Deron Eriksson
Hi Luciano,

Thank you for making the SparkTC/systemml repository private. This will
definitely lead to less confusion.

It turns out that there is information in some of the pull request
conversations from the old repository that I would like to consult (with
regards to the JIRA issues that I'm cleaning up because of the problems
with the import to Apache JIRA). Could you please grant me access to see
the private repository?

Thanks!
Deron


On Sun, Jan 3, 2016 at 9:32 PM, Luciano Resende 
wrote:

> On Sat, Jan 2, 2016 at 12:11 PM, Matthias Boehm  wrote:
>
> >
> > Hi all,
> >
> > I'd like to delete our old SparkTC/systemml repository because it's causing
> > unnecessary confusion and it's anyway outdated. For example, even
> > "developerWorks Open" is still referring to the old repository.
> >
> > @Luciano: Could you please delete the SparkTC/systemml repository if nobody
> > objects by this Friday? Thanks.
> >
> >
> > Regards,
> > Matthias
> >
>
> I have made the repository private, will delete after couple weeks if
> nobody objects.
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: Cleanup SparkTC/systemml repository

2016-01-06 Thread Deron Eriksson
One other issue. Is it possible to download the SystemML 0.8.0 binary
release from anywhere besides https://github.com/SparkTC/systemml/releases?
If the repo is private, users can't download these binaries. So perhaps we
need to migrate the 0.8.0 binaries to the new site until 0.9.0 is ready?




On Wed, Jan 6, 2016 at 4:54 PM, Deron Eriksson 
wrote:

> Hi Luciano,
>
> Thank you for making the SparkTC/systemml repository private. This will
> definitely lead to less confusion.
>
> It turns out that there is information in some of the pull request
> conversations from the old repository that I would like to consult (with
> regards to the JIRA issues that I'm cleaning up because of the problems
> with the import to Apache JIRA). Could you please grant me access to see
> the private repository?
>
> Thanks!
> Deron
>
>
> On Sun, Jan 3, 2016 at 9:32 PM, Luciano Resende 
> wrote:
>
>> On Sat, Jan 2, 2016 at 12:11 PM, Matthias Boehm 
>> wrote:
>>
>> >
>> > Hi all,
>> >
>> > I'd like to delete our old SparkTC/systemml repository because it's causing
>> > unnecessary confusion and it's anyway outdated. For example, even
>> > "developerWorks Open" is still referring to the old repository.
>> >
>> > @Luciano: Could you please delete the SparkTC/systemml repository if nobody
>> > objects by this Friday? Thanks.
>> >
>> >
>> > Regards,
>> > Matthias
>> >
>>
>> I have made the repository private, will delete after couple weeks if
>> nobody objects.
>>
>> --
>> Luciano Resende
>> http://people.apache.org/~lresende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>
>


Re: Cleanup SparkTC/systemml repository

2016-01-06 Thread Deron Eriksson
Agreed. Sounds like a great idea.



On Wed, Jan 6, 2016 at 5:14 PM, Matthias Boehm  wrote:

> Actually, I'd like to propose releasing 0.8.1 as a maintenance release
> under our Apache GitHub repo this week. From my perspective, it would be
> the perfect time for that because our Spark backend has since been
> stabilized.
>
> Regards,
> Matthias
>
>
>
> From: Deron Eriksson 
> To: dev@systemml.incubator.apache.org
> Date: 01/06/2016 05:05 PM
> Subject: Re: Cleanup SparkTC/systemml repository
> --
>
>
>
> One other issue. Is it possible to download the SystemML 0.8.0 binary
> release from anywhere besides
> https://github.com/SparkTC/systemml/releases?
> If the repo is private, users can't download these binaries. So perhaps we
> need to migrate the 0.8.0 binaries to the new site until 0.9.0 is ready?
>
>
>
>
> On Wed, Jan 6, 2016 at 4:54 PM, Deron Eriksson 
> wrote:
>
> > Hi Luciano,
> >
> > Thank you for making the SparkTC/systemml repository private. This will
> > definitely lead to less confusion.
> >
> > It turns out that there is information in some of the pull request
> > conversations from the old repository that I would like to consult (with
> > regards to the JIRA issues that I'm cleaning up because of the problems
> > with the import to Apache JIRA). Could you please grant me access to see
> > the private repository?
> >
> > Thanks!
> > Deron
> >
> >
> > On Sun, Jan 3, 2016 at 9:32 PM, Luciano Resende 
> > wrote:
> >
> >> On Sat, Jan 2, 2016 at 12:11 PM, Matthias Boehm 
> >> wrote:
> >>
> >> >
> >> > Hi all,
> >> >
> >> > I'd like to delete our old SparkTC/systemml repository because it's
> >> > causing unnecessary confusion and it's anyway outdated. For example,
> >> > even "developerWorks Open" is still referring to the old repository.
> >> >
> >> > @Luciano: Could you please delete the SparkTC/systemml repository if
> >> > nobody objects by this Friday? Thanks.
> >> >
> >> >
> >> > Regards,
> >> > Matthias
> >> >
> >>
> >> I have made the repository private, and will delete it after a couple of
> >> weeks if nobody objects.
> >>
> >> --
> >> Luciano Resende
> >> http://people.apache.org/~lresende
> >> http://twitter.com/lresende1975
> >> http://lresende.blogspot.com/
> >>
> >
> >
>
>
>


Added users to Committers and Contributors Roles on JIRA

2016-01-07 Thread Deron Eriksson
I noticed that I could not assign issues to several committers (I'm
manually fixing data that failed to import into Apache JIRA for the issues
I previously reported). This was because the Assignable User issue
permission is granted only to Project Role (PMC), Project Role
(Contributors), Project Role (Committers), and Project Role
(Administrators).

The JIRA site had users in the Administrators and Developers roles but no
other roles. So, I added all the committers listed at
http://systemml.apache.org/community-members.html to the Committers role. I
also added 4 users to the Contributors role.

Now, issues can be assigned to these users.

Deron


Updated my reported JIRA issues

2016-01-07 Thread Deron Eriksson
I updated the Component field and Assignee field for the 61 JIRA issues
that I reported prior to the migration to Apache JIRA. The import process
had assigned all previously unassigned issues to Fred, so I have reset the
issues I reported back to Unassigned.

Several of these 'issues' may not be issues at all but may instead be my
own personal preferences. As a result, a few should probably be closed
before other contributors begin working on them. Committers should feel
free to close the issues that are not issues. Any discussion or comments
regarding them would be welcomed.

Thank you,
Deron


Re: SystemML JIRA Site Is Live!

2016-01-08 Thread Deron Eriksson
Hi Luciano,

Right now there is no notification scheme for the project. See
https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications

I don't see an option to create or modify a notification scheme.

Do you happen to know if we need to submit a request to INFRA to do this?

Also, do you happen to know if there is a standard notification scheme used
by Apache, or do we specify it?

Here is the information from our old site:

The following events had notifications for All Watchers, Current Assignee,
and Reporter:
  Issue Created
  Issue Updated
  Issue Assigned
  Issue Resolved
  Issue Closed
  Issue Commented
  Issue Comment Edited
  Issue Reopened
  Issue Deleted
  Issue Moved
  Work Logged On Issue
  Work Started On Issue
  Work Stopped On Issue
  Issue Worklog Updated
  Issue Worklog Deleted
  Generic Event
The following event had no notifications:
  Issue Comment Deleted

Deron




On Thu, Jan 7, 2016 at 10:23 AM, Luciano Resende 
wrote:

> Great news! Do we know where the JIRA notifications are being sent?
> Should they go to dev@systemml?
>
> On Wed, Jan 6, 2016 at 10:34 AM, Mike Dusenberry 
> wrote:
>
> > Hi all,
> >
> > Good news! The SystemML Apache JIRA project is now live, and can be found
> > at [https://issues.apache.org/jira/browse/SYSTEMML].
> >
> > There are a few caveats though:
> >
> > 1. The "Resolution" field could not be imported successfully, and now all
> > existing issues are set to "Unresolved". Therefore, we will need to mark
> > the 111 items that are currently closed as "Fixed", rather than
> > "Unresolved".
> > 2. Likely due to #1, the "Resolved Date" field could not be imported
> > successfully.
> > 3. The "Components" field could not be imported successfully for existing
> > items. Therefore, we will need to re-tag items as necessary with the
> > appropriate components.
> >
> >
> > Regardless, we can now start engaging the community through the JIRA
> > site!
> >
> >
> > - Mike
> > --
> > Mike Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> >
> > Sent from my iPhone
> >
>
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: Cleanup SparkTC/systemml repository

2016-01-08 Thread Deron Eriksson
Just wanted to bring up a few points.

1) In addition to the pom.xml project version, there is a SYSTEMML_VERSION
in _config.yml that is used for version number references in the
documentation. This is patterned after the Spark documentation (see
http://spark.apache.org/docs/1.5.2/). Notice the version number included in
the Spark documentation.
2) This version will be tagged in git, right?
3) It would be nice to get snapshots of the documentation onto
http://systemml.apache.org/ for each released version (at least 0.9.0,
probably not 0.8.1). These snapshots should include analytics on the pages.
Right now the latest project docs are generated at
http://apache.github.io/incubator-systemml/ and do not include analytics.

It will be great to have a first Apache release! Luciano, please let me
know if I need to make any updates to http://systemml.apache.org/ for
downloads/releases.

Deron


On Fri, Jan 8, 2016 at 12:13 PM, Matthias Boehm  wrote:

> ok great - Luciano, could you please create the release? Thanks.
>
> Regards,
> Matthias
>
>
> From: Frederick R Reiss/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 01/07/2016 02:39 PM
> Subject: Re: Cleanup SparkTC/systemml repository
> --
>
>
>
> +1
>
>
>
> From: Luciano Resende 
> To: "dev@systemml.incubator.apache.org"
> Date: 01/06/2016 09:48 PM
> Subject: Re: Cleanup SparkTC/systemml repository
> --
>
>
>
> On Wednesday, January 6, 2016, Matthias Boehm  wrote:
>
> > Actually, I'd like to propose to release 0.8.1 as a maintenance release
> > under our Apache github repo this week. From my perspective, it would be
> > the perfect time for that because meanwhile our Spark backend has been
> > stabilized.
> >
> > Regards,
> > Matthias
> >
> >
> >
> >
> > From: Deron Eriksson
> > To: dev@systemml.incubator.apache.org
> > Date: 01/06/2016 05:05 PM
> > Subject: Re: Cleanup SparkTC/systemml repository
> >
>
> +1
>
>
> > --
> >
> >
> >
> > One other issue. Is it possible to download the SystemML 0.8.0 binary
> > release from anywhere besides
> > https://github.com/SparkTC/systemml/releases?
> > If the repo is private, users can't download these binaries. So perhaps
> > we need to migrate the 0.8.0 binaries to the new site until 0.9.0 is ready?
> >
> >
> >
> >
> > On Wed, Jan 6, 2016 at 4:54 PM, Deron Eriksson
> > wrote:
> >
> > > Hi Luciano,
> > >
> > > Thank you for making the SparkTC/systemml repository private. This will
> > > definitely lead to less confusion.
> > >
> > > It turns out that there is information in some of the pull request
> > > conversations from the old repository that I would like to consult
> > > (with regards to the JIRA issues that I'm cleaning up because of the
> > > problems with the import to Apache JIRA). Could you please grant me
> > > access to see the private repository?
> > >
> > > Thanks!
> > > Deron
> > >
> > >
> > > On Sun, Jan 3, 2016 at 9:32 PM, Luciano Resende
> > > wrote:
> > >
> > >> On Sat, Jan 2, 2016 at 12:11 PM, Matthias Boehm
> > >> wrote:
> > >>
> > >> >
> > >> > Hi all,
> > >> >
> > >> > I'd like to delete our old SparkTC/systemml repo

Re: January 2016 SystemML Incubator Podling Report - Draft

2016-01-08 Thread Deron Eriksson
Hi Fred,

Thanks for checking. The report was submitted on time on the 6th to
https://wiki.apache.org/incubator/January2016. See
https://wiki.apache.org/incubator/January2016?action=info

Deron


On Fri, Jan 8, 2016 at 12:37 PM, Frederick R Reiss 
wrote:

> I also like the content below. Did we submit this report? I don't see
> anything on the Apache page at <
> http://incubator.apache.org/projects/systemml.html>. If we haven't
> submitted, we need to send in something ASAP.
>
> Fred
>
>
> From: Mike Dusenberry 
> To: dev@systemml.incubator.apache.org
> Date: 01/06/2016 05:32 PM
> Subject: Re: January 2016 SystemML Incubator Podling Report - Draft
> --
>
>
>
> This looks good to me.
>
> -Mike
>
> On Wed, Jan 6, 2016 at 1:36 PM Deron Eriksson 
> wrote:
>
> > Thank you Matthias. I have modified the report draft, as seen below.
> >
> > The report is due today, so if anyone has any other comments, please let
> > me know.
> >
> > Deron
> >
> > 
> > SystemML
> >
> > SystemML provides declarative large-scale machine learning (ML) that aims
> > at flexible specification of ML algorithms and automatic generation of
> > hybrid runtime plans ranging from single node, in-memory computations, to
> > distributed computations such as Apache Hadoop MapReduce and Apache Spark.
> >
> > SystemML has been incubating since 2015-11-02.
> >
> > Three most important issues to address in the move towards graduation:
> >
> >   1. Grow SystemML community: increase mailing list activity,
> >  increase adoption of SystemML for scalable machine learning, encourage
> >  data scientists to adopt DML and PyDML algorithm scripts, respond to
> >  user feedback to ensure SystemML meets the requirements of real-world
> >  situations, write papers, and present talks about SystemML.
> >
> >   2. Core library improvements, including Apache Spark integration.
> >
> >   3. Produce a release
> >
> > Any issues that the Incubator PMC or ASF Board wish/need to be
> > aware of?
> >
> >   The community has been blocked by INFRA-10714 since November.
> >   Beginning Jan 6th, we are manually updating the missing fields in JIRA so
> >   that we can properly manage project issues and delegate issues to new
> >   users to grow our community.
> >
> > How has the community developed since the last report?
> >
> >   Users have asked several excellent questions on the dev list, and
> >   existing committers are actively helping these users. The project is
> >   receiving pull requests from contributors, committers have discussed
> >   these pull requests with their contributors, and contributions have been
> >   merged into the project.
> >   Matthias Boehm has presented talks regarding the SystemML Optimizer at
> >   TU Dresden, HTW Dresden, and TU Berlin. Fred Reiss presented a talk on
> >   Dec 8th.
> >
> > How has the project developed since the last report?
> >
> >   Numerous core library improvements have been made to the project.
> >   Additional documentation has been created to help new users. The test
> >   suite has been refactored to increase maintainability and performance.
> >   Contributions have been made by non-IBM contributors. We are developing
> >   our 2016 roadmap, as seen in the "[DISCUSS] Project Roadmap" thread on
> >   the mailing list at
> >   https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00153.html
> >
> > Date of last release:
> >
> >   NONE
> >
> >
> > On Tue, Jan 5, 2016 at 5:03 PM, Matthias Boehm
> > wrote:
> >
> > > looks good; just a couple of potential additions:
> > >
> > > 1) Other talks (e.g., by Fred, Dec 8)
> > > 2) Developing our 2016 roadmap (incl link to dev list thread)
> > > 3) External (non-IBM) contributions.
> > >
> > > Regards,
> > > Matthias
> > >
>

Re: January 2016 SystemML Incubator Podling Report - Draft

2016-01-08 Thread Deron Eriksson
Fred, your question raises another question. Luciano, do you know if
http://incubator.apache.org/projects/systemml.html is a page that we
maintain, or is it handled by the people who manage the Apache Incubator?

Deron


On Fri, Jan 8, 2016 at 12:55 PM, Deron Eriksson 
wrote:

> Hi Fred,
>
> Thanks for checking. The report was submitted on time on the 6th to
> https://wiki.apache.org/incubator/January2016. See
> https://wiki.apache.org/incubator/January2016?action=info
>
> Deron
>
>
> On Fri, Jan 8, 2016 at 12:37 PM, Frederick R Reiss 
> wrote:
>
>> I also like the content below. Did we submit this report? I don't see
>> anything on the Apache page at <
>> http://incubator.apache.org/projects/systemml.html>. If we haven't
>> submitted, we need to send in something ASAP.
>>
>> Fred
>>
>>
>> From: Mike Dusenberry 
>> To: dev@systemml.incubator.apache.org
>> Date: 01/06/2016 05:32 PM
>> Subject: Re: January 2016 SystemML Incubator Podling Report - Draft
>> --
>>
>>
>>
>> This looks good to me.
>>
>> -Mike
>>
>> On Wed, Jan 6, 2016 at 1:36 PM Deron Eriksson 
>> wrote:
>>
>> > Thank you Matthias. I have modified the report draft, as seen below.
>> >
>> > The report is due today, so if anyone has any other comments, please
>> > let me know.
>> >
>> > Deron
>> >
>> > 
>> > SystemML
>> >
>> > SystemML provides declarative large-scale machine learning (ML) that
>> > aims at flexible specification of ML algorithms and automatic generation
>> > of hybrid runtime plans ranging from single node, in-memory computations,
>> > to distributed computations such as Apache Hadoop MapReduce and Apache
>> > Spark.
>> >
>> > SystemML has been incubating since 2015-11-02.
>> >
>> > Three most important issues to address in the move towards graduation:
>> >
>> >   1. Grow SystemML community: increase mailing list activity,
>> >  increase adoption of SystemML for scalable machine learning,
>> >  encourage data scientists to adopt DML and PyDML algorithm scripts,
>> >  respond to user feedback to ensure SystemML meets the requirements of
>> >  real-world situations, write papers, and present talks about SystemML.
>> >
>> >   2. Core library improvements, including Apache Spark integration.
>> >
>> >   3. Produce a release
>> >
>> > Any issues that the Incubator PMC or ASF Board wish/need to be
>> > aware of?
>> >
>> >   The community has been blocked by INFRA-10714 since November.
>> >   Beginning Jan 6th, we are manually updating the missing fields in
>> >   JIRA so that we can properly manage project issues and delegate issues
>> >   to new users to grow our community.
>> >
>> > How has the community developed since the last report?
>> >
>> >   Users have asked several excellent questions on the dev list, and
>> >   existing committers are actively helping these users. The project is
>> >   receiving pull requests from contributors, committers have discussed
>> >   these pull requests with their contributors, and contributions have
>> >   been merged into the project.
>> >   Matthias Boehm has presented talks regarding the SystemML Optimizer at
>> >   TU Dresden, HTW Dresden, and TU Berlin. Fred Reiss presented a talk on
>> >   Dec 8th.
>> >
>> > How has the project developed since the last report?
>> >
>> >   Numerous core library improvements have been made to the project.
>> >   Additional documentation has been created to help new users. The test
>> >   suite has been refactored to increase maintainability and performance.
>> >   Contributions have been made by non-IBM contributors. We are developing
>> >   our 2016 roadmap, as seen in the "[DISCUSS] Project Roadmap" thread on
>> >   the mailing list at
>> >   https://www.mail-archive.com/dev@systemml.incubator.apache.org/msg00153.html
>> >
>> > Date of last release:
>> >
>> >   NONE
>

Re: SystemML JIRA Site Is Live!

2016-01-08 Thread Deron Eriksson
I submitted a request for a notifications scheme. See status at
https://issues.apache.org/jira/browse/INFRA-11050

Deron


On Fri, Jan 8, 2016 at 12:36 PM, Luciano Resende 
wrote:

> On Fri, Jan 8, 2016 at 11:09 AM, Deron Eriksson 
> wrote:
>
> > Hi Luciano,
> >
> > Right now there is no notification scheme for the project. See
> > https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications
> >
> > I don't see an option to create or modify a notification scheme.
> >
> > Do you happen to know if we need to submit a request to INFRA to do this?
> >
> >
> Yes, please.
>
>
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: SystemML JIRA Site Is Live!

2016-01-08 Thread Deron Eriksson
Apparently it is being handled by
https://issues.apache.org/jira/browse/INFRA-10714 and I did not see the
comment. I will request the notifications scheme in the 10714 comments.

Deron


On Fri, Jan 8, 2016 at 1:26 PM, Deron Eriksson 
wrote:

> I submitted a request for a notifications scheme. See status at
> https://issues.apache.org/jira/browse/INFRA-11050
>
> Deron
>
>
> On Fri, Jan 8, 2016 at 12:36 PM, Luciano Resende 
> wrote:
>
>> On Fri, Jan 8, 2016 at 11:09 AM, Deron Eriksson 
>> wrote:
>>
>> > Hi Luciano,
>> >
>> > Right now there is no notification scheme for the project. See
>> > https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications
>> >
>> > I don't see an option to create or modify a notification scheme.
>> >
>> > Do you happen to know if we need to submit a request to INFRA to do this?
>> >
>> >
>> Yes, please.
>>
>>
>>
>>
>> --
>> Luciano Resende
>> http://people.apache.org/~lresende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>
>


developerWorks links to old SystemML repo

2016-01-08 Thread Deron Eriksson
A few days ago, Matthias pointed out that the links on "developerWorks
Open" still point to the old project repo. I'd be happy to contact someone
at developerWorks about updating their links. Does anyone know who to
contact?

See: https://developer.ibm.com/open/systemml/

This matters right now because developerWorks comes up #2 when I google
"systemml", while http://systemml.apache.org/ comes up #4.

Deron


Re: Starting a SystemML 0.9 release

2016-01-11 Thread Deron Eriksson
Hi Luciano,

In the JIRA comments, Matthias asked yesterday if I could commit a fix for
SYSTEMML-330 before the release. I am looking at it now. Since it needs to
be coded and tested, I don't think it will be ready today. Perhaps a
Tuesday or Wednesday release would allow for updates like this?

Also, any reason for 0.9 instead of 0.8.1?

Deron



On Mon, Jan 11, 2016 at 10:23 AM, Luciano Resende 
wrote:

> I would like to volunteer as RM for the next SystemML release, which I
> propose we call 0.9.
>
>
> I was planning to start the process later today, unless there are any
> blocking issues that folks are working on and would like to get in before
> the release.
>
> Let's use this thread to coordinate any issues needed for the release.
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: SystemML JIRA Site Is Live!

2016-01-12 Thread Deron Eriksson
A notifications scheme was created today by the infrastructure team as part
of https://issues.apache.org/jira/browse/INFRA-10714. It can be seen at
https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications

Should we ask to remove all the "Single Email Address (
dev@systemml.incubator.apache.org)" entries?

Here was our notifications scheme prior to incubation.
The following events had notifications for All Watchers, Current Assignee,
and Reporter:
  Issue Created
  Issue Updated
  Issue Assigned
  Issue Resolved
  Issue Closed
  Issue Commented
  Issue Comment Edited
  Issue Reopened
  Issue Deleted
  Issue Moved
  Work Logged On Issue
  Work Started On Issue
  Work Stopped On Issue
  Issue Worklog Updated
  Issue Worklog Deleted
  Generic Event
The following event had no notifications:
  Issue Comment Deleted

Deron


On Tue, Jan 12, 2016 at 7:59 PM, Matthias Boehm  wrote:

> Could we please disable sending notifications for every JIRA update to our
> dev list? Thanks.
>
> Regards,
> Matthias
>
>
>
> From: Deron Eriksson 
> To: dev@systemml.incubator.apache.org
> Date: 01/08/2016 01:31 PM
> Subject: Re: SystemML JIRA Site Is Live!
> --
>
>
>
> Apparently it is being handled by
> https://issues.apache.org/jira/browse/INFRA-10714 and I did not see the
> comment. I will request the notifications scheme in the 10714 comments.
>
> Deron
>
>
> On Fri, Jan 8, 2016 at 1:26 PM, Deron Eriksson 
> wrote:
>
> > I submitted a request for a notifications scheme. See status at
> > https://issues.apache.org/jira/browse/INFRA-11050
> >
> > Deron
> >
> >
> > On Fri, Jan 8, 2016 at 12:36 PM, Luciano Resende 
> > wrote:
> >
> >> On Fri, Jan 8, 2016 at 11:09 AM, Deron Eriksson <deroneriks...@gmail.com>
> >> wrote:
> >>
> >> > Hi Luciano,
> >> >
> >> > Right now there is no notification scheme for the project. See
> >> > https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications
> >> >
> >> > I don't see an option to create or modify a notification scheme.
> >> >
> >> > Do you happen to know if we need to submit a request to INFRA to do this?
> >> >
> >> >
> >> Yes, please.
> >>
> >>
> >>
> >>
> >> --
> >> Luciano Resende
> >> http://people.apache.org/~lresende
> >> http://twitter.com/lresende1975
> >> http://lresende.blogspot.com/
> >>
> >
> >
>
>
>


Re: SystemML JIRA Site Is Live!

2016-01-12 Thread Deron Eriksson
I will go ahead and ask for the "Single Email Address
(dev@systemml.incubator.apache.org)" entries to be removed from the
notifications scheme now, so that this happens as soon as possible. I'll
ask at https://issues.apache.org/jira/browse/INFRA-10714 . Please add a
comment if this is not OK.

Deron


On Tue, Jan 12, 2016 at 9:47 PM, Deron Eriksson 
wrote:

> A notifications scheme was created today by the infrastructure team as
> part of https://issues.apache.org/jira/browse/INFRA-10714. It can be seen
> at https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications
>
> Should we ask to remove all the "Single Email Address (
> dev@systemml.incubator.apache.org)" entries?
>
> Here was our notifications scheme prior to incubation.
> The following events had notifications for All Watchers, Current Assignee,
> and Reporter:
>   Issue Created
>   Issue Updated
>   Issue Assigned
>   Issue Resolved
>   Issue Closed
>   Issue Commented
>   Issue Comment Edited
>   Issue Reopened
>   Issue Deleted
>   Issue Moved
>   Work Logged On Issue
>   Work Started On Issue
>   Work Stopped On Issue
>   Issue Worklog Updated
>   Issue Worklog Deleted
>   Generic Event
> The following event had no notifications:
>   Issue Comment Deleted
>
> Deron
>
>
> On Tue, Jan 12, 2016 at 7:59 PM, Matthias Boehm  wrote:
>
>> Could we please disable sending notifications for every JIRA update to
>> our dev list? Thanks.
>>
>> Regards,
>> Matthias
>>
>>
>>
>> From: Deron Eriksson 
>> To: dev@systemml.incubator.apache.org
>> Date: 01/08/2016 01:31 PM
>> Subject: Re: SystemML JIRA Site Is Live!
>> --
>>
>>
>>
>> Apparently it is being handled by
>> https://issues.apache.org/jira/browse/INFRA-10714 and I did not see the
>> comment. I will request the notifications scheme in the 10714 comments.
>>
>> Deron
>>
>>
>> On Fri, Jan 8, 2016 at 1:26 PM, Deron Eriksson 
>> wrote:
>>
>> > I submitted a request for a notifications scheme. See status at
>> > https://issues.apache.org/jira/browse/INFRA-11050
>> >
>> > Deron
>> >
>> >
>> > On Fri, Jan 8, 2016 at 12:36 PM, Luciano Resende 
>> > wrote:
>> >
>> >> On Fri, Jan 8, 2016 at 11:09 AM, Deron Eriksson <deroneriks...@gmail.com>
>> >> wrote:
>> >>
>> >> > Hi Luciano,
>> >> >
>> >> > Right now there is no notification scheme for the project. See
>> >> > https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications
>> >> >
>> >> > I don't see an option to create or modify a notification scheme.
>> >> >
>> >> > Do you happen to know if we need to submit a request to INFRA to do this?
>> >> >
>> >> >
>> >> Yes, please.
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Luciano Resende
>> >> http://people.apache.org/~lresende
>> >> http://twitter.com/lresende1975
>> >> http://lresende.blogspot.com/
>> >>
>> >
>> >
>>
>>
>>
>


Re: SystemML JIRA Site Is Live!

2016-01-12 Thread Deron Eriksson
I would be in favor of not having these notifications going to the dev
list. I definitely want to be notified if I am watching an issue, or am a
reporter, or am the assignee, but I would prefer these to go to my personal
email rather than the dev list. I was very happy with the old scheme with
regards to the notifications. I posted the old scheme to the INFRA-10714
comments.

Sounds like there is a general consensus regarding the notifications. 15
minutes ago I added a comment on
https://issues.apache.org/jira/browse/INFRA-10714 to remove all the "Single
Email Address" entries, since I was concerned about too many notifications
going to the email list, and with time zone delays I thought it was a good
idea to get things rolling as soon as possible. We can add further comments
if they are required.

Deron



On Tue, Jan 12, 2016 at 10:04 PM, Matthias Boehm  wrote:

> sure, let's agree on the scheme. I'm strongly in favor of removing the dev
> list from this scheme (yes, that would affect all the "Single Email
> Address" entries) because it would create a lot of traffic and we don't
> want every description update persisted in our mail archive. Thanks.
>
> Regards,
> Matthias
>
>
> From: Luciano Resende 
> To: dev@systemml.incubator.apache.org
> Date: 01/12/2016 09:55 PM
> Subject: Re: SystemML JIRA Site Is Live!
> --
>
>
>
> Can we reach consensus on what we want first?
>
> Why are notification updates an issue?
> Should we disable them or use a different list (e.g. issues@s.a.o) to
> receive these notes?
> If we disable them, how would you know that there are updates?
>
> On Tue, Jan 12, 2016 at 9:50 PM, Deron Eriksson 
> wrote:
>
> > I will go ahead and ask for the "Single Email Address
> > (dev@systemml.incubator.apache.org)" entries to be removed from the
> > notifications scheme now, so that this happens as soon as possible. I'll
> > ask at https://issues.apache.org/jira/browse/INFRA-10714 . Please add a
> > comment if this is not OK.
> >
> > Deron
> >
> >
> > On Tue, Jan 12, 2016 at 9:47 PM, Deron Eriksson
> > wrote:
> >
> > > A notifications scheme was created today by the infrastructure team as
> > > part of https://issues.apache.org/jira/browse/INFRA-10714. It can be
> > > seen at
> > > https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications
> > >
> > > Should we ask to remove all the "Single Email Address (
> > > dev@systemml.incubator.apache.org)" entries?
> > >
> > > Here was our notifications scheme prior to incubation.
> > > The following events had notifications for All Watchers, Current
> > > Assignee, and Reporter:
> > >   Issue Created
> > >   Issue Updated
> > >   Issue Assigned
> > >   Issue Resolved
> > >   Issue Closed
> > >   Issue Commented
> > >   Issue Comment Edited
> > >   Issue Reopened
> > >   Issue Deleted
> > >   Issue Moved
> > >   Work Logged On Issue
> > >   Work Started On Issue
> > >   Work Stopped On Issue
> > >   Issue Worklog Updated
> > >   Issue Worklog Deleted
> > >   Generic Event
> > > The following event had no notifications:
> > >   Issue Comment Deleted
> > >
> > > Deron
> > >
> > >
> > > On Tue, Jan 12, 2016 at 7:59 PM, Matthias Boehm
> > > wrote:
> > >
> > >> Could we please disable sending notifications for every JIRA update to
> > >> our dev list? Thanks.
> > >>
> > >> Regards,
> > >> Matthias
> > >>
> > >>
> > >>
> > >> From: Deron Eriksson 
> > >> To: dev@systemml.incubator.apache.org
> > >> Date: 01/08/2016 01:31 PM
> > >> Subject: Re: SystemML JIRA Site Is Live!
> > >> --
> > >>
> > >>
> >

Re: SystemML JIRA Site Is Live!

2016-01-12 Thread Deron Eriksson
I guess my preferred course of action is to:
(1) Remove "Single Email Address" entries from scheme.
(2) Close 10714!
(3) In the future, if the notification scheme doesn't meet our needs, file
a new INFRA ticket to update the notification scheme.



On Tue, Jan 12, 2016 at 10:22 PM, Deron Eriksson 
wrote:

> I would be in favor of not having these notifications going to the dev
> list. I definitely want to be notified if I am watching an issue, or am a
> reporter, or am the assignee, but I would prefer these to go to my personal
> email rather than the dev list. I was very happy with the old scheme with
> regards to the notifications. I posted the old scheme to the INFRA-10714
> comments.
>
> Sounds like there is a general consensus regarding the notifications. 15
> minutes ago I added a comment on
> https://issues.apache.org/jira/browse/INFRA-10714 to remove all the "Single
> Email Address" entries, since I was concerned about too many notifications
> going to the email list, and with time zone delays I thought it was a good
> idea to get things rolling as soon as possible. We can add further comments
> if they are required.
>
> Deron
>
>
>
> On Tue, Jan 12, 2016 at 10:04 PM, Matthias Boehm 
> wrote:
>
>> sure, let's agree on the scheme. I'm strongly in favor of removing the
>> dev list from this scheme (yes, that would affect all the "Single Email
>> Address" entries) because it would create a lot of traffic and we don't
>> want every description update persistent in our mail archive. Thanks.
>>
>> Regards,
>> Matthias
>>
>>
>> From: Luciano Resende 
>> To: dev@systemml.incubator.apache.org
>> Date: 01/12/2016 09:55 PM
>> Subject: Re: SystemML JIRA Site Is Live!
>> --
>>
>>
>>
>> Can we have consensus on what we want first?
>>
>> Why are notification updates an issue?
>> Should we disable them or use a different list (e.g. issues@s.a.o) to receive
>> these notes?
>> If we disable them, how do you get to know that there are updates?
>>
>> On Tue, Jan 12, 2016 at 9:50 PM, Deron Eriksson 
>> wrote:
>>
>> > I will go ahead and ask that the "Single Email Address
>> > (dev@systemml.incubator.apache.org)" entries are removed from the
>> > notifications scheme to try to get them removed as soon as possible.
>> > I'll ask at https://issues.apache.org/jira/browse/INFRA-10714 . Please
>> > add a comment if this is not OK.
>> >
>> > Deron
>> >
>> >
>> > On Tue, Jan 12, 2016 at 9:47 PM, Deron Eriksson <deroneriks...@gmail.com>
>> > wrote:
>> >
>> > > A notifications scheme was created today by the infrastructure team as
>> > > part of https://issues.apache.org/jira/browse/INFRA-10714. It can be
>> > > seen at
>> > > https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications
>> > >
>> > > Should we ask to remove all the "Single Email Address
>> > > (dev@systemml.incubator.apache.org)" entries?
>> > >
>> > > Here was our notifications scheme prior to incubation.
>> > > The following events had notifications for All Watchers, Current
>> > > Assignee, and Reporter:
>> > >   Issue Created
>> > >   Issue Updated
>> > >   Issue Assigned
>> > >   Issue Resolved
>> > >   Issue Closed
>> > >   Issue Commented
>> > >   Issue Comment Edited
>> > >   Issue Reopened
>> > >   Issue Deleted
>> > >   Issue Moved
>> > >   Work Logged On Issue
>> > >   Work Started On Issue
>> > >   Work Stopped On Issue
>> > >   Issue Worklog Updated
>> > >   Issue Worklog Deleted
>> > >   Generic Event
>> > > The following event had no notifications:
>> > >   Issue Comment Deleted
>> > >
>> > > Deron
>> > >
>> > >
>> > > On Tue, Jan 12, 2016 at 7:59 PM, Matthias Boehm 
>> > wrote:
>> > >
>> > >> Could we please disable sending notifications for every JIRA update to

Re: SystemML JIRA Site Is Live!

2016-01-12 Thread Deron Eriksson
Sorry for the spam. I just noticed that
https://issues.apache.org/jira/browse/INFRA-10714 has Resolution "Fixed"
and Status "Closed". I will add a comment to 10714 and then file a new JIRA
to remove the email notifications to the dev list.

Deron

On Tue, Jan 12, 2016 at 10:40 PM, Deron Eriksson 
wrote:

> I guess my preferred course of action is to:
> (1) Remove "Single Email Address" entries from scheme.
> (2) Close 10714!
> (3) In the future, if the notification scheme doesn't meet our needs, file
> a new INFRA ticket to update notification scheme.
>
>
>
> On Tue, Jan 12, 2016 at 10:22 PM, Deron Eriksson 
> wrote:
>
>> I would be in favor of not having these notifications going to the dev
>> list. I definitely want to be notified if I am watching an issue, or am a
>> reporter, or am the assignee, but I would prefer these to go to my personal
>> email rather than the dev list. I was very happy with the old scheme with
>> regards to the notifications. I posted the old scheme to the INFRA-10714
>> comments.
>>
>> Sounds like there is a general consensus regarding the notifications. 15
>> minutes ago I added a comment on
>> https://issues.apache.org/jira/browse/INFRA-10714 to remove all the "Single
>> Email Address" entries, since I was concerned about too many notifications
>> going to the email list, and with time zone delays I thought it was a good
>> idea to get things rolling as soon as possible. We can add further comments
>> if they are required.
>>
>> Deron
>>
>>
>>
>> On Tue, Jan 12, 2016 at 10:04 PM, Matthias Boehm 
>> wrote:
>>
>>> sure, let's agree on the scheme. I'm strongly in favor of removing the
>>> dev list from this scheme (yes, that would affect all the "Single Email
>>> Address" entries) because it would create a lot of traffic and we don't
>>> want every description update persistent in our mail archive. Thanks.
>>>
>>> Regards,
>>> Matthias
>>>
>>>
>>> From: Luciano Resende 
>>> To: dev@systemml.incubator.apache.org
>>> Date: 01/12/2016 09:55 PM
>>> Subject: Re: SystemML JIRA Site Is Live!
>>> --
>>>
>>>
>>>
>>> Can we have consensus on what we want first?
>>>
>>> Why are notification updates an issue?
>>> Should we disable them or use a different list (e.g. issues@s.a.o) to receive
>>> these notes?
>>> If we disable them, how do you get to know that there are updates?
>>>
>>> On Tue, Jan 12, 2016 at 9:50 PM, Deron Eriksson
>>> wrote:
>>>
>>> > I will go ahead and ask that the "Single Email Address
>>> > (dev@systemml.incubator.apache.org)" entries are removed from the
>>> > notifications scheme to try to get them removed as soon as possible.
>>> > I'll ask at https://issues.apache.org/jira/browse/INFRA-10714 . Please
>>> > add a comment if this is not OK.
>>> >
>>> > Deron
>>> >
>>> >
>>> > On Tue, Jan 12, 2016 at 9:47 PM, Deron Eriksson <deroneriks...@gmail.com>
>>> > wrote:
>>> >
>>> > > A notifications scheme was created today by the infrastructure team
>>> > > as part of https://issues.apache.org/jira/browse/INFRA-10714. It can
>>> > > be seen at
>>> > > https://issues.apache.org/jira/plugins/servlet/project-config/SYSTEMML/notifications
>>> > >
>>> > > Should we ask to remove all the "Single Email Address
>>> > > (dev@systemml.incubator.apache.org)" entries?
>>> > >
>>> > > Here was our notifications scheme prior to incubation.
>>> > > The following events had notifications for All Watchers, Current
>>> > > Assignee, and Reporter:
>>> > >   Issue Created
>>> > >   Issue Updated
>>> > >   Issue Assigned
>>> > >   Issue Resolved
>>> > >   Issue Closed
>>> > >   Issue Commented
>>> > >   Issue Comment Edi

Re: SystemML JIRA Site Is Live!

2016-01-13 Thread Deron Eriksson
The "Single Email Address (dev@systemml.incubator.apache.org)" entries have
been removed from the notifications scheme. See
https://issues.apache.org/jira/browse/INFRA-11071

Deron


On Tue, Jan 12, 2016 at 10:52 PM, Deron Eriksson 
wrote:

> Sorry for the spam. I just noticed that
> https://issues.apache.org/jira/browse/INFRA-10714 has Resolution "Fixed"
> and Status "Closed". I will add a comment to 10714 and then file a new JIRA
> to remove the email notifications to the dev list.
>
> Deron
>
> On Tue, Jan 12, 2016 at 10:40 PM, Deron Eriksson 
> wrote:
>
>> I guess my preferred course of action is to:
>> (1) Remove "Single Email Address" entries from scheme.
>> (2) Close 10714!
>> (3) In the future, if the notification scheme doesn't meet our needs,
>> file a new INFRA ticket to update notification scheme.
>>
>>
>>
>> On Tue, Jan 12, 2016 at 10:22 PM, Deron Eriksson wrote:
>>
>>> I would be in favor of not having these notifications going to the dev
>>> list. I definitely want to be notified if I am watching an issue, or am a
>>> reporter, or am the assignee, but I would prefer these to go to my personal
>>> email rather than the dev list. I was very happy with the old scheme with
>>> regards to the notifications. I posted the old scheme to the INFRA-10714
>>> comments.
>>>
>>> Sounds like there is a general consensus regarding the notifications. 15
>>> minutes ago I added a comment on
>>> https://issues.apache.org/jira/browse/INFRA-10714 to remove all the "Single
>>> Email Address" entries, since I was concerned about too many notifications
>>> going to the email list, and with time zone delays I thought it was a good
>>> idea to get things rolling as soon as possible. We can add further comments
>>> if they are required.
>>>
>>> Deron
>>>
>>>
>>>
>>> On Tue, Jan 12, 2016 at 10:04 PM, Matthias Boehm 
>>> wrote:
>>>
>>>> sure, let's agree on the scheme. I'm strongly in favor of removing the
>>>> dev list from this scheme (yes, that would affect all the "Single Email
>>>> Address" entries) because it would create a lot of traffic and we don't
>>>> want every description update persistent in our mail archive. Thanks.
>>>>
>>>> Regards,
>>>> Matthias
>>>>
>>>>
>>>> From: Luciano Resende 
>>>> To: dev@systemml.incubator.apache.org
>>>> Date: 01/12/2016 09:55 PM
>>>> Subject: Re: SystemML JIRA Site Is Live!
>>>> --
>>>>
>>>>
>>>>
>>>> Can we have consensus on what we want first?
>>>>
>>>> Why are notification updates an issue?
>>>> Should we disable them or use a different list (e.g. issues@s.a.o) to
>>>> receive these notes?
>>>> If we disable them, how do you get to know that there are updates?
>>>>
>>>> On Tue, Jan 12, 2016 at 9:50 PM, Deron Eriksson <deroneriks...@gmail.com>
>>>> wrote:
>>>>
>>>> > I will go ahead and ask that the "Single Email Address
>>>> > (dev@systemml.incubator.apache.org)" entries are removed from the
>>>> > notifications scheme to try to get them removed as soon as possible.
>>>> > I'll ask at https://issues.apache.org/jira/browse/INFRA-10714 . Please
>>>> > add a comment if this is not OK.
>>>> >
>>>> > Deron
>>>> >
>>>> >
>>>> > On Tue, Jan 12, 2016 at 9:47 PM, Deron Eriksson <deroneriks...@gmail.com>
>>>> > wrote:
>>>> >
>>>> > > A notifications scheme was created today by the infrastructure team
>>>> > > as part of https://issues.apache.org/jira/browse/INFRA-10714. It can
>>>> > > be seen at
>>>> > > https://issues.apache.org/jira/plugins/servlet/project-co

Re: SystemML Daily Test Builds

2016-01-13 Thread Deron Eriksson
This is very useful. Thank you Alex!

Deron


On Wed, Jan 13, 2016 at 11:16 AM, Mike Dusenberry 
wrote:

> Hi all,
>
> Just FYI, Alex at the STC set up some daily test builds for us on Jenkins
> that will run every day at ~12 AM and ~12 PM.  The tests will run against
> the current master branch at that moment, and will list all commits added
> since the previous build.  Once everything looks okay, one possibility
> would be to have Jenkins email the dev list on any failures.  The builds
> can be found here: [
> https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/].
>
> Thanks to Alex!
>
>
> -Mike
> --
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>


Re: SystemML Daily Test Builds

2016-01-13 Thread Deron Eriksson
Emailing the dev list in case of build failures for this Jenkins job sounds
like a good idea to me.

Deron


On Wed, Jan 13, 2016 at 12:10 PM, Luciano Resende 
wrote:

> Should we enable notifications sent to dev list in case of failures ?
>
>
> On Wed, Jan 13, 2016 at 11:21 AM, Deron Eriksson 
> wrote:
>
> > This is very useful. Thank you Alex!
> >
> > Deron
> >
> >
> > On Wed, Jan 13, 2016 at 11:16 AM, Mike Dusenberry <dusenberr...@gmail.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > Just FYI, Alex at the STC set up some daily test builds for us on
> > > Jenkins that will run every day at ~12 AM and ~12 PM.  The tests will
> > > run against the current master branch at that moment, and will list all
> > > commits added since the previous build.  Once everything looks okay, one
> > > possibility would be to have Jenkins email the dev list on any failures.
> > > The builds can be found here: [
> > > https://sparktc.ibmcloud.com/jenkins/job/SystemML-DailyTest/].
> > >
> > > Thanks to Alex!
> > >
> > >
> > > -Mike
> > > --
> > > Mike Dusenberry
> > > GitHub: github.com/dusenberrymw
> > > LinkedIn: linkedin.com/in/mikedusenberry
> > >
> > > Sent from my iPhone.
> > >
> >
>
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: SystemML JIRA Site Is Live!

2016-01-14 Thread Deron Eriksson
I prefer JIRAs going to their own mailing list rather than having them go
to the dev mailing list, at least at this stage.

Deron


On Thu, Jan 14, 2016 at 3:05 PM, Matthias Boehm  wrote:

> well, I would be fine with a separate mailing list although it's redundant
> to the jira feed.
>
> Regards,
> Matthias
>
>
> From: Mike Dusenberry 
> To: dev@systemml.incubator.apache.org
> Date: 01/14/2016 02:51 PM
> Subject: Re: SystemML JIRA Site Is Live!
> --
>
>
>
> I vote for creating a separate mailing list specifically for JIRA issues.
> That way, we won't lose track of valuable conversations on the dev list
> amidst the JIRA issue flow.
>
> On Thu, Jan 14, 2016 at 2:37 PM Luciano Resende 
> wrote:
>
> > -1 for disabling it all
> >
> > With this, we get no notification of whether a JIRA is created, etc. For
> > example, I created SYSTEMML-462 and the list archives have no indication
> > that the JIRA was created.
> >
> > We should get some indication of JIRA workflow on our mailing lists. If
> > people do not want to flood the dev list with this, I can create an issues
> > mailing list and folks can subscribe to that list as appropriate.
> >
> >
> >
> > On Wed, Jan 13, 2016 at 7:36 AM, Deron Eriksson
> > wrote:
> >
> > > The "Single Email Address (dev@systemml.incubator.apache.org)" entries
> > > have been removed from the notifications scheme. See
> > > https://issues.apache.org/jira/browse/INFRA-11071
> > >
> > > Deron
> > >
> > >
> > > On Tue, Jan 12, 2016 at 10:52 PM, Deron Eriksson <deroneriks...@gmail.com>
> > > wrote:
> > >
> > > > Sorry for the spam. I just noticed that
> > > > https://issues.apache.org/jira/browse/INFRA-10714 has Resolution "Fixed"
> > > > and Status "Closed". I will add a comment to 10714 and then file a new
> > > > JIRA to remove the email notifications to the dev list.
> > > >
> > > > Deron
> > > >
> > > > On Tue, Jan 12, 2016 at 10:40 PM, Deron Eriksson <deroneriks...@gmail.com>
> > > > wrote:
> > > >
> > > >> I guess my preferred course of action is to:
> > > >> (1) Remove "Single Email Address" entries from scheme.
> > > >> (2) Close 10714!
> > > >> (3) In the future, if the notification scheme doesn't meet our needs,
> > > >> file a new INFRA ticket to update notification scheme.
> > > >>
> > > >>
> > > >>
> > > >> On Tue, Jan 12, 2016 at 10:22 PM, Deron Eriksson <deroneriks...@gmail.com>
> > > >> wrote:
> > > >>
> > > >>> I would be in favor of not having these notifications going to the
> > > >>> dev list. I definitely want to be notified if I am watching an issue,
> > > >>> or am a reporter, or am the assignee, but I would prefer these to go
> > > >>> to my personal email rather than the dev list. I was very happy with
> > > >>> the old scheme with regards to the notifications. I posted the old
> > > >>> scheme to the INFRA-10714 comments.
> > > >>>
> > > >>> Sounds like there is a general consensus regarding the notifications.
> > > >>> 15 minutes ago I added a comment on
> > > >>> https://issues.apache.org/jira/browse/INFRA-10714 to remove all the
> > > >>> "Single Email Address" entries, since I was concerned about too many
> > > >>> notifications going to the email list, and with time zone delays I
> > > >>> thought it was a good idea to get things rolling as soon as possible.
> > > >>> We can add further comments if they are required.
> > > >>>
> > > >>> Deron
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Tue, Jan 12, 2016 

Re: Starting a SystemML 0.9 release

2016-01-15 Thread Deron Eriksson
Hi Luciano,

WRT docs, it would be nice to have a snapshot of the generated
documentation site (like http://apache.github.io/incubator-systemml/). This
allows a historical display of documentation similar to Spark (
http://spark.apache.org/documentation.html). If that is available online,
is it necessary to package the Algorithms Reference PDF or README with the
releases? Also, the docs/README.md describes generating the documentation
site from markdown using jekyll so I'm not sure if that would help anyone
if the *.md files aren't included in the release distributions. Or is this
a different README?

Deron


On Fri, Jan 15, 2016 at 12:00 PM, Matthias Boehm  wrote:

> great - thanks Niketan. From my perspective, we're also good to go.
>
> Regards,
> Matthias
>
>
> From: Niketan Pansare/Almaden/IBM@IBMUS
> To: dev@systemml.incubator.apache.org
> Date: 01/15/2016 11:56 AM
> Subject: Re: Starting a SystemML 0.9 release
> --
>
>
>
> Hi all,
>
> As FYI, I ran some performance experiments this week and the release
> SystemML 0.9 looks good to me :)
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
>
> From: Mike Dusenberry 
> To: dev@systemml.incubator.apache.org
> Date: 01/14/2016 06:49 PM
> Subject: Re: Starting a SystemML 0.9 release
> --
>
>
>
> The DML Language Reference would be great to have as well.
>
> Also, in general, I think we should only have the Javadocs for end-user
> facing code, such as MLContext, rather than for any deep internals that a
> user is not going to interact with.
> On Thu, Jan 14, 2016 at 6:26 PM Luciano Resende 
> wrote:
>
> > What should be the minimum documentation to add to the release
> > distribution?
> >
> > Currently we have :
> > docs/README.txt
> > docs/SysteML_Algorithms_Reference.pdf
> >
> > I was planning to add the javadocs as well.
> >
> > But I still think we have much more available in trunk that we could
> > add...
> >
> > Suggestions are welcome...
> >
> >
> > On Mon, Jan 11, 2016 at 11:44 AM, Luciano Resende 
> > wrote:
> >
> > > Also, for fixed jiras, I did the following query :
> > >
> > >
> > >
> >
> > > https://issues.apache.org/jira/browse/SYSTEMML-376?jql=project%20%3D%20SYSTEMML%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20created%20%3E%3D%202015-10-27
> > >
> > > And was wondering if we could all move this to 0.9 release.
> > >
> > > Could someone please help me verify.
> > >
> > > Thanks
> > >
> > >
> > > On Mon, Jan 11, 2016 at 11:43 AM, Luciano Resende <luckbr1...@gmail.com>
> > > wrote:
> > >
> > >>
> > >>
> > >> On Mon, Jan 11, 2016 at 11:01 AM, Matthias Boehm 
> > >> wrote:
> > >>
> > >>> great - thanks everybody. Let's get these two fixes in and close the
> > >>> release. Until that point, please no new features. The version number
> > >>> 0.9 is fine with me since it's not really a pure maintenance release
> > >>> as many new features went in too. Down the road, however, we need to
> > >>> think about release branches.
> > >>>
> > >> We can create release branches now, or from the tag when we need a
> > >> 0.9.1, for example. As we are not a large project with tens of PRs
> > >> coming in very quickly, I would recommend creating branches as needed
> > >> for minor releases.
> > >>
> > >>
> > >>
> > >> --
> > >> Luciano Resende
> > >> http://people.apache.org/~lresende
> > >> http://twitter.com/lresende1975
> > >> http://lresende.blogspot.com/
> > >>
> > >
> > >
> > >
> > > --
> > > Luciano Resende
> > > http://people.apache.org/~lresende
> > > http://twitter.com/lresende1975
> > > http://lresende.blogspot.com/
> > >
> >
> >
> >
> > --
> > Luciano Resende
> > http://people.apache.org/~lresende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
> --
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my 

Re: Starting a SystemML 0.9 release

2016-01-15 Thread Deron Eriksson
If docs are included in the release distributions, it would be nice if they
were the generated (mostly) HTML documentation in docs/_site after running
jekyll. This would give users a complete set of documentation for the
release. This includes the aforementioned Algorithms Reference pdf and all
of the documentation currently available online.

Deron


On Fri, Jan 15, 2016 at 1:11 PM, Deron Eriksson 
wrote:

> Hi Luciano,
>
> WRT docs, it would be nice to have a snapshot of the generated
> documentation site (like http://apache.github.io/incubator-systemml/).
> This allows a historical display of documentation similar to Spark (
> http://spark.apache.org/documentation.html). If that is available online,
> is it necessary to package the Algorithms Reference PDF or README with the
> releases? Also, the docs/README.md describes generating the documentation
> site from markdown using jekyll so I'm not sure if that would help anyone
> if the *.md files aren't included in the release distributions. Or is this
> a different README?
>
> Deron
>
>
> On Fri, Jan 15, 2016 at 12:00 PM, Matthias Boehm 
> wrote:
>
>> great - thanks Niketan. From my perspective, we're also good to go.
>>
>> Regards,
>> Matthias
>>
>>
>> From: Niketan Pansare/Almaden/IBM@IBMUS
>> To: dev@systemml.incubator.apache.org
>> Date: 01/15/2016 11:56 AM
>> Subject: Re: Starting a SystemML 0.9 release
>> --
>>
>>
>>
>> Hi all,
>>
>> As FYI, I ran some performance experiments this week and the release
>> SystemML 0.9 looks good to me :)
>>
>> Thanks,
>>
>> Niketan Pansare
>> IBM Almaden Research Center
>> E-mail: npansar At us.ibm.com
>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>>
>>
>> From: Mike Dusenberry 
>> To: dev@systemml.incubator.apache.org
>> Date: 01/14/2016 06:49 PM
>> Subject: Re: Starting a SystemML 0.9 release
>> --
>>
>>
>>
>> The DML Language Reference would be great to have as well.
>>
>> Also, in general, I think we should only have the Javadocs for end-user
>> facing code, such as MLContext, rather than for any deep internals that a
>> user is not going to interact with.
>> On Thu, Jan 14, 2016 at 6:26 PM Luciano Resende 
>> wrote:
>>
>> > What should be the minimum documentation to add to the release
>> > distribution?
>> >
>> > Currently we have :
>> > docs/README.txt
>> > docs/SysteML_Algorithms_Reference.pdf
>> >
>> > I was planning to add the javadocs as well.
>> >
>> > But I still think we have much more available in trunk that we could
>> > add...
>> >
>> > Suggestions are welcome...
>> >
>> >
>> > On Mon, Jan 11, 2016 at 11:44 AM, Luciano Resende
>> > wrote:
>> >
>> > > Also, for fixed jiras, I did the following query :
>> > >
>> > >
>> > >
>> >
>> > > https://issues.apache.org/jira/browse/SYSTEMML-376?jql=project%20%3D%20SYSTEMML%20AND%20status%20in%20%28Resolved%2C%20Closed%29%20AND%20created%20%3E%3D%202015-10-27
>> > >
>> > > And was wondering if we could all move this to 0.9 release.
>> > >
>> > > Could someone please help me verify.
>> > >
>> > > Thanks
>> > >
>> > >
>> > > On Mon, Jan 11, 2016 at 11:43 AM, Luciano Resende <luckbr1...@gmail.com>
>> > > wrote:
>> > >
>> > >>
>> > >>
>> > >> On Mon, Jan 11, 2016 at 11:01 AM, Matthias Boehm 
>> > >> wrote:
>> > >>
>> > >>> great - thanks everybody. Let's get these two fixes in and close the
>> > >>> release. Until that point, please no new featur

Re: Starting a SystemML 0.9 release

2016-01-15 Thread Deron Eriksson
Regarding the documentation, if analytics_on in docs/_config.yml is set to
false when jekyll is run, then analytics is not added to the rendered HTML,
which would probably be a good idea for docs in a packaged release (if the
documentation site is packaged with the release).
Additionally, SYSTEMML_VERSION in docs/_config.yml sets the SystemML
version references in the rendered HTML, so it should be set to 0.9 when
docs are generated for a release.

Deron




On Fri, Jan 15, 2016 at 1:16 PM, Deron Eriksson 
wrote:

> If docs are included in the release distributions, it would be nice if
> they were the generated (mostly) HTML documentation in docs/_site after
> running jekyll. This would give users a complete set of documentation for
> the release. This includes the aforementioned Algorithms Reference pdf and
> all of the documentation currently available online.
>
> Deron
>
>
> On Fri, Jan 15, 2016 at 1:11 PM, Deron Eriksson 
> wrote:
>
>> Hi Luciano,
>>
>> WRT docs, it would be nice to have a snapshot of the generated
>> documentation site (like http://apache.github.io/incubator-systemml/).
>> This allows a historical display of documentation similar to Spark (
>> http://spark.apache.org/documentation.html). If that is available
>> online, is it necessary to package the Algorithms Reference PDF or README
>> with the releases? Also, the docs/README.md describes generating the
>> documentation site from markdown using jekyll so I'm not sure if that would
>> help anyone if the *.md files aren't included in the release distributions.
>> Or is this a different README?
>>
>> Deron
>>
>>
>> On Fri, Jan 15, 2016 at 12:00 PM, Matthias Boehm 
>> wrote:
>>
>>> great - thanks Niketan. From my perspective, we're also good to go.
>>>
>>> Regards,
>>> Matthias
>>>
>>>
>>> From: Niketan Pansare/Almaden/IBM@IBMUS
>>> To: dev@systemml.incubator.apache.org
>>> Date: 01/15/2016 11:56 AM
>>> Subject: Re: Starting a SystemML 0.9 release
>>> --
>>>
>>>
>>>
>>> Hi all,
>>>
>>> As FYI, I ran some performance experiments this week and the release
>>> SystemML 0.9 looks good to me :)
>>>
>>> Thanks,
>>>
>>> Niketan Pansare
>>> IBM Almaden Research Center
>>> E-mail: npansar At us.ibm.com
>>> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>>>
>>>
>>> From: Mike Dusenberry 
>>> To: dev@systemml.incubator.apache.org
>>> Date: 01/14/2016 06:49 PM
>>> Subject: Re: Starting a SystemML 0.9 release
>>> --
>>>
>>>
>>>
>>> The DML Language Reference would be great to have as well.
>>>
>>> Also, in general, I think we should only have the Javadocs for end-user
>>> facing code, such as MLContext, rather than for any deep internals that a
>>> user is not going to interact with.
>>> On Thu, Jan 14, 2016 at 6:26 PM Luciano Resende 
>>> wrote:
>>>
>>> > What should be the minimum documentation to add to the release
>>> > distribution?
>>> >
>>> > Currently we have :
>>> > docs/README.txt
>>> > docs/SysteML_Algorithms_Reference.pdf
>>> >
>>> > I was planning to add the javadocs as well.
>>> >
>>> > But I still think we have much more available in trunk that we could
>>> > add...
>>> >
>>> > Suggestions are welcome...
>>> >
>>> >
>>> > On Mon, Jan 11, 2016 at 11:44 AM, Luciano Resende <luckbr1...@gmail.com>
>>> > wrote:
>>> >
>>> > > Also, for fixed jiras, I did the following query :
>>> > >
>>> > >
>>> > >
>>> >
>>> *https://issues.apache.org/jira/browse/SYSTEMML-376?jql=project%20%3D%20SYSTEMML%20AND%20status%20in%20%28Resolved%2C%20Close

Workflow for assigning issues to users

2016-01-18 Thread Deron Eriksson
What is our workflow for assigning JIRA issues to users?

For instance, Nakul Jindal asked in the comments to work on
https://issues.apache.org/jira/browse/SYSTEMML-456. What's our workflow?

For instance, one possible workflow:

1) User asks to work on issue.
2) If new user, we add user as project contributor (Contributor role).
3) We set user as Assignee for the issue.
4) User submits pull request.
5) Resolve and close issue.

It would be good to settle on such a workflow now that our JIRA server is
up to avoid confusion.

Luciano, is there an Apache standard that we should be following?

Deron


Re: Workflow for assigning issues to users

2016-01-20 Thread Deron Eriksson
Sounds good. We can follow that workflow and modify it as we need to.

I added Nakul as a contributor and assigned the JIRA to him.

Deron

On Wed, Jan 20, 2016 at 5:28 AM, Mike Dusenberry 
wrote:

> I vote yes, let's use Deron's suggestion. It's simple, and assigning a JIRA
> issue when the user requests to start working on it avoids the issue of
> duplicated work.
>
> - Mike
> On Wed, Jan 20, 2016 at 1:49 AM Frederick R Reiss 
> wrote:
>
> > Shall we use Deron's suggestion below as our process for the time being?
> >
> > Fred
> >
> >
> > From: Luciano Resende 
> > To: dev@systemml.incubator.apache.org
> > Date: 01/18/2016 03:08 PM
> > Subject: Re: Workflow for assigning issues to users
> > --
> >
> >
> >
> >
> > On Mon, Jan 18, 2016 at 1:54 PM, Deron Eriksson
> > wrote:
> >
> > > What is our workflow for assigning JIRA issues to users?
> > >
> > > For instance, Nakul Jindal asked in the comments to work on
> > > https://issues.apache.org/jira/browse/SYSTEMML-456. What's our workflow?
> > >
> > > For instance, one possible workflow:
> > >
> > > 1) User asks to work on issue.
> > > 2) If new user, we add user as project contributor (Contributor role).
> > > 3) We set user as Assignee for the issue.
> > > 4) User submits pull request.
> > > 5) Resolve and close issue.
> > >
> > > It would be good to settle on such a workflow now that our JIRA server
> > > is up to avoid confusion.
> > >
> > > Luciano, is there an Apache standard that we should be following?
> > >
> > > Deron
> > >
> >
> >
> > There is no Apache standard, each project can discuss their preference.
> >
> >
> > --
> > Luciano Resende
> > http://people.apache.org/~lresende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
> >
> > --
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>


Release source package

2016-01-20 Thread Deron Eriksson
I have a question (mostly for Luciano).

I'm examining the release candidate files to try to verify that our
candidate files are compliant with the Apache release management policy (
http://incubator.apache.org/guides/releasemanagement.html).

I see under the Release Check List (
http://incubator.apache.org/guides/releasemanagement.html#check-list), the
following:

"3.6 Release consists of source code only, no binaries.
Each Apache release must contain a source package. This package may not
contain compiled components (such as "jar" files) because compiled
components are not open source, even if they were built from open source."

Is this referring to -sources.jar or -src.tar.gz/-src.zip?
In system-ml-0.9.0-incubating-src.tar.gz, there
is src/test/config/hadoop_bin_windows/bin/winutils.exe. Does this need to
be removed? Or is the "source package" referring to the -sources.jar? Since
it says each release must contain a source package, and -sources.jar is a
source package, I assume we are fine. Is that correct?

Deron
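Whichever package the policy is referring to, it helps to know exactly which compiled components a source archive contains before deciding. Below is a minimal sketch that lists candidate binaries inside a (possibly compressed) tar archive by file extension. The helper name and the extension list are assumptions for illustration; the list is not taken from the Apache release policy, and the output is only a starting point for a human review.

```python
import tarfile

# Extensions treated as "compiled components" here; this list is an
# assumption for illustration, not a list from the Apache release policy.
COMPILED_SUFFIXES = (".jar", ".class", ".exe", ".dll", ".so", ".dylib")


def find_compiled_components(archive_path):
    """Return the sorted names of regular files in a (possibly compressed)
    tar archive whose extension suggests a compiled binary."""
    flagged = []
    # "r:*" lets tarfile detect plain, gzip, bz2 or xz compression itself.
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.lower().endswith(COMPILED_SUFFIXES):
                flagged.append(member.name)
    return sorted(flagged)
```

Run against system-ml-0.9.0-incubating-src.tar.gz, this would be expected to list src/test/config/hadoop_bin_windows/bin/winutils.exe among its results. The -src.zip variant would need the same loop written with the zipfile module.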


SystemML Hadoop version support

2016-01-20 Thread Deron Eriksson
Does SystemML still support Hadoop 1?

I see in MRConfigurationNames a static initialization block for Hadoop
v1/v2 properties. I'd like to remove the deprecated-property warnings (log
messages in the console) that we currently get with Hadoop v2.4.1. Should I
add both the old v1 and the new v2 property names to MRConfigurationNames,
or can I add just the v2 names and remove the v1 names?

If we still support Hadoop v1, we might want to update the pom with
profiles for Hadoop 1 and 2.

Deron
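If both name sets are kept, one option is to key the choice on the Hadoop major version so that Hadoop 2 only ever sees the new names and stops logging deprecation warnings. Below is a minimal sketch of that idea. The helper `property_name` is hypothetical and does not reflect SystemML's actual MRConfigurationNames code. The five pairs come from Hadoop's deprecated-properties table and are only an illustrative subset.

```python
# Illustrative subset of Hadoop 1 -> Hadoop 2 property renames (from Hadoop's
# deprecated-properties table); not the contents of MRConfigurationNames.
HADOOP1_TO_HADOOP2 = {
    "fs.default.name": "fs.defaultFS",
    "mapred.job.tracker": "mapreduce.jobtracker.address",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
    "mapred.task.id": "mapreduce.task.attempt.id",
    "dfs.block.size": "dfs.blocksize",
}


def property_name(v1_name, hadoop_major_version):
    """Return the property name to use for the given Hadoop major version.

    Hypothetical helper: on Hadoop 2+ it returns the renamed property (so no
    deprecation warning is logged); on Hadoop 1 it keeps the original name.
    Names without a known rename are returned unchanged.
    """
    if hadoop_major_version >= 2:
        return HADOOP1_TO_HADOOP2.get(v1_name, v1_name)
    return v1_name
```

If Hadoop 1 support is dropped, the table and the version check both go away, and only the v2 names remain.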


Re: [VOTE] Release SystemML 0.9-incubating (RC1)

2016-01-20 Thread Deron Eriksson
Hi,

With regards to #1, we have an existing JIRA for this
https://issues.apache.org/jira/browse/SYSTEMML-134 . I'll update this one.

A few other things I've noticed:

1) docs/README.txt in standalone distributions (and the other *.tar.gz and
*.zip) have references to BIGINSIGHTS_HOME. This is the same file as
docs/Language Reference/README.txt.

2) No DISCLAIMER in standalone tar.gz and zip. I believe this is a 'should'
and not a 'must'.

3) In the directory of packages, the .asc signature files themselves have
.asc.asc signature files.

4) Since we have a -sources.jar, we might want to add a -javadoc.jar since
maven easily automates it.
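Item 3 can be checked mechanically from a directory listing. Below is a minimal sketch with a hypothetical helper. It reports signatures of signatures (`*.asc.asc`) and artifacts that have no `.asc` at all. The digest extensions it skips are an assumption about the staging repository layout.

```python
DIGEST_SUFFIXES = (".md5", ".sha1")  # assumed digest extensions in the listing


def check_signatures(filenames):
    """Split a directory listing into double-signed files and unsigned artifacts."""
    names = set(filenames)
    # A signature of a signature, e.g. foo.tar.gz.asc.asc
    double_signed = sorted(n for n in names if n.endswith(".asc.asc"))
    # Artifacts (not signatures, not digests) with no matching .asc file
    unsigned = sorted(
        n for n in names
        if not n.endswith(".asc")
        and not n.endswith(DIGEST_SUFFIXES)
        and n + ".asc" not in names
    )
    return double_signed, unsigned
```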

Deron






On Wed, Jan 20, 2016 at 1:36 PM, Mike Dusenberry 
wrote:

> Upon further review, I've found multiple issues with the "standalone"
> package that should block this release candidate.
>
>
>
>
> 1.  The "readme.txt" in the "standalone" distribution refers to a
> "jSystemML.jar" file that will be generated for the user to allow the use
> of "java -jar jSystemML.jar".  However, we do not package a single jar, and
> instead provide a "lib" folder with all dependency jars and a base SystemML
> jar named "system-ml-0.9.0-incubating.jar".  Thus, a longer Java invocation
> using the lib classpath is needed, or the provided
> "runStandaloneSystemML.sh" script is needed, with the latter being the only
> obvious route for the user.  This leads us to item 2.
>
> 2.  The provided "runStandaloneSystemML.sh" is unusable because it contains
> Windows-style (CRLF) line endings rather than the correct Unix (LF) newline
> characters.  Thus, the script has to be edited to replace the former
> Windows-style line endings with the latter Unix-style ones before the
> script will run.  This will affect any Unix derivative, such as Linux and
> OS X.  Additionally, Windows-style line endings anywhere in our codebase,
> apart from Windows-specific files such as ".bat" or ".exe" files, are a
> huge concern.  I'll open a separate thread to discuss this, but I believe
> we can fix it with the addition of a Git config setting in the project
> that forces Unix-style newlines upon checkin, and still allows for
> OS-specific characters on local machines.
>
>
>
>
> Overall, the "standalone" package is broken in this release candidate, so
> we should fix that.  In general, we need to clean up the execution paths of
> SystemML anyway, but for now I say we quickly fix this issue, and then work
> on a real solution (such as the single "SystemML.jar" for all execution
> types/modes that was discussed on one of the recent PRs) for the next
> release.
>
>
>
>
>
> - Mike
>
>
>
>
> --
>
>
>
> Michael W. Dusenberry
> LinkedIn: linkedin.com/in/mikedusenberry
>
>
> GitHub: github.com/dusenberrymw
>
> On Wed, Jan 20, 2016 at 10:44 AM, Niketan Pansare 
> wrote:
>
> > +1
> > Niketan Pansare
> > IBM Almaden Research Center
> > E-mail: npansar At us.ibm.com
> > http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
> > From: dusenberr...@gmail.com
> > To:   dev@systemml.incubator.apache.org
> > Date: 01/20/2016 05:30 AM
> > Subject:  Re: [VOTE] Release SystemML 0.9-incubating (RC1)
> > +1
> > --
> > Mike Dusenberry
> > GitHub: github.com/dusenberrymw
> > LinkedIn: linkedin.com/in/mikedusenberry
> > Sent from my iPhone.
> >> On Jan 20, 2016, at 3:39 AM, Frederick R Reiss 
> > wrote:
> >>
> >>
> >> +1
> >>
> >> Sent from my iPhone
> >>
>  On Jan 20, 2016, at 11:06 AM, Shirish Tatikonda
> >>>  wrote:
> >>>
> >>> +1
> >>>
> >>>
> >>>
> >>> On Tue, Jan 19, 2016 at 9:46 PM, Luciano Resende  >
> >>> wrote:
> >>>
>  Please vote on releasing the following candidate as Apache SystemML
>  version 0.9.0!
> 
>  The vote is open for at least 72 hours and will close on Saturday,
>  January 23 and passes if a majority of at least 3 +1 PMC votes are cast.
> 
>  [ ] +1 Release this package as Apache SystemML 0.9.0
>  [ ] -1 Do not release this package because ...
> 
>  To learn more about Apache SystemML, please see
>  http://systemml.apache.org/
> 
>  The tag to be voted on is v0.9.0-rc1
>  (3e7e5cf6ca697ec247a7dc4e005a7f7b1cb18856)
>  https://github.com/apache/incubator-systemml/tree/3e7e5cf6ca697ec247a7dc4e005a7f7b1cb18856
> 
>  The release files, including signatures, digests, etc. can be found at:
>  https://repository.apache.org/content/repositories/orgapachesystemml-1001/
> 
> 
>  =
>  == Apache Incubator release policy ==
>  =
>  Please find below the guide to release management during incubation:
>  http://incubator.apache.org/guides/releasemanagement.html
> 
>  ===
>  == How can I help test this release? ==
>  ===
>  If you are a SystemML user, you can help us test this release by taking an
>  existing Algorithm or w
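Item 2 in Mike's message above (CRLF line endings in runStandaloneSystemML.sh) is easy to detect and repair locally. Below is a minimal sketch with hypothetical helper names that rewrites a script in place with LF line endings. The lasting, repository-side fix Mike proposes is usually a .gitattributes entry such as `*.sh text eol=lf`, which makes Git store and check out shell scripts with LF regardless of platform.

```python
def has_crlf(data: bytes) -> bool:
    """True if the content contains Windows-style CRLF line endings."""
    return b"\r\n" in data


def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF (and any stray lone CR) line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(path):
    """Rewrite a file in place with LF line endings; return True if it changed."""
    with open(path, "rb") as f:
        original = f.read()
    fixed = to_unix_newlines(original)
    if fixed != original:
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != original
```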

Re: Future Release Package Naming & Structure

2016-01-27 Thread Deron Eriksson
Hi,

I would like a solution that is easy for the end user to understand.

Since the 'cluster/distrib' package seems to contain basically a subset of
the files in the 'standalone' package (standalone has the lib directory of
jars with the systemml jar sitting in lib rather than the parent directory,
it has the 'runStandalone' scripts, it has a few more dml scripts, etc), it
would seem to me that we could get rid of the 'cluster/distrib' package and
just have the 'standalone' package, but remove the '-standalone' naming.
This would allow an end user to just download the single built .tar.gz/.zip
without having to choose which .tar.gz/.zip to download. A README inside
could explain how to use SystemML via standalone mode (currently via
runStandalone scripts using lib dir's contents) or Hadoop batch or Spark
batch. Eventually, if possible, it would be nice if all these options
(standalone, hadoop batch, spark batch) could be run via the bin/systemml
sh and bat scripts (something like "bin/systemml -standalone -f
myalgorithm.dml" when prototyping and something like
"bin/systemml -sparkcluster -f myalgorithm.dml" when distributing on a
Spark cluster).

I would favor leaving the systemml jar file itself alone.

So, in summary,
(1) I like the idea of getting rid of 'cluster/distrib' package and
removing '-standalone' naming of other package. Add README explaining how
to use the remaining package for standalone, hadoop batch, and spark batch.
(2) If possible, see if bin/systemml scripts can be modified to allow
execution of standalone, hadoop batch, and spark batch modes via
bin/systemml so that the user can go to one single place to execute
SystemML (both for prototyping locally and scaling the algorithm execution
on a cluster).
(3) Don't alter the systemml jar file.
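Item (2) is essentially a dispatcher: one entry point that picks the underlying launch command from a mode flag. Below is a minimal sketch of that idea, written in Python for brevity rather than as the actual sh/bat scripts. The -standalone and -sparkcluster flags come from the proposal above, and -hadoop is a hypothetical name for Hadoop batch mode. The commands themselves are assumptions to check against the release: `java -cp "lib/*"` with the org.apache.sysml.api.DMLScript entry point, `hadoop jar SystemML.jar`, and `spark-submit` with `-exec hybrid_spark`.

```python
def build_command(mode, dml_args):
    """Map a proposed bin/systemml mode flag to an argv list (hypothetical)."""
    if mode == "-standalone":
        # Single node: put every jar in lib/ on the classpath and call the
        # DMLScript entry point directly, as the runStandalone scripts do.
        return ["java", "-cp", "lib/*", "org.apache.sysml.api.DMLScript"] + dml_args
    if mode == "-hadoop":
        # Hadoop batch mode, as in the existing "hadoop jar" invocations.
        return ["hadoop", "jar", "SystemML.jar"] + dml_args
    if mode == "-sparkcluster":
        # Spark batch mode; -exec goes before the DML arguments so that it is
        # not swallowed as a script argument after -nvargs.
        return ["spark-submit", "SystemML.jar", "-exec", "hybrid_spark"] + dml_args
    raise ValueError("unknown mode: " + mode)
```

For example, `build_command("-sparkcluster", ["-f", "myalgorithm.dml"])` yields an argv list that could be handed to `subprocess.run`.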

Anyone else have thoughts?

Deron



On Mon, Jan 25, 2016 at 10:49 AM,  wrote:

> Hi all,
>
> A discussion regarding the release package structure started on pull
> request 54 [https://github.com/apache/incubator-systemml/pull/54].
> Currently, we have a "distributed" release for running SystemML on a
> cluster* using Spark or Hadoop, as well as a "standalone" release for
> running SystemML on a single node with Java (no Spark or Hadoop
> installation necessary).  Given this, two questions were raised during the
> discussion:
>
>   1. Should we name our releases as "*-cluster" and "*-standalone", or
> just distinguish the standalone version as "*" and "*-standalone"?
>   2. Should we maintain the two separate releases ("distributed" and
> "standalone"), or should we move to have one single release with one JAR
> that works in all environments and execution modes?
>
> The consensus was that there are pros and cons for each option, and that
> this discussion would be more appropriate for the mailing list.
>
> Thoughts?
>
> Thanks,
> - Mike
>
> * Yes, SystemML can still be run in single node execution mode even on
> Spark or Hadoop.
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>


Re: [VOTE] Release SystemML 0.9.0-incubating (RC3)

2016-01-29 Thread Deron Eriksson
+1

Looks great! All the issues from SYSTEMML-463 appear to be fixed.

During my inspection I noticed that there might be a little redundancy with
a couple of the libraries (see META-INF/DEPENDENCIES in the jar and compare
to contents of the standalone lib directory). Later on, we might want to
modify the pom.xml and/or assembly plugin code to eliminate any duplication
(perhaps remove antlr4-annotations, antlr4-runtime, wink-json4j from the
lib directory).

Also, I saw some test failures when I ran 'mvn verify' locally on the src
package (probably a local configuration issue), but I ran Jenkins on the
Apache branch-0.9 and all tests passed.

Great work everyone!

Deron


On Fri, Jan 29, 2016 at 11:08 AM, Glenn Weidner  wrote:

> Successfully executed the commands from the Quick Start guide (
> http://apache.github.io/incubator-systemml/quick-start-guide.html) using
> systemml-0.9.0-incubating-standalone.zip unzipped on a local machine
> (Windows 7 with 64-bit JDKs 1.7 and 1.8), as shown in the attached console
> results.
> *(See attached file: test_quick_start_windows_results.txt)*
>
> runStandaloneSystemML.bat scripts/algorithms/Univar-Stats.dml -nvargs X=data/haberman.data TYPES=data/types.csv STATS=data/univarOut.mtx
> runStandaloneSystemML.bat scripts/utils/sample.dml -nvargs X=data/haberman.data sv=data/perc.csv O=data/haberman.part ofmt="csv"
> runStandaloneSystemML.bat scripts/utils/splitXY.dml -nvargs X=data/haberman.part/1 y=4 OX=data/haberman.train.data.csv OY=data/haberman.train.labels.csv ofmt="csv"
> runStandaloneSystemML.bat scripts/utils/splitXY.dml -nvargs X=data/haberman.part/2 y=4 OX=data/haberman.test.data.csv OY=data/haberman.test.labels.csv ofmt="csv"
> runStandaloneSystemML.bat scripts/algorithms/l2-svm.dml -nvargs X=data/haberman.train.data.csv Y=data/haberman.train.labels.csv model=data/l2-svm-model.csv fmt="csv" Log=data/l2-svm-log.csv
> runStandaloneSystemML.bat scripts/algorithms/l2-svm-predict.dml -nvargs X=data/haberman.test.data.csv Y=data/haberman.test.labels.csv model=data/l2-svm-model.csv fmt="csv" confusion=data/l2-svm-confusion.csv
>
> --Glenn
>
>
> From: Mike Dusenberry 
> To: "dev@systemml.incubator.apache.org"  >
> Date: 01/29/2016 10:42 AM
> Subject: Re: [VOTE] Release SystemML 0.9.0-incubating (RC3)
> --
>
>
>
> +1
>
> Unarchived and executed basic scripts with all distributions on a local
> machine (OS X 10.10.5, JDK  1.8.0_66, Spark 1.6), and a cluster (CentOS 7,
> 1 + 10 nodes, 256 GB RAM each, Hadoop + YARN + HDFS, Spark 1.4, Spark 1.6).
>
> - Mike
>
> --
>
> Michael W. Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> On Jan 28, 2016, at 5:26 PM, Luciano Resende  wrote:
>
> Please vote on releasing the following candidate as Apache SystemML version
> 0.9.0!
>
> The vote is open for at least 72 hours and passes if a majority of at least
> 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache SystemML 0.9.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache SystemML, please see
> http://systemml.apache.org/
>
> The tag to be voted on is v0.9.0-rc3
> (49528085a9b2ea0babade040db821c8158a57ab5)
>
> https://github.com/apache/incubator-systemml/tree/49528085a9b2ea0babade040db821c8158a57ab5
>
> The release files, including signatures, digests, etc. can be found at:
>
> https://repository.apache.org/content/repositories/orgapachesystemml-1003/
>
> The distribution is also available at:
>
> http://people.apache.org/~lresende/systemml/0.9.0-rc3/
>
> =
> == Apache Incubator release policy ==
> =
> Please find below the guide to release management during incubation:
> http://incubator.apache.org/guides/releasemanagement.html
>
> ===
> == How can I help test this release? ==
> ===
> If you are a SystemML user, you can help us test this release by taking an
> existing Algorithm or workload and running on this release candidate, then
> reporting any regressions.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> -1 votes should only occur for significant stop-ship bugs or legal related
> issues (e.g. wrong license, missing header files, etc). Minor bugs or
> regressions should not block this release.
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
>
>


Re: [RESULT] [VOTE] Release SystemML 0.9.0-incubating (RC3)

2016-02-01 Thread Deron Eriksson
Great! Thank you Luciano.

On Mon, Feb 1, 2016 at 2:47 PM, Luciano Resende 
wrote:

> Vote passed with 5 +1 from Michael W. Dusenberry, Deron Eriksson, Matthias
> Boehm, Frederick Reiss, Luciano Resende and a non-binding +1 from Glenn
> Weidner.
>
> I'll proceed with the IPMC vote now.
>
> On Thu, Jan 28, 2016 at 5:26 PM, Luciano Resende 
> wrote:
>
> > Please vote on releasing the following candidate as Apache SystemML
> > version 0.9.0!
> >
> > The vote is open for at least 72 hours and passes if a majority of at
> > least 3 +1 PMC votes are cast.
> >
> > [ ] +1 Release this package as Apache SystemML 0.9.0
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache SystemML, please see
> > http://systemml.apache.org/
> >
> > The tag to be voted on is v0.9.0-rc3
> > (49528085a9b2ea0babade040db821c8158a57ab5)
> >
> > https://github.com/apache/incubator-systemml/tree/49528085a9b2ea0babade040db821c8158a57ab5
> >
> > The release files, including signatures, digests, etc. can be found at:
> >
> > https://repository.apache.org/content/repositories/orgapachesystemml-1003/
> >
> > The distribution is also available at:
> >
> > http://people.apache.org/~lresende/systemml/0.9.0-rc3/
> >
> > =
> > == Apache Incubator release policy ==
> > =
> > Please find below the guide to release management during incubation:
> > http://incubator.apache.org/guides/releasemanagement.html
> >
> > ===
> > == How can I help test this release? ==
> > ===
> > If you are a SystemML user, you can help us test this release by taking
> > an existing Algorithm or workload and running on this release candidate,
> > then reporting any regressions.
> >
> > 
> > == What justifies a -1 vote for this release? ==
> > 
> > -1 votes should only occur for significant stop-ship bugs or legal
> > related issues (e.g. wrong license, missing header files, etc). Minor
> > bugs or regressions should not block this release.
> >
> >
> > --
> > Luciano Resende
> > http://people.apache.org/~lresende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
>
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>
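The pass condition stated in the vote email ("passes if a majority of at least 3 +1 PMC votes are cast") can be written down as a small check. Below is a minimal sketch, assuming the usual Apache reading of that rule: at least three binding +1 votes and more binding +1 than binding -1 votes, with non-binding votes recorded but not counted. The function name and vote tuple layout are illustrative. The data is the RC3 result reported in Luciano's message.

```python
from typing import Iterable, Tuple

Vote = Tuple[str, int, bool]  # (voter, +1 / 0 / -1, is the vote binding?)


def release_vote_passes(votes: Iterable[Vote]) -> bool:
    """At least three binding +1 votes and more binding +1 than -1 votes."""
    binding = [value for _, value, is_binding in votes if is_binding]
    plus = binding.count(1)
    minus = binding.count(-1)
    return plus >= 3 and plus > minus


# RC3 result as reported: five binding +1 votes and one non-binding +1.
RC3_VOTES = [
    ("Michael W. Dusenberry", 1, True),
    ("Deron Eriksson", 1, True),
    ("Matthias Boehm", 1, True),
    ("Frederick Reiss", 1, True),
    ("Luciano Resende", 1, True),
    ("Glenn Weidner", 1, False),
]
```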


February 2016 SystemML Incubator Podling Report - Draft

2016-02-02 Thread Deron Eriksson
Hi,

The SystemML podling report for February is due tomorrow. Please let me
know if any additions or changes need to be made to the following draft.

Deron



SystemML

SystemML provides declarative large-scale machine learning (ML) that aims at
flexible specification of ML algorithms and automatic generation of hybrid
runtime plans ranging from single node, in-memory computations, to
distributed computations running on Apache Hadoop MapReduce and Apache
Spark.

SystemML has been incubating since 2015-11-02.

Three most important issues to address in the move towards graduation:

  1. Grow SystemML community: increase mailing list activity,
 increase adoption of SystemML for scalable machine learning, encourage
 data scientists to adopt DML and PyDML algorithm scripts, respond to
 user feedback to ensure SystemML meets the requirements of real-world
 situations, write papers, and present talks about SystemML.
  2. Core library improvements, including Apache Spark integration.
  3. Produce a release.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

  Our JIRA site is up and operational.

How has the community developed since the last report?

  Our mailing list in January had 123 messages (7 of which were from
  Jenkins). We have had several useful discussions on the mailing list
  concerning various aspects of SystemML. A data scientist has created JIRAs
  regarding usability issues, and we are addressing these issues. We have
  also had several useful discussions on our JIRA site and in our pull
  request conversations.

How has the project developed since the last report?

  Numerous additions have been made to the project, including core
  functionality, usability improvements, and documentation. The project has
  had 96 commits since January 6. In the last 30 days, 126 new issues have
  been reported on our JIRA site and 68 issues have been resolved. We are in
  the process of producing our first Apache release, version 0.9.0. RC3
  passed the SystemML PMC vote and is currently being voted on by the IPMC.


Date of last release:

  NONE

When were the last committers or PMC members elected?

  NONE


Re: User friendly output of univariate statistics

2016-02-02 Thread Deron Eriksson
Hi Ethan,

I think you make a great point with regards to the readability of the
output from Univar-Stats.dml.

Would outputting the user-friendly results to the console in the format you
describe, while still writing the more mathematical results to a file, be
the behavior you would find most useful? Or would you like the
user-friendly results to be sent to a file as well?

Also, I was wondering, do you think a single user-friendly format is
sufficient, or do you think that data scientists would like (or expect) to
be able to have multiple formats such as you described?

The table format is very interesting. Currently DML has a basic print
statement, but I don't believe it can be used to format data into columns,
as in your table format example. It might be very nice to add a C-style
"printf" statement, which would allow results to be written to the console
in a more columnar format.

Does anyone else have any thoughts?

Deron


On Tue, Feb 2, 2016 at 8:32 AM, Ethan Xu  wrote:

> dml is quite amazing. I was wondering if there is a user friendly (more
> human readable) version of outputs from Univar-Stats.dml? I ran the
> Univar-Stats.dml on my data set that contains 7 variables: two continuous,
> one categorical. The output is a csv file on HDFS that looks like this:
>
> 1 1 10.0
> 2 1 123.0
> 2 7 469.0
> 3 1 122.0
> 3 7 419.0
> 4 1 34.852512104922082
> 4 7 0.40786451178676335
> 5 1 613.6600902369631
> 5 7 1.5322171660886
> 6 1 25.566777079580508
> 6 7 5.54382044429201915
> 7 1 0.219263232610989764
> 7 7 12.14558700418414E-4
> 8 1 0.5323447433694138
> 8 7 1.23151883029726626
> 9 1 0.28352047550156284
> 9 7 23.25049533659206
> 10 1 -0.5348573740280274
> 10 7 2023.294658877635
> 11 1 2.874872545380876E-4
> 11 7 1.874872545380876E-4
> 12 1 6.0017749742760714085
> 12 7 0.00237749742760714085
> 13 1 12.0
> 14 1 30.56066514110724
> 15 2 4.0
>  truncated (numbers randomly modified)
>
> According to the documentation on
>
> http://apache.github.io/incubator-systemml/algorithms-descriptive-statistics.html#univariate-statistics
> , the first column of the matrix represents statistics type (minimum,
> mean, etc.), the second column represents variable ID and the last column
> gives the statistics value.
>
> While the documentation is very clear and the results are consistent with
> outputs of other software like R, I found the format a bit inconvenient
> since I have to refer to the reference table (Table 1 in the aforementioned
> link) to understand the summary statistics.
>
> I understand that the pure numeric matrix format is easy to use as machine
> input for future steps. An additional table that is more human readable
> would be nice since the main purpose of uni-variate statistics is often
> exploratory data analysis and a clear summary is essential.
>
> Suggestions to consider in the readable summary if there's not already
> one:
> 1. Order the rows according to variables (column 2) instead of statistics
> type (column 1), so that summary statistics of the same variable are
> grouped together.
> 2. Use actual statistics labels ("min", "mean", "skewness" etc) instead of
> IDs (1, 2, etc).
> 3. Use actual predictor labels ("age", "gender", etc) instead of IDs (1,2,
> etc).
> 4. Use level labels for categorical predictors ("male", "female", etc)
> instead of IDs (1,2, etc).
> 5. Add counts of cases in each level for categorical variable in addition
> to modes. This gives the distribution information of the variable.
> 6. If the amount of data in the summary is manageable perhaps
> automatically pull the output of Univar-Stats.dml from HDFS to local
> machine and display the readable version on terminal?
>
> So the output could look like:
>
> age min 10
> age max 123
> age range 113
> age mean 60
> ...
> gender female.count 1000
> gender male.count 2000
> gender mode male
> ...
>
> or even a table format like in R:
>
>         age     gender
> min     10      female  1000
> max     123     male    2000
> range   113     mode    male
> mean    60      ...
> ...
> Thanks much,
>
> Ethan Xu
>
>
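Until DML has something like printf, suggestions 1 to 3 can be approximated outside SystemML by post-processing the output file once it is copied off HDFS. Below is a minimal sketch of that approach. The statistic names are assumed from Table 1 of the documentation linked above (IDs 1 to 17), so check them against the docs for your version. The `var_names` mapping is a hypothetical user-supplied input, and Python's `%`-formatting stands in for the C-style printf idea.

```python
from collections import defaultdict

# Statistic IDs assumed from Table 1 of the Univar-Stats documentation linked
# above; check them against the docs for the SystemML version in use.
STAT_NAMES = {
    1: "min", 2: "max", 3: "range", 4: "mean", 5: "variance",
    6: "std.dev", 7: "se.mean", 8: "coef.var", 9: "skewness",
    10: "kurtosis", 11: "se.skewness", 12: "se.kurtosis", 13: "median",
    14: "iqm", 15: "num.categories", 16: "mode", 17: "num.modes",
}


def readable_summary(lines, var_names):
    """Turn "stat_id var_id value" rows into labeled lines grouped by variable.

    var_names is a user-supplied {var_id: label} dict (hypothetical input);
    unknown IDs fall back to generic labels.
    """
    by_var = defaultdict(list)
    for line in lines:
        fields = line.replace(",", " ").split()  # accept space- or comma-separated rows
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        stat_id, var_id, value = int(fields[0]), int(fields[1]), float(fields[2])
        by_var[var_id].append((stat_id, value))

    out = []
    for var_id in sorted(by_var):  # group by variable (suggestion 1)
        label = var_names.get(var_id, "var%d" % var_id)
        for stat_id, value in sorted(by_var[var_id]):
            stat = STAT_NAMES.get(stat_id, "stat%d" % stat_id)
            # printf-style fixed-width columns: variable, statistic, value
            out.append("%-10s %-15s %g" % (label, stat, value))
    return out
```

For example, iterating over `readable_summary(open("univariate-summary.csv"), {1: "age", 2: "gender"})` and printing each row gives output close to the "age min 10" layout above. The file name is illustrative.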


Re: February 2016 SystemML Incubator Podling Report - Draft

2016-02-03 Thread Deron Eriksson
Hi,

Based on useful feedback, I modified the draft slightly.

I have posted the current report version to
https://wiki.apache.org/incubator/February2016 since it is due today. Let
me know if any further edits are required.

Thanks,
Deron


Re: February 2016 SystemML Incubator Podling Report - Draft

2016-02-03 Thread Deron Eriksson
Hi Luciano,


> >   2. Core library improvements, including Apache Spark integration.
> >
>
> This is not a block for graduating. Graduation is about learning the
> "Apache way" and being about community over code, diversity, etc. Nothing
> specific to code stage or quality.
>
>
Very good point about community over code. Do you have a suggestion for a
statement that is more in line with this way of thinking that is not
covered by point 1?

Thanks,
Deron


Re: February 2016 SystemML Incubator Podling Report - Draft

2016-02-03 Thread Deron Eriksson
I see you edited the report on the incubator wiki. Thank you!


On Wed, Feb 3, 2016 at 4:02 PM, Luciano Resende 
wrote:

> For 1, the project should focus on:
>
> 1. Grow SystemML community: Add committers and increase diversity of the
> community.
>
> We would need (active) members of at least three different companies (or
> independents)
>
>
> On Wed, Feb 3, 2016 at 3:50 PM, Deron Eriksson 
> wrote:
>
> > Hi Luciano,
> >
> >
> > > >   2. Core library improvements, including Apache Spark integration.
> > > >
> > >
> > > This is not a block for graduating. Graduation is about learning the
> > > "Apache way" and being about community over code, diversity, etc.
> Nothing
> > > specific to code stage or quality.
> > >
> > >
> > Very good point about community over code. Do you have a suggestion for a
> > statement that is more in line with this way of thinking that is not
> > covered by point 1?
> >
> > Thanks,
> > Deron
> >
>
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: SystemML 0.9.0 IPMC Release Approval is underway

2016-02-03 Thread Deron Eriksson
That's great news!

FYI, PR 62 (https://github.com/apache/incubator-systemml/pull/62) is merged
and closed. It adds Apache headers to the antlr-generated java files by
including the license in the g4 header sections. Additionally, the old
javacc-generated Token class (with no Apache license) should be removed
(and related code updated). Filed
https://issues.apache.org/jira/browse/SYSTEMML-502 for it.

I just created PR 64 (https://github.com/apache/incubator-systemml/pull/64)
for the CSS and JS licenses. I deleted the modernizr library and added
css/js license info to the standalone LICENSE file. Luciano, can you review
PR64 to see if this is being handled properly?

Deron




On Wed, Feb 3, 2016 at 4:14 PM, Luciano Resende 
wrote:

> Just FYI, the IPMC release approval is underway and folks that are not
> subscribed to general@a.o can follow the thread at
>
> https://www.mail-archive.com/general@incubator.apache.org/msg53110.html
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: Fixed hadoop configuration to run dml on large dataset

2016-02-04 Thread Deron Eriksson
Ethan, thank you for posting the fix to the LZO configuration issue.

Deron


On Thu, Feb 4, 2016 at 9:45 AM, Ethan Xu  wrote:

> Thanks to help from the team, we fixed a hadoop classpath configuration so
> dml successfully invokes MapReduce jobs.
>
> I'm carrying the discussion over here in case other people run into the
> same problem.
>
> Problem description
> I was running a simple dml to carry out data transformation on a hadoop
> cluster (hadoop 2.0.0 cdh4.2.1). The script ran successfully on 1GB of data,
> but threw an error on ~30GB of data.
>
> It looks like SystemML didn't need to invoke MapReduce jobs on the small
> data set (console output: 'Number of executed MR Jobs: 0'). On the
> larger data it attempted to run MR and threw the following error:
>
> ...
> Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.lzo.LzoCodec not found
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1493)
>     at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:127)
>     ... 38 more
>
>
> Solution
> The missing class com.hadoop.compression.lzo.LzoCodec is contained in the
> lzo-hadoop jar file:
>
> http://search.maven.org/#search%7Cga%7C1%7Cfc%3A%22com.hadoop.compression.lzo.LzoCodec%22
>
> Installation and configuration information of LZO Parcel can be found
> here:
>
> http://www.cloudera.com/documentation/archive/manager/4-x/4-7-3/Cloudera-Manager-Installation-Guide/cmig_install_LZO_Compression.html
> and this stackoverflow solution:
>
> http://stackoverflow.com/questions/23441142/class-com-hadoop-compression-lzo-lzocodec-not-found-for-spark-on-cdh-5
>
> In my case, it turned out that we had the LZO jar but it was not on the
> classpath. Explicitly pointing to the jar at dml job submission via
> -libjars (https://hadoop.apache.org/docs/r1.2.1/commands_manual.html#jar)
> did the trick:
>
> hadoop jar ./SystemML.jar -libjars /hadoop-lzo-0.4.15.jar \
> -f ./transform.dml -nvargs X=/file-to-transform
>
> Ethan
>
>
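When a ClassNotFoundException like the LzoCodec one appears, the first step is finding which jar on the node actually provides the class, so that it can be passed via -libjars. Below is a minimal sketch that scans candidate jars for the class entry. The helper name is hypothetical. A class `a.b.C` is stored in a jar as the entry `a/b/C.class`.

```python
import zipfile


def jars_containing(class_name, jar_paths):
    """Return the jars (in input order) that contain the given fully qualified class."""
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    hits.append(path)
        except zipfile.BadZipFile:
            continue  # not a valid jar/zip archive; skip it
    return hits
```

For example, `jars_containing("com.hadoop.compression.lzo.LzoCodec", glob.glob("/usr/lib/hadoop/lib/*.jar"))` would list the matching jars. The directory is only an illustration; use whatever lib directories your distribution installs.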


Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-04 Thread Deron Eriksson
Hi Ethan,

Just FYI, I looked at hadoop-common-2.0.0-cdh4.2.1.jar (
https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.0.0-cdh4.2.1/),
since I don't see a 2.0.0-mr1-cdh4.2.1 version, and the
org.apache.hadoop.conf.Configuration class in that jar doesn't appear to
have a getDouble method, so using that version of hadoop-common won't work.

However, the hadoop-common-2.4.1.jar (
https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.4.1/)
does appear to have the getDouble method. Adding that jar to your
classpath might fix your problem, as Shirish pointed out.

It sounds like Matthias may have another fix.

Deron



On Thu, Feb 4, 2016 at 6:40 PM, Matthias Boehm  wrote:

> Well, we have indeed not run on MR v1 for a while now. However, I don't
> want to go so far as to say we don't support it anymore. I'll fix this
> particular issue by tomorrow.
>
> In the next couple of weeks we should run our full performance test suite
> (for broad coverage) over an MR v1 cluster and systematically remove
> unnecessary incompatibilities like this one. Any volunteers?
>
> Regards,
> Matthias
>
>
> From: Ethan Xu 
> To: dev@systemml.incubator.apache.org
> Date: 02/04/2016 05:51 PM
> Subject: Compatibility with MR1 Cloudera cdh4.2.1
> --
>
>
>
> Hello,
>
> I got an error when running the systemML/scripts/Univar-Stats.dml script on
> a hadoop cluster (Cloudera CDH4.2.1) on a 6GB data set. Error message is at
> the bottom of the email. The same script ran fine on a smaller sample
> (several MB) of the same data set, when MR was not invoked.
>
> The main error was java.lang.NoSuchMethodError:
> org.apache.hadoop.mapred.JobConf.getDouble()
> Digging deeper, it looks like the CDH4.2.1 version of MR indeed didn't have
> the JobConf.getDouble() method.
>
> The hadoop-core jar of CDH4.2.1 can be found here:
>
> https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.2.1/
>
> The calling line of SystemML is line 1194 of
>
> https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/mapred/MRJobConfiguration.java
>
> I was wondering, if the finding is accurate, is there a potential fix, or
> does this mean the current version of SystemML is not compatible with
> CDH4.2.1?
>
> Thank you,
>
> Ethan
>
>
> hadoop jar $sysDir/target/SystemML.jar -f
> $sysDir/scripts/algorithms/Univar-Stats.dml -nvargs
> X=$baseDirHDFS/original-coded.csv
> TYPES=$baseDirHDFS/original-coded-type.csv
> STATS=$baseDirHDFS/univariate-summary.csv
>
> 16/02/04 20:35:03 INFO api.DMLScript: BEGIN DML run 02/04/2016 20:35:03
> 16/02/04 20:35:03 INFO api.DMLScript: HADOOP_HOME: null
> 16/02/04 20:35:03 WARN conf.DMLConfig: No default SystemML config file
> (./SystemML-config.xml) found
> 16/02/04 20:35:03 WARN conf.DMLConfig: Using default settings in DMLConfig
> 16/02/04 20:35:04 WARN hops.OptimizerUtils: Auto-disable multi-threaded
> text read for 'text' and 'csv' due to thread contention on JRE < 1.8
> (java.version=1.7.0_71).
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/slf4j-jdk14-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/local/explorys/datagrid/lib/logback-classic-1.0.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> 16/02/04 20:35:07 INFO api.DMLScript: SystemML Statistics:
> Total execution time:0.880 sec.
> Number of executed MR Jobs:0.
>
> 16/02/04 20:35:07 INFO api.DMLScript: END DML run 02/04/2016 20:35:07
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.hadoop.mapred.JobConf.getDouble(Ljava/lang/String;D)D
>     at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1195)
>     at org.apache.sysml.runtime.matrix.mapred.MRJobConfiguration.setUpMultipleInputs(MRJobConfiguration.java:1129)
>     at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:307)
>     at org.apache.sysml.runtime.matrix.CSVReblockMR.runAssignRowIDMRJob(CSVReblockMR.java:289)
>     at org.apache.sysml.runtime.matrix.CSVReblockMR.runJob(CSVReblockMR.java:275)
>     at org.apache.sysml.lops.runtime.RunMRJobs.submitJob(RunMRJobs.java:257)
>     at org.apache.sysml.lops.runtime.RunMRJobs.prepareAndSubmitJob(RunMRJobs.java:143)
>     at org.apache.sysml.runtime.instruc

Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-04 Thread Deron Eriksson
Hi Matthias,

Glad to hear the fix is simple. Mixing jar versions sometimes is not very
fun.

Deron
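Since the failure comes down to which Hadoop jar actually supplies `JobConf` at runtime, it can help to check that directly. The following is a minimal sketch using only standard Java reflection; `MethodProbe`, `hasMethod`, and `loadedFrom` are hypothetical names, not part of SystemML or Hadoop. It reports where a class was loaded from and whether it exposes `getDouble(String, double)`, the signature in Ethan's `NoSuchMethodError`.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic: which jar supplies a class, and does it have a given method? */
public class MethodProbe {

    /**
     * True if the class can be loaded and has a public method (declared or inherited)
     * with this name and exactly these parameter types.
     */
    public static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> c = Class.forName(className, false, MethodProbe.class.getClassLoader());
            c.getMethod(methodName, paramTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Location (jar or directory) the class was loaded from, if the JVM reports one. */
    public static String loadedFrom(String className) {
        try {
            Class<?> c = Class.forName(className, false, MethodProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK bootstrap classes typically report no code source at all.
            return (src == null || src.getLocation() == null)
                    ? "<unknown>"
                    : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not loadable>";
        }
    }

    public static void main(String[] args) {
        String cls = args.length > 0 ? args[0] : "org.apache.hadoop.mapred.JobConf";
        System.out.println(cls + " loaded from: " + loadedFrom(cls));
        System.out.println("has getDouble(String, double): "
                + hasMethod(cls, "getDouble", String.class, double.class));
    }
}
```

Assuming the `hadoop` launcher is on the PATH, one way to run it against the cluster's own classpath is `java -cp "$(hadoop classpath):." MethodProbe`. Given Deron's look at the CDH4.2.1 jars, `false` would be the expected answer there, but that is an expectation rather than a tested result.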


On Thu, Feb 4, 2016 at 11:10 PM, Matthias Boehm  wrote:

> well, let's not mix different hadoop versions in the class path or
> client/server. If I'm not mistaken, cdh 4.x always shipped with MR v1. It's
> a trivial fix for us and will be in the repo tomorrow morning anyway.
> Thanks for catching this issue Ethan.
>
> Regards,
> Matthias
>
>
> From: Deron Eriksson 
> To: dev@systemml.incubator.apache.org
> Date: 02/04/2016 11:04 PM
> Subject: Re: Compatibility with MR1 Cloudera cdh4.2.1
> --
>
>
>
> Hi Ethan,
>
> Just FYI, I looked at hadoop-common-2.0.0-cdh4.2.1.jar (
>
> https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.0.0-cdh4.2.1/
> ),
> since I don't see a 2.0.0-mr1-cdh4.2.1 version, and the
> org.apache.hadoop.conf.Configuration class in that jar doesn't appear to
> have a getDouble method, so using that version of hadoop-common won't work.
>
> However, the hadoop-common-2.4.1.jar (
>
> https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.4.1/
> )
>
> does appear to have the getDouble method. It's possible that adding that
> jar to your classpath may fix your problem, as Shirish pointed out.
>
> It sounds like Matthias may have another fix.
>
> Deron
>
>
>
> On Thu, Feb 4, 2016 at 6:40 PM, Matthias Boehm  wrote:
>
> > well, we have indeed not run on MR v1 for a while now. However, I don't
> > want to go so far as to say we don't support it anymore. I'll fix this
> > particular issue by tomorrow.
> >
> > In the next couple of weeks we should run our full performance testsuite
> > (for broad coverage) over an MR v1 cluster and systematically remove
> > unnecessary incompatibilities like this one. Any volunteers?
> >
> > Regards,
> > Matthias
> >
> >
> > From: Ethan Xu 
> > To: dev@systemml.incubator.apache.org
> > Date: 02/04/2016 05:51 PM
> > Subject: Compatibility with MR1 Cloudera cdh4.2.1
> > --
>
> >
> >
> >
> > Hello,
> >
> > I got an error when running the systemML/scripts/Univar-Stats.dml script on
> > a hadoop cluster (Cloudera CDH4.2.1) on a 6GB data set. Error message is at
> > the bottom of the email. The same script ran fine on a smaller sample
> > (several MB) of the same data set, when MR was not invoked.
> >
> > The main error was java.lang.NoSuchMethodError:
> > org.apache.hadoop.mapred.JobConf.getDouble()
> > Digging deeper, it looks like the CDH4.2.1 version of MR indeed didn't have
> > the JobConf.getDouble() method.
> >
> > The hadoop-core jar of CDH4.2.1 can be found here:
> >
> > https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-core/2.0.0-mr1-cdh4.2.1/
> >
> > The calling line of SystemML is line 1194 of
> >
> > https://github.com/apache/incubator-systemml/blob/master/src/main/java/org/apache/sysml/runtime/matrix/mapred/MRJobConfiguration.java
> >
> > I was wondering, if the finding is accurate, is there a potential fix, or
> > does this mean the current version of SystemML is not compatible with
> > CDH4.2.1?
> >
> > Thank you,
> >
> > Ethan
> >
> >
> > hadoop jar $sysDir/target/SystemML.jar -f
> > $sysDir/scripts/algorithms/Univar-Stats.dml -nvargs
> > X=$baseDirHDFS/original-coded.csv
> > TYPES=$baseDirHDFS/original-coded-type.csv
> > STATS=$baseDirHDFS/univariate-summary.csv
> >
> > 16/02/04 20:35:03 INFO api.DMLScript: BEGIN DML run 02/04/2016 20:35:03
> > 16/02/04 20:35:03 INFO api.DMLScript: HADOOP_HOME: null
> > 16/02/04 20:35:03 WARN conf.DMLConfig: No default SystemML config file
> > (./SystemML-config.xml) found
> > 16/02/04 20:35:03 WARN conf.DMLConfig: Using default settings in DMLConfig
> > 16/02/04 20:35:04 WARN hops.OptimizerUtils: Auto-disable multi-threaded
> > text read

Re: Compatibility with MR1 Cloudera cdh4.2.1

2016-02-05 Thread Deron Eriksson
Hi Ethan,

I believe your safest, cleanest bet is to wait for the fix from Matthias.
When he pushes the fix, you will see it at
https://github.com/apache/incubator-systemml/commits/master. At that point,
you can pull (git pull) the changes from GitHub to your machine and then
rebuild with Maven to pick up the new changes.

Alternatively (this is not really recommended), you might be able to use
-libjars to reference the hadoop-common jar, which should be in your local
Maven repository
(.m2/repository/org/apache/hadoop/hadoop-common/2.4.1/hadoop-common-2.4.1.jar).
However, mixing jar versions usually doesn't work very well (it can lead to
other problems), so waiting for the fix is best.

Deron
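Matthias's actual fix isn't shown in this thread. One general way to stop depending on `getDouble` is to read the raw string value, which older `Configuration` classes also provide through `get(String)`, and parse it in application code. A minimal sketch under that assumption; `ConfCompat`, `getDoubleCompat`, and the `example.*` keys are hypothetical, and the lookup is passed in as a function so the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: read a double-valued property using only a string lookup. */
public class ConfCompat {

    /**
     * Returns the property parsed as a double, or defaultValue if the property is unset.
     * Surrounding whitespace is trimmed; a malformed value throws NumberFormatException.
     */
    public static double getDoubleCompat(Function<String, String> lookup, String key, double defaultValue) {
        String raw = lookup.apply(key);
        if (raw == null) {
            return defaultValue;
        }
        return Double.parseDouble(raw.trim());
    }

    public static void main(String[] args) {
        // A plain map stands in for a Hadoop Configuration in this self-contained example.
        Map<String, String> props = new HashMap<>();
        props.put("example.ratio", " 0.25 ");
        System.out.println(getDoubleCompat(props::get, "example.ratio", 1.0));   // 0.25
        System.out.println(getDoubleCompat(props::get, "example.missing", 1.0)); // 1.0
    }
}
```

With Hadoop available, `getDoubleCompat(conf::get, key, defaultValue)` would route the lookup through `Configuration.get(String)`. The sketch treats only an unset property as "use the default"; a malformed value surfaces as a `NumberFormatException`.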


On Fri, Feb 5, 2016 at 6:47 AM, Ethan Xu  wrote:

> Thank you Shirish and Deron for the suggestions. Looking forward to the fix
> from Matthias!
>
> We are using the hadoop-common shipped with CDH4.2.1, and it's in the
> classpath. I'm a bit hesitant to alter our Hadoop configuration to include
> other versions since other people are using it too.
>
> Not sure if/how the following naive approach affects the program behavior,
> but I did try changing the scope of
>
> <dependency>
>   <groupId>org.apache.hadoop</groupId>
>   <artifactId>hadoop-common</artifactId>
>   <version>${hadoop.version}</version>
> </dependency>
>
> in SystemML's pom.xml from 'provided' to 'compile' and rebuilt the jar
> (21MB), and it threw the same error.
>
> By the way, this is in pom.xml lines 65-72:
> 
>   2.4.1
>   4.3
>   1.4.1
>
> 
> 
> 
>
> Am I supposed to modify the hadoop.version before build?
>
> Thanks again,
>
> Ethan
>
>
>
> On Fri, Feb 5, 2016 at 2:29 AM, Deron Eriksson 
> wrote:
>
> > Hi Matthias,
> >
> > Glad to hear the fix is simple. Mixing jar versions sometimes is not very
> > fun.
> >
> > Deron
> >
> >
> > On Thu, Feb 4, 2016 at 11:10 PM, Matthias Boehm wrote:
> >
> > > well, let's not mix different hadoop versions in the class path or
> > > client/server. If I'm not mistaken, cdh 4.x always shipped with MR v1.
> > > It's a trivial fix for us and will be in the repo tomorrow morning anyway.
> > > Thanks for catching this issue, Ethan.
> > >
> > > Regards,
> > > Matthias
> > >
> > >
> > > From: Deron Eriksson 
> > > To: dev@systemml.incubator.apache.org
> > > Date: 02/04/2016 11:04 PM
> > > Subject: Re: Compatibility with MR1 Cloudera cdh4.2.1
> > > --
> > >
> > >
> > >
> > > Hi Ethan,
> > >
> > > Just FYI, I looked at hadoop-common-2.0.0-cdh4.2.1.jar (
> > >
> > > https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.0.0-cdh4.2.1/
> > > ),
> > > since I don't see a 2.0.0-mr1-cdh4.2.1 version, and the
> > > org.apache.hadoop.conf.Configuration class in that jar doesn't appear to
> > > have a getDouble method, so using that version of hadoop-common won't work.
> > >
> > > However, the hadoop-common-2.4.1.jar (
> > >
> > > https://repository.cloudera.com/artifactory/repo/org/apache/hadoop/hadoop-common/2.4.1/
> > > )
> > >
> > > does appear to have the getDouble method. It's possible that adding that
> > > jar to your classpath may fix your problem, as Shirish pointed out.
> > >
> > > It sounds like Matthias may have another fix.
> > >
> > > Deron
> > >
> > >
> > >
> > > On Thu, Feb 4, 2016 at 6:40 PM, Matthias Boehm wrote:
> > >
> > > > well, we have indeed not run on MR v1 for a while now. However, I don't
> > > > want to go so far as to say we don't support it anymore. I'll fix this
> > > > particular issue by tomorrow.
> > > >
> > > > In the next couple of weeks we should run our full performance testsuite
> > > > (for broad coverage) over an MR v1 cluster and systematically remove
> > > > unnecessary incompatibilities like this one. Any volunteers?
> > > >
> > > > Regards,
> > > > Matthias
> > > >

Re: SystemML Notebook docker image

2016-02-05 Thread Deron Eriksson
Hi Sourav,

That sounds very useful for people who are interested in running SystemML
through Zeppelin. It would be great if you could share that.

I was wondering, what is your opinion of running SystemML through Zeppelin?
Do you think that is a path that is going to be most useful for data
scientists to do exploratory work with SystemML? Is there anything that you
would like to see improved with regards to the MLContext API?

Deron


On Thu, Feb 4, 2016 at 4:01 PM, Sourav Mazumder  wrote:

> Hi,
>
> I have a complete end-to-end modeling and prediction example using Zeppelin,
> including visualization of the predictions using R plots.
>
> I can share that too if it is useful.
>
> Regards,
> Sourav
>
> On Thu, Feb 4, 2016 at 3:20 PM, Luciano Resende 
> wrote:
>
> > I started experimenting with some nice ways to enable data scientists to
> > get started with SystemML with the minimum setup and a pleasant user
> > experience.
> >
> > Following the guide published in the SystemML project documentation page
> > [1], I created a docker image containing the necessary infrastructure for
> > running SystemML in a cluster mode, and also installed and configured
> > Zeppelin with SystemML and the sample notebook available.
> >
> > Please see more detailed instructions to use it at
> >
> > https://github.com/lresende/docker-systemml-notebook
> >
> > If people find this useful, we could move it to the SystemML
> > project itself and start making more scenarios available as sample
> > notebooks.
> >
> > [1]
> > http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html#zeppelin-notebook-example---linear-regression-algorithm
> >
> > [2] https://github.com/lresende/docker-systemml-notebook
> >
> > --
> > Luciano Resende
> > http://people.apache.org/~lresende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
> >
>


Re: Project folder structure

2016-02-09 Thread Deron Eriksson
With regards to DML and PyDML, the test scripts are in src/test/scripts, so
something similar in main (src/main/scripts) may have some good points to
it. I've used Maven for a long time so personally I'm biased towards
Maven's "convention over configuration" paradigm. However, at the same time
it is nice to have the DML/PyDML scripts available at the root.

I'm a bit confused by the dev folder at the root of the project. If it only
contains a single script (dev/release/release-build.sh), maybe dev needs to
be deleted and the script needs to move elsewhere (the root of the project
itself or into another directory, but it doesn't seem like it would be a
good fit in the scripts directory with the DML). Is there some place under
src/ for release-build.sh? Would it fall under src/main/resources?
Somewhere else?

Perhaps we need some terminology to distinguish between DML/PyDML 'scripts'
and other 'scripts' such as release-build.sh.

Deron


On Tue, Feb 9, 2016 at 1:21 PM, Luciano Resende 
wrote:

> On Tue, Feb 9, 2016 at 12:43 PM, Matthias Boehm  wrote:
>
> > -1
> >
> > I don't see a compelling argument for this unnecessary change to a more
> > complex project structure just to follow Spark, which is not directly
> > comparable - both in project size and content. For example, our algorithms
> > are at the same time a library of algorithms as well as samples for how to
> > write new algorithms. From my perspective, our major goal should be
> > "simplicity via minimality" not "simplicity via common structure" because
> > the latter would always require us to stay in sync.
> >
> > Regards,
> > Matthias
> >
> >
> I just don't see why it would make sense to put "notebooks" and "bash
> release scripts" inside scripts, which to me is currently filled with ML
> algorithms in different stages or for different purposes.
>
> I am not keen on either "simplicity via minimality" or "simplicity via common
> structure"... I am keen on what makes sense (and thus drives adoption) for
> someone who is first looking through the SystemML code, particularly
> folks who are already used to some best practices or to other
> projects in the same area.
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Re: SystemML Notebook docker image

2016-02-12 Thread Deron Eriksson
Hi Sourav,

I recently created a "Contributing to SystemML" document that describes
working with Git (
http://apache.github.io/incubator-systemml/contributing-to-systemml.html#systemml-on-github).
The workflow described in the document may help you out.

Also, in case it helps, I found the open-source Pro Git book to be very
useful when I started working with Git (https://progit.org/).

Luciano recently created a samples/zeppelin-notebooks directory, so that is
the place to put your example notebook.

By the way, thank you for your great feedback with regards to SystemML! It
was detailed and excellent.

Deron



On Fri, Feb 12, 2016 at 10:07 AM, Luciano Resende 
wrote:

> You need to git add the folder you are trying to submit as a PR,
> with something like the steps below:
>
> cd samples/zeppelin-notebooks
> mkdir foo
> cp foo.json foo
> git add foo    # adds the folder and its contents
> git commit -a -m "Some message"
> git push origin branch-name
>
>
>
>
> On Fri, Feb 12, 2016 at 9:27 AM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
> > Hi Deron,
> >
> > I created a pull request (#69) for the same.
> >
> > But I'm a little lost on how to add a new folder, and within it the json file
> > for the notebook I want to upload. I tried creating a new folder under
> > incubator-systemml/samples/zeppelin-notebooks/ in my branch but was not able to.
> >
> > When I tried the URL you provided, I found that it lands on a discussion
> > thread.
> >
> > I'm a little new to GitHub. Please bear with my ignorance.
> >
> > Regards,
> > Sourav
> >
> > On Sat, Feb 6, 2016 at 11:55 AM, Luciano Resende 
> > wrote:
> >
> > > On Fri, Feb 5, 2016 at 5:26 PM, Sourav Mazumder <
> > > sourav.mazumde...@gmail.com> wrote:
> > >
> > > > Hi Deron,
> > > >
> > > > I can surely share that. Can I upload it somewhere on the SystemML
> > > > site?
> > > >
> > >
> > > I have created a place in the source code for sample notebooks:
> > >
> > > https://www.mail-archive.com/general@incubator.apache.org/msg53110.html
> > >
> > > Please add a pull request with your notebook when you have a chance;
> > > others, feel free to contribute other examples as well.
> > >
> > > --
> > > Luciano Resende
> > > http://people.apache.org/~lresende
> > > http://twitter.com/lresende1975
> > > http://lresende.blogspot.com/
> > >
> >
>
>
>
> --
> Luciano Resende
> http://people.apache.org/~lresende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>


Turn off parallelism in parfor?

2016-02-12 Thread Deron Eriksson
Hi,

Is it possible to turn off parallelism in a parfor loop using a function
parameter? I tried setting 'par' to 1.

If I print i in a for loop, the results come back in order:
 for (i in 1:5) { print(i) }
gives
1
2
3
4
5

If I print i in a parfor loop (with par=1), the results don't come back in
order:
parfor (i in 1:5, par=1) { print(i) }
gives (something similar to)
4
2
5
1
3

Is par=1 (I tried par=0 too) turning off parallelism but the print output
isn't reflecting that?

Deron
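Whether `par=1` actually serializes a SystemML `parfor` isn't answered in this thread. As a general illustration only (plain Java, not SystemML's parfor runtime), the sketch shows why printing from inside parallel iterations carries no ordering guarantee, and a common remedy: have each iteration produce a result, then emit the results in index order. `OrderedParallelLoop` and `run` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical illustration: parallel iterations, results reported in index order. */
public class OrderedParallelLoop {

    /** Runs one task per index in [from, to] on `workers` threads; returns results in index order. */
    public static List<String> run(int from, int to, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = from; i <= to; i++) {
                final int idx = i;
                // With workers > 1, tasks may complete in any order.
                futures.add(pool.submit(() -> "iteration " + idx));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // collecting by submission index restores 1..n order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        run(1, 5, 4).forEach(System.out::println);
    }
}
```

Collecting per-index results decouples output order from execution order, so the printed sequence is the same regardless of the worker count.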


Re: SystemML Notebook docker image

2016-02-15 Thread Deron Eriksson
Hi Sourav,

I see Luciano just offered to handle it if you create a JIRA with the
notebook attached. Thank you Luciano.

In case you want to do pull requests in the future, there are a couple
things you might want to check.

1) If you try "git remote -v", the remote repos should be similar to the
following:

$ git remote -v
origin https://github.com/sourav-mazumder/incubator-systemml.git (fetch)
origin https://github.com/sourav-mazumder/incubator-systemml.git (push)
upstream https://github.com/apache/incubator-systemml.git (fetch)
upstream https://github.com/apache/incubator-systemml.git (push)

(Once you push the commits on your local branch to the 'origin' repo, you
can then do a pull request asking to add the updates on that branch in your
'origin' repo to the master branch of the main project.)

2) If those remote settings (git remote -v) are correct, you can also check
and make sure that the git email address you are using locally is
present in your GitHub account (at https://github.com/settings/emails).

Deron


On Mon, Feb 15, 2016 at 11:54 AM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:

> Hi Deron,
>
> I tried following your instruction. But while pushing my changes back I'm
> getting the error saying
>
> git push --set-upstream origin systemml-zeppelin-spark-example
> Username for 'https://github.com': sourav.mazumde...@gmail.com
> Password for 'https://sourav.mazumde...@gmail.com@github.com
> remote: Permission to apache/incubator-systemml.git denied to
> sourav-mazumder.
>
> Alternatively, if someone can create a folder for my notebook ("2BCHR4T1Q")
> under incubator-systemml/samples/zeppelin-notebooks
> (https://github.com/sourav-mazumder/incubator-systemml/tree/master/samples/zeppelin-notebooks),
> then I can add a file under that folder without needing to push the
> branch back. Right now the main problem with this approach is that I'm not
> able to create a new folder under incubator-systemml/samples/zeppelin-notebooks/.
>
>
> Regards,
> Sourav
>
> On Fri, Feb 12, 2016 at 10:19 AM, Deron Eriksson 
> wrote:
>
> > Hi Sourav,
> >
> > I recently created a "Contributing to SystemML" document that describes
> > working with Git (
> > http://apache.github.io/incubator-systemml/contributing-to-systemml.html#systemml-on-github
> > ).
> > The workflow described in the document may help you out.
> >
> > Also, in case it helps, I found the open-source Pro Git book to be very
> > useful when I started working with Git (https://progit.org/).
> >
> > Luciano recently created a samples/zeppelin-notebooks directory, so that is
> > the place to put your example notebook.
> >
> > By the way, thank you for your great feedback with regards to SystemML! It
> > was detailed and excellent.
> >
> > Deron
> >
> >
> >
> > On Fri, Feb 12, 2016 at 10:07 AM, Luciano Resende 
> > wrote:
> >
> > > You need to git add the folder you are trying to submit as a PR,
> > > with something like the steps below:
> > >
> > > cd samples/zeppelin-notebooks
> > > mkdir foo
> > > cp foo.json foo
> > > git add foo    # adds the folder and its contents
> > > git commit -a -m "Some message"
> > > git push origin branch-name
> > >
> > >
> > >
> > >
> > > On Fri, Feb 12, 2016 at 9:27 AM, Sourav Mazumder <
> > > sourav.mazumde...@gmail.com> wrote:
> > >
> > > > Hi Deron,
> > > >
> > > > I created a pull request (#69) for the same.
> > > >
> > > > But I'm a little lost on how to add a new folder, and within it the json
> > > > file for the notebook I want to upload. I tried creating a new folder
> > > > under incubator-systemml/samples/zeppelin-notebooks/ in my branch but was
> > > > not able to.
> > > >
> > > > When I tried the url you provided 

Re: SystemML Notebook docker image

2016-02-15 Thread Deron Eriksson
Hi Sourav,

Actually, looking at the error message again, perhaps the issue was that
you need to use your github username for authentication (sourav-mazumder)
rather than the email address.

Thank you for filing the JIRA.

Deron


On Mon, Feb 15, 2016 at 12:16 PM, Deron Eriksson 
wrote:

> Hi Sourav,
>
> I see Luciano just offered to handle it if you create a JIRA with the
> notebook attached. Thank you Luciano.
>
> In case you want to do pull requests in the future, there are a couple
> things you might want to check.
>
> 1) If you try "git remote -v", the remote repos should be similar to the
> following:
>
> $ git remote -v
> origin https://github.com/sourav-mazumder/incubator-systemml.git (fetch)
> origin https://github.com/sourav-mazumder/incubator-systemml.git (push)
> upstream https://github.com/apache/incubator-systemml.git (fetch)
> upstream https://github.com/apache/incubator-systemml.git (push)
>
> (Once you push the commits on your local branch to the 'origin' repo, you
> can then do a pull request asking to add the updates on that branch in your
> 'origin' repo to the master branch of the main project.)
>
> 2) If those remote settings (git remote -v) are correct, you can also
> check and make sure that the git email address you are using locally is
> present in your GitHub account (at https://github.com/settings/emails).
>
> Deron
>
>
> On Mon, Feb 15, 2016 at 11:54 AM, Sourav Mazumder <
> sourav.mazumde...@gmail.com> wrote:
>
>> Hi Deron,
>>
>> I tried following your instruction. But while pushing my changes back I'm
>> getting the error saying
>>
>> git push --set-upstream origin systemml-zeppelin-spark-example
>> Username for 'https://github.com': sourav.mazumde...@gmail.com
>> Password for 'https://sourav.mazumde...@gmail.com@github.com
>> remote: Permission to apache/incubator-systemml.git denied to
>> sourav-mazumder.
>>
>> Alternatively, if someone can create a folder for my notebook ("2BCHR4T1Q")
>> under incubator-systemml/samples/zeppelin-notebooks
>> (https://github.com/sourav-mazumder/incubator-systemml/tree/master/samples/zeppelin-notebooks),
>> then I can add a file under that folder without needing to push the
>> branch back. Right now the main problem with this approach is that I'm not
>> able to create a new folder under incubator-systemml/samples/zeppelin-notebooks/.
>>
>>
>> Regards,
>> Sourav
>>
>> On Fri, Feb 12, 2016 at 10:19 AM, Deron Eriksson wrote:
>>
>> > Hi Sourav,
>> >
>> > I recently created a "Contributing to SystemML" document that describes
>> > working with Git (
>> > http://apache.github.io/incubator-systemml/contributing-to-systemml.html#systemml-on-github
>> > ).
>> > The workflow described in the document may help you out.
>> >
>> > Also, in case it helps, I found the open-source Pro Git book to be very
>> > useful when I started working with Git (https://progit.org/).
>> >
>> > Luciano recently created a samples/zeppelin-notebooks directory, so that is
>> > the place to put your example notebook.
>> >
>> > By the way, thank you for your great feedback with regards to SystemML! It
>> > was detailed and excellent.
>> >
>> > Deron
>> >
>> >
>> >
>> > On Fri, Feb 12, 2016 at 10:07 AM, Luciano Resende wrote:
>> >
>> > > You need to git add the folder you are trying to submit as a PR,
>> > > with something like the steps below:
>> > >
>> > > cd samples/zeppelin-notebooks
>> > > mkdir foo
>> > > cp foo.json foo
>> > > git add foo    # adds the folder and its contents
>> > > git commit -a -m "Some message"
>> > > git push origin branch-name
>> > >
>> > >
>> > >
>> > >
>> > > On Fri, Feb 12, 2016 at 9

Matrix Market format with metadata file

2016-02-15 Thread Deron Eriksson
Hi,

The Matrix Market coordinate format contains # rows, # columns, and #
non-zero values as metadata near the top of a matrix data file.

If I write a matrix in mm format using SystemML, no metadata file is
created since the metadata is stored within the data file.

However, when reading a matrix with mm format, I can supply a metadata
file, even though metadata exists in the matrix data file. Is there any
reason for this, or should this be disallowed since the metadata file is
redundant and can cause confusion, since metadata values can then be
specified in two places, which then brings up the question, "which metadata
value should be used"?

Deron


Re: Matrix Market format with metadata file

2016-02-15 Thread Deron Eriksson
Hi,

I have a question with regards to text vs mm. Isn't the mm coordinate
format identical to the text format but the mm data file happens to include
the metadata line for rows, cols, and nnzs, so shouldn't they scale the
same since the text row values (i,j,v) correspond to the mm rows?

If we have the following MM:
%%MatrixMarket matrix coordinate real general
4 3 6
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0

The corresponding text format (with accompanying metadata file) is:
1 1 1.0
1 2 2.0
1 3 3.0
3 1 7.0
3 2 8.0
3 3 9.0

So aren't these formats essentially the same?

Deron
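Following the example in this message, converting a Matrix Market coordinate file to text-cell form amounts to pulling the size line out into separate metadata and keeping the `i j v` entry lines unchanged. A minimal sketch, assuming a single-file `coordinate real general` input; `MatrixMarketToTextCell` and `Result` are hypothetical names, and the code does not write a real SystemML `.mtd` file, it only returns the rows, cols, and nnz values such a file would carry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical converter: Matrix Market coordinate lines -> dimensions + "i j v" text-cell lines. */
public class MatrixMarketToTextCell {

    public static final class Result {
        public final long rows, cols, nnz;
        public final List<String> cells;

        Result(long rows, long cols, long nnz, List<String> cells) {
            this.rows = rows;
            this.cols = cols;
            this.nnz = nnz;
            this.cells = cells;
        }
    }

    public static Result convert(List<String> mmLines) {
        int idx = 0;
        // Skip the %%MatrixMarket banner and any % comment lines.
        while (idx < mmLines.size() && mmLines.get(idx).startsWith("%")) {
            idx++;
        }
        if (idx == mmLines.size()) {
            throw new IllegalArgumentException("missing size line");
        }
        // Size line: rows, columns, number of stored entries.
        String[] size = mmLines.get(idx++).trim().split("\\s+");
        long rows = Long.parseLong(size[0]);
        long cols = Long.parseLong(size[1]);
        long nnz = Long.parseLong(size[2]);

        List<String> cells = new ArrayList<>();
        for (; idx < mmLines.size(); idx++) {
            String line = mmLines.get(idx).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] t = line.split("\\s+");
            long i = Long.parseLong(t[0]);
            long j = Long.parseLong(t[1]);
            if (i < 1 || i > rows || j < 1 || j > cols) {
                throw new IllegalArgumentException("cell out of bounds: " + line);
            }
            cells.add(line); // each entry line is already an "i j v" text-cell line
        }
        // Reject a size line that disagrees with the number of entries actually present.
        if (cells.size() != nnz) {
            throw new IllegalArgumentException("size line says " + nnz + " entries, found " + cells.size());
        }
        return new Result(rows, cols, nnz, cells);
    }

    public static void main(String[] args) {
        Result r = convert(Arrays.asList(
                "%%MatrixMarket matrix coordinate real general",
                "4 3 6",
                "1 1 1.0", "1 2 2.0", "1 3 3.0",
                "3 1 7.0", "3 2 8.0", "3 3 9.0"));
        System.out.println("rows=" + r.rows + " cols=" + r.cols + " nnz=" + r.nnz);
        r.cells.forEach(System.out::println);
    }
}
```

The bounds and count checks are in the spirit of Matthias's remark that inconsistent embedded metadata raises an error; the checks SystemML itself performs may differ.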


On Mon, Feb 15, 2016 at 3:56 PM, Matthias Boehm  wrote:

> The metadata file is still useful for determining the format. In the case of
> matrix market, errors will be raised if the included metadata is inconsistent.
> So no, we should not disallow specifying the metadata. In general, we
> recommend using text (textcell) instead of mm (matrix market) anyway for
> scalability reasons.
>
> Regards,
> Matthias
>
>
> From: Deron Eriksson 
> To: dev@systemml.incubator.apache.org
> Date: 02/15/2016 03:45 PM
> Subject: Matrix Market format with metadata file
> --
>
>
>
> Hi,
>
> The Matrix Market coordinate format contains # rows, # columns, and #
> non-zero values as metadata near the top of a matrix data file.
>
> If I write a matrix in mm format using SystemML, no metadata file is
> created since the metadata is stored within the data file.
>
> However, when reading a matrix with mm format, I can supply a metadata
> file, even though metadata exists in the matrix data file. Is there any
> reason for this, or should this be disallowed since the metadata file is
> redundant and can cause confusion, since metadata values can then be
> specified in two places, which then brings up the question, "which metadata
> value should be used"?
>
> Deron
>
>
>


Re: Matrix Market format with metadata file

2016-02-15 Thread Deron Eriksson
Thank you, Shirish. That makes sense. I'll update the docs to include this
information.

Deron


On Mon, Feb 15, 2016 at 4:26 PM, Shirish Tatikonda <
shirish.tatiko...@gmail.com> wrote:

> Both "mm" and "text" formats are identical except for a couple of
> differences:
>
> 1) for "mm", the matrix metadata is included in the first two lines, whereas
> for "text", the metadata is present in the associated .mtd file;
> 2) "mm" data must be in a single file (i.e., no *part* files), whereas "text"
> data can span multiple *part* files (like any other file on HDFS).
>
> The support for "mm" was added mainly for the purpose of
> importing/exporting data in the format that R likes.
>
> Shirish
>
> On Mon, Feb 15, 2016 at 4:17 PM, Deron Eriksson 
> wrote:
>
> > Hi,
> >
> > I have a question with regards to text vs mm. Isn't the mm coordinate
> > format identical to the text format but the mm data file happens to include
> > the metadata line for rows, cols, and nnzs, so shouldn't they scale the
> > same since the text row values (i,j,v) correspond to the mm rows?
> >
> > If we have the following MM:
> > %%MatrixMarket matrix coordinate real general
> > 4 3 6
> > 1 1 1.0
> > 1 2 2.0
> > 1 3 3.0
> > 3 1 7.0
> > 3 2 8.0
> > 3 3 9.0
> >
> > The corresponding text format (with accompanying metadata file) is:
> > 1 1 1.0
> > 1 2 2.0
> > 1 3 3.0
> > 3 1 7.0
> > 3 2 8.0
> > 3 3 9.0
> >
> > So aren't these formats essentially the same?
> >
> > Deron
> >
> >
> > On Mon, Feb 15, 2016 at 3:56 PM, Matthias Boehm wrote:
> >
> > > The metadata file is still useful for determining the format. In the case
> > > of matrix market, errors will be raised if the included metadata is
> > > inconsistent. So no, we should not disallow specifying the metadata. In
> > > general, we recommend using text (textcell) instead of mm (matrix market)
> > > anyway for scalability reasons.
> > >
> > > Regards,
> > > Matthias
> > >
> > >
> > > From: Deron Eriksson 
> > > To: dev@systemml.incubator.apache.org
> > > Date: 02/15/2016 03:45 PM
> > > Subject: Matrix Market format with metadata file
> > > --
> > >
> > >
> > >
> > > Hi,
> > >
> > > The Matrix Market coordinate format contains # rows, # columns, and #
> > > non-zero values as metadata near the top of a matrix data file.
> > >
> > > If I write a matrix in mm format using SystemML, no metadata file is
> > > created since the metadata is stored within the data file.
> > >
> > > However, when reading a matrix with mm format, I can supply a metadata
> > > file, even though metadata exists in the matrix data file. Is there any
> > > reason for this, or should this be disallowed since the metadata file is
> > > redundant and can cause confusion, since metadata values can then be
> > > specified in two places, which then brings up the question, "which
> > > metadata value should be used"?
> > >
> > > Deron
> > >
> > >
> > >
> >
>
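To make the comparison concrete, here is a minimal Java sketch of turning the Matrix Market example above into the text (ijv) form plus a metadata string. It assumes the general coordinate layout shown (a banner, one "rows cols nnz" line, then i j v entries). The class name MmToText and the JSON field names are illustrative assumptions loosely modeled on .mtd files, not SystemML's actual reader or writer. The entry lines pass through unchanged; only the header moves out into the metadata.

```java
import java.util.ArrayList;
import java.util.List;

public class MmToText {

    /** Result of the conversion: the ijv body lines plus a metadata JSON string. */
    public static final class Converted {
        public final List<String> textLines;
        public final String mtdJson;

        Converted(List<String> textLines, String mtdJson) {
            this.textLines = textLines;
            this.mtdJson = mtdJson;
        }
    }

    /**
     * Converts Matrix Market coordinate lines into text (ijv) lines plus a
     * small JSON metadata string. The JSON field names are an assumption.
     */
    public static Converted convert(List<String> mmLines) {
        List<String> body = new ArrayList<>();
        String[] dims = null;
        for (String raw : mmLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip the %%MatrixMarket banner and comment lines
            }
            if (dims == null) {
                dims = line.split("\\s+"); // first non-comment line: rows cols nnz
                continue;
            }
            body.add(line); // i j v entries are copied unchanged
        }
        if (dims == null || dims.length != 3) {
            throw new IllegalArgumentException("missing 'rows cols nnz' header line");
        }
        String mtd = String.format(
            "{\"rows\": %s, \"cols\": %s, \"nnz\": %s, \"format\": \"text\"}",
            dims[0], dims[1], dims[2]);
        return new Converted(body, mtd);
    }

    public static void main(String[] args) {
        List<String> mm = List.of(
            "%%MatrixMarket matrix coordinate real general",
            "4 3 6",
            "1 1 1.0", "1 2 2.0", "1 3 3.0",
            "3 1 7.0", "3 2 8.0", "3 3 9.0");
        Converted c = convert(mm);
        c.textLines.forEach(System.out::println);
        System.out.println(c.mtdJson);
    }
}
```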


Re: Matrix Market format with metadata file

2016-02-15 Thread Deron Eriksson
Very good eye! I used "m = matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4,
cols=3)" to generate the mm file, so the 4th row did indeed contain all
zeros.


On Mon, Feb 15, 2016 at 4:50 PM, Shirish Tatikonda <
shirish.tatiko...@gmail.com> wrote:

> Btw (just to be precise), in your example "mm" file the metadata is
> "4 3 6", but the non-zero values that follow only go up to row number 3.
> So either it was a typo or the 4th row contains all zeros.
>
>
>
> On Mon, Feb 15, 2016 at 4:26 PM, Shirish Tatikonda <
> shirish.tatiko...@gmail.com> wrote:
>
> > The "mm" and "text" formats are identical except for a couple of
> > differences:
> >
> > 1) for "mm": the matrix metadata is included in the first two lines; and
> > for "text": the metadata is present in the associated .mtd file
> > 2) "mm" data must be in a single file (i.e., no *part* files), whereas
> > "text" data can span multiple *part* files (like any other file on HDFS).
> >
> > The support for "mm" was added mainly for the purpose of
> > importing/exporting data in the format that R likes.
> >
> > Shirish
> >
>
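Matthias's point that inconsistent metadata raises errors, and Shirish's observation about the missing 4th row, can both be illustrated with a small consistency check. This is a hypothetical Java sketch (MmHeaderCheck is not a SystemML class). It assumes the general coordinate format and treats every listed entry as a nonzero. It verifies that the entry count matches the header nnz, that indices fall within the declared dimensions, and that optionally supplied metadata dimensions agree with the header, and it returns the rows with no entries. For the matrix built with matrix("1 2 3 0 0 0 7 8 9 0 0 0", rows=4, cols=3), that is rows 2 and 4, not only row 4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class MmHeaderCheck {

    /**
     * Validates Matrix Market coordinate lines against their own
     * "rows cols nnz" header and, optionally, against dimensions supplied
     * separately (e.g. from a metadata file; pass null to skip). Returns the
     * 1-based indices of rows with no listed entries (all-zero rows).
     */
    public static List<Long> check(List<String> mmLines, Long mtdRows, Long mtdCols) {
        long rows = -1, cols = -1, nnz = -1, seen = 0;
        TreeSet<Long> rowsWithEntries = new TreeSet<>();
        for (String raw : mmLines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // banner and comment lines
            }
            String[] t = line.split("\\s+");
            if (rows < 0) { // first non-comment line is the size header
                rows = Long.parseLong(t[0]);
                cols = Long.parseLong(t[1]);
                nnz = Long.parseLong(t[2]);
                continue;
            }
            long i = Long.parseLong(t[0]);
            long j = Long.parseLong(t[1]);
            if (i < 1 || i > rows || j < 1 || j > cols) {
                throw new IllegalStateException(
                    "entry (" + i + "," + j + ") is outside " + rows + "x" + cols);
            }
            rowsWithEntries.add(i);
            seen++;
        }
        if (rows < 0) {
            throw new IllegalStateException("missing 'rows cols nnz' header");
        }
        if (seen != nnz) {
            throw new IllegalStateException("header says nnz=" + nnz + " but found " + seen);
        }
        if ((mtdRows != null && mtdRows != rows) || (mtdCols != null && mtdCols != cols)) {
            throw new IllegalStateException("metadata file disagrees with the mm header");
        }
        List<Long> emptyRows = new ArrayList<>();
        for (long r = 1; r <= rows; r++) {
            if (!rowsWithEntries.contains(r)) {
                emptyRows.add(r);
            }
        }
        return emptyRows;
    }

    public static void main(String[] args) {
        List<String> mm = List.of(
            "%%MatrixMarket matrix coordinate real general",
            "4 3 6",
            "1 1 1.0", "1 2 2.0", "1 3 3.0",
            "3 1 7.0", "3 2 8.0", "3 3 9.0");
        System.out.println(check(mm, 4L, 3L)); // [2, 4]
    }
}
```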


"sparse" metadata attribute default value for writing csv

2016-02-16 Thread Deron Eriksson
Hi,

Right now the DML Language Ref states that the default value for the
"sparse" metadata attribute (for the write function) is true.
However, DEFAULT_DELIM_SPARSE in DataExpression is false.

Which value is 'correct'? I assume the docs should be updated to reflect
the code?

Deron


DMLRuntimeException

2016-02-29 Thread Deron Eriksson
Hi,

Can we change DMLRuntimeException to extend RuntimeException rather than
DMLException?

1) The javadocs say DMLRuntimeException is equivalent to RuntimeException.
RuntimeException is an unchecked exception.
2) However, DMLRuntimeException extends DMLException, which extends
Exception, which is a checked exception.

So, this means that currently DMLRuntimeException in this example needs a
throws clause on the method (or the throw needs to be wrapped in a
try/catch).

public void example() throws DMLRuntimeException {
    throw new DMLRuntimeException("Example");
}

If it's a RuntimeException, it should really be:

public void example() {
    throw new DMLRuntimeException("Example");
}
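The distinction above is what Java calls checked versus unchecked exceptions. The self-contained sketch below uses two hypothetical stand-in classes (not SystemML's) to show the compile-time difference. The subclass of Exception must be declared or caught. The subclass of RuntimeException needs neither.

```java
public class ExceptionKinds {

    // Mirrors the current design: a subclass of Exception is checked,
    // so a method that throws it must declare it (or catch it).
    static class CheckedDmlError extends Exception {
        CheckedDmlError(String msg) { super(msg); }
    }

    // Mirrors the proposed design: a subclass of RuntimeException is
    // unchecked, so no throws clause is required.
    static class UncheckedDmlError extends RuntimeException {
        UncheckedDmlError(String msg) { super(msg); }
    }

    static void checkedExample() throws CheckedDmlError {
        throw new CheckedDmlError("Example");
    }

    static void uncheckedExample() { // compiles without a throws clause
        throw new UncheckedDmlError("Example");
    }

    public static void main(String[] args) {
        try {
            checkedExample(); // the compiler forces this try/catch (or a throws on main)
        } catch (CheckedDmlError e) {
            System.out.println("checked: " + e.getMessage());
        }
        try {
            uncheckedExample(); // catching is optional here
        } catch (UncheckedDmlError e) {
            System.out.println("unchecked: " + e.getMessage());
        }
    }
}
```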

Deron


"Scalable Machine Learning with Apache SystemML" talk tonight

2016-03-09 Thread Deron Eriksson
Berthold Reinwald is giving a talk tonight (Wednesday, March 9, 2016) at
6:30pm at the IBM Spark Technology Center in San Francisco about "Scalable
Machine Learning with Apache SystemML."

Information about the talk can be found here:
http://www.meetup.com/SF-Spark-and-Friends/events/229165430/

If you would like to attend the meetup, please join the meetup by 3pm today.

Deron


Remove any of these classes?

2016-04-04 Thread Deron Eriksson
Hi,

If I search for classes/enums that aren't referenced by other classes in
SystemML, I get the following partial list:

org.apache.sysml.hops.Hop.ExtBuiltInOp
org.apache.sysml.parser.Expression.AggOp
org.apache.sysml.parser.Expression.ExtBuiltinFunctionOp
org.apache.sysml.parser.Expression.ReorgOp
org.apache.sysml.runtime.controlprogram.parfor.opt.MemoTable
org.apache.sysml.runtime.functionobjects.MaxIndex
org.apache.sysml.runtime.functionobjects.MinIndex
org.apache.sysml.runtime.instructions.cp.FileObject
org.apache.sysml.runtime.instructions.spark.data.CountLinesInfo
org.apache.sysml.runtime.instructions.spark.functions.ConvertColumnRDDToBinaryBlock
org.apache.sysml.runtime.instructions.spark.functions.ConvertMLLibBlocksToBinaryBlocks
org.apache.sysml.runtime.instructions.spark.functions.ConvertTextLineToBinaryCellFunction
org.apache.sysml.runtime.instructions.spark.functions.ConvertTextToString
org.apache.sysml.runtime.instructions.spark.functions.FindMatrixBlockFromMatrixIndexes
org.apache.sysml.runtime.instructions.spark.functions.GetMLLibBlocks
org.apache.sysml.runtime.instructions.spark.functions.LastCellInMatrixBlock
org.apache.sysml.runtime.instructions.spark.functions.MatrixVectorBinaryOpFunction
org.apache.sysml.runtime.io.FrameReaderFactory
org.apache.sysml.runtime.io.FrameWriterFactory
org.apache.sysml.runtime.io.WriterMatrixMarketParallel
org.apache.sysml.runtime.matrix.data.PoissonRandomMatrixGenerator
org.apache.sysml.runtime.matrix.data.TaggedInt
org.apache.sysml.runtime.matrix.data.TaggedPartialBlock
org.apache.sysml.runtime.matrix.data.WeightedPairToSortInputConverter
org.apache.sysml.runtime.matrix.data.RuntimeDataFormat
org.apache.sysml.runtime.matrix.mapred.CachedMap
org.apache.sysml.runtime.matrix.mapred.MMCJMRCombiner
org.apache.sysml.runtime.matrix.mapred.MMCJMRReducer
org.apache.sysml.runtime.matrix.sort.CompactDoubleIntInputFormat
org.apache.sysml.runtime.util.BinaryBlockInputFormat
org.apache.sysml.runtime.util.RandN
org.apache.sysml.utils.AppException
org.apache.sysml.utils.Timer
org.apache.sysml.yarn.ropt.ResourceOptimizerCPMigration

Can any of these be deleted?
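For reference, a search like the one described above can be sketched as a pass over a class reference graph. This is a hypothetical Java sketch: it assumes the graph (each class name mapped to the names it references) has already been extracted by some other tool, and it reports classes that no other class references. Such a search can produce false positives for entry points and for classes loaded by name at runtime (e.g. through reflection or job configuration), so each hit is a candidate for review rather than proof of dead code.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class UnreferencedClasses {

    /**
     * Given a map from each class name to the class names it references,
     * returns the classes that no other class references. Self-references
     * (e.g. a class using its own static helper) are ignored.
     */
    public static Set<String> find(Map<String, Set<String>> references) {
        Set<String> referencedByOthers = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : references.entrySet()) {
            for (String target : e.getValue()) {
                if (!target.equals(e.getKey())) {
                    referencedByOthers.add(target);
                }
            }
        }
        Set<String> result = new TreeSet<>(references.keySet());
        result.removeAll(referencedByOthers);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> refs = Map.of(
            "Main", Set.of("Parser"),
            "Parser", Set.of("Parser"),
            "Timer", Set.of());
        System.out.println(find(refs)); // [Main, Timer]
    }
}
```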

Deron


Re: Discussion SYSTEMML-593 MLContext Redesign

2016-04-04 Thread Deron Eriksson
Hi Matthias,


On Sat, Apr 2, 2016 at 9:34 PM, Matthias Boehm  wrote:

>
> Also rather than introducing another exception class, couldn't we just
> reuse DMLException by making it an unchecked exception?
>

Thank you for the feedback. My thoughts for creating an unchecked
MLContextException center around a few issues:

1) The exception happens through the MLContext API, so to me the naming
makes sense.

2) Placement of the new exception is in an org.apache.sysml.api.mlcontext
package, along with all other primary classes that the user interacts with
when using the MLContext API. This standardization is useful for things
such as Javadocs (all API classes are under this single package) and code
maintainability (all primary API classes are centralized in a single place).

3) This exception would handle errors that happen at the MLContext level
and additionally be a wrapper for deeper levels of exceptions.

4) The naming of DMLException can be confusing to an end user. If a user
runs PyDML and gets a DMLException, this is a little disconcerting.

5) It seems to me the cleanest, least-invasive solution that gives the user
the simplest experience is a new runtime exception. Given the
interrelations of existing exception classes, almost all of which are
actually checked exceptions, a new runtime exception avoids having to
retrofit existing exception handling. In the MLContext API, you just catch
all existing checked exceptions and rethrow them as MLContextExceptions, or
generate new MLContextExceptions where appropriate.

Throwable
|-Exception
  |-DMLException (323 matches)
    |-AppException (not used)
    |-DMLDebuggerException (13 matches)
    |-DMLRuntimeException (3,689 matches)
      |-CacheException (82 matches)
        |-CacheIOException (26 matches)
        |-CacheStatusException (18 matches)
      |-DMLScriptException (39 matches)
    |-DMLUnsupportedOperationException (1,068 matches)
    |-HopsException (748 matches)
    |-LanguageException (390 matches)
    |-LopsException (596 matches)
  |-ParseException (320 matches)
    |-DMLParseException (50 matches)
  |-RuntimeException
    |-PackageRuntimeException (70 matches)

Since there are already 15 existing exception classes, adding an additional
exception class for a programmatic API doesn't seem to me to be an
unnecessary proliferation of classes.
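The catch-and-rethrow approach from point 5 might look roughly like this. It is a self-contained sketch: MLContextException follows the name proposed here, while LegacyCheckedException and the compile/execute methods are hypothetical stand-ins for existing checked exceptions and internal calls. The API method declares no throws clause, and the original exception is kept as the cause so no diagnostic information is lost.

```java
public class WrapAndRethrow {

    // Stand-in for an existing checked exception such as DMLException
    // (hypothetical here so the example compiles on its own).
    static class LegacyCheckedException extends Exception {
        LegacyCheckedException(String msg) { super(msg); }
    }

    // Sketch of the proposed unchecked API-level exception.
    public static class MLContextException extends RuntimeException {
        public MLContextException(String msg) { super(msg); }
        public MLContextException(String msg, Throwable cause) { super(msg, cause); }
    }

    // Internal code keeps its existing checked signature.
    static String compile(String script) throws LegacyCheckedException {
        if (script.isEmpty()) {
            throw new LegacyCheckedException("empty script");
        }
        return "compiled:" + script;
    }

    // API-level method: no throws clause; checked failures are wrapped
    // and the original exception is preserved as the cause.
    public static String execute(String script) {
        if (script == null) {
            throw new MLContextException("script must not be null"); // new API-level error
        }
        try {
            return compile(script);
        } catch (LegacyCheckedException e) {
            throw new MLContextException("Error executing script", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(execute("print('hi');"));
        try {
            execute("");
        } catch (MLContextException e) {
            System.out.println(e.getMessage() + " <- " + e.getCause().getMessage());
        }
    }
}
```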

Deron


Re: Discussion SYSTEMML-593 MLContext Redesign

2016-04-04 Thread Deron Eriksson
Hi Matthias,

On Sat, Apr 2, 2016 at 9:34 PM, Matthias Boehm  wrote:

>
> (1) Simplicity: Given that the primary use case of MLContext calls a script
> exactly once, I'm wondering if the separation into Script, ScriptFactory,
> ScriptExecutor and MLContext adds unnecessary complexity by requiring more
> code to set up. It would be great to see old vs new examples side by side.
>
>

With the introduction of notebooks, we might need to be calling scripts
more than once in the near future.


The current API is very procedural. For example, there are 24 execute() and
executeScript() methods on the MLContext class. Encapsulating concepts such
as 'Script' can bring a significant amount of power, flexibility, and
actually simplicity to the system. These 24 execute methods can be replaced
by a single execute(Script script) method on MLContext. We can also include
a second execute method, execute(Script script, ScriptExecutor
scriptExecutor) so that an advanced user can easily modify the execution
steps. The existing API is not extensible in this way. A user would need to
modify the source code of the MLContext
executeUsingSimplifiedCompilationChain method to do this, whereas with
this redesign a user can subclass ScriptExecutor and modify it as needed.
If a user wants to use the default execution, the user can just call the
MLContext execute(Script script) method.


This is an example of simplifying the end user experience (replace 24
execute methods with 1 execute method). However, it is also nice to add
extensibility (via a second execute method that takes a ScriptExecutor) for
advanced cases. A normal user probably would not really care about
ScriptExecutor and wouldn't need to use it.
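A minimal sketch of the proposed split between a default execute(Script) and an extensible execute(Script, ScriptExecutor), assuming a template-method style ScriptExecutor whose steps can be overridden individually. All classes here are hypothetical stand-ins nested in one file, and the parse/compile/run step names are illustrative, not the actual compilation chain.

```java
import java.util.ArrayList;
import java.util.List;

public class MLContextSketch {

    // Minimal stand-in for the proposed Script: just the script text here.
    public static class Script {
        final String text;
        public Script(String text) { this.text = text; }
    }

    // Proposed ScriptExecutor: the execution steps are separate protected
    // methods, so an advanced user can override one step in a subclass.
    public static class ScriptExecutor {
        protected final List<String> log = new ArrayList<>();

        public final List<String> execute(Script script) {
            parse(script);
            compile(script);
            run(script);
            return log;
        }

        protected void parse(Script s) { log.add("parse"); }
        protected void compile(Script s) { log.add("compile"); }
        protected void run(Script s) { log.add("run"); }
    }

    // Proposed MLContext entry points: one default path, one extensible path.
    public static class MLContext {
        public List<String> execute(Script script) {
            return execute(script, new ScriptExecutor());
        }

        public List<String> execute(Script script, ScriptExecutor executor) {
            return executor.execute(script);
        }
    }

    public static void main(String[] args) {
        MLContext ml = new MLContext();
        Script script = new Script("print('hi');");
        System.out.println(ml.execute(script)); // [parse, compile, run]

        // Advanced use: swap in a custom compile step without touching MLContext.
        ScriptExecutor custom = new ScriptExecutor() {
            @Override
            protected void compile(Script s) {
                log.add("custom-compile");
            }
        };
        System.out.println(ml.execute(script, custom)); // [parse, custom-compile, run]
    }
}
```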


As another potential benefit of Script objects, we could conceivably do
things like encapsulate a namespace into the script object, or have a
script object encapsulate a list of other script objects, etc.


As a further example of simplicity, the current 18 registerInput methods on
MLContext can be replaced by a single Script in(String str, Object obj)
method. Chaining method calls by returning a Script object from the method
call (script.in("$a", 5).in("$b", true)...) is a convenient way of setting
multiple inputs in a single line of code consisting of multiple method
calls.
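The chained in() style described above can be sketched with a method that records an input and returns the script itself. FluentScript is a hypothetical stand-in for the proposed Script class. It stores inputs in insertion order.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class FluentScript {
    private final String text;
    // Insertion-ordered map of input name -> value.
    private final Map<String, Object> inputs = new LinkedHashMap<>();

    public FluentScript(String text) { this.text = text; }

    /** Registers one input and returns this script so calls can be chained. */
    public FluentScript in(String name, Object value) {
        inputs.put(name, value);
        return this;
    }

    public String getText() { return text; }

    public Map<String, Object> getInputs() {
        return Collections.unmodifiableMap(inputs);
    }

    public static void main(String[] args) {
        FluentScript script = new FluentScript("print($a); print($b);")
            .in("$a", 5)
            .in("$b", true);
        System.out.println(script.getInputs()); // {$a=5, $b=true}
    }
}
```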


In terms of user interaction, at its most basic, a script consists of some
text (string), has a type (DML or PYDML), can have inputs, and can have
outputs. If things are broken down any further and we lose the
encapsulation, then we have a procedural API with lots of registerInputs
and executes, for example.


WRT factories, they can be a very useful design pattern. Here's an example
of creating a DML script from a String, a DML script from a file, a PYDML
script from a URL, and a PYDML script from an input stream. That's four
scripts from four different sources in four lines of code, with no
boilerplate or boolean flags.


Script scr1 = ScriptFactory.createDMLScriptFromString("print('hi');");
Script scr2 = ScriptFactory.createDMLScriptFromFile("ex.dml");
Script scr3 = ScriptFactory.createPYDMLScriptFromUrl("http://example.com/alg.pydml");
Script scr4 = ScriptFactory.createPYDMLScriptFromInputStream(myInputStream);


As for other code examples, we can replace MLContext's 18 registerInputs:


registerInput(String, DataFrame)
registerInput(String, DataFrame, boolean)
registerInput(String, JavaPairRDD, String, long, long, long, FileFormatProperties)
registerInput(String, JavaPairRDD, long, long)
registerInput(String, JavaPairRDD, long, long, int, int)
registerInput(String, JavaPairRDD, long, long, int, int, long)
registerInput(String, JavaPairRDD, MatrixCharacteristics)
registerInput(String, JavaRDD, String)
registerInput(String, JavaRDD, String, boolean, String, boolean, double)
registerInput(String, JavaRDD, String, boolean, String, boolean, double, long, long, long)
registerInput(String, JavaRDD, String, long, long)
registerInput(String, JavaRDD, String, long, long, long)
registerInput(String, MLMatrix)
registerInput(String, RDD, String)
registerInput(String, RDD, String, boolean, String, boolean, double)
registerInput(String, RDD, String, boolean, String, boolean, double, long, long, long)
registerInput(String, RDD, String, long, long)
registerInput(String, RDD, String, long, long, long)


with one Script in(String, Object) method (and perhaps another in() method
for a Scala immutable Map).
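Collapsing the registerInput overloads into one in(String, Object) method implies routing on the runtime type of the value. The Java sketch below shows that idea with only JDK types (scalars, and a double[][] stand-in for a matrix); InputDispatch and its categories are hypothetical. How the extra arguments some overloads take today (dimensions, format strings, flags) would be passed is left open here; one assumption would be a separate metadata argument.

```java
public class InputDispatch {

    /**
     * Hypothetical routing for a single in(String, Object) method: inspect
     * the runtime type once instead of offering one overload per type.
     */
    public static String classify(Object value) {
        if (value instanceof Number || value instanceof Boolean || value instanceof String) {
            return "scalar";
        }
        if (value instanceof double[][]) {
            return "matrix";
        }
        // A real implementation would also test for DataFrame, JavaRDD,
        // JavaPairRDD, RDD, MLMatrix, ... here; they are left out so this
        // sketch compiles without Spark on the classpath.
        throw new IllegalArgumentException("Unsupported input type: "
            + (value == null ? "null" : value.getClass().getName()));
    }

    public static void main(String[] args) {
        System.out.println(classify(5));                       // scalar
        System.out.println(classify(true));                    // scalar
        System.out.println(classify(new double[][] {{1, 2}})); // matrix
    }
}
```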


We can replace the existing 24 execute and executeScript methods on
MLContext:


execute(String)
execute(String, ArrayList)
execute(String, ArrayList, boolean)
execute(String, ArrayList, boolean, String)
execute(String, ArrayList, String)
execute(String, boolean)
execute(String, boolean, String)
execute(String, HashMap)
execute(String, HashMap, boolean)
execute(String, HashMap, boolean, String)
execute(String, HashMap, String)
execute(String, Map)
execute(String, Map, boolean)
execute(String, String)
execute(String, String[])
execut

Change commons-math3 to compile scope?

2016-04-06 Thread Deron Eriksson
WRT SYSTEMML-489 (https://issues.apache.org/jira/browse/SYSTEMML-489),
support for older Hadoop clusters was assisted by changing the
commons-math3 pom.xml scope from "provided" to "compile". Can we update the
project to reflect this, or are there any reasons not to?

Deron


Re: Change commons-math3 to compile scope?

2016-04-06 Thread Deron Eriksson
Adding it to a troubleshooting guide sounds like a reasonable approach.
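For such a troubleshooting entry, one quick way to confirm whether commons-math3 is visible to the JVM is to try loading one of its classes. This Java sketch is an assumption about what such a check could look like, not an existing SystemML utility; org.apache.commons.math3.random.RandomDataGenerator is used only as a probe class from commons-math3.

```java
public class ClasspathCheck {

    /** Returns true if the named class can be loaded by this class's loader. */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe class from commons-math3 (assumption: any of its classes would do).
        String probe = "org.apache.commons.math3.random.RandomDataGenerator";
        if (isOnClasspath(probe)) {
            System.out.println("commons-math3 found");
        } else {
            System.out.println("commons-math3 missing: add the jar to the classpath");
        }
    }
}
```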

Deron

On Wed, Apr 6, 2016 at 2:44 PM, Matthias Boehm  wrote:

> Well, we don't want to end up with multiple commons-math versions on the
> classpath, and newer Hadoop distributions include it by default. So I would
> rather add it to a troubleshooting guide. Alternatively, we could have two
> different 'distribution' profiles for releases.
>
> Regards,
> Matthias
>


Re: Fw: Updating documentation for notebook

2016-04-11 Thread Deron Eriksson
Hi Niketan,

I think a separate section for Notebooks is a great idea since, as you
point out, they are hidden under the MLContext section. Also, I really like
the idea of making it as easy as possible for a new user to try out
SystemML in a Notebook. Very good points.

Tutorials for all the algorithms using real-world data would be fantastic.
I would also like to see single-line algorithm invocations (possibly
with generated data) that could be copy/pasted and would work with no
modifications needed by the user. This would probably mean either including
small sets of example data in the project, or allowing the reading of data
from URLs.

It would be nice to take something like these 5 commands:
---
$ wget https://raw.githubusercontent.com/apache/incubator-systemml/master/scripts/datagen/genRandData4Univariate.dml
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f genRandData4Univariate.dml -exec hybrid_spark -args 100 100 10 1 2 3 4 uni.mtx
$ echo '1' > uni-types.csv
$ echo '{"rows": 1, "cols": 1, "format": "csv"}' > uni-types.csv.mtd
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f $SYSTEMML_HOME/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=uni.mtx TYPES=uni-types.csv STATS=uni-stats.txt
---
and reduce this to 1 command (in the documentation) that the user can
copy/paste and the algorithm runs without any additional work needed by the
user:
---
$ $SPARK_HOME/bin/spark-submit $SYSTEMML_HOME/SystemML.jar -f $SYSTEMML_HOME/algorithms/Univar-Stats.dml -exec hybrid_spark -nvargs X=http://www.example.com/uni.mtx TYPES=http://www.example.com/uni-types.csv STATS=uni-stats.txt
---
If we had this for each of the main algorithms, this would give the users
working examples to start with, which is easier than trying to figure out
this kind of thing by reading the comments in the DML algorithm files.

Deron


On Fri, Apr 8, 2016 at 4:51 PM, Niketan Pansare  wrote:

> Hi all,
>
> As per Luciano's suggestion, I have created a PR with the bluemix/datascientist
> tutorial and have flagged it with "Please DONOT push this PR until the
> discussion on dev mailing list is complete." :)
>
> Also, I apologize for the incorrect indentation in my last email. Here is
> another attempt:
> - How do you want to try SystemML?
>   + Notebook on cloud
>     * Bluemix
>       + Zeppelin
>         - Using Python Kernel
>           + Learn how to write DML program (something along the lines of
>             http://apache.github.io/incubator-systemml/beginners-guide-to-dml-and-pydml.html)
>           + Try out pre-packaged algorithms on real-world dataset
>             * Linear Regression
>             * GLM
>             * ALS
>             * ...
>           + Learn how to pass RDD/DataFrame to SystemML
>           + Learn how to use SystemML as MLPipeline estimator/transformer
>           + Learn how to use SystemML with existing Python packages
>         - Using Scala Kernel
>           + ... similar to Python kernel
>         - Using DML Kernel
>           + Learn how to write DML program
>       + Jupyter
>         - Using Python Kernel
>         - Using Scala Kernel
>         - Using DML Kernel
>     * Data scientist's work bench
>     * Databricks cloud
>     * ...
>   + Notebook on laptop/cluster
>     * Zeppelin
>     * Jupyter
>   + Laptop
>     * Run SystemML as Standalone jar:
>       http://apache.github.io/incubator-systemml/quick-start-guide.html
>     * Embed SystemML into other Java program:
>       http://apache.github.io/incubator-systemml/jmlc.html
>     * Debug a DML script:
>       http://apache.github.io/incubator-systemml/debugger-guide.html
>     * Spark local mode
>   + Spark Cluster
>     * Batch invocation
>     * Using Spark REPL
>       + Learn how to pass RDD/DataFrame to SystemML
>       + Learn how to use SystemML as MLPipeline estimator/transformer
>     * Using PySpark REPL
>       + Learn how to pass RDD/DataFrame to SystemML
>       + Learn how to use SystemML as MLPipeline estimator/transformer
>   + Hadoop Cluster
>   + Spark Cluster on EC2
>
> Thanks,
>
> Niketan Pansare
> IBM Almaden Research Center
> E-mail: npansar At us.ibm.com
> http://researcher.watson.ibm.com/researcher/view.php?person=us-npansar
>
> - Forwarded by Niketan Pansare/Almaden/IBM on 04/08/2016 04:48 PM
> -
>
>
>
> Subject: Fw: Updating documentation for notebook
> From: Niketan Pansare/Almaden/IBM
> To: dev
> Date: 04/08/2016 01:11 PM
>
> Hi all,
>
> Here are a few suggestions to get things started:
> 1. Have a "Quick Start" (or "Get Started") button beside "Get SystemML"
> on http://systemml.apache.org/.
>
> 2. Then the user can go through the following questionnaire/bulleted list,
> which points people to the appropriate link:
> - How do you want to try SystemML?
> + Notebook on cloud
> * Bluemix
> + Zeppelin
> - Using Python Kernel
> + Learn how to write DML program (something along the lines of
> http://apache.github.io/incubator-systemml/beginn
