Re: Java Code Style Template

2017-06-15 Thread Nakul Jindal
1) +1
2) -1 --- I agree with Mike; also, it seems like there is more setup needed
in various editors to preserve trailing whitespace than to get rid of it.
3) Soft -1 --- We can try to remain consistent with the Java spec. Since
point 1 (switch indentation) follows the Java spec, maybe we could adhere
to it completely. But I have no strong opinion.

-Nakul

On Thu, Jun 15, 2017 at 10:51 AM,  wrote:

> Thanks for bringing this up, Matthias.  Here are my thoughts on those
> items:
>
> 1) +1
> 2) -1 -- I would prefer no extraneous whitespace at the end of any lines.
> 3) I don't have a strong opinion here.  I would suggest also including
> thoughts on the positioning of curly braces, i.e. on the same line
> (Java-style) versus on a new line (C-style).
>
> Looking forward to thoughts from others as well.
>
> --
>
> Mike Dusenberry
> GitHub: github.com/dusenberrymw
> LinkedIn: linkedin.com/in/mikedusenberry
>
> Sent from my iPhone.
>
>
> > On Jun 10, 2017, at 10:39 PM, Matthias Boehm  wrote:
> >
> > thanks Deron for preparing the initial version of our code style templates
> > - these templates are certainly very useful for consistency. I finally got
> > a chance to have a look and would like to propose some minor changes for
> > the Java code style.
> >
> > 1) Switch statement indentation: Right now, the template does not use
> > indentation for case labels in switch statements. However, in our current
> > code base, we use indentation for almost all switch statements because it
> > is much easier to read. I know there is some controversy about this but
> > even the Java spec uses indentation for switch statements [1].
> >
> > 2) Empty line indentation: We also generally indent empty lines to align
> > with the previous line, which makes the code faster to navigate and edit.
> >
> > 3) White spaces / new lines around braces: I would prefer to use (a) white
> > spaces after opening and before closing parentheses, but not before opening
> > parentheses (in for, while, if, switch, etc.), and (b) insert newlines
> > before keywords such as else and catch.
> >
> > Finally, we also need to discuss whether we should auto-format the existing
> > code. In my opinion, auto-formatted code usually looks quite poor.
> > Hence, I would restrict any auto-formatting to files whose formatting is
> > really off (e.g., files with space indentation).
> >
> > Regards,
> > Matthias
> >
> > [1] http://docs.oracle.com/javase/specs/jls/se8/html/jls-14.html#jls-14.11
>
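For concreteness, the combined proposal (point 1: indented case labels; point 3a: spaces after opening and before closing parentheses, but none before opening parentheses; point 3b: newlines before else/catch) could look like this hypothetical Java snippet — the class and names are illustrative, not from the actual code base:

```java
// Hypothetical illustration of the proposed Java style:
// indented case labels, spaces inside (but not before) parentheses,
// and a newline before the 'else' keyword.
public class StyleExample {
    public static String classify( int x ) {
        switch( x ) {
            case 0:
                return "zero";
            default:
                if( x > 0 ) {
                    return "positive";
                }
                else {
                    return "negative";
                }
        }
    }

    public static void main( String[] args ) {
        System.out.println( classify(3) );  // prints "positive"
    }
}
```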


Re: Rework inter-procedural analysis

2017-06-15 Thread Deron Eriksson
Documentation is an essential part of the software development process,
especially when working on a complex system in a collaborative environment
where we want to encourage community growth.



On Wed, Jun 14, 2017 at 10:11 PM, Nakul Jindal  wrote:

> Thank you Matthias for agreeing to do this!
>
> "Having a very verbose doc quickly gets outdated" is a problem many
> projects deal with. We can have the community comment on PRs that change
> those parts if the documentation does not reflect the submitted change.
> As a starting point, since you are most familiar with the component, very
> verbose documentation is VERY welcome :)
> Especially for a complicated component like this one. It would greatly help
> existing and new members. (Unless someone on the mailing list feels
> otherwise.)
>
> -Nakul
>
>
>
>
> On Wed, Jun 14, 2017 at 9:04 PM, Matthias Boehm  wrote:
>
> > sure - I'll try to add some documentation of IPA, probably directly inlined
> > into the code. Unfortunately, too verbose dev documentation quickly gets
> > outdated because nobody updates it - let's see if we find the sweet spot
> > that works for the project.
> >
> > Regards,
> > Matthias
> >
> >
> > On Wed, Jun 14, 2017 at 4:15 PM,  wrote:
> >
> > > Agreed.  More documentation, especially within the optimizer portion of
> > > the engine, is quite useful.  Given that a large number of our bugs and
> > > performance issues stem from this area, it would be good for it to be clean
> > > and well documented so that future bug searches/fixes can be completed in a
> > > more expedient manner.
> > >
> > > --
> > >
> > > Mike Dusenberry
> > > GitHub: github.com/dusenberrymw
> > > LinkedIn: linkedin.com/in/mikedusenberry
> > >
> > > Sent from my iPhone.
> > >
> > >
> > > > On Jun 14, 2017, at 8:51 AM, Nakul Jindal  wrote:
> > > >
> > > > Hi Matthias,
> > > >
> > > > If it's not too much trouble, could you please create a design document
> > > > for this change?
> > > > This will help the rest of the contributors work on this component as well.
> > > >
> > > > Thanks,
> > > > Nakul
> > > >
> > > >
> > > > On Wed, Jun 14, 2017 at 12:00 AM, Matthias Boehm <mboe...@googlemail.com>
> > > > wrote:
> > > >
> > > >> just a quick heads up: in the next couple of days, I'll rework our existing
> > > >> inter-procedural analysis (IPA) in order to (1) create well-defined IPA
> > > >> passes, (2) reuse function call graphs across multiple rounds of IPA, and
> > > >> (3) introduce new IPA passes such as fine-grained literal propagation and
> > > >> replacements as well as inlining of functions with control structures. This
> > > >> will help improve the performance and debugging of scripts with complex
> > > >> function call patterns. However, since this is a rather disruptive change,
> > > >> we might temporarily experience some compiler issues - if that happens,
> > > >> please file anything you encounter against SYSTEMML-1668.
> > > >>
> > > >> Regards,
> > > >> Matthias
> > > >>
> > >
> >
>



-- 
Deron Eriksson
Spark Technology Center
http://www.spark.tc/
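The call-graph reuse discussed in this thread can be sketched generically. The following Java snippet is an illustrative sketch only, NOT SystemML's actual implementation — the class and method names are hypothetical:

```java
import java.util.*;

// Generic sketch of a function call graph for inter-procedural analysis:
// record caller->callee edges once, then reuse the graph across multiple
// IPA passes (e.g., to find functions called from a single site, which are
// typical inlining candidates).
public class FunctionCallGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();
    private final Map<String, Integer> calleeCounts = new HashMap<>();

    public void addCall( String caller, String callee ) {
        edges.computeIfAbsent( caller, k -> new LinkedHashSet<>() ).add( callee );
        calleeCounts.merge( callee, 1, Integer::sum );
    }

    // Functions called from exactly one call site.
    public Set<String> singleCallFunctions() {
        Set<String> result = new LinkedHashSet<>();
        for( Map.Entry<String, Integer> e : calleeCounts.entrySet() )
            if( e.getValue() == 1 )
                result.add( e.getKey() );
        return result;
    }
}
```

Building the graph once and querying it from each pass avoids re-scanning the whole program in every IPA round.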


Re: Unexpected Executor Crash

2017-06-15 Thread Glenn Weidner

Hi Anthony,

Could you retry your scenario without the '-exec spark' option?  By
default, SystemML will run in hybrid_spark mode, which is more efficient.

Thanks,
Glenn




From:   Anthony Thomas 
To: dev@systemml.apache.org
Date:   06/15/2017 09:50 AM
Subject: Unexpected Executor Crash



Hi SystemML Developers,

I'm running the following simple DML script under SystemML 0.14:

M = read('/scratch/M5.csv')
N = read('/scratch/M5.csv')
MN = M %*% N
if (1 == 1) {
  print(as.scalar(MN[1,1]))
}

The matrix M is square and about 5GB on disk (stored in HDFS). I am
submitting the script to a 2 node spark cluster where each physical machine
has 30GB of RAM. I am using the following command to submit the job:

$SPARK_HOME/bin/spark-submit --driver-memory=5G --executor-memory=25G
--conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128
--verbose --conf
spark.serializer=org.apache.spark.serializer.KryoSerializer
$SYSTEMML_HOME/SystemML.jar -f example.dml -exec spark -explain runtime

However, I consistently run into errors like:

ERROR TaskSchedulerImpl: Lost executor 1 on 172.31.3.116: Remote RPC client
disassociated. Likely due to containers exceeding thresholds, or network
issues. Check driver logs for WARN messages.

and the job eventually aborts. Consulting the output of executors shows
they are crashing with OutOfMemory exceptions. Even if one executor needed
to store M, N, and MN in memory simultaneously, it seems like there should be
enough memory, so I'm unsure why the executor is crashing. In addition, I
was under the impression that Spark would spill to disk if there was
insufficient memory. I've tried various combinations of
increasing/decreasing the number of executor cores (from 1 to 8), using
more/fewer executors, increasing/decreasing Spark's memoryFraction, and
increasing/decreasing Spark's default parallelism all without success. Can
anyone offer any advice or suggestions to debug this issue further? I'm not
a very experienced Spark user so it's very possible I haven't configured
something correctly. Please let me know if you'd like any further
information.

Best,

Anthony Thomas
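As a rough sanity check on the memory question above (hypothetical numbers, not derived from the actual job): a dense n x n matrix of doubles needs about 8*n^2 bytes, so holding M, N, and the product MN at once already costs a multiple of the on-disk size, before JVM object overheads, serialization buffers, and Spark's storage/shuffle fractions are accounted for:

```java
// Back-of-the-envelope estimate (illustrative n, not from the actual data):
// a dense n x n matrix of doubles needs 8 * n^2 bytes. With n = 25000 each
// matrix is ~5 GB, so M, N, and MN together need ~15 GB, leaving little
// headroom in a 25 GB executor once JVM and Spark overheads are added.
public class MemEstimate {
    public static long denseBytes( long n ) {
        return 8L * n * n;
    }

    public static void main( String[] args ) {
        long n = 25000;
        long perMatrix = denseBytes( n );    // 5,000,000,000 bytes
        long threeMatrices = 3 * perMatrix;  // 15,000,000,000 bytes
        System.out.println( perMatrix + " " + threeMatrices );
    }
}
```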



