Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
Will make a fix to the site. Thanks all. On Fri, Aug 24, 2018 at 9:41 AM Reynold Xin wrote: > I wrote both the Spark one and later the Databricks one. The latter had a > lot more work put into it and is consistent with the Spark style. I'd just > use the second one and link to it, if possible.

Re: [MLlib][Test] Smoke and Metamorphic Testing of MLlib

2018-08-23 Thread Matei Zaharia
Yes, that makes sense, but just to be clear, using the same seed does *not* imply that the algorithm should produce “equivalent” results by some definition of equivalent if you change the input data. For example, in SGD, the random seed might be used to select the next minibatch of examples,
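
Editorial illustration (not part of the original message): a minimal sketch of the point about seeds and minibatch selection, assuming a toy SGD loop that estimates a mean. The function name `sgd_mean`, the data, and all parameters are hypothetical and are not MLlib code. Reordering the examples stands in for "changing the input data": the seed fixes which positions are drawn, not which examples sit at those positions.

```python
import random


def sgd_mean(data, seed, batch_size=2, n_steps=50, lr=0.1):
    """Toy minibatch SGD that estimates the mean of `data` (hypothetical example)."""
    rng = random.Random(seed)  # the seed only controls which positions are sampled
    w = 0.0
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)            # minibatch chosen by the RNG
        grad = sum(w - x for x in batch) / batch_size   # gradient of 0.5 * (w - x)^2
        w -= lr * grad
    return w


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
reordered = list(reversed(data))  # same examples, different order

same_run_a = sgd_mean(data, seed=42)
same_run_b = sgd_mean(data, seed=42)
reordered_run = sgd_mean(reordered, seed=42)

print("same data, same seed:     ", same_run_a, same_run_b)
print("reordered data, same seed:", reordered_run)
```

With identical data and seed the two runs match exactly. With the reordered data the same seed draws the same positions but different examples, so the estimate generally differs even though both runs approximate the same mean.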

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Reynold Xin
I wrote both the Spark one and later the Databricks one. The latter had a lot more work put into it and is consistent with the Spark style. I'd just use the second one and link to it, if possible. On Thu, Aug 23, 2018 at 6:38 PM Hyukjin Kwon wrote: > If you meant "Code Style Guide", many of

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
If you meant "Code Style Guide", many of them are missing and it refers to https://docs.scala-lang.org/style/, not https://github.com/databricks/scala-style-guide (please correct me if I misunderstood). For instance, I recently advised 2 indents for line continuation, but I found it's actually not in the

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Sean Owen
Seems OK to me. The style is pretty standard Scala style anyway. My guidance is always to follow the code around the code you're changing. On Thu, Aug 23, 2018 at 8:14 PM Hyukjin Kwon wrote: > Hi all, > > I usually follow https://github.com/databricks/scala-style-guide for > Apache Spark's

Re: Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Matei Zaharia
There’s already a code style guide listed on http://spark.apache.org/contributing.html. Maybe it’s the same? We should decide which one we actually want and update this page if it’s wrong. Matei > On Aug 23, 2018, at 6:33 PM, Sean Owen wrote: > > Seems OK to me. The style is pretty standard

Porting or explicitly linking project style in Apache Spark based on https://github.com/databricks/scala-style-guide

2018-08-23 Thread Hyukjin Kwon
Hi all, I usually follow https://github.com/databricks/scala-style-guide for Apache Spark's style, which is generally the same as Spark's code base in practice. The thing is, we don't explicitly mention this within Apache Spark as far as I can tell. Can we explicitly mention this or

Re: [MLlib][Test] Smoke and Metamorphic Testing of MLlib

2018-08-23 Thread Erik Erlandson
Behaviors at this level of detail, across different ML implementations, are highly unlikely to ever align exactly. Statistically small changes in logic, such as "<" versus "<=", or differences in random number generators, etc, (to say nothing of different implementation languages) will accumulate

Re: [MLlib][Test] Smoke and Metamorphic Testing of MLlib

2018-08-23 Thread Steffen Herbold
Dear Matei, thanks for the feedback! I used the setSeed option for all randomized classifiers and always used the same seeds for training with the hope that this deals with the non-determinism. I did not run any significance tests, because I was considering this from a functional