[GitHub] spark pull request #20759: Added description of checkpointInterval parameter

2018-07-27 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/20759#discussion_r205838324
  
--- Diff: docs/ml-collaborative-filtering.md ---
@@ -19,6 +19,7 @@ by a small set of latent factors that can be used to predict missing entries.
 algorithm to learn these latent factors. The implementation in `spark.ml` has the
 following parameters:
 
+* *checkpointInterval* helps with recovery when nodes fail and StackOverflow exceptions caused by long lineage. **Will be silently ignored if *SparkContext.CheckpointDir* is not set.** (defaults to 10).
--- End diff --

Nit: StackOverflow exceptions -> either StackOverflowError or stack overflow errors. Also you're nesting `*` and `**` in the markdown; does that work?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20759: Added description of checkpointInterval parameter

2018-03-09 Thread MrMathias
Github user MrMathias commented on a diff in the pull request:

https://github.com/apache/spark/pull/20759#discussion_r173563217
  

Checkpointing exists to better deal with node failure and to decrease memory consumption from long lineage. This wording is taken from the parameter comment in the ALS implementation itself, so I think it is fitting.

This list of parameters is both a subset and unordered.
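To make the "silently ignored" behavior concrete, a minimal Python sketch of when a checkpoint would be taken. It assumes the rule described in the ALS parameter comment: checkpoint every checkpointInterval iterations, -1 disables checkpointing, and nothing happens at all when no checkpoint directory is set on the SparkContext. `checkpoint_iterations` and its arguments are hypothetical names for illustration, not Spark API.

```python
def checkpoint_iterations(max_iter, checkpoint_interval, checkpoint_dir_set):
    """Return the 1-based iterations at which a checkpoint would be taken.

    Hypothetical model (not Spark code): checkpointInterval must be >= 1,
    or -1 to disable checkpointing, and the setting is silently ignored
    when no checkpoint directory has been set on the SparkContext.
    """
    if checkpoint_interval != -1 and checkpoint_interval < 1:
        raise ValueError("checkpointInterval must be >= 1 or -1 (disabled)")
    if not checkpoint_dir_set or checkpoint_interval == -1:
        # No error and no warning: checkpointing simply never happens.
        return []
    return [i for i in range(1, max_iter + 1) if i % checkpoint_interval == 0]


# Default checkpointInterval (10), 25 iterations, checkpoint directory set.
print(checkpoint_iterations(25, 10, checkpoint_dir_set=True))   # [10, 20]
# Same settings without a checkpoint directory: silently ignored.
print(checkpoint_iterations(25, 10, checkpoint_dir_set=False))  # []
```

The second call is the case the proposed doc line warns about: the parameter keeps its default of 10, yet no checkpoint is ever written until a directory is configured (for example with SparkContext.setCheckpointDir).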





[GitHub] spark pull request #20759: Added description of checkpointInterval parameter

2018-03-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/20759#discussion_r173379554
  

The wording is a bit severe... do we need to mention node failure or stack overflow (the latter should be rare anyway?)

Also, is this list of params sorted in any way? Perhaps add checkpointInterval to the end?





[GitHub] spark pull request #20759: Added description of checkpointInterval parameter

2018-03-07 Thread MrMathias
GitHub user MrMathias opened a pull request:

https://github.com/apache/spark/pull/20759

Added description of checkpointInterval parameter

The current behavior of ALS with checkpointInterval can be unexpected, so I have added an explicit description to hopefully reduce confusion.

## What changes were proposed in this pull request?

Better documentation of ml.ALS.

## How was this patch tested?

Compiled the docs.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/MrMathias/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20759.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20759


commit 17a71c357bcc5ca68f3fd11f49bb61a91603527a
Author: Mathias Andersen 
Date:   2018-03-07T13:50:20Z

Added description of checkpointInterval parameter

The current behavior of ALS with checkpointInterval can be unexpected, so I have added an explicit description to hopefully reduce confusion.



