[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-21 Thread keypointt
Github user keypointt commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-213159510
  
Thank you for your review :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/12432


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-21 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-213158352
  
Nope, this LGTM
Thank you for the PR!
Merging with master


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-21 Thread keypointt
Github user keypointt commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-213098554
  
Hi @jkbradley, I just fixed the points you mentioned. Is there anything else I should do?


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread keypointt
Github user keypointt commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212625966
  
Hi @jkbradley, I just fixed the points you mentioned. Is there anything else I should do?


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212624337
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56410/


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212624334
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212624080
  
**[Test build #56410 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56410/consoleFull)**
 for PR 12432 at commit 
[`5ef6f70`](https://github.com/apache/spark/commit/5ef6f70ef16b742962184d729da3623fea1d703b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212608802
  
**[Test build #56410 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56410/consoleFull)**
 for PR 12432 at commit 
[`5ef6f70`](https://github.com/apache/spark/commit/5ef6f70ef16b742962184d729da3623fea1d703b).


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212606502
  
Merged build finished. Test FAILed.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212606493
  
**[Test build #56409 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56409/consoleFull)**
 for PR 12432 at commit 
[`9c95790`](https://github.com/apache/spark/commit/9c95790a4cbef9a7bc5c55e9da9c8b095b9c6e44).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212606503
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56409/


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212606188
  
**[Test build #56409 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56409/consoleFull)**
 for PR 12432 at commit 
[`9c95790`](https://github.com/apache/spark/commit/9c95790a4cbef9a7bc5c55e9da9c8b095b9c6e44).


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212591688
  
I just had a few small comments.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60481691
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -238,7 +246,9 @@ class KMeans private (
   /**
* Implementation of K-Means algorithm.
*/
-  private def runAlgorithm(data: RDD[VectorWithNorm]): KMeansModel = {
+  private def runAlgorithm(
+data: RDD[VectorWithNorm],
--- End diff --

Indent 2 more spaces (this and the next line).


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60481662
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -206,12 +208,18 @@ class KMeans private (
 this
   }
 
+  def run(data: RDD[Vector]): KMeansModel = {
--- End diff --

This is the public method, so it needs the documentation and the Since tag. The private version does not.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-20 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60481707
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -274,6 +284,10 @@ class KMeans private (
 
 val iterationStartTime = System.nanoTime()
 
+if (!instr.isEmpty) {
--- End diff --

simpler: ```instr.map(_.logNumFeatures(...))```
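A minimal sketch of the suggested simplification, assuming a hypothetical `FakeInstrumentation` class that stands in for Spark's `Instrumentation` and only records what it would log. Calling a method on the `Option` runs the body only when a value is present, so the explicit `isEmpty` check and `.get` disappear. The sketch uses `foreach` rather than `map` because the call is made purely for its side effect; `map` behaves the same here but builds an `Option[Unit]` that is then discarded.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for org.apache.spark.ml.util.Instrumentation;
// it only records what would have been logged.
class FakeInstrumentation {
  val logged = ArrayBuffer.empty[String]
  def logNumFeatures(num: Int): Unit = logged += s"numFeatures=$num"
}

object OptionLoggingSketch {
  // Before: explicit emptiness check followed by .get
  def logVerbose(instr: Option[FakeInstrumentation], numFeatures: Int): Unit = {
    if (!instr.isEmpty) {
      instr.get.logNumFeatures(numFeatures)
    }
  }

  // After: the lambda runs only when instr is Some(...); None is a no-op
  def logConcise(instr: Option[FakeInstrumentation], numFeatures: Int): Unit =
    instr.foreach(_.logNumFeatures(numFeatures))
}
```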


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212239669
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56316/


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212239667
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212239615
  
**[Test build #56316 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56316/consoleFull)**
 for PR 12432 at commit 
[`61cb1de`](https://github.com/apache/spark/commit/61cb1decf6fdd03066709d04f88a077cf5d22c21).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212226074
  
**[Test build #56316 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56316/consoleFull)**
 for PR 12432 at commit 
[`61cb1de`](https://github.com/apache/spark/commit/61cb1decf6fdd03066709d04f88a077cf5d22c21).


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread keypointt
Github user keypointt commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212217086
  
Hi @jkbradley, I removed the null and now use Option. Could you please take a look and check whether it is OK now? Thanks a lot!


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212205981
  
Merged build finished. Test PASSed.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212205982
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56302/


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212205896
  
**[Test build #56302 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56302/consoleFull)**
 for PR 12432 at commit 
[`d427bb3`](https://github.com/apache/spark/commit/d427bb3ad6ad20d9ba41226d585d6192d4f59029).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212190707
  
**[Test build #56302 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56302/consoleFull)**
 for PR 12432 at commit 
[`d427bb3`](https://github.com/apache/spark/commit/d427bb3ad6ad20d9ba41226d585d6192d4f59029).


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-212156488
  
@keypointt Thanks for the updates.  I'll check back later!


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60324319
  
--- Diff: project/MimaExcludes.scala ---
@@ -626,6 +626,9 @@ object MimaExcludes {
 // [SPARK-13048][ML][MLLIB] keepLastCheckpoint option for LDA EM optimizer
 ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.clustering.DistributedLDAModel.this")
   ) ++ Seq(
+// [SPARK-14569][ML] Log instrumentation in KMeans
+ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.clustering.KMeans.run")
--- End diff --

This should not be needed after the changes I suggested above.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60324292
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
@@ -39,7 +39,7 @@ import org.apache.spark.sql.Dataset
  * @param dataset the training dataset
  * @tparam E the type of the estimator
  */
-private[ml] class Instrumentation[E <: Estimator[_]] private (
+class Instrumentation[E <: Estimator[_]] private (
--- End diff --

Change to ```private[spark]``` rather than making it public


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60324297
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -21,6 +21,8 @@ import scala.collection.mutable.ArrayBuffer
 
 import org.apache.spark.annotation.Since
 import org.apache.spark.internal.Logging
+import org.apache.spark.ml.clustering
--- End diff --

Rename this to make it clear it is the new API:
```
import org.apache.spark.ml.clustering.{KMeans => NewKMeans}
```
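Scala import selectors can rename a symbol at the import site, which keeps two same-named classes distinguishable in one file. The snippet below shows the same `{Name => Alias}` syntax with standard-library types so it runs without Spark: `java.util.List` is renamed to `JList`, leaving `List` to mean Scala's own list, much as the `NewKMeans` alias would set the `ml` estimator apart from the `mllib` class of the same name.

```scala
// Rename java.util.List on import so it cannot be confused with scala.List.
import java.util.{ArrayList, List => JList}

object ImportRenameSketch {
  // `List` here is still Scala's immutable list; `JList` is java.util.List.
  def toJava(xs: List[Int]): JList[Int] = {
    val out: JList[Int] = new ArrayList[Int]()
    xs.foreach(x => out.add(x))
    out
  }
}
```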


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60324300
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -209,9 +211,10 @@ class KMeans private (
   /**
* Train a K-means model on the given set of points; `data` should be cached for high
* performance, because this is an iterative algorithm.
+   * `instr` is used to log instrumentation parameters.
--- End diff --

Do not include this in the public API docs.


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60324316
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -287,6 +291,10 @@ class KMeans private (
 
   val bcActiveCenters = sc.broadcast(activeCenters)
 
+  if (instr != null) {
+instr.logNumFeatures(bcActiveCenters.value(0)(0).vector.size)
--- End diff --

This is being logged on every iteration, but it should only be logged once. Move it before the while loop, and set it using "centers".
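A minimal sketch of the suggested move, assuming a simplified toy loop: `centers` is taken to be a non-empty array of dense points (the real `KMeans` code keeps its centers in a different structure), and `RecordingInstr` is a hypothetical stand-in for the instrumentation object. Because the number of features is fixed for the whole run, it is computed from the initial centers and logged once before the `while` loop instead of once per iteration.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the instrumentation object; records each call.
class RecordingInstr {
  val calls = ArrayBuffer.empty[Int]
  def logNumFeatures(num: Int): Unit = calls += num
}

object LogOnceSketch {
  // Toy loop shaped like an iterative K-means driver; returns the number
  // of iterations performed.
  def train(
      centers: Array[Array[Double]],
      maxIterations: Int,
      instr: Option[RecordingInstr]): Int = {
    // Logged exactly once, derived from the initial centers.
    instr.foreach(_.logNumFeatures(centers.head.length))

    var iteration = 0
    while (iteration < maxIterations) {
      // ... assignment and center-update steps would go here ...
      iteration += 1
    }
    iteration
  }
}
```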


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60324296
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
@@ -95,7 +95,7 @@ private[ml] class Instrumentation[E <: Estimator[_]] private (
 /**
  * Some common methods for logging information about a training session.
  */
-private[ml] object Instrumentation {
+object Instrumentation {
--- End diff --

same here: ```private[spark]```


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60324313
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -238,7 +241,8 @@ class KMeans private (
   /**
* Implementation of K-Means algorithm.
*/
-  private def runAlgorithm(data: RDD[VectorWithNorm]): KMeansModel = {
+  private def runAlgorithm(data: RDD[VectorWithNorm],
--- End diff --

Please follow the Spark style guide: 
[https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide].
Here, for multi-line method headers, put one argument per line, and put the first argument on the line below the method name. See the surrounding code for examples.
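A minimal sketch of the header layout the reviewer asks for, assuming placeholder types (`Seq[Array[Double]]` in place of `RDD[VectorWithNorm]`, and `StringBuilder` in place of the optional instrumentation object). The first parameter starts on the line after the method name, each parameter gets its own line, and parameters are indented 4 spaces relative to `def` while the body is indented 2, which is the layout used in the surrounding Spark code.

```scala
object StyleSketch {
  // Multi-line header: one parameter per line, first parameter on the line
  // after the method name, parameters indented 4 spaces, body indented 2.
  private def runAlgorithm(
      data: Seq[Array[Double]],
      instr: Option[StringBuilder]): Int = {
    instr.foreach(_.append(s"numExamples=${data.size}"))
    data.size // placeholder result standing in for a trained model
  }

  // Short headers that fit on one line stay on one line.
  def run(data: Seq[Array[Double]]): Int = runAlgorithm(data, None)
}
```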


[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-19 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r60324307
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala ---
@@ -209,9 +211,10 @@ class KMeans private (
   /**
* Train a K-means model on the given set of points; `data` should be cached for high
* performance, because this is an iterative algorithm.
+   * `instr` is used to log instrumentation parameters.
*/
   @Since("0.8.0")
-  def run(data: RDD[Vector]): KMeansModel = {
+  def run(data: RDD[Vector], instr: Instrumentation[clustering.KMeans] = null): KMeansModel = {
--- End diff --

Default arguments are not Java friendly.  You'll need to do this:
```
def run(data: RDD[Vector]): KMeansModel = {
  run(data, None)
}

private[spark] def run(data: RDD[Vector], instr: 
Option[Instrumentation[clustering.KMeans]]): KMeansModel = ...
```
That way, we will not change the public API.  Note: I'd also use Option 
instead of null.
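Scala compiles a default argument into a synthetic getter method. A Java caller therefore cannot simply omit the argument, and the default changes the signature Java sees. The overload keeps the original one-argument `run` intact.

Below is a minimal sketch of the suggested pattern. `Instr`, `Model`, and `KMeansLike` are hypothetical stand-ins for `Instrumentation`, `KMeansModel`, and mllib's `KMeans`, and `record` is an invented logging method. In Spark the second overload would be `private[spark]`; it is public here only so the sketch compiles outside that package.

```scala
import scala.collection.mutable.ListBuffer

object OverloadSketch {
  // Hypothetical stand-in for Instrumentation: it only collects log lines.
  final class Instr {
    val lines: ListBuffer[String] = ListBuffer.empty[String]
    def record(msg: String): Unit = lines += msg
  }

  final case class Model(numFeatures: Int)

  final class KMeansLike {
    // Public entry point: original signature, no default argument.
    def run(data: Seq[Array[Double]]): Model = run(data, None)

    // Internal overload taking an Option instead of a nullable reference.
    // In Spark this would be marked private[spark].
    def run(data: Seq[Array[Double]], instr: Option[Instr]): Model = {
      val numFeatures = data.headOption.map(_.length).getOrElse(0)
      instr.foreach(_.record(s"numFeatures=$numFeatures"))
      Model(numFeatures)
    }
  }
}
```

Because `instr` is an `Option`, the body uses `foreach` and never needs a null check.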



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread keypointt
Github user keypointt commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211627178
  
Hi @thunterdb, I made some changes, but I'm not sure this is the right 
way to do it. Would you mind having a look? Thanks a lot!



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211621515
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211621519
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56135/



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211621243
  
**[Test build #56135 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56135/consoleFull)**
 for PR 12432 at commit 
[`248c8b0`](https://github.com/apache/spark/commit/248c8b00eeebad14080e0076df476b360d953b0e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211573897
  
**[Test build #56135 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56135/consoleFull)**
 for PR 12432 at commit 
[`248c8b0`](https://github.com/apache/spark/commit/248c8b00eeebad14080e0076df476b360d953b0e).



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211561004
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56115/



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211560999
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211560381
  
**[Test build #56115 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56115/consoleFull)**
 for PR 12432 at commit 
[`e8acece`](https://github.com/apache/spark/commit/e8acecefee60958c8521bc82dc455061d32e1fe7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211522644
  
**[Test build #56115 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56115/consoleFull)**
 for PR 12432 at commit 
[`e8acece`](https://github.com/apache/spark/commit/e8acecefee60958c8521bc82dc455061d32e1fe7).



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211508789
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56106/



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211508743
  
**[Test build #56106 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56106/consoleFull)**
 for PR 12432 at commit 
[`c8e35e2`](https://github.com/apache/spark/commit/c8e35e208a90f5bceb3ee84fce4af3ef887cade9).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211508786
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211504969
  
**[Test build #56106 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56106/consoleFull)**
 for PR 12432 at commit 
[`c8e35e2`](https://github.com/apache/spark/commit/c8e35e208a90f5bceb3ee84fce4af3ef887cade9).



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211496350
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/56102/



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211496320
  
**[Test build #56102 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56102/consoleFull)**
 for PR 12432 at commit 
[`cc746e5`](https://github.com/apache/spark/commit/cc746e589dce9cce671b40d8086ce997d1afdd9d).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211496346
  
Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-211494813
  
**[Test build #56102 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/56102/consoleFull)**
 for PR 12432 at commit 
[`cc746e5`](https://github.com/apache/spark/commit/cc746e589dce9cce671b40d8086ce997d1afdd9d).



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-15 Thread keypointt
Github user keypointt commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r59957731
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -264,6 +264,9 @@ class KMeans @Since("1.5.0") (
   override def fit(dataset: Dataset[_]): KMeansModel = {
 val rdd = dataset.select(col($(featuresCol))).rdd.map { case Row(point: Vector) => point }
 
+val instr = Instrumentation.create(this, rdd)
+instr.logParams(featuresCol, predictionCol, k, initMode, initSteps, maxIter, seed, tol)
+
 val algo = new MLlibKMeans()
--- End diff --

Thanks Timothy. I'm new to Spark, so sorry if this is a naive question. 
I just want to confirm with you that I understand correctly.

1. To add a new method `algo.run(rdd, instr)`, I found that I also 
need a new method `runAlgorithm(zippedData, instr)` that takes `instr` 
as a parameter: 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L241
This is because the dimension we want is only available inside `runAlgorithm`: 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala#L295

1. Class `Instrumentation` is declared `private[ml]` in the `ml` package, so it 
cannot be accessed from the `mllib` package. Should I make it public by removing 
`private[ml]`? 
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala#L42



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-210675584
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55969/



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-210675580
  
Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-210675495
  
**[Test build #55969 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55969/consoleFull)**
 for PR 12432 at commit 
[`f9592e2`](https://github.com/apache/spark/commit/f9592e2588f0a5b987f6b822a06bbe2c94a3b4e6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-15 Thread thunterdb
Github user thunterdb commented on a diff in the pull request:

https://github.com/apache/spark/pull/12432#discussion_r59950837
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -264,6 +264,9 @@ class KMeans @Since("1.5.0") (
   override def fit(dataset: Dataset[_]): KMeansModel = {
 val rdd = dataset.select(col($(featuresCol))).rdd.map { case Row(point: Vector) => point }
 
+val instr = Instrumentation.create(this, rdd)
+instr.logParams(featuresCol, predictionCol, k, initMode, initSteps, maxIter, seed, tol)
+
 val algo = new MLlibKMeans()
--- End diff --

One statistic that is usually very useful to log is the dimension of the 
vectors (`numFeatures`). One way to get it is to pass the instrumentation 
instance to a new overload of `algo.run(rdd)` below, and mark that new method as `private[spark]`.
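Below is a minimal sketch of threading the instrumentation from `fit` down to the point where the dimension is known. It assumes that point is an inner `runAlgorithm`, as in mllib's `KMeans`. `Instr`, `log`, `InnerAlgo`, and `Model` are hypothetical stand-ins, not Spark's API. The "training" step is a placeholder, not k-means.

```scala
import scala.collection.mutable.ArrayBuffer

object ThreadInstrSketch {
  // Hypothetical stand-in for ml.util.Instrumentation: collects key/value pairs.
  final class Instr {
    val entries: ArrayBuffer[(String, Any)] = ArrayBuffer.empty[(String, Any)]
    def log(key: String, value: Any): Unit = entries += (key -> value)
  }

  final case class Model(numFeatures: Int, centers: Seq[Array[Double]])

  // Plays the role of mllib's KMeans.
  final class InnerAlgo(k: Int) {
    // Public signature unchanged.
    def run(data: Seq[Array[Double]]): Model = run(data, None)

    // Would be private[spark] in Spark; forwards the instrumentation inward.
    def run(data: Seq[Array[Double]], instr: Option[Instr]): Model =
      runAlgorithm(data, instr)

    private def runAlgorithm(
        data: Seq[Array[Double]],
        instr: Option[Instr]): Model = {
      // The dimension is known here, so this is where it is logged.
      // Assumes non-empty input; the sketch does not validate it.
      val numFeatures = data.head.length
      instr.foreach(_.log("numFeatures", numFeatures))
      // Placeholder "training": take the first k points as centers.
      Model(numFeatures, data.take(k))
    }
  }

  // Plays the role of ml's KMeans.fit: creates the instrumentation, logs a
  // parameter, then passes the instance down.
  def fit(data: Seq[Array[Double]], k: Int): (Model, Instr) = {
    val instr = new Instr
    instr.log("k", k)
    val model = new InnerAlgo(k).run(data, Some(instr))
    (model, instr)
  }
}
```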



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-15 Thread thunterdb
Github user thunterdb commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-210671834
  
@keypointt thanks! I have one comment.



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/12432#issuecomment-210668012
  
**[Test build #55969 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55969/consoleFull)**
 for PR 12432 at commit 
[`f9592e2`](https://github.com/apache/spark/commit/f9592e2588f0a5b987f6b822a06bbe2c94a3b4e6).



[GitHub] spark pull request: [SPARK-14569][ML] Log instrumentation in KMean...

2016-04-15 Thread keypointt
GitHub user keypointt opened a pull request:

https://github.com/apache/spark/pull/12432

[SPARK-14569][ML] Log instrumentation in KMeans

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-14569

Log instrumentation in KMeans:

- featuresCol
- predictionCol
- k
- initMode
- initSteps
- maxIter
- seed
- tol
- summary

## How was this patch tested?

Manually tested on a local machine by running 
`org.apache.spark.examples.ml.KMeansExample` and checking its output.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/keypointt/spark SPARK-14569

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/12432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12432


commit f9592e2588f0a5b987f6b822a06bbe2c94a3b4e6
Author: Xin Ren 
Date:   2016-04-15T21:48:13Z

[SPARK-14569] Log instrumentation in KMeans



