[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread vikasnp
Github user vikasnp commented on the pull request:

https://github.com/apache/spark/pull/9689#issuecomment-158014578
  
@yinxusen Thanks for patiently pointing out the issues. I've fixed them. 
Please test it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45323275
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/RegressionMetricsExample.scala ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
+
+package org.apache.spark.examples.mllib
+
+// $example on$
--- End diff --

remove it, there are two `$example on$`
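
The duplicated marker flagged above can be caught mechanically. The following is a minimal sketch, assuming the markers are written exactly as `// $example on$` and `// $example off$` on their own lines, as in the quoted diffs. The class and method names are hypothetical and are not part of the Spark build.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spark.
class ExampleMarkerCheck {

    // Returns the 1-based line number of the first marker that breaks the
    // on/off alternation, or -1 if every marker alternates correctly.
    // A region left open at the end of the input is not flagged.
    public static int firstMisplacedMarker(List<String> lines) {
        boolean inside = false;
        for (int i = 0; i < lines.size(); i++) {
            String text = lines.get(i).trim();
            if (text.equals("// $example on$")) {
                if (inside) {
                    return i + 1; // second "on" before an "off"
                }
                inside = true;
            } else if (text.equals("// $example off$")) {
                if (!inside) {
                    return i + 1; // "off" without a preceding "on"
                }
                inside = false;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "// $example on$",
            "import org.apache.spark.SparkConf;",
            "// $example on$",
            "// $example off$");
        System.out.println(firstMisplacedMarker(lines)); // prints 3
    }
}
```

Several on/off pairs in one file are accepted; only a repeated `on` or a stray `off` is reported.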





[GitHub] spark pull request: [SPARK-11829] [ML] Add read/write to estimator...

2015-11-19 Thread yanboliang
GitHub user yanboliang opened a pull request:

https://github.com/apache/spark/pull/9838

[SPARK-11829] [ML] Add read/write to estimators under ml.feature (II)

Add read/write support to the following estimators under spark.ml:
* ChiSqSelector
* PCA
* VectorIndexer
* Word2Vec

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yanboliang/spark spark-11829

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9838.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9838


commit 0778a49d67162682b1caf376fcd4c9b47985a073
Author: Yanbo Liang 
Date:   2015-11-19T08:19:17Z

Add read/write support to ChiSqSelector, PCA, VectorIndexer

commit cc92a3968a39ceb2686b7098fc31965c2c0c77df
Author: Yanbo Liang 
Date:   2015-11-19T09:32:49Z

Add read/write support to Word2Vec

commit de3822efd896233313d23144a6f3fb08c380932b
Author: Yanbo Liang 
Date:   2015-11-19T10:18:44Z

code clean up

commit 5e79781fdb0aba0ae899ed16e052b4177afa855a
Author: Yanbo Liang 
Date:   2015-11-19T10:27:37Z

add @Since







[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/9689#issuecomment-158018477
  
@mengxr LGTM except for some minor issues.





[GitHub] spark pull request: [SPARK-11821] Propagate Kerberos keytab for al...

2015-11-19 Thread woj-i
Github user woj-i commented on the pull request:

https://github.com/apache/spark/pull/9837#issuecomment-158020139
  
I also see that switching the authentication method from simple to 
Kerberos was required when renewing credentials. I've made a commit.





[GitHub] spark pull request: [SPARK-11829] [ML] Add read/write to estimator...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9838#issuecomment-158020055
  
**[Test build #46323 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46323/consoleFull)**
 for PR 9838 at commit 
[`5e79781`](https://github.com/apache/spark/commit/5e79781fdb0aba0ae899ed16e052b4177afa855a).





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9689#issuecomment-158020924
  
**[Test build #46324 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46324/consoleFull)**
 for PR 9689 at commit 
[`88512e7`](https://github.com/apache/spark/commit/88512e7ff1f1d55f31a5c12b57668216d39b22b9).





[GitHub] spark pull request: [SPARK-11829] [ML] Add read/write to estimator...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9838#issuecomment-158021745
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46323/
Test FAILed.





[GitHub] spark pull request: [SPARK-11829] [ML] Add read/write to estimator...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9838#issuecomment-158021742
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11829] [ML] Add read/write to estimator...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9838#issuecomment-158021739
  
**[Test build #46323 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46323/consoleFull)**
 for PR 9838 at commit 
[`5e79781`](https://github.com/apache/spark/commit/5e79781fdb0aba0ae899ed16e052b4177afa855a).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class LogisticRegressionModelWriter(instance: LogisticRegressionModel)`
   * `class ChiSqSelectorModelWriter(instance: ChiSqSelectorModel) extends MLWriter`
   * `class PCA (override val uid: String) extends Estimator[PCAModel] with PCAParams`
   * `class VectorIndexerModelWriter(instance: VectorIndexerModel) extends MLWriter`
   * `final class Word2Vec(override val uid: String) extends Estimator[Word2VecModel] with Word2VecBase`
   * `class Word2VecModelWriter(instance: Word2VecModel) extends MLWriter`





[GitHub] spark pull request: [SPARK-11636][SQL] Support classes defined in ...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9825#issuecomment-157988972
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46304/
Test PASSed.





[GitHub] spark pull request: [SPARK-11849][SQL] Analyzer should replace cur...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9833#issuecomment-157988169
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11849][SQL] Analyzer should replace cur...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9833#issuecomment-157988174
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46311/
Test PASSed.





[GitHub] spark pull request: [SPARK-11544][SQL][test-hadoop1.0] sqlContext ...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9830#issuecomment-157984672
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46317/
Test FAILed.





[GitHub] spark pull request: [SPARK-11817][SQL] Truncating the fractional s...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9834#issuecomment-157987895
  
**[Test build #46310 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46310/consoleFull)**
 for PR 9834 at commit 
[`154f831`](https://github.com/apache/spark/commit/154f831e525a0a06d432d8d16f6cca231898e88d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class LogisticRegressionModelWriter(instance: LogisticRegressionModel)`





[GitHub] spark pull request: [SPARK-11849][SQL] Analyzer should replace cur...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9833#issuecomment-157988057
  
**[Test build #46311 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46311/consoleFull)**
 for PR 9833 at commit 
[`33013e6`](https://github.com/apache/spark/commit/33013e63caabf41965357bebd136759fff478717).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public final class UnsafeSorterSpillReader extends UnsafeSorterIterator implements Closeable`
   * `class Pipeline(override val uid: String) extends Estimator[PipelineModel] with MLWritable`
   * `class CountVectorizerModelWriter(instance: CountVectorizerModel) extends MLWriter`
   * `final class IDF(override val uid: String) extends Estimator[IDFModel] with IDFBase`
   * `class MinMaxScalerModelWriter(instance: MinMaxScalerModel) extends MLWriter`
   * `class SQLTransformer @Since("1.6.0") (override val uid: String) extends Transformer`
   * `class StandardScalerModelWriter(instance: StandardScalerModel) extends MLWriter`
   * `class StringIndexModelWriter(instance: StringIndexerModel) extends MLWriter`
   * `class ALS(override val uid: String) extends Estimator[ALSModel] with ALSParams`
   * `abstract class MLWriter extends BaseReadWrite with Logging`
   * `trait MLWritable`
   * `abstract class MLReader[T] extends BaseReadWrite`
   * `trait MLReadable[T]`
   * `trait ScalaReflection`
   * `case class Schema(dataType: DataType, nullable: Boolean)`
   * `s"Unable to generate an encoder for inner class `$`
   * `public abstract class SpecificParquetRecordReaderBase extends RecordReader`
   * `abstract static class IntIterator`
   * `protected static final class ValuesReaderIntIterator extends IntIterator`
   * `protected static final class RLEIntIterator extends IntIterator`
   * `protected static final class NullIntIterator extends IntIterator`
   * `public class UnsafeRowParquetRecordReader extends SpecificParquetRecordReaderBase`





[GitHub] spark pull request: [SPARK-11817][SQL] Truncating the fractional s...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9834#issuecomment-157988039
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11817][SQL] Truncating the fractional s...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9834#issuecomment-157988042
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46310/
Test PASSed.





[GitHub] spark pull request: [SPARK-11544][SQL][test-hadoop1.0] sqlContext ...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9830#issuecomment-157984623
  
**[Test build #46317 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46317/consoleFull)**
 for PR 9830 at commit 
[`f6df5c8`](https://github.com/apache/spark/commit/f6df5c8a333da1ad8029c6b018dc52db4c2c488a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11544][SQL][test-hadoop1.0] sqlContext ...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9830#issuecomment-157984671
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11544][SQL][test-hadoop1.0] sqlContext ...

2015-11-19 Thread dilipbiswal
Github user dilipbiswal commented on the pull request:

https://github.com/apache/spark/pull/9830#issuecomment-157985521
  
@yhuai Hello Yin, the failing test seems to be unrelated. It fails with:

sbt.ForkMain$ForkError: Job aborted due to stage failure: Task 0 in stage 139.0 failed 1 times, most recent failure: Lost task 0.0 in stage 139.0 (TID 1591, localhost): java.lang.NoClassDefFoundError: au/com/bytecode/opencsv/CSVReader

I'm not sure whether this test is supposed to pass at this Hadoop level. 
Please advise.





[GitHub] spark pull request: [SPARK-11846] Add save/load for AFTSurvivalReg...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9836#issuecomment-157985379
  
**[Test build #46321 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46321/consoleFull)**
 for PR 9836 at commit 
[`c784fab`](https://github.com/apache/spark/commit/c784fab77d658f04950b333f4ddb5302fa6e5ac6).





[GitHub] spark pull request: [SPARK-11636][SQL] Support classes defined in ...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9825#issuecomment-157988970
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11636][SQL] Support classes defined in ...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9825#issuecomment-157988787
  
**[Test build #46304 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46304/consoleFull)**
 for PR 9825 at commit 
[`57a3367`](https://github.com/apache/spark/commit/57a33671285d06ea9f91bde20a586d5e0f589794).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `|class $`





[GitHub] spark pull request: [SPARK-11689] [ML] Add user guide and example ...

2015-11-19 Thread feynmanliang
Github user feynmanliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/9722#discussion_r45314139
  
--- Diff: docs/ml-clustering.md ---
@@ -0,0 +1,25 @@
+---
+layout: global
+title: Clustering - ML
+displayTitle: ML - Clustering
+---
+
+In `spark.ml`, we implement the corresponding pipeline API for 
--- End diff --

Sgtm

On Thu, Nov 19, 2015, 01:56 Yuhao Yang wrote:

> In docs/ml-clustering.md
> :
>
> > @@ -0,0 +1,25 @@
> > +---
> > +layout: global
> > +title: Clustering - ML
> > +displayTitle: ML - Clustering
> > +---
> > +
> > +In `spark.ml`, we implement the corresponding pipeline API for
>
> How about, "In this section, we introduce the pipeline API for clustering
> in mllib " ?






[GitHub] spark pull request: [SPARK-11846] Add save/load for AFTSurvivalReg...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9836#issuecomment-157993127
  
**[Test build #46321 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46321/consoleFull)**
 for PR 9836 at commit 
[`c784fab`](https://github.com/apache/spark/commit/c784fab77d658f04950b333f4ddb5302fa6e5ac6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11592][SQL]flush spark-sql command line...

2015-11-19 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/9563#issuecomment-157992982
  
How about adding a flush call after each `reader.readLine` call (there are 
only two occurrences)? I think this is more reliable than using a shutdown hook.
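
A minimal sketch of the flush-after-each-read idea, using plain `java.io` types as stand-ins. The class name, the `readAndRecord` method, and the `Writer` used as the history sink are all assumptions for illustration; this is not the spark-sql CLI code or its history API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for a command-line loop; not the spark-sql CLI code.
class FlushPerLineSketch {

    // Reads lines until EOF, appends each one to the history sink, and
    // flushes the sink immediately after every readLine call.
    // Returns the number of lines recorded.
    public static int readAndRecord(BufferedReader reader, Writer history) {
        int count = 0;
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                history.write(line);
                history.write('\n');
                history.flush(); // persist now instead of waiting for JVM exit
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        BufferedReader reader =
            new BufferedReader(new StringReader("show tables;\nselect 1;\n"));
        StringWriter history = new StringWriter();
        int recorded = readAndRecord(reader, history);
        System.out.println(recorded + " lines recorded");
    }
}
```

Because each line is flushed as soon as it is read, the recorded history does not depend on shutdown hooks, which the JVM does not run when the process is killed abruptly.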





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315224
  
--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaLinearRegressionExample.java ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.regression.LinearRegressionModel;
+import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.SparkConf;
+// $example off$
+
+// Read in the ratings data
--- End diff --

remove the comment





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315283
  
--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaLinearRegressionExample.java ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.regression.LinearRegressionModel;
+import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.SparkConf;
+// $example off$
+
+// Read in the ratings data
+public class JavaLinearRegressionExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Linear Regression Example");
--- End diff --

Change the AppName to "Java Regression Metrics Example".





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315807
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMultiLabelClassificationMetricsExample.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+
--- End diff --

Remove a blank line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316264
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMultiLabelClassificationMetricsExample.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+
+
+import java.util.Arrays;
--- End diff --

Change the imports like this:

```java
// $example on$
import java.util.Arrays;
import java.util.List;

import scala.Tuple2;

import org.apache.spark.api.java.*;
import org.apache.spark.mllib.evaluation.MultilabelMetrics;
import org.apache.spark.SparkConf;
// $example off$
```





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316338
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMultiLabelClassificationMetricsExample.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+
+
+import java.util.Arrays;
+import java.util.List;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.mllib.evaluation.MultilabelMetrics;
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+// $example off$
+import org.apache.spark.SparkContext;
+
+public class JavaMultiLabelClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Multilabel Classification Metrics Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+List<Tuple2<double[], double[]>> data = Arrays.asList(
+  new Tuple2<double[], double[]>(new double[]{0.0, 1.0}, new double[]{0.0, 2.0}),
+  new Tuple2<double[], double[]>(new double[]{0.0, 2.0}, new double[]{0.0, 1.0}),
+  new Tuple2<double[], double[]>(new double[]{}, new double[]{0.0}),
+  new Tuple2<double[], double[]>(new double[]{2.0}, new double[]{2.0}),
+  new Tuple2<double[], double[]>(new double[]{2.0, 0.0}, new double[]{2.0, 0.0}),
+  new Tuple2<double[], double[]>(new double[]{0.0, 1.0, 2.0}, new double[]{0.0, 1.0}),
+  new Tuple2<double[], double[]>(new double[]{1.0}, new double[]{1.0, 2.0})
+);
+JavaRDD<Tuple2<double[], double[]>> scoreAndLabels = sc.parallelize(data);
+
+// Instantiate metrics object
+MultilabelMetrics metrics = new MultilabelMetrics(scoreAndLabels.rdd());
+
+// Summary stats
+System.out.format("Recall = %f\n", metrics.recall());
+System.out.format("Precision = %f\n", metrics.precision());
+System.out.format("F1 measure = %f\n", metrics.f1Measure());
+System.out.format("Accuracy = %f\n", metrics.accuracy());
+
+// Stats by labels
+for (int i = 0; i < metrics.labels().length - 1; i++) {
+  System.out.format("Class %1.1f precision = %f\n", metrics.labels()[i], metrics.precision
+  (metrics.labels()[i]));
+  System.out.format("Class %1.1f recall = %f\n", metrics.labels()[i], metrics.recall(metrics
+  .labels()[i]));
--- End diff --

Ditto: use 2-space indentation.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316350
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMultiLabelClassificationMetricsExample.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+
+
+import java.util.Arrays;
+import java.util.List;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.mllib.evaluation.MultilabelMetrics;
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+// $example off$
+import org.apache.spark.SparkContext;
+
+public class JavaMultiLabelClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Multilabel Classification Metrics Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+List<Tuple2<double[], double[]>> data = Arrays.asList(
+  new Tuple2<double[], double[]>(new double[]{0.0, 1.0}, new double[]{0.0, 2.0}),
+  new Tuple2<double[], double[]>(new double[]{0.0, 2.0}, new double[]{0.0, 1.0}),
+  new Tuple2<double[], double[]>(new double[]{}, new double[]{0.0}),
+  new Tuple2<double[], double[]>(new double[]{2.0}, new double[]{2.0}),
+  new Tuple2<double[], double[]>(new double[]{2.0, 0.0}, new double[]{2.0, 0.0}),
+  new Tuple2<double[], double[]>(new double[]{0.0, 1.0, 2.0}, new double[]{0.0, 1.0}),
+  new Tuple2<double[], double[]>(new double[]{1.0}, new double[]{1.0, 2.0})
+);
+JavaRDD<Tuple2<double[], double[]>> scoreAndLabels = sc.parallelize(data);
+
+// Instantiate metrics object
+MultilabelMetrics metrics = new MultilabelMetrics(scoreAndLabels.rdd());
+
+// Summary stats
+System.out.format("Recall = %f\n", metrics.recall());
+System.out.format("Precision = %f\n", metrics.precision());
+System.out.format("F1 measure = %f\n", metrics.f1Measure());
+System.out.format("Accuracy = %f\n", metrics.accuracy());
+
+// Stats by labels
+for (int i = 0; i < metrics.labels().length - 1; i++) {
+  System.out.format("Class %1.1f precision = %f\n", metrics.labels()[i], metrics.precision
+  (metrics.labels()[i]));
+  System.out.format("Class %1.1f recall = %f\n", metrics.labels()[i], metrics.recall(metrics
+  .labels()[i]));
+  System.out.format("Class %1.1f F1 score = %f\n", metrics.labels()[i], metrics.f1Measure
+  (metrics.labels()[i]));
--- End diff --

Ditto: use 2-space indentation.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316327
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMultiLabelClassificationMetricsExample.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+
+
+import java.util.Arrays;
+import java.util.List;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.mllib.evaluation.MultilabelMetrics;
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+// $example off$
+import org.apache.spark.SparkContext;
+
+public class JavaMultiLabelClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Multilabel Classification Metrics Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+List<Tuple2<double[], double[]>> data = Arrays.asList(
+  new Tuple2<double[], double[]>(new double[]{0.0, 1.0}, new double[]{0.0, 2.0}),
+  new Tuple2<double[], double[]>(new double[]{0.0, 2.0}, new double[]{0.0, 1.0}),
+  new Tuple2<double[], double[]>(new double[]{}, new double[]{0.0}),
+  new Tuple2<double[], double[]>(new double[]{2.0}, new double[]{2.0}),
+  new Tuple2<double[], double[]>(new double[]{2.0, 0.0}, new double[]{2.0, 0.0}),
+  new Tuple2<double[], double[]>(new double[]{0.0, 1.0, 2.0}, new double[]{0.0, 1.0}),
+  new Tuple2<double[], double[]>(new double[]{1.0}, new double[]{1.0, 2.0})
+);
+JavaRDD<Tuple2<double[], double[]>> scoreAndLabels = sc.parallelize(data);
+
+// Instantiate metrics object
+MultilabelMetrics metrics = new MultilabelMetrics(scoreAndLabels.rdd());
+
+// Summary stats
+System.out.format("Recall = %f\n", metrics.recall());
+System.out.format("Precision = %f\n", metrics.precision());
+System.out.format("F1 measure = %f\n", metrics.f1Measure());
+System.out.format("Accuracy = %f\n", metrics.accuracy());
+
+// Stats by labels
+for (int i = 0; i < metrics.labels().length - 1; i++) {
+  System.out.format("Class %1.1f precision = %f\n", metrics.labels()[i], metrics.precision
+  (metrics.labels()[i]));
--- End diff --

Use 2-space indentation.
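
A minimal sketch of the layout this comment seems to ask for, assuming it means a 2-space indent on wrapped argument lines. `IndentationSketch`, `precisionFor`, and the sample labels are hypothetical stand-ins (not Spark APIs), used only so the snippet runs without a Spark dependency:

```java
import java.util.Locale;

public class IndentationSketch {
  // Hypothetical stand-in for metrics.precision(label); not a Spark API.
  static double precisionFor(double label) {
    return label == 0.0 ? 0.5 : 1.0;
  }

  static String describe(double label) {
    // Break after a comma and indent the continuation line by 2 spaces,
    // instead of splitting a method name from its argument list.
    return String.format(Locale.ROOT,
      "Class %1.1f precision = %f", label, precisionFor(label));
  }

  public static void main(String[] args) {
    // Made-up labels standing in for metrics.labels().
    double[] labels = {0.0, 1.0, 2.0};
    for (double label : labels) {
      System.out.println(describe(label));
    }
  }
}
```

Breaking after a comma keeps each call intact on its line, so the continuation reads as a list of arguments rather than as a separate statement.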





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316717
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib;
+
+// $example on$
+import java.util.*;
+
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.mllib.evaluation.RankingMetrics;
+import org.apache.spark.mllib.recommendation.ALS;
+import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
+import org.apache.spark.mllib.recommendation.Rating;
+// $example off$
+import org.apache.spark.rdd.RDD;
--- End diff --

Remove this import.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316780
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib;
+
+// $example on$
+import java.util.*;
+
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.mllib.evaluation.RankingMetrics;
+import org.apache.spark.mllib.recommendation.ALS;
+import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
+import org.apache.spark.mllib.recommendation.Rating;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+
+public class JavaRankingMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Ranking Metrics Example");
--- End diff --

Add "Java" to the AppName in the Java code.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317047
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib;
+
+// $example on$
+import java.util.*;
+
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.mllib.evaluation.RankingMetrics;
+import org.apache.spark.mllib.recommendation.ALS;
+import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
+import org.apache.spark.mllib.recommendation.Rating;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+
+public class JavaRankingMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Ranking Metrics Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_movielens_data.txt";
+JavaRDD<String> data = sc.textFile(path);
+JavaRDD<Rating> ratings = data.map(
+  new Function<String, Rating>() {
+public Rating call(String line) {
+  String[] parts = line.split("::");
+return new Rating(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Double
+.parseDouble(parts[2]) - 2.5);
+}
+  }
+);
+ratings.cache();
+
+// Train an ALS model
+final MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings), 10, 10, 0.01);
+
+// Get top 10 recommendations for every user and scale ratings from 0 to 1
+JavaRDD<Tuple2<Object, Rating[]>> userRecs = model.recommendProductsForUsers(10).toJavaRDD();
+JavaRDD<Tuple2<Object, Rating[]>> userRecsScaled = userRecs.map(
+  new Function<Tuple2<Object, Rating[]>, Tuple2<Object, Rating[]>>() {
+public Tuple2<Object, Rating[]> call(Tuple2<Object, Rating[]> t) {
+  Rating[] scaledRatings = new Rating[t._2().length];
+  for (int i = 0; i < scaledRatings.length; i++) {
+double newRating = Math.max(Math.min(t._2()[i].rating(), 1.0), 0.0);
+scaledRatings[i] = new Rating(t._2()[i].user(), t._2()[i].product(), newRating);
+  }
+  return new Tuple2<Object, Rating[]>(t._1(), scaledRatings);
+}
+  }
+);
+JavaPairRDD<Object, Rating[]> userRecommended = JavaPairRDD.fromJavaRDD(userRecsScaled);
+
+// Map ratings to 1 or 0, 1 indicating a movie that should be recommended
+JavaRDD<Rating> binarizedRatings = ratings.map(
+  new Function<Rating, Rating>() {
+public Rating call(Rating r) {
+  double binaryRating;
+  if (r.rating() > 0.0) {
+  binaryRating = 1.0;
+  } else {
+  binaryRating = 0.0;
+  }
+  return new Rating(r.user(), r.product(), binaryRating);
+}
+  }
+);
+
+// Group ratings by common user
+JavaPairRDD<Object, Iterable<Rating>> userMovies = binarizedRatings.groupBy(
+new Function<Rating, Object>() {
+  public Object call(Rating r) {
+return r.user();
+  }
+}
+);
+
+// Get true relevant documents from all user ratings
+JavaPairRDD<Object, List<Integer>> userMoviesList = userMovies.mapValues(
+  new Function<Iterable<Rating>, List<Integer>>() {
+public List<Integer> call(Iterable<Rating> docs) {
+  List<Integer> products = new ArrayList<Integer>();
+  for (Rating r : docs) {
+if (r.rating() > 0.0) {
+  products.add(r.product());
+}
+

[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317028
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib;
+
+// $example on$
+import java.util.*;
+
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.mllib.evaluation.RankingMetrics;
+import org.apache.spark.mllib.recommendation.ALS;
+import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
+import org.apache.spark.mllib.recommendation.Rating;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+
+public class JavaRankingMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Ranking Metrics Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_movielens_data.txt";
+JavaRDD<String> data = sc.textFile(path);
+JavaRDD<Rating> ratings = data.map(
+  new Function<String, Rating>() {
+public Rating call(String line) {
+  String[] parts = line.split("::");
+return new Rating(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Double
+.parseDouble(parts[2]) - 2.5);
+}
+  }
+);
+ratings.cache();
+
+// Train an ALS model
+final MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings), 10, 10, 0.01);
+
+// Get top 10 recommendations for every user and scale ratings from 0 to 1
+JavaRDD<Tuple2<Object, Rating[]>> userRecs = model.recommendProductsForUsers(10).toJavaRDD();
+JavaRDD<Tuple2<Object, Rating[]>> userRecsScaled = userRecs.map(
+  new Function<Tuple2<Object, Rating[]>, Tuple2<Object, Rating[]>>() {
+public Tuple2<Object, Rating[]> call(Tuple2<Object, Rating[]> t) {
+  Rating[] scaledRatings = new Rating[t._2().length];
+  for (int i = 0; i < scaledRatings.length; i++) {
+double newRating = Math.max(Math.min(t._2()[i].rating(), 1.0), 0.0);
+scaledRatings[i] = new Rating(t._2()[i].user(), t._2()[i].product(), newRating);
+  }
+  return new Tuple2<Object, Rating[]>(t._1(), scaledRatings);
+}
+  }
+);
+JavaPairRDD<Object, Rating[]> userRecommended = JavaPairRDD.fromJavaRDD(userRecsScaled);
+
+// Map ratings to 1 or 0, 1 indicating a movie that should be recommended
+JavaRDD<Rating> binarizedRatings = ratings.map(
+  new Function<Rating, Rating>() {
+public Rating call(Rating r) {
+  double binaryRating;
+  if (r.rating() > 0.0) {
+  binaryRating = 1.0;
+  } else {
+  binaryRating = 0.0;
+  }
+  return new Rating(r.user(), r.product(), binaryRating);
+}
+  }
+);
+
+// Group ratings by common user
+JavaPairRDD<Object, Iterable<Rating>> userMovies = binarizedRatings.groupBy(
+new Function<Rating, Object>() {
+  public Object call(Rating r) {
+return r.user();
+  }
+}
+);
+
+// Get true relevant documents from all user ratings
+JavaPairRDD<Object, List<Integer>> userMoviesList = userMovies.mapValues(
+  new Function<Iterable<Rating>, List<Integer>>() {
+public List<Integer> call(Iterable<Rating> docs) {
+  List<Integer> products = new ArrayList<Integer>();
+  for (Rating r : docs) {
+if (r.rating() > 0.0) {
+  products.add(r.product());
+}
+

[GitHub] spark pull request: [SPARK-11848][SQL] Support EXPLAIN in DataSet ...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9832#issuecomment-157998646
  
**[Test build #46313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46313/consoleFull)** for PR 9832 at commit [`1dcc888`](https://github.com/apache/spark/commit/1dcc888fd1445e9c8e31eb4fe1b3d6af5a84dc32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317289
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/RegressionMetricsExample.scala
 ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
+package org.apache.spark.examples.mllib
+// $example on$
+import org.apache.spark.mllib.regression.LabeledPoint
--- End diff --

Change the imports here as shown below:

```scala
// $example on$
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.mllib.util.MLUtils
// $example off$
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
```





[GitHub] spark pull request: [SPARK-11848][SQL] Support EXPLAIN in DataSet ...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9832#issuecomment-157998901
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11750][SQL] revert SPARK-11727 and code...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9726#issuecomment-158001295
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317679
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
+package org.apache.spark.examples.mllib
+// $example on$
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
+import org.apache.spark.mllib.evaluation.MulticlassMetrics
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLUtils
+// $example off$
+import org.apache.spark.{SparkContext, SparkConf}
+
+object MulticlassMetricsExample {
+
+  def main(args: Array[String]): Unit = {
+val conf = new SparkConf().setAppName("MulticlassMetrics")
--- End diff --

Change the AppName to `MulticlassMetricsExample`.





[GitHub] spark pull request: [SPARK-11750][SQL] revert SPARK-11727 and code...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9726#issuecomment-158001301
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46314/
Test PASSed.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317640
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
+package org.apache.spark.examples.mllib
--- End diff --

Add a blank line here.





[GitHub] spark pull request: [SPARK-11750][SQL] revert SPARK-11727 and code...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9726#issuecomment-158000863
  
**[Test build #46314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46314/consoleFull)** for PR 9726 at commit [`d0a294e`](https://github.com/apache/spark/commit/d0a294ed8254af1623e9ee2e90291e6790e2df67).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317830
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
+package org.apache.spark.examples.mllib
+// $example on$
+import org.apache.spark.mllib.evaluation.MultilabelMetrics
+import org.apache.spark.rdd.RDD;
--- End diff --

No need for the `;` at the end.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317987
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
+package org.apache.spark.examples.mllib
+// $example on$
+import org.apache.spark.mllib.evaluation.MultilabelMetrics
+import org.apache.spark.rdd.RDD;
+// $example off$
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.{SparkContext, SparkConf}
+
+object MultiLabelMetricsExample {
+  def main(args: Array[String]): Unit = {
+val conf = new SparkConf().setAppName("MultiLabelMetricsExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+import sqlContext.implicits._
+// $example on$
+val scoreAndLabels: RDD[(Array[Double], Array[Double])] = sc.parallelize(
+  Seq((Array(0.0, 1.0), Array(0.0, 2.0)),
+(Array(0.0, 2.0), Array(0.0, 1.0)),
+(Array(), Array(0.0)),
--- End diff --

Change the line to `(Array.empty[Double], Array(0.0)),`; otherwise it does not match the declared type `RDD[(Array[Double], Array[Double])]`.





[GitHub] spark pull request: [SPARK-11849][SQL] Analyzer should replace cur...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9833#issuecomment-158003590
  
**[Test build #46319 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46319/consoleFull)** for PR 9833 at commit [`1ddbe84`](https://github.com/apache/spark/commit/1ddbe84505344ce4bbfb7d34fc56d4becfd1f224).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11849][SQL] Analyzer should replace cur...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9833#issuecomment-158003743
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11849][SQL] Analyzer should replace cur...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9833#issuecomment-158003744
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46319/
Test PASSed.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45318615
  
--- Diff: examples/src/main/python/mllib/multi_label_metrics_example.py ---
@@ -0,0 +1,62 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# $example on$
+from pyspark.mllib.evaluation import MultilabelMetrics
+# $example off$
+from pyspark.mllib.util import MLUtils
--- End diff --

remove the import





[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-158006811
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-158006812
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46320/
Test FAILed.





[GitHub] spark pull request: [SPARK-11206] Support SQL UI on the history se...

2015-11-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9297#issuecomment-158006714
  
**[Test build #46320 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46320/consoleFull)** for PR 9297 at commit [`56f24ba`](https://github.com/apache/spark/commit/56f24bafb3f5b9306bf393220d018e0620f1296d).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class SparkPlanInfo(`
   * `class SQLMetricInfo(`
   * `case class SparkListenerSQLExecutionStart(`
   * `case class SparkListenerSQLExecutionEnd(executionId: Long, time: Long)`





[GitHub] spark pull request: [SPARK-11848][SQL] Support EXPLAIN in DataSet ...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9832#issuecomment-158007676
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46318/
Test PASSed.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45314368
  
--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -1350,163 +582,21 @@ and evaluate the performance of the algorithm by several regression metrics.
 
 Refer to the [`RegressionMetrics` Scala docs](api/scala/index.html#org.apache.spark.mllib.evaluation.RegressionMetrics) for details on the API.
 
-{% highlight scala %}
-import org.apache.spark.mllib.regression.LabeledPoint
-import org.apache.spark.mllib.regression.LinearRegressionModel
-import org.apache.spark.mllib.regression.LinearRegressionWithSGD
-import org.apache.spark.mllib.linalg.Vectors
-import org.apache.spark.mllib.evaluation.RegressionMetrics
-import org.apache.spark.mllib.util.MLUtils
-
-// Load the data
-val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_linear_regression_data.txt").cache()
-
-// Build the model
-val numIterations = 100
-val model = LinearRegressionWithSGD.train(data, numIterations)
-
-// Get predictions
-val valuesAndPreds = data.map{ point =>
-  val prediction = model.predict(point.features)
-  (prediction, point.label)
-}
-
-// Instantiate metrics object
-val metrics = new RegressionMetrics(valuesAndPreds)
-
-// Squared error
-println(s"MSE = ${metrics.meanSquaredError}")
-println(s"RMSE = ${metrics.rootMeanSquaredError}")
-
-// R-squared
-println(s"R-squared = ${metrics.r2}")
-
-// Mean absolute error
-println(s"MAE = ${metrics.meanAbsoluteError}")
-
-// Explained variance
-println(s"Explained variance = ${metrics.explainedVariance}")
-
-{% endhighlight %}
+{% include_example scala/org/apache/spark/examples/mllib/RegressionMetricsExample.scala %}
 
 
 
 
 Refer to the [`RegressionMetrics` Java docs](api/java/org/apache/spark/mllib/evaluation/RegressionMetrics.html) for details on the API.
 
-{% highlight java %}
-import scala.Tuple2;
-
-import org.apache.spark.api.java.*;
-import org.apache.spark.api.java.function.Function;
-import org.apache.spark.mllib.linalg.Vectors;
-import org.apache.spark.mllib.regression.LabeledPoint;
-import org.apache.spark.mllib.regression.LinearRegressionModel;
-import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
-import org.apache.spark.mllib.evaluation.RegressionMetrics;
-import org.apache.spark.SparkConf;
-
-public class LinearRegression {
-  public static void main(String[] args) {
-SparkConf conf = new SparkConf().setAppName("Linear Regression Example");
-JavaSparkContext sc = new JavaSparkContext(conf);
-
-// Load and parse the data
-String path = "data/mllib/sample_linear_regression_data.txt";
-JavaRDD<String> data = sc.textFile(path);
-JavaRDD<LabeledPoint> parsedData = data.map(
-  new Function<String, LabeledPoint>() {
-public LabeledPoint call(String line) {
-  String[] parts = line.split(" ");
-  double[] v = new double[parts.length - 1];
-  for (int i = 1; i < parts.length - 1; i++)
-v[i - 1] = Double.parseDouble(parts[i].split(":")[1]);
-  return new LabeledPoint(Double.parseDouble(parts[0]), Vectors.dense(v));
-}
-  }
-);
-parsedData.cache();
-
-// Building the model
-int numIterations = 100;
-final LinearRegressionModel model =
-  LinearRegressionWithSGD.train(JavaRDD.toRDD(parsedData), numIterations);
-
-// Evaluate model on training examples and compute training error
-JavaRDD<Tuple2<Object, Object>> valuesAndPreds = parsedData.map(
-  new Function<LabeledPoint, Tuple2<Object, Object>>() {
-public Tuple2<Object, Object> call(LabeledPoint point) {
-  double prediction = model.predict(point.features());
-  return new Tuple2<Object, Object>(prediction, point.label());
-}
-  }
-);
-
-// Instantiate metrics object
-RegressionMetrics metrics = new RegressionMetrics(valuesAndPreds.rdd());
-
-// Squared error
-System.out.format("MSE = %f\n", metrics.meanSquaredError());
-System.out.format("RMSE = %f\n", metrics.rootMeanSquaredError());
-
-// R-squared
-System.out.format("R Squared = %f\n", metrics.r2());
-
-// Mean absolute error
-System.out.format("MAE = %f\n", metrics.meanAbsoluteError());
-
-// Explained variance
-System.out.format("Explained Variance = %f\n", metrics.explainedVariance());
-
-// Save and load model
-model.save(sc.sc(), "myModelPath");
-LinearRegressionModel sameModel = LinearRegressionModel.load(sc.sc(), "myModelPath");
-  }
-}
-
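
For reference, the quantities printed by the removed snippet above (MSE, RMSE, R-squared, MAE) can be sketched in plain Python. This is only an illustration using the textbook definitions. The function name `regression_metrics` and the sample pairs are assumptions, and it is not Spark's `RegressionMetrics` implementation; explained variance is left out because its definition differs between libraries.

```python
import math


def regression_metrics(preds_and_labels):
    """Compute MSE, RMSE, MAE and R-squared from (prediction, label) pairs.

    Assumes a non-empty list whose labels are not all identical
    (otherwise the total sum of squares is zero and R-squared is undefined).
    """
    n = len(preds_and_labels)
    errors = [label - pred for pred, label in preds_and_labels]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_label = sum(label for _, label in preds_and_labels) / n
    ss_tot = sum((label - mean_label) ** 2 for _, label in preds_and_labels)
    ss_err = mse * n
    r2 = 1.0 - ss_err / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R-squared": r2}


# Hypothetical (prediction, label) pairs, ordered like the pairs in the snippet above.
pairs = [(2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0)]
metrics = regression_metrics(pairs)
for name, value in metrics.items():
    print(f"{name} = {value:.4f}")
```

The dictionary keys mirror the labels printed by the original example, so the output lines can be compared side by side with a Spark run on the same pairs.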

[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45314891
  
--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaBinaryClassificationMetricsExample.java ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.rdd.RDD;
+// $example off$
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class JavaBinaryClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Binary Classification Metrics Example");
--- End diff --

Change the app name to `Java Binary Classification Metrics Example`.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45314754
  
--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaBinaryClassificationMetricsExample.java ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
--- End diff --

Add a blank line here.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45314829
  
--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaBinaryClassificationMetricsExample.java ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.rdd.RDD;
--- End diff --

Remove this import.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45314715
  
--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaBinaryClassificationMetricsExample.java ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
--- End diff --

Add a blank line here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315688
  
--- Diff: examples/src/main/java/org/apache/spark/examples/mllib/JavaLinearRegressionExample.java ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.regression.LinearRegressionModel;
+import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.SparkConf;
+// $example off$
+
+// Read in the ratings data
+public class JavaLinearRegressionExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Linear Regression 
Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+// Load and parse the data
+String path = "data/mllib/sample_linear_regression_data.txt";
+JavaRDD<String> data = sc.textFile(path);
+JavaRDD<LabeledPoint> parsedData = data.map(
+  new Function<String, LabeledPoint>() {
+public LabeledPoint call(String line) {
+  String[] parts = line.split(" ");
+  double[] v = new double[parts.length - 1];
+  for (int i = 1; i < parts.length - 1; i++)
+v[i - 1] = Double.parseDouble(parts[i].split(":")[1]);
+  return new LabeledPoint(Double.parseDouble(parts[0]), 
Vectors.dense(v));
+}
+  }
+);
+parsedData.cache();
+
+// Building the model
+int numIterations = 100;
+final LinearRegressionModel model = 
LinearRegressionWithSGD.train(JavaRDD.toRDD(parsedData),
+numIterations);
--- End diff --

Use 2-space indentation here.
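
Separately, the parsing loop in the quoted diff runs while `i < parts.length - 1`, so the last slot of `v` is never assigned and stays at 0.0. Below is a minimal sketch of the same parse with the bound adjusted, assuming each input line has the form `label index:value index:value ...`, as the `split(":")` call suggests. `LibSvmLineParser`, `parseLabel`, and `parseFeatures` are hypothetical names for illustration, not part of the PR, and the sample line is made up.

```java
import java.util.Arrays;

public class LibSvmLineParser {
  // Returns the label, i.e. the first space-separated token of the line.
  public static double parseLabel(String line) {
    return Double.parseDouble(line.trim().split(" ")[0]);
  }

  // Returns the feature values in the order they appear on the line.
  public static double[] parseFeatures(String line) {
    String[] parts = line.trim().split(" ");
    double[] v = new double[parts.length - 1];
    // parts[0] is the label, so features occupy parts[1] .. parts[parts.length - 1].
    for (int i = 1; i < parts.length; i++) {
      v[i - 1] = Double.parseDouble(parts[i].split(":")[1]);
    }
    return v;
  }

  public static void main(String[] args) {
    String line = "-9.5 1:0.25 2:-0.5 3:1.0"; // made-up sample line
    System.out.println(parseLabel(line) + " " + Arrays.toString(parseFeatures(line)));
  }
}
```

With the original bound, the third feature of the sample line would come back as 0.0 instead of 1.0.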





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315702
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaLinearRegressionExample.java
 ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.regression.LinearRegressionModel;
+import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.SparkConf;
+// $example off$
+
+// Read in the ratings data
+public class JavaLinearRegressionExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Linear Regression 
Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+// Load and parse the data
+String path = "data/mllib/sample_linear_regression_data.txt";
+JavaRDD<String> data = sc.textFile(path);
+JavaRDD<LabeledPoint> parsedData = data.map(
+  new Function<String, LabeledPoint>() {
+public LabeledPoint call(String line) {
+  String[] parts = line.split(" ");
+  double[] v = new double[parts.length - 1];
+  for (int i = 1; i < parts.length - 1; i++)
+v[i - 1] = Double.parseDouble(parts[i].split(":")[1]);
+  return new LabeledPoint(Double.parseDouble(parts[0]), 
Vectors.dense(v));
+}
+  }
+);
+parsedData.cache();
+
+// Building the model
+int numIterations = 100;
+final LinearRegressionModel model = 
LinearRegressionWithSGD.train(JavaRDD.toRDD(parsedData),
+numIterations);
+
+// Evaluate model on training examples and compute training error
+JavaRDD<Tuple2<Object, Object>> valuesAndPreds = parsedData.map(
+  new Function<LabeledPoint, Tuple2<Object, Object>>() {
+public Tuple2<Object, Object> call(LabeledPoint point) {
+  double prediction = model.predict(point.features());
+  return new Tuple2<Object, Object>(prediction, point.label());
+}
+  }
+);
+
+// Instantiate metrics object
+RegressionMetrics metrics = new 
RegressionMetrics(valuesAndPreds.rdd());
+
+// Squared error
+System.out.format("MSE = %f\n", metrics.meanSquaredError());
+System.out.format("RMSE = %f\n", metrics.rootMeanSquaredError());
+
+// R-squared
+System.out.format("R Squared = %f\n", metrics.r2());
+
+// Mean absolute error
+System.out.format("MAE = %f\n", metrics.meanAbsoluteError());
+
+// Explained variance
+System.out.format("Explained Variance = %f\n", 
metrics.explainedVariance());
+
+// Save and load model
+model.save(sc.sc(), "target/tmp/LogisticRegressionModel");
+LinearRegressionModel sameModel = LinearRegressionModel.load(sc.sc(),
+"target/tmp/LogisticRegressionModel");
--- End diff --

Use 2-space indentation here.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315720
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaBinaryClassificationMetricsExample.java
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.rdd.RDD;
+// $example off$
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class JavaBinaryClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Binary Classification 
Metrics Example");
+SparkContext sc = new SparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_binary_classification_data.txt";
+JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
+
+// Split initial RDD into two... [60% training data, 40% testing data].
+JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.6, 0.4}, 11L);
+JavaRDD<LabeledPoint> training = splits[0].cache();
+JavaRDD<LabeledPoint> test = splits[1];
+
+// Run training algorithm to build the model.
+final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
+  .setNumClasses(2)
+  .run(training.rdd());
+
+// Clear the prediction threshold so the model will return 
probabilities
+model.clearThreshold();
+
+// Compute raw scores on the test set.
+JavaRDD<Tuple2<Object, Object>> predictionAndLabels = test.map(
+  new Function<LabeledPoint, Tuple2<Object, Object>>() {
+public Tuple2<Object, Object> call(LabeledPoint p) {
+  Double prediction = model.predict(p.features());
+  return new Tuple2<Object, Object>(prediction, p.label());
+}
+  }
+);
+
+// Get evaluation metrics.
+BinaryClassificationMetrics metrics = new 
BinaryClassificationMetrics(predictionAndLabels.rdd());
+
+// Precision by threshold
+JavaRDD<Tuple2<Object, Object>> precision = metrics.precisionByThreshold().toJavaRDD();
+System.out.println("Precision by threshold: " + precision.toArray());
+
+// Recall by threshold
+JavaRDD<Tuple2<Object, Object>> recall = metrics.recallByThreshold().toJavaRDD();
+System.out.println("Recall by threshold: " + recall.toArray());
+
+// F Score by threshold
+JavaRDD<Tuple2<Object, Object>> f1Score = metrics.fMeasureByThreshold().toJavaRDD();
+System.out.println("F1 Score by threshold: " + f1Score.toArray());
+
+JavaRDD<Tuple2<Object, Object>> f2Score = metrics.fMeasureByThreshold(2.0).toJavaRDD();
+System.out.println("F2 Score by threshold: " + f2Score.toArray());
+
+// Precision-recall curve
+JavaRDD<Tuple2<Object, Object>> prc = metrics.pr().toJavaRDD();
+System.out.println("Precision-recall curve: " + prc.toArray());
+
+// Thresholds
+JavaRDD<Double> thresholds = precision.map(
+  new Function<Tuple2<Object, Object>, Double>() {
+public Double call(Tuple2<Object, Object> t) {
+  return new Double(t._1().toString());
+}
+  }
+);
+
+// ROC Curve
+JavaRDD<Tuple2<Object, Object>> roc = metrics.roc().toJavaRDD();
+System.out.println("ROC curve: " + roc.toArray());
+
+// AUPRC
+System.out.println("Area under precision-recall curve = " + 
metrics.areaUnderPR());
+
+  

[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315795
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMultiLabelClassificationMetricsExample.java
 ---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
--- End diff --

Add a blank line here.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316484
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMulticlassClassificationMetricsExample.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.MulticlassMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.mllib.linalg.Matrix;
+// $example off$
+import org.apache.spark.rdd.RDD;
--- End diff --

Remove this import.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316576
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMulticlassClassificationMetricsExample.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.MulticlassMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.mllib.linalg.Matrix;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class JavaMulticlassClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Multi class 
Classification Metrics Example");
+SparkContext sc = new SparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_multiclass_classification_data.txt";
+JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
+
+// Split initial RDD into two... [60% training data, 40% testing data].
+JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.6, 0.4}, 11L);
+JavaRDD<LabeledPoint> training = splits[0].cache();
+JavaRDD<LabeledPoint> test = splits[1];
+
+// Run training algorithm to build the model.
+final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
+  .setNumClasses(3)
+.run(training.rdd());
+
+// Compute raw scores on the test set.
+JavaRDD<Tuple2<Object, Object>> predictionAndLabels = test.map(
+  new Function<LabeledPoint, Tuple2<Object, Object>>() {
+public Tuple2<Object, Object> call(LabeledPoint p) {
+  Double prediction = model.predict(p.features());
+  return new Tuple2<Object, Object>(prediction, p.label());
+}
+  }
+);
+
+// Get evaluation metrics.
+MulticlassMetrics metrics = new 
MulticlassMetrics(predictionAndLabels.rdd());
+
+// Confusion matrix
+Matrix confusion = metrics.confusionMatrix();
+System.out.println("Confusion matrix: \n" + confusion);
+
+// Overall statistics
+System.out.println("Precision = " + metrics.precision());
+System.out.println("Recall = " + metrics.recall());
+System.out.println("F1 Score = " + metrics.fMeasure());
+
+// Stats by labels
+for (int i = 0; i < metrics.labels().length; i++) {
+  System.out.format("Class %f precision = %f\n", 
metrics.labels()[i],metrics.precision
+  (metrics.labels()[i]));
+  System.out.format("Class %f recall = %f\n", metrics.labels()[i], 
metrics.recall(metrics
+  .labels()[i]));
+  System.out.format("Class %f F1 score = %f\n", metrics.labels()[i], 
metrics.fMeasure
+  (metrics.labels()[i]));
--- End diff --

ditto





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316513
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMulticlassClassificationMetricsExample.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.MulticlassMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.mllib.linalg.Matrix;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class JavaMulticlassClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Multi class 
Classification Metrics Example");
+SparkContext sc = new SparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_multiclass_classification_data.txt";
+JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
+
+// Split initial RDD into two... [60% training data, 40% testing data].
+JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.6, 0.4}, 11L);
+JavaRDD<LabeledPoint> training = splits[0].cache();
+JavaRDD<LabeledPoint> test = splits[1];
+
+// Run training algorithm to build the model.
+final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
+  .setNumClasses(3)
+.run(training.rdd());
--- End diff --

Align this with the previous line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317388
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/RankingMetricsExample.scala
 ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
--- End diff --

Add a blank line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317767
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
--- End diff --

Add a blank line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317784
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/MultiLabelMetricsExample.scala
 ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
+package org.apache.spark.examples.mllib
--- End diff --

Add a blank line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45318233
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib
+// $example on$
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLUtils
+// $example off$
+
--- End diff --

Remove the blank line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45318263
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib
+// $example on$
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLUtils
+// $example off$
+
+import org.apache.spark.{SparkContext, SparkConf}
+import org.apache.spark.sql.SQLContext
+
+object BinaryClassificationMetricsExample {
+
+  def main(args: Array[String]): Unit = {
+
+val conf = new 
SparkConf().setAppName("BinaryClassificationMetricsExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+import sqlContext.implicits._
--- End diff --

ditto





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45318251
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/BinaryClassificationMetricsExample.scala
 ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib
+// $example on$
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.mllib.util.MLUtils
+// $example off$
+
+import org.apache.spark.{SparkContext, SparkConf}
+import org.apache.spark.sql.SQLContext
+
+object BinaryClassificationMetricsExample {
+
+  def main(args: Array[String]): Unit = {
+
+val conf = new SparkConf().setAppName("BinaryClassificationMetricsExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
--- End diff --

remove the line





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45320839
  
--- Diff: 
examples/src/main/python/mllib/binary_classification_metrics_example.py ---
@@ -0,0 +1,56 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""
+Binary Classification Metrics Example.
+"""
+from __future__ import print_function
+import sys
+from pyspark import SparkContext, SQLContext
+# $example on$
+from pyspark.mllib.classification import LogisticRegressionWithLBFGS
+from pyspark.mllib.evaluation import BinaryClassificationMetrics
+from pyspark.mllib.regression import LabeledPoint
--- End diff --

remove the import





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45320834
  
--- Diff: 
examples/src/main/python/mllib/binary_classification_metrics_example.py ---
@@ -0,0 +1,56 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+"""
+Binary Classification Metrics Example.
+"""
+from __future__ import print_function
--- End diff --

Add a blank line.





[GitHub] spark pull request: [SPARK-11846] Add save/load for AFTSurvivalReg...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9836#issuecomment-157993279
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46321/
Test PASSed.





[GitHub] spark pull request: [SPARK-11846] Add save/load for AFTSurvivalReg...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9836#issuecomment-157993278
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315020
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaBinaryClassificationMetricsExample.java
 ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.rdd.RDD;
+// $example off$
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class JavaBinaryClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Binary Classification Metrics Example");
+SparkContext sc = new SparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_binary_classification_data.txt";
+JavaRDD data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
+
+// Split initial RDD into two... [60% training data, 40% testing data].
+JavaRDD[] splits = data.randomSplit(new double[]{0.6, 0.4}, 11L);
--- End diff --

This line exceeds the 100-character limit. Change it to:

```java
JavaRDD[] splits = 
  data.randomSplit(new double[]{0.6, 0.4}, 11L);
```
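
A length check like the one behind this comment can be approximated locally before pushing. The following is only a minimal sketch, assuming the 100-character limit mentioned above; the helper name `find_long_lines` is hypothetical and is not part of Spark's style tooling.

```python
def find_long_lines(text, limit=100):
    """Return (line_number, length) pairs for every line longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


# Illustrative input: only the second line is over the limit.
sample = "short line\n" + "x" * 101 + "\n" + "y" * 100
for number, length in find_long_lines(sample):
    print("line %d is %d characters long" % (number, length))
```

A line of exactly 100 characters is accepted here; whether the real checker treats the limit as inclusive is an assumption.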





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315175
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaLinearRegressionExample.java
 ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
--- End diff --

Add a blank line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315181
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaLinearRegressionExample.java
 ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
--- End diff --

Add a blank line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315157
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaLinearRegressionExample.java
 ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.regression.LinearRegressionModel;
+import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.SparkConf;
+// $example off$
+
+// Read in the ratings data
+public class JavaLinearRegressionExample {
--- End diff --

Change the name to `JavaRegressionMetricsExample`





[GitHub] spark pull request: [SPARK-11821] Propagate Kerberos keytab for al...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9837#issuecomment-157995224
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-11821] Propagate Kerberos keytab for al...

2015-11-19 Thread woj-i
GitHub user woj-i opened a pull request:

https://github.com/apache/spark/pull/9837

[SPARK-11821] Propagate Kerberos keytab for all environments

I prepared a patch that extends a recent bugfix. The scope of the previous 
bugfix is too narrow: it works only on YARN. I need it in local mode, and I 
think the other modes also need the information (because reflection works the 
same way in every environment that runs on a JVM).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/woj-i/spark branch-1.5

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9837.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9837


commit 7803fd30be4c14b23611a61216a4560da5604194
Author: woj-i 
Date:   2015-11-19T09:03:55Z

[SPARK-11821] Propagate Kerberos keytab for all environments (not only YARN)







[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45315584
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaLinearRegressionExample.java
 ---
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.linalg.Vectors;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.regression.LinearRegressionModel;
+import org.apache.spark.mllib.regression.LinearRegressionWithSGD;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.SparkConf;
+// $example off$
+
+// Read in the ratings data
+public class JavaLinearRegressionExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Linear Regression Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+// Load and parse the data
+String path = "data/mllib/sample_linear_regression_data.txt";
+JavaRDD data = sc.textFile(path);
+JavaRDD parsedData = data.map(
--- End diff --

This is odd: we do not need to parse the data ourselves. Try replacing it 
with `MLUtils.loadLibSVMFile(sc, path).toJavaRDD();`.
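
To illustrate what hand-parsing would duplicate, here is a rough sketch of reading one line in the LIBSVM text format (`label index:value ...`). It assumes the data file uses that format. The function name `parse_libsvm_line` is hypothetical, and this is not Spark's implementation.

```python
def parse_libsvm_line(line):
    """Parse one LIBSVM-format line into (label, {zero_based_index: value})."""
    parts = line.strip().split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":")
        # LIBSVM text files use one-based feature indices; shift to zero-based.
        features[int(index) - 1] = float(value)
    return label, features


label, features = parse_libsvm_line("1.0 1:0.5 3:2.0")
print(label, features)
```

Letting the library loader do this keeps the example short and avoids a second, slightly different parser in the docs.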





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316420
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMulticlassClassificationMetricsExample.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
--- End diff --

Add a blank line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316454
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMulticlassClassificationMetricsExample.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
--- End diff --

Add a blank line.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316553
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMulticlassClassificationMetricsExample.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.MulticlassMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.mllib.linalg.Matrix;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class JavaMulticlassClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Multi class Classification Metrics Example");
+SparkContext sc = new SparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_multiclass_classification_data.txt";
+JavaRDD data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
+
+// Split initial RDD into two... [60% training data, 40% testing data].
+JavaRDD[] splits = data.randomSplit(new double[]{0.6, 0.4}, 11L);
+JavaRDD training = splits[0].cache();
+JavaRDD test = splits[1];
+
+// Run training algorithm to build the model.
+final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
+  .setNumClasses(3)
+.run(training.rdd());
+
+// Compute raw scores on the test set.
+JavaRDD> predictionAndLabels = test.map(
+  new Function>() {
+public Tuple2 call(LabeledPoint p) {
+  Double prediction = model.predict(p.features());
+  return new Tuple2(prediction, p.label());
+}
+  }
+);
+
+// Get evaluation metrics.
+MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLabels.rdd());
+
+// Confusion matrix
+Matrix confusion = metrics.confusionMatrix();
+System.out.println("Confusion matrix: \n" + confusion);
+
+// Overall statistics
+System.out.println("Precision = " + metrics.precision());
+System.out.println("Recall = " + metrics.recall());
+System.out.println("F1 Score = " + metrics.fMeasure());
+
+// Stats by labels
+for (int i = 0; i < metrics.labels().length; i++) {
+  System.out.format("Class %f precision = %f\n", metrics.labels()[i],metrics.precision
+  (metrics.labels()[i]));
--- End diff --

Use 2-space indentation.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316563
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMulticlassClassificationMetricsExample.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.MulticlassMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.mllib.linalg.Matrix;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class JavaMulticlassClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Multi class Classification Metrics Example");
+SparkContext sc = new SparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_multiclass_classification_data.txt";
+JavaRDD data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
+
+// Split initial RDD into two... [60% training data, 40% testing data].
+JavaRDD[] splits = data.randomSplit(new double[]{0.6, 0.4}, 11L);
+JavaRDD training = splits[0].cache();
+JavaRDD test = splits[1];
+
+// Run training algorithm to build the model.
+final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
+  .setNumClasses(3)
+.run(training.rdd());
+
+// Compute raw scores on the test set.
+JavaRDD> predictionAndLabels = test.map(
+  new Function>() {
+public Tuple2 call(LabeledPoint p) {
+  Double prediction = model.predict(p.features());
+  return new Tuple2(prediction, p.label());
+}
+  }
+);
+
+// Get evaluation metrics.
+MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLabels.rdd());
+
+// Confusion matrix
+Matrix confusion = metrics.confusionMatrix();
+System.out.println("Confusion matrix: \n" + confusion);
+
+// Overall statistics
+System.out.println("Precision = " + metrics.precision());
+System.out.println("Recall = " + metrics.recall());
+System.out.println("F1 Score = " + metrics.fMeasure());
+
+// Stats by labels
+for (int i = 0; i < metrics.labels().length; i++) {
+  System.out.format("Class %f precision = %f\n", metrics.labels()[i],metrics.precision
+  (metrics.labels()[i]));
+  System.out.format("Class %f recall = %f\n", metrics.labels()[i], metrics.recall(metrics
+  .labels()[i]));
--- End diff --

ditto
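
For readers following the per-label statistics printed in the diff above, the quantities can be sketched in plain Python. This is only an illustration under the usual definitions (precision = correct predictions of a label / all predictions of that label; recall = correct predictions of a label / all true occurrences of that label). The function name `per_label_precision_recall` is hypothetical and this is not the `MulticlassMetrics` implementation.

```python
from collections import Counter


def per_label_precision_recall(prediction_and_labels):
    """Map each true label to (precision, recall) from (prediction, label) pairs."""
    true_pos = Counter()
    predicted = Counter()
    actual = Counter()
    for prediction, label in prediction_and_labels:
        predicted[prediction] += 1
        actual[label] += 1
        if prediction == label:
            true_pos[label] += 1
    stats = {}
    for label in sorted(actual):
        # A label that was never predicted gets precision 0.0 by convention here.
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / actual[label]
        stats[label] = (precision, recall)
    return stats


pairs = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
for label, (precision, recall) in per_label_precision_recall(pairs).items():
    print("Class %f precision = %f, recall = %f" % (label, precision, recall))
```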





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316588
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaMulticlassClassificationMetricsExample.java
 ---
@@ -0,0 +1,96 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.examples.mllib;
+// $example on$
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.classification.LogisticRegressionModel;
+import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS;
+import org.apache.spark.mllib.evaluation.MulticlassMetrics;
+import org.apache.spark.mllib.regression.LabeledPoint;
+import org.apache.spark.mllib.util.MLUtils;
+import org.apache.spark.mllib.linalg.Matrix;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+import org.apache.spark.SparkContext;
+
+public class JavaMulticlassClassificationMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Multi class Classification Metrics Example");
+SparkContext sc = new SparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_multiclass_classification_data.txt";
+JavaRDD<LabeledPoint> data = MLUtils.loadLibSVMFile(sc, path).toJavaRDD();
+
+// Split initial RDD into two... [60% training data, 40% testing data].
+JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.6, 0.4}, 11L);
+JavaRDD<LabeledPoint> training = splits[0].cache();
+JavaRDD<LabeledPoint> test = splits[1];
+
+// Run training algorithm to build the model.
+final LogisticRegressionModel model = new LogisticRegressionWithLBFGS()
+  .setNumClasses(3)
+.run(training.rdd());
+
+// Compute raw scores on the test set.
+JavaRDD<Tuple2<Object, Object>> predictionAndLabels = test.map(
+  new Function<LabeledPoint, Tuple2<Object, Object>>() {
+public Tuple2<Object, Object> call(LabeledPoint p) {
+  Double prediction = model.predict(p.features());
+  return new Tuple2<Object, Object>(prediction, p.label());
+}
+  }
+);
+
+// Get evaluation metrics.
+MulticlassMetrics metrics = new MulticlassMetrics(predictionAndLabels.rdd());
+
+// Confusion matrix
+Matrix confusion = metrics.confusionMatrix();
+System.out.println("Confusion matrix: \n" + confusion);
+
+// Overall statistics
+System.out.println("Precision = " + metrics.precision());
+System.out.println("Recall = " + metrics.recall());
+System.out.println("F1 Score = " + metrics.fMeasure());
+
+// Stats by labels
+for (int i = 0; i < metrics.labels().length; i++) {
+  System.out.format("Class %f precision = %f\n", metrics.labels()[i],metrics.precision
+  (metrics.labels()[i]));
+  System.out.format("Class %f recall = %f\n", metrics.labels()[i], metrics.recall(metrics
+  .labels()[i]));
+  System.out.format("Class %f F1 score = %f\n", metrics.labels()[i], metrics.fMeasure
+  (metrics.labels()[i]));
+}
+
+//Weighted stats
+System.out.format("Weighted precision = %f\n", metrics.weightedPrecision());
+System.out.format("Weighted recall = %f\n", metrics.weightedRecall());
+System.out.format("Weighted F1 score = %f\n", metrics.weightedFMeasure());
+System.out.format("Weighted false positive rate = %f\n", metrics.weightedFalsePositiveRate());
+
+// Save and load model
+model.save(sc, "target/tmp/LogisticRegressionModel");
+LogisticRegressionModel sameModel = LogisticRegressionModel.load(sc,
+"target/tmp/LogisticRegressionModel");
--- End diff --

ditto



[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316697
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
--- End diff --

Remove this line, since it applies to Scala code, not Java code.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316908
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib;
+
+// $example on$
+import java.util.*;
+
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.mllib.evaluation.RankingMetrics;
+import org.apache.spark.mllib.recommendation.ALS;
+import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
+import org.apache.spark.mllib.recommendation.Rating;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+
+public class JavaRankingMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Ranking Metrics Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_movielens_data.txt";
+JavaRDD<String> data = sc.textFile(path);
+JavaRDD<Rating> ratings = data.map(
+  new Function<String, Rating>() {
+public Rating call(String line) {
+  String[] parts = line.split("::");
+return new Rating(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Double
+.parseDouble(parts[2]) - 2.5);
+}
+  }
+);
+ratings.cache();
+
+// Train an ALS model
+final MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings), 10, 10, 0.01);
+
+// Get top 10 recommendations for every user and scale ratings from 0 to 1
+JavaRDD<Tuple2<Object, Rating[]>> userRecs = model.recommendProductsForUsers(10).toJavaRDD();
+JavaRDD<Tuple2<Object, Rating[]>> userRecsScaled = userRecs.map(
+  new Function<Tuple2<Object, Rating[]>, Tuple2<Object, Rating[]>>() {
+public Tuple2<Object, Rating[]> call(Tuple2<Object, Rating[]> t) {
+  Rating[] scaledRatings = new Rating[t._2().length];
+  for (int i = 0; i < scaledRatings.length; i++) {
+double newRating = Math.max(Math.min(t._2()[i].rating(), 1.0), 0.0);
+scaledRatings[i] = new Rating(t._2()[i].user(), t._2()[i].product(), newRating);
+  }
+  return new Tuple2<Object, Rating[]>(t._1(), scaledRatings);
+}
+  }
+);
+JavaPairRDD<Object, Rating[]> userRecommended = JavaPairRDD.fromJavaRDD(userRecsScaled);
+
+// Map ratings to 1 or 0, 1 indicating a movie that should be recommended
+JavaRDD<Rating> binarizedRatings = ratings.map(
+  new Function<Rating, Rating>() {
+public Rating call(Rating r) {
+  double binaryRating;
+  if (r.rating() > 0.0) {
+  binaryRating = 1.0;
+  } else {
+  binaryRating = 0.0;
+  }
+  return new Rating(r.user(), r.product(), binaryRating);
+}
+  }
+);
+
+// Group ratings by common user
+JavaPairRDD<Object, Iterable<Rating>> userMovies = binarizedRatings.groupBy(
+new Function<Rating, Object>() {
--- End diff --

Fix the indentation here.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316866
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib;
+
+// $example on$
+import java.util.*;
+
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.mllib.evaluation.RankingMetrics;
+import org.apache.spark.mllib.recommendation.ALS;
+import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
+import org.apache.spark.mllib.recommendation.Rating;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+
+public class JavaRankingMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Ranking Metrics Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_movielens_data.txt";
+JavaRDD<String> data = sc.textFile(path);
+JavaRDD<Rating> ratings = data.map(
+  new Function<String, Rating>() {
+public Rating call(String line) {
+  String[] parts = line.split("::");
+return new Rating(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Double
+.parseDouble(parts[2]) - 2.5);
+}
+  }
+);
+ratings.cache();
+
+// Train an ALS model
+final MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings), 10, 10, 0.01);
+
+// Get top 10 recommendations for every user and scale ratings from 0 to 1
+JavaRDD<Tuple2<Object, Rating[]>> userRecs = model.recommendProductsForUsers(10).toJavaRDD();
+JavaRDD<Tuple2<Object, Rating[]>> userRecsScaled = userRecs.map(
+  new Function<Tuple2<Object, Rating[]>, Tuple2<Object, Rating[]>>() {
+public Tuple2<Object, Rating[]> call(Tuple2<Object, Rating[]> t) {
+  Rating[] scaledRatings = new Rating[t._2().length];
+  for (int i = 0; i < scaledRatings.length; i++) {
+double newRating = Math.max(Math.min(t._2()[i].rating(), 1.0), 0.0);
+scaledRatings[i] = new Rating(t._2()[i].user(), t._2()[i].product(), newRating);
+  }
+  return new Tuple2<Object, Rating[]>(t._1(), scaledRatings);
+}
+  }
+);
+JavaPairRDD<Object, Rating[]> userRecommended = JavaPairRDD.fromJavaRDD(userRecsScaled);
+
+// Map ratings to 1 or 0, 1 indicating a movie that should be recommended
+JavaRDD<Rating> binarizedRatings = ratings.map(
+  new Function<Rating, Rating>() {
+public Rating call(Rating r) {
+  double binaryRating;
+  if (r.rating() > 0.0) {
+  binaryRating = 1.0;
--- End diff --

Use 2-space indentation.
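
A minimal sketch of what the requested 2-space indentation might look like for this `if`/`else` block. The standalone class `BinarizeSketch` and its `binarize` method are hypothetical names used only for illustration; in the diff this logic lives inside the anonymous `Function`'s `call` method.

```java
// Hypothetical standalone class; only the if/else body mirrors the diff above.
public class BinarizeSketch {
  // Maps a centered rating to 1.0 (should be recommended) or 0.0 (should not).
  public static double binarize(double rating) {
    double binaryRating;
    if (rating > 0.0) {
      binaryRating = 1.0;  // body sits 2 spaces deeper than the `if`
    } else {
      binaryRating = 0.0;  // same 2-space step in the `else` branch
    }
    return binaryRating;
  }

  public static void main(String[] args) {
    System.out.println(binarize(1.5));
    System.out.println(binarize(-0.5));
  }
}
```

Each nested block steps in by exactly two spaces relative to its enclosing statement.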





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316872
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib;
+
+// $example on$
+import java.util.*;
+
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.mllib.evaluation.RankingMetrics;
+import org.apache.spark.mllib.recommendation.ALS;
+import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
+import org.apache.spark.mllib.recommendation.Rating;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+
+public class JavaRankingMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Ranking Metrics Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_movielens_data.txt";
+JavaRDD<String> data = sc.textFile(path);
+JavaRDD<Rating> ratings = data.map(
+  new Function<String, Rating>() {
+public Rating call(String line) {
+  String[] parts = line.split("::");
+return new Rating(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Double
+.parseDouble(parts[2]) - 2.5);
+}
+  }
+);
+ratings.cache();
+
+// Train an ALS model
+final MatrixFactorizationModel model = ALS.train(JavaRDD.toRDD(ratings), 10, 10, 0.01);
+
+// Get top 10 recommendations for every user and scale ratings from 0 to 1
+JavaRDD<Tuple2<Object, Rating[]>> userRecs = model.recommendProductsForUsers(10).toJavaRDD();
+JavaRDD<Tuple2<Object, Rating[]>> userRecsScaled = userRecs.map(
+  new Function<Tuple2<Object, Rating[]>, Tuple2<Object, Rating[]>>() {
+public Tuple2<Object, Rating[]> call(Tuple2<Object, Rating[]> t) {
+  Rating[] scaledRatings = new Rating[t._2().length];
+  for (int i = 0; i < scaledRatings.length; i++) {
+double newRating = Math.max(Math.min(t._2()[i].rating(), 1.0), 0.0);
+scaledRatings[i] = new Rating(t._2()[i].user(), t._2()[i].product(), newRating);
+  }
+  return new Tuple2<Object, Rating[]>(t._1(), scaledRatings);
+}
+  }
+);
+JavaPairRDD<Object, Rating[]> userRecommended = JavaPairRDD.fromJavaRDD(userRecsScaled);
+
+// Map ratings to 1 or 0, 1 indicating a movie that should be recommended
+JavaRDD<Rating> binarizedRatings = ratings.map(
+  new Function<Rating, Rating>() {
+public Rating call(Rating r) {
+  double binaryRating;
+  if (r.rating() > 0.0) {
+  binaryRating = 1.0;
+  } else {
+  binaryRating = 0.0;
--- End diff --

ditto





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45316822
  
--- Diff: 
examples/src/main/java/org/apache/spark/examples/mllib/JavaRankingMetricsExample.java
 ---
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.mllib;
+
+// $example on$
+import java.util.*;
+
+import scala.Tuple2;
+
+import org.apache.spark.api.java.*;
+import org.apache.spark.api.java.function.Function;
+import org.apache.spark.mllib.evaluation.RegressionMetrics;
+import org.apache.spark.mllib.evaluation.RankingMetrics;
+import org.apache.spark.mllib.recommendation.ALS;
+import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
+import org.apache.spark.mllib.recommendation.Rating;
+// $example off$
+import org.apache.spark.rdd.RDD;
+import org.apache.spark.SparkConf;
+
+public class JavaRankingMetricsExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("Ranking Metrics Example");
+JavaSparkContext sc = new JavaSparkContext(conf);
+// $example on$
+String path = "data/mllib/sample_movielens_data.txt";
+JavaRDD<String> data = sc.textFile(path);
+JavaRDD<Rating> ratings = data.map(
+  new Function<String, Rating>() {
+public Rating call(String line) {
+  String[] parts = line.split("::");
+return new Rating(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Double
+.parseDouble(parts[2]) - 2.5);
--- End diff --

Use 2-space indentation.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317158
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/RegressionMetricsExample.scala
 ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
--- End diff --

Add a blank line.





[GitHub] spark pull request: [SPARK-11848][SQL] Support EXPLAIN in DataSet ...

2015-11-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9832#issuecomment-157998905
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46313/
Test PASSed.





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317177
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/RegressionMetricsExample.scala
 ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
+package org.apache.spark.examples.mllib
--- End diff --

ditto





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317509
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/RankingMetricsExample.scala
 ---
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// scalastyle:off println
+package org.apache.spark.examples.mllib
+
+import org.apache.spark.sql.SQLContext
--- End diff --

Reorganize the imports as shown below:

```scala
// $example on$
import org.apache.spark.mllib.evaluation.{RegressionMetrics, RankingMetrics}
import org.apache.spark.mllib.recommendation.{ALS, Rating}
// $example off$
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}
```





[GitHub] spark pull request: [SPARK-11549][Docs] Replace example code in ml...

2015-11-19 Thread yinxusen
Github user yinxusen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9689#discussion_r45317621
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/MulticlassMetricsExample.scala
 ---
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
--- End diff --

Add a blank line.




