[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10219#discussion_r47068305
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/PolynomialExpansionExample.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.feature.PolynomialExpansion
+import org.apache.spark.mllib.linalg.Vectors
+// $example off$
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.{SparkConf, SparkContext}
+
+object PolynomialExpansionExample {
+  def main(args: Array[String]): Unit = {
+val conf = new SparkConf().setAppName("PolynomialExpansionExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+
+// $example on$
+val data = Array(
+  Vectors.dense(-2.0, 2.3),
+  Vectors.dense(0.0, 0.0),
+  Vectors.dense(0.6, -1.1)
+)
+val df = sqlContext.createDataFrame(data.map(Tuple1.apply)).toDF("features")
+val polynomialExpansion = new PolynomialExpansion()
+  .setInputCol("features")
+  .setOutputCol("polyFeatures")
+  .setDegree(3)
+val polyDF = polynomialExpansion.transform(df)
+polyDF.select("polyFeatures").take(3).foreach(println)
+// $example off$
+sc.stop()
+  }
+}
+// scalastyle:on println
+
+
--- End diff --

Trailing lines


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10219#discussion_r47068389
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/ml/PCAExample.scala ---
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.feature.PCA
+import org.apache.spark.mllib.linalg.Vectors
+// $example off$
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.{SparkConf, SparkContext}
+
+object PCAExample {
+  def main(args: Array[String]): Unit = {
+val conf = new SparkConf().setAppName("PCAExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+
+// $example on$
+val data = Array(
+  Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
+  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
+  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
+)
+val df = sqlContext.createDataFrame(data.map(Tuple1.apply)).toDF("features")
+val pca = new PCA()
+  .setInputCol("features")
+  .setOutputCol("pcaFeatures")
+  .setK(3)
+  .fit(df)
+val pcaDF = pca.transform(df)
+val result = pcaDF.select("pcaFeatures")
+result.show()
+// $example off$
+sc.stop()
+  }
+}
+// scalastyle:on println
+
--- End diff --

Trailing line





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10219#discussion_r47068461
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/OneHotEncoderExample.scala ---
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
+// $example off$
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.{SparkConf, SparkContext}
+
+object OneHotEncoderExample {
+  def main(args: Array[String]): Unit = {
+val conf = new SparkConf().setAppName("OneHotEncoderExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+
+// $example on$
+val df = sqlContext.createDataFrame(Seq(
+  (0, "a"),
+  (1, "b"),
+  (2, "c"),
+  (3, "a"),
+  (4, "a"),
+  (5, "c")
+)).toDF("id", "category")
+
+val indexer = new StringIndexer()
+  .setInputCol("category")
+  .setOutputCol("categoryIndex")
+  .fit(df)
+val indexed = indexer.transform(df)
+
+val encoder = new OneHotEncoder()
+  .setInputCol("categoryIndex")
+  .setOutputCol("categoryVec")
+val encoded = encoder.transform(indexed)
+encoded.select("id", "categoryVec").show()
+// $example off$
+sc.stop()
+  }
+}
+// scalastyle:on println
+
--- End diff --

Trailing line





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10219#discussion_r47068203
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/StringIndexerExample.scala ---
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.feature.StringIndexer
+// $example off$
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.{SparkConf, SparkContext}
+
+object StringIndexerExample {
+  def main(args: Array[String]): Unit = {
+val conf = new SparkConf().setAppName("StringIndexerExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+
+// $example on$
+val df = sqlContext.createDataFrame(
+  Seq((0, "a"), (1, "b"), (2, "c"), (3, "a"), (4, "a"), (5, "c"))
+).toDF("id", "category")
+
+val indexer = new StringIndexer()
+  .setInputCol("category")
+  .setOutputCol("categoryIndex")
+
+val indexed = indexer.fit(df).transform(df)
+indexed.show()
+// $example off$
+sc.stop()
+  }
+}
+// scalastyle:on println
+
--- End diff --

Trailing line





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163165116
  
I noticed some formatting quirks, especially in the Scala examples; otherwise 
it looks good.

However, shouldn't we take advantage of this PR to standardize the output 
of the examples?
For example, I think every example should end with a `show()` or `println` 
so users can copy and paste the example and see for themselves what it does.
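
The convention proposed above could be checked mechanically. The following is a minimal sketch, not part of the PR: `displayed_lines` and `ends_with_output` are hypothetical helper names, and the sketch assumes that only the text between `$example on$` and `$example off$` markers is shown in the docs.

```python
import re

EXAMPLE_ON = "$example on$"
EXAMPLE_OFF = "$example off$"


def displayed_lines(source):
    """Return the non-blank lines that sit between the on/off markers."""
    shown, inside = [], False
    for line in source.splitlines():
        if EXAMPLE_ON in line:
            inside = True
        elif EXAMPLE_OFF in line:
            inside = False
        elif inside and line.strip():
            shown.append(line.strip())
    return shown


def ends_with_output(source):
    """True if the last displayed line calls show(), print( or println."""
    shown = displayed_lines(source)
    if not shown:
        return False
    return re.search(r"\.show\(|\bprintln\b|\bprint\(", shown[-1]) is not None


# Sample text modelled on the StringIndexer example quoted in this thread.
GOOD = """
// $example on$
val indexed = indexer.fit(df).transform(df)
indexed.show()
// $example off$
sc.stop()
"""

# Same shape, but the displayed region produces no visible output.
BAD = """
// $example on$
val encoded = encoder.transform(indexed)
// $example off$
sc.stop()
"""
```

With these samples, `ends_with_output(GOOD)` is true and `ends_with_output(BAD)` is false, so a check like this would flag examples whose displayed part never prints a result.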





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163173174
  
@BenFradet It's reasonable. I'll modify them now. Thanks for the review.





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10219#discussion_r47068916
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/ElementWiseProductExample.scala ---
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples.ml
+
+// $example on$
+import org.apache.spark.ml.feature.ElementwiseProduct
+import org.apache.spark.mllib.linalg.Vectors
+// $example off$
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.{SparkConf, SparkContext}
+
+object ElementwiseProductExample {
+  def main(args: Array[String]): Unit = {
+val conf = new SparkConf().setAppName("ElementwiseProductExample")
+val sc = new SparkContext(conf)
+val sqlContext = new SQLContext(sc)
+
+// $example on$
+// Create some vector data; also works for sparse vectors
+val dataFrame = sqlContext.createDataFrame(Seq(
+  ("a", Vectors.dense(1.0, 2.0, 3.0)),
+  ("b", Vectors.dense(4.0, 5.0, 6.0)))).toDF("id", "vector")
+
+val transformingVector = Vectors.dense(0.0, 1.0, 2.0)
+val transformer = new ElementwiseProduct()
+  .setScalingVec(transformingVector)
+  .setInputCol("vector")
+  .setOutputCol("transformedVector")
+
+// Batch transform the vectors to create new column:
+transformer.transform(dataFrame).show()
+// $example off$
+sc.stop()
+  }
+}
+// scalastyle:on println
+
--- End diff --

Trailing line





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163213528
  
@yinxusen I'll have a look later today





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163203243
  
@BenFradet Does the code look good to you?





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163203400
  
**[Test build #47427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47427/consoleFull)** for PR 10219 at commit [`771d015`](https://github.com/apache/spark/commit/771d015000114828ab32e38301acbb50df150f9d).





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163211490
  
**[Test build #47427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47427/consoleFull)** for PR 10219 at commit [`771d015`](https://github.com/apache/spark/commit/771d015000114828ab32e38301acbb50df150f9d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaBinarizerExample`
   * `public class JavaBucketizerExample`
   * `public class JavaDCTExample`
   * `public class JavaElementwiseProductExample`
   * `public class JavaMinMaxScalerExample`
   * `public class JavaNGramExample`
   * `public class JavaNormalizerExample`
   * `public class JavaOneHotEncoderExample`
   * `public class JavaPCAExample`
   * `public class JavaPolynomialExpansionExample`
   * `public class JavaRFormulaExample`
   * `public class JavaStandardScalerExample`
   * `public class JavaStopWordsRemoverExample`
   * `public class JavaStringIndexerExample`
   * `public class JavaTokenizerExample`
   * `public class JavaVectorAssemblerExample`
   * `public class JavaVectorIndexerExample`
   * `public class JavaVectorSlicerExample`





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163211572
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47427/
Test PASSed.





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163211570
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10219#discussion_r47150356
  
--- Diff: examples/src/main/python/ml/polynomial_expansion_example.py ---
@@ -0,0 +1,43 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+from __future__ import print_function
+
+from pyspark import SparkContext
+from pyspark.sql import SQLContext
+# $example on$
+from pyspark.ml.feature import PolynomialExpansion
+from pyspark.mllib.linalg import Vectors
+# $example off$
+
+if __name__ == "__main__":
+sc = SparkContext(appName="PolynomialExpansionExample")
+sqlContext = SQLContext(sc)
+
+# $example on$
+df = sqlContext\
+.createDataFrame([(Vectors.dense([-2.0, 2.3]), ),
--- End diff --

nit: space





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on a diff in the pull request:

https://github.com/apache/spark/pull/10219#discussion_r47147519
  
--- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBinarizerExample.java ---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml;
+
+import org.apache.spark.SparkConf;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.apache.spark.sql.SQLContext;
+
+// $example on$
+import java.util.Arrays;
+
+import org.apache.spark.api.java.JavaRDD;
+import org.apache.spark.ml.feature.Binarizer;
+import org.apache.spark.sql.DataFrame;
+import org.apache.spark.sql.Row;
+import org.apache.spark.sql.RowFactory;
+import org.apache.spark.sql.types.DataTypes;
+import org.apache.spark.sql.types.Metadata;
+import org.apache.spark.sql.types.StructField;
+import org.apache.spark.sql.types.StructType;
+// $example off$
+
+public class JavaBinarizerExample {
+  public static void main(String[] args) {
+SparkConf conf = new SparkConf().setAppName("JavaBinarizerExample");
+JavaSparkContext jsc = new JavaSparkContext(conf);
+SQLContext jsql = new SQLContext(jsc);
+
+// $example on$
+JavaRDD<Row> jrdd = jsc.parallelize(Arrays.asList(
+  RowFactory.create(0, 0.1),
+  RowFactory.create(1, 0.8),
+  RowFactory.create(2, 0.2)
+));
+StructType schema = new StructType(new StructField[]{
+  new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
+  new StructField("feature", DataTypes.DoubleType, false, Metadata.empty())
+});
+DataFrame continuousDataFrame = jsql.createDataFrame(jrdd, schema);
+Binarizer binarizer = new Binarizer()
+  .setInputCol("feature")
+  .setOutputCol("binarized_feature")
+  .setThreshold(0.5);
+DataFrame binarizedDataFrame = binarizer.transform(continuousDataFrame);
+DataFrame binarizedFeatures = binarizedDataFrame.select("binarized_feature");
+for (Row r : binarizedFeatures.collect()) {
+Double binarized_value = r.getDouble(0);
--- End diff --

indent





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread BenFradet
Github user BenFradet commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163389541
  
LGTM, except two minor comments.





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163442658
  
@BenFradet I'll change it in the follow-up PR 
https://github.com/apache/spark/pull/10193


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163373814
  
Merged into master and branch-1.6. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/10219#discussion_r47143681
  
--- Diff: docs/ml-features.md ---
@@ -794,39 +411,7 @@ dctDf.select("featuresDCT").show(3)
 Refer to the [DCT Java docs](api/java/org/apache/spark/ml/feature/DCT.html)
 for more details on the API.
 
-{% highlight java %}
-import java.util.Arrays;
-
-import org.apache.spark.api.java.JavaRDD;
-import org.apache.spark.api.java.JavaSparkContext;
-import org.apache.spark.ml.feature.DCT;
-import org.apache.spark.mllib.linalg.Vector;
-import org.apache.spark.mllib.linalg.VectorUDT;
-import org.apache.spark.mllib.linalg.Vectors;
-import org.apache.spark.sql.DataFrame;
-import org.apache.spark.sql.Row;
-import org.apache.spark.sql.RowFactory;
-import org.apache.spark.sql.SQLContext;
-import org.apache.spark.sql.types.Metadata;
-import org.apache.spark.sql.types.StructField;
-import org.apache.spark.sql.types.StructType;
-
-JavaRDD<Row> data = jsc.parallelize(Arrays.asList(
-  RowFactory.create(Vectors.dense(0.0, 1.0, -2.0, 3.0)),
-  RowFactory.create(Vectors.dense(-1.0, 2.0, 4.0, -7.0)),
-  RowFactory.create(Vectors.dense(14.0, -2.0, -5.0, 1.0))
-));
-StructType schema = new StructType(new StructField[] {
-  new StructField("features", new VectorUDT(), false, Metadata.empty()),
-});
-DataFrame df = jsql.createDataFrame(data, schema);
-DCT dct = new DCT()
-  .setInputCol("features")
-  .setOutputCol("featuresDCT")
-  .setInverse(false);
-DataFrame dctDf = dct.transform(df);
-dctDf.select("featuresDCT").show(3);
-{% endhighlight %}
+{% include_example java/org/apache/spark/examples/ml/JavaDCTExample.java %}}
--- End diff --

Please remove the extra `}` at the end.
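
A stray closing brace like this one is easy to catch with a small check. The sketch below is only an illustration and is not the docs build itself: `check_tag` is a hypothetical helper, and the pattern assumes a well-formed tag has the shape `{% include_example <path> %}` with nothing after it.

```python
import re

# Assumed shape of a well-formed tag: "{% include_example <path> %}" and nothing else.
TAG = re.compile(r"^\{%\s*include_example\s+(\S+)\s*%\}$")


def check_tag(line):
    """Return the example path if the tag is well-formed, else None."""
    match = TAG.match(line.strip())
    return match.group(1) if match else None


GOOD_TAG = "{% include_example java/org/apache/spark/examples/ml/JavaDCTExample.java %}"
BAD_TAG = GOOD_TAG + "}"  # the extra brace flagged in the comment above
```

Here `check_tag(GOOD_TAG)` returns the example path, while `check_tag(BAD_TAG)` returns `None` because of the trailing `}`.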





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10219





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163135226
  
**[Test build #47413 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47413/consoleFull)** for PR 10219 at commit [`8748a88`](https://github.com/apache/spark/commit/8748a888df8d17bccc03f6c178641e04242ec157).





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163137492
  
**[Test build #47413 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47413/consoleFull)** for PR 10219 at commit [`8748a88`](https://github.com/apache/spark/commit/8748a888df8d17bccc03f6c178641e04242ec157).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `public class JavaBinarizerExample`
   * `public class JavaBucketizerExample`
   * `public class JavaDCTExample`
   * `public class JavaElementwiseProductExample`
   * `public class JavaMinMaxScalerExample`
   * `public class JavaNGramExample`
   * `public class JavaNormalizerExample`
   * `public class JavaOneHotEncoderExample`
   * `public class JavaPCAExample`
   * `public class JavaPolynomialExpansionExample`
   * `public class JavaRFormulaExample`
   * `public class JavaStandardScalerExample`
   * `public class JavaStopWordsRemoverExample`
   * `public class JavaStringIndexerExample`
   * `public class JavaTokenizerExample`
   * `public class JavaVectorAssemblerExample`
   * `public class JavaVectorIndexerExample`
   * `public class JavaVectorSlicerExample`





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163137574
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47413/
Test PASSed.





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-08 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163138018
  
Ping @mengxr, this is for SPARK-11551. Please sign off on it if it looks good to you.





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163137570
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-08 Thread yinxusen
GitHub user yinxusen opened a pull request:

https://github.com/apache/spark/pull/10219

[SPARK-11551][DOC] Replace example code in ml-features.md using include_example

PR on behalf of @somideshmukh, thanks!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yinxusen/spark SPARK-11551

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10219.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10219


commit d14f55d8e842519b81423348e6656803b4c130fe
Author: somideshmukh 
Date:   2015-11-26T09:13:58Z

[SPARK-11551][DOC][Example]Replace example code in ml-features.md using include_example

commit 12b1cf33a1846250458f3093b7bf7b7826f5
Author: somideshmukh 
Date:   2015-11-26T10:21:05Z

[SPARK-11551][DOC][Example]Replace example code in ml-features.md using include_example

commit 87e673eff13799027abb6f9835223c2e3791644e
Author: Xusen Yin 
Date:   2015-11-27T05:06:52Z

fix java code issues

commit 0e19113bb4882c48bd0344cd480270ef054c9708
Author: Xusen Yin 
Date:   2015-11-27T05:52:13Z

fix scala issues

commit f6a975eaf1b6584325a1c94d99fc25bffdf1bad9
Author: Xusen Yin 
Date:   2015-11-27T06:11:09Z

add java vectorindexer, standardscaler, normalizer

commit dd1d2c12d5d7e65332c955bd63127a8b59f74502
Author: Xusen Yin 
Date:   2015-11-27T06:16:29Z

add jsc stop

commit 3d1efc3661719de9a253f862473cf9a7ede60139
Author: Xusen Yin 
Date:   2015-11-27T06:26:22Z

fix scala issues

commit c23bab4beb47fd604c153ce5c94c563eaf36361c
Author: Xusen Yin 
Date:   2015-11-27T08:02:57Z

add python examples

commit c143d4b2e35275f72ef7b6e5f73ef8cfcceddc4a
Author: somideshmukh 
Date:   2015-11-28T11:59:59Z

Merge pull request #1 from yinxusen/SomilBranch1.33

review result

commit b688b4d4055bee4e52bcfe1adf4991a60b6e55de
Author: somideshmukh 
Date:   2015-12-01T09:50:53Z

[SPARK-11551][DOC][Example]Replace example code in ml-features.md using include_example

commit 8a0d88332f39e44365c7cbe3fdb9fac251251d85
Author: Xusen Yin 
Date:   2015-12-01T14:53:15Z

fix minor issues

commit bed2192d58c1bce968f3aa4f191e739972dad7e6
Author: Xusen Yin 
Date:   2015-12-01T15:08:46Z

merge with master

commit e31fb4a9434fa9e5e4ce19900c2a98b24626032d
Author: Xusen Yin 
Date:   2015-12-08T11:05:55Z

fix python style

commit 8748a888df8d17bccc03f6c178641e04242ec157
Author: Xusen Yin 
Date:   2015-12-09T06:46:51Z

merge with master







[GitHub] spark pull request: [SPARK-11551][DOC] Replace example code in ml-...

2015-12-08 Thread yinxusen
Github user yinxusen commented on the pull request:

https://github.com/apache/spark/pull/10219#issuecomment-163133018
  
ok to test

