spark git commit: [SPARK-10032] [PYSPARK] [DOC] Add Python example for mllib LDAModel user guide

2015-08-18 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master f4fa61eff -> 747c2ba80


[SPARK-10032] [PYSPARK] [DOC] Add Python example for mllib LDAModel user guide

Add Python example for mllib LDAModel user guide

Author: Yanbo Liang yblia...@gmail.com

Closes #8227 from yanboliang/spark-10032.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/747c2ba8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/747c2ba8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/747c2ba8

Branch: refs/heads/master
Commit: 747c2ba8006d5b86f3be8dfa9ace639042a35628
Parents: f4fa61e
Author: Yanbo Liang yblia...@gmail.com
Authored: Tue Aug 18 12:56:36 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue Aug 18 12:56:36 2015 -0700

--
 docs/mllib-clustering.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/747c2ba8/docs/mllib-clustering.md
--
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index bb875ae..fd9ab25 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -564,6 +564,34 @@ public class JavaLDAExample {
 {% endhighlight %}
 </div>
 
+<div data-lang="python" markdown="1">
+{% highlight python %}
+from pyspark.mllib.clustering import LDA, LDAModel
+from pyspark.mllib.linalg import Vectors
+
+# Load and parse the data
+data = sc.textFile("data/mllib/sample_lda_data.txt")
+parsedData = data.map(lambda line: Vectors.dense([float(x) for x in line.strip().split(' ')]))
+# Index documents with unique IDs
+corpus = parsedData.zipWithIndex().map(lambda x: [x[1], x[0]]).cache()
+
+# Cluster the documents into three topics using LDA
+ldaModel = LDA.train(corpus, k=3)
+
+# Output topics. Each is a distribution over words (matching word count vectors)
+print("Learned topics (as distributions over vocab of " + str(ldaModel.vocabSize()) + " words):")
+topics = ldaModel.topicsMatrix()
+for topic in range(3):
+    print("Topic " + str(topic) + ":")
+    for word in range(0, ldaModel.vocabSize()):
+        print(" " + str(topics[word][topic]))
+
+# Save and load model
+ldaModel.save(sc, "myModelPath")
+sameModel = LDAModel.load(sc, "myModelPath")
+{% endhighlight %}
+</div>
+
 </div>
 
 ## Streaming k-means


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
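The pre-processing in the patch (parse each line of space-separated word counts into a dense vector, then pair each document with a unique ID via `zipWithIndex`) can be tried without a SparkContext. Below is a plain-Python sketch of those two steps; `parse_line` and `index_corpus` are hypothetical stand-ins for the RDD operations, and the sample lines merely imitate the `data/mllib/sample_lda_data.txt` format rather than quoting the file.

```python
# Plain-Python sketch of the pre-processing the patch performs with RDDs.
# parse_line mirrors data.map(lambda line: Vectors.dense(...));
# index_corpus mirrors parsedData.zipWithIndex().map(lambda x: [x[1], x[0]]).

def parse_line(line):
    """Parse one line of space-separated word counts into a list of floats."""
    return [float(x) for x in line.strip().split(' ')]

def index_corpus(vectors):
    """Pair each document vector with a unique integer ID, ID first."""
    return [[i, vec] for i, vec in enumerate(vectors)]

# Example lines in the same format as sample_lda_data.txt (made up here)
sample = ["1 2 6 0 2 3", "1 3 0 1 3 0"]
corpus = index_corpus([parse_line(line) for line in sample])
```

The resulting `corpus` is a list of `[id, vector]` pairs, which is the shape `LDA.train` expects from the RDD in the real example.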



spark git commit: [SPARK-10032] [PYSPARK] [DOC] Add Python example for mllib LDAModel user guide

2015-08-18 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.5 80debff12 -> ec7079f9c


[SPARK-10032] [PYSPARK] [DOC] Add Python example for mllib LDAModel user guide

Add Python example for mllib LDAModel user guide

Author: Yanbo Liang yblia...@gmail.com

Closes #8227 from yanboliang/spark-10032.

(cherry picked from commit 747c2ba8006d5b86f3be8dfa9ace639042a35628)
Signed-off-by: Xiangrui Meng m...@databricks.com


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ec7079f9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ec7079f9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ec7079f9

Branch: refs/heads/branch-1.5
Commit: ec7079f9c94cb98efdac6f92b7c85efb0e67492e
Parents: 80debff
Author: Yanbo Liang yblia...@gmail.com
Authored: Tue Aug 18 12:56:36 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Tue Aug 18 12:56:43 2015 -0700

--
 docs/mllib-clustering.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/ec7079f9/docs/mllib-clustering.md
--
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index bb875ae..fd9ab25 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -564,6 +564,34 @@ public class JavaLDAExample {
 {% endhighlight %}
 </div>
 
+<div data-lang="python" markdown="1">
+{% highlight python %}
+from pyspark.mllib.clustering import LDA, LDAModel
+from pyspark.mllib.linalg import Vectors
+
+# Load and parse the data
+data = sc.textFile("data/mllib/sample_lda_data.txt")
+parsedData = data.map(lambda line: Vectors.dense([float(x) for x in line.strip().split(' ')]))
+# Index documents with unique IDs
+corpus = parsedData.zipWithIndex().map(lambda x: [x[1], x[0]]).cache()
+
+# Cluster the documents into three topics using LDA
+ldaModel = LDA.train(corpus, k=3)
+
+# Output topics. Each is a distribution over words (matching word count vectors)
+print("Learned topics (as distributions over vocab of " + str(ldaModel.vocabSize()) + " words):")
+topics = ldaModel.topicsMatrix()
+for topic in range(3):
+    print("Topic " + str(topic) + ":")
+    for word in range(0, ldaModel.vocabSize()):
+        print(" " + str(topics[word][topic]))
+
+# Save and load model
+ldaModel.save(sc, "myModelPath")
+sameModel = LDAModel.load(sc, "myModelPath")
+{% endhighlight %}
+</div>
+
 </div>
 
 ## Streaming k-means

