No topicDistributions(..) method in ml.clustering.LocalLDAModel

2017-02-07 Thread sachintyagi22
Hi, 

I was using ml.clustering.LDA for topic modelling (with the online optimizer)
and it returns an ml.clustering.LocalLDAModel. However, with this model there
doesn't seem to be any way to get the topic distribution over documents,
while the older mllib API (mllib.clustering.LocalLDAModel) has a method for
exactly that: topicDistributions(..).
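
To illustrate, here is a minimal sketch of what I mean (the dataset, k, and
variable names are just placeholders for my actual setup):

import org.apache.spark.ml.clustering.LDA

// New ml API with the online optimizer
val lda = new LDA().setK(10).setOptimizer("online")
val model = lda.fit(dataset)   // dataset: DataFrame with a "features" column
// model is an ml.clustering.LocalLDAModel, but it exposes no
// topicDistributions(..) method.

// The old mllib API has exactly that:
// val oldModel: org.apache.spark.mllib.clustering.LocalLDAModel = ...
// oldModel.topicDistributions(corpus)   // RDD[(Long, Vector)]: docId -> topic mixture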

I am not sure why this is so, especially given that the new ml.LDA uses the
older mllib.LDA under the hood and wraps the older mllib.LocalLDAModel in the
new ml.LocalLDAModel.

So, can someone please clarify:
1. Why is this so?
2. What is the correct way to get topic distributions with the new
LocalLDAModel?

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/No-topicDistributions-method-in-ml-clustering-LocalLDAModel-tp28368.html


Checkpointing in Iterative Graph Computation

2015-08-25 Thread sachintyagi22
Hi, 

I have stumbled upon an issue with iterative GraphX computation (using v1.4.1).
It goes as follows:

Setup
1. Construct a graph.
2. Validate that the graph satisfies certain conditions. Here I do some
assert(*conditions*) within graph.triplets.foreach(). [Notice that this
materializes the graph.]

For n iterations:
3. Update graph edges and vertices.
4. Collect deltas over the whole graph (to be used in the next iteration).
Again, this is done through graph.aggregate() and it materializes the graph.
5. Update the graph and use it in the next iteration (step 3). (A simplified
sketch of this loop follows below.)
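
Here is a simplified, self-contained sketch of the kind of loop I mean (the
actual validation, update, and aggregation logic is different; the toy vertex
and edge values below are just placeholders):

import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

// Step 1: construct a (toy) graph; the vertex attribute is a Double score
val vertices: RDD[(Long, Double)] =
  sc.parallelize(Seq((1L, 1.0), (2L, 1.0), (3L, 1.0)))
val edges: RDD[Edge[Double]] =
  sc.parallelize(Seq(Edge(1L, 2L, 1.0), Edge(2L, 3L, 1.0)))
var graph = Graph(vertices, edges)

// Step 2: validation materializes the graph
graph.triplets.foreach(t => assert(t.attr >= 0.0))

// Steps 3-5, repeated; every iteration extends the lineage
for (i <- 1 to 300) {
  // Step 4: collect deltas (materializes the graph again)
  val deltas = graph.aggregateMessages[Double](
    ctx => ctx.sendToDst(ctx.srcAttr * ctx.attr),
    _ + _)
  // Steps 3/5: fold the deltas back in and carry the graph forward
  graph = graph.joinVertices(deltas)((_, old, delta) => old + delta)
}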

Now the problem is: after about 300 iterations I run into a StackOverflowError
due to the long lineage. So I decided to checkpoint the graph after every k
iterations. But it doesn't work.

The problem is that once a graph is materialized, calling checkpoint() on it
has no effect, even after materializing the graph again. In fact, the
isCheckpointed method on such an RDD will always return false, even after
calling checkpoint() and count() on the RDD. The following code should
clarify:

val users = sc.parallelize(Array(
  (3L, ("rxin", "student")),
  (7L, ("jgonzal", "postdoc"))))
// (assumes sc.setCheckpointDir(...) has already been called)

// Materialize the RDD
users.count()
// Now call the checkpoint
users.checkpoint()
users.count()

// This fails
assert(users.isCheckpointed)

And it behaves the same with Graph.checkpoint(). Now my problem is that in
both the setup and iteration steps (steps 2 and 5 above) I have to materialize
the graph, which leaves me in a situation where I cannot checkpoint it in the
usual fashion.

Currently, I am working around this by creating a new Graph every kth
iteration with the same edges and vertices, checkpointing that new graph, and
then using it for iterations k+1 to 2k, and so on. This works.
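
Roughly, the workaround looks like this inside the iteration loop sketched
above (a sketch only; k is the checkpoint interval, and rebuilding via
map(identity) stands in for however the vertex and edge RDDs get recreated):

if (i % k == 0) {
  // Rebuild a fresh Graph from the current vertices and edges, so the new
  // underlying RDDs have not been materialized yet, then checkpoint them
  // before the first action runs on them.
  val rebuilt = Graph(graph.vertices.map(identity), graph.edges.map(identity))
  rebuilt.checkpoint()
  rebuilt.vertices.count()   // materialize so the checkpoint actually happens
  rebuilt.edges.count()
  graph = rebuilt
}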

Now my questions are:
1. Why doesn't checkpointing work on an RDD once it has been materialized?
2. My use case looks pretty common. How do people generally handle this?

Thanks in advance.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Checkpointing-in-Iterative-Graph-Computation-tp24443.html