pyspark equivalent to Extends Serializable

2015-07-21 Thread keegan
I'm trying to define a class that contains some of Spark's objects as attributes, and I'm running into a problem that I think would be solved if I could find Python's equivalent of Scala's extends Serializable. Here's a simple class that has a Spark RDD as one of its attributes. class Foo: def
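A rough sketch of the usual answer, in plain Python rather than PySpark: there is no extends Serializable equivalent because Python objects are serialized with pickle by default, and Spark pickles the objects it ships to workers. So the fix is typically to make the class picklable, e.g. by not holding an unpicklable SparkContext or RDD as an attribute. The Foo class below is a hypothetical illustration, not the original poster's code:

```python
import pickle

class Foo:
    """A plain Python class; picklable by default (hypothetical example)."""
    def __init__(self, data):
        # Hold plain data rather than a SparkContext or RDD: those are
        # not picklable, which is what usually breaks when Spark tries
        # to ship such an object to its workers.
        self.data = data

    def doubled(self):
        return [x * 2 for x in self.data]

foo = Foo([1, 2, 3])
# Round-trip through pickle, roughly what Spark does when serializing.
restored = pickle.loads(pickle.dumps(foo))
print(restored.doubled())  # [2, 4, 6]
```

If an attribute genuinely cannot be pickled, one common pattern is to drop it in `__getstate__` and rebuild it in `__setstate__`.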

Re: The explanation of input text format using LDA in Spark

2015-05-12 Thread keegan
This matrix is in the format of a document-term matrix. Each row represents a single document, each column represents one of the possible words, and the elements of the matrix are the corresponding word counts. Simple example here
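A small sketch of such a document-term matrix in plain Python (the toy two-document corpus here is made up for illustration):

```python
from collections import Counter

docs = ["spark makes data easy", "data data everywhere"]

# Vocabulary: one column per distinct word, in a fixed (sorted) order.
vocab = sorted({w for d in docs for w in d.split()})

# One row per document; element (i, j) is the count of vocab[j] in docs[i].
dtm = [[Counter(d.split())[w] for w in vocab] for d in docs]

print(vocab)  # ['data', 'easy', 'everywhere', 'makes', 'spark']
for row in dtm:
    print(row)
# [1, 1, 0, 1, 1]
# [2, 0, 1, 0, 0]
```

Spark's LDA expects each document as one such row of term counts (as a vector), with a shared vocabulary defining the columns.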

Re: What is difference btw reduce fold?

2015-04-27 Thread keegan
Hi Q, fold and reduce both aggregate over a collection by applying an operation you specify; the major difference is the starting point of the aggregation. For fold(), you have to specify the starting value, while for reduce() the starting value is the first (or possibly an arbitrary) element in
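The distinction can be sketched in plain Python, since `functools.reduce` with and without an initializer mirrors reduce and fold (the numbers and the zero value below are just illustrative assumptions):

```python
from functools import reduce
from operator import add

nums = [1, 2, 3, 4]

# reduce: the aggregation starts from the first element of the collection.
print(reduce(add, nums))       # 10

# fold-style: an explicit starting value (here 0) seeds the aggregation.
print(reduce(add, nums, 0))    # 10

# With a non-neutral starting value the difference becomes visible.
print(reduce(add, nums, 100))  # 110
```

Note that in Spark specifically, fold's zero value is applied once per partition before the partial results are combined, so it should generally be the identity of the operation (0 for addition, 1 for multiplication), unlike the plain-Python case above.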