Hi, does anybody have any idea about the question below?
-Obaid

On Friday, 27 May 2016, obaidul karim <obaidc...@gmail.com> wrote:

> Hi Guys,
>
> This is my first mail to the Spark users mailing list.
>
> I need help with a DStream operation.
>
> I am using an MLlib random forest model to make predictions with Spark
> Streaming. In the end, I want to combine the feature DStream and the
> prediction DStream for further downstream processing.
>
> I am predicting with the piece of code below:
>
> predictions = (texts.map(lambda x: getFeatures(x))
>                     .map(lambda x: x.split(','))
>                     .map(lambda parts: [float(i) for i in parts])
>                     .transform(lambda rdd: rf_model.predict(rdd)))
>
> Here texts is a DStream whose records are single lines of text, and
> getFeatures generates a comma-separated string of features extracted
> from each record.
>
> I want the output as the tuple below:
> ("predicted value", "original text")
>
> How can I achieve that?
> Or, at least, can I perform a .zip on two DStreams, like the normal RDD
> operation, as below:
> output = texts.zip(predictions)
>
> Thanks in advance.
>
> -Obaid
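
One direction that might work, sketched here under assumptions and not tested on a real stream: do both the prediction and the zip inside a single transform() on texts, so that each batch RDD is zipped with its own predictions. This follows the pattern used in the MLlib Python examples, where predict() is called on an RDD of feature vectors and the result is zipped with an RDD derived from the same data. In the sketch, get_features is only a hypothetical stand-in for getFeatures, and model is assumed to be a trained MLlib tree model such as rf_model. The helper names are made up for illustration.

```python
def get_features(text):
    # Hypothetical stand-in for getFeatures(): returns a comma-separated
    # string of numeric features for one line of text.
    return "%d,%d" % (len(text), text.count(" "))


def to_vector(text):
    # Same parsing steps as in the quoted pipeline: split on commas and
    # convert each field to float.
    return [float(i) for i in get_features(text).split(",")]


def predict_with_text(rdd, model):
    # rdd: one batch of raw text lines.
    # model: assumed to be a trained MLlib tree model (e.g. rf_model).
    features = rdd.map(to_vector)
    predictions = model.predict(features)
    # RDD.zip needs the same number of partitions and the same number of
    # elements per partition on both sides. This sketch assumes map() and
    # predict() keep a one-to-one correspondence with the input records.
    return predictions.zip(rdd)


def attach_predictions(texts, model):
    # texts: DStream of text lines. transform() applies the function to
    # each batch RDD, giving a DStream of (prediction, original text).
    return texts.transform(lambda rdd: predict_with_text(rdd, model))
```

With the names from the quoted mail, the call would be output = attach_predictions(texts, rf_model). Because predict() is called on the whole RDD on the driver side of transform(), and not inside a map() lambda, this should avoid using the model inside a worker-side closure.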