Use Collaborative Filtering and Clustering Algorithms in Spark MLlib

2016-05-11 Thread Imre Nagi
Hi All, I'm a newbie in Spark MLlib. In my office I have a statistician who works on improving the matrix model for our recommendation engine. However, he works in R. He told me that it's quite possible to combine collaborative filtering and latent Dirichlet allocation (LDA) by doing some
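The thread doesn't show how the two models would be combined, but a minimal sketch of training both side by side with MLlib's RDD-based API might look like this. Everything here is an assumption for illustration: the input names (`ratings`, `itemDocs`), the toy data, and the idea of blending a CF score with topic similarity are not from the thread.

```scala
// Hypothetical sketch: ALS (collaborative filtering) and LDA trained side by side.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

object HybridRecommender {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hybrid").setMaster("local[2]"))

    // Collaborative filtering input: (user, item, rating) triples.
    val ratings = sc.parallelize(Seq(Rating(1, 10, 4.0), Rating(2, 10, 5.0), Rating(1, 20, 1.0)))
    val cfModel = ALS.train(ratings, /* rank = */ 5, /* iterations = */ 10, /* lambda = */ 0.01)

    // LDA input: (itemId, termFrequencyVector), e.g. built from item descriptions.
    val itemDocs = sc.parallelize(Seq(
      (10L, Vectors.dense(1.0, 0.0, 2.0)),
      (20L, Vectors.dense(0.0, 3.0, 1.0))))
    val ldaModel = new LDA().setK(2).run(itemDocs)

    // How to blend the CF prediction with the LDA topic mixture is exactly the
    // modelling question the thread raises; left as a placeholder here.
    println(cfModel.predict(1, 20))
    sc.stop()
  }
}
```

The point of the sketch is only that both algorithms live in the same `org.apache.spark.mllib` package and share the RDD abstraction, so an R prototype of such a hybrid could plausibly be ported.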

Re: transformation - spark vs cassandra

2016-03-31 Thread Imre Nagi
I think querying with the Cassandra Query Language will give better performance if you want to pull and filter the data from your DB, rather than pulling all of the data and doing the filtering and transformation with a Spark DataFrame. On 31 Mar 2016 22:19, "asethia"
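The advice above can be sketched with the DataStax Spark Cassandra Connector, whose `where` clause pushes the CQL predicate down to Cassandra instead of filtering on the Spark side. The keyspace, table, and column names are assumptions, and `where` pushdown only works on partition-key, clustering, or indexed columns:

```scala
// Sketch: predicate pushdown vs. pull-then-filter, assuming the DataStax
// spark-cassandra-connector is on the classpath.
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraFilter {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("filter-pushdown")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Pushed down to Cassandra as CQL: only matching rows cross the wire.
    val filtered = sc.cassandraTable("my_keyspace", "events").where("day = ?", "2016-03-31")

    // The alternative the reply advises against: pull everything, filter in Spark.
    // sc.cassandraTable("my_keyspace", "events").filter(_.getString("day") == "2016-03-31")

    println(filtered.count())
    sc.stop()
  }
}
```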

Re: Restart App and consume from checkpoint using direct kafka API

2016-03-31 Thread Imre Nagi
I don't know how to read the data from the checkpoint. But AFAIK, and based on my experience, the best thing you can do is store the offset in some storage, such as a database, every time you consume a message. Then read the offset from the database every time you want to start
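The store-offsets-yourself pattern the reply describes can be shown in plain Scala. Here a mutable map stands in for the database table, and the names (`OffsetStore`, `commit`, `startingOffset`) are made up for illustration; in a real app you would persist the map's contents and feed `startingOffset` into the `fromOffsets` argument of `KafkaUtils.createDirectStream` on restart.

```scala
// Sketch of manual offset tracking: persist the next offset to read after
// every consumed message, and read it back at startup.
import scala.collection.mutable

object OffsetStore {
  // (topic, partition) -> next offset to read; a DB table in a real app.
  private val offsets = mutable.Map.empty[(String, Int), Long]

  // Call after successfully processing a message at `offset`.
  def commit(topic: String, partition: Int, offset: Long): Unit =
    offsets((topic, partition)) = offset + 1

  // Call at startup; 0 when no offset has ever been stored (first run).
  def startingOffset(topic: String, partition: Int): Long =
    offsets.getOrElse((topic, partition), 0L)
}
```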

Stop Spark application when the job is complete.

2016-03-19 Thread Imre Nagi
Hi, I have a Spark application for batch processing in a standalone cluster. The job is to query the database and then do some transformation, aggregation, and several actions, such as indexing the result into Elasticsearch. If I don't call sc.stop(), the Spark application won't stop and take
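The usual shape for such a batch driver is to wrap the work in `try`/`finally` so `sc.stop()` runs even when an action throws; otherwise the standalone app keeps its executors and never exits. The job body here is only a placeholder:

```scala
// Sketch: a batch driver that always releases its SparkContext.
import org.apache.spark.{SparkConf, SparkContext}

object BatchJob {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("batch"))
    try {
      // query the DB, transform, aggregate, index into Elasticsearch...
    } finally {
      sc.stop() // without this the standalone app holds its resources and won't exit
    }
  }
}
```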

Get Pair of Topic and Message from Kafka + Spark Streaming

2016-03-16 Thread Imre Nagi
Hi, I'm just trying to process the data that comes from the Kafka source in my Spark Streaming application. What I want to do is get the pair of topic and message as a tuple from the message stream. Here is my stream: val streams = KafkaUtils.createDirectStream[String, Array[Byte], >
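The snippet is cut off, but the 0.8 direct-stream API has an overload that takes a `messageHandler`, which can keep the topic next to each message. A sketch under assumed broker, topic, and offset values (this is not the poster's code, only the overload that fits the question):

```scala
// Sketch: createDirectStream with a messageHandler producing (topic, message) pairs.
import kafka.common.TopicAndPartition
import kafka.message.MessageAndMetadata
import kafka.serializer.{DefaultDecoder, StringDecoder}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object TopicMessagePairs {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("pairs").setMaster("local[2]"), Seconds(5))
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092") // assumption
    val fromOffsets = Map(TopicAndPartition("topic1", 0) -> 0L)       // assumption

    // The messageHandler maps each record's metadata+payload to whatever R you want;
    // here R = (String, Array[Byte]) = (topic, message).
    val streams = KafkaUtils.createDirectStream[
      String, Array[Byte], StringDecoder, DefaultDecoder, (String, Array[Byte])](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, Array[Byte]]) => (mmd.topic, mmd.message))

    streams.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```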

Re: Streaming app consume multiple kafka topics

2016-03-15 Thread Imre Nagi
keep it this way:
val stream1 = KafkaUtils.createStream(..) // for topic 1
val stream2 = KafkaUtils.createStream(..) // for topic 2
And you will know which stream belongs to which topic.

Re: Streaming app consume multiple kafka topics

2016-03-15 Thread Imre Nagi
creating. Like, create a tuple(topic, stream) and you will be able to access ._1 as the topic and ._2 as the stream. Thanks, Best Regards. On Tue, Mar 15, 2016 at 12:05 PM, Imre Nagi <imre.nagi2...@gmail.com> wrote: Hi, I'm j
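The tuple pairing the reply suggests can be shown in plain Scala. The `String` placeholder stands in for the `DStream` purely to demonstrate the `._1` / `._2` access; in the real app the second element would be `KafkaUtils.createStream(...)`:

```scala
// Minimal illustration of tagging each stream with its topic.
val topics = Seq("topic1", "topic2")
val tagged = topics.map(t => (t, s"stream-for-$t")) // real app: (t, KafkaUtils.createStream(...))

val firstTopic  = tagged.head._1 // the topic name
val firstStream = tagged.head._2 // the stream created for that topic
```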

Streaming app consume multiple kafka topics

2016-03-15 Thread Imre Nagi
Hi, I'm just trying to create a Spark Streaming application that consumes more than one topic sent by Kafka. Then I want to do different further processing for the data sent by each topic. val kafkaStreams = { val kafkaParameter = for (consumerGroup <- consumerGroups) yield {
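The quoted snippet is truncated, but one common shape for per-topic processing is to create one direct stream per topic and keep the topic name alongside it. Everything concrete here (topic names, broker address, the empty processing body) is an assumption:

```scala
// Sketch: one direct stream per topic, each processed separately.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object MultiTopicApp {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("multi-topic").setMaster("local[4]"), Seconds(5))
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092") // assumption
    val topics = Seq("topic1", "topic2")                              // assumption

    val kafkaStreams = topics.map { topic =>
      (topic, KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
        ssc, kafkaParams, Set(topic)))
    }

    // Different further processing per topic, dispatched on the topic name.
    kafkaStreams.foreach { case (topic, stream) =>
      stream.foreachRDD { rdd => /* topic-specific processing for `topic` */ }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```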

Re: Spark Twitter streaming

2016-03-08 Thread Imre Nagi
Do you mean listening to the Twitter stream data? Maybe you can use the Twitter Streaming API or the Twitter Search API for this purpose. Imre. On Tue, Mar 8, 2016 at 2:54 PM, Soni spark wrote: > Hello friends, > > I need urgent help. > > I am using Spark Streaming to get
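For consuming the Twitter Streaming API from Spark Streaming, the `spark-streaming-twitter` module provides `TwitterUtils`. A minimal sketch, assuming the module and twitter4j are on the classpath and OAuth credentials are set via the `twitter4j.oauth.*` system properties (which is what passing `None` relies on):

```scala
// Sketch: a DStream of tweet texts via spark-streaming-twitter.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object TweetStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("tweets").setMaster("local[2]"), Seconds(10))

    // None -> twitter4j reads OAuth credentials from system properties.
    val tweets = TwitterUtils.createStream(ssc, None)
    tweets.map(_.getText).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```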