Hi, I am just testing Spark streaming with Kafka.
Basically I am broadcasting topic every minute to Host:port -> rhes564:2181. This is sending few lines through a shell script as follows: cat ${IN_FILE} | ${KAFKA_HOME}/bin/kafka-console-producer.sh --broker-list rhes564:9092 --topic newtopic That works fine and I can see the messages in ${KAFKA_HOME}/bin/kafka-console-consumer.sh --zookeeper rhes564:2181 --topic newtopic Fri Apr 1 21:00:01 BST 2016 ======= Sending messages from rhes5 1,'OZ9062Cx22qAHo8m_fsZb16Etlq5eTnL4jYPKmgPQPyQB7Kk5IMt2xQN3yy1Qb1O3Qph16TGlHzixw02mRLAiagU0Wh17fHi5dOQ',101 2,'Py_xzno6MEWPz1bp5Cc0JBPfX90mz2uVMLPBJUWucvNPlPnVMMm81PExZ5uM0K9iEdKmleY7XFsn8O3Oxr6e07qdycvuk_lR84vI',102 3,'i2FS2ODjRBdaIpyE362JVPu4KEYSHDNTjPh46YFANquxNRK9JQT8h1W4Tph9DqGfwIgQG5ZJ8BCBklRQreyJhoLIPMbJQeH_rhN1',103 4,'Yp_q_uyH16UPTRvPdeKaslw8bhheFqqdwWaG_e8TZZ6jyscyQN556jJMxYOZjx5Zv7GV6zoa2ORsTEGcAKbKUChPFfuGAujgDkjT',104 5,'t3uuFOkNEjDE_7rc9cLbgT1o0B_jZXWsWNtmBgiC4ACffzTHUGRkl5YIZSUXB3kew2 yytvB8nbCklImDa0BWxYseSbMWiKg1R9ae',105 Now I try to see the topic in spark streaming as follows: val conf = new SparkConf(). setAppName("StreamTest"). setMaster("local[12]"). set("spark.driver.allowMultipleContexts", "true"). set("spark.hadoop.validateOutputSpecs", "false") val sc = new SparkContext(conf) // Create sqlContext based on HiveContext val sqlContext = new HiveContext(sc) val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc) // // Create a local StreamingContext with two working thread and batch interval of 1 second. // The master requires 2 cores to prevent from a starvation scenario. val ssc = new StreamingContext(conf, Minutes(1)) // Create a DStream that will connect to hostname:port, like localhost:9999 //val lines = ssc.socketTextStream("rhes564", 9092) val lines = ssc.socketTextStream("rhes564", 2181) // Split each line into words val words = lines.flatMap(_.split(" ")) val pairs = words.map(word => (word, 1)) val wordCounts = pairs.reduceByKey(_ + _) // Print the first ten elements of each RDD generated in this DStream to the console wordCounts.print() ssc.start() This is what I am getting: scala> ------------------------------------------- Time: 1459541760000 ms ------------------------------------------- But no values Have I got the port wrong in this case or the set up is incorrect? Thanks Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>* http://talebzadehmich.wordpress.com