Hi All, I recently started learning Spark. I need to use spark-streaming.
1) Input: I need to read from MongoDB. The query

db.event_gcovs.find({executions:"56791a746e928d7b176d03c0", valid:1, infofile:{$exists:1}, geo:"sunnyvale"}, {infofile:1}).count()

returns the number of info files: 24441. A sample document looks like this:

/* 0 */
{
    "_id" : ObjectId("568eaeda71404e5c563ccb86"),
    "infofile" : "/volume/testtech/datastore/code-coverage/p//infos/svl/6/56791a746e928d7b176d03c0/69958.pcp_napt44_20368.pl.30090.exhibit.R0-re0.15.1I20151218_1934_jammyc.pfe.i386.TC011.fail.FAIL.gcov.info"
}

One info file can contain thousands of these blocks. Each block starts at the "SF" delimiter and ends with end_of_record.
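Before wiring this into Spark, the block structure can be tested on its own. Below is a minimal sketch (in plain Python, not Spark code) of splitting one info file's contents into records, assuming the LCOV-style convention described above: each record begins at a line starting with "SF" and runs through the "end_of_record" line. The function name and the sample input are made up for illustration.

```python
def split_info_records(text):
    """Split LCOV-style .info content into per-source-file records.

    A record starts at a line beginning with "SF:" and ends at the
    line "end_of_record" (both inclusive).
    """
    records = []
    current = []
    for line in text.splitlines():
        if line.startswith("SF:"):
            # A new record begins; any incomplete one is discarded.
            current = [line]
        elif line.strip() == "end_of_record":
            if current:
                current.append(line)
                records.append("\n".join(current))
                current = []
        elif current:
            current.append(line)
    return records

# Hypothetical two-block sample in the same shape as a gcov .info file.
sample = """TN:
SF:/src/foo.c
DA:1,5
end_of_record
SF:/src/bar.c
DA:2,0
end_of_record
"""
print(len(split_info_records(sample)))  # prints 2
```

In Spark the same splitting logic could be applied per file, e.g. by reading each info file with sc.wholeTextFiles and flat-mapping this function over the contents, so that each "SF … end_of_record" block becomes one element of the RDD.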