When we use readStream to read data as a stream, how does Spark decide the number of RDDs, and the number of partitions within each RDD, with respect to the storage layout and file format?
val dsJson = sqlContext.readStream.json("/Users/sachin/testSpark/inputJson")

val dsCsv = sqlContext.readStream.option("header", "true").csv("/Users/sachin/testSpark/inputCsv")

val ds = sqlContext.readStream.text("/Users/sachin/testSpark/inputText")
val dsText = ds.as[String].map(x => (x.split(" ")(0), x.split(" ")(1))).toDF("name", "age")

val dsParquet = sqlContext.readStream.parquet("/Users/sachin/testSpark/inputParquet")

--
Thanks & Regards
Sachin Aggarwal
7760502772