When we use readStream to read data as a stream, how does Spark decide the
number of RDDs, and the number of partitions within each RDD, with respect
to the storage layout and the file format? For example:

val dsJson = sqlContext.readStream.json("/Users/sachin/testSpark/inputJson")
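
Note: as far as I know, the file-based streaming sources (json, csv,
parquet) will not start without a user-specified schema unless
spark.sql.streaming.schemaInference is enabled; only the text source
comes with a fixed single "value" column. A minimal sketch, with an
illustrative name/age schema:

import org.apache.spark.sql.types.{StringType, StructType}

// Illustrative schema; streaming file sources need one up front
// (or spark.sql.streaming.schemaInference = true).
val jsonSchema = new StructType()
  .add("name", StringType)
  .add("age", StringType)

val dsJsonTyped = sqlContext.readStream
  .schema(jsonSchema)
  .json("/Users/sachin/testSpark/inputJson")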

val dsCsv = sqlContext.readStream
  .option("header", "true")
  .csv("/Users/sachin/testSpark/inputCsv")

val ds = sqlContext.readStream.text("/Users/sachin/testSpark/inputText")
val dsText = ds.as[String]
  .map(x => (x.split(" ")(0), x.split(" ")(1)))
  .toDF("name", "age")

// .format("parquet") before .parquet(...) is redundant; .parquet() already sets the format.
val dsParquet = sqlContext.readStream
  .parquet("/Users/sachin/testSpark/inputParquet")
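
From what I understand, each trigger produces one micro-batch (roughly
one job over the newly arrived files), and within that batch the normal
file-split settings decide the number of partitions. The knobs I have
found so far are sketched below, reusing the illustrative jsonSchema
from above; the values are only examples:

// File source option: cap how many new files each micro-batch picks up.
val dsThrottled = sqlContext.readStream
  .schema(jsonSchema)
  .option("maxFilesPerTrigger", "10")
  .json("/Users/sachin/testSpark/inputJson")

// Partitioning within each batch then follows the usual file-split settings:
sqlContext.setConf("spark.sql.files.maxPartitionBytes", (128 * 1024 * 1024).toString)
sqlContext.setConf("spark.sql.files.openCostInBytes", (4 * 1024 * 1024).toString)

Is that the right mental model, and does it change with the storage
layout or file format?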



-- 

Thanks & Regards

Sachin Aggarwal
7760502772
