When we use readStream to read data as a stream, how does Spark decide the
number of RDDs, and the number of partitions within each RDD, with respect
to the storage layout and the file format? For example:

val dsJson = sqlContext.readStream.json("/Users/sachin/testSpark/inputJson")
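
Note: as far as I know, the file-based streaming sources (json, csv,
parquet) will not start without a user-specified schema unless
spark.sql.streaming.schemaInference is enabled; only the text source
comes with a fixed single "value" column. A minimal sketch, with an
illustrative name/age schema:

import org.apache.spark.sql.types.{StringType, StructType}

// Illustrative schema; streaming file sources need one up front
// (or spark.sql.streaming.schemaInference = true).
val jsonSchema = new StructType()
  .add("name", StringType)
  .add("age", StringType)

val dsJsonTyped = sqlContext.readStream
  .schema(jsonSchema)
  .json("/Users/sachin/testSpark/inputJson")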

val dsCsv = sqlContext.readStream
  .option("header", "true")
  .csv("/Users/sachin/testSpark/inputCsv")

val ds = sqlContext.readStream.text("/Users/sachin/testSpark/inputText")
val dsText = ds.as[String]
  .map(x => (x.split(" ")(0), x.split(" ")(1)))
  .toDF("name", "age")

// .format("parquet") before .parquet(...) is redundant; .parquet() already sets the format.
val dsParquet = sqlContext.readStream
  .parquet("/Users/sachin/testSpark/inputParquet")
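
From what I understand, each trigger produces one micro-batch (roughly
one job over the newly arrived files), and within that batch the normal
file-split settings decide the number of partitions. The knobs I have
found so far are sketched below, reusing the illustrative jsonSchema
from above; the values are only examples:

// File source option: cap how many new files each micro-batch picks up.
val dsThrottled = sqlContext.readStream
  .schema(jsonSchema)
  .option("maxFilesPerTrigger", "10")
  .json("/Users/sachin/testSpark/inputJson")

// Partitioning within each batch then follows the usual file-split settings:
sqlContext.setConf("spark.sql.files.maxPartitionBytes", (128 * 1024 * 1024).toString)
sqlContext.setConf("spark.sql.files.openCostInBytes", (4 * 1024 * 1024).toString)

Is that the right mental model, and does it change with the storage
layout or file format?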



-- 

Thanks & Regards

Sachin Aggarwal
7760502772
