Re: Structured Streaming partition logic with respect to storage and fileformat

2016-06-21 Thread Sachin Aggarwal

What would the behavior be in the case of S3 and the local file system?

On Tue, Jun 21, 2016 at 4:36 PM, Jörn Franke wrote:

> It depends on the underlying Hadoop FileFormat, which splits mostly based
> on block size. You can change this, though.
>
> On 21 Jun 2016, at 12:19, Sachin Aggarwal wrote:
>
>
> When we use readStream to read data as a stream, how does Spark decide the
> number of RDDs and the partitions within each RDD with respect to storage
> and file format?
>
> val dsJson = sqlContext.readStream.json(
> "/Users/sachin/testSpark/inputJson")
>
> val dsCsv = sqlContext.readStream.option("header","true").csv(
> "/Users/sachin/testSpark/inputCsv")
>
> val ds = sqlContext.readStream.text("/Users/sachin/testSpark/inputText")
> val dsText = ds.as[String].map(x => (x.split(" ")(0), x.split(" ")(1))).toDF("name", "age")
>
> val dsParquet = 
> sqlContext.readStream.format("parquet").parquet("/Users/sachin/testSpark/inputParquet")
>
>
>
> --
>
> Thanks & Regards
>
> Sachin Aggarwal
> 7760502772
>
>


-- 

Thanks & Regards

Sachin Aggarwal
7760502772

Re: Structured Streaming partition logic with respect to storage and fileformat

2016-06-21 Thread Jörn Franke

It depends on the underlying Hadoop FileFormat, which splits mostly based on
block size. You can change this, though.
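
To make the block-size point concrete, here is a minimal sketch of the split-size rule that Hadoop's FileInputFormat applies: split size = max(minSize, min(maxSize, blockSize)). The object and method names (SplitSketch, computeSplitSize, numSplits) are hypothetical and only for illustration, and the sizes are example values. Whether a given Spark version's streaming file source follows exactly this path is an assumption, so treat it as an approximation rather than Spark's actual code.

```scala
// Illustrative sketch only: mirrors the split-size rule of Hadoop's
// FileInputFormat. All names here are hypothetical, not Spark or Hadoop APIs.
object SplitSketch {
  val MiB: Long = 1024L * 1024L

  // splitSize = max(minSize, min(maxSize, blockSize))
  // With default min/max bounds the result is simply the block size.
  def computeSplitSize(blockSize: Long, minSize: Long, maxSize: Long): Long =
    math.max(minSize, math.min(maxSize, blockSize))

  // Approximate number of splits for one splittable file (ceiling division).
  // This ignores the slack Hadoop allows on the final split.
  def numSplits(fileLength: Long, splitSize: Long): Long =
    if (fileLength == 0L) 0L else (fileLength + splitSize - 1) / splitSize
}
```

With the default bounds, a 1024 MiB file on a filesystem reporting 128 MiB blocks gives about 8 splits. Lowering the maximum (in Hadoop, the mapreduce.input.fileinputformat.split.maxsize property) yields more, smaller splits, and raising the minimum (mapreduce.input.fileinputformat.split.minsize) yields fewer, larger ones. For S3 or the local file system the same rule applies to whatever block size that filesystem implementation reports.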

> On 21 Jun 2016, at 12:19, Sachin Aggarwal wrote:
> 
> 
> When we use readStream to read data as a stream, how does Spark decide the
> number of RDDs and the partitions within each RDD with respect to storage
> and file format?
> 
> val dsJson = sqlContext.readStream.json("/Users/sachin/testSpark/inputJson")
> 
> val dsCsv = 
> sqlContext.readStream.option("header","true").csv("/Users/sachin/testSpark/inputCsv")
> val ds = sqlContext.readStream.text("/Users/sachin/testSpark/inputText")
> val dsText = ds.as[String].map(x => (x.split(" ")(0), x.split(" ")(1))).toDF("name", "age")
> 
> val dsParquet = 
> sqlContext.readStream.format("parquet").parquet("/Users/sachin/testSpark/inputParquet")
> 
> 
> -- 
> 
> Thanks & Regards
> 
> Sachin Aggarwal
> 7760502772
