Re: [SQL][Dataframe] Change data source after saveAsParquetFile

2015-05-08 Thread ayan guha
From S3: the lineage of df depends on the S3 source, and RDDs are not
replicated.
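
As a hedged illustration of the point above: saving is an action that writes a copy, so df keeps its S3 lineage. One way to make later recomputation start from the saved files is to build a new DataFrame from them. The sketch below assumes the Spark 1.3-era API (`saveAsParquetFile`, `sqlContext.parquetFile`). The object name `ReloadFromParquet`, the helper `saveAndReload`, and the local temp paths standing in for the elided `s3://` and `hdfs://` URIs are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object ReloadFromParquet {
  // Writes df to targetPath as Parquet and returns a NEW DataFrame that reads
  // from targetPath, so recomputation of the result starts from the saved copy.
  def saveAndReload(sqlContext: SQLContext, df: DataFrame, targetPath: String): DataFrame = {
    df.saveAsParquetFile(targetPath)   // action; the lineage of df itself is unchanged
    sqlContext.parquetFile(targetPath) // lineage rooted at the Parquet files
  }

  // Runs the sketch in local mode and returns the row count of the reloaded data.
  def run(): Long = {
    val conf = new SparkConf().setAppName("reload-from-parquet").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // Assumption: local temp paths stand in for the s3:// and hdfs:// URIs.
      val tmp = Files.createTempDirectory("reload-sketch")
      val source = tmp.resolve("input.txt")
      Files.write(source, "a\nb\nc\n".getBytes("UTF-8"))
      val target = tmp.resolve("out.parquet").toUri.toString

      val df = sc.textFile(source.toUri.toString).toDF() // lineage: the text source
      val reloaded = saveAndReload(sqlContext, df, target)
      reloaded.count()
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Actions on the returned DataFrame read from the Parquet copy, while actions on the original df would still recompute from the text source. On Spark 1.4 and later, `df.write.parquet(path)` and `sqlContext.read.parquet(path)` are the equivalent calls.
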
On 8 May 2015 23:02, Peter Rudenko petro.rude...@gmail.com wrote:

 Hi, I have the following question:

 val data = sc.textFile("s3:///")
 val df = data.toDF
 df.saveAsParquetFile("hdfs://")
 df.someAction(...)

 If some workers die during someAction, would the recomputation read the
 files from S3 or from the Parquet files on HDFS?

 Thanks,
 Peter Rudenko



Re: [SQL][Dataframe] Change data source after saveAsParquetFile

2015-05-08 Thread Peter Rudenko

Hm, thanks.
Do you know what this setting means?
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L1178


Thanks,
Peter Rudenko




Re: [SQL][Dataframe] Change data source after saveAsParquetFile

2015-05-08 Thread Michael Armbrust
That's a feature flag for a new code path for reading Parquet files. It's
only there as a fallback to the old path in case bugs are found, and will be
removed once we are sure the new path is solid.




[SQL][Dataframe] Change data source after saveAsParquetFile

2015-05-08 Thread Peter Rudenko

Hi, I have the following question:

val data = sc.textFile("s3:///")
val df = data.toDF
df.saveAsParquetFile("hdfs://")
df.someAction(...)

If some workers die during someAction, would the recomputation read the
files from S3 or from the Parquet files on HDFS?

Thanks,
Peter Rudenko