subject:"newbie HDFS S3 best practices"

Re: newbie HDFS S3 best practices

2016-03-16 Thread Chris Miller

Date: Tuesday, March 15, 2016 at 11:59 AM > To: Andrew Davidson <a...@santacruzintegration.com> > Cc: "user @spark" <user@spark.apache.org> > Subject: Re: newbie HDFS S3 best practices > > Hard to say with #1 without knowing your application’s characteristics;

Re: newbie HDFS S3 best practices

2016-03-15 Thread Andy Davidson

ser @spark" <user@spark.apache.org> Subject: Re: newbie HDFS S3 best practices > Hard to say with #1 without knowing your application’s characteristics; for > #2, we use conductor <https://github.com/BD2KGenomics/conductor> with IAM > roles, .boto/.aws/credentia

Re: newbie HDFS S3 best practices

2016-03-15 Thread Frank Austin Nothaft

Hard to say with #1 without knowing your application’s characteristics; for #2, we use conductor with IAM roles, .boto/.aws/credentials files. Frank Austin Nothaft fnoth...@berkeley.edu fnoth...@eecs.berkeley.edu 202-340-0466 > On Mar 15, 2016, at

newbie HDFS S3 best practices

2016-03-15 Thread Andy Davidson

We use the spark-ec2 script to create AWS clusters as needed (we do not use AWS EMR).
1. Will we get better performance if we copy data to HDFS before we run instead of reading directly from S3?
2. What is a good way to move results from HDFS to S3? It seems like there are many ways to bulk copy

4 matches
