RE: Spark Read from Google store and save in AWS s3

2017-01-05 Thread Manohar Reddy
understanding right? Manohar From: Steve Loughran [mailto:ste...@hortonworks.com] Sent: Thursday, January 5, 2017 11:05 PM To: Manohar Reddy Cc: user@spark.apache.org Subject: Re: Spark Read from Google store and save in AWS s3 On 5 Jan 2017, at 09:58, Manohar753 <manohar.re...@happiestminds.

Re: Spark Read from Google store and save in AWS s3

2017-01-05 Thread Steve Loughran
On 5 Jan 2017, at 09:58, Manohar753 <manohar.re...@happiestminds.com> wrote: Hi All, Is interoperability between two clouds (Google, AWS) possible using Spark? In my use case I need to take Google storage as input to Spark

Spark Read from Google store and save in AWS s3

2017-01-05 Thread Manohar753
Hi All, Is interoperability between two clouds (Google, AWS) possible using Spark? In my use case I need to take Google storage as input to Spark, do some processing, and finally store the result in S3; my Spark engine runs on an AWS cluster. Please let me know whether there is any way
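
A minimal sketch of the two-connector setup this question implies: one job reads gs:// input and writes s3a:// output from the same SparkContext. It assumes the GCS connector (gcs-connector) and hadoop-aws jars are on the classpath; the credential property names vary by connector version, and all paths are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    object GcsToS3 {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("gcs-to-s3"))
        val hc = sc.hadoopConfiguration
        // Google side: filesystem implementation + service-account key (assumed property names)
        hc.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
        hc.set("google.cloud.auth.service.account.json.keyfile", "/path/to/key.json")
        // AWS side: s3a credentials (skip these if the nodes carry an IAM role)
        hc.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
        hc.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))

        val input = sc.textFile("gs://my-gcs-bucket/input/")   // hypothetical bucket
        val processed = input.filter(_.nonEmpty)               // stand-in for real processing
        processed.saveAsTextFile("s3a://my-s3-bucket/output/")
      }
    }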

Re: Writing data from Spark streaming to AWS Redshift?

2016-12-11 Thread kant kodali
ns as stated in the blog a shot, but changing mode to append. > > On Sat, Dec 10, 2016 at 8:25 AM, shyla deshpande <deshpandesh...@gmail.com > > wrote: > >> Hello all, >> >> Is it possible to write data from Spark streaming to AWS Redshift? >> >> I came acros

Re: Writing data from Spark streaming to AWS Redshift?

2016-12-09 Thread ayan guha
data from Spark streaming to AWS Redshift? > > I came across the following article, so it looks like it works from a Spark > batch program. > > https://databricks.com/blog/2015/10/19/introducing-redshift-data-source-for-spark.html > > I want to write to AWS Redshift from Spark

Writing data from Spark streaming to AWS Redshift?

2016-12-09 Thread shyla deshpande
Hello all, Is it possible to write data from Spark streaming to AWS Redshift? I came across the following article, so it looks like it works from a Spark batch program. https://databricks.com/blog/2015/10/19/introducing-redshift-data-source-for-spark.html I want to write to AWS Redshift from
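
A hedged sketch of the append-mode approach discussed in the replies above: each streaming micro-batch is converted to a DataFrame and written through the spark-redshift data source from the article. The JDBC URL, table name, and tempdir are placeholders, and the exact credential options depend on the spark-redshift version.

    import org.apache.spark.sql.{SQLContext, SaveMode}
    import org.apache.spark.streaming.dstream.DStream

    def saveToRedshift(counts: DStream[(String, Int)]): Unit = {
      counts.foreachRDD { rdd =>
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        import sqlContext.implicits._
        rdd.toDF("word", "count").write
          .format("com.databricks.spark.redshift")
          .option("url", "jdbc:redshift://host:5439/db?user=u&password=p")
          .option("dbtable", "word_counts")
          .option("tempdir", "s3a://my-bucket/redshift-staging/") // staging area for COPY
          .mode(SaveMode.Append)                                  // append per micro-batch
          .save()
      }
    }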

Re: How to install spark with s3 on AWS?

2016-08-26 Thread kant kodali
s3.awsAccessKeyId", AccessKey) hadoopConf.set("fs.s3.awsSecretAccessKey", SecretKey) var jobInput = sc.textFile("s3://path to bucket") Thanks On Fri, Aug 26, 2016 at 5:16 PM, kant kodali < kanth...@gmail.com > wrote: Hi guys, Are there any instructions on how to setup spark with S3 on AWS? Thanks!

Re: How to install spark with s3 on AWS?

2016-08-26 Thread Devi P.V
th...@gmail.com> wrote: > Hi guys, > > Are there any instructions on how to setup spark with S3 on AWS? > > Thanks! > >

How to install spark with s3 on AWS?

2016-08-26 Thread kant kodali
Hi guys, Are there any instructions on how to setup spark with S3 on AWS? Thanks!

Re: Spark 2.0.0 OOM error at beginning of RDD map on AWS

2016-08-24 Thread Arun Luthra
Also for the record, turning on Kryo did not help. On Tue, Aug 23, 2016 at 12:58 PM, Arun Luthra wrote: > Splitting up the Maps to separate objects did not help. > > However, I was able to work around the problem by reimplementing it with > RDD joins. > > On Aug

Re: Spark 2.0.0 OOM error at beginning of RDD map on AWS

2016-08-23 Thread Arun Luthra
Splitting up the Maps to separate objects did not help. However, I was able to work around the problem by reimplementing it with RDD joins. On Aug 18, 2016 5:16 PM, "Arun Luthra" wrote: > This might be caused by a few large Map objects that Spark is trying to >
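
A sketch of the workaround described here: instead of capturing a large in-memory Map in a task closure (which Spark must serialize for every task), turn the Map into an RDD and join on the key. All names are illustrative.

    import org.apache.spark.rdd.RDD

    def withJoin(records: RDD[(String, Long)],
                 bigMap: Map[String, Int]): RDD[(String, (Long, Int))] = {
      // Distribute the map once as an RDD instead of shipping it in each closure
      val lookup: RDD[(String, Int)] = records.sparkContext.parallelize(bigMap.toSeq)
      records.join(lookup)   // shuffle-based join replaces per-task Map lookups
    }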

Re: Spark 2.0.0 OOM error at beginning of RDD map on AWS

2016-08-18 Thread Arun Luthra
This might be caused by a few large Map objects that Spark is trying to serialize. These are not broadcast variables or anything, they're just regular objects. Would it help if I further indexed these maps into a two-level Map i.e. Map[String, Map[String, Int]] ? Or would this still count against

Spark 2.0.0 OOM error at beginning of RDD map on AWS

2016-08-15 Thread Arun Luthra
I got this OOM error in Spark local mode. The error seems to have been at the start of a stage (all of the stages on the UI showed as complete; there were more stages to do but they had not shown up on the UI yet). There appears to be ~100G of free memory at the time of the error. Spark 2.0.0 200G

Specifying Fixed Duration (Spot Block) for AWS Spark EC2 Cluster

2016-07-04 Thread nsharkey
When I spin up an AWS Spark cluster per the Spark EC2 script: According to AWS: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html#fixed-duration-spot-instances there is a way of reserving a fixed-duration Spot cluster through the AWS CLI and the web portal, but I can't find

Re: Spark not using all the cluster instances in AWS EMR

2016-06-18 Thread Akhil Das
t only uses one of the > machines(instead of the 3 available) of the cluster. > > Is there any parameter that can be set to force it to use all the cluster. > > I am using AWS EMR with Yarn. > > > Thanks, > Natu > > > > > > > -- Cheers!

Spark not using all the cluster instances in AWS EMR

2016-06-18 Thread Natu Lauchande
Hi, I am running some Spark loads. I notice that it only uses one of the machines (instead of the 3 available) in the cluster. Is there any parameter that can be set to force it to use the whole cluster? I am using AWS EMR with Yarn. Thanks, Natu
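
On YARN the executor count is fixed at submit time unless dynamic allocation is enabled, which is one common reason a job stays on a single node. A sketch of explicit sizing for a 3-node cluster (the numbers are illustrative, not a recommendation):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("use-whole-cluster")
      .set("spark.executor.instances", "3")   // roughly one executor per node
      .set("spark.executor.cores", "4")
      .set("spark.executor.memory", "8g")
    val sc = new SparkContext(conf)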

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-10 Thread Daniel Haviv
Wed, Jun 8, 2016 at 4:34 PM, Daniel Haviv >> <daniel.ha...@veracity-group.com> wrote: >> Hi, >> I'm trying to create a table on s3a but I keep hitting the following error: >> Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: >> MetaEx

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-09 Thread Gourav Sengupta
ption in thread "main" > org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(*message:com.cloudera.com.amazonaws.AmazonClientException: > Unable to load AWS credentials from any provider in the chain*) > > > > I tried setting the s3a keys using the confi

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-09 Thread Steve Loughran
On 9 Jun 2016, at 06:17, Daniel Haviv > wrote: Hi, I've set these properties both in core-site.xml and hdfs-site.xml with no luck. Thank you. Daniel That's not good. I'm afraid I don't know what version of s3a is in the

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-08 Thread Daniel Haviv
; wrote: >> >> Hi, >> I'm trying to create a table on s3a but I keep hitting the following error: >> Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: >> MetaException(message:com.cloudera.com.amazonaws.AmazonClientException: >>

Re: HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-08 Thread Steve Loughran
eption: MetaException(message:com.cloudera.com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain) I tried setting the s3a keys using the configuration object but I might be hitting SPARK-11364<https://issues.apache.org/jira/browse/SPARK-11364> :

HiveContext: Unable to load AWS credentials from any provider in the chain

2016-06-08 Thread Daniel Haviv
Hi, I'm trying to create a table on s3a but I keep hitting the following error: Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:com.cloudera.com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the
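
A sketch of one way to hand the s3a keys to Spark before the HiveContext first touches the filesystem (keys come from environment variables here; whether the metastore path picks these up can depend on the distribution, as the SPARK-11364 reference above suggests):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("hive-on-s3a"))
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
    val hiveContext = new HiveContext(sc)
    // hypothetical table; EXTERNAL keeps the data under the s3a location
    hiveContext.sql(
      "CREATE EXTERNAL TABLE t (k INT, v STRING) LOCATION 's3a://my-bucket/t'")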

DIMSUM among 550k objects on AWS Elastic Map Reduce fails with OOM errors

2016-05-27 Thread nmoretto
Hello everyone, I am trying to compute the similarity between 550k objects using the DIMSUM algorithm available in Spark 1.6. The cluster runs on AWS Elastic Map Reduce and consists of 6 r3.2xlarge instances (one master and five cores), having 8 vCPU and 61 GiB of RAM each. My input data
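
For reference, the MLlib entry point for DIMSUM is columnSimilarities(threshold) on a RowMatrix, so the 550k objects need to be laid out as columns; a higher threshold samples more aggressively and reduces shuffle and memory pressure. A toy sketch, assuming a live SparkContext sc:

    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.linalg.distributed.RowMatrix

    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 0.0, 3.0),
      Vectors.dense(4.0, 5.0, 0.0)))
    val mat = new RowMatrix(rows)
    val similarities = mat.columnSimilarities(0.1)   // DIMSUM with sampling
    similarities.entries.take(10).foreach(println)   // CoordinateMatrix entries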

Pls Assist: error when creating cluster on AWS using spark's ec2 scripts

2016-05-17 Thread Marco Mistroni
Hi, was wondering if anyone can assist here. I am trying to create a Spark cluster on AWS using the scripts located in the spark-1.6.1/ec2 directory. When the spark_ec2.py script tries to do an rsync to copy directories over to the AWS master node, it fails miserably with this stack trace DEBUG:spark ecd

Re: Spark on AWS

2016-05-02 Thread Gourav Sengupta
Hi, I agree with Steve, just start using vanilla SPARK EMR. You can try to see point #4 here for dynamic allocation of executors https://blogs.aws.amazon.com/bigdata/post/Tx6J5RM20WPG5V/Building-a-Recommendation-Engine-with-Spark-ML-on-Amazon-EMR-using-Zeppelin . Note that dynamic allocation of
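
For reference, the dynamic-allocation settings the blog's point #4 refers to look roughly like this (the external shuffle service must be on as well; the executor bounds are illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.dynamicAllocation.enabled", "true")
      .set("spark.shuffle.service.enabled", "true")      // required by dynamic allocation
      .set("spark.dynamicAllocation.minExecutors", "1")
      .set("spark.dynamicAllocation.maxExecutors", "20") // illustrative cap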

Re: Spark on AWS

2016-05-01 Thread Teng Qiu
Hi, here we made several optimizations for accessing s3 from spark: https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando such as: https://github.com/apache/spark/compare/branch-1.6...zalando:branch-1.6-zalando#diff-d579db9a8f27e0bbef37720ab14ec3f6R133 you can deploy

Re: Spark on AWS

2016-04-29 Thread Steve Loughran
On 28 Apr 2016, at 22:59, Alexander Pivovarov > wrote: Spark works well with S3 (read and write). However it's recommended to set spark.speculation true (it's expected that some tasks fail if you read large S3 folder, so speculation should

Re: Spark on AWS

2016-04-28 Thread Fatma Ozcan
involved are. But it required lots of tuning work, because we are > clearly under the recommended requirements. 4 of the 5 machines are > switched off during the night, only the bridge machine is alive 24/7. > > 12$ per month in total. > > Renato Perini. > > > On 28/04/201

Re: Spark on AWS

2016-04-28 Thread Renato Perini
month in total. Renato Perini. On 28/04/2016 23:39, Fatma Ozcan wrote: What is your experience using Spark on AWS? Are you setting up your own Spark cluster, and using HDFS? Or are you using Spark as a service from AWS? In the latter case, what is your experience of using S3 directly, without

Re: Spark on AWS

2016-04-28 Thread Alexander Pivovarov
Fatima, the easiest way to create Spark cluster on AWS is to create EMR cluster and select Spark application. (the latest EMR includes Spark 1.6.1) Spark works well with S3 (read and write). However it's recommended to set spark.speculation true (it's expected that some tasks fail if you read
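
The speculation setting recommended here is a one-liner; a sketch with an optional tuning knob (the multiplier value is illustrative):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.speculation", "true")           // re-launch slow or stuck S3 tasks
      .set("spark.speculation.multiplier", "2")   // how much slower than median counts as slow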

Spark on AWS

2016-04-28 Thread Fatma Ozcan
What is your experience using Spark on AWS? Are you setting up your own Spark cluster, and using HDFS? Or are you using Spark as a service from AWS? In the latter case, what is your experience of using S3 directly, without having HDFS in between? Thanks, Fatma

Re: Anyone have a tutorial or guide to implement Spark + AWS + Caffe/CUDA?

2016-04-07 Thread jamborta
-a-tutorial-or-guide-to-implement-Spark-AWS-Caffe-CUDA-tp26705p26707.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Testing spark with AWS spot instances

2016-03-27 Thread Alexander Pivovarov
I use spot instances for a 100-slave cluster (r3.2xlarge on us-west-1). Jobs I run usually take about 15 hours - the cluster is stable and fast. 1-2 computers might be terminated, but it's a very rare event and Spark can handle it. On Fri, Mar 25, 2016 at 6:28 PM, Sven Krasser wrote:

Re: Testing spark with AWS spot instances

2016-03-25 Thread Sven Krasser
When a spot instance terminates, you lose all data (RDD partitions) stored in the executors that ran on that instance. Spark can recreate the partitions from input data, but if that requires going through multiple preceding shuffles a good chunk of the job will need to be redone. -Sven On Thu,

Testing spark with AWS spot instances

2016-03-24 Thread Dillian Murphey
I'm very new to apache spark. I'm just a user not a developer. I'm running a cluster with many spot instances. Am I correct in understanding that spark can handle an unlimited number of spot instance failures and restarts? Sometimes all the spot instances will disappear without warning, and then

Does anyone install netlib-java on AWS EMR Spark?

2016-03-22 Thread greg huang
Hi All, I want to enable the netlib-java feature for the Spark ML module on AWS EMR. The cluster comes with Spark installed by default; the alternative is installing Spark myself and configuring the whole cluster. Does anyone have an idea how to enable netlib-java on top of the standard EMR Spark cluster

Restarting an executor during execution causes it to lose AWS credentials (anyone seen this?)

2016-03-20 Thread Allen George
Hi guys, I'm having a problem where respawning a failed executor during a job that reads/writes parquet on S3 causes subsequent tasks to fail because of missing AWS keys. Setup: I'm using Spark 1.5.2 with Hadoop 2.7 and running experiments on a simple standalone cluster: 1 master 2 workers My
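
One workaround worth trying (a sketch, not a confirmed fix for this bug): pass the keys through SparkConf with the spark.hadoop.* prefix, which Spark copies into the Hadoop configuration it hands to executors, so a respawned executor should pick them up from the application's conf rather than from state set up at first launch.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("s3-parquet-job")
      .set("spark.hadoop.fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
      .set("spark.hadoop.fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
    val sc = new SparkContext(conf)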

Re: Restarting an executor during execution causes it to lose AWS credentials (anyone seen this?)

2016-03-19 Thread Steve Loughran
On 17 Mar 2016, at 16:01, Allen George <allen.geo...@gmail.com<mailto:allen.geo...@gmail.com>> wrote: Hi guys, I'm having a problem where respawning a failed executor during a job that reads/writes parquet on S3 causes subsequent tasks to fail because of missing AWS keys. Setup

Re: Using Spark SQL / Hive on AWS EMR

2016-03-03 Thread Gourav Sengupta
, 2016 at 1:20 PM, Afshartous, Nick <nafshart...@turbine.com> wrote: > > Hi, > > > On AWS EMR 4.2 / Spark 1.5.2, I tried the example here > > > > https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#hive-tables > > > to load data from a file into a

Using Spark SQL / Hive on AWS EMR

2016-03-03 Thread Afshartous, Nick
Hi, On AWS EMR 4.2 / Spark 1.5.2, I tried the example here https://spark.apache.org/docs/1.5.0/sql-programming-guide.html#hive-tables to load data from a file into a Hive table. scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc) scala> sqlContext.sql("
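
For reference, the example from the linked guide, completed:

    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
    sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
    sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
    // Queries are expressed in HiveQL
    sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)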

Re: Configure Spark Resource on AWS CLI Not Working

2016-03-01 Thread Jonathan Kelly
/ElasticMapReduce/latest/ReleaseGuide/emr-release-differences.html ~ Jonathan On Fri, Feb 26, 2016 at 6:38 PM Weiwei Zhang <wzhan...@dons.usfca.edu> wrote: > Hi there, > > I am trying to configure memory for spark using AWS CLI. However, I got > the following message:

Configure Spark Resource on AWS CLI Not Working

2016-02-26 Thread Weiwei Zhang
Hi there, I am trying to configure memory for spark using AWS CLI. However, I got the following message: *A client error (ValidationException) occurred when calling the RunJobFlow operation: Cannot specify args for application 'Spark' when release label is used.* In the aws 'create-cluster
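
With a release label (emr-4.x), per-application args are rejected; the supported route, per the release-differences document linked in the reply above, is the --configurations flag with a spark-defaults classification. An illustrative sketch (instance types and memory values are placeholders):

    aws emr create-cluster \
      --release-label emr-4.3.0 \
      --applications Name=Spark \
      --instance-type m3.xlarge --instance-count 3 \
      --configurations '[{"Classification": "spark-defaults",
                          "Properties": {"spark.executor.memory": "4g",
                                         "spark.driver.memory": "2g"}}]'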

Re: Bad Digest error while doing aws s3 put

2016-02-09 Thread Steve Loughran
> On 9 Feb 2016, at 07:19, lmk wrote: > > Hi Dhimant, > As I had indicated in my next mail, my problem was due to disk getting full > with log messages (these were dumped into the slaves) and did not have > anything to do with the content pushed into s3. So,

Re: Bad Digest error while doing aws s3 put

2016-02-08 Thread lmk
-aws-s3-put-tp10036p26174.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Bad Digest error while doing aws s3 put

2016-02-08 Thread Eugen Cepoi
I had similar problems with multi part uploads. In my case the real error was something else which was being masked by this issue https://issues.apache.org/jira/browse/SPARK-6560. In the end this bad digest exception was a side effect and not the original issue. For me it was some library version

Re: Bad Digest error while doing aws s3 put

2016-02-07 Thread Steve Loughran
> On 7 Feb 2016, at 07:57, Dhimant wrote: > >at > com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.uploadSinglePart(MultipartUploadOutputStream.java:245) >... 15 more > Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The >

Re: Bad Digest error while doing aws s3 put

2016-02-06 Thread Dhimant
in context: http://apache-spark-user-list.1001560.n3.nabble.com/Bad-Digest-error-while-doing-aws-s3-put-tp10036p26167.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: [ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-02 Thread David Russell
your life easier if you do go this route. Once you've fleshed out your ideas I'm sure folks on this mailing list can provide helpful guidance based on their real world experience with Spark. > Does this pave the way into replacing > the need of a pre-instantiated cluster in AWS or

Re: [ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-02 Thread Benjamin Kim
Hi David, My company uses Lambda to do simple data moving and processing using python scripts. I can see that using Spark instead for the data processing would make it into a real production-level platform. Does this pave the way into replacing the need of a pre-instantiated cluster in AWS or bought

[ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-01 Thread David Russell
Apache Spark package offering seamless integration with the AWS Lambda <https://aws.amazon.com/lambda/> compute service for Spark batch and streaming applications on the JVM. Within traditional Spark deployments RDD tasks are executed using fixed compute resources on worker nodes within

Re: Databricks Cloud vs AWS EMR

2016-01-28 Thread Eran Witkon
pport, an >>> interactive UI, security, and job scheduling. >>> >>> Specifically, Databricks runs standard Spark applications inside a >>> user’s AWS account, similar to EMR, but it adds a variety of features to >>> create an end-to-end environment for

Re: Databricks Cloud vs AWS EMR

2016-01-28 Thread Rakesh Soni
> > At its core, EMR just launches Spark applications, whereas Databricks is a > higher-level platform that also includes multi-user support, an interactive > UI, security, and job scheduling. > > Specifically, Databricks runs standard Spark applications inside a user’s >

Re: Databricks Cloud vs AWS EMR

2016-01-28 Thread Sourav Mazumder
rak...@databricks.com> wrote: > At its core, EMR just launches Spark applications, whereas Databricks is a >> higher-level platform that also includes multi-user support, an interactive >> UI, security, and job scheduling. >> >> Specifically, Databricks runs standard Spark applicatio

Re: Databricks Cloud vs AWS EMR

2016-01-28 Thread Michal Klos
k applications, whereas Databricks is a >>>> higher-level platform that also includes multi-user support, an >>>> interactive UI, security, and job scheduling. >>>> >>>> Specifically, Databricks runs standard Spark applications inside a user’s >

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-27 Thread Gourav Sengupta
Hi, It may be interesting to see this. Can you please create a hivecontext (using standard AWS Spark stack on EMR 4.0) and create a table to read the avro file and read data into a dataframe using hivecontext sql? Please let me know if i can be of any help with this. Regards, Gourav On Wed

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-27 Thread Erisa Dervishi
;>> Gourav Sengupta >>> >>> >>> On Tue, Jan 26, 2016 at 1:12 PM, Gourav Sengupta < >>> gourav.sengu...@gmail.com> wrote: >>> >>>> Hi, >>>> >>>> are

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread Gourav Sengupta
. >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p26068.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread Erisa Dervishi
>> >>> Hi, >>> >>> are you creating RDD's out of the data? >>> >>> >>> >>> Regards, >>> Gourav >>> >>> On Tue, Jan 26, 2016 at 12:45 PM, aecc <alessandroa...@gmail.com> wrote: >>> >

Terminating Spark Steps in AWS

2016-01-26 Thread Daniel Imberman
-list.1001560.n3.nabble.com/Terminating-Spark-Steps-in-AWS-tp26076.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Terminating Spark Steps in AWS

2016-01-26 Thread Jonathan Kelly
without terminating the entire cluster? > > Thank you, > > Daniel > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Terminating-Spark-Ste

Databricks Cloud vs AWS EMR

2016-01-26 Thread Alex Nastetsky
As a user of AWS EMR (running Spark and MapReduce), I am interested in potential benefits that I may gain from Databricks Cloud. I was wondering if anyone has used both and done comparison / contrast between the two services. In general, which resource manager(s) does Databricks Cloud use

Re: Terminating Spark Steps in AWS

2016-01-26 Thread Daniel Imberman
minating the entire cluster? >> >> Thank you, >> >> Daniel >> >> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Terminating-Spark-Steps-in-AWS-tp26076.html

Re: newAPIHadoopFile uses AWS credentials from other threads

2016-01-26 Thread Wayne Song
spark-user-list.1001560.n3.nabble.com/newAPIHadoopFile-uses-AWS-credentials-from-other-threads-tp26081p26082.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread Gourav Sengupta
in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p26068.html > Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread Erisa Dervishi
8. > The number of partitions used when reading data is 7315. > The maximum size of a file to read is 14G > The size of the folder is around: 270G > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-acces

Re: Spark task hangs infinitely when accessing S3 from AWS

2016-01-26 Thread aecc
Sorry, I have not been able to solve the issue. I used speculation mode as a workaround for this. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p26068.html Sent from the Apache Spark User List

Re: Read from AWS s3 with out having to hard-code sensitive keys

2016-01-12 Thread ayan guha
On EMR, you can add fs.* params in emrfs-site.xml. On Tue, Jan 12, 2016 at 7:27 AM, Jonathan Kelly wrote: > Yes, IAM roles are actually required now for EMR. If you use Spark on EMR > (vs. just EC2), you get S3 configuration for free (it goes by the name > EMRFS), and it

Re: Read from AWS s3 with out having to hard-code sensitive keys

2016-01-11 Thread Jonathan Kelly
Yes, IAM roles are actually required now for EMR. If you use Spark on EMR (vs. just EC2), you get S3 configuration for free (it goes by the name EMRFS), and it will use your IAM role for communicating with S3. Here is the corresponding documentation:

Read from AWS s3 with out having to hard-code sensitive keys

2016-01-11 Thread Krishna Rao
Hi all, Is there a method for reading from s3 without having to hard-code keys? The only 2 ways I've found both require this: 1. Set conf in code e.g.: sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "") sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "") 2. Set keys in URL, e.g.:
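
A sketch of the keyless approach suggested in the replies above: give the instances an IAM role with S3 access and set no keys at all; the s3a credential chain falls back to the instance profile automatically (and on EMR, EMRFS does the equivalent for s3://). Bucket names are placeholders.

    // No credentials in code or in the URL -- the instance profile supplies them
    val data = sc.textFile("s3a://my-bucket/input/")
    data.saveAsTextFile("s3a://my-bucket/output/")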

Re: Read from AWS s3 with out having to hard-code sensitive keys

2016-01-11 Thread Sabarish Sasidharan
If you are on EMR, these can go into your hdfs site config. And will work with Spark on YARN by default. Regards Sab On 11-Jan-2016 5:16 pm, "Krishna Rao" wrote: > Hi all, > > Is there a method for reading from s3 without having to hard-code keys? > The only 2 ways I've

Re: Read from AWS s3 with out having to hard-code sensitive keys

2016-01-11 Thread Matei Zaharia
In production, I'd recommend using IAM roles to avoid having keys altogether. Take a look at http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html. Matei > On Jan 11, 2016, at 11:32 AM, Sabarish Sasidharan > wrote: > > If you are

Re: SparkSQL integration issue with AWS S3a

2016-01-06 Thread Kostiantyn Kudriavtsev
>> spark-submit. let us know if hdfs-site.xml works first. It should. >> >> Best Regards, >> >> Jerry >> >> Sent from my iPhone >> >> On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev >> <kudryavtsev.konstan...@gmail.com> wr

Re: SparkSQL integration issue with AWS S3a

2016-01-06 Thread Jerry Lam
ing...@gmail.com> wrote: >>>>> Hi Kostiantyn, >>>>> >>>>> I want to confirm that it works first by using hdfs-site.xml. If yes, you >>>>> could define different spark-{user-x}.conf and source them during >>>>> spark-submit. l

Re: SparkSQL integration issue with AWS S3a

2016-01-02 Thread KOSTIANTYN Kudriavtsev
ould. >> >> Best Regards, >> >> Jerry >> >> Sent from my iPhone >> >> On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev < >> kudryavtsev.konstan...@gmail.com> wrote: >> >> Hi Jerry, >> >> I want to run different

Re: SparkSQL integration issue with AWS S3a

2016-01-01 Thread Jerry Lam
let us know if hdfs-site.xml works first. It should. >> >> Best Regards, >> >> Jerry >> >> Sent from my iPhone >> >>> On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev >>> <kudryavtsev.konstan...@gmail.com> wrote: >>>

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread Steve Loughran
> On 30 Dec 2015, at 19:31, KOSTIANTYN Kudriavtsev > <kudryavtsev.konstan...@gmail.com> wrote: > > Hi Jerry, > > I want to run different jobs on different S3 buckets - different AWS creds - > on the same instances. Could you shed some light if it's possible t

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread Brian London
; > > Hi Jerry, > > > > I want to run different jobs on different S3 buckets - different AWS > creds - on the same instances. Could you shed some light if it's possible > to achieve with hdfs-site? > > > > Thank you, > > Konstantin Kudryavtsev > > >

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread KOSTIANTYN Kudriavtsev
uld. > > Best Regards, > > Jerry > > Sent from my iPhone > > On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev < > kudryavtsev.konstan...@gmail.com> wrote: > > Hi Jerry, > > I want to run different jobs on different S3 buckets - different AWS creds >

Re: SparkSQL integration issue with AWS S3a

2015-12-31 Thread KOSTIANTYN Kudriavtsev
> Sent from my iPhone > > On 30 Dec, 2015, at 2:31 pm, KOSTIANTYN Kudriavtsev < > kudryavtsev.konstan...@gmail.com> wrote: > > Hi Jerry, > > I want to run different jobs on different S3 buckets - different AWS creds > - on the same instances. Could you shed s

SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
http://x.x.x.x/latest/meta-data/iam/security-credentials/ 15/12/30 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3) com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain at com.amazonaws.auth.AWSCred

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Chris Fregly
ity-credentials/ > 15/12/30 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load > credentials from InstanceProfileCredentialsProvider: The requested metadata > is not found at http://x.x.x.x/latest/meta-data/iam/security-credentials/ > 15/12/30 17:00:32 ERROR Exe

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
/latest/meta-data/iam/security-credentials/ > 15/12/30 > 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3) > com.amazonaws.AmazonClientException: Unable to load AWS credentials from any > provider in

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Blaž Šnuderl
a is not found at > http://x.x.x.x/latest/meta-data/iam/security-credentials/ > 15/12/30 > 17:00:32 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 3) > com.amazonaws.AmazonC

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
http://x.x.x.x/latest/meta-data/iam/security-credentials/ >> 15/12/30 >> 17:00:32 DEBUG AWSCredentialsProviderChain: Unable to load credentials from >> InstanceProfileCredentialsProvider: The requested me

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Chris Fregly
couple things: 1) switch to IAM roles if at all possible - explicitly passing AWS credentials is a long and lonely road in the end 2) one really bad workaround/hack is to run a job that hits every worker and writes the credentials to the proper location (~/.awscredentials or whatever) ^^ i

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
;ch...@fregly.com> wrote: > couple things: > > 1) switch to IAM roles if at all possible - explicitly passing AWS > credentials is a long and lonely road in the end > > 2) one really bad workaround/hack is to run a job that hits every worker > and writes the credentials to the proper

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Jerry Lam
Kudryavtsev > >> On Wed, Dec 30, 2015 at 1:48 PM, Chris Fregly <ch...@fregly.com> wrote: >> couple things: >> >> 1) switch to IAM roles if at all possible - explicitly passing AWS >> credentials is a long and lonely road in the end >> >&g

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread KOSTIANTYN Kudriavtsev
Hi Jerry, I want to run different jobs on different S3 buckets - different AWS creds - on the same instances. Could you shed some light if it's possible to achieve with hdfs-site? Thank you, Konstantin Kudryavtsev On Wed, Dec 30, 2015 at 2:10 PM, Jerry Lam <chiling...@gmail.com> wrote:
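
A sketch for the per-job-credentials case: each job is submitted as its own application with its own SparkConf, and spark.hadoop.* entries are copied into that job's Hadoop configuration, so two jobs on the same instances can use different keys for different buckets. Key values are placeholders.

    import org.apache.spark.{SparkConf, SparkContext}

    def contextFor(appName: String, key: String, secret: String): SparkContext = {
      val conf = new SparkConf()
        .setAppName(appName)
        .set("spark.hadoop.fs.s3a.access.key", key)
        .set("spark.hadoop.fs.s3a.secret.key", secret)
      new SparkContext(conf)   // one application (JVM) per credential set
    }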

Re: SparkSQL integration issue with AWS S3a

2015-12-30 Thread Jerry Lam
pm, KOSTIANTYN Kudriavtsev > <kudryavtsev.konstan...@gmail.com> wrote: > > Hi Jerry, > > I want to run different jobs on different S3 buckets - different AWS creds - > on the same instances. Could you shed some light if it's possible to achieve > with hdfs-site

Re: which aws instance type for shuffle performance

2015-12-18 Thread Andrew Or
Hi Rastan, Unless you're using off-heap memory or starting multiple executors per machine, I would recommend the r3.2xlarge option, since you don't actually want gigantic heaps (100GB is more than enough). I've personally run Spark on a very large scale with r3.8xlarge instances, but I've been

Re: which aws instance type for shuffle performance

2015-12-18 Thread Alexander Pivovarov
Andrew, it's going to be 4 executor JVMs on each r3.8xlarge. Rastan, you can run a quick test using an EMR Spark cluster on spot instances and see what configuration works better. Without the tests it is all speculation. On Dec 18, 2015 1:53 PM, "Andrew Or" wrote: > Hi Rastan,

which aws instance type for shuffle performance

2015-12-15 Thread Rastan Boroujerdi
I'm trying to determine whether I should be using 10 r3.8xlarge or 40 r3.2xlarge. I'm mostly concerned with shuffle performance of the application. If I go with r3.8xlarge I will need to configure 4 worker instances per machine to keep the JVM size down. The worker instances will likely contend

Re: AWS CLI --jars comma problem

2015-12-07 Thread Akhil Das
Not a direct answer but you can create a big fat jar combining all the classes in the three jars and pass it. Thanks Best Regards On Thu, Dec 3, 2015 at 10:21 PM, Yusuf Can Gürkan <yu...@useinsider.com> wrote: > Hello > > I have a question about AWS CLI for people who use i

Task hung on SocketInputStream.socketRead0 when reading large a mount of data from AWS S3

2015-12-07 Thread Sa Xiao
Hi, We encounter a problem very similar to this one: https://www.mail-archive.com/search?l=user@spark.apache.org&q=subject:%22Spark+task+hangs+infinitely+when+accessing+S3+from+AWS%22&o=newest&f=1 When reading a large amount of data from S3, one or several tasks hang. It doesn't happen every time

AWS CLI --jars comma problem

2015-12-03 Thread Yusuf Can Gürkan
Hello, I have a question about the AWS CLI for people who use it. I create a Spark cluster with the AWS CLI and I'm using a Spark step with jar dependencies. But as you can see below, I cannot set multiple jars because the AWS CLI replaces commas with spaces in Args. Is there a way of doing it? I can accept

Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-12 Thread aecc
Any hints? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p25365.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-12 Thread Michael Cutler
input paths requires an AWS S3 API call to list everything based on the common prefix; so if your input is something like: s3://my-bucket*.json then the prefix "///" will be passed to the API and should be fairly efficient. However if you're doing something more adventurous like:

Re: Spark task hangs infinitely when accessing S3 from AWS

2015-11-12 Thread aecc
/Spark-task-hangs-infinitely-when-accessing-S3-from-AWS-tp25289p25367.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Spark-csv error on read AWS s3a in spark 1.4.1

2015-11-10 Thread Zhang, Jingyu
A small csv file in S3. I use s3a://key:seckey@bucketname/a.csv It works for SparkContext pixelsStr: SparkContext = ctx.textFile(s3pathOrg); It works for Java Spark-csv as well Java code : DataFrame careerOneDF = sqlContext.read().format("com.databricks.spark.csv")
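
A sketch of the same read with the s3a keys moved out of the URL and onto the Hadoop configuration; embedding a secret key containing '/' in a URL is a classic source of errors with this pattern, so it is worth trying first (bucket and file names are placeholders).

    import org.apache.spark.sql.SQLContext

    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")        // assumes a header row
      .load("s3a://my-bucket/a.csv")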
