Connect to two different HDFS servers with different usernames

2016-02-03 Thread Wayne Song
Is there any way to get data from HDFS (e.g. with sc.textFile) with two separate usernames in the same Spark job? For instance, if I have a file on hdfs-server-1.com and the alice user has permission to view it, and I have a file on hdfs-server-2.com and the bob user has permission to view it,
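No answer is preserved in this snippet, but under Hadoop's simple (non-Kerberos) authentication one approach is UserGroupInformation.doAs, which performs filesystem calls as a chosen remote user. A minimal sketch, assuming the hostnames from the question (readAs is a hypothetical helper, and the port and file paths are placeholders); note that doAs only affects the calling JVM, so this covers driver-side FileSystem access rather than sc.textFile tasks running on executors:

    import java.net.URI
    import java.security.PrivilegedExceptionAction
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.security.UserGroupInformation

    // Read an HDFS file while reporting a specific username to the NameNode
    // (simple auth only; with Kerberos you would need real credentials).
    def readAs(user: String, uri: String, path: String): String = {
      val ugi = UserGroupInformation.createRemoteUser(user)
      ugi.doAs(new PrivilegedExceptionAction[String] {
        override def run(): String = {
          val fs = FileSystem.get(new URI(uri), new Configuration())
          val in = fs.open(new Path(path))
          try scala.io.Source.fromInputStream(in).mkString finally in.close()
        }
      })
    }

    val a = readAs("alice", "hdfs://hdfs-server-1.com:8020", "/data/file-a.txt")
    val b = readAs("bob",   "hdfs://hdfs-server-2.com:8020", "/data/file-b.txt")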

Re: newAPIHadoopFile uses AWS credentials from other threads

2016-01-26 Thread Wayne Song
Hmmm, I seem to be able to get around this by setting hadoopConf1.setBoolean("fs.s3n.impl.disable.cache", true) in my code. Is there anybody more familiar with Hadoop who can confirm that the filesystem cache would cause this issue?
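For context: Hadoop's FileSystem.get() caches instances keyed on the URI scheme and authority (plus the current user), not on the credentials inside the Configuration, so a thread requesting an s3n:// path with different keys can receive a FileSystem that another thread already created with its own credentials. Disabling the cache forces a fresh instance per Configuration. A minimal sketch of how the setting from the message might be applied (credential values and the bucket path are placeholders):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // One Configuration per thread, each carrying its own credentials.
    val hadoopConf1 = new Configuration()
    hadoopConf1.set("fs.s3n.awsAccessKeyId", "ACCESS_KEY_1")     // placeholder
    hadoopConf1.set("fs.s3n.awsSecretAccessKey", "SECRET_KEY_1") // placeholder
    // Bypass the FileSystem cache so these credentials are actually used.
    hadoopConf1.setBoolean("fs.s3n.impl.disable.cache", true)

    val rdd1 = sc.newAPIHadoopFile(
      "s3n://bucket-1/path",  // placeholder
      classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
      hadoopConf1)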

Re: Exceptions in threads in executor code don't get caught properly

2015-09-03 Thread Wayne Song
Something like this:

    myRdd.map(x => try {
      // something
    } catch {
      case e: Exception => log.error("Whoops!! :" + e)
    })

Thanks
Best Regards

On Tue, Sep 1, 2015 at 1:22 AM, Wayne Song <wayne.e.s...@gmail.com> wrote:

Exceptions in threads in executor code don't get caught properly

2015-08-31 Thread Wayne Song
We've been running into a situation where exceptions in rdd.map() calls do not get recorded or shown properly on the web UI. We've discovered that this seems to occur because we're creating our own threads in foreachPartition() calls. If I have code like this: The tasks on the executors
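The code block is elided in this snippet; the following is an illustrative reconstruction of the pattern being described (myRdd and process are hypothetical names). Spark only observes exceptions thrown from the task thread itself, so a failure inside a hand-spawned thread is lost unless it is captured and rethrown:

    import java.util.concurrent.atomic.AtomicReference

    myRdd.foreachPartition { partition =>
      val failure = new AtomicReference[Throwable](null)
      val worker = new Thread(new Runnable {
        override def run(): Unit =
          try partition.foreach(record => process(record)) // hypothetical work
          catch { case t: Throwable => failure.set(t) }
      })
      worker.start()
      worker.join()
      // Rethrow in the task thread so Spark records the task as failed.
      val t = failure.get()
      if (t != null) throw t
    }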

Getting java.net.BindException when attempting to start Spark master on EC2 node with public IP

2015-07-27 Thread Wayne Song
Hello, I am trying to start a Spark master for a standalone cluster on an EC2 node. The CLI command I'm using looks like this: Note that I'm specifying the --host argument; I want my Spark master to listen on a specific IP address. The host that I'm specifying (i.e. 54.xx.xx.xx) is the
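The actual command is elided in this snippet; a typical invocation of the form described (ports are the standalone defaults, the address is the placeholder from the message) would be:

    ./sbin/start-master.sh --host 54.xx.xx.xx --port 7077 --webui-port 8080

A common cause of java.net.BindException in this setup is that an EC2 public IP is NAT-mapped by AWS rather than assigned to any interface on the instance, so the JVM cannot bind to it; binding to the instance's private IP (or 0.0.0.0) while advertising the public address is the usual workaround.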