Accessing log for lost executors

2016-12-01, Nisrina Luthfiyati
Hi all, I'm trying to troubleshoot an ExecutorLostFailure issue. In the Spark UI I noticed that the Executors tab only lists active executors. Is there any way I can see the logs for dead executors so that I can find out why they died or were lost? I'm using Spark 1.5.2 on YARN 2.7.1. Thanks! Nisrina
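After an application finishes, YARN log aggregation (assumed here to be enabled with `yarn.log-aggregation-enable=true` on this cluster) collects the stdout/stderr of every container the application used. That includes executors that were lost. These logs can be retrieved with `yarn logs -applicationId <appId>`. The following is a minimal Python sketch that builds and runs that command. The helper names and the example application ID are hypothetical.

```python
import re
import subprocess

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_PATTERN = re.compile(r"^application_\d+_\d+$")


def yarn_logs_command(app_id):
    """Build the `yarn logs` command for one finished YARN application."""
    if not APP_ID_PATTERN.match(app_id):
        raise ValueError("not a YARN application ID: %r" % app_id)
    return ["yarn", "logs", "-applicationId", app_id]


def fetch_aggregated_logs(app_id):
    """Run `yarn logs` and return its output as text.

    Assumes the `yarn` CLI is on PATH and log aggregation is enabled.
    The output covers every container of the application, including
    executors that were lost during the run.
    """
    result = subprocess.run(
        yarn_logs_command(app_id),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

The ID check only guards against obvious typos before the command is executed. The command itself is the part that matters.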

Client process memory usage

2016-04-14, Nisrina Luthfiyati
Hi all, I have a Python Spark application that I'm running with spark-submit in yarn-cluster mode. If I run ps -aux | grep on the submitter node, I can find the client process that submitted the application, usually using around 300-600 MB of memory (%MEM around 1.0-2.0 on a node with 30 GB …
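`ps` reports %MEM as a process's resident set size divided by the node's physical memory. The two figures in the question are therefore consistent: 300 MB of 30 GB is about 1%, and 600 MB is about 2%. The following is a small Python sketch, assuming a Linux node. It reads the resident size of one process (for example, the spark-submit client) from `/proc/<pid>/status` and reproduces that percentage. The function names are hypothetical.

```python
def parse_kb_field(text, field):
    """Return the number from a '<field>: <n> kB' line of /proc status/meminfo text."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)


def mem_percent(rss_kb, total_kb):
    """Resident set size as a percentage of physical memory (ps's %MEM)."""
    return 100.0 * rss_kb / total_kb


def process_mem_percent(pid):
    """Linux-only: %MEM of a running process, e.g. the spark-submit client."""
    with open("/proc/%d/status" % pid) as f:
        rss_kb = parse_kb_field(f.read(), "VmRSS")
    with open("/proc/meminfo") as f:
        total_kb = parse_kb_field(f.read(), "MemTotal")
    return mem_percent(rss_kb, total_kb)
```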

Spark SQL - udf with entire row as parameter

2016-03-04, Nisrina Luthfiyati
Hi all, I'm using Spark SQL in Python and want to write a UDF that takes an entire Row as the argument. I tried something like:

    def functionName(row):
        ...
        return a_string

    udfFunctionName = udf(functionName, StringType())
    df.withColumn('columnName', udfFunctionName('*'))

but this gives an …
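A common way to pass a whole row to a Python UDF is to pack all columns into one struct column with `pyspark.sql.functions.struct`. The UDF then receives that struct as a `Row`. The following is a minimal sketch under that assumption. `describe_row`, `with_row_column` and `ExampleRow` are hypothetical names. PySpark is imported only inside the function that needs it, so the row logic can be tried without Spark.

```python
from collections import namedtuple

# Stand-in for a pyspark Row when exercising describe_row() locally (hypothetical).
ExampleRow = namedtuple("ExampleRow", ["id", "name"])


def describe_row(row):
    """Hypothetical row-level logic: join every field value into one string.

    A pyspark Row is a tuple subclass, so iterating over it yields the
    column values in column order.
    """
    return "|".join(str(value) for value in row)


def with_row_column(df, column_name="columnName"):
    """Return df with an extra column computed from each entire row."""
    # Imported here so describe_row() can be used without a Spark install.
    from pyspark.sql.functions import struct, udf
    from pyspark.sql.types import StringType

    row_udf = udf(describe_row, StringType())
    # struct(*df.columns) bundles all existing columns into one struct column,
    # which arrives in the Python function as a Row.
    return df.withColumn(column_name, row_udf(struct(*df.columns)))
```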

Re: Write to S3 with server side encryption in KMS mode

2016-01-26, Nisrina Luthfiyati
> http://docs.aws.amazon.com/kms/latest/developerguide/services-emr.html#emrfs-encrypt
>
> If this has changed, I’d love to know, but I’m pretty sure it hasn’t.
>
> The alternative is to write to HDFS, then copy the data across in bulk.
>
> Thanks,

Write to S3 with server side encryption in KMS mode

2016-01-26, Nisrina Luthfiyati
Hi all, I'm trying to save a Spark application's output to a bucket in S3. The data is supposed to be encrypted with S3 server-side encryption in KMS mode, which (using the Java API or CLI) typically requires passing the SSE-KMS key when writing the data. I currently have not found a way to …
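The reply listed above suggests an alternative: write to HDFS first, then copy the data across in bulk. In that approach, the SSE-KMS key is passed during the copy, one object at a time. The following sketch uses boto3. That is an assumption, since the thread mentions only the Java API and CLI. It also assumes the output has first been staged on local disk, for example with `hadoop fs -get`. The bucket, prefix, KMS key ID and helper names are all hypothetical.

```python
import os


def kms_extra_args(kms_key_id):
    """Upload arguments asking S3 to encrypt the object with SSE-KMS."""
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}


def s3_key_for(local_root, path, prefix):
    """Map a file under local_root to an S3 key under prefix, using '/' separators."""
    relative = os.path.relpath(path, local_root).replace(os.sep, "/")
    return prefix.rstrip("/") + "/" + relative


def upload_dir_with_kms(local_root, bucket, prefix, kms_key_id):
    """Upload every file under local_root to s3://bucket/prefix/ with SSE-KMS."""
    import boto3  # imported lazily; needs AWS credentials when called

    s3 = boto3.client("s3")
    extra_args = kms_extra_args(kms_key_id)
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            s3.upload_file(path, bucket, s3_key_for(local_root, path, prefix),
                           ExtraArgs=extra_args)
```

Keeping the key mapping in its own function makes the S3 layout easy to check before any data is uploaded.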

Re: In yarn-client mode, is it the driver or application master that issue commands to executors?

2015-12-07, Nisrina Luthfiyati
…, Jacek Laskowski <ja...@japila.pl> wrote:
> On Fri, Nov 27, 2015 at 12:12 PM, Nisrina Luthfiyati <nisrina.luthfiy...@gmail.com> wrote:
>
>> Hi all,
>> I'm trying to understand how yarn-client mode works and found these two diagrams:
>> …

In yarn-client mode, is it the driver or application master that issue commands to executors?

2015-11-27, Nisrina Luthfiyati
Hi all, I'm trying to understand how yarn-client mode works and found these two diagrams. In the first diagram, it looks like the driver running in the client communicates directly with the executors to issue application commands, while in the second diagram it looks like application commands are sent …

Re: In yarn-client mode, is it the driver or application master that issue commands to executors?

2015-11-27, Nisrina Luthfiyati

Is the resources specified in configuration shared by all jobs?

2015-11-04, Nisrina Luthfiyati
Hi all, I'm running some Spark jobs in Java on top of YARN by submitting one application JAR that starts multiple jobs. My question is: if I set some resource configurations, either when submitting the app or in spark-defaults.conf, would these configs apply to each job or to the entire …

Re: Is the resources specified in configuration shared by all jobs?

2015-11-04, Nisrina Luthfiyati
Got it. Thanks!

On Nov 5, 2015 12:32 AM, "Sandy Ryza" <sandy.r...@cloudera.com> wrote:
> Hi Nisrina,
>
> The resources you specify are shared by all jobs that run inside the application.
>
> -Sandy
>
> On Wed, Nov 4, 2015 at 9:24 AM, Nisrina Luthfiyati …
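Resources are requested once per application. A rough estimate of what YARN is asked for therefore covers every job the JAR starts, however many there are. The following sketches that arithmetic, assuming Spark 1.5 on YARN. In that version, each executor container requests `spark.executor.memory` plus `spark.yarn.executor.memoryOverhead`, which defaults to 10% of executor memory with a minimum of 384 MB. The function names are hypothetical. The sketch ignores the AM/driver container and YARN's rounding of each container up to its minimum allocation.

```python
def executor_overhead_mb(executor_memory_mb, factor=0.10, minimum_mb=384):
    """Default spark.yarn.executor.memoryOverhead (assumed Spark 1.5 rule)."""
    return max(int(executor_memory_mb * factor), minimum_mb)


def executor_footprint(num_executors, executor_memory_mb, executor_cores):
    """Memory (MB) and vcores requested for all executors of one application.

    These totals are shared by every job running inside the application;
    they are not multiplied by the number of jobs.
    """
    per_executor_mb = executor_memory_mb + executor_overhead_mb(executor_memory_mb)
    return {
        "memory_mb": num_executors * per_executor_mb,
        "vcores": num_executors * executor_cores,
    }
```

For example, `--num-executors 4 --executor-memory 4g --executor-cores 2` corresponds to `executor_footprint(4, 4096, 2)`. That is 18020 MB and 8 vcores for the whole application.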

Re: Spark Streaming: Change Kafka topics on runtime

2015-08-14, Nisrina Luthfiyati
… that does what you need; stop/start; or if your batch duration isn't too small, you could run it as a series of RDDs (using the existing KafkaUtils.createRDD) where the set of topics is determined before each RDD.

On Thu, Aug 13, 2015 at 4:38 AM, Nisrina Luthfiyati nisrina.luthfiy …
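The reply's second suggestion is to run the stream as a series of RDDs, re-determining the topic set before each one and reading it with `KafkaUtils.createRDD`. Much of that work is bookkeeping over per-partition offsets. The following pure-Python sketch covers that bookkeeping under stated assumptions:

- `plan_offset_ranges` and `advance` are hypothetical.
- The current topic list is re-read from the DB.
- The latest offsets come from Kafka.
- A topic seen for the first time starts at offset 0. A real job might start from Kafka's earliest available offset instead.

Each resulting tuple carries the fields of `OffsetRange(topic, partition, fromOffset, untilOffset)` from `pyspark.streaming.kafka`.

```python
def plan_offset_ranges(topics, consumed, latest):
    """Decide which offsets to read in the next batch.

    topics   -- topic names currently configured (e.g. re-read from the DB)
    consumed -- {(topic, partition): next offset to read} from earlier batches
    latest   -- {(topic, partition): latest offset} as reported by Kafka
    Returns sorted (topic, partition, from_offset, until_offset) tuples,
    skipping topics no longer configured and partitions with nothing new.
    """
    wanted = set(topics)
    ranges = []
    for (topic, partition), until in sorted(latest.items()):
        if topic not in wanted:
            continue
        start = consumed.get((topic, partition), 0)  # new topic: assume offset 0
        if start < until:
            ranges.append((topic, partition, start, until))
    return ranges


def advance(consumed, ranges):
    """Record that every planned range has been processed."""
    updated = dict(consumed)
    for topic, partition, _start, until in ranges:
        updated[(topic, partition)] = until
    return updated
```

In a driver loop, each planned list could then be turned into `[OffsetRange(*r) for r in plan]` and passed to `KafkaUtils.createRDD(sc, kafkaParams, ...)`. The consumed offsets would be advanced only after that RDD has been processed.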

Spark Streaming: Change Kafka topics on runtime

2015-08-13, Nisrina Luthfiyati
Hi all, I want to write a Spark Streaming program that listens to Kafka for a list of topics. The list of topics that I want to consume is stored in a DB and might change dynamically. I plan to periodically refresh this list of topics in the Spark Streaming app. My question is: is it possible to …

Re: Grouping and storing unordered time series data stream to HDFS

2015-05-16, Nisrina Luthfiyati
On May 15, 2015, at 9:59 AM, ayan guha guha.a...@gmail.com wrote:

Hi,
Do you have a cutoff time, like how late an event can be? Otherwise, you may consider a different persistent storage like Cassandra/HBase and delegate the update part to them.

On Fri, May 15, 2015 at 8:10 PM, Nisrina Luthfiyati …
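The cutoff asked about in this reply can be made concrete. An event whose time falls more than an allowed lateness behind the batch time is treated as late. Following the reply's suggestion, a late event would go to a store that handles updates, such as Cassandra or HBase, instead of being appended to HDFS. The following is a minimal sketch. The lateness value, the split into two lists and the function name are assumptions.

```python
from datetime import datetime, timedelta


def split_by_cutoff(events, batch_time, allowed_lateness):
    """Split (event_time, payload) pairs into on-time and late lists.

    An event is late when event_time < batch_time - allowed_lateness.
    On-time events can be appended to HDFS; late ones would go to a
    store that supports updates, as suggested in the thread.
    """
    cutoff = batch_time - allowed_lateness
    on_time, late = [], []
    for event_time, payload in events:
        if event_time < cutoff:
            late.append((event_time, payload))
        else:
            on_time.append((event_time, payload))
    return on_time, late
```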

Grouping and storing unordered time series data stream to HDFS

2015-05-15, Nisrina Luthfiyati
Hi all, I have a stream of data from Kafka that I want to process and store in HDFS using Spark Streaming. Each record has a date/time dimension, and I want to write records with the same time dimension to the same HDFS directory. The data stream might be unordered (by time dimension). I'm wondering …
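One way to write records with the same time dimension to the same HDFS directory is to derive a directory path from each record's timestamp. Each batch then groups its records by that path. With an unordered stream, a single batch may touch several directories. The following sketches that grouping in plain Python. The `date=.../hour=...` layout, the base path and the function names are hypothetical. In Spark Streaming, the same key function could be applied inside `foreachRDD`.

```python
from collections import defaultdict
from datetime import datetime


def bucket_path(base, event_time):
    """Directory for one event, bucketed by date and hour (hypothetical layout)."""
    return "%s/date=%s/hour=%02d" % (
        base.rstrip("/"), event_time.strftime("%Y-%m-%d"), event_time.hour)


def group_by_bucket(base, events):
    """Group (event_time, payload) pairs by target directory.

    Input order does not matter: an out-of-order event still lands in
    the directory of its own time bucket.
    """
    groups = defaultdict(list)
    for event_time, payload in events:
        groups[bucket_path(base, event_time)].append(payload)
    return dict(groups)
```

HDFS files are not updated in place. Under this layout, each batch would most likely add new files to the bucket directories rather than rewrite existing ones.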

Performance advantage by loading data from local node over S3.

2015-04-29, Nisrina Luthfiyati
Hi all, I'm new to Spark, so I'm sorry if the question is too vague. I'm currently trying to deploy a Spark cluster using YARN on an Amazon EMR cluster. For data storage I'm currently using S3, but would loading the data from HDFS on the local nodes give a considerable performance advantage over …