Hi all,
I'm trying to troubleshoot an ExecutorLostFailure issue.
In the Spark UI I noticed that the Executors tab only lists active executors. Is
there any way I can see the logs for dead executors, so that I can find
out why they died or were lost?
I'm using Spark 1.5.2 on YARN 2.7.1.
Thanks!
Nisrina
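One way this is commonly handled on YARN (a sketch, not from the thread itself): once the application finishes, and provided log aggregation is enabled (yarn.log-aggregation-enable=true), the container logs for all executors, dead ones included, can be fetched with the `yarn logs` CLI. The application id below is a made-up placeholder; take the real one from the ResourceManager UI or `yarn application -list`.

```shell
# Hypothetical application id; replace with your own.
APP_ID="application_1448000000000_0001"

# This snippet only prints the command rather than running it, since it
# needs a live YARN cluster; run the printed command on the submitter node
# to dump aggregated container logs, including those of dead executors.
echo "yarn logs -applicationId ${APP_ID}"
```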
Hi all,
I have a Python Spark application that I'm running with spark-submit in
yarn-cluster mode.
If I run `ps aux | grep` on the submitter node, I can
find the client process that submitted the application, usually using around
300-600 MB of memory (%MEM around 1.0-2.0 on a node with 30 GB
Hi all,
I'm using Spark SQL in Python and want to write a UDF that takes an entire
Row as the argument.
I tried something like:
    def functionName(row):
        ...
        return a_string

    udfFunctionName = udf(functionName, StringType())
    df.withColumn('columnName', udfFunctionName('*'))
but this gives an
> http://docs.aws.amazon.com/kms/latest/developerguide/services-emr.html#emrfs-encrypt
>
> If this has changed, I'd love to know, but I'm pretty sure it hasn't.
>
> The alternative is to write to HDFS, then copy the data across in bulk.
>
> Thanks,
Hi all,
I'm trying to save a Spark application's output to a bucket in S3. The data
is supposed to be encrypted with S3's server-side encryption in KMS
mode, which typically (using the Java API/CLI) would require us to pass the
SSE-KMS key when writing the data. I currently have not found a way to
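For context, the EMR documentation linked in the reply describes configuring this on the EMRFS side rather than per write. A sketch of the relevant emrfs-site properties; the key ARN is a made-up placeholder, and the exact property names should be checked against the EMR docs for your release.

```xml
<!-- emrfs-site fragment: enable SSE and point EMRFS at a KMS key.
     The key ARN below is a hypothetical placeholder. -->
<property>
  <name>fs.s3.enableServerSideEncryption</name>
  <value>true</value>
</property>
<property>
  <name>fs.s3.serverSideEncryption.kms.keyId</name>
  <value>arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID</value>
</property>
```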
, Jacek Laskowski <ja...@japila.pl> wrote:
> On Fri, Nov 27, 2015 at 12:12 PM, Nisrina Luthfiyati <
> nisrina.luthfiy...@gmail.com> wrote:
>
>> Hi all,
>> I'm trying to understand how yarn-client mode works and found these two
>> diagrams:
>>
>>
>
Hi all,
I'm trying to understand how yarn-client mode works and found these two
diagrams:
In the first diagram, it looks like the driver running in the client directly
communicates with the executors to issue application commands, while in the
second diagram it looks like application commands are sent
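The practical switch between the two pictures is the master string (or, in later Spark versions, the --deploy-mode flag) passed to spark-submit. A hedged sketch; the application name is made up, and the snippet only prints the commands since they need a live cluster.

```shell
# In yarn-client mode the driver runs inside the spark-submit process on
# the submitting machine and talks to executors directly; in yarn-cluster
# mode the driver runs inside the YARN ApplicationMaster on the cluster.
for MODE in client cluster; do
  echo "spark-submit --master yarn-${MODE} my_app.py"
done
```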
> *From:* Nisrina
Hi all,
I'm running some Spark jobs in Java on top of YARN by submitting one
application jar that starts multiple jobs.
My question is: if I set some resource configurations, either when
submitting the app or in spark-defaults.conf, would these configs apply to
each job or to the entire
Got it. Thanks!
On Nov 5, 2015 12:32 AM, "Sandy Ryza" <sandy.r...@cloudera.com> wrote:
> Hi Nisrina,
>
> The resources you specify are shared by all jobs that run inside the
> application.
>
> -Sandy
>
> On Wed, Nov 4, 2015 at 9:24 AM, Nisrina Luthfiyati
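To make the answer concrete: resource settings like the ones below are fixed per application (per SparkContext), and every job submitted through that context shares them. A sketch of a spark-defaults.conf fragment; the property names are real Spark configuration keys, but the values are made up.

```
spark.executor.memory     4g
spark.executor.cores      2
spark.executor.instances  10
spark.driver.memory       2g
```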
that does what you
need; stop/start; or if your batch duration isn't too small, you could run
it as a series of RDDs (using the existing KafkaUtils.createRDD) where the
set of topics is determined before each RDD.
On Thu, Aug 13, 2015 at 4:38 AM, Nisrina Luthfiyati
nisrina.luthfiy
Hi all,
I want to write a Spark Streaming program that listens to Kafka for a list
of topics.
The list of topics that I want to consume is stored in a DB and might
change dynamically. I plan to periodically refresh this list of topics in
the Spark Streaming app.
My question is: is it possible to
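The "series of RDDs" idea from the reply can be sketched as a plain driver loop, independent of Spark: re-read the topic set before each batch, then create a batch RDD for just those topics. Everything here (fetch_topics, the in-memory db, process_batch) is a hypothetical stand-in; in a real job the marked line would call KafkaUtils.createRDD.

```python
def fetch_topics(db):
    """Hypothetical stand-in for querying the topic list from a DB."""
    return sorted(db["topics"])

def run_batches(db, n_batches, process_batch):
    results = []
    for _ in range(n_batches):
        topics = fetch_topics(db)  # refresh the topic set before each batch
        # Real job: rdd = KafkaUtils.createRDD(sc, kafkaParams, offsetRanges)
        results.append(process_batch(topics))
    return results

db = {"topics": {"orders", "clicks"}}
batches = run_batches(db, 2, lambda topics: list(topics))
```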
On May 15, 2015, at 9:59 AM, ayan guha <guha.a...@gmail.com> wrote:
Hi
Do you have a cut-off time, i.e. a limit on how late an event can be? If not, you may
consider a different persistent storage like Cassandra/HBase and delegate
the "update" part to them.
On Fri, May 15, 2015 at 8:10 PM, Nisrina Luthfiyati
Hi all,
I have a stream of data from Kafka that I want to process and store in HDFS
using Spark Streaming.
Each record has a date/time dimension, and I want to write records within the
same time dimension to the same HDFS directory. The data stream might be
unordered (by time dimension).
I'm wondering
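One common shape for this (a sketch, not from the thread): inside each micro-batch, bucket records by their date key and then write each bucket to its own directory, e.g. .../date=YYYY-MM-DD/. The (timestamp, payload) record format and the YYYY-MM-DD prefix convention below are assumptions for illustration.

```python
from collections import defaultdict

def bucket_by_date(records):
    """Group (timestamp, payload) records by their 'YYYY-MM-DD' date prefix.

    Late or out-of-order records simply land in their own date's bucket, so
    a single micro-batch may append to several date directories.
    """
    buckets = defaultdict(list)
    for ts, payload in records:
        buckets[ts[:10]].append(payload)
    return dict(buckets)

batch = [
    ("2015-05-14T23:59:01", "a"),
    ("2015-05-15T00:00:02", "b"),
    ("2015-05-14T22:10:00", "c"),  # late record from the previous day
]
buckets = bucket_by_date(batch)
```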
Hi all,
I'm new to Spark, so I'm sorry if the question is too vague. I'm currently
trying to deploy a Spark cluster using YARN on an Amazon EMR cluster. For
data storage I'm currently using S3, but would loading the data into HDFS
from the local nodes give a considerable performance advantage over