Thanks for the reply. I also found that sparkConf.setExecutorEnv works for
YARN.
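For the record, a minimal sketch of the two equivalent ways this can be set (the variable name and value below are placeholders):

```scala
import org.apache.spark.SparkConf

// Sketch: MY_VAR and its value are placeholders.
// setExecutorEnv("MY_VAR", ...) is recorded as the config key
// spark.executorEnv.MY_VAR, which YARN passes to each executor.
val conf = new SparkConf()
  .setAppName("executor-env-demo")
  .setExecutorEnv("MY_VAR", "some-value")

// Equivalent on the command line:
//   spark-submit --conf spark.executorEnv.MY_VAR=some-value ...
```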
From: Saisai Shao <sai.sai.s...@gmail.com<mailto:sai.sai.s...@gmail.com>>
Date: Wednesday, February 17, 2016 at 6:02 PM
To: Lin Zhao <l...@exabeam.com<mailto:l...@exabeam.com>>
Cc
I've been trying to set some environment variables for the Spark executors but
haven't had much luck. I tried editing conf/spark-env.sh, but it doesn't get
through to the executors. I'm running 1.6.0 on YARN; any pointers are
appreciated.
Thanks,
Lin
From: Iulian Dragoș
<iulian.dra...@typesafe.com<mailto:iulian.dra...@typesafe.com>>
Date: Thursday, January 28, 2016 at 5:33 AM
To: Lin Zhao <l...@exabeam.com<mailto:l...@exabeam.com>>
Cc: "user@spark.apache.org"
I'm seeing this error in the driver when running a streaming job; not sure if
it's critical.
It happens maybe half the time a checkpoint is saved. There are retries in the
log, but it sometimes results in "Could not write checkpoint for time 145400632 ms
to file
I have an actor receiver that reads data and calls store() to save data to
Spark. I was hoping spark.streaming.receiver.maxRate and
spark.streaming.backpressure would help me block the method when needed to
avoid overflowing the pipeline, but they don't. My actor pumps millions of
lines to
One solution is to read the scheduling delay so my actor can go to sleep when
needed. Is this possible?
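One way this could be wired up, sketched and untested: register a StreamingListener to capture the last scheduling delay, and have the receiver back off when it grows. Note that the listener runs on the driver while the receiver actor runs on an executor, so in practice the gauge value would have to be shared out-of-band (e.g. via an external store); DelayGauge and the threshold names below are hypothetical.

```scala
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Hypothetical gauge holding the last observed scheduling delay (ms).
object DelayGauge {
  val lastDelayMs = new AtomicLong(0L)
}

class DelayListener extends StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    // schedulingDelay is an Option[Long] in milliseconds.
    batch.batchInfo.schedulingDelay.foreach(DelayGauge.lastDelayMs.set)
  }
}

// Driver side:  ssc.addStreamingListener(new DelayListener)
// Receiver side, before each store() call (names hypothetical):
//   if (DelayGauge.lastDelayMs.get() > maxTolerableDelayMs) Thread.sleep(backoffMs)
```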
From: Lin Zhao <l...@exabeam.com<mailto:l...@exabeam.com>>
Date: Wednesday, January 27, 2016 at 5:28 PM
To: "user@spark.apache.org<mailto:user@spark.apache.org>"
I'm using mapWithState and hit
https://issues.apache.org/jira/browse/SPARK-12591. Since 1.6.1 is not released,
I tried the workaround in the comment, but I got these errors on one of the
nodes. While millions of events go through the mapWithState, only 7 show up in
the log. Is this related to
Hi Silvio,
Can you go into a little detail about how the back pressure works? Does it
block the receiver, or does it temporarily save the incoming messages in
memory/disk? I have a custom actor receiver that uses store() to save data
to Spark. Would the back pressure make the store() call block?
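From reading the 1.6 code, my understanding (worth verifying) is that receiver back pressure feeds an updated rate into a rate limiter inside the receiver's BlockGenerator, so single-record store() calls do block in waitToPush() when the rate is exceeded, rather than spilling to memory or disk. Conceptually it behaves like this toy limiter (not Spark's actual class):

```scala
// Toy illustration of a blocking rate limiter, for intuition only:
// permits accrue at `ratePerSec`; acquire() sleeps until one is available.
class ToyRateLimiter(ratePerSec: Long) {
  private val intervalNanos = 1000000000L / ratePerSec
  private var nextFreeAt = System.nanoTime()

  def acquire(): Unit = synchronized {
    val now = System.nanoTime()
    if (nextFreeAt > now) {
      // Caller (think: store()) blocks here until a permit frees up.
      Thread.sleep((nextFreeAt - now) / 1000000L)
    }
    nextFreeAt = math.max(nextFreeAt, now) + intervalNanos
  }
}
```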
On 1/17/16,
When the state is passed to the task that handles mapWithState for a
particular key, and the keys are distributed, it seems extremely difficult to
coordinate and synchronise the state. Is there a partition-by-key before
mapWithState? If not, what exactly is the execution model?
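For what it's worth, my reading of 1.6 is that mapWithState does shuffle by key before the state update: StateSpec defaults to a HashPartitioner and lets you override it, so each key's state lives in exactly one partition. A sketch (the mapping function and the partition count are illustrative):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.{State, StateSpec}

// Illustrative mapping function: running count per key.
def trackCount(key: String, value: Option[Int], state: State[Long]): Option[(String, Long)] = {
  val sum = state.getOption.getOrElse(0L) + value.getOrElse(0)
  state.update(sum)
  Some((key, sum))
}

val spec = StateSpec.function(trackCount _)
  .partitioner(new HashPartitioner(16)) // state co-located by key; 16 is arbitrary

// pairStream.mapWithState(spec)
```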
Thanks,
Lin
I have a requirement to route a paired DStream through a series of map and flatMap
steps such that entries with the same key go to the same thread within the same
batch. The closest I can come up with is groupByKey().flatMap(_._2), but this kills
throughput by 50%.
When I think about it, groupByKey is more
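A possibly cheaper alternative to the groupByKey approach above (a sketch, not benchmarked): partitionBy only shuffles records to the right partition without materializing per-key buffers, and a HashPartitioner deterministically maps each key to one partition, whose iterator is consumed by a single task per batch:

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

// Concrete types for illustration; 32 partitions is an arbitrary choice.
def routeByKey(pairs: RDD[(String, String)]): RDD[(String, String)] =
  pairs.partitionBy(new HashPartitioner(32))

// Downstream, per-key work can run inside a single thread per partition:
//   routeByKey(pairs).mapPartitions { records => ... }
```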
16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 41
16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 40
16/01/15 00:31:31 INFO storage.BlockManager: Removing RDD 39
From: "Shixiong(Ryan) Zhu"
<shixi...@databricks.com<mailto:shixi...@databricks.com>>
Date: Thursday, January 14, 2016 at 4:13
From: "Shixiong(Ryan) Zhu" <shixi...@databricks.com<mailto:shixi...@databricks.com>>
Date: Thursday, January 14, 2016 at 4:41 PM
To: Lin Zhao <l...@exabeam.com<mailto:l...@exabeam.com>>
Cc: user <user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: Re: Spark Streaming: custom actor receiver losing vast majority of data
Hi,
I'm testing spark streaming with actor receiver. The actor keeps calling
store() to save a pair to Spark.
Once the job is launched, everything looks good on the UI. Millions of events
get through every batch. However, I added logging to the first step and found
that only 20 or 40 events
My job runs fine in yarn-cluster mode, but I have reason to use client mode
instead. However, I'm hitting this error when submitting:
> spark-submit --class com.exabeam.martini.scripts.SparkStreamingTest --master
> yarn --deploy-mode client --executor-memory 90G --num-executors 3
> --executor-cores
I have a need to route the DStream through the streaming pipeline by some key,
such that data with the same key always goes through the same executor.
There doesn't seem to be a way to do manual routing with Spark Streaming. The
closest I can come up with is:
stream.foreachRDD {rdd =>
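One possible completion of that fragment, assuming stream is a DStream of pairs and that "same executor" can be relaxed to "same partition within each batch" (where a given partition runs is ultimately up to the scheduler):

```scala
import org.apache.spark.HashPartitioner

// Assumes stream: DStream[(String, String)]; 32 partitions is arbitrary.
val routed = stream.transform(_.partitionBy(new HashPartitioner(32)))
routed.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    records.foreach { case (key, value) =>
      // per-key processing; one task (thread) per partition per batch
    }
  }
}
```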
cpu:memory ratio.
From: Tathagata Das <t...@databricks.com<mailto:t...@databricks.com>>
Date: Thursday, January 7, 2016 at 1:56 PM
To: Lin Zhao <l...@exabeam.com<mailto:l...@exabeam.com>>
Cc: user <user@spark.apache.org<mailto:user@spark.apache.org>>
Subject: Re:
I tried to build 1.6.0 for YARN and Scala 2.11, but got an error. Any help is
appreciated.
[warn] Strategy 'first' was applied to 2 files
[info] Assembly up to date:
/Users/lin/git/spark/network/yarn/target/scala-2.11/spark-network-yarn-1.6.0-hadoop2.7.1.jar
java.lang.IllegalStateException:
.
From: Steve Loughran <ste...@hortonworks.com>
Sent: Saturday, October 24, 2015 5:15 AM
To: Lin Zhao
Cc: user@spark.apache.org
Subject: Re: "Failed to bind to" error with spark-shell on CDH5 and YARN
On 24 Oct 2015, at 00:46, Lin Zhao <l...@exabeam.com<mailto:l...@ex