I remember there was a PR about doing a similar thing (
https://github.com/apache/spark/pull/18406). From my understanding, this
seems like a quite specific requirement; it may require code changes to
support your needs.
Thanks
Saisai
Sergey Zhemzhitsky wrote on Sat, May 4, 2019 at 4:44 PM:
> Hello Spark
We are happy to announce the availability of Spark 2.3.2!
Apache Spark 2.3.2 is a maintenance release, based on the branch-2.3
maintenance branch of Spark. We strongly recommend that all 2.3.x users
upgrade to this stable release.
To download Spark 2.3.2, head over to the download page:
In Spark on YARN, error code 13 means the SparkContext did not initialize
in time. You can check the YARN application log for more information.
BTW, did you just write a plain python script without creating
SparkContext/SparkSession?
Aakash Basu wrote on Fri, Jun 8, 2018 at 4:15 PM:
> Hi,
>
> I'm trying to
t is delayed), which will
lead to unexpected results.
thomas lavocat wrote on Tue, Jun 5, 2018 at 7:48 PM:
>
> On 05/06/2018 13:44, Saisai Shao wrote:
>
> You need to read the code, this is an undocumented configuration.
>
> I'm on it right now, but, Spark is a big piece of software.
>
spark.streaming.concurrentJobs is a driver-side internal configuration; it
controls how many streaming jobs can be submitted concurrently in one
batch. Usually it should not be configured by the user unless you're
familiar with Spark Streaming internals and know the implications of this
No, DStream is built on top of RDDs, so it will not leverage any Spark SQL
features. I think you should use Structured Streaming instead, which is
based on Spark SQL.
Khaled Zaouk wrote on Wed, May 2, 2018 at 4:51 PM:
> Hi,
>
> I have a question regarding the execution engine of
Maybe you can try Livy (http://livy.incubator.apache.org/).
Thanks
Jerry
2018-04-11 15:46 GMT+08:00 杜斌 :
> Hi,
>
> Is there any way to submit some code segment to the existing SparkContext?
> Just like a web backend, send some user code to the Spark to run, but the
> initial
>
> In yarn mode, only two executor are assigned to process the task, since
> one executor can process one task only, they need 6 min in total.
>
This is not true. You should set --executor-cores/--num-executors to
increase task parallelism across executors. To be fair, a Spark
application should
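As a hedged sketch of the suggestion above (the application jar, master,
and numbers are placeholders, not from the original thread), parallelism
can be raised at submit time like this:

```shell
# Hypothetical sketch: ask YARN for 6 executors with 4 cores each,
# so up to 24 tasks can run concurrently across the application.
spark-submit \
  --master yarn \
  --num-executors 6 \
  --executor-cores 4 \
  --executor-memory 4g \
  your-app.jar
```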
pp there would be great.
> Thanks
>
> Jorge Machado
>
>
>
>
>
> On 23 Mar 2018, at 07:38, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
I think you can build your own Accumulo credential provider, similar to the
Hadoop delegation token providers in Spark. Spark already provides an
interface, "ServiceCredentialProvider", for users to plug in customized
credential providers.
Thanks
Jerry
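As a rough sketch of the plug-in mechanism described above (the provider
class name is hypothetical, and the exact services-file path should be
checked against your Spark version), a custom provider is registered
through the Java SPI services file bundled in your jar:

```shell
# Hypothetical sketch: register a custom credential provider via Java SPI.
# com.example.AccumuloCredentialProvider is a placeholder class that would
# implement Spark's ServiceCredentialProvider interface.
mkdir -p src/main/resources/META-INF/services
echo "com.example.AccumuloCredentialProvider" > \
  src/main/resources/META-INF/services/org.apache.spark.deploy.yarn.security.ServiceCredentialProvider
```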
2018-03-23 14:29 GMT+08:00 Jorge Machado
AFAIK, there has been no large-scale testing of Hadoop 3.0 in the
community, so it is not clear whether it is supported or not (or has some
issues). I think in the download page "Pre-Built for Apache Hadoop 2.7 and
later" mostly means that it supports Hadoop 2.7+ (2.8, ...), but not 3.0
(IIUC).
Thanks
Jerry
I guess you're using the Capacity Scheduler with DefaultResourceCalculator,
which doesn't count CPU cores in its resource calculation, so the "1" you
saw is actually meaningless. If you want CPU resources taken into account,
you should choose DominantResourceCalculator.
Thanks
Jerry
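A minimal sketch of that switch (the property lives in
capacity-scheduler.xml; the refresh command assumes a running RM):

```shell
# Sketch: point the Capacity Scheduler at DominantResourceCalculator in
# capacity-scheduler.xml, then refresh the queues so the RM picks it up:
#
#   <property>
#     <name>yarn.scheduler.capacity.resource-calculator</name>
#     <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
#   </property>
yarn rmadmin -refreshQueues
```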
On Sat, Sep 9, 2017 at
I think spark.yarn.am.port is not used any more, so you don't need to
consider it.
If you're running Spark on YARN, the YARN RM port used to submit
applications should also be reachable through the firewall, as well as the
HDFS ports used to upload resources.
Also, on the Spark side, executors will be
You could set "spark.jars.packages" in the `conf` field of the session POST API (
https://github.com/apache/incubator-livy/blob/master/docs/rest-api.md#post-sessions).
This is equivalent to --packages in spark-submit.
BTW, you'd better ask Livy questions on u...@livy.incubator.apache.org.
Thanks
Jerry
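A hedged sketch of that session POST (the Livy host/port and the Maven
coordinates are placeholders, not from the original thread):

```shell
# Hypothetical sketch: create a Livy session that resolves a Maven package,
# the equivalent of spark-submit --packages.
curl -X POST http://livy-host:8998/sessions \
  -H 'Content-Type: application/json' \
  -d '{"kind": "pyspark", "conf": {"spark.jars.packages": "com.example:mylib:1.0.0"}}'
```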
On Thu,
Can you please post the specific problem you met?
Thanks
Jerry
On Sat, Aug 19, 2017 at 1:49 AM, Anshuman Kumar
wrote:
> Hello,
>
> I have recently installed Sparks 2.2.0, and trying to use it for some big
> data processing. Spark is installed on a server that I
Please see the reasoning in this thread (
https://github.com/apache/spark/pull/14340). It would be better to use
Structured Streaming instead.
So I would like to -1 this patch. I think it's been a mistake to support
> dstream in Python -- yes it satisfies a checkbox and Spark could claim
> there's
Spark running with the standalone cluster manager currently doesn't support
accessing secure Hadoop. Basically, the problem is that standalone-mode
Spark doesn't have a facility to distribute delegation tokens.
Currently only Spark on YARN and local mode support secure Hadoop.
Thanks
Jerry
On
Spark currently doesn't support impersonating different users at run time.
Spark's proxy user is application-level, which means that when it is set
through --proxy-user, the whole application runs as that user.
On Thu, May 4, 2017 at 5:13 PM, matd wrote:
> Hi folks,
AFAIK, the off-heap memory setting is not enabled automatically; there are
two configurations that control Tungsten's off-heap memory usage:
1. spark.memory.offHeap.enabled
2. spark.memory.offHeap.size
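As a sketch of how the two settings above fit together (the application jar
and the 2 GB size are placeholders; the size is specified in bytes):

```shell
# Sketch: explicitly enable Tungsten off-heap memory with a 2 GB cap.
spark-submit \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=2147483648 \
  your-app.jar
```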
On Sat, Apr 22, 2017 at 7:44 PM, geoHeil wrote:
> Hi,
filter is not supported. It
> is a bug or expected behavior?
>
> On 14.04.2017 13:22, Saisai Shao wrote:
>
> AFAIK, for the first line, the custom filter should work. But the
> latter is not supported.
>
> On Fri, Apr 14, 2017 at 6:17 PM, Sergey Grigorev <grig
s> *or
> *http://master:6066/v1/submissions/status/driver-20170414025324-
> <http://master:6066/v1/submissions/status/driver-20170414025324-> *return
> successful result. But if I open the spark master web ui then it requests
> username and password.
>
>
Hi,
What specifically are you referring to by "Spark API endpoint"?
Filters only work with the Spark live and history web UIs.
On Fri, Apr 14, 2017 at 5:18 PM, Sergey wrote:
> Hello all,
>
> I've added own spark.ui.filters to enable basic authentication to access to
>
> Caused by: javax.security.auth.login.LoginException: Unable to obtain
> password from user
>
>
> On Fri, Mar 31, 2017 at 9:08 AM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
>> Hi Bill,
>>
>> The exception is from executor side. From the gist you prov
Hi Bill,
The exception is from the executor side. From the gist you provided, it
looks like the issue is that you only configured the Java options on the
driver side; I think you should also configure them on the executor side.
You can refer to here (
It's quite obvious that your HDFS URL is not complete; please look at the
exception: your HDFS URI doesn't have a host or port. Normally that would
be OK if HDFS is your default FS.
I think the problem is that you're running on HDI, where the default FS is
wasb. So a short name without host:port will lead to
IIUC, your scenario is quite like what ReliableKafkaReceiver currently
does. You can only send an ack to the upstream source after the WAL is
persisted; otherwise, because data processing and data receiving are
asynchronous, there's still a chance data could be lost if you send the ack
before the WAL write.
I don't think using ManualClock is the right way to fix your problem in
Spark Streaming.
ManualClock in Spark is mainly used for unit tests, where the time is
advanced manually to make the test work. That usage looks different from
the scenario you mentioned.
Thanks
Jerry
On Tue,
I think they should be. These configurations don't depend on which specific
cluster manager the user chooses.
On Tue, Feb 28, 2017 at 4:42 AM, satishl wrote:
> Are spark.speculation and related settings supported on standalone mode?
>
>
>
> --
> View this message in context:
i <paragp...@gmail.com>
wrote:
> Thanks a lot the information!
>
> Is there any reason why EventLoggingListener ignore this event?
>
> *Thanks,*
>
>
> *Parag*
>
> On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
>
AFAIK, Spark's EventLoggingListener ignores BlockUpdate events, so they
will not be written into the event log; I think that's why you cannot see
such info in the history server.
On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari
wrote:
> Hi,
>
> I am running spark shell in spark
IIUC, Spark doesn't strongly bind to HDFS; it uses a common FileSystem
layer that supports different FS implementations, and HDFS is just one
option. You could also use S3 as a backend FS; from Spark's point of view
the choice of FS implementation is transparent.
On Sun, Feb 12, 2017 at 5:32 PM, ayan
Hi Mich,
1. Each user can create a Livy session (batch or interactive); one
session is backed by one Spark application, and the resource quota is the
same as for a normal Spark application (configured by
spark.executor.cores/memory, etc.), and this will be passed to YARN if
running on YARN. This is
From my understanding, this memory overhead should include
"spark.memory.offHeap.size", which means the off-heap memory size should
not be larger than the overhead memory size when running on YARN.
On Thu, Nov 24, 2016 at 3:01 AM, Koert Kuipers wrote:
> in YarnAllocator i see
You might take a look at this project (https://github.com/vegas-viz/Vegas),
it has Spark integration.
Thanks
Saisai
On Mon, Nov 21, 2016 at 10:23 AM, wenli.o...@alibaba-inc.com <
wenli.o...@alibaba-inc.com> wrote:
> Hi anyone,
>
> is there any easy way for me to do data visualization in spark
n Fri, Oct 21, 2016 at 8:06 AM Li Li <fancye...@gmail.com> wrote:
>
> which log file should I
>
> On Thu, Oct 20, 2016 at 10:02 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
> > Looks like ApplicationMaster is killed by SIGTERM.
> >
> > 16/10/20 18:
Looks like ApplicationMaster is killed by SIGTERM.
16/10/20 18:12:04 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL TERM
16/10/20 18:12:04 INFO yarn.ApplicationMaster: Final app status:
This container may have been killed by the YARN NodeManager or another
process; you'd better check the YARN logs to dig out the cause.
Not sure why your code searches for the Logging class under
org/apache/spark; it should be "org/apache/spark/internal/Logging", and it
changed a long time ago.
On Sun, Oct 16, 2016 at 3:25 AM, Brad Cox wrote:
> I'm experimenting with Spark 2.0.1 for the first time and hitting a
I think security has nothing to do with which API you use, Spark SQL or the
RDD API.
I assume you're running on a YARN cluster (currently the only cluster
manager that supports Kerberos).
First you need to get a Kerberos TGT in your local spark-submit process;
after being authenticated by Kerberos,
dalone?
>
> Why are there 2 ways to get information, REST API and this Sink?
>
>
> Best regards, Vladimir.
>
>
>
>
>
>
> On Mon, Sep 12, 2016 at 3:53 PM, Vladimir Tretyakov <
> vladimir.tretya...@sematext.com> wrote:
>
>> Hello Saisai Shao,
Here is the YARN RM REST API for you to refer to (
http://hadoop.apache.org/docs/r2.7.0/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html).
You can use these APIs to query applications running on YARN.
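As a small sketch against the API linked above (the RM host/port is a
placeholder):

```shell
# Sketch: query the ResourceManager for running Spark applications.
curl -s "http://rm-host:8088/ws/v1/cluster/apps?states=RUNNING&applicationTypes=SPARK"
```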
On Sun, Sep 11, 2016 at 11:25 PM, Jacek Laskowski wrote:
> Hi Vladimir,
>
>
oud.com
>
>
> *From:* Sun Rui <sunrise_...@163.com>
> *Date:* 2016-08-24 22:17
> *To:* Saisai Shao <sai.sai.s...@gmail.com>
> *CC:* tony@tendcloud.com; user <user@spark.apache.org>
> *Subject:* Re: Can we redirect Spark shuffle spill data to HDFS or
>
ty, and also there is additional overhead of network I/O and replica
> of HDFS files.
>
> On Aug 24, 2016, at 21:02, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
Spark shuffle uses the Java File API to create local dirs and read/write
data, so it only works with OS-supported filesystems. It doesn't leverage
the Hadoop FileSystem API, so writing to a Hadoop-compatible FS does not
work.
Also, it is not suitable to write temporary shuffle data into a distributed
FS; this
This looks like the Spark application has run into an abnormal state. From
the stack trace, the driver could not send requests to the AM. Can you
check whether the AM is reachable, and whether there are any other
exceptions besides this one?
From my past tests, Spark's dynamic allocation may run into some corner
The implementations of the RDD API in Python and Scala are slightly
different, so the difference in the RDD lineage you printed is expected.
On Tue, Aug 16, 2016 at 10:58 AM, DEEPAK SHARMA wrote:
> Hi All,
>
>
> Below is the small piece of code in scala and
1. Standalone mode doesn't support accessing kerberized Hadoop, simply
because it lacks a mechanism to distribute delegation tokens via the
cluster manager.
2. For the HBase token fetching failure, I think you have to run kinit to
generate a TGT before starting the Spark application (
I guess you're referring to the Spark assembly uber jar. In Spark 2.0
there's no uber jar; instead there's a jars folder which contains all jars
required at run time. For the end user this is transparent; the way to
submit a Spark application is still the same.
On Wed, Aug 3, 2016 at 4:51 PM,
Using DominantResourceCalculator instead of the default resource calculator
will get you the vcore numbers you expect. Basically, by default YARN does
not honor CPU cores as a resource, so you will always see vcores as 1 no
matter how many cores you set in Spark.
On Wed, Aug 3, 2016 at 12:11 PM,
>
> java.lang.NoClassDefFoundError: spray/json/JsonReader
>
> at
> com.memsql.spark.pushdown.MemSQLPhysicalRDD$.fromAbstractQueryTree(MemSQLPhysicalRDD.scala:95)
>
> at
> com.memsql.spark.pushdown.MemSQLPushdownStrategy.apply(MemSQLPushdownStrategy.scala:49)
>
Some useful information can be found here (
https://issues.apache.org/jira/browse/YARN-1842), though personally I
haven't hit this problem before.
Thanks
Saisai
On Tue, Jul 26, 2016 at 2:21 PM, Yu Wei wrote:
> Hi guys,
>
>
> When I tried to shut down spark application
I think both 6066 and 7077 work. 6066 uses the REST way to submit
applications, while 7077 is the legacy way. From the user's perspective it
should be transparent, with no need to worry about the difference.
- *URL:* spark://hw12100.local:7077
- *REST URL:* spark://hw12100.local:6066
The error stack is thrown from your code:
Caused by: scala.MatchError: [Ljava.lang.String;@68d279ec (of class
[Ljava.lang.String;)
at com.jd.deeplog.LogAggregator$.main(LogAggregator.scala:29)
at com.jd.deeplog.LogAggregator.main(LogAggregator.scala)
I think you should debug
DStream.print() will collect some of the data to the driver and display it;
please see the implementation of DStream.print().
RDD.take() will collect some of the data to the driver.
Normally the behavior should be consistent between cluster and local mode;
please find the root cause of this problem,
Configuring local dirs to point at HDFS does not work. Local dirs are
mainly used for data spill and shuffle data persistence, and HDFS is not
suitable for that. If you hit a capacity problem, you could configure
multiple dirs located on different mounted disks.
On Wed, Jul 6, 2016 at 9:05 AM, Sri
I think you cannot use the SQL client in cluster mode; the same goes for
spark-shell/pyspark, which have a REPL. All of these applications can only
be started in client deploy mode.
On Thu, Jun 30, 2016 at 12:46 PM, Mich Talebzadeh wrote:
> Hi,
>
> When you use spark-shell or for
It means several jars are missing in the YARN container environment. If you
want to submit your application in some way other than spark-submit, you
have to take care of all the environment setup yourself. Since we don't
know the implementation of your Java web service, it is hard to provide
spark.yarn.jar (none) The location of the Spark jar file, in case
overriding the default location is desired. By default, Spark on YARN will
use a Spark jar installed locally, but the Spark jar can also be in a
world-readable location on HDFS. This allows YARN to cache it on nodes so
that it
Hi Community,
In Spark 2.0.0 we upgraded to jersey 2 (
https://issues.apache.org/jira/browse/SPARK-12154) instead of jersey 1.9,
while Hadoop as a whole still sticks to the old version. This will
bring in some issues when the YARN timeline service is enabled (
It works fine in my local test. I'm using the latest master; maybe this bug
is already fixed.
On Wed, Jun 1, 2016 at 7:29 AM, Michael Armbrust
wrote:
> Version of Spark? What is the exception?
>
> On Tue, May 31, 2016 at 4:17 PM, Tim Gautier
> wrote:
I think it is already fixed if your problem is exactly the same as what
mentioned in this JIRA (https://issues.apache.org/jira/browse/SPARK-14423).
Thanks
Jerry
On Wed, May 18, 2016 at 2:46 AM, satish saley
wrote:
> Hello,
> I am executing a simple code with
> .mode(SaveMode.Overwrite)
From my understanding, mode is not supported in continuous queries.
def mode(saveMode: SaveMode): DataFrameWriter = {
// mode() is used for non-continuous queries
// outputMode() is used for continuous queries
assertNotStreaming("mode() can only be called on
It is not supported now; currently only the file stream source is supported.
Thanks
Jerry
On Wed, May 18, 2016 at 10:14 AM, Todd wrote:
> Hi,
> I am wondering whether structured streaming supports Kafka as data source.
> I brief the source code(meanly related with DataSourceRegister
, May 10, 2016 at 4:17 PM, 朱旻 <z...@126.com> wrote:
>
>
> It is a product sold by Huawei, named FusionInsight. It says Spark was
> 1.3 with Hadoop 2.7.1.
>
> where can i find the code or config file which define the files to be
> uploaded?
>
>
>
Which version of Spark are you using? From my understanding, there's no
code in yarn#client that uploads "__hadoop_conf__" into the distributed cache.
On Tue, May 10, 2016 at 3:51 PM, 朱旻 wrote:
> hi all:
> I found a problem using spark .
> WHEN I use spark-submit to
same.)
>
> Ideally, it will be distributed evenly across the executors; this is also
> a target for tuning. Normally it depends on several conditions, like
> receiver distribution and partition distribution.
>
> The issue raises if the amount of streaming data does not fit into these 4
> caches
, Ashok Kumar <ashok34...@yahoo.com> wrote:
> hi,
>
> so if i have 10gb of streaming data coming in does it require 10gb of
> memory in each node?
>
> also in that case why do we need using
>
> dstream.cache()
>
> thanks
>
>
> On Monday, 9 May 2016, 9:5
batch calculation ?
>
>
>
>
>
> At 2016-05-09 15:14:47, "Saisai Shao" <sai.sai.s...@gmail.com> wrote:
>
> For window related operators, Spark Streaming will cache the data into
> memory within this window, in your case your window size is up to 24 hours,
> whi
t;https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
> On 9 May 2016 at 08:14, Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
>> For window related operators, Spark Streaming will cache the dat
For window-related operators, Spark Streaming will cache the data in memory
within the window. In your case the window size is up to 24 hours, which
means data has to stay in executor memory for more than a day; this may
introduce several problems when memory is not enough.
On Mon, May 9,
Writing an RDD-based application in PySpark brings in additional overhead:
Spark runs on the JVM whereas your Python code runs in the Python runtime,
so data must be exchanged between the JVM world and the Python world, which
requires additional serialization/deserialization and IPC.
Also
I guess the problem is that py4j automatically translates a Python int into
a Java int or long according to the value of the data. If the value is
small it translates to a Java int; otherwise it translates into a Java
long.
But in the Java code the parameter must be of long type, so that's the
Hi Deepak,
I don't think supervise works with YARN; it is a standalone- and
Mesos-specific feature.
Thanks
Saisai
On Tue, Apr 5, 2016 at 3:23 PM, Deepak Sharma wrote:
> Hi Rafael
> If you are using yarn as the engine , you can always use RM UI to see the
>
spark.jars.ivy, spark.jars.packages, and spark.jars.excludes are the
configurations you can use.
Thanks
Saisai
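A hedged sketch of those settings in use (the coordinates, exclusion, and
application jar are placeholders):

```shell
# Sketch: resolve a dependency from a Maven repo at submit time and
# exclude one of its transitive artifacts.
spark-submit \
  --conf spark.jars.packages=com.example:mylib:1.0.0 \
  --conf spark.jars.excludes=org.slf4j:slf4j-api \
  your-app.jar
```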
On Sun, Apr 3, 2016 at 1:59 AM, Russell Jurney
wrote:
> Thanks, Andy!
>
> On Mon, Mar 28, 2016 at 8:44 AM, Andy Davidson <
> a...@santacruzintegration.com> wrote:
eliminate this.
>
>
> On Fri, Apr 1, 2016, 7:25 PM Saisai Shao <sai.sai.s...@gmail.com> wrote:
>
Hi Michael, shuffle data (mapper output) has to be materialized to disk
eventually, no matter how much memory you have; that is Spark's design. In
your scenario, since you have a lot of memory, shuffle spill should not
happen frequently; most of the disk I/O you see might be the final shuffle
There's a JIRA (https://issues.apache.org/jira/browse/SPARK-14151) about
it, please take a look.
Thanks
Saisai
On Sat, Apr 2, 2016 at 6:48 AM, Walid Lezzar wrote:
> Hi,
>
> I looked into the spark code at how spark report metrics using the
> MetricsSystem class. I've seen
zhitao_yan
> QQ : 4707059
> Address: Room 602, Aviation Service Building, Building 2, No. 39
> Dongzhimenwai Street, Dongcheng District, Beijing
> Postcode: 100027
>
> ----
> TalkingData.com <http://talkingdata.com/> - Let data speak
>
>
> *From:* Saisai Shao <sai.sai.s...@gmail.com>
> *Date:* 2016-03-22 18:03
> *To:* tony@tendcloud.com
> *CC:* user <us
I'm afraid submitting applications through the YARN REST API is currently
not supported by Spark. However, the YARN AMRMClient is functionally
equivalent to the REST API; which specific features are you referring to?
Thanks
Saisai
On Tue, Mar 22, 2016 at 5:27 PM, tony@tendcloud.com <
I guess in local mode you're using the local FS instead of HDFS. Here the
exception is mainly thrown from HDFS when running on YARN, so I think it
would be better to check the status and configuration of HDFS to see
whether it is normal or not.
Thanks
Saisai
On Tue, Mar 22, 2016 at 5:46 PM, Soni spark
If you want to avoid failing existing jobs while restarting the NM, you can
enable work-preserving restart for the NM; in that case the NM restart will
not affect the running containers (they can keep running). That should
alleviate the NM restart problem.
Thanks
Saisai
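A minimal sketch of the relevant yarn-site.xml settings (the recovery
directory path is illustrative, not from the original thread):

```shell
# Sketch: yarn-site.xml settings for NodeManager work-preserving restart.
#
#   <property>
#     <name>yarn.nodemanager.recovery.enabled</name>
#     <value>true</value>
#   </property>
#   <property>
#     <name>yarn.nodemanager.recovery.dir</name>
#     <value>/var/lib/hadoop-yarn/nm-recovery</value>
#   </property>
```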
On Wed, Mar 16, 2016 at 6:30 PM, Alex
You cannot directly invoke a Spark application by using yarn#client as you
mentioned; that is deprecated and not supported. You have to use
spark-submit to submit a Spark application to YARN.
Also, the specific problem here is that you're invoking yarn#client to run
the Spark app in yarn-client
Currently the configuration is part of the checkpoint data, and when
recovering from failure, Spark Streaming fetches the configuration from the
checkpoint data. So even if you change the configuration file, the
recovered Spark Streaming application will not use it. From my
understanding, currently there's
hedExecutorIdleTimeout=60s, "--conf" was lost
> when I copied it to mail.
>
> -- Forwarded message --
> From: Jy Chen <chen.wah...@gmail.com>
> Date: 2016-03-10 10:09 GMT+08:00
> Subject: Re: Dynamic allocation doesn't work on YARN
> To: Saisai Sh
Would you please send out your dynamic allocation configurations so we can
understand the situation better?
On Wed, Mar 9, 2016 at 4:29 PM, Jy Chen wrote:
> Hello everyone:
>
> I'm trying the dynamic allocation in Spark on YARN. I have followed
> configuration steps and started the
I think the first step is to publish your in-house-built Hadoop-related
jars to your local Maven or Ivy repo, and then change the Spark build
profiles like -Phadoop-2.x (you could use 2.7, or you may have to change
the pom file if you hit jar conflicts) -Dhadoop.version=3.0.0-SNAPSHOT to
build
If it is due to a heartbeat problem and the driver explicitly killed the
executors, there should be some driver logs mentioning it, so you could
check the driver log. Container (executor) logs are also useful; if the
container was killed, there will be some signal-related logs, like
You don't have to specify the storage level for the direct Kafka API, since
it doesn't need to store the input data ahead of time. Only the
receiver-based approach can specify a storage level.
Thanks
Saisai
On Wed, Mar 2, 2016 at 7:08 PM, Vinti Maheshwari
wrote:
> Hi All,
You could set the "auto.offset.reset" configuration through the
"kafkaParams" parameter, which is provided in some other overloaded APIs of
createStream.
By default Kafka will pick data from the latest offset unless you
explicitly set it; this is the behavior of Kafka, not Spark.
Thanks
Saisai
On Mon, Feb
IIUC, if for example you want to set the environment variable FOO=bar on
the executor side, you could use "spark.executorEnv.FOO=bar" in the conf
file; the AM will pick up this configuration and set it as an environment
variable when launching the container. Just list all the envs you want to
set on the executor side like
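A hedged sketch of the pattern (FOO and DATA_DIR are placeholder variable
names, and the application jar is illustrative):

```shell
# Sketch: set environment variables on the executor side via
# spark.executorEnv.[Name] entries.
spark-submit \
  --conf spark.executorEnv.FOO=bar \
  --conf spark.executorEnv.DATA_DIR=/tmp/data \
  your-app.jar
```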
Hi Divya,
Would you please provide the full stack of the exception? From my
understanding --executor-cores should work; we could understand better with
the full stack trace.
Performance depends on many different aspects; I'd recommend you check the
Spark web UI to learn about the application
saying creating sparkcontext manually in your application
> still works then I'll investigate more on my side. It just before I dig
> more I wanted to know if it was still supported.
>
> Nir
>
> On Thu, Jan 28, 2016 at 7:47 PM, Saisai Shao <sai.sai.s...@gmail.com>
> wrote:
I think I've met this problem before; it might be due to some race
conditions during the exit period. The way you mentioned is still valid;
this problem only occurs when stopping the application.
Thanks
Saisai
On Fri, Jan 29, 2016 at 10:22 AM, Nirav Patel wrote:
> Hi, we
Hi Todd,
There are two levels of locality-based scheduling when you run Spark on
YARN with dynamic allocation enabled:
1. Container allocation is based on the locality ratio of pending tasks;
this is YARN-specific and only works with dynamic allocation enabled.
2. Task scheduling is locality
Is there any possibility that this file was still being written by another
application, so that what Spark Streaming processed was an incomplete file?
On Tue, Jan 26, 2016 at 5:30 AM, Shixiong(Ryan) Zhu wrote:
> Did you move the file into "hdfs://helmhdfs/user/patcharee/cerdata/", or
> write
You could try increasing the driver memory with "--driver-memory"; it looks
like the OOM came from the driver side, so the simple solution is to
increase the driver's memory.
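A one-line sketch of that suggestion (the 4g value and application jar are
placeholders):

```shell
# Sketch: give the driver more heap when the OOM comes from the driver side.
spark-submit --driver-memory 4g your-app.jar
```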
On Tue, Jan 19, 2016 at 1:15 PM, Julio Antonio Soto wrote:
> Hi,
>
> I'm having trouble when uploadig spark
Stdout will not be sent back to the driver, whether you use Scala or Java.
You must be doing something wrong that makes you think this is expected
behavior.
On Mon, Dec 28, 2015 at 5:33 PM, David John
wrote:
> I have used Spark *1.4* for 6 months. Thanks all the
ark-1.6.0 on one yarn
> cluster?
>
>
>
> *From:* Saisai Shao [mailto:sai.sai.s...@gmail.com]
> *Sent:* Monday, December 28, 2015 2:29 PM
> *To:* Jeff Zhang
> *Cc:* 顾亮亮; user@spark.apache.org; 刘骋昺
> *Subject:* Re: Opening Dynamic Scaling Executors on Yarn
>
>
&g
Replacing all the shuffle jars and restarting the NodeManager is enough; no
need to restart the NN.
On Mon, Dec 28, 2015 at 2:05 PM, Jeff Zhang wrote:
> See
> http://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
>
>
>
> On Mon, Dec 28, 2015 at 2:00 PM,
I think SparkContext is thread-safe; you can concurrently submit jobs from
different threads, so the problem you hit might not be related to this. Can
you reproduce this issue every time you concurrently submit jobs, or does
it happen occasionally?
BTW, I guess you're using an old version of