You can check out the Hadoop credential class in Spark's YARN module. During spark-submit, it will use the config on the classpath.
I wonder, how do you reference your own config?
Hi,
Is it possible to use spark.authenticate with shared secrets in Mesos
client mode? I can get the driver to start up and expect to use
authenticated channels, but when the executors start up the SecurityManager
outputs "authentication: disabled".
Looking through the
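For reference, a minimal sketch of the settings involved (the secret value is a placeholder; whether the Mesos executors ever pick it up this way is exactly the open question):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("AuthCheck")
  .set("spark.authenticate", "true")                    // require authenticated channels
  .set("spark.authenticate.secret", "my-shared-secret") // placeholder; must match on every process
val sc = new SparkContext(conf)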
Increase the cores, as you're trying to run multiple threads.
Sent from Naga iPad
> On Aug 22, 2017, at 3:26 PM, "u...@moosheimer.com"
> wrote:
>
> Since you didn't post any concrete information it's hard to give you any
> advice.
>
> Try to increase the executor memory
I ran the HdfsWordCount example using this command:
spark-submit run-example \
--conf spark.streaming.dynamicAllocation.enabled=true \
--conf spark.executor.instances=0 \
--conf spark.dynamicAllocation.enabled=false \
--conf spark.master=yarn \
--conf spark.submit.deployMode=client \
Since you didn't post any concrete information, it's hard to give you any advice.
Try increasing the executor memory (spark.executor.memory).
If that doesn't help, give all the experts in the community a chance to help you
by adding more details such as the version, logfiles, source, etc.
Mit
I'm trying to understand dynamic allocation in Spark Streaming and Structured
Streaming. It seems that if you set spark.dynamicAllocation.enabled=true, both
frameworks use Spark Core's dynamic allocation algorithm: request executors if the
task backlog reaches a certain size, and remove executors if they
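For reference, a rough sketch of the two sets of switches involved (the streaming-specific keys are undocumented internals, and the values here are illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Streaming (DStream) dynamic allocation scales on batch processing time rather
// than task backlog, and only kicks in when the core mechanism is disabled,
// as in the run-example command above.
val conf = new SparkConf()
  .setAppName("StreamingDynAlloc")
  .set("spark.streaming.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.enabled", "false")            // core, backlog-based algorithm off
  .set("spark.streaming.dynamicAllocation.minExecutors", "1") // assumed bounds, illustrative
  .set("spark.streaming.dynamicAllocation.maxExecutors", "10")
val ssc = new StreamingContext(conf, Seconds(10))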
Hi all,
I am running a Spark Streaming application on an AWS EC2 cluster in standalone
mode. I am using DStreams and Spark 2.0.2.
I do have the setting spark.streaming.stopGracefullyOnShutdown set to true. What is
the right way to stop the streaming application?
Thanks
Any help here will be appreciated.
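Not a definitive answer, but a sketch of the two mechanisms I'm aware of: rely on spark.streaming.stopGracefullyOnShutdown plus a normal SIGTERM to the driver, or call stop() yourself (paths and batch interval are placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("GracefulStop")
  .set("spark.streaming.stopGracefullyOnShutdown", "true") // JVM shutdown hook drains in-flight batches

val ssc = new StreamingContext(conf, Seconds(10))
ssc.textFileStream("/tmp/streaming-input").print()         // placeholder job
ssc.start()
ssc.awaitTermination()

// Alternatively, stop explicitly from your own signal/marker-file handling:
// ssc.stop(stopSparkContext = true, stopGracefully = true)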
Hi Jake,
There is another option: some of the third-party projects in the Spark
database ecosystem have combined Spark with a DBMS in such a way that the
DataFrame API has been extended to include UPDATE operations.
Hello,
I ran into a very weird issue that is easy to reproduce but has had me stuck for
more than a day. I suspect it may be an issue/bug related to the class loader.
Can you help confirm the root cause?
I want to specify a customized set of Hadoop configuration files instead of the ones
on the classpath (we
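For what it's worth, two common ways to inject a custom Hadoop configuration without relying on what is on the classpath; a sketch only, with placeholder values and paths:

import org.apache.hadoop.fs.Path
import org.apache.spark.{SparkConf, SparkContext}

// (1) Any "spark.hadoop.*" property is copied into the Hadoop Configuration
//     that Spark builds.
val conf = new SparkConf()
  .setAppName("CustomHadoopConf")
  .set("spark.hadoop.fs.defaultFS", "hdfs://other-cluster:8020") // placeholder value
val sc = new SparkContext(conf)

// (2) Or add extra resource files to the driver-side Hadoop Configuration directly.
sc.hadoopConfiguration.addResource(new Path("/path/to/custom/core-site.xml")) // placeholder path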
Hi Mich,
Thank you for the explanation, that makes sense, and is helpful for me to
understand the bigger picture between Spark/RDBMS.
Happy to know I’m already following best practice.
Cheers,
Jake
From: Mich Talebzadeh
Date: Monday, August 21, 2017 at 6:44 PM
To:
Kafka RDDs need to start from a specified offset; you really don't
want the executors just starting at whatever offset happened to be
the latest at the time they ran.
If you need a way to figure out the latest offset at the time the
driver starts up, you can always use a consumer to read the offsets
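A sketch of that pattern, assuming the spark-streaming-kafka-0-10 integration and Kafka 0.10.1+ (broker address, topic, and group id are placeholders):

import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

val ssc = new StreamingContext(new SparkConf().setAppName("KafkaFromLatest"), Seconds(10))

val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "broker:9092",                // placeholder
  "key.deserializer"   -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"           -> "my-group")                   // placeholder

// On the driver: ask Kafka for the current end offsets of the topic's partitions.
val consumer = new KafkaConsumer[String, String](kafkaParams.asJava)
val partitions = consumer.partitionsFor("my-topic").asScala
  .map(p => new TopicPartition(p.topic, p.partition))
val latestOffsets = consumer.endOffsets(partitions.asJava).asScala
  .mapValues(Long2long).toMap
consumer.close()

// Pin the stream to those offsets so every executor starts from the same point.
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("my-topic"), kafkaParams, latestOffsets))

stream.map(_.value).print()
ssc.start()
ssc.awaitTermination()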
Hi,
I am trying to read a Hive ORC transactional table through Spark, but I am
getting the following error:
Caused by: java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
at
Jörn,
My question is not about the model type but about Spark's capability to reuse
an already trained ML model when training a new model.
On Tue, Aug 22, 2017 at 1:13 PM, Jörn Franke wrote:
> Is it really required to have one billion samples for just linear
>
Is it really necessary to have one billion samples for just linear regression?
Your model would probably do equally well with far fewer samples. Have you
checked the bias and variance when you use far fewer random samples?
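As a concrete illustration of that suggestion, a sketch of fitting on a small random sample first; this uses the spark.ml DataFrame API with assumed "features"/"label" columns and a placeholder input path, rather than the LabeledPoint data mentioned below:

import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("SampledLR").getOrCreate()
val df = spark.read.parquet("/path/to/training-data") // placeholder; expects "features" and "label" columns

// Fit on ~1% of the rows and compare bias/variance against larger samples
// before committing to the full billion rows.
val sample = df.sample(withReplacement = false, fraction = 0.01, seed = 42L)
val lr = new LinearRegression().setFeaturesCol("features").setLabelCol("label")
val model = lr.fit(sample)
println(s"RMSE on sample: ${model.summary.rootMeanSquaredError}")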
> On 22. Aug 2017, at 12:58, Sea aj wrote:
>
> I have a
I have a large DataFrame of 1 billion rows of type LabeledPoint. I tried to
train a linear regression model on the df, but it failed due to lack of
memory, although I'm using 9 slaves, each with 100 GB of RAM and 16 CPU
cores.
I decided to split my data into multiple chunks and train the model in
Hi,
Thanks a lot, Burak, for the explanation! I appreciate it a lot (and
promise to share the great news far and wide once I get the gist of
the internals myself).
What I am missing is which part of Structured Streaming is responsible for
enforcing the semantics of output modes.
Once defined for a
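To make the question concrete, a small sketch of where an output mode is declared (assumes Spark 2.2+ with its built-in rate test source and a console sink):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.OutputMode

val spark = SparkSession.builder.appName("OutputModes").getOrCreate()

// A streaming aggregation: the result table grows/changes across triggers.
val counts = spark.readStream.format("rate").load()
  .groupBy("value")
  .count()

// Complete re-emits the whole result table each trigger; Update would emit only
// changed rows; Append is rejected for this aggregation without a watermark.
val query = counts.writeStream
  .outputMode(OutputMode.Complete())
  .format("console")
  .start()
query.awaitTermination()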
Hello, Joel.
Have you solved the problem with Java's 32-bit limit on array sizes?
Thanks.
On Wed, Jan 27, 2016 at 2:36 AM, Joel Keller wrote:
> Hello,
>
> I am running RandomForest from MLlib on a data set which has very
> high-dimensional data (~50k dimensions).
>
>
Hi everyone,
I have a huge DataFrame with 1 billion rows, and each row is a nested list.
I want to train some ML models on this df, but due to the huge size I get an
out-of-memory error on one of my nodes when I run the fit function.
Currently, my configuration is:
144 cores, 16 cores for