This happens most probably because the Spark 1.3 you have downloaded
is built against an older version of the Hadoop libraries than those
used by CDH, and those libraries cannot parse the container IDs
generated by CDH.
You can try to work around this by manually adding CDH jars to the
front of
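A sketch of what that (truncated) classpath override could look like; the parcel path is an assumption about a typical CDH layout, and the class and jar names are placeholders:
spark-submit --master yarn-client \
  --driver-class-path '/opt/cloudera/parcels/CDH/jars/*' \
  --conf spark.executor.extraClassPath='/opt/cloudera/parcels/CDH/jars/*' \
  --class com.example.MyApp myapp.jar
Both settings prepend entries to the respective JVM's classpath, which is what puts the CDH Hadoop classes in front of the ones bundled in the downloaded Spark tarball.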
Instead of opening a tunnel to the Spark web ui port, could you open a
tunnel to the YARN RM web ui instead? That should allow you to
navigate to the Spark application's web ui through the RM proxy, and
hopefully that will work better.
On Fri, Feb 6, 2015 at 9:08 PM, yangqch
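A sketch of the suggested tunnel, assuming the RM web UI is on its default port 8088; the hostnames are placeholders:
ssh -L 8088:resourcemanager.internal.example.com:8088 user@gateway.example.com
Then browse http://localhost:8088 and follow the application's ApplicationMaster link; the RM proxy forwards it to the Spark UI running inside the cluster.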
IIRC you have to set that configuration on the Worker processes (for
standalone). The app can't override it (only for a client-mode
driver). YARN has a similar configuration, but I don't know the name
(shouldn't be hard to find, though).
On Thu, Mar 19, 2015 at 11:56 AM, Davies Liu
Those classes are not part of standard Spark. You may want to contact
Hortonworks directly if they're suggesting you use those.
On Wed, Mar 18, 2015 at 3:30 AM, patcharee patcharee.thong...@uni.no wrote:
Hi,
I am using spark 1.3. I would like to use Spark Job History Server. I added
the
Since you're using YARN, you should be able to download a Spark 1.3.0
tarball from Spark's website and use spark-submit from that
installation to launch your app against the YARN cluster.
So effectively you would have 1.2.0 and 1.3.0 side-by-side in your cluster.
On Wed, Mar 18, 2015 at 11:09
I assume you're running YARN given the exception.
I don't know if this is covered in the documentation (I took a quick
look at the config document and didn't see references to it), but you
need to configure Spark's external shuffle service as an auxiliary
nodemanager service in your YARN
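For reference, a sketch of that auxiliary-service registration, using the property names documented for Spark's YARN shuffle service (the values assume the MapReduce shuffle aux service is also in use). This goes into yarn-site.xml on every NodeManager, with the spark-<version>-yarn-shuffle jar added to the NodeManager classpath, followed by a NodeManager restart:
yarn.nodemanager.aux-services = mapreduce_shuffle,spark_shuffle
yarn.nodemanager.aux-services.spark_shuffle.class = org.apache.spark.network.yarn.YarnShuffleService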
+ ALT + V for copying
commands in the shell) and that results in closing my shell. In order to
solve this, I was wondering if I could just deactivate the CTRL + C combination
altogether. Any ideas?
// Adamantios
On Fri, Mar 13, 2015 at 7:37 PM, Marcelo Vanzin van...@cloudera.com wrote:
You can type :quit.
On Fri, Mar 13, 2015 at 10:29 AM, Adamantios Corais
adamantios.cor...@gmail.com wrote:
Hi,
I want to change the default key combination that exits the Spark shell
(i.e. CTRL + C) to something else, such as CTRL + H.
Thank you in advance.
// Adamantios
--
Marcelo
On Fri, Mar 6, 2015 at 2:47 PM, nitinkak001 nitinkak...@gmail.com wrote:
I am trying to run a Hive query from Spark using HiveContext. Here is the
code
val conf = new SparkConf().setAppName("HiveSparkIntegrationTest")
conf.set("spark.executor.extraClassPath",
It seems from the excerpt below that your cluster is set up to use the
Yarn ATS, and the code is failing in that path. I think you'll need to
apply the following patch to your Spark sources if you want this to
work:
https://github.com/apache/spark/pull/3938
On Thu, Mar 5, 2015 at 10:04 AM, Todd
I've never tried it, but I'm pretty sure in the very least you want
-Pscala-2.11 (not -D).
On Thu, Mar 5, 2015 at 4:46 PM, Night Wolf nightwolf...@gmail.com wrote:
Hey guys,
Trying to build Spark 1.3 for Scala 2.11.
I'm running with the following Maven command:
-DskipTests -Dscala-2.11
Ah, and you may have to use dev/change-version-to-2.11.sh. (Again,
never tried compiling with scala 2.11.)
On Thu, Mar 5, 2015 at 4:52 PM, Marcelo Vanzin van...@cloudera.com wrote:
I've never tried it, but I'm pretty sure in the very least you want
-Pscala-2.11 (not -D).
On Thu, Mar 5, 2015
On Wed, Mar 4, 2015 at 10:08 AM, Srini Karri skarri@gmail.com wrote:
spark.executor.extraClassPath
D:\\Apache\\spark-1.2.1-bin-hadoop2\\spark-1.2.1-bin-hadoop2.4\\bin\\classes
spark.eventLog.dir
D:/Apache/spark-1.2.1-bin-hadoop2/spark-1.2.1-bin-hadoop2.4/bin/tmp/spark-events
, Mar 4, 2015 at 4:10 PM, Marcelo Vanzin van...@cloudera.com
wrote:
Seems like someone set up m2.mines.com as a mirror in your pom file
or ~/.m2/settings.xml, and it doesn't mirror Spark 1.2 (or does but is
in a messed up state).
On Wed, Mar 4, 2015 at 3:49 PM, kpeng1 kpe...@gmail.com wrote
Seems like someone set up m2.mines.com as a mirror in your pom file
or ~/.m2/settings.xml, and it doesn't mirror Spark 1.2 (or does but is
in a messed up state).
On Wed, Mar 4, 2015 at 3:49 PM, kpeng1 kpe...@gmail.com wrote:
Hi All,
I am currently having problem with the maven dependencies for
Spark applications shown in the RM's UI should have an Application
Master link when they're running. That takes you to the Spark UI for
that application where you can see all the information you're looking
for.
If you're running a history server and add
spark.yarn.historyServer.address to your
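A sketch of the relevant spark-defaults.conf entries; the host name and log directory are placeholders:
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///user/spark/applicationHistory
spark.history.fs.logDirectory    hdfs:///user/spark/applicationHistory
spark.yarn.historyServer.address historyserver.example.com:18080
With those set, the RM UI's link for a finished application should redirect to the history server instead of the usual "application not running" page.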
Weird python errors like this generally mean you have different
versions of python in the nodes of your cluster. Can you check that?
On Tue, Mar 3, 2015 at 4:21 PM, subscripti...@prismalytics.io
subscripti...@prismalytics.io wrote:
Hi Friends:
We noticed the following in 'pyspark' happens when
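A quick sketch of that check, assuming passwordless ssh to the worker nodes (host names are placeholders):
for host in node1 node2 node3; do
  ssh "$host" 'hostname; python -V; which python'
done
If the versions differ, pointing PYSPARK_PYTHON at the same interpreter on every node is one way to line them up.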
?
Thanks a lot for the help
-AJ
On Mon, Mar 2, 2015 at 3:50 PM, Marcelo Vanzin van...@cloudera.com wrote:
What are you calling masternode? In yarn-cluster mode, the driver
is running somewhere in your cluster, not on the machine where you run
spark-submit.
The easiest way to get to the Spark UI
What are you calling masternode? In yarn-cluster mode, the driver
is running somewhere in your cluster, not on the machine where you run
spark-submit.
The easiest way to get to the Spark UI when using Yarn is to use the
Yarn RM's web UI. That will give you a link to the application's UI
.compute.amazonaws.com:9026 shows
me all the applications.
Do I have to do anything for port 8088, or is whatever I am seeing at port
9026 good? Attached is a screenshot.
Thanks
AJ
On Mon, Mar 2, 2015 at 4:24 PM, Marcelo Vanzin van...@cloudera.com wrote:
That's the RM's RPC port, not the web UI port.
--
Kannan
On Thu, Feb 26, 2015 at 6:08 PM, Marcelo Vanzin van...@cloudera.com wrote:
On Thu, Feb 26, 2015 at 5:12 PM, Kannan Rajah kra...@maprtech.com wrote:
Also, I would like to know if there is a localization overhead when we
use
spark.executor.extraClassPath. Again, in the case
(URLClassLoader.java:355)
...
On Feb 25, 2015, at 5:24 PM, Marcelo Vanzin van...@cloudera.com wrote:
Guava is not in Spark. (Well, long version: it's in Spark but it's
relocated to a different package except for some special classes
leaked through the public API.)
If your app needs
On Fri, Feb 27, 2015 at 1:30 PM, Pat Ferrel p...@occamsmachete.com wrote:
@Marcelo do you mean by modifying spark.executor.extraClassPath on all
workers, that didn’t seem to work?
That's an app configuration, not a worker configuration, so if you're
trying to set it on the worker configuration
On Fri, Feb 27, 2015 at 1:42 PM, Pat Ferrel p...@occamsmachete.com wrote:
I changed it in the Spark master conf, which is also the only worker. I added a
path to the jar that has guava in it. Still can’t find the class.
Sorry, I'm still confused about what config you're changing. I'm
suggesting
On Wed, Feb 25, 2015 at 8:42 PM, Jim Kleckner j...@cloudphysics.com wrote:
So, should the userClassPathFirst flag work and there is a bug?
Sorry for jumping in the middle of conversation (and probably missing
some of it), but note that this option applies only to executors. If
you're trying to
SPARK_CLASSPATH is definitely deprecated, but my understanding is that
spark.executor.extraClassPath is not, so maybe the documentation needs
fixing.
I'll let someone who might know otherwise comment, though.
On Thu, Feb 26, 2015 at 2:43 PM, Kannan Rajah kra...@maprtech.com wrote:
Hi Dan,
This is a CDH issue, so I'd recommend using cdh-u...@cloudera.org for
those questions.
This is an issue fixed in recent CM 5.3 updates; if you're not
using CM, or want a workaround, you can manually configure
spark.driver.extraLibraryPath and spark.executor.extraLibraryPath
to
On Thu, Feb 26, 2015 at 5:12 PM, Kannan Rajah kra...@maprtech.com wrote:
Also, I would like to know if there is a localization overhead when we use
spark.executor.extraClassPath. Again, in the case of hbase, these jars would
be typically available on all nodes. So there is no need to localize
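A sketch of the library-path workaround mentioned a few messages up; the native-library directory is an assumption based on a typical parcel layout, and the class and jar names are placeholders:
spark-submit --master yarn-client \
  --conf spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native \
  --conf spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH/lib/hadoop/lib/native \
  --class com.example.MyApp myapp.jar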
Hi Anny,
You could play with creating your own log4j.properties that will write
the output somewhere else (e.g. to some remote mount, or remote
syslog). Sorry, but I don't have an example handy.
Alternatively, if you can use Yarn, it will collect all logs after the
job is finished and make them
You'll need to look at your application's logs. You can use yarn logs
--applicationId [id] to see them.
On Wed, Feb 18, 2015 at 2:39 AM, sachin Singh sachin.sha...@gmail.com wrote:
Hi,
I want to run my spark Job in Hadoop yarn Cluster mode,
I am using below command -
spark-submit --master
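A sketch of the log retrieval suggested above; the application id is a placeholder (spark-submit prints the real one, and it also appears in the RM UI):
yarn logs -applicationId application_1423526691510_0012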
Hello,
On Tue, Feb 17, 2015 at 8:53 PM, dgoldenberg dgoldenberg...@gmail.com wrote:
I've tried setting spark.files.userClassPathFirst to true in SparkConf in my
program, also setting it to true in $SPARK-HOME/conf/spark-defaults.conf as
Is the code in question running on the driver or in some
the func1 and func2 from jars that are
already cached into local nodes?
Thanks,
Yitong
2015-02-09 14:35 GMT-08:00 Marcelo Vanzin van...@cloudera.com:
`func1` and `func2` never get serialized. They must exist on the other
end in the form of a class loaded by the JVM.
What gets serialized
`func1` and `func2` never get serialized. They must exist on the other
end in the form of a class loaded by the JVM.
What gets serialized is an instance of a particular closure (the
argument to your map function). That's a separate class. The
instance of that class that is serialized contains
Hi Corey,
When you run on Yarn, Yarn's libraries are placed in the classpath,
and they have precedence over your app's. So, with Spark 1.2, you'll
get Guava 11 in your classpath (with Spark 1.1 and earlier you'd get
Guava 14 from Spark, so still a problem for you).
Right now, the option Markus
Hi Koert,
On Wed, Feb 4, 2015 at 11:35 AM, Koert Kuipers ko...@tresata.com wrote:
do I understand it correctly that on yarn the custom jars are truly
placed before the yarn and spark jars on classpath? meaning at container
construction time, on the same classloader? that would be great
On Wed, Feb 4, 2015 at 1:12 PM, Koert Kuipers ko...@tresata.com wrote:
about putting stuff on classpath before spark or yarn... yeah you can shoot
yourself in the foot with it, but since the container is isolated it should
be ok, no? we have been using HADOOP_USER_CLASSPATH_FIRST forever with
Hi Corey,
On Wed, Feb 4, 2015 at 12:44 PM, Corey Nolet cjno...@gmail.com wrote:
Another suggestion is to build Spark by yourself.
I'm having trouble seeing what you mean here, Marcelo. Guava is already
shaded to a different package for the 1.2.0 release. It shouldn't be causing
conflicts.
Hi Capitão,
Since you're using CDH, your question is probably more appropriate for
the cdh-u...@cloudera.org list.
The problem you're seeing is most probably an artifact of the way CDH
is currently packaged. You have to add Hive jars manually to your Spark
app's classpath if you want to use the
https://issues.apache.org/jira/browse/SPARK-2356
Take a look through the comments, there are some workarounds listed there.
On Wed, Jan 28, 2015 at 1:40 PM, Wang, Ningjun (LNG-NPV)
ningjun.w...@lexisnexis.com wrote:
Has anybody successfully install and run spark-1.2.0 on windows 2008 R2 or
On Thu, Jan 22, 2015 at 10:21 AM, Sean Owen so...@cloudera.com wrote:
I think a Spark site would have a lot less traffic. One annoyance is
that people can't figure out when to post on SO vs Data Science vs
Cross Validated.
Another is that a lot of the discussions we see on the Spark users
list
Hi Kane,
What's the complete command line you're using to submit the app? Where
do you expect these options to appear?
On Fri, Jan 16, 2015 at 11:12 AM, Kane Kim kane.ist...@gmail.com wrote:
I want to add some java options when submitting application:
--conf
Hi Kane,
Here's the command line you sent me privately:
./spark-1.2.0-bin-hadoop2.4/bin/spark-submit --class
SimpleApp --conf
spark.executor.extraJavaOptions=-XX:+UnlockCommercialFeatures
-XX:+FlightRecorder --master local simpleapp.jar ./test.log
You're running the app in local mode. In that
You're specifying the queue in the spark-submit command line:
--queue thequeue
Are you sure that queue exists?
On Thu, Jan 15, 2015 at 11:23 AM, Manoj Samel manojsamelt...@gmail.com wrote:
Hi,
Setup is as follows
Hadoop Cluster 2.3.0 (CDH5.0)
- Namenode HA
- Resource manager HA
-
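A sketch of double-checking the queue before submitting; the queue name is the placeholder from the thread, and the class and jar names are placeholders too:
mapred queue -list        # or check the Scheduler page of the RM web UI
spark-submit --master yarn-cluster --queue thequeue \
  --class com.example.MyApp myapp.jar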
As the error message says...
On Wed, Jan 14, 2015 at 3:14 PM, freedafeng freedaf...@yahoo.com wrote:
Error: Cluster deploy mode is currently not supported for python
applications.
Use yarn-client instead of yarn-cluster for pyspark apps.
--
Marcelo
Hi Alessandro,
You can look for a log line like this in your driver's output:
15/01/12 10:51:01 INFO storage.DiskBlockManager: Created local
directory at
/data/yarn/nm/usercache/systest/appcache/application_1421081007635_0002/spark-local-20150112105101-4f3d
If you're deploying your application
Short answer: yes.
Take a look at: http://spark.apache.org/docs/latest/running-on-yarn.html
Look for memoryOverhead.
On Mon, Jan 12, 2015 at 2:06 PM, Michael Albert
m_albert...@yahoo.com.invalid wrote:
Greetings!
My executors apparently are being terminated because they are
running beyond
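A sketch of raising that overhead, using the Spark 1.x property name; the values are only illustrative, and the class and jar names are placeholders:
spark-submit --master yarn-cluster \
  --executor-memory 4g \
  --conf spark.yarn.executor.memoryOverhead=768 \
  --class com.example.MyApp myapp.jar
The overhead value is in megabytes and is added on top of --executor-memory when Spark asks YARN for each container.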
Hi Manoj,
As long as you're logged in (i.e. you've run kinit), everything should
just work. You can run klist to make sure you're logged in.
On Thu, Jan 8, 2015 at 3:49 PM, Manoj Samel manojsamelt...@gmail.com wrote:
Hi,
For running spark 1.2 on Hadoop cluster with Kerberos, what spark
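A sketch of that login check; the principal, keytab, class and jar names are placeholders:
kinit -kt /etc/security/keytabs/alice.keytab alice@EXAMPLE.COM
klist
spark-submit --master yarn-cluster --class com.example.MyApp myapp.jar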
I ran this with CDH 5.2 without a problem (sorry don't have 5.3
readily available at the moment):
$ HBASE='/opt/cloudera/parcels/CDH/lib/hbase/*'
$ spark-submit --driver-class-path $HBASE --conf
spark.executor.extraClassPath=$HBASE --master yarn --class
org.apache.spark.examples.HBaseTest
On Thu, Jan 8, 2015 at 4:09 PM, Manoj Samel manojsamelt...@gmail.com wrote:
Some old communication (Oct 14) says Spark is not certified with Kerberos.
Can someone comment on this aspect ?
Spark standalone doesn't support kerberos. Spark running on top of
Yarn works fine with kerberos.
--
On Thu, Jan 8, 2015 at 3:33 PM, freedafeng freedaf...@yahoo.com wrote:
I installed the custom build in standalone mode as normal. The master and slaves
started successfully.
However, I got an error when I ran a job. It seems to me from the error message
that some library was compiled against hadoop1,
Disclaimer: this seems more of a CDH question, I'd suggest sending
these to the CDH mailing list in the future.
CDH 5.2 actually has Spark 1.1. It comes with SparkSQL built-in, but
it does not include the thrift server because of incompatibilities
with the CDH version of Hive. To use Hive
Disclaimer: CDH questions are better handled at cdh-us...@cloudera.org.
But the question I'd like to ask is: why do you need your own Spark
build? What's wrong with CDH's Spark that it doesn't work for you?
On Thu, Jan 8, 2015 at 3:01 PM, freedafeng freedaf...@yahoo.com wrote:
Could anyone come
and the user
that runs Spark in our case is a unix ID called mapr (in the mapr group).
Therefore, it can't read my job event logs as shown above.
Thanks,
Michael
-Original Message-
From: Marcelo Vanzin [mailto:van...@cloudera.com]
Sent: 07 January 2015 18:10
To: England, Michael (IT/UK
Nevermind my last e-mail. HDFS complains about not understanding 3777...
On Thu, Jan 8, 2015 at 9:46 AM, Marcelo Vanzin van...@cloudera.com wrote:
Hmm. Can you set the permissions of /apps/spark/historyserver/logs
to 3777? I'm not sure HDFS respects the group id bit, but it's worth a
try. (BTW
Sorry for the noise; but I just remembered you're actually using MapR
(and not HDFS), so maybe the 3777 trick could work...
On Thu, Jan 8, 2015 at 10:32 AM, Marcelo Vanzin van...@cloudera.com wrote:
Nevermind my last e-mail. HDFS complains about not understanding 3777...
On Thu, Jan 8, 2015
Just to add to Sandy's comment, check your client configuration
(generally in /etc/spark/conf). If you're using CM, you may need to
run the Deploy Client Configuration command on the cluster to update
the configs to match the new version of CDH.
On Thu, Jan 8, 2015 at 11:38 AM, Sandy Ryza
This could be caused by many things, including wrong configuration. Hard
to tell with just the info you provided.
Is there any reason why you want to use your own Spark instead of the
one shipped with CDH? CDH 5.3 has Spark 1.2, so unless you really need
to run Spark 1.1, you should be better off
This particular case shouldn't cause problems since both of those
libraries are java-only (the scala version appended there is just for
helping the build scripts).
But it does look weird, so it would be nice to fix it.
On Wed, Jan 7, 2015 at 12:25 AM, Aniket Bhatnagar
aniket.bhatna...@gmail.com
The Spark code generates the log directory with 770 permissions. On
top of that you need to make sure of two things:
- all directories up to /apps/spark/historyserver/logs/ are readable
by the user running the history server
- the user running the history server belongs to the group that owns
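A sketch of verifying and fixing both points, assuming the paths from this thread and a hypothetical "spark" group for the history server user:
hadoop fs -ls -d /apps /apps/spark /apps/spark/historyserver
hadoop fs -chgrp -R spark /apps/spark/historyserver/logs
hadoop fs -chmod 770 /apps/spark/historyserver/logs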
Spark doesn't really shade akka; it pulls a different build (kept
under the org.spark-project.akka group and, I assume, with some
build-time differences from upstream akka?), but all classes are still
in the original location.
The upgrade is a little more unfortunate than just changing akka,
If you don't specify your own log4j.properties, Spark will load the
default one (from
core/src/main/resources/org/apache/spark/log4j-defaults.properties,
which ends up being packaged with the Spark assembly).
You can easily override the config file if you want to, though; check
the Debugging
On Fri, Dec 19, 2014 at 4:05 PM, Haopu Wang hw...@qilinsoft.com wrote:
My application doesn’t depend on hadoop-client directly.
It only depends on spark-core_2.10 which depends on hadoop-client 1.0.4.
This can be checked by Maven repository at
How many cores / memory do you have available per NodeManager, and how
many cores / memory are you requesting for your job?
Remember that in Yarn mode, Spark launches num executors + 1
containers. The extra container, by default, reserves 1 core and about
1g of memory (more if running in cluster
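A worked example of that accounting (the numbers are illustrative, and the exact figures also depend on the configured memory overhead; class and jar names are placeholders):
spark-submit --master yarn-client \
  --num-executors 4 --executor-cores 2 --executor-memory 4g \
  --class com.example.MyApp myapp.jar
This asks YARN for 4 + 1 = 5 containers: roughly 4*2 + 1 = 9 vcores and about 4*(4g + overhead) + 1g of memory, all of which has to fit into what the NodeManagers advertise.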
Hi Anton,
That could solve some of the issues (I've played with that a little
bit). But there are still some areas where this would be sub-optimal,
because Spark still uses system properties in some places and those
are global, not per-class loader.
(SparkSubmit is the biggest offender here, but
it as a public API, but mostly for internal Hive use.
It can give you a few ideas, though. Also, SPARK-3215.
On Thu, Dec 11, 2014 at 5:41 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hi Manoj,
I'm not aware of any public projects that do something like that,
except for the Ooyala server which you say
Hi Manoj,
I'm not aware of any public projects that do something like that,
except for the Ooyala server which you say doesn't cover your needs.
We've been playing with something like that inside Hive, though:
On Thu, Dec 11, 2014 at 5:33 PM, Manoj Samel manojsamelt...@gmail.com wrote:
Hi,
Hello,
What do you mean by app that uses 2 cores and 8G of RAM?
Spark apps generally involve multiple processes. The command line
options you used affect only one of them (the driver). You may want to
take a look at similar configuration for executors. Also, check the
documentation:
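A sketch of the distinction; class and jar names are placeholders:
spark-submit --master yarn-client \
  --driver-memory 4g \
  --num-executors 4 --executor-cores 2 --executor-memory 8g \
  --class com.example.MyApp myapp.jar
--driver-memory only sizes the driver JVM; the executors doing the actual work are sized by the executor options (or the matching spark.executor.* properties).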
Hello,
In CDH 5.2 you need to manually add Hive classes to the classpath of
your Spark job if you want to use the Hive integration. Also, be aware
that since Spark 1.1 doesn't really support the version of Hive
shipped with CDH 5.2, this combination is to be considered extremely
experimental.
On
wrote:
Thank you, Marcelo and Sean, mvn install is a good answer for my demands.
-Original Message-
From: Marcelo Vanzin [mailto:van...@cloudera.com]
Sent: 21 November 2014 1:47
To: yiming zhang
Cc: Sean Owen; user@spark.apache.org
Subject: Re: How to incrementally compile spark examples using mvn
Hi
and got weird
errors because some toy version I once built was stuck in my local maven
repo and it somehow got priority over a real maven repo).
On Fri, Dec 5, 2014 at 5:28 PM, Marcelo Vanzin van...@cloudera.com
wrote:
You can set SPARK_PREPEND_CLASSES=1 and it should pick your new mllib
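A sketch of that workflow from a Spark source checkout (the module name assumes the Scala 2.10 artifacts):
mvn -pl :spark-mllib_2.10 -DskipTests compile
export SPARK_PREPEND_CLASSES=1
./bin/spark-shell
With the variable set, the freshly compiled classes are placed in front of the assembly jar on the classpath.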
On Tue, Dec 2, 2014 at 11:22 AM, Judy Nash
judyn...@exchange.microsoft.com wrote:
Any suggestion on how a user with a custom Hadoop jar can solve this issue?
You'll need to include all the dependencies for that custom Hadoop jar
to the classpath. Those will include Guava (which is not included in
Hello,
On Mon, Nov 24, 2014 at 12:07 PM, aecc alessandroa...@gmail.com wrote:
This is the stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task not
serializable: java.io.NotSerializableException: $iwC$$iwC$$iwC$$iwC$AAA
- field (class
On Mon, Nov 24, 2014 at 1:56 PM, aecc alessandroa...@gmail.com wrote:
I checked sqlContext, they use it in the same way I would like to use my
class, they make the class Serializable with transient. Does this affects
somehow the whole pipeline of data moving? I mean, will I get performance
That's an interesting question for which I do not know the answer.
Probably a question for someone with more knowledge of the internals
of the shell interpreter...
On Mon, Nov 24, 2014 at 2:19 PM, aecc alessandroa...@gmail.com wrote:
Ok, great, I'm gonna do it that way, thanks :). However I
Hello,
What exactly are you trying to see? Workers don't generate any events
that would be logged by enabling that config option. Workers generate
logs, and those are captured and saved to disk by the cluster manager,
generally, without you having to do anything.
On Mon, Nov 24, 2014 at 7:46 PM,
Hi Yiming,
On Wed, Nov 19, 2014 at 5:35 PM, Yiming (John) Zhang sdi...@gmail.com wrote:
Thank you for your reply. I was wondering whether there is a method of
reusing locally-built components without installing them? That is, if I have
successfully built the spark project as a whole, how
Check the --files argument in the output spark-submit -h.
On Thu, Nov 20, 2014 at 7:51 AM, Matt Narrell matt.narr...@gmail.com wrote:
How do I configure the files to be uploaded to YARN containers? So far, I’ve
only seen --conf spark.yarn.jar=hdfs://… which allows me to specify the
HDFS
Hi Tobias,
With the current Yarn code, packaging the configuration in your app's
jar and adding the -Dlog4j.configuration=log4jConf.xml argument to
the extraJavaOptions configs should work.
That's not the recommended way for get it to work, though, since this
behavior may change in the future.
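A sketch of that approach, assuming log4jConf.xml sits at the root of the application jar (class and jar names are placeholders):
spark-submit --master yarn-cluster \
  --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=log4jConf.xml \
  --conf spark.executor.extraJavaOptions=-Dlog4j.configuration=log4jConf.xml \
  --class com.example.MyApp myapp.jar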
Hi Anson,
We've seen this error when incompatible classes are used in the driver
and executors (e.g., same class name, but the classes are different
and thus the serialized data is different). This can happen for
example if you're including some 3rd party libraries in your app's
jar, or changing
are using
CDH's version of Spark, not trying to run an Apache Spark release on
top of CDH, right? (If that's the case, then we could probably move
this conversation to cdh-us...@cloudera.org, since it would be
CDH-specific.)
On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin van...@cloudera.com wrote
Can you check in your RM's web UI how much of each resource does Yarn
think you have available? You can also check that in the Yarn
configuration directly.
Perhaps it's not configured to use all of the available resources. (If
it was set up with Cloudera Manager, CM will reserve some room for
I haven't tried scala:cc, but you can ask maven to just build a
particular sub-project. For example:
mvn -pl :spark-examples_2.10 compile
On Sat, Nov 15, 2014 at 5:31 PM, Yiming (John) Zhang sdi...@gmail.com wrote:
Hi,
I have already successfully compiled and run the spark examples. My problem
Hello,
CDH 5.1.3 ships with a version of Hive that's not entirely the same as
the Hive Spark 1.1 supports. So when building your custom Spark, you
should make sure you change all the dependency versions to point to
the CDH versions.
IIRC Spark depends on org.spark-project.hive:0.12.0, you'd have
On Mon, Oct 27, 2014 at 7:37 PM, buring qyqb...@gmail.com wrote:
Here is the error log, abstracted as follows:
INFO [binaryTest---main]: before first
WARN [org.apache.spark.scheduler.TaskSetManager---Result resolver
thread-0]: Lost task 0.0 in stage 0.0 (TID 0, spark-dev136):
Actually, if you don't call SparkContext.stop(), the event log
information that is used by the history server will be incomplete, and
your application will never show up in the history server's UI.
If you don't use that functionality, then you're probably ok not
calling it as long as your
resource or 2)
add dynamic resource management for Yarn mode is very much wanted.
Jianshi
On Thu, Oct 23, 2014 at 5:36 AM, Marcelo Vanzin van...@cloudera.com wrote:
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:
That's not something you might want to do usually
Hello there,
This is more of a question for the cdh-users list, but in any case...
In CDH 5.1 we skipped packaging of the Hive module in SparkSQL. That
has been fixed in CDH 5.2, so if it's possible for you I'd recommend
upgrading.
On Thu, Oct 23, 2014 at 2:53 PM, nitinkak001
On Thu, Oct 23, 2014 at 3:40 PM, ankits ankitso...@gmail.com wrote:
2014-10-23 15:39:50,845 ERROR [] Exception in task 1.0 in stage 1.0 (TID 1)
java.io.IOException: org.apache.thrift.protocol.TProtocolException:
This looks like an exception that's happening on an executor and just
being
Hi Ashwin,
Let me try to answer to the best of my knowledge.
On Wed, Oct 22, 2014 at 11:47 AM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:
Here are my questions :
1. Sharing spark context : How exactly multiple users can share the cluster
using same spark
context ?
That's not
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:
That's not something you might want to do usually. In general, a
SparkContext maps to a user application
My question was basically this. In this page in the official doc, under
Scheduling within an application
On top of what Andrew said, you shouldn't need to manually add the
mllib jar to your jobs; it's already included in the Spark assembly
jar.
On Thu, Oct 16, 2014 at 11:51 PM, eric wong win19...@gmail.com wrote:
Hi,
I am using the comma-separated style to submit multiple jar files in the
follow
Hi Philip,
The assemblies are part of the CDH distribution. You can get them here:
http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-5-2-0.html
As of Spark 1.1 (and, thus, CDH 5.2), assemblies are not published to
maven repositories anymore (you can see commit [1] for details).
[1]
Hi Anurag,
Spark SQL (from the Spark standard distribution / sources) currently
requires Hive 0.12; as you mention, CDH4 has Hive 0.10, so that's not
gonna work.
CDH 5.2 ships with Spark 1.1.0 and is modified so that Spark SQL can
talk to the Hive 0.13.1 that is also bundled with CDH, so if
Hi Greg,
I'm not sure exactly what it is that you're trying to achieve, but I'm
pretty sure those variables are not supposed to be set by users. You
should take a look at the documentation for
spark.driver.extraClassPath and spark.driver.extraLibraryPath, and
the equivalent options for executors.
Hi Eric,
Check the Debugging Your Application section at:
http://spark.apache.org/docs/latest/running-on-yarn.html
Long story short: upload your log4j.properties using the --files
argument of spark-submit.
(Mental note: we could make the log level configurable via a system property...)
On
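The --files route described in that section looks roughly like this (the local path, class and jar names are placeholders); the uploaded file lands in each container's working directory, where log4j picks it up:
spark-submit --master yarn-cluster \
  --files /local/path/log4j.properties \
  --class com.example.MyApp myapp.jar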
You may want to take a look at this PR:
https://github.com/apache/spark/pull/1558
Long story short: while not a terrible idea to show running
applications, your particular case should be solved differently.
Applications are responsible for calling SparkContext.stop() at the
end of their run,
You can't set up the driver memory programatically in client mode. In
that mode, the same JVM is running the driver, so you can't modify
command line options anymore when initializing the SparkContext.
(And you can't really start cluster mode apps that way, so the only
way to set this is through
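A sketch of setting it outside the program, which is the route the reply points at (the values are illustrative, and the class and jar names are placeholders):
spark-submit --master yarn-client --driver-memory 4g \
  --class com.example.MyApp myapp.jar
Or, in conf/spark-defaults.conf:
spark.driver.memory 4g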
in a
few different contexts, but I don't think there's an official
solution yet.)
On Wed, Oct 1, 2014 at 9:59 AM, Tamas Jambor jambo...@gmail.com wrote:
thanks Marcelo.
What's the reason it is not possible in cluster mode, either?
On Wed, Oct 1, 2014 at 5:42 PM, Marcelo Vanzin van...@cloudera.com
No, you can't instantiate a SparkContext to start apps in cluster mode.
For Yarn, for example, you'd have to call directly into
org.apache.spark.deploy.yarn.Client; that class will tell the Yarn
cluster to launch the driver for you and then instantiate the
SparkContext.
On Wed, Oct 1, 2014 at