Hi Francis,
This might be a long shot, but do you happen to have built spark on an
encrypted home dir?
(I was running into the same error when I was doing that. Rebuilding
on an unencrypted disk fixed the issue. This is a known issue /
limitation with ecryptfs. It's weird that the build doesn't
Hi Ian,
When you run your packaged application, are you adding its jar file to
the SparkContext (by calling the addJar() method)?
That will distribute the code to all the worker nodes. The failure
you're seeing seems to indicate the worker nodes do not have access to
your code.
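Not the original poster's code, just a minimal sketch of the suggestion (the jar path and app name are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: register the application's jar with the SparkContext so Spark
// ships it to every worker node. Path and app name are placeholders.
val conf = new SparkConf().setAppName("my-app")
val sc = new SparkContext(conf)
sc.addJar("/path/to/my-app-assembly.jar")
```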
On Mon, Apr 14,
Hi Joe,
If you cache rdd1 but not rdd2, any time you need rdd2's result, it
will have to be computed. It will use rdd1's cached data, but it will
have to compute its result again.
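A sketch of the difference (names and the transform are hypothetical):

```scala
// Sketch: rdd1 is cached, rdd2 is not. Every action on rdd2 re-runs
// rdd2's map step, but reads rdd1's blocks from the cache instead of
// recomputing them from source.
val rdd1 = sc.textFile("input.txt").cache()
val rdd2 = rdd1.map(expensiveTransform)   // not cached

rdd2.count()  // computes rdd2, using cached rdd1
rdd2.count()  // computes rdd2 again; only rdd1 comes from the cache
```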
On Mon, Apr 14, 2014 at 5:32 AM, Joe L selme...@yahoo.com wrote:
Hi I am trying to cache 2Gbyte data and to
.)
Thanks,
Ian
On Mon, Apr 14, 2014 at 12:45 PM, Marcelo Vanzin van...@cloudera.com
wrote:
Hi Ian,
When you run your packaged application, are you adding its jar file to
the SparkContext (by calling the addJar() method)?
That will distribute the code to all the worker nodes
Hi Sung,
On Fri, Apr 18, 2014 at 5:11 PM, Sung Hwan Chung
coded...@cs.stanford.edu wrote:
while (true) {
  rdd.map((row: Array[Double]) => {
    row(numCols - 1) = computeSomething(row)
  }).reduce(...)
}
If it fails at some point, I'd imagine that the intermediate info being
stored in
Hi Sung,
On Mon, Apr 21, 2014 at 10:52 AM, Sung Hwan Chung
coded...@cs.stanford.edu wrote:
The goal is to keep an intermediate value per row in memory, which would
allow faster subsequent computations. I.e., computeSomething would depend on
the previous value from the previous computation.
I
Hi Joe,
On Mon, Apr 21, 2014 at 11:23 AM, Joe L selme...@yahoo.com wrote:
And, I haven't gotten any answers to my questions.
One thing that might explain that is that, at least for me, all (and I
mean *all*) of your messages are ending up in my GMail spam folder,
complaining that GMail can't
Hi Ken,
On Mon, Apr 21, 2014 at 1:39 PM, Williams, Ken
ken.willi...@windlogics.com wrote:
I haven't figured out how to let the hostname default to the host mentioned
in our /etc/hadoop/conf/hdfs-site.xml like the Hadoop command-line tools do,
but that's not so important.
Try adding
Hi,
One thing you can do is set the spark version your project depends on
to 1.0.0-SNAPSHOT (make sure it matches the version of Spark you're
building); then before building your project, run sbt publishLocal
on the Spark tree.
On Wed, Apr 30, 2014 at 12:11 AM, wxhsdp wxh...@gmail.com wrote:
i
Have you tried making A extend Serializable?
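A sketch of what that suggestion looks like (class and field names are made up):

```scala
// Sketch: make the class used inside the closure serializable so Spark
// can ship its instances to the executors along with the task.
class A(val weight: Double) extends Serializable

val a = new A(0.5)
rdd.map(x => x * a.weight)   // 'a' can now be serialized with the closure
```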
On Thu, May 1, 2014 at 3:47 PM, SK skrishna...@gmail.com wrote:
Hi,
I have the following code structure. It compiles OK, but at runtime it aborts
with the error:
Exception in thread "main" org.apache.spark.SparkException: Job aborted:
Task not
Hi Kristoffer,
You're correct that CDH5 only supports up to Java 7 at the moment. But
Yarn apps do not run in the same JVM as Yarn itself (and I believe MR1
doesn't either), so it might be possible to pass arguments in a way
that tells Yarn to launch the application master / executors with the
Is that true? I believe that API Chanwit is talking about requires
explicitly asking for files to be cached in HDFS.
Spark automatically benefits from the kernel's page cache (i.e. if
some block is in the kernel's page cache, it will be read more
quickly). But the explicit HDFS cache is a
the cache.
Ah, yeah, sure. What I meant is that Spark itself will not, AFAIK, use
that facility for adding files to the cache or anything like that. But
yes, it does benefit from things already cached.
On May 12, 2014, at 11:10 AM, Marcelo Vanzin van...@cloudera.com wrote:
Is that true? I believe
Hi Marcin,
On Wed, May 14, 2014 at 7:22 AM, Marcin Cylke
marcin.cy...@ext.allegro.pl wrote:
- This looks like some problem with HA - but I've checked the namenodes while
the job was running, and there
was no switch between the master and slave namenode.
14/05/14 15:25:44 ERROR
Hey Andrew,
Since we're seeing so many of these e-mails, I think it's worth
pointing out that it's not really obvious to find unsubscription
information for the lists.
The community link on the Spark site
(http://spark.apache.org/community.html) does not have instructions
for unsubscribing; it
On Tue, May 27, 2014 at 1:05 PM, Suman Somasundar
suman.somasun...@oracle.com wrote:
I am running this on a Solaris machine with logical partitions. All the
partitions (workers) access the same Spark folder.
Can you check whether you have multiple versions of the offending
class
Hi Sebastian,
That exception generally means you have the class loaded by two
different class loaders, and some code is trying to mix instances
created by the two different loaded classes.
Do you happen to have that class both in the spark jars and in your
app's uber-jar? That might explain the
Hi Rahul,
I'll just copy paste your question here to aid with context, and
reply afterwards.
-
Can I write the RDD data to an Excel file, along with the mapping, in
apache-spark? Is that a correct way? Won't the write be a local
function that can't be distributed over the cluster?
Below is
Hello there,
On Fri, May 30, 2014 at 9:36 AM, Marcelo Vanzin van...@cloudera.com wrote:
workbook = xlsxwriter.Workbook('output_excel.xlsx')
worksheet = workbook.add_worksheet()
data = sc.textFile("xyz.txt")
# xyz.txt is a file in which each line contains strings delimited by spaces
row=0
def
Hi Jamal,
If what you want is to process lots of files in parallel, the best
approach is probably to load all file names into an array and
parallelize that. Then each task will take a path as input and can
process it however it wants.
Or you could write the file list to a file, and then use
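The first approach could be sketched like this (the helper names are hypothetical):

```scala
// Sketch: parallelize the list of paths; each task then receives one
// path and processes that file however it wants.
// listAllPaths and processFile are hypothetical helpers.
val paths: Seq[String] = listAllPaths()
val results = sc.parallelize(paths, numSlices = 1000)
  .map(path => processFile(path))
```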
)
But instead of just dna.jpeg, let's say I have millions of such files and I
want to run the above logic on all of them.
How should I go about this?
Thanks
On Mon, Jun 2, 2014 at 5:09 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hi Jamal,
If what you want is to process lots of files in parallel
Ah, not that it should matter, but I'm on Linux and you seem to be on
Windows... maybe there is something weird going on with the Windows
launcher?
On Wed, Jun 11, 2014 at 10:34 AM, Marcelo Vanzin van...@cloudera.com wrote:
Just tried this and it worked fine for me:
./bin/spark-shell --jars
The error is saying that your client libraries are older than what
your server is using (2.0.0-mr1-cdh4.6.0 is IPC version 7).
Try double-checking that your build is actually using that version
(e.g., by looking at the hadoop jar files in lib_managed/jars).
On Wed, Jun 11, 2014 at 2:07 AM, bijoy
Coincidentally, I just ran into the same exception. What's probably
happening is that you're specifying some jar file in your job as an
absolute local path (e.g. just
/home/koert/test-assembly-0.1-SNAPSHOT.jar), but your Hadoop config
has the default FS set to HDFS.
So your driver does not know
Hi Koert,
Could you provide more details? Job arguments, log messages, errors, etc.
On Fri, Jun 20, 2014 at 9:40 AM, Koert Kuipers ko...@tresata.com wrote:
i noticed that when i submit a job to yarn it mistakenly tries to upload
files to local filesystem instead of hdfs. what could cause this?
On Fri, Jun 20, 2014 at 8:22 AM, Koert Kuipers ko...@tresata.com wrote:
thanks! i will try that.
i guess what i am most confused about is why the executors are trying to
retrieve the jars directly using the info i provided to add jars to my spark
context. i mean, thats bound to fail no? i
object in Scala is similar to a class with only static fields /
methods in Java. So when you set its fields in the driver, the
object does not get serialized and sent to the executors; they have
their own copy of the class and its static fields, which haven't been
initialized.
Use a proper class,
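The difference can be seen with plain JVM serialization (a sketch; the class names are made up):

```scala
import java.io._

// An `object` is a per-JVM singleton: its mutable fields live only in the
// JVM that set them and are never serialized, so executors see their own
// uninitialized copy. An instance of a Serializable class, by contrast,
// carries its state along when serialized into a closure.
object Settings { var threshold = 0 }            // driver-side mutation stays on the driver
class Config(val threshold: Int) extends Serializable

val out = new ByteArrayOutputStream()
new ObjectOutputStream(out).writeObject(new Config(42))
val copy = new ObjectInputStream(
  new ByteArrayInputStream(out.toByteArray)).readObject().asInstanceOf[Config]
// copy.threshold is 42: the instance's state survived the round trip
```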
Someone might be able to correct me if I'm wrong, but I don't believe
standalone mode supports kerberos. You'd have to use Yarn for that.
On Tue, Jul 8, 2014 at 1:40 AM, 许晓炜 xuxiao...@qiyi.com wrote:
Hi all,
I encounter a strange issue when using spark 1.0 to access hdfs with
Kerberos
I
This is generally a side effect of your executor being killed. For
example, Yarn will do that if you're going over the requested memory
limits.
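If that is what is happening, the usual fix is to ask for more headroom, along these lines (the values are placeholders; the memoryOverhead setting is only available in newer Spark releases):

```shell
# Sketch: raise executor memory and, where supported, the off-heap
# overhead that Yarn accounts for when enforcing container limits.
spark-submit \
  --master yarn \
  --executor-memory 4g \
  --conf spark.yarn.executor.memoryOverhead=768 \
  your-app.jar
```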
On Tue, Jul 8, 2014 at 12:17 PM, Rahul Bhojwani
rahulbhojwani2...@gmail.com wrote:
HI,
I am getting this error. Can anyone help out to explain why is
want I can post
my code here.
Thanks
On Wed, Jul 9, 2014 at 12:50 AM, Marcelo Vanzin van...@cloudera.com wrote:
This is generally a side effect of your executor being killed. For
example, Yarn will do that if you're going over the requested memory
limits.
On Tue, Jul 8, 2014 at 12:17 PM
suggest me how to increase the memory
limits or how to tackle this problem. I am a novice. If you want I can post
my code here.
Thanks
On Wed, Jul 9, 2014 at 12:50 AM, Marcelo Vanzin van...@cloudera.com
wrote:
This is generally a side effect of your executor being killed. For
example
Sorry, that would be sc.stop() (not close).
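A sketch of the pattern:

```scala
// Sketch: stop the context in a finally block so Spark always gets a
// chance to clean up after itself, even if the job throws.
val sc = new SparkContext(conf)
try {
  // ... run the job ...
} finally {
  sc.stop()
}
```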
On Tue, Jul 8, 2014 at 1:31 PM, Marcelo Vanzin van...@cloudera.com wrote:
Hi Rahul,
Can you try calling sc.close() at the end of your program, so Spark
can clean up after itself?
On Tue, Jul 8, 2014 at 12:40 PM, Rahul Bhojwani
rahulbhojwani2
:
java.lang.OutOfMemoryError: Java heap space
at java.io.BufferedOutputStream.&lt;init&gt;(Unknown Source)
at
org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:62)
Can you help in that?
On Wed, Jul 9, 2014 at 2:07 AM, Marcelo Vanzin van...@cloudera.com wrote:
Sorry, that would be sc.stop
That output means you're running in yarn-cluster mode. So your code is
running inside the ApplicationMaster and has no access to the local
terminal.
If you want to see the output:
- try yarn-client mode, then your code will run inside the launcher process
- check the RM web ui and look at the
Have you looked at the slave machine to see if the process has
actually launched? If it has, have you tried peeking into its log
file?
(That error is printed whenever the executors fail to report back to
the driver. Insufficient resources to launch the executor is the most
common cause of that,
On Wed, Jul 16, 2014 at 12:36 PM, Matt Work Coarr
mattcoarr.w...@gmail.com wrote:
Thanks Marcelo, I'm not seeing anything in the logs that clearly explains
what's causing this to break.
One interesting point that we just discovered is that if we run the driver
and the slave (worker) on the
Could you share some code (or pseudo-code)?
Sounds like you're instantiating the JDBC connection in the driver,
and using it inside a closure that would be run in a remote executor.
That means that the connection object would need to be serializable.
If that sounds like what you're doing, it
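The usual fix is to open the connection inside the task rather than on the driver; a sketch (jdbcUrl and insertRow are hypothetical):

```scala
// Sketch: create the JDBC connection per partition, on the executor, so
// no non-serializable Connection object is captured by the closure.
// Only the jdbcUrl String (serializable) crosses the wire.
rdd.foreachPartition { rows =>
  val conn = java.sql.DriverManager.getConnection(jdbcUrl)
  try {
    rows.foreach(row => insertRow(conn, row))
  } finally {
    conn.close()
  }
}
```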
at 1:21 PM, Marcelo Vanzin van...@cloudera.com wrote:
When I meant the executor log, I meant the log of the process launched
by the worker, not the worker. In my CDH-based Spark install, those
end up in /var/run/spark/work.
If you look at your worker log, you'll see it's launching the executor
sharath.abhis...@gmail.com wrote:
Hello Marcelo Vanzin,
Can you explain bit more on this? I tried using client mode but can you
explain how can i use this port to write the log or output to this
port?Thanks in advance!
You can upload your own log4j.properties using spark-submit's
--files argument.
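For example (paths are placeholders):

```shell
# Sketch: ship a custom log4j config to the driver/executor containers.
spark-submit --master yarn \
  --files /local/path/log4j.properties \
  your-app.jar
```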
On Tue, Jul 22, 2014 at 12:45 PM, abhiguruvayya
sharath.abhis...@gmail.com wrote:
I fixed the error with the yarn-client mode issue which i mentioned in my
earlier post. Now i want to edit the log4j.properties to
The spark log classes are based on the actual class names. So if you
want to filter out a package's logs you need to specify the full
package name (e.g. org.apache.spark.storage instead of just
spark.storage).
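For example, in log4j.properties (a sketch):

```properties
# Works: the full package name matches Spark's logger hierarchy
log4j.logger.org.apache.spark.storage=WARN
# Does not match any Spark logger:
# log4j.logger.spark.storage=WARN
```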
On Tue, Jul 22, 2014 at 2:07 PM, abhiguruvayya
sharath.abhis...@gmail.com wrote:
Discussions about how CDH packages Spark aside, you should be using
the spark-class script (assuming you're still in 0.9) instead of
executing Java directly. That will make sure that the environment
needed to run Spark apps is set up correctly.
CDH 5.1 ships with Spark 1.0.0, so it has
Hello,
Try something like this:
scala&gt; def newFoo[T]()(implicit ct: ClassTag[T]): T =
     |   ct.runtimeClass.newInstance().asInstanceOf[T]
newFoo: [T]()(implicit ct: scala.reflect.ClassTag[T])T
scala&gt; newFoo[String]()
res2: String = ""
scala&gt; newFoo[java.util.ArrayList[String]]()
res5:
There are two problems that might be happening:
- You're requesting more resources than the master has available, so
your executors are not starting. Given your explanation this doesn't
seem to be the case.
- The executors are starting, but are having problems connecting back
to the driver. In
Can you try with -Pyarn instead of -Pyarn-alpha?
I'm pretty sure CDH4 ships with the newer Yarn API.
On Thu, Aug 7, 2014 at 8:11 AM, linkpatrickliu linkpatrick...@live.com wrote:
Hi,
Following the document:
# Cloudera CDH 4.2.0
mvn -Pyarn-alpha -Dhadoop.version=2.0.0-cdh4.2.0 -DskipTests
that ~4.2 is enough
like YARN alpha, which is supported as a one-off as I understand, to
work.
All bets are off before YARN stable really, in my book.
On Thu, Aug 7, 2014 at 6:32 PM, Marcelo Vanzin van...@cloudera.com wrote:
Can you try with -Pyarn instead of -Pyarn-alpha?
I'm pretty sure CDH4
Could you share what's the cluster manager you're using and exactly
where the error shows up (driver or executor)?
A quick look reveals that Standalone and Yarn use different options to
control this, for example. (Maybe that already should be a bug.)
On Mon, Aug 11, 2014 at 12:24 PM, DNoteboom
You could create a copy of the variable inside your Parse class;
that way it would be serialized with the instance you create when
calling map() below.
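A sketch of that workaround (class and field names are made up):

```scala
// Sketch: give the (serializable) Parse class its own copy of the value
// at construction time, so it travels with each instance that is
// serialized into the map() closure.
class Parse(val lookup: Map[String, Int]) extends Serializable {
  def parse(line: String): Int = lookup.getOrElse(line, -1)
}

val parser = new Parse(driverSideLookup)  // copy made on the driver
rdd.map(line => parser.parse(line))
```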
On Tue, Aug 12, 2014 at 10:56 AM, Sunny Khatri sunny.k...@gmail.com wrote:
Are there any other workarounds that could be used to pass in the
Hi, sorry for the delay. Would you have yarn available to test? Given
the discussion in SPARK-2878, this might be a different incarnation of
the same underlying issue.
The option in Yarn is spark.yarn.user.classpath.first
On Mon, Aug 11, 2014 at 1:33 PM, DNoteboom dan...@wibidata.com wrote:
I'm
On Tue, Aug 19, 2014 at 2:34 PM, Arun Ahuja aahuj...@gmail.com wrote:
/opt/cloudera/parcels/CDH/bin/spark-submit \
--master yarn \
--deploy-mode client \
This should be enough.
But when I view the job 4040 page, SparkUI, there is a single executor (just
the driver node) and I see
On Wed, Aug 20, 2014 at 8:54 AM, Matt Narrell matt.narr...@gmail.com wrote:
An “unaccepted” reply to this thread from Dean Chen suggested to build Spark
with a newer version of Hadoop (2.4.1) and this has worked to some extent.
I’m now able to submit jobs (omitting an explicit
Ah, sorry, forgot to talk about the second issue.
On Wed, Aug 20, 2014 at 8:54 AM, Matt Narrell matt.narr...@gmail.com wrote:
However, now the Spark jobs running in the ApplicationMaster on a given node
fails to find the active resourcemanager. Below is a log excerpt from one
of the assigned
Hi,
On Wed, Aug 20, 2014 at 11:59 AM, Matt Narrell matt.narr...@gmail.com wrote:
Specifying the driver-class-path yields behavior like
https://issues.apache.org/jira/browse/SPARK-2420 and
https://issues.apache.org/jira/browse/SPARK-2848 It feels like opening a
can of worms here if I also
My guess is that your test is trying to serialize a closure
referencing connectionInfo; that closure will have a reference to
the test instance, since the instance is needed to execute that
method.
Try to make the connectionInfo method local to the method where it's
needed, or declare it in an
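A sketch of the first suggestion (names are hypothetical):

```scala
// Sketch: copy the value into a local val so the closure captures only
// that value, not the enclosing (non-serializable) test instance.
def runJob(rdd: org.apache.spark.rdd.RDD[String]): Long = {
  val info = connectionInfo   // local copy; `this` is no longer referenced
  rdd.filter(row => row.startsWith(info)).count()
}
```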
That command line you mention in your e-mail doesn't look like
something started by Spark. Spark would start one of
ApplicationMaster, ExecutableRunner or CoarseGrainedSchedulerBackend,
not org.apache.hadoop.mapred.YarnChild.
On Wed, Aug 20, 2014 at 6:56 PM, centerqi hu cente...@gmail.com wrote:
Hi Du,
I don't believe the Guava change has made it to the 1.1 branch. The
Guava doc says hashInt was added in 12.0, so what's probably
happening is that you have and old version of Guava in your classpath
before the Spark jars. (Hadoop ships with Guava 11, so that may be the
source of your
The history server (and other Spark daemons) do not read
spark-defaults.conf. There's a bug open to implement that
(SPARK-2098), and an open PR to fix it, but it's still not in Spark.
On Wed, Sep 3, 2014 at 11:00 AM, Zhanfeng Huo huozhanf...@gmail.com wrote:
Hi,
I have set properties in
local means everything runs in the same process; that means there is
no need for master and worker daemons to start processes.
On Wed, Sep 3, 2014 at 3:12 PM, Ruebenacker, Oliver A
oliver.ruebenac...@altisource.com wrote:
Hello,
If launched with “local” as master, where are master
The only monitoring available is the driver's Web UI, which will
generally be available on port 4040.
On Wed, Sep 3, 2014 at 3:43 PM, Ruebenacker, Oliver A
oliver.ruebenac...@altisource.com wrote:
How can that single process be monitored? Thanks!
-Original Message-
From: Marcelo
On Fri, Sep 5, 2014 at 10:50 AM, Davies Liu dav...@databricks.com wrote:
In daily development, it's common to modify your projects and re-run
the jobs. If using zip or egg to package your code, you need to do
this every time after modification, I think it will be boring.
That's why shell
Hi Davies,
On Fri, Sep 5, 2014 at 1:04 PM, Davies Liu dav...@databricks.com wrote:
In Douban, we use Moose FS[1] instead of HDFS as the distributed file system,
it's POSIX compatible and can be mounted just as NFS.
Sure, if you already have the infrastructure in place, it might be
worthwhile
On Mon, Sep 8, 2014 at 9:35 AM, Dimension Data, LLC.
subscripti...@didata.us wrote:
user$ pyspark [some-options] --driver-java-options
spark.yarn.jar=hdfs://namenode:8020/path/to/spark-assembly-*.jar
This command line does not look correct. spark.yarn.jar is not a JVM
command line option.
On Mon, Sep 8, 2014 at 10:00 AM, Dimension Data, LLC.
subscripti...@didata.us wrote:
user$ export MASTER=local[nn] # Run spark shell on LOCAL CPU threads.
user$ pyspark [someOptions] --driver-java-options -Dspark.*XYZ*.jar='
/usr/lib/spark/assembly/lib/spark-assembly-*.jar'
My question is,
On Mon, Sep 8, 2014 at 11:52 AM, Dimension Data, LLC.
subscripti...@didata.us wrote:
So just to clarify for me: When specifying 'spark.yarn.jar' as I did
above, even if I don't use HDFS to create a
RDD (e.g. do something simple like: 'sc.parallelize(range(100))'), it is
still necessary to
On Mon, Sep 8, 2014 at 3:54 PM, Dimension Data, LLC.
subscripti...@didata.us wrote:
You're probably right about the above because, as seen *below* for
pyspark (but probably for other Spark
applications too), once '-Dspark.master=[yarn-client|yarn-cluster]' is
specified, the app invocation
Yes, that's how file: URLs are interpreted everywhere in Spark. (It's also
explained in the link to the docs I posted earlier.)
The second interpretation below is local: URLs in Spark, but that doesn't
work with Yarn on Spark 1.0 (so it won't work with CDH 5.1 and older
either).
On Mon, Sep 8,
This has all the symptoms of Yarn killing your executors due to them
exceeding their memory limits. Could you check your RM/NM logs to see
if that's the case?
(The error was because of an executor at
domU-12-31-39-0B-F1-D1.compute-1.internal, so you can check that NM's
log file.)
If that's the
Hi,
Yes, this is a problem, and I'm not aware of any simple workarounds
(or complex one for that matter). There are people working to fix
this, you can follow progress here:
https://issues.apache.org/jira/browse/SPARK-1239
On Tue, Sep 9, 2014 at 2:54 PM, jbeynon jbey...@gmail.com wrote:
I'm
You're using hadoopConf, a Configuration object, in your closure.
That type is not serializable.
You can use -Dsun.io.serialization.extendedDebugInfo=true to debug
serialization issues.
On Wed, Sep 10, 2014 at 8:23 AM, Sarath Chandra
sarathchandra.jos...@algofusiontech.com wrote:
Thanks Sean.
On Mon, Sep 8, 2014 at 11:15 PM, Sean Owen so...@cloudera.com wrote:
This structure is not specific to Hadoop, but in theory works in any
JAR file. You can put JARs in JARs and refer to them with Class-Path
entries in META-INF/MANIFEST.MF.
Funny that you mention that, since someone internally
On Wed, Sep 10, 2014 at 3:44 PM, Sean Owen so...@cloudera.com wrote:
What's the Hadoop jar structure in question then? Is it something special
like a WAR file? I confess I had never heard of this so thought this was
about generic JAR stuff.
What I've been told (and Steve's e-mail alludes to)
Hi chinchu,
Where does the code trying to read the file run? Is it running on the
driver or on some executor?
If it's running on the driver, in yarn-cluster mode, the file should
have been copied to the application's work directory before the driver
is started. So hopefully just doing new
You'll need to look at the driver output to have a better idea of
what's going on. You can use yarn logs --applicationId blah after
your app is finished (e.g. by killing it) to look at it.
My guess is that your cluster doesn't have enough resources available
to service the container request
:37 PM, Marcelo Vanzin van...@cloudera.com
wrote:
You'll need to look at the driver output to have a better idea of
what's going on. You can use yarn logs --applicationId blah after
your app is finished (e.g. by killing it) to look at it.
My guess is that your cluster doesn't have enough
, Sep 25, 2014 at 12:04 AM, Marcelo Vanzin van...@cloudera.com
wrote:
You need to use the command line yarn application that I mentioned
(yarn logs). You can't look at the logs through the UI after the app
stops.
On Wed, Sep 24, 2014 at 11:16 AM, Raghuveer Chanda
raghuveer.cha...@gmail.com wrote
Sounds like spark-01 is not resolving correctly on your machine (or
is the wrong address). Can you ping spark-01 and does that reach the
VM where you set up the Spark Master?
On Wed, Sep 24, 2014 at 1:12 PM, danilopds danilob...@gmail.com wrote:
Hello,
I'm learning about Spark Streaming and I'm
Hmmm, you might be suffering from SPARK-1719.
Not sure what the proper workaround is, but it sounds like your native
libs are not in any of the standard lib directories; one workaround
might be to copy them there, or add their location to /etc/ld.so.conf
(I'm assuming Linux).
On Thu, Sep 25,
Then I think it's time for you to look at the Spark Master logs...
On Thu, Sep 25, 2014 at 7:51 AM, danilopds danilob...@gmail.com wrote:
Hi Marcelo,
Yes, I can ping spark-01 and I also include the IP and host in my file
/etc/hosts.
My VM can ping the local machine too.
On Thu, Sep 25, 2014 at 8:55 AM, jamborta jambo...@gmail.com wrote:
I am running spark with the default settings in yarn client mode. For some
reason yarn always allocates three containers to the application (wondering
where it is set?), and only uses two of them.
The default number of
You can pass the HDFS location of those extra jars in the spark-submit
--jars argument. Spark will take care of using Yarn's distributed
cache to make them available to the executors. Note that you may need
to provide the full hdfs URL (not just the path, since that will be
interpreted as a local
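For example (host, port and jar names are placeholders):

```shell
# Sketch: use full hdfs:// URLs so the paths are not taken as local files;
# Spark then distributes the jars via Yarn's distributed cache.
spark-submit --master yarn \
  --jars hdfs://namenode:8020/libs/extra-1.jar,hdfs://namenode:8020/libs/extra-2.jar \
  your-app.jar
```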
Comma separated list of archives to be
extracted into the
working directory of each executor.
On Thu, Sep 25, 2014 at 2:20 PM, Tamas Jambor jambo...@gmail.com wrote:
Thank you.
Where is the number of containers set?
On Thu, Sep 25, 2014 at 7:17 PM, Marcelo Vanzin van
I assume you did those things in all machines, not just on the machine
launching the job?
I've seen that workaround used successfully (well, actually, they
copied the library to /usr/lib or something, but same idea).
On Thu, Sep 25, 2014 at 7:45 PM, taqilabon g945...@gmail.com wrote:
You're
You can't set up the driver memory programmatically in client mode. In
that mode, the same JVM is running the driver, so you can't modify
command line options anymore when initializing the SparkContext.
(And you can't really start cluster mode apps that way, so the only
way to set this is through
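A sketch of the command-line route (the value is a placeholder):

```shell
# Sketch: driver memory must be set before the driver JVM starts, i.e. on
# the spark-submit command line (or in spark-defaults.conf), not via the
# SparkConf inside an already-running driver.
spark-submit --driver-memory 4g your-app.jar
```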
in a
few different contexts, but I don't think there's an official
solution yet.)
On Wed, Oct 1, 2014 at 9:59 AM, Tamas Jambor jambo...@gmail.com wrote:
thanks Marcelo.
What's the reason it is not possible in cluster mode, either?
On Wed, Oct 1, 2014 at 5:42 PM, Marcelo Vanzin van...@cloudera.com
No, you can't instantiate a SparkContext to start apps in cluster mode.
For Yarn, for example, you'd have to call directly into
org.apache.spark.deploy.yarn.Client; that class will tell the Yarn
cluster to launch the driver for you and then instantiate the
SparkContext.
On Wed, Oct 1, 2014 at
You may want to take a look at this PR:
https://github.com/apache/spark/pull/1558
Long story short: while not a terrible idea to show running
applications, your particular case should be solved differently.
Applications are responsible for calling SparkContext.stop() at the
end of their run,
Hi Anurag,
Spark SQL (from the Spark standard distribution / sources) currently
requires Hive 0.12; as you mention, CDH4 has Hive 0.10, so that's not
gonna work.
CDH 5.2 ships with Spark 1.1.0 and is modified so that Spark SQL can
talk to the Hive 0.13.1 that is also bundled with CDH, so if
Hi Greg,
I'm not sure exactly what it is that you're trying to achieve, but I'm
pretty sure those variables are not supposed to be set by users. You
should take a look at the documentation for
spark.driver.extraClassPath and spark.driver.extraLibraryPath, and
the equivalent options for executors.
Hi Eric,
Check the Debugging Your Application section at:
http://spark.apache.org/docs/latest/running-on-yarn.html
Long story short: upload your log4j.properties using the --files
argument of spark-submit.
(Mental note: we could make the log level configurable via a system property...)
On
Hi Philip,
The assemblies are part of the CDH distribution. You can get them here:
http://www.cloudera.com/content/cloudera/en/downloads/cdh/cdh-5-2-0.html
As of Spark 1.1 (and, thus, CDH 5.2), assemblies are not published to
maven repositories anymore (you can see commit [1] for details).
[1]
On top of what Andrew said, you shouldn't need to manually add the
mllib jar to your jobs; it's already included in the Spark assembly
jar.
On Thu, Oct 16, 2014 at 11:51 PM, eric wong win19...@gmail.com wrote:
Hi,
I am using the comma-separated style to submit multiple jar files in the
follow
Hi Ashwin,
Let me try to answer to the best of my knowledge.
On Wed, Oct 22, 2014 at 11:47 AM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:
Here are my questions :
1. Sharing spark context: How exactly can multiple users share the cluster
using the same spark
context?
That's not
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:
That's not something you might want to do usually. In general, a
SparkContext maps to a user application
My question was basically this. In this page in the official doc, under
Scheduling within an application
resource or 2)
add dynamic resource management for Yarn mode is very much wanted.
Jianshi
On Thu, Oct 23, 2014 at 5:36 AM, Marcelo Vanzin van...@cloudera.com wrote:
On Wed, Oct 22, 2014 at 2:17 PM, Ashwin Shankar
ashwinshanka...@gmail.com wrote:
That's not something you might want to do usually
Hello there,
This is more of a question for the cdh-users list, but in any case...
In CDH 5.1 we skipped packaging of the Hive module in SparkSQL. That
has been fixed in CDH 5.2, so if it's possible for you I'd recommend
upgrading.
On Thu, Oct 23, 2014 at 2:53 PM, nitinkak001
On Thu, Oct 23, 2014 at 3:40 PM, ankits ankitso...@gmail.com wrote:
2014-10-23 15:39:50,845 ERROR [] Exception in task 1.0 in stage 1.0 (TID 1)
java.io.IOException: org.apache.thrift.protocol.TProtocolException:
This looks like an exception that's happening on an executor and just
being
Actually, if you don't call SparkContext.stop(), the event log
information that is used by the history server will be incomplete, and
your application will never show up in the history server's UI.
If you don't use that functionality, then you're probably ok not
calling it as long as your
On Mon, Oct 27, 2014 at 7:37 PM, buring qyqb...@gmail.com wrote:
Here is error log,I abstract as follows:
INFO [binaryTest---main]: before first
WARN [org.apache.spark.scheduler.TaskSetManager---Result resolver
thread-0]: Lost task 0.0 in stage 0.0 (TID 0, spark-dev136):
Hello,
CDH 5.1.3 ships with a version of Hive that's not entirely the same as
the Hive Spark 1.1 supports. So when building your custom Spark, you
should make sure you change all the dependency versions to point to
the CDH versions.
IIRC Spark depends on org.spark-project.hive:0.12.0, you'd have
I haven't tried scala:cc, but you can ask maven to just build a
particular sub-project. For example:
mvn -pl :spark-examples_2.10 compile
On Sat, Nov 15, 2014 at 5:31 PM, Yiming (John) Zhang sdi...@gmail.com wrote:
Hi,
I have already successfully compile and run spark examples. My problem