Hi,
I'd like to know where I can find more information about the deprecation of the
actor system in Spark (from 1.4.x). I'm interested in the reasons for this decision.
Cheers
--
You'd need to provide information such as the executor configuration (number of
cores, memory size). You might see less scheduler delay with smaller but more
numerous executors than with the opposite configuration.
--
If you want to properly exploit the 8 nodes of your cluster, you should use ~2
times that number of partitions.
You can specify the number of partitions when calling parallelize, as follows:
JavaRDD<Point> pnts = sc.parallelize(points, 16);
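If you want to double-check the result (a hedged aside; pnts is the RDD from the
line above), you can print the partition count afterwards:
System.out.println(pnts.partitions().size()); // prints 16 here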
--
Hi,
I have several issues related to HDFS that may have different roots. I'm
posting as much information as I can, in the hope of getting your opinion on at
least some of them. Basically, the cases are:
- HDFS classes not found
- Connections with some datanodes seem to be slow
Can you please share your application code?
I suspect that you're not making good use of the cluster by configuring a wrong
number of partitions in your RDDs.
--
Q1: You can change the port number of the master in the file
conf/spark-defaults.conf. I don't know what the impact would be on a Cloudera
distro, though.
Q2: Yes: a Spark worker needs to be present on each node that you want to
make available to the driver.
Q3: You can submit an application
I'm using Hadoop 2.5.2 with Spark 1.4.0, and I can also see in my logs:
15/07/09 06:39:02 DEBUG HadoopRDD: SplitLocationInfo and other new Hadoop
classes are unavailable. Using the older Hadoop location info code.
java.lang.ClassNotFoundException:
Also, it's worth noting that I'm using the prebuilt version for Hadoop 2.4
and higher from the official website.
--
I think the properties that you have in your hdfs-site.xml should go in
core-site.xml (at least the dfs.namenode.name.dir and dfs.datanode.data.dir
ones). I might be wrong here, but that's what I have in my setup.
You should also add hadoop.tmp.dir to your core-site.xml. That might be the
source of your
Hi,
I've been compiling Spark 1.4.0 with SBT, from the source tarball available
on the official website. I cannot run Spark's master, even though I have built
and run several other instances of Spark on the same machine (Spark 1.3,
master branch, prebuilt 1.4, ...).
starting
Can you share your Hadoop configuration files, please?
- etc/hadoop/core-site.xml
- etc/hadoop/hdfs-site.xml
- etc/hadoop/hadoop-env.sh
AFAIK, the following properties should be configured (see the sketch below):
hadoop.tmp.dir, dfs.namenode.name.dir, dfs.datanode.data.dir and
dfs.namenode.checkpoint.dir
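For illustration, a minimal sketch of one common placement of those properties
(the paths are placeholders, not taken from any actual setup). In core-site.xml:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/tmp</value>
</property>
and in hdfs-site.xml:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:///data/hadoop/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data/hadoop/datanode</value>
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>file:///data/hadoop/checkpoint</value>
</property>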
Otherwise, an
Hi there,
I have some traces from my master and some workers where, for some reason,
the ./work directory of an application cannot be created on the workers.
There is also an issue with the master's temp directory creation.
master logs: http://pastebin.com/v3NCzm0u
worker's logs:
You can see the amount of memory consumed by each executor in the web UI (go
to the application page and click on the Executors tab).
Otherwise, for finer-grained monitoring, I can only think of correlating a
system monitoring tool like Ganglia with the event timeline of your job.
--
Is it possible to recreate the views given in the web UI for completed
applications, when rebooting the master, thanks to the log files? I just
tried to change the URL of the form
http://w.x.y.z:8080/history/app-2-0036 by giving the appID, but it
redirected me to the master's
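(Rebuilding those views relies on event logging having been enabled when the
application ran. A hedged sketch of the relevant conf/spark-defaults.conf
entries, with a placeholder log directory:)
spark.eventLog.enabled  true
spark.eventLog.dir      hdfs:///spark-event-logs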
Basically, here's a dump of the SO question I opened
(http://stackoverflow.com/questions/31033724/spark-1-4-0-java-lang-nosuchmethoderror-com-google-common-base-stopwatch-elapse).
I'm using Spark 1.4.0, and when running the Scala SparkPageRank example
I'm wondering if there is a real benefit in splitting my memory in two for
the datanode/workers.
Datanodes and the OS need memory to perform their business. I suppose there
could be a loss of performance if they came to compete for memory with the
worker(s).
Any opinion? :-)
--
You can specify the jars of your application to be included with spark-submit
via the --jars switch.
Otherwise, are you sure that your newly compiled Spark assembly jar is in
assembly/target/scala-2.10/?
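For example (a hedged sketch; the class name, jar names, and master URL are
placeholders):
spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyApp \
  --jars lib/dep1.jar,lib/dep2.jar \
  myapp.jar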
--
For 1)
In standalone mode, you can increase the workers' resource allocation in
their local conf/spark-env.sh with the following variables:
SPARK_WORKER_CORES,
SPARK_WORKER_MEMORY
At application submit time, you can tune the resources allocated to
executors with --executor-cores and
Also, still for 1), in conf/spark-defaults.conf, you can set the following
properties to tune the driver's resources:
spark.driver.cores
spark.driver.memory
Not sure if you can pass them at submit time, but it should be possible.
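To make this concrete, a hedged sketch with placeholder values. In
conf/spark-env.sh on each worker:
SPARK_WORKER_CORES=8
SPARK_WORKER_MEMORY=16g
And in conf/spark-defaults.conf on the driver side:
spark.driver.cores   2
spark.driver.memory  4g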
--
Note that this property is only available on YARN.
--
Actually, this is somewhat confusing, for two reasons:
- First, the option 'spark.executor.instances', which seems to be handled
only in the YARN case in the source code of SparkSubmit.scala, is also
present in the conf/spark-env.sh file under the standalone section, which
would indicate that
You should try, from the SparkConf object, to issue a get.
I don't have the exact name of the matching key, but from reading the code
in SparkSubmit.scala, it should be something like:
conf.get("spark.executor.instances")
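A hedged, self-contained Java sketch of the same idea (the key name comes
from the post above; "not set" is just a placeholder default):
import org.apache.spark.SparkConf;

public class CheckExecutors {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf();
    // get(key, defaultValue) avoids a NoSuchElementException when the key is unset
    System.out.println(conf.get("spark.executor.instances", "not set"));
  }
}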
--
If I read the code correctly, in RDD.scala, each RDD keeps track of its own
dependencies (from Dependency.scala), and has methods to access its
ancestors' dependencies, thus being able to recompute the lineage (see
getNarrowAncestors() or getDependencies() in some RDDs like UnionRDD).
So it
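As an aside, one easy way to see that lineage from user code (a hedged
sketch; rdd is any RDD you have at hand):
// toDebugString() renders the RDD's lineage, one ancestor per line
System.out.println(rdd.toDebugString());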