Hey Darin,
Record count metrics are coming in Spark 1.3. Can you wait until it is
released, or do you need a solution for older versions of Spark?
Kostas
On Friday, February 27, 2015, Darin McBeath ddmcbe...@yahoo.com.invalid
wrote:
I have a fairly large Spark job where I'm essentially
The partitions parameter to textFile is actually minPartitions, so there
will be at least that level of parallelism. Spark delegates to Hadoop to
create the splits for that file (yes, even for a text file on local disk and
not HDFS). You can take a look at the code in FileInputFormat - but briefly it will
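The split sizing the reply refers to can be sketched in a few lines. This is an illustrative Python reconstruction of the clamping logic in Hadoop's FileInputFormat (goal size = total size / requested splits, clamped between the minimum split size and the block size), not the actual Hadoop source; the default block size and parameter names are assumptions.

```python
# Sketch of how Hadoop's FileInputFormat sizes splits, which determines the
# partition count sc.textFile(path, minPartitions) ends up with.
# Illustrative reconstruction only, not the actual Hadoop source.

def compute_split_size(total_size, num_splits, min_size=1,
                       block_size=128 * 1024 * 1024):
    # goal_size: how big each split would be if we got exactly num_splits
    goal_size = total_size // max(num_splits, 1)
    # Clamp between the configured minimum split size and the block size.
    return max(min_size, min(goal_size, block_size))

def num_splits(total_size, requested_splits, **kw):
    split_size = compute_split_size(total_size, requested_splits, **kw)
    # Ceiling division: every byte must land in some split.
    return (total_size + split_size - 1) // split_size

# A 1 GB file with minPartitions=16 yields 64 MB splits -> 16 partitions.
print(num_splits(1024 ** 3, 16))  # -> 16
```

Note how this matches the "at least that level of parallelism" point: asking for fewer splits than total_size / block_size still yields one split per block, so you can get more partitions than requested, but not fewer.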
Yes, the driver has to be able to accept incoming connections. All the
executors connect back to the driver, sending heartbeats, map statuses, and
metrics. It is critical and I don't know of a way around it. You could look
into using something like the
into using something like the
https://github.com/spark-jobserver/spark-jobserver
Which Spark Job server are you talking about?
On Thu, Feb 5, 2015 at 8:28 PM, Deep Pradhan pradhandeep1...@gmail.com
wrote:
Hi,
Can Spark Job Server be used for profiling Spark jobs?
On Thu, Feb 5, 2015 at 9:03 PM, Deep Pradhan pradhandeep1...@gmail.com
wrote:
I read somewhere about Gatling. Can that be used to profile Spark jobs?
On Fri, Feb 6, 2015 at 10:27 AM, Kostas Sakellis kos...@cloudera.com
wrote:
Yes, there is no way right now to automatically know how many stages a job
will generate. Like Mark said, RDD#toDebugString will give you some info
about the RDD DAG, and from that you can determine, based on the dependency
types (wide vs. narrow), whether there is a stage boundary.
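To make the wide-vs.-narrow point concrete, here is an illustrative sketch, not Spark itself: a rough mapping of common RDD transformations to the dependency type they usually create. Each wide (shuffle) dependency in the lineage that toDebugString prints introduces a stage boundary; real Spark can sometimes avoid the shuffle (e.g. a join over two co-partitioned RDDs), so treat this strictly as a rule of thumb.

```python
# Rough mapping of common RDD transformations to dependency type.
# Each wide (shuffle) dependency introduces a stage boundary.
NARROW = {"map", "filter", "flatMap", "mapPartitions", "union", "coalesce"}
WIDE = {"reduceByKey", "groupByKey", "join", "distinct",
        "repartition", "sortByKey"}

def count_stages(transformations):
    # One initial stage, plus one more per shuffle boundary.
    # (A rule of thumb: Spark can skip shuffles for co-partitioned inputs.)
    return 1 + sum(1 for t in transformations if t in WIDE)

print(count_stages(["map", "filter", "reduceByKey", "map", "sortByKey"]))
# -> 3 (two shuffles: reduceByKey and sortByKey)
```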
On Thu, Feb 5, 2015
Standalone mode does not support talking to a kerberized HDFS. If you want
to talk to a kerberized (secure) HDFS cluster, I suggest you use Spark on
YARN.
On Wed, Feb 4, 2015 at 2:29 AM, Jander g jande...@gmail.com wrote:
Hope someone helps me. Thanks.
On Wed, Feb 4, 2015 at 6:14 PM, Jander g
Kundan,
So I think your configuration here is incorrect. We need to adjust the
memory and the number of executors. So for your case you have:
Cluster setup:
5 nodes
16 GB RAM
8 cores
The number of executors should be the total number of nodes in your cluster
- in your case 5. As for --executor-cores it should
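The reply breaks off here. One common rule of thumb for clusters like this, sketched below as a hedged reconstruction rather than how the original message necessarily continued: one executor per node, one core reserved for the OS and daemons, and memory headroom left for the OS plus the off-heap overhead Spark-on-YARN requests on top of --executor-memory (max(384 MB, 10% of executor memory) in Spark of that era). The helper function and reserve values are my own illustration.

```python
# Hedged rule-of-thumb executor sizing for the 5-node, 16 GB, 8-core
# cluster above. Illustrative only; the reserves are assumptions.

def suggest_executor_config(nodes, ram_gb_per_node, cores_per_node,
                            os_reserve_gb=1, os_reserve_cores=1):
    executors = nodes                          # one executor per node
    cores = cores_per_node - os_reserve_cores  # leave a core for OS/daemons
    usable_gb = ram_gb_per_node - os_reserve_gb
    # Spark-on-YARN adds off-heap overhead of max(384 MB, 10% of executor
    # memory), so request roughly usable / 1.10 for the heap itself.
    executor_mem_gb = int(usable_gb / 1.10)
    return {"--num-executors": executors,
            "--executor-cores": cores,
            "--executor-memory": f"{executor_mem_gb}g"}

print(suggest_executor_config(5, 16, 8))
# -> {'--num-executors': 5, '--executor-cores': 7, '--executor-memory': '13g'}
```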
Hey,
If you are interested in more details there is also a thread about this
issue here:
http://apache-spark-developers-list.1001551.n3.nabble.com/Eliminate-copy-while-sending-data-any-Akka-experts-here-td7127.html
Kostas
On Tue, Sep 9, 2014 at 3:01 PM, jbeynon jbey...@gmail.com wrote:
Thanks