The cluster runs Mesos, and I can see the tasks in the Mesos UI, but most of
them are not doing much. Any hints on interpreting that UI?

On Fri, Nov 14, 2014 at 11:39 AM, Daniel Siegmann <daniel.siegm...@velos.io>
wrote:

> Most of the information you're asking for can be found on the Spark web UI
> (see here <http://spark.apache.org/docs/1.1.0/monitoring.html>). You can
> see which tasks are being processed by which nodes.
>
> If you're using HDFS and your file size is smaller than the HDFS block
> size you will only have one partition (remember, there is exactly one task
> for each partition in a stage). If you want to force it to have more
> partitions, you can call RDD.repartition(numPartitions). Note that this
> will introduce a shuffle you wouldn't otherwise have.
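>
> A minimal sketch of forcing more partitions (the input path and the count
> of 8 are placeholders):
>
>     val lines = sc.textFile("hdfs:///data/words.txt") // 1 partition if the file is smaller than a block
>     val spread = lines.repartition(8)                 // forces 8 partitions, at the cost of a shuffle
>     println(spread.partitions.length)                 // prints 8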
>
> Also make sure your job is allocated more than one core in your cluster
> (you can see this on the web UI).
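>
> For example, one way to request cores is the spark.cores.max setting
> (honored on Mesos and standalone; the value of 8 is a placeholder):
>
>     import org.apache.spark.{SparkConf, SparkContext}
>
>     val conf = new SparkConf()
>       .setAppName("WordCount")
>       .set("spark.cores.max", "8") // cap of 8 cores total across the cluster
>     val sc = new SparkContext(conf)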
>
> On Fri, Nov 14, 2014 at 2:18 PM, Steve Lewis <lordjoe2...@gmail.com>
> wrote:
>
>>  I have instrumented word count to track how many machines the code runs
>> on. I use an accumulator to maintain a Set of MAC addresses. I find that
>> everything is done on a single machine. This is probably optimal for word
>> count but not for the larger problems I am working on.
>> How do I force processing to be split into multiple tasks? How do I
>> access the task and attempt numbers to track which processing happens in
>> which attempt? Also, is using the MAC address a good way to determine
>> which machine is running the code?
>> As far as I can tell, a simple word count is running in one thread on one
>> machine and the remainder of the cluster does nothing.
>> This is consistent with tests where I write to stdout from functions and
>> see little output on most machines in the cluster.
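>>
>> (For reference, a minimal sketch of this instrumentation, using executor
>> hostnames rather than MAC addresses; the input path is a placeholder:)
>>
>>     import scala.collection.mutable
>>     import java.net.InetAddress
>>
>>     // set-valued accumulator collecting the hosts the map tasks ran on
>>     val hosts = sc.accumulableCollection(mutable.HashSet[String]())
>>     sc.textFile("hdfs:///data/words.txt")
>>       .flatMap(_.split("\\s+"))
>>       .map { w => hosts += InetAddress.getLocalHost.getHostName; (w, 1) }
>>       .reduceByKey(_ + _)
>>       .count() // an action forces the tasks to actually run
>>     println(hosts.value.mkString(", ")) // distinct hosts, read on the driver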
>>
>>
>
>
>
> --
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
>
> 54 W 40th St, New York, NY 10018
> E: daniel.siegm...@velos.io W: www.velos.io
>



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
