The cluster runs Mesos, and I can see the tasks in the Mesos UI, but most of them are not doing much. Any hints about reading that UI?
On Fri, Nov 14, 2014 at 11:39 AM, Daniel Siegmann <daniel.siegm...@velos.io> wrote:

> Most of the information you're asking for can be found on the Spark web UI
> (see here <http://spark.apache.org/docs/1.1.0/monitoring.html>). You can
> see which tasks are being processed by which nodes.
>
> If you're using HDFS and your file size is smaller than the HDFS block
> size, you will have only one partition (remember, there is exactly one task
> for each partition in a stage). If you want to force it to have more
> partitions, you can call RDD.repartition(numPartitions). Note that this
> will introduce a shuffle you wouldn't otherwise have.
>
> Also make sure your job is allocated more than one core in your cluster
> (you can see this on the web UI).
>
> On Fri, Nov 14, 2014 at 2:18 PM, Steve Lewis <lordjoe2...@gmail.com> wrote:
>
>> I have instrumented word count to track how many machines the code runs
>> on. I use an accumulator to maintain a Set of MacAddresses. I find that
>> everything is done on a single machine. This is probably optimal for word
>> count, but not for the larger problems I am working on.
>> How do I force processing to be split into multiple tasks? How do I
>> access the task and attempt numbers to track which processing happens in
>> which attempt? Also, is using the MacAddress a reliable way to determine
>> which machine is running the code?
>> As far as I can tell, a simple word count runs in one thread on one
>> machine while the remainder of the cluster does nothing.
>> This is consistent with tests where I write to stdout from functions and
>> see little output on most machines in the cluster.
>
> --
> Daniel Siegmann, Software Developer
> Velos
> Accelerating Machine Learning
>
> 54 W 40th St, New York, NY 10018
> E: daniel.siegm...@velos.io  W: www.velos.io

--
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
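The partition advice in the quoted reply can be illustrated without a cluster. The plain-Python sketch below (not Spark code; names like hash_partition are made up for illustration) mimics how a hash partitioner assigns keyed records to partitions: with numPartitions = 1 every record lands in the same bucket, so only one task has any work, which matches the single-machine behavior described above.

```python
# Plain-Python sketch of hash partitioning (illustrative only, not Spark's
# implementation): a key goes to partition hash(key) % num_partitions.
def hash_partition(records, num_partitions):
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

words = [("the", 1), ("quick", 1), ("brown", 1), ("fox", 1)]

# One partition: every record in the same bucket, so a single task does
# all the work -- the symptom reported in the thread.
single = hash_partition(words, 1)
assert len(single[0]) == 4

# Four partitions: records are spread out, so up to four tasks can run
# in parallel (one task per partition in a stage).
spread = hash_partition(words, 4)
assert sum(len(p) for p in spread) == 4
```

In real Spark, calling RDD.repartition(numPartitions) achieves the spreading shown here, at the cost of a shuffle, as the reply notes.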
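The accumulator-of-machine-identifiers instrumentation described in the quoted message can also be sketched in plain Python (SetAccumulator and task are hypothetical names, not Spark API): each task records its host's name in a shared set, and the driver checks the set's size to see how many machines did work.

```python
# Sketch of the "accumulator of machine identifiers" idea from the thread
# (hypothetical helper, not Spark's Accumulator API).
import socket

class SetAccumulator:
    """Collects distinct values added by tasks."""
    def __init__(self):
        self.value = set()

    def add(self, item):
        self.value.add(item)

acc = SetAccumulator()

def task(record, acc):
    # In Spark this function would run on an executor; here it runs locally.
    acc.add(socket.gethostname())
    return record

for record in range(10):
    task(record, acc)

# All tasks ran on one host, so the set has exactly one entry -- the same
# observation the original poster made with MacAddresses on the cluster.
assert len(acc.value) == 1
```

Note that hostnames (or IP addresses) are a more conventional machine identifier than MAC addresses for this kind of check, since a host can have several network interfaces.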