Plain Java MR, using the Cassandra inputFormat to read out of Cassandra.

Perhaps somebody hacked the inputFormat code on me...

But what's weird is that the parameter mapred.map.tasks didn't appear in
the job confs before at all.  Now it does, with a value of 20 (happens to
be the # of machines in the cluster), and that's without the jobs or the
mapred-site.xml files themselves changing.
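
For what it's worth, here is the kind of check I'm doing to confirm the value really is being injected before submission (just a sketch using the plain Configuration/Job API; nothing here is Cassandra-specific, and the class name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch: load the cluster config from the classpath (core/mapred-site.xml)
// and print whatever a job would see for mapred.map.tasks at submit time.
public class DumpMapTasks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "dump-mapred.map.tasks");
        System.out.println("mapred.map.tasks = "
                + job.getConfiguration().get("mapred.map.tasks", "<not set>"));
    }
}

Run against the same classpath/conf dir the real jobs use, that should show whether the 20 is already there before the InputFormat ever runs.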

The inputSplitSize is set explicitly in the jobs and has not been changed
(though I subsequently fiddled with it a little to see whether it affected
the fact that I was getting 20 splits; it didn't...it only changed the split
size, not the number of splits).
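
In case it matters, the relevant setup in the jobs looks roughly like this (a
sketch only -- the keyspace/column family names and the split size value are
placeholders, and I've left out the rpc address/port/partitioner and
mapper/output settings):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;

// Sketch of the Cassandra input setup; "MyKeyspace"/"MyCF" are placeholders.
public class CassandraJobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "cassandra-read");
        job.setInputFormatClass(ColumnFamilyInputFormat.class);
        ConfigHelper.setInputColumnFamily(job.getConfiguration(), "MyKeyspace", "MyCF");
        // This is the knob I fiddled with: changing it altered the reported
        // split size, but the split count stayed at 20 regardless.
        ConfigHelper.setInputSplitSize(job.getConfiguration(), 64 * 1024);
        // ... connection settings and the rest of the job setup omitted ...
    }
}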

After I submit the job, I get a message "TOTAL NUMBER OF SPLIT = 20" before
the list of input splits...it sort of looks like a hack, but I can't find
where it's coming from.

On Fri, Nov 4, 2011 at 11:58 AM, Harsh J <ha...@cloudera.com> wrote:

> Brendan,
>
> Are these jobs (whose split behavior has changed) via Hive/etc. or plain
> Java MR?
>
> In case it's the former, do you have users using newer versions of them?
>
> On 04-Nov-2011, at 8:03 PM, Brendan W. wrote:
>
> > Hi,
> >
> > On my cluster of 20 machines, I used to run jobs (via "hadoop jar ...")
> > that would spawn around 4000 map tasks.  Now when I run the same jobs,
> > that number is 20; and I notice that in the job configuration, the
> > parameter mapred.map.tasks is set to 20, whereas it never used to be
> > present in the configuration file at all.
> >
> > Changing the input split size in the job doesn't affect this--I get the
> > split size I ask for, but the *number* of input splits is still capped at
> > 20--i.e., the job isn't reading all of my data.
> >
> > The mystery to me is where this parameter could be getting set.  It is
> > not present in the mapred-site.xml file in <hadoop home>/conf on any
> > machine in the cluster, and it is not being set in the job (I'm running
> > out of the same jar I always did; no updates).
> >
> > Is there *anywhere* else this parameter could possibly be getting set?
> > I've stopped and restarted map-reduce on the cluster with no
> > effect...it's getting re-read in from somewhere, but I can't figure out
> > where.
> >
> > Thanks a lot,
> >
> > Brendan
>
>
