Re: multiple input splits from single file

2012-06-10 Thread Karthik Kambatla
Hi Sharat

A couple of questions/comments:

   1. Is your input graph complete?
   2. If it is not complete, it might make sense to use the adjacency lists
   of the graph nodes as the input to each map function (multiple adjacency
   lists per map task).
   3. Even if it is complete, using the adjacency lists, or a partition of
   the edges, as the input for each map task might help.
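The edge-partition idea in point 3 can be sketched in plain Java, with no Hadoop dependency. This is a minimal illustration. It assumes the coordinates have already been parsed into an array of {x, y} pairs. The class name, the round-robin assignment and the `i-j:length` edge encoding are hypothetical choices made for this example, not anything specified in the thread. Each partition becomes one input line, so an input format that gives one line per split would hand each partition to its own map task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: partition the edges of a complete Euclidean graph so
// that each partition can be written as a single input line.
final class EdgePartitionSketch {

    // Euclidean distance between cities i and j; coords[k] = {x, y}.
    static double distance(double[][] coords, int i, int j) {
        double dx = coords[i][0] - coords[j][0];
        double dy = coords[i][1] - coords[j][1];
        return Math.sqrt(dx * dx + dy * dy);
    }

    // Distributes every undirected edge (i < j) round-robin over `parts`
    // partitions; each edge is encoded as "i-j:length" (an assumed format).
    static List<List<String>> partitionEdges(double[][] coords, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < parts; p++) {
            partitions.add(new ArrayList<>());
        }
        int edge = 0;
        for (int i = 0; i < coords.length; i++) {
            for (int j = i + 1; j < coords.length; j++) {
                partitions.get(edge % parts)
                          .add(i + "-" + j + ":" + distance(coords, i, j));
                edge++;
            }
        }
        return partitions;
    }

    // Joins each partition into one tab-separated line (one line per map task).
    static List<String> toInputLines(List<List<String>> partitions) {
        List<String> lines = new ArrayList<>();
        for (List<String> partition : partitions) {
            lines.add(String.join("\t", partition));
        }
        return lines;
    }

    public static void main(String[] args) {
        double[][] cities = { {0, 0}, {3, 4}, {6, 0}, {3, -4} };
        for (String line : toInputLines(partitionEdges(cities, 2))) {
            System.out.println(line);
        }
    }
}
```

Round-robin assignment keeps the partitions within one edge of each other in size. How a mapper would combine a partial edge set into a tour is left open here, just as it is in the suggestion itself.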

Karthik

On Sun, Jun 10, 2012 at 8:02 AM, sharat attupurath wrote:

>  Hi,
>
> We are trying to solve the travelling salesman problem using Hadoop. Our
> input files contain just a single line that holds the Euclidean coordinates
> of the cities. We need to pass this single line to each mapper, which will
> then process it. How can we do this so that we achieve parallelism in a
> Hadoop cluster? Is there any way to generate multiple input splits from
> the single input file?
>
> Thanks
>
> Sharat
>


Re: multiple input splits from single file

2012-06-10 Thread Harsh J
Sharat,

To answer your specific question of:

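> Is there any way to generate multiple input splits from the single input file?

Yes, there is. Use the NLineInputFormat class with N (the number of lines
per split) set to 1. For a single file of K lines (duplicates or not), you
should then get K map tasks. Since your file has only one line, you would
write that line once for every map task you want.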
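The line-per-split behaviour can be modelled without Hadoop. The sketch below is a stdlib-only simulation, not Hadoop's implementation. It writes one input line per desired map task, then groups lines into splits of N lines each, which shows that N = 1 yields one split per line. The task-index prefix on each line is a hypothetical convention that would let each mapper vary its work, for example by choosing a different starting city. In a real job, Hadoop's newer `org.apache.hadoop.mapreduce` API configures this with `job.setInputFormatClass(NLineInputFormat.class)` and `NLineInputFormat.setNumLinesPerSplit(job, 1)`. Check these calls against the Hadoop version you run.

```java
import java.util.ArrayList;
import java.util.List;

// Stdlib-only simulation of grouping a line-oriented input into splits of
// `linesPerSplit` lines, in the spirit of NLineInputFormat. NOT Hadoop code.
final class NLineSplitSketch {

    // Builds `copies` input lines from the single coordinates line. Each copy
    // is prefixed with a task index (hypothetical convention) and a tab.
    static List<String> duplicateLine(String coordinatesLine, int copies) {
        List<String> lines = new ArrayList<>();
        for (int k = 0; k < copies; k++) {
            lines.add(k + "\t" + coordinatesLine);
        }
        return lines;
    }

    // Groups lines into splits of at most `linesPerSplit` lines each; the
    // last split may be shorter.
    static List<List<String>> split(List<String> lines, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be >= 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        String coords = "0,0 3,4 6,0 3,-4"; // hypothetical coordinate format
        List<String> input = duplicateLine(coords, 4);
        // With one line per split, each of the 4 lines becomes its own split.
        System.out.println(split(input, 1).size() + " splits");
    }
}
```

Running `main` prints `4 splits`. Raising `linesPerSplit` to 3 would instead produce two splits of 3 and 1 lines.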

On Sun, Jun 10, 2012 at 8:32 PM, sharat attupurath wrote:
> Hi,
>
> We are trying to solve the travelling salesman problem using Hadoop. Our
> input files contain just a single line that holds the Euclidean coordinates
> of the cities. We need to pass this single line to each mapper, which will
> then process it. How can we do this so that we achieve parallelism in a
> Hadoop cluster? Is there any way to generate multiple input splits from
> the single input file?
>
> Thanks
>
> Sharat



-- 
Harsh J


multiple input splits from single file

2012-06-10 Thread sharat attupurath

Hi,

We are trying to solve the travelling salesman problem using Hadoop. Our input
files contain just a single line that holds the Euclidean coordinates of the
cities. We need to pass this single line to each mapper, which will then
process it. How can we do this so that we achieve parallelism in a Hadoop
cluster? Is there any way to generate multiple input splits from the single
input file?

Thanks

Sharat