Hello,
my question is more about the architecture of my program.

*General view:*
I have a huge HBase table containing thousands of rows. Each row holds a
node ID and the node's geographical location.
A single region of the table contains approximately 10,000 rows.

*Aim:*
I would like to calculate the distance between every pair of nodes. That
means a task responsible for a region of 10,000 nodes has to perform
10,000 * 10,000 = 100,000,000 reads.

*My architecture:*
I have created two scanners, A and B. Scanner A points to the source node
and scanner B scans the destination nodes. That is, scanner A initially
points to the first row of the region while scanner B scans through the
remaining nodes; once that inner scan is done, scanner A advances to the
second node and B again scans all the nodes. That is how I compute all the
pairwise distances.
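To make the structure concrete, here is a stripped-down sketch of the
nested scans. The table name "nodes", the column family "loc" and the
qualifiers "lat"/"lon" are only placeholders for my real schema (the
coordinates are assumed to be stored with Bytes.toBytes(double)), and the
haversine formula is just the distance function I happen to use:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class PairwiseDistances {
    private static final byte[] CF  = Bytes.toBytes("loc");
    private static final byte[] LAT = Bytes.toBytes("lat");
    private static final byte[] LON = Bytes.toBytes("lon");

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("nodes"))) {

            // Scanner A: iterates over the source nodes.
            // (In the real job each task restricts its Scan to the
            //  start/stop keys of the region it is responsible for.)
            try (ResultScanner scannerA = table.getScanner(new Scan())) {
                for (Result src : scannerA) {
                    double srcLat = Bytes.toDouble(src.getValue(CF, LAT));
                    double srcLon = Bytes.toDouble(src.getValue(CF, LON));

                    // Scanner B: re-scans the nodes as destinations for this source.
                    try (ResultScanner scannerB = table.getScanner(new Scan())) {
                        for (Result dst : scannerB) {
                            double dstLat = Bytes.toDouble(dst.getValue(CF, LAT));
                            double dstLon = Bytes.toDouble(dst.getValue(CF, LON));
                            double distance = haversine(srcLat, srcLon, dstLat, dstLon);
                            // ... store or emit (src row, dst row, distance) here
                        }
                    }
                    // Scanner A only advances after the whole inner scan finishes,
                    // which is where its lease times out.
                }
            }
        }
    }

    // Great-circle distance in kilometres.
    private static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 6371.0 * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}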

*My problem:*
Scanner A was timing out, because the processing takes a long time before A
advances to the next row. I therefore increased the scanner lease timeout;
that helped for a region of 1,000 nodes, but not for 10,000 nodes.
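For completeness, this is roughly how I raised that timeout on the client
side; the property name depends on the HBase version
(hbase.client.scanner.timeout.period on newer clients,
hbase.regionserver.lease.period on older releases, where it is set
server-side in hbase-site.xml), and the ten-minute value is only the one I
experimented with:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ScannerTimeoutConfig {
    /** Client configuration with a longer scanner timeout. */
    public static Configuration create() {
        Configuration conf = HBaseConfiguration.create();
        // Newer HBase clients read the scanner timeout from this property;
        // 600000 ms (10 minutes) is only an illustrative value.
        conf.setInt("hbase.client.scanner.timeout.period", 600000);
        // Older releases use hbase.regionserver.lease.period instead, set
        // server-side in hbase-site.xml rather than on the client.
        return conf;
    }
}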

*My questions:*
1. I feel this value should not just keep growing because my processing is
heavy, or should it? Does it have side effects if it becomes very large?

2. Should I instead change the structure, or the whole idea, of my program?
Can someone give me a hint on how to do that?

Thank you
