Hello, my question is more on the architecture side of my program.

*General view:* I have a huge HBase table containing thousands of rows. Each row contains the ID of a node and its geographical location. A single region of the table contains approximately 10 000 rows.
*Aim:* I would like to calculate the distance between each pair of nodes. This means a task responsible for a region of 10 000 nodes needs to perform 10 000 * 10 000 reads.

*My architecture:* I have created two scanners, A and B. Scanner A points to the source node and scanner B scans all the destination nodes. At the beginning, scanner A points to the first row of the region and scanner B scans the rest of the nodes. Once done, scanner A moves on to the second node and B again scans all the nodes. That is how I calculate all the pairwise distances. (A minimal sketch of this pattern is at the end of this post.)

*My problem:* Scanner A was timing out because the processing takes so long before it moves on to the next row. I increased the scanner lease time, which helped for a region of 1 000 nodes but not for 10 000 nodes.

*My question:*

1. I feel that this value should not just keep going up because my processing is heavy, or should it? Will a very large lease time have side effects?
2. Shouldn't I change the structure or the idea of my program? Can someone give me a hint of how this could be done?

Thank you
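For reference, here is the sketch I mentioned above: a minimal version of the two-scanner pattern using the HBase Java client. The table name, column family, qualifiers, and the haversine distance function are placeholders for illustration; my real code differs in those details.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PairwiseDistances {

    // Placeholder names; the real table, family and qualifiers differ.
    private static final TableName NODES = TableName.valueOf("nodes");
    private static final byte[] CF  = Bytes.toBytes("loc");
    private static final byte[] LAT = Bytes.toBytes("lat");
    private static final byte[] LON = Bytes.toBytes("lon");

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(NODES)) {

            // Scanner A: iterates over the source nodes.
            try (ResultScanner scannerA = table.getScanner(new Scan())) {
                for (Result source : scannerA) {
                    double srcLat = Bytes.toDouble(source.getValue(CF, LAT));
                    double srcLon = Bytes.toDouble(source.getValue(CF, LON));

                    // Scanner B: for each source, scans all destination nodes.
                    // All of this work happens between two next() calls on
                    // scanner A, which is why its lease expires.
                    try (ResultScanner scannerB = table.getScanner(new Scan())) {
                        for (Result dest : scannerB) {
                            double dstLat = Bytes.toDouble(dest.getValue(CF, LAT));
                            double dstLon = Bytes.toDouble(dest.getValue(CF, LON));
                            double distance = haversine(srcLat, srcLon, dstLat, dstLon);
                            // ... store or emit (sourceId, destId, distance)
                        }
                    }
                }
            }
        }
    }

    // Great-circle distance in kilometres (one common choice; the actual
    // distance function in my program may differ).
    private static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 6371.0 * 2 * Math.asin(Math.sqrt(a));
    }
}
```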
