From my understanding of the problem, you can:
- keep the image binary data in SequenceFiles,
- copy the query image (the one whose similar images will be searched for) to
the DFS with a high replication factor,
- in each map task, calculate the similarity to the query image,
- output only the similar images from the map,
- no reduce step is needed (a rough sketch follows below).
I am not sure whether splitting the image into 4 parts and analyzing them
individually will make any difference, since the above algorithm already
distributes the computation across all nodes.
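
To make this concrete, here is a minimal sketch of such a map-only job using the
newer mapreduce API, assuming the SequenceFiles hold <imageId, imageBytes> pairs
and that the query image has already been copied to HDFS. The property names
(similarity.query.path, similarity.threshold) and the byte-level
computeSimilarity() placeholder are just assumptions to make the sketch complete;
you would plug in your actual 4-part matching metric there.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ImageSimilarityJob {

  public static class SimilarityMapper
      extends Mapper<Text, BytesWritable, Text, DoubleWritable> {

    private byte[] queryImage;   // the query image, loaded once per mapper
    private float threshold;     // minimum similarity to emit

    @Override
    protected void setup(Context context) throws IOException {
      Configuration conf = context.getConfiguration();
      threshold = conf.getFloat("similarity.threshold", 0.9f);
      // Read the query image from HDFS (copied there with high replication).
      Path queryPath = new Path(conf.get("similarity.query.path"));
      FileSystem fs = queryPath.getFileSystem(conf);
      queryImage = new byte[(int) fs.getFileStatus(queryPath).getLen()];
      FSDataInputStream in = fs.open(queryPath);
      in.readFully(0, queryImage);
      in.close();
    }

    @Override
    protected void map(Text imageId, BytesWritable imageData, Context context)
        throws IOException, InterruptedException {
      double similarity =
          computeSimilarity(queryImage, imageData.getBytes(), imageData.getLength());
      // Emit only the similar images; everything else is dropped in the map.
      if (similarity >= threshold) {
        context.write(imageId, new DoubleWritable(similarity));
      }
    }

    // Placeholder metric: fraction of matching bytes. Replace with the real
    // 4-part comparison; this is only here so the sketch compiles.
    private double computeSimilarity(byte[] query, byte[] candidate, int candidateLen) {
      int len = Math.min(query.length, candidateLen);
      if (len == 0) return 0.0;
      int matches = 0;
      for (int i = 0; i < len; i++) {
        if (query[i] == candidate[i]) matches++;
      }
      return (double) matches / len;
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("similarity.query.path", args[0]);        // query image on HDFS
    Job job = Job.getInstance(conf, "image similarity search");
    job.setJarByClass(ImageSimilarityJob.class);
    job.setMapperClass(SimilarityMapper.class);
    job.setNumReduceTasks(0);                          // map-only: no reduce step
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(DoubleWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[1]));    // SequenceFiles of images
    FileOutputFormat.setOutputPath(job, new Path(args[2]));  // matches are written here
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With setNumReduceTasks(0) the map output is written directly to HDFS, so the
per-image results appear as soon as each map task finishes.
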
Raşit Özdaş wrote:
Hi to all, I'm a new subscriber to the group; I have started to work on a Hadoop-based project.
In our application, there are a huge number of images with a regular pattern, differing in 4 parts/blocks.
The system takes an image as input and looks for similar images, checking whether
all 4 of these parts match.
(The system finds all matches; it does not stop after the first one.)
Each of these parts is independent; the result of each part is computed separately,
printed on the screen, and then an average matching percentage is calculated from them.
(I can provide more detailed information if needed.)
Could you suggest a structure, or any ideas for a better result?
Images can be divided into 4 parts, I see that. But the folder structure of the
images is important, and I have no idea about that. Images are kept in a DB
(this can be changed if a folder structure is better).
Would two stages of map-reduce operations be better? First, one map-reduce for each image,
then a second map-reduce for every part of one image.
But as far as I know, the slowest computation slows down the whole operation.
This is where I am now.
Thanks in advance..