Re: Image indexing/searching with Hadoop and MPI
> Ok I can understand your point - but I am sure that some people have been
> trying to use the map-reduce programming model to do CFD, or any other
> scientific computing.
> Any experience in this area from the list ?

I know of one project that assumes it has an entire Hadoop cluster, generates the hostnames in the Mapper, and uses those host lists in the Reducer to launch an MPI job. They do it because it is more efficient for very small data transfers. The alternative was a long chain of map/reduce jobs with very small outputs from each phase.

I wouldn't recommend using MPI under map/reduce in general, since it involves making a lot of assumptions about your application. In particular, to avoid killing your cluster, you shouldn't use checkpoints in your application; just rerun the application from the beginning on failures. That implies that the application can't run very long (an upper bound of probably 30 minutes on 2000 nodes).

That said, if you want to run other styles of applications, you really want a two-level scheduler, where the first-level scheduler allocates nodes (or partial nodes) to jobs (or frameworks). Effectively, that is what Hadoop On Demand (HOD) was doing with Torque, but I suspect there will be a more performant solution than HOD within the next year.

-- Owen
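For readers who want to picture the Mapper/Reducer hand-off described above, here is a minimal sketch in plain Python. It only simulates the two phases and does not call any Hadoop API. The function names, the shared key "mpi-hosts", the program path, and the Open MPI-style `mpirun -np ... --host ...` command line are all assumptions made for illustration; the project mentioned in the message may work quite differently.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical names throughout: this simulates the map and reduce phases
# in plain Python and is not the code of the project mentioned above.

SHARED_KEY = "mpi-hosts"  # assumed single key so one reducer sees every host


def map_emit_host(hostname: str) -> Iterator[Tuple[str, str]]:
    """Map phase: each task reports the host it is running on."""
    yield (SHARED_KEY, hostname)


def reduce_build_mpirun(hosts: Iterable[str], program: str) -> List[str]:
    """Reduce phase: deduplicate the hosts and build an mpirun command.

    The flags follow Open MPI's mpirun (-np, --host); other MPI
    distributions may expect different options.
    """
    unique_hosts = sorted(set(hosts))
    return [
        "mpirun",
        "-np", str(len(unique_hosts)),
        "--host", ",".join(unique_hosts),
        program,
    ]


def simulate(task_hosts: Iterable[str], program: str) -> List[str]:
    """Run the map phase over every task, then the single reduce call."""
    emitted = [pair for host in task_hosts for pair in map_emit_host(host)]
    hosts = [value for key, value in emitted if key == SHARED_KEY]
    return reduce_build_mpirun(hosts, program)


if __name__ == "__main__":
    # Two map tasks happened to land on node1; it is listed only once.
    command = simulate(["node1", "node2", "node1", "node3"], "./solver")
    print(" ".join(command))
```

In a real job the reducer would hand this command to a process launcher, and, as noted above, a failure would mean rerunning the whole job rather than resuming from a checkpoint.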
Re: Image indexing/searching with Hadoop and MPI
On Wed, Jun 3, 2009 at 5:17 PM, Edward J. Yoon wrote:

> > This is a kind of newbie question (at least as far as Hadoop is
> > concerned).
> > I was wondering if there were any Hadoop-based projects dealing with
> > image indexing and searching? We are working in this area and it might be
> > interesting to have a look at such a project.
>
> There is a text-search engine library called Lucene. See also the
> Nutch project. Otherwise, did you mean something like content-based
> image indexing and searching using image attributes such as color,
> texture, etc., rather than the text of the image tag?

Yes, this is exactly what I mean. I am looking for a project doing content-based image indexing using, for example, GIST, BOF, ... Does such a project exist?

> I think MPI programming isn't suitable for the concept of
> distributed HDFS and the map/reduce programming system, since MPI requires
> heavy communication among the nodes.

OK, I can understand your point, but I am sure that some people have been trying to use the map-reduce programming model to do CFD, or any other scientific computing. Any experience in this area from the list?

Cheers
Guillaume
Re: Image indexing/searching with Hadoop and MPI
On Wed, Jun 3, 2009 at 5:32 PM, tog wrote:

> Hi there,
>
> This is a kind of newbie question (at least as far as Hadoop is concerned).
> I was wondering if there were any Hadoop-based projects dealing with
> image indexing and searching? We are working in this area and it might be
> interesting to have a look at such a project.

There is a text-search engine library called Lucene. See also the Nutch project. Otherwise, did you mean something like content-based image indexing and searching using image attributes such as color, texture, etc., rather than the text of the image tag?

> The second question deals with scientific computing with Hadoop. Has
> anyone tried to use Hadoop to parallelize a scientific application? I
> know there is Hama, but it does not seem very active these days (I might be
> wrong ;) )
> Some time ago, I heard of an attempt at implementing some MPI implementation
> on top of Hadoop. Was that really the plan, and is there any update?
> Anyway, I would be interested in any paper/feedback on the performance of
> scientific applications running on large clusters using Hadoop.
>
> Best Regards
> Guillaume

I think MPI programming isn't suitable for the concept of distributed HDFS and the map/reduce programming system, since MPI requires heavy communication among the nodes.

FYI, in Hama, the basic matrix operations are currently implemented based on the map/reduce programming model: for example, the matrix get/set methods, matrix norms, matrix-matrix multiplication/addition, and matrix transpose. In the near future, SVD, eigenvalue decomposition, and some graph algorithms will be implemented. All the operations are sequentially executed.

Thanks.

--
Best Regards, Edward J. Yoon @ NHN, corp.
edwardy...@apache.org
http://blog.udanax.org
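To make the idea of matrix operations on the map/reduce model a bit more concrete, here is a rough sketch of a one-pass map/reduce matrix-matrix multiplication, simulated in plain Python. This is not Hama's implementation and uses no Hadoop or Hama API; the function names and the dense list-of-lists representation are assumptions chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Key = Tuple[int, int]          # (row, column) of one cell of the result C
Value = Tuple[str, int, float]  # (source matrix tag, shared index j, entry)


def map_a(i: int, j: int, value: float, cols_b: int) -> Iterator[Tuple[Key, Value]]:
    """Map an entry A[i][j] to every result cell C[i][k] that needs it."""
    for k in range(cols_b):
        yield (i, k), ("A", j, value)


def map_b(j: int, k: int, value: float, rows_a: int) -> Iterator[Tuple[Key, Value]]:
    """Map an entry B[j][k] to every result cell C[i][k] that needs it."""
    for i in range(rows_a):
        yield (i, k), ("B", j, value)


def reduce_cell(values: List[Value]) -> float:
    """Reduce: pair A and B entries on the shared index j and sum the products."""
    a_entries: Dict[int, float] = {}
    b_entries: Dict[int, float] = {}
    for tag, j, value in values:
        (a_entries if tag == "A" else b_entries)[j] = value
    return sum(a_entries[j] * b_entries[j] for j in a_entries if j in b_entries)


def matmul_mapreduce(A: List[List[float]], B: List[List[float]]) -> List[List[float]]:
    """Simulate map, shuffle (group by key), and reduce in a single process."""
    rows_a, cols_b = len(A), len(B[0])
    groups: Dict[Key, List[Value]] = defaultdict(list)
    for i, row in enumerate(A):
        for j, value in enumerate(row):
            for key, emitted in map_a(i, j, value, cols_b):
                groups[key].append(emitted)
    for j, row in enumerate(B):
        for k, value in enumerate(row):
            for key, emitted in map_b(j, k, value, rows_a):
                groups[key].append(emitted)
    return [[reduce_cell(groups[(i, k)]) for k in range(cols_b)] for i in range(rows_a)]


if __name__ == "__main__":
    print(matmul_mapreduce([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Every input entry is replicated once per result cell that uses it, so the shuffle volume grows quickly with matrix size. That is one reason blocked variants are usually preferred over this naive formulation in practice.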