It is not restricted to a unique hit in the real problem; it can be relaxed to a small number of different hits. Maybe I oversimplified the problem in the previous email, but let's consider the unique pattern first. The problem is as follows: search for the splits of a given string (which comes from a parametric input), such as "ABCD", in the reducer-generated file:

   ****
   ABCD    A   -B   -CD
   ABCD    A   -BC  -D
   ABCD    AB  -C   -D
   ****

The **** represents a huge number of other strings. By searching this file I want to know whether ABCD can be split according to a given pattern. In another function, four parameters come in: p1, p2, p3, p4. If I get the input p1=ABCD, p2=A, p3=BC, p4=D, I will search for the record
ABCD   A  -BC  -D    and will find it.

If I get the input p1=ABCD, p2=A, p3=ABC, p4=D, I will search again, and that split is not found.

I want to return 0 when the split pattern is not found and 1 when it is found. To smooth away from the tricky value 0, I actually return <ABCD, 0.000001> and <ABCD, 0.999999>...

Since the string I search for ("ABCD") is a dynamic input parameter and the search could be triggered millions of times, my feeling is that it may not be wise to write the records to HBase and load them back ... any advice is highly appreciated.
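For what it's worth, one purely in-memory approach can be sketched like this (a minimal sketch; the class and method names below are hypothetical, and it assumes the reducer output is small enough to load into memory on each mapper, e.g. shipped via the DistributedCache): load the split records once into a HashSet, then each of the millions of lookups is a constant-time set membership test and never touches HBase.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: holds the reducer-generated split records in memory
// and answers lookups with the smoothed values described above. In a real
// job the set could be populated once in Mapper.setup() from a local copy
// of the reduce output; here loading is left to the caller for simplicity.
public class SplitLookup {
    private final Set<String> records = new HashSet<String>();

    // Build a normalized key matching one line of the reduce file,
    // e.g. p1=ABCD, p2=A, p3=BC, p4=D -> "ABCD\tA\t-BC\t-D".
    private static String key(String p1, String p2, String p3, String p4) {
        return p1 + "\t" + p2 + "\t-" + p3 + "\t-" + p4;
    }

    // Register one record from the reducer-generated file.
    public void add(String p1, String p2, String p3, String p4) {
        records.add(key(p1, p2, p3, p4));
    }

    // Returns 0.999999 when the split pattern exists, 0.000001 otherwise.
    public double search(String p1, String p2, String p3, String p4) {
        return records.contains(key(p1, p2, p3, p4)) ? 0.999999 : 0.000001;
    }
}
```

With the example records above, search("ABCD", "A", "BC", "D") would yield 0.999999, while the non-existent split ("ABCD", "A", "ABC", "D") would yield 0.000001.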

Thanks,

Shi



On 2010-9-22 16:32, Steve Lewis wrote:
What distinguishes this record, and will every mapper know it?
It sounds like all you need to do is ignore non-matching records and then
run other code in the mapper - I am assuming that across all mappers the code
only runs once.

On Wed, Sep 22, 2010 at 2:06 PM, Shi Yu<sh...@uchicago.edu>  wrote:

Dear Hadoopers,

I am stuck on a probably very simple problem but can't figure it out. In
the Hadoop Map/Reduce framework, I want to search a huge file (which is
generated by another Reduce task) for a unique line of record (a <String,
double> value, actually). That record is expected to be passed to another
function. I have read the previous post about using Mapper only output to
HBase (
http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg06579.html)
and another post (
http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg07337.html).
They are both very interesting; however, I am still confused about how to
avoid writing to HBase and instead use the returned record directly
from memory. I guess my problem doesn't need a reducer, so basically I want to
load-balance the search task across multiple Mappers. I want to have something
like this

   class myClass
       method seekResultbyMapper(String toSearch, Path reduceFile)
           call Map(a, b)
           do some simple calculation
           return <String, double> result

   class anotherClass
       <String, double> para = myClass.seekResultbyMapper(c, d)


I don't know whether this is doable (maybe it is not a valid style in the
Map/Reduce framework). How would I implement it using the Java API? Thanks for any
suggestion in advance.


Best Regards,

Shi

--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799





