It is not restricted to a unique hit in the real problem; it can be relaxed to a small number of different hits. Maybe I oversimplified the problem in the previous email, but let's consider the unique pattern first. The problem is as follows: search for the splits of a given string (which comes from a parametric input), such as "ABCD", in the reducer-generated file:

   ****
   ABCD    A   -B   -CD
   ABCD    A   -BC  -D
   ABCD    AB  -C   -D
   ****

The **** represents a huge number of other strings. By searching this file I want to know whether ABCD can be split according to a given pattern. In another function, four parameters come in: p1, p2, p3, p4. If I get the input p1=ABCD, p2=A, p3=BC, p4=D, I will search for the record
ABCD   A  -BC  -D    and will find it.

If I get the input p1=ABCD, p2=A, p3=ABC, p4=D, I will search again, and that split is not found.

I want to return 0 when the split pattern is not found and 1 when it is found. To smooth away from the tricky value 0, I actually return <ABCD, 0.000001> and <ABCD, 0.999999>...

Since the string I search for ("ABCD") is a dynamic input parameter and the search could be triggered millions of times, my feeling is that it may not be wise to write the records to HBase and load them back ... any advice is highly appreciated.
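For what it's worth, one purely in-memory approach can be sketched like this (a minimal sketch; the class and method names below are hypothetical, and it assumes the reducer output is small enough to load into memory on each mapper, e.g. shipped via the DistributedCache): load the split records once into a HashSet, then each of the millions of lookups is a constant-time set membership test and never touches HBase.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: holds the reducer-generated split records in memory
// and answers lookups with the smoothed values described above. In a real
// job the set could be populated once in Mapper.setup() from a local copy
// of the reduce output; here loading is left to the caller for simplicity.
public class SplitLookup {
    private final Set<String> records = new HashSet<String>();

    // Build a normalized key matching one line of the reduce file,
    // e.g. p1=ABCD, p2=A, p3=BC, p4=D -> "ABCD\tA\t-BC\t-D".
    private static String key(String p1, String p2, String p3, String p4) {
        return p1 + "\t" + p2 + "\t-" + p3 + "\t-" + p4;
    }

    // Register one record from the reducer-generated file.
    public void add(String p1, String p2, String p3, String p4) {
        records.add(key(p1, p2, p3, p4));
    }

    // Returns 0.999999 when the split pattern exists, 0.000001 otherwise.
    public double search(String p1, String p2, String p3, String p4) {
        return records.contains(key(p1, p2, p3, p4)) ? 0.999999 : 0.000001;
    }
}
```

With the example records above, search("ABCD", "A", "BC", "D") would yield 0.999999, while the non-existent split ("ABCD", "A", "ABC", "D") would yield 0.000001.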

Thanks,

Shi



On 2010-9-22 16:32, Steve Lewis wrote:
What distinguishes this record, and will every mapper know it?
It sounds like all you need to do is ignore non-matching records and then
run other code in the mapper - I am assuming that across all mappers the code
only runs once.

On Wed, Sep 22, 2010 at 2:06 PM, Shi Yu<sh...@uchicago.edu>  wrote:

Dear Hadoopers,

I am stuck on a probably very simple problem but can't figure it out. In
the Hadoop Map/Reduce framework, I want to search a huge file (which is
generated by another Reduce task) for a unique line of record (a <String,
double> value, actually). That record is expected to be passed to another
function. I have read the previous post about using Mapper only output to
HBase (
http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg06579.html)
and another post (
http://www.mail-archive.com/hbase-u...@hadoop.apache.org/msg07337.html).
They are both very interesting; however, I am still confused about how to
avoid writing to HBase and instead use the returned record directly
from memory. I guess my problem doesn't need a reducer, so basically I want to
load-balance the search task across multiple Mappers. I want to have something
like this

   class myClass
       method seekResultbyMapper(String toSearch, Path reduceFile)
           call Map(a, b)
           do some simple calculation
           return <String, double> result

   class anotherClass
       <String, double> para = myClass.seekResultbyMapper(c, d)


I don't know whether this is doable (maybe it is not a valid style in the
Map/Reduce framework). How would I implement it using the Java API? Thanks for any
suggestion in advance.


Best Regards,

Shi

--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799





