[ 
https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732172#action_12732172
 ] 

Andrzej Bialecki  commented on NUTCH-650:
-----------------------------------------

"nutchbase" is ok for now, although it sounds cryptic. +1 on importing and 
closing this issue.

I don't believe OPIC scoring can work well, even if we implement it as
intended: the dynamic nature of the webgraph is IMHO not properly addressed
even in the original paper (the authors propose a smoothing scheme based on a
history of past values). In my opinion we should strive to create a more
elegant scoring API than the current one (which owes much to the way Nutch
passed bits of data between different data stores), and use PageRank as the
default.

Re: use of Katta for distributed indexing - let's discuss this on the list.

> Hbase Integration
> -----------------
>
>                 Key: NUTCH-650
>                 URL: https://issues.apache.org/jira/browse/NUTCH-650
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 1.0.0
>            Reporter: Doğacan Güney
>            Assignee: Doğacan Güney
>             Fix For: 1.1
>
>         Attachments: hbase-integration_v1.patch, hbase_v2.patch, 
> malformedurl.patch, meta.patch, meta2.patch, nofollow-hbase.patch, 
> nutch-habase.patch, searching.diff, slash.patch
>
>
> This issue will track nutch/hbase integration

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
