Re: Solr & HBase - Re: How is Data Indexed in HBase?

2012-02-23 Thread T Vinod Gupta
Regarding your question about how HBase supports high performance together
with consistency: I would say HBase is highly scalable and performant. How
it achieves that is best understood by reading the architecture and design
chapters of the HBase book.

With regard to ranking, I see your problem. But if you split the problem
into an HBase-specific part and a Solr-based part, you can probably get the
results you want. Maybe you compute the ranking and store the rank in HBase,
then use Solr to get the results and use HBase as a lookup to get the rank.
Or you can make the rank part of the document schema and index it too, for
range queries and such. Is my understanding of your scenario wrong?
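
To make the first option concrete, here is a minimal sketch of the "search
first, then look up the rank" flow. It is only an illustration under stated
assumptions: the two in-memory maps are stand-ins for Solr (keyword search)
and HBase (rank lookup by row key), and the names RankLookupSketch,
searchIds and searchByRank are hypothetical, not part of either product's
API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RankLookupSketch {

    // Stand-in for a Solr keyword query (assumption): returns the ids of
    // all documents whose text contains the keyword.
    static List<String> searchIds(Map<String, String> textIndex, String keyword) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : textIndex.entrySet()) {
            if (entry.getValue().contains(keyword)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    // Two-step retrieval: keyword search first, then a per-id rank lookup.
    // rankStore is a stand-in for an HBase table keyed by document id
    // (assumption); documents without a stored rank are treated as rank 0.0.
    static List<String> searchByRank(Map<String, String> textIndex,
                                     Map<String, Double> rankStore,
                                     String keyword) {
        List<String> ids = searchIds(textIndex, keyword);
        Comparator<String> byRank =
                Comparator.comparingDouble(id -> rankStore.getOrDefault(id, 0.0));
        ids.sort(byRank.reversed()); // highest rank first
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> textIndex = new HashMap<>();
        textIndex.put("d1", "hbase and solr");
        textIndex.put("d2", "solr ranking");
        textIndex.put("d3", "hbase only");

        Map<String, Double> rankStore = new HashMap<>();
        rankStore.put("d1", 0.2);
        rankStore.put("d2", 0.9);
        rankStore.put("d3", 0.5);

        System.out.println(searchByRank(textIndex, rankStore, "solr"));

        // A rank update touches only the rank store; the text index is untouched.
        rankStore.put("d1", 1.5);
        System.out.println(searchByRank(textIndex, rankStore, "solr"));
    }
}
```

The point of the split is that frequent rank updates never require
reindexing the text. The cost is one extra lookup per search hit.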

thanks

On Wed, Feb 22, 2012 at 9:51 AM, Bing Li wrote:

> Mr Gupta,
>
> Thanks so much for your reply!
>
> Retrieving data by keyword is one of my use cases, and I think Solr is
> the proper choice for it.
>
> However, Solr's ranking support is not flexible enough for my needs, and
> frequent updates are also a poor fit for Solr. So it is difficult to
> retrieve data based on values other than keyword frequency in the text.
> For that case, I am trying to use HBase.
>
> But I don't know how HBase supports high performance when it needs to keep
> consistency in a large-scale distributed system.
>
> Now both of them are used in my system.
>
> I will check out ElasticSearch.
>
> Best regards,
> Bing
>
>
> On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta wrote:
>
>> Bing,
>> It's a classic battle whether to use Solr, HBase, or a combination of
>> both. The two systems are very different, but there is some overlap in
>> their utility. They also differ vastly when it comes to computation power,
>> storage needs, etc. So in the end, it all boils down to your use case. You
>> need to pick the technology that is best suited to your needs.
>> I'm still not clear on your use case though.
>>
>> Btw, if you haven't started using Solr yet, then you might want to
>> check out ElasticSearch. I spent over a week researching Solr and ES
>> and eventually chose ES on its merits.
>>
>> thanks
>>
>>
>> On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu wrote:
>>
>>> There is no secondary index support in HBase at the moment.
>>>
>>> It's on our road map.
>>>
>>> FYI
>>>
>>> On Wed, Feb 22, 2012 at 9:28 AM, Bing Li wrote:
>>>
>>> > Jacques,
>>> >
>>> > Yes. But I still have questions about that.
>>> >
>>> > In my system, when a user searches with an arbitrary keyword, the query
>>> > is forwarded to Solr. The Solr-managed data sees no updates other than
>>> > new entries being appended to the index.
>>> >
>>> > When I need to retrieve data based on ranking values, HBase is used,
>>> > and the ranking values need to be updated all the time.
>>> >
>>> > Is that correct?
>>> >
>>> > My concern is that performance must be low if consistency is kept in a
>>> > large-scale distributed environment. How does HBase handle this issue?
>>> >
>>> > Thanks so much!
>>> >
>>> > Bing
>>> >
>>> >
>>> > On Thu, Feb 23, 2012 at 1:17 AM, Jacques wrote:
>>> >
>>> > > It is highly unlikely that you could replace Solr with HBase. They're
>>> > > really apples and oranges.
>>> > >
>>> > >
>>> > > On Wed, Feb 22, 2012 at 1:09 AM, Bing Li wrote:
>>> > >
>>> > >> Dear all,
>>> > >>
>>> > >> I wonder how data in HBase is indexed. Solr is used in my system
>>> > >> now because it manages data in an inverted index, which is well
>>> > >> suited to retrieving huge amounts of unstructured data. How does
>>> > >> HBase deal with this issue? Can I replace Solr with HBase?
>>> > >>
>>> > >> Thanks so much!
>>> > >>
>>> > >> Best regards,
>>> > >> Bing
>>> > >>
>>> > >
>>> > >
>>> >
>>>
>>
>>
>


Re: correct usage of StreamingUpdateSolrServer?

2012-02-10 Thread T Vinod Gupta
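Here is how I was playing with it:

import java.util.ArrayList;
import java.util.Collection;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Arguments: Solr URL, queue size, number of background threads.
StreamingUpdateSolrServer solrServer = new
StreamingUpdateSolrServer("http://localhost:8983/solr/", 10, 1);

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("pk_id", "id1");
doc1.addField("doc_type", "content");
doc1.addField("id", "1");
doc1.addField("content_text", "hello world");

Collection<SolrInputDocument> docs = new
ArrayList<SolrInputDocument>();
docs.add(doc1);
solrServer.add(docs);
solrServer.commit();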

thanks

On Fri, Feb 10, 2012 at 7:41 AM, Erick Erickson wrote:

> Can you post the code? SUSS should essentially be a drop-in
> replacement for CHSS.
>
> It's not advisable to commit after every add, it's usually better
> to use commitWithin, and perhaps commit at the very end of
> the run.
>
> Best
> Erick
>
> On Thu, Feb 9, 2012 at 4:00 PM, T Vinod Gupta wrote:
> > Hi,
> > I wrote a hello world program to add documents to a Solr server. When I
> > use CommonsHttpSolrServer, the program exits, but when I use
> > StreamingUpdateSolrServer, the program never exits, and I couldn't find
> > a way to close it. Are there any best practices here? Do I have to do
> > anything differently when adding or updating documents with
> > StreamingUpdateSolrServer? I am following the add/commit cycle. Is
> > that ok?
> >
> > thanks
>


correct usage of StreamingUpdateSolrServer?

2012-02-09 Thread T Vinod Gupta
Hi,
I wrote a hello world program to add documents to a Solr server. When I
use CommonsHttpSolrServer, the program exits, but when I use
StreamingUpdateSolrServer, the program never exits, and I couldn't find
a way to close it. Are there any best practices here? Do I have to do
anything differently when adding or updating documents with
StreamingUpdateSolrServer? I am following the add/commit cycle. Is
that ok?

thanks


linking documents in solr

2012-02-08 Thread T Vinod Gupta
Hi,
I have a question about linking documents in Solr and want to know if it's
possible. Let's say I have a set of blogs and their authors that I want to
index separately. Is it possible to link a document describing a blog to
another document describing an author? If yes, can I search for blogs with
filters on attributes of the author? And if I then update an attribute of
an author (by its id), will the search results reflect the updated
attribute(s)?

thanks


can solr server be pointed to lucene index?

2012-01-29 Thread T Vinod Gupta
Hi,
I am really new to Solr/Lucene and doing some experiments. I have a
question: if I create an index using Lucene, can I use Solr to query
against that index? If yes, how do I set up Solr?

I already have a Lucene index. I just copied the index dir over as
/examples/solr/data, but that is not making any difference. Where am I
going wrong?

thanks