Re: Solr HBase - Re: How is Data Indexed in HBase?

2012-02-23 Thread T Vinod Gupta
Regarding your question on HBase support for high performance and
consistency: I would say HBase is highly scalable and performant. How it
does this is covered in the architecture and design chapters of the HBase
book.

With regard to ranking, I see your problem. If you split the problem into
an HBase-specific part and a Solr-based part, you can probably get the
results you want. Maybe you compute the ranking and store the rank in
HBase, then use Solr to get the results and use HBase as a lookup to get
the rank. Or you can make the rank part of the document schema and index it
too, for range queries and such. Is my understanding of your scenario
wrong?
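
A minimal sketch of the first option (keyword search in one system, rank lookup in the other), in case it helps to see the flow. The two dictionaries are hypothetical in-memory stand-ins for Solr and for an HBase table keyed by document ID; the names `keyword_index`, `rank_store`, `update_rank` and `search_ranked` are made up for illustration and are not part of either product's API.

```python
# Hypothetical stand-ins: a real system would call Solr and HBase clients here.
keyword_index = {   # plays the role of Solr: keyword -> matching document IDs
    "hbase": ["doc1", "doc2", "doc3"],
    "solr": ["doc2", "doc4"],
}
rank_store = {      # plays the role of an HBase table: row key (doc ID) -> rank
    "doc1": 0.4,
    "doc2": 0.9,
    "doc3": 0.1,
    "doc4": 0.7,
}


def update_rank(doc_id, rank):
    """Frequent rank updates touch only the key-value store, not the index."""
    rank_store[doc_id] = rank


def search_ranked(keyword, min_rank=0.0):
    """Find documents by keyword, look up each rank by key, sort by rank."""
    doc_ids = keyword_index.get(keyword, [])
    ranked = [(doc_id, rank_store.get(doc_id, 0.0)) for doc_id in doc_ids]
    ranked = [pair for pair in ranked if pair[1] >= min_rank]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(search_ranked("hbase"))
    update_rank("doc3", 1.0)          # the rank changes, the index is untouched
    print(search_ranked("hbase"))
```

The point of the split is that a rank update is a single keyed write, while the keyword index only ever receives appended documents.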

thanks

On Wed, Feb 22, 2012 at 9:51 AM, Bing Li lbl...@gmail.com wrote:

 Mr Gupta,

 Thanks so much for your reply!

 Retrieving data by keyword is one of my use cases, and I think Solr is a
 proper choice for it.

 However, Solr does not provide rich enough support for ranking, and it is
 not well suited to frequent updates. So it is difficult to do random-access
 retrieval based on values other than keyword frequency in the text. For
 that case, I am trying to use HBase.

 But I don't know how HBase supports high performance when it needs to keep
 consistency in a large-scale distributed system.

 Now both of them are used in my system.

 I will check out ElasticSearch.

 Best regards,
 Bing


 On Thu, Feb 23, 2012 at 1:35 AM, T Vinod Gupta tvi...@readypulse.com wrote:

 Bing,
 It's a classic battle whether to use Solr, HBase, or a combination of
 both. The two systems are very different, but there is some overlap in
 utility. They also differ vastly when it comes to computation power,
 storage needs, etc. So in the end it all boils down to your use case: you
 need to pick the technology that is best suited to your needs.
 I'm still not clear on your use case, though.

 BTW, if you haven't started using Solr yet, you might want to check out
 ElasticSearch. I spent over a week comparing Solr and ES and eventually
 chose ES on its merits.

 thanks


 On Wed, Feb 22, 2012 at 9:31 AM, Ted Yu yuzhih...@gmail.com wrote:

 There is no secondary index support in HBase at the moment.

 It's on our road map.

 FYI

 On Wed, Feb 22, 2012 at 9:28 AM, Bing Li lbl...@gmail.com wrote:

  Jacques,
 
  Yes. But I still have questions about that.
 
  In my system, when users search by an arbitrary keyword, the query is
  forwarded to Solr. The Solr-managed data sees no update operations other
  than appending new index entries.

  When I need to retrieve data based on ranking values, HBase is used, and
  the ranking values need to be updated all the time.

  Is that correct?

  My question is that performance must be low if consistency is kept in a
  large-scale distributed environment. How does HBase handle this issue?
 
  Thanks so much!
 
  Bing
 
 
  On Thu, Feb 23, 2012 at 1:17 AM, Jacques whs...@gmail.com wrote:
 
   It is highly unlikely that you could replace Solr with HBase. They're
   really apples and oranges.
  
  
   On Wed, Feb 22, 2012 at 1:09 AM, Bing Li lbl...@gmail.com wrote:
  
   Dear all,
  
   I wonder how data in HBase is indexed. Solr is used in my system now
   because it manages data in an inverted index. Such an index is suitable
   for retrieving huge amounts of unstructured data. How does HBase deal
   with this? Could I replace Solr with HBase?
  
   Thanks so much!
  
   Best regards,
   Bing
  
  
  
 






Re: Solr HBase - Re: How is Data Indexed in HBase?

2012-02-23 Thread Bing Li
Dear Mr Gupta,

Your understanding of my solution is correct. Both HBase and Solr are now
used in my system. I hope it will work.

Thanks so much for your reply!

Best regards,
Bing


How is Data Indexed in HBase?

2012-02-22 Thread Bing Li
Dear all,

I wonder how data in HBase is indexed. Solr is used in my system now
because it manages data in an inverted index. Such an index is suitable
for retrieving huge amounts of unstructured data. How does HBase deal with
this? Could I replace Solr with HBase?

Thanks so much!

Best regards,
Bing


Re: Solr HBase - Re: How is Data Indexed in HBase?

2012-02-22 Thread Bing Li
Mr Gupta,

Thanks so much for your reply!

Retrieving data by keyword is one of my use cases, and I think Solr is a
proper choice for it.

However, Solr does not provide rich enough support for ranking, and it is
not well suited to frequent updates. So it is difficult to do random-access
retrieval based on values other than keyword frequency in the text. For
that case, I am trying to use HBase.

But I don't know how HBase supports high performance when it needs to keep
consistency in a large-scale distributed system.

Now both of them are used in my system.

I will check out ElasticSearch.

Best regards,
Bing



Re: Solr HBase - Re: How is Data Indexed in HBase?

2012-02-22 Thread Jacques
 Solr does not provide a complex enough support to rank.
I believe Solr has plenty of pluggability for writing your own custom
ranking approach. If you think you can't do your desired ranking with Solr,
you're probably wrong and should ask the Solr community for help.

 retrieving data by keyword is one of them. I think Solr is a proper
 choice
The key to keyword retrieval is how the data is constructed. Among other
things, this is one of the key things Solr is very good at: organizing the
data very efficiently so that you can retrieve it quickly. At their core,
Solr, ElasticSearch, Lily and Katta all use Lucene to construct this data.
HBase is bad at this.
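
To make "construction of the data" concrete, here is a toy inverted index, offered only as a sketch of the idea. Lucene's actual index is far more elaborate (term statistics, scoring, compressed postings, segments); the function names `build_inverted_index` and `search` and the whitespace tokenizer are assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():   # naive whitespace tokenizer
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "HBase stores rows",
        2: "Solr indexes text",
        3: "HBase and Solr together",
    }
    index = build_inverted_index(docs)
    print(search(index, "hbase solr"))
```

A keyword query becomes a few set lookups and intersections, which is why the work done at indexing time matters so much. A store keyed only by row ID has no such term-to-document mapping unless you build and maintain one yourself.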

 how HBase support high performance when it needs to keep consistency in
 a large scale distributed system
HBase is primarily built for retrieving a single row at a time based on a
predetermined and known location (the key). It is also very efficient at
splitting massive datasets across multiple machines and allowing sequential
batch analyses of these datasets. HBase can maintain high performance in
this way because consistency only ever exists at the row level. This is
what HBase is good at.
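
As a rough illustration of that access pattern, the toy class below keeps rows sorted by key, serves point reads and key-range scans, and guards each row with its own lock so that an update to one row never coordinates with any other row. It is not HBase and does not use any HBase API; `TinyRowStore` and its methods are hypothetical names for this sketch.

```python
import bisect
import threading


class TinyRowStore:
    """Toy sorted row store showing keyed reads, range scans, per-row updates."""

    def __init__(self):
        self._keys = []                # row keys, kept in sorted order
        self._rows = {}                # row key -> {column: value}
        self._locks = {}               # row key -> lock (atomicity per row only)
        self._meta = threading.Lock()  # protects creation of new rows

    def put(self, key, columns):
        """Apply all column updates for one row under that row's lock."""
        with self._meta:
            if key not in self._rows:
                bisect.insort(self._keys, key)
                self._rows[key] = {}
                self._locks[key] = threading.Lock()
            lock = self._locks[key]
        with lock:
            self._rows[key].update(columns)

    def get(self, key):
        """Point read by row key; returns None when the row does not exist."""
        row = self._rows.get(key)
        return dict(row) if row is not None else None

    def scan(self, start, stop):
        """Sequential read of rows with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(key, dict(self._rows[key])) for key in self._keys[lo:hi]]


if __name__ == "__main__":
    store = TinyRowStore()
    store.put("doc2", {"rank": 0.9})
    store.put("doc1", {"rank": 0.4, "title": "intro"})
    print(store.get("doc1"))
    print(store.scan("doc1", "doc3"))
```

Because nothing here spans more than one row, there is no cross-row transaction to slow writes down, which is the trade-off described above.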

You need to focus on what you're doing and then write it out. Figure out
how you think the pieces should work together. Read the documentation.
Then ask specific questions where the documentation is unclear or you feel
confused. Your general questions are very difficult to answer in any really
helpful way.

thanks,
Jacques

