Re: Solr for noSQL
Hi all,

I don't know if this answers any of your questions, but if you are interested in that, check out Lucandra (Cassandra + Lucene).

2011/2/1 Steven Noels

> On Tue, Feb 1, 2011 at 11:52 AM, Upayavira wrote:
>
> > Apologies if my "nothing funky" sounded like you weren't doing cool
> > stuff.
>
> No offense whatsoever. I think my longer reply paints a more accurate light
> on what Lily means in terms of "SOLR for NoSQL", and it was your reaction
> who triggered this additional explanation.
>
> > I was merely attempting to say that I very much doubt you were
> > doing anything funky like putting HBase underneath Solr as a replacement
> > of FSDirectory.
>
> There are some initiatives in the context of Cassandra IIRC, as well as a
> project which stores Lucene index files in HBase tables, but frankly they
> seem more experimentation, and also I think the nature of how Lucene/SOLR
> works + what HBase does on top of Hadoop FS somehow is in conflict with
> each other. Too many layers of indirection will kill performance on every
> layer.
>
> > I was trying to imply that, likely your integration with
> > Solr was relatively conventional (interacting with its REST interface),
>
> Yep. We figured that was the wiser road to walk, and leaves a clear-defined
> interface and possible area of improvement against a too-low level of
> integration.
>
> > and the "funky" stuff that you are doing sits outside of that space.
> >
> > Hope that's a clearer (and more accurate?) attempt at what I was trying
> > to say.
> >
> > Upayavira (who finds the Lily project interesting, and would love to
> > find the time to play with it)
>
> Anytime, Upayavira. Anytime! ;-)
>
> Steven.
> --
> Steven Noels
> http://outerthought.org/
> Scalable Smart Data
> Makers of Kauri, Daisy CMS and Lily
Re: Solr for noSQL
On Tue, Feb 1, 2011 at 11:52 AM, Upayavira wrote:

> Apologies if my "nothing funky" sounded like you weren't doing cool
> stuff.

No offense whatsoever. I think my longer reply paints a more accurate picture of what Lily means in terms of "SOLR for NoSQL", and it was your reaction that triggered this additional explanation.

> I was merely attempting to say that I very much doubt you were
> doing anything funky like putting HBase underneath Solr as a replacement
> of FSDirectory.

There are some initiatives in the context of Cassandra IIRC, as well as a project which stores Lucene index files in HBase tables, but frankly they seem more like experiments. I also think the way Lucene/SOLR works and what HBase does on top of Hadoop FS are somehow in conflict with each other. Too many layers of indirection will kill performance on every layer.

> I was trying to imply that, likely your integration with
> Solr was relatively conventional (interacting with its REST interface),

Yep. We figured that was the wiser road to walk; it leaves a clearly defined interface and a possible area for improvement, compared with integrating at too low a level.

> and the "funky" stuff that you are doing sits outside of that space.
>
> Hope that's a clearer (and more accurate?) attempt at what I was trying
> to say.
>
> Upayavira (who finds the Lily project interesting, and would love to
> find the time to play with it)

Anytime, Upayavira. Anytime! ;-)

Steven.
--
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily
Re: Solr for noSQL
On Tue, 01 Feb 2011 07:22 +0100, "Steven Noels" wrote: > On Mon, Jan 31, 2011 at 9:38 PM, Upayavira wrote: > > > > > > > On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups" > > wrote: > > > What are the advantages of using something like HBase over your standard > > > Lucene index with Solr? It would seem to me like you'd be losing a lot of > > > what Lucene has to offer!?! > > > > I think Steven is saying that he has an indexer app that reads from > > HBase and writes to a standard Solr by hitting its Rest API. > > > > So, nothing funky, just a little app that reads from HBase and posts to > > Solr. > > > > > We're doing something like offering a relational-database-like experience > (i.e. a schema language, storing typed data instead of byte[]s, secondary > indexing facilities), with some content management features (versioning, > blob storage), combined with SOLR as a search index (with mapping between > our schema and that of SOLR), the index being maintained incrementally > and > through map/reduce (for reindexing). We keep multiple versions of the > index > if you want, with state management and we do text extraction with Tika. > All > this happens fully distributed, so you can play with different boxes > serving > as HBase datanode, or index feeder, SOLR search node, etc etc. > > All that sits behind a Java API that uses Avro underneath, and a REST > interface as well (searches go directly to SOLR). For future versions, we > will integrate a recommendation engine and some analytics tools as well. > > So yes, we do more (or rather: different things) than what Lucene/SOLR > does, > as we offer a full-featured data storage environment, stuffing your data > in > HBase (which scales better than MySQL), and make it searchable through > SOLR. > > The 'funky app' you're referring at now sits at about 3 manyears of > fulltime > development, BTW. ;-) Apologies if my "nothing funky" sounded like you weren't doing cool stuff. 
I was merely attempting to say that I very much doubt you were doing anything funky like putting HBase underneath Solr as a replacement of FSDirectory. I was trying to imply that, likely your integration with Solr was relatively conventional (interacting with its REST interface), and the "funky" stuff that you are doing sits outside of that space. Hope that's a clearer (and more accurate?) attempt at what I was trying to say. Upayavira (who finds the Lily project interesting, and would love to find the time to play with it) --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: Solr for noSQL
On Mon, Jan 31, 2011 at 9:38 PM, Upayavira wrote:

> On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups" wrote:
> > What are the advantages of using something like HBase over your standard
> > Lucene index with Solr? It would seem to me like you'd be losing a lot of
> > what Lucene has to offer!?!
>
> I think Steven is saying that he has an indexer app that reads from
> HBase and writes to a standard Solr by hitting its Rest API.
>
> So, nothing funky, just a little app that reads from HBase and posts to
> Solr.

We're doing something like offering a relational-database-like experience (i.e. a schema language, storing typed data instead of byte[]s, secondary indexing facilities), with some content management features (versioning, blob storage), combined with SOLR as a search index (with mapping between our schema and that of SOLR). The index is maintained incrementally, and through map/reduce for reindexing. We keep multiple versions of the index if you want, with state management, and we do text extraction with Tika. All this happens fully distributed, so you can play with different boxes serving as HBase datanode, index feeder, SOLR search node, etc.

All that sits behind a Java API that uses Avro underneath, and a REST interface as well (searches go directly to SOLR). For future versions, we will integrate a recommendation engine and some analytics tools as well.

So yes, we do more (or rather: different things) than what Lucene/SOLR does, as we offer a full-featured data storage environment, stuffing your data in HBase (which scales better than MySQL) and making it searchable through SOLR.

The 'funky app' you're referring to now sits at about 3 man-years of full-time development, BTW. ;-)

Steven.
--
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily
Re: Solr for noSQL
On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups" wrote: > What are the advantages of using something like HBase over your standard > Lucene index with Solr? It would seem to me like you'd be losing a lot of > what Lucene has to offer!?! I think Steven is saying that he has an indexer app that reads from HBase and writes to a standard Solr by hitting its Rest API. So, nothing funky, just a little app that reads from HBase and posts to Solr. Upayavira > On Jan 31, 2011, at 5:34 AM, Steven Noels > wrote: > > > On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai wrote: > > > >> Hi, > >> > >> > >> > >> Do we have data import handler to fast read in data from noSQL database, > >> specifically, MongoDB I am thinking to use? > >> > >> Or a more general question, how does Solr work with noSQL database? > >> > > > > > > Can't say anything about MongoDB, but we have an integration of SOLR with > > HBase inside Lily - www.lilyproject.org. It indeed uses the 'normal' SOLR > > index update API rather than a DIH - as we had the need to have incremental > > updates. The Indexer component we wrote does mapping from Lily/HBase schema > > to SOLR, as we also felt the need that both schemas shouldn't necessarily be > > identical. > > > > Steven. > > -- > > Steven Noels > > http://outerthought.org/ > > Scalable Smart Data > > Makers of Kauri, Daisy CMS and Lily > --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: Solr for noSQL
What are the advantages of using something like HBase over your standard Lucene index with Solr? It would seem to me like you'd be losing a lot of what Lucene has to offer!?! Adam On Jan 31, 2011, at 5:34 AM, Steven Noels wrote: > On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai wrote: > >> Hi, >> >> >> >> Do we have data import handler to fast read in data from noSQL database, >> specifically, MongoDB I am thinking to use? >> >> Or a more general question, how does Solr work with noSQL database? >> > > > Can't say anything about MongoDB, but we have an integration of SOLR with > HBase inside Lily - www.lilyproject.org. It indeed uses the 'normal' SOLR > index update API rather than a DIH - as we had the need to have incremental > updates. The Indexer component we wrote does mapping from Lily/HBase schema > to SOLR, as we also felt the need that both schemas shouldn't necessarily be > identical. > > Steven. > -- > Steven Noels > http://outerthought.org/ > Scalable Smart Data > Makers of Kauri, Daisy CMS and Lily
Re: Solr for noSQL
On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai wrote: > Hi, > > > > Do we have data import handler to fast read in data from noSQL database, > specifically, MongoDB I am thinking to use? > > Or a more general question, how does Solr work with noSQL database? > Can't say anything about MongoDB, but we have an integration of SOLR with HBase inside Lily - www.lilyproject.org. It indeed uses the 'normal' SOLR index update API rather than a DIH - as we had the need to have incremental updates. The Indexer component we wrote does mapping from Lily/HBase schema to SOLR, as we also felt the need that both schemas shouldn't necessarily be identical. Steven. -- Steven Noels http://outerthought.org/ Scalable Smart Data Makers of Kauri, Daisy CMS and Lily
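The mapping Steven describes, where the store's schema and the Solr schema need not be identical, can be sketched roughly as below. This is not Lily's Indexer: the field names (`record_id`, `title_t` and so on) and the `to_solr_doc` helper are made up for illustration, and Python is used only to keep the sketch short.

```python
# Hypothetical mapping from source-record fields to Solr field names.
# Neither side reflects Lily's real schema; both are illustrative.
FIELD_MAP = {
    "record_id": "id",
    "title": "title_t",
    "body": "text",
    "tags": "tag_s",
}


def to_solr_doc(record, field_map=FIELD_MAP):
    """Return a flat dict holding only the mapped, non-empty fields."""
    doc = {}
    for source_name, solr_name in field_map.items():
        value = record.get(source_name)
        if value is None or value == "" or value == []:
            continue  # skip fields the record does not carry
        doc[solr_name] = value
    return doc
```

Fields that are absent from the map never reach Solr, so the storage schema can change without forcing a matching change in the index.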
Re: Solr for noSQL
Personally, I just create a view that flattens out the database and renames the fields as I desire. Then I call the view with the DIH to import it.

Solr doesn't know anything about the database, except how to get a connection and fetch rows. And that's pretty darn useful: just that much less code to write.

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.

----- Original Message ----
From: Upayavira
To: solr-user@lucene.apache.org
Sent: Fri, January 28, 2011 1:41:42 AM
Subject: Re: Solr for noSQL

On Thu, 27 Jan 2011 21:38 -0800, "Dennis Gearon" wrote:
> Why not make one's own DIH handler, Lance?

Personally, I don't like that approach. Solr is best related to as something of a black box that you configure, then push content to. Having Solr know about your data sources, and pull content in seems to me to be mixing concerns.

I relate to the DIH as a useful tool for smaller sites or for prototyping, but would expect anything more substantial to require an indexing application that gives you full control over the indexing process. It could be a lightweight app that uses a MongoDB java client and SolrJ, and simply pulls from one and pushes to the other. If you don't want to run another JVM, it could run as a separate webapp within your Solr JVM.

From an architectural point of view, do you configure Mysql, or MongoDB for that matter, to pull content into itself? Likewise, Solr should be a service that listens, waiting to be given data.

Upayavira
---
Enterprise Search Consultant at Sourcesense UK,
Making Sense of Open Source
Re: Solr for noSQL
Have you tried indexing using HTTP POST? You just fetch your information or documents from your DB and store them in a variable, then loop over the POST for as many records as you have, and problem solved.

With this method it doesn't matter what kind of DB you are using...

On 1/28/11 7:43 AM, "Erick Erickson" wrote:

> I'll reply for Lance because I'm awake earlier ...
>
> To make your own DIH, you have to solve all the
> problems you'd have to solve to use a Java program
> connect to your datasource via JDBC, PLUS
> fit it into the DIH framework. Why do the extra work?
>
> The other thing is that writing your own code gives
> you much greater control over, say, error handling,
> exception handling, continue-or-abort decisions, etc.
> DIH is a good tool, don't get me wrong, but I prefer
> more control in production situations.
>
> Plus, connecting to Solr via SolrJ AND
> connecting to your database takes about 20 lines
> of code, it's not very complex. You can have that
> done pretty quickly...
>
> But if you'd rather make your own DIH, it's up to you.
>
> Best
> Erick
>
> On Fri, Jan 28, 2011 at 12:38 AM, Dennis Gearon wrote:
>
>> Why not make one's own DIH handler, Lance?
>>
>> Dennis Gearon
>>
>> Signature Warning
>> It is always a good idea to learn from your own mistakes. It is usually a
>> better idea to learn from others' mistakes, so you do not have to make them
>> yourself.
>> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>>
>> EARTH has a Right To Life,
>> otherwise we all die.
>>
>> ----- Original Message ----
>> From: Lance Norskog
>> To: solr-user@lucene.apache.org
>> Sent: Thu, January 27, 2011 9:33:25 PM
>> Subject: Re: Solr for noSQL
>>
>> There no special connectors available to read from the key-value
>> stores like memcache/cassandra/mongodb. You would have to get a Java
>> client library for the DB and code your own dataimporthandler
>> datasource. I cannot recommend this; you should make your own program
>> to read data and upload to Solr with one of the Solr client libraries.
>>
>> Lance
>>
>> On 1/27/11, Jianbin Dai wrote:
>>> Hi,
>>>
>>> Do we have data import handler to fast read in data from noSQL database,
>>> specifically, MongoDB I am thinking to use?
>>>
>>> Or a more general question, how does Solr work with noSQL database?
>>>
>>> Thanks.
>>>
>>> Jianbin
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
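The HTTP POST approach suggested in this message might look something like the following sketch. It assumes a Solr instance that accepts XML update messages at `http://localhost:8983/solr/update` (adjust for your install) and uses only the Python standard library. `build_add_xml` and `post_to_solr` are illustrative names, not part of Solr.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed endpoint; the path differs between Solr versions and core setups.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_xml(docs):
    """Build a Solr <add> message from an iterable of flat dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            # A list becomes several <field> elements (multi-valued field).
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)
    return ET.tostring(add, encoding="unicode")


def post_to_solr(xml_payload, url=SOLR_UPDATE_URL):
    """POST one XML update message and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A caller would send `post_to_solr(build_add_xml(rows))` for each batch of rows fetched from the database, then finish with `post_to_solr("<commit/>")` so the new documents become searchable. Posting several documents per request is usually kinder to Solr than one request per record.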
Re: Solr for noSQL
I'll reply for Lance because I'm awake earlier ...

To make your own DIH, you have to solve all the problems you'd have to solve to have a Java program connect to your datasource via JDBC, PLUS fit it into the DIH framework. Why do the extra work?

The other thing is that writing your own code gives you much greater control over, say, error handling, exception handling, continue-or-abort decisions, etc. DIH is a good tool, don't get me wrong, but I prefer more control in production situations.

Plus, connecting to Solr via SolrJ AND connecting to your database takes about 20 lines of code; it's not very complex. You can have that done pretty quickly...

But if you'd rather make your own DIH, it's up to you.

Best
Erick

On Fri, Jan 28, 2011 at 12:38 AM, Dennis Gearon wrote:

> Why not make one's own DIH handler, Lance?
>
> Dennis Gearon
>
> Signature Warning
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
> EARTH has a Right To Life,
> otherwise we all die.
>
> ----- Original Message ----
> From: Lance Norskog
> To: solr-user@lucene.apache.org
> Sent: Thu, January 27, 2011 9:33:25 PM
> Subject: Re: Solr for noSQL
>
> There no special connectors available to read from the key-value
> stores like memcache/cassandra/mongodb. You would have to get a Java
> client library for the DB and code your own dataimporthandler
> datasource. I cannot recommend this; you should make your own program
> to read data and upload to Solr with one of the Solr client libraries.
>
> Lance
>
> On 1/27/11, Jianbin Dai wrote:
> > Hi,
> >
> > Do we have data import handler to fast read in data from noSQL database,
> > specifically, MongoDB I am thinking to use?
> >
> > Or a more general question, how does Solr work with noSQL database?
> >
> > Thanks.
> >
> > Jianbin
>
> --
> Lance Norskog
> goks...@gmail.com
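The kind of control Erick mentions (error handling and continue-or-abort decisions) is easier to see in code. Here is a rough sketch, in Python for brevity rather than the Java/SolrJ he has in mind. `send_batch` stands in for whatever client call actually uploads documents, and `max_failures` is an invented policy knob.

```python
def index_batches(batches, send_batch, max_failures=3):
    """Send each batch; skip failed ones until max_failures is exceeded.

    Returns (sent, failed): the number of batches sent and a list of
    (batch_index, error_message) pairs for the batches that failed.
    """
    sent = 0
    failed = []
    for index, batch in enumerate(batches):
        try:
            send_batch(batch)
            sent += 1
        except Exception as exc:  # a real indexer would catch narrower errors
            failed.append((index, str(exc)))
            if len(failed) > max_failures:
                # Abort: this many failures suggests a systemic problem.
                raise RuntimeError(
                    f"aborting after {len(failed)} failed batches"
                ) from exc
    return sent, failed
```

The caller decides what to do with the `failed` list: retry, log, or write the batch indexes somewhere for a later run. That decision is the part that is hard to express inside a DIH configuration.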
Re: Solr for noSQL
On Thu, 27 Jan 2011 21:38 -0800, "Dennis Gearon" wrote:
> Why not make one's own DIH handler, Lance?

Personally, I don't like that approach. Solr is best treated as something of a black box that you configure, then push content to. Having Solr know about your data sources and pull content in seems to me to be mixing concerns.

I see the DIH as a useful tool for smaller sites or for prototyping, but would expect anything more substantial to require an indexing application that gives you full control over the indexing process. It could be a lightweight app that uses a MongoDB Java client and SolrJ, and simply pulls from one and pushes to the other. If you don't want to run another JVM, it could run as a separate webapp within your Solr JVM.

From an architectural point of view, do you configure MySQL, or MongoDB for that matter, to pull content into itself? Likewise, Solr should be a service that listens, waiting to be given data.

Upayavira
---
Enterprise Search Consultant at Sourcesense UK,
Making Sense of Open Source
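The lightweight pull-and-push indexer described above could be shaped roughly like this. The sketch is in Python rather than Java/SolrJ, and it deliberately takes the source and the sink as plain arguments: `source` is any iterable of dicts (for example a MongoDB cursor) and `push` is any callable that uploads a list of documents to Solr. Mapping MongoDB's `_id` onto a Solr `id` field is an assumption about the target schema.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def mongo_to_solr(mongo_doc):
    """Copy a document, turning `_id` into a string `id` field."""
    doc = {key: value for key, value in mongo_doc.items() if key != "_id"}
    doc["id"] = str(mongo_doc["_id"])
    return doc


def run_indexer(source, push, batch_size=500):
    """Pull documents from `source` and push them to Solr in batches."""
    total = 0
    for batch in batched(source, batch_size):
        push([mongo_to_solr(doc) for doc in batch])
        total += len(batch)
    return total
```

Because neither side is hard-wired, the same loop can be exercised with a plain list and `print` before any database or Solr instance is involved, which fits the "Solr just listens" view: all knowledge of the data source stays in the indexer.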
Re: Solr for noSQL
On Fri, Jan 28, 2011 at 6:00 AM, Jianbin Dai wrote: [...] > Do we have data import handler to fast read in data from noSQL database, > specifically, MongoDB I am thinking to use? [...] Have you tried the links that a Google search turns up? Some of them look like pretty good prospects. Regards, Gora
Re: Solr for noSQL
Do we have any performance measurements? Would it be much slower than other DIH data sources?

> There no special connectors available to read from the key-value
> stores like memcache/cassandra/mongodb. You would have to get a Java
> client library for the DB and code your own dataimporthandler
> datasource. I cannot recommend this; you should make your own program
> to read data and upload to Solr with one of the Solr client libraries.
>
> Lance
>
> On 1/27/11, Jianbin Dai wrote:
> > Hi,
> >
> > Do we have data import handler to fast read in data from noSQL database,
> > specifically, MongoDB I am thinking to use?
> >
> > Or a more general question, how does Solr work with noSQL database?
> >
> > Thanks.
> >
> > Jianbin
>
> --
> Lance Norskog
> goks...@gmail.com
Re: Solr for noSQL
Why not make one's own DIH handler, Lance? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Lance Norskog To: solr-user@lucene.apache.org Sent: Thu, January 27, 2011 9:33:25 PM Subject: Re: Solr for noSQL There no special connectors available to read from the key-value stores like memcache/cassandra/mongodb. You would have to get a Java client library for the DB and code your own dataimporthandler datasource. I cannot recommend this; you should make your own program to read data and upload to Solr with one of the Solr client libraries. Lance On 1/27/11, Jianbin Dai wrote: > Hi, > > > > Do we have data import handler to fast read in data from noSQL database, > specifically, MongoDB I am thinking to use? > > Or a more general question, how does Solr work with noSQL database? > > Thanks. > > > > Jianbin > > > > -- Lance Norskog goks...@gmail.com
Re: Solr for noSQL
There are no special connectors available to read from key-value stores like memcache/cassandra/mongodb. You would have to get a Java client library for the DB and code your own DataImportHandler datasource. I cannot recommend this; you should write your own program to read the data and upload it to Solr with one of the Solr client libraries.

Lance

On 1/27/11, Jianbin Dai wrote:
> Hi,
>
> Do we have data import handler to fast read in data from noSQL database,
> specifically, MongoDB I am thinking to use?
>
> Or a more general question, how does Solr work with noSQL database?
>
> Thanks.
>
> Jianbin

--
Lance Norskog
goks...@gmail.com