RE: Is Solr right for my business situation ?
Recent versions support sharding and handle distributing your query and merging the result sets. The problem is that sharding won't help you join across separate `tables`. The fields you query need to be present in each shard, or you'll end up with an HTTP 400 - undefined field error. Indeed, there is no escape.

-Original message-
From: Sharma, Raghvendra
Sent: Thu 30-09-2010 20:07
To: solr-user@lucene.apache.org;
Subject: RE: Is Solr right for my business situation ?

Thanks for the ideas. I think after reading enough documentation and articles around Solr and XML indexing in general, I have come around to understanding that there is no escaping denormalization.

However, one tiny thought remains... perhaps my last shot at avoiding denormalization (of course it's going to be a costly affair). I was reading about how Solr can handle multiple cores and therefore multiple indexes. Can there be a single search interface sending queries to these three cores? In that case, who would do the load balancing and the merging of the results? And would I be running three instances of Solr on my system(s), or can a single one handle that?
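To make the sharding reply concrete: a distributed Solr request names the shards to query through the `shards` parameter, and the node that receives the request merges the results. The sketch below only builds such a URL. The host names, ports, and the `owner_name` field are made-up placeholders, and it assumes every shard defines the queried field (otherwise the undefined-field error mentioned in the reply applies).

```python
from urllib.parse import urlencode


def build_distributed_query(base_url, shards, query, rows=10):
    """Build a Solr select URL that fans one query out to several shards."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated shard addresses, written without the http:// scheme.
        "shards": ",".join(shards),
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical hosts and field name, for illustration only.
url = build_distributed_query(
    "http://localhost:8983/solr",
    ["localhost:8983/solr", "localhost:7574/solr"],
    "owner_name:smith",
)
print(url)
```

Note that this distributes one query over shards that share a schema; it does not join documents that live in different shards.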
RE: Is Solr right for my business situation ?
Thanks for the ideas. I think after reading enough documentation and articles around Solr and XML indexing in general, I have come around to understanding that there is no escaping denormalization.

However, one tiny thought remains... perhaps my last shot at avoiding denormalization (of course it's going to be a costly affair). I was reading about how Solr can handle multiple cores and therefore multiple indexes. Can there be a single search interface sending queries to these three cores? In that case, who would do the load balancing and the merging of the results? And would I be running three instances of Solr on my system(s), or can a single one handle that?

-Original Message-
From: Dennis Gearon [mailto:gear...@sbcglobal.net]
Sent: Thursday, September 30, 2010 9:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

You need to be able to query the database with the 'Mother of all Queries', i.e. one that completely flattens all tables into each row. In other words, the JOIN section of the query will have EVERY table in it, and depending on your schema, some of them twice or more.

Trying to do that with separate CSV tables would require you to load those into your OWN database, then query against that, as above.

Dennis Gearon
RE: Is Solr right for my business situation ?
You need to be able to query the database with the 'Mother of all Queries', i.e. one that completely flattens all tables into each row. In other words, the JOIN section of the query will have EVERY table in it, and depending on your schema, some of them twice or more.

Trying to do that with separate CSV tables would require you to load those into your OWN database, then query against that, as above.

Dennis Gearon

Signature Warning
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Wed, 9/29/10, Sharma, Raghvendra wrote:

> From: Sharma, Raghvendra
> Subject: RE: Is Solr right for my business situation ?
> To: "solr-user@lucene.apache.org"
> Date: Wednesday, September 29, 2010, 9:40 AM
>
> Some questions.
>
> 1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but I am not sure what the direction is for handling multiple table structures. Would it be one big XML, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable?
>
> My source provides me a single flat file per table (tab delimited).
>
> Do you think having multiple indexes could be a solution for this case? Or do I really need to spend effort in denormalizing the data?
>
> 2. Further, loading into Solr can use some perf tuning. Any tips? Best practices?
>
> 3. Also, is there a way to specify an XSLT at the server side and make it the default, i.e. whenever a response is returned, that XSLT is applied to the response automatically?
>
> 4. And last question for the day :) There was one post saying that the spatial support is really basic in Solr and is going to be improved in the next versions. Can you people help me get a definitive yes or no on spatial support? In its current form, does it work or not? I would store lat and long, and would need to make them searchable.
>
> --raghav
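The "load the tables into your own database, then flatten them with one big JOIN" advice can be sketched with Python's built-in sqlite3 module. Everything about the schema here (the `property`, `owner`, and `sale` tables and their columns) is invented for illustration; the real tables in this thread are not described. The point is only that each row of the flattened result is what would become one search document.

```python
import sqlite3

# In-memory database standing in for "your OWN database"; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE property (id INTEGER PRIMARY KEY, address TEXT, owner_id INTEGER);
    CREATE TABLE sale (id INTEGER PRIMARY KEY, property_id INTEGER, price INTEGER);
    INSERT INTO owner VALUES (1, 'A. Smith');
    INSERT INTO property VALUES (10, '1 Main St', 1);
    INSERT INTO sale VALUES (100, 10, 250000);
    INSERT INTO sale VALUES (101, 10, 310000);
""")


def flattened_rows(connection):
    """Join every table into one wide row per sale and return the rows as dicts."""
    cursor = connection.execute("""
        SELECT s.id AS id, p.address AS address, o.name AS owner_name, s.price AS price
        FROM sale s
        JOIN property p ON p.id = s.property_id
        JOIN owner o ON o.id = p.owner_id
        ORDER BY s.id
    """)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


docs = flattened_rows(conn)
for doc in docs:
    print(doc)
```

The owner and address values repeat on every sale row. That repetition is the denormalization being discussed: storage is traded for the ability to search without joins.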
Re: Is Solr right for my business situation ?
Some of these are big questions; try them in separate emails.

On Wed, Sep 29, 2010 at 9:40 AM, Sharma, Raghvendra wrote:

> Some questions.
>
> 1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but I am not sure what the direction is for handling multiple table structures. Would it be one big XML, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable?
>
> My source provides me a single flat file per table (tab delimited).
>
> Do you think having multiple indexes could be a solution for this case? Or do I really need to spend effort in denormalizing the data?
>
> 2. Further, loading into Solr can use some perf tuning. Any tips? Best practices?
>
> 3. Also, is there a way to specify an XSLT at the server side and make it the default, i.e. whenever a response is returned, that XSLT is applied to the response automatically?
>
> 4. And last question for the day :) There was one post saying that the spatial support is really basic in Solr and is going to be improved in the next versions. Can you people help me get a definitive yes or no on spatial support? In its current form, does it work or not? I would store lat and long, and would need to make them searchable.
>
> --raghav
Re: Is Solr right for my business situation ?
If at all possible, denormalize the data. Any time you find yourself trying to make Solr behave like a database, the probability is high that you're misusing Solr or the DB.

Best
Erick

On Wed, Sep 29, 2010 at 12:40 PM, Sharma, Raghvendra <sraghven...@corelogic.com> wrote:

> Some questions.
>
> 1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but I am not sure what the direction is for handling multiple table structures. Would it be one big XML, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable?
>
> My source provides me a single flat file per table (tab delimited).
>
> Do you think having multiple indexes could be a solution for this case? Or do I really need to spend effort in denormalizing the data?
>
> 2. Further, loading into Solr can use some perf tuning. Any tips? Best practices?
>
> 3. Also, is there a way to specify an XSLT at the server side and make it the default, i.e. whenever a response is returned, that XSLT is applied to the response automatically?
>
> 4. And last question for the day :) There was one post saying that the spatial support is really basic in Solr and is going to be improved in the next versions. Can you people help me get a definitive yes or no on spatial support? In its current form, does it work or not? I would store lat and long, and would need to make them searchable.
>
> --raghav
RE: Is Solr right for my business situation ?
Some questions.

1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but I am not sure what the direction is for handling multiple table structures. Would it be one big XML, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable?

My source provides me a single flat file per table (tab delimited).

Do you think having multiple indexes could be a solution for this case? Or do I really need to spend effort in denormalizing the data?

2. Further, loading into Solr can use some perf tuning. Any tips? Best practices?

3. Also, is there a way to specify an XSLT at the server side and make it the default, i.e. whenever a response is returned, that XSLT is applied to the response automatically?

4. And last question for the day :) There was one post saying that the spatial support is really basic in Solr and is going to be improved in the next versions. Can you people help me get a definitive yes or no on spatial support? In its current form, does it work or not? I would store lat and long, and would need to make them searchable.

--raghav

-Original Message-
From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com]
Sent: Tuesday, September 28, 2010 11:45 AM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

Thanks for the responses, people.

@Grant

1. Can you show me some direction on that, i.e. loading data from an incoming stream? Do I need some third-party tools, or do I need to build something myself?

4. I am basically attempting to build a very fast search interface for the existing data. The volume I mentioned is more of a static one (the data is already there). The SQL statements I mentioned are daily updates coming in. The good thing is that there is no history, so the overall volume is not growing, but I need to apply the update statements.

One workaround I had in mind (though its performance is not so great) is to apply the updates to a copy of the RDBMS, and then feed the RDBMS extract to Solr. Sounds like overkill, but I don't have another idea right now. Perhaps business discussions would yield something.

@All -

Some more questions, guys.

1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but I am not sure what the direction is for handling multiple table structures. Would it be one big XML, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable?

My source provides me a single flat file per table (tab delimited).

2. Further, loading into Solr can use some perf tuning. Any tips? Best practices?

3. Also, is there a way to specify an XSLT at the server side and make it the default, i.e. whenever a response is returned, that XSLT is applied to the response automatically?

4. And last question for the day :) There was one post saying that the spatial support is really basic in Solr and is going to be improved in the next versions. Can you people help me get a definitive yes or no on spatial support? In its current form, does it work or not? I would store lat and long, and would need to make them searchable.

Looks like I'm close to my solution.. :)

--raghav
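For the "one tab-delimited flat file per table" situation raised in question 1, denormalizing does not strictly need a database for small inputs. The sketch below joins two tab-delimited tables in memory with Python's csv module. The file contents, column names, and the choice of a property/owner relationship are all assumptions made up for the example; for 100+ million rows a database or a streaming approach would be more realistic.

```python
import csv
import io

# Hypothetical file contents; real files would be opened from disk instead.
OWNERS_TSV = "owner_id\tname\n1\tA. Smith\n2\tB. Jones\n"
PROPERTIES_TSV = "property_id\towner_id\taddress\n10\t1\t1 Main St\n11\t2\t5 Oak Ave\n"


def read_tsv(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def denormalize(properties, owners):
    """Attach owner columns to each property row, producing one flat record per property."""
    owners_by_id = {owner["owner_id"]: owner for owner in owners}
    flat = []
    for prop in properties:
        owner = owners_by_id.get(prop["owner_id"], {})
        flat.append({
            "id": prop["property_id"],
            "address": prop["address"],
            "owner_name": owner.get("name"),  # None when no matching owner exists
        })
    return flat


flat_docs = denormalize(read_tsv(PROPERTIES_TSV), read_tsv(OWNERS_TSV))
for doc in flat_docs:
    print(doc)
```

All values stay strings because csv does not infer types; any numeric or date conversion would be a separate step before indexing.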
RE: Is Solr right for my business situation ?
"Staging" the data in a non-Solr store sounds like a potentially reasonable idea to me. You might want to consider a NoSQL store of some kind, like MongoDB perhaps, instead of an rdbms.

The way to think about Solr is not as a store or a database -- it's an index for serving your application. That's also the way to think about how to get your multiple tables in there -- denormalize, denormalize, denormalize. You need to think about what you actually need to search over, and build your index to serve that efficiently, rather than thinking about normalization or data modelling the way we are used to with an rdbms. It's a different way of thinking.

A Solr index basically gives you one collection of documents. But the documents can all have different fields -- so you _could_ (but probably don't want to) essentially put all your tables in there with unique fields -- they're all in the same index, they're all just "documents", but some have a table1_title and table1_author, and others have no data in those fields but a table2_productName and a table2_price. Then if you want to query on just one type of thing, you just query on those fields.

Except... you don't get any joins. Which is why you probably don't want to do that after all; it probably won't serve your needs.

Figuring out the right way to model your data in Solr can be tricky, and it is sometimes hard to do exactly what you want. Solr isn't an rdbms, and in some ways isn't as powerful as an rdbms -- in the sense of being as flexible with what kinds of queries you can run on any given data. What it does is give you very fast access to inverted index lookups, set combinations, and faceting that would be very hard to do efficiently in an rdbms. It is a trade-off.

But there's not really a general answer to "how do I take these dozen rdbms tables and store them in Solr the best way?" -- it depends on what kinds of searching you need to support and the nature of your data.
From: Sharma, Raghvendra [sraghven...@corelogic.com]
Sent: Tuesday, September 28, 2010 2:15 AM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

Thanks for the responses, people.

@Grant

1. Can you show me some direction on that, i.e. loading data from an incoming stream? Do I need some third-party tools, or do I need to build something myself?

4. I am basically attempting to build a very fast search interface for the existing data. The volume I mentioned is more of a static one (the data is already there). The SQL statements I mentioned are daily updates coming in. The good thing is that there is no history, so the overall volume is not growing, but I need to apply the update statements.

One workaround I had in mind (though its performance is not so great) is to apply the updates to a copy of the RDBMS, and then feed the RDBMS extract to Solr. Sounds like overkill, but I don't have another idea right now. Perhaps business discussions would yield something.

@All -

Some more questions, guys.

1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but I am not sure what the direction is for handling multiple table structures. Would it be one big XML, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable?

My source provides me a single flat file per table (tab delimited).

2. Further, loading into Solr can use some perf tuning. Any tips? Best practices?

3. Also, is there a way to specify an XSLT at the server side and make it the default, i.e. whenever a response is returned, that XSLT is applied to the response automatically?

4. And last question for the day :) There was one post saying that the spatial support is really basic in Solr and is going to be improved in the next versions. Can you people help me get a definitive yes or no on spatial support? In its current form, does it work or not? I would store lat and long, and would need to make them searchable.

Looks like I'm close to my solution.. :)

--raghav
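The "all tables in one index, each with its own prefixed fields" idea from the reply in this message can be pictured with a toy in-memory stand-in. This is plain Python, not the Solr API: the list, the `search` helper, and the sample values are invented for illustration, and only the field names `table1_title`, `table1_author`, `table2_productName`, and `table2_price` come from the reply.

```python
# One "index" holding documents that originated from two different tables.
index = [
    {"id": "t1-1", "table1_title": "Solr in Practice", "table1_author": "Lee"},
    {"id": "t1-2", "table1_title": "Search Basics", "table1_author": "Kim"},
    {"id": "t2-1", "table2_productName": "Widget", "table2_price": 9.99},
]


def search(documents, field, value):
    """Return documents whose field equals value; documents lacking the field never match."""
    return [doc for doc in documents if doc.get(field) == value]


books_by_lee = search(index, "table1_author", "Lee")
widgets = search(index, "table2_productName", "Widget")
print(books_by_lee)
print(widgets)
```

Each lookup only ever sees one kind of document. Nothing here relates a `table1_*` document to a `table2_*` document, which is the "no joins" limitation the reply warns about.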
RE: Is Solr right for my business situation ?
Thanks for the responses, people.

@Grant

1. Can you show me some direction on that, i.e. loading data from an incoming stream? Do I need some third-party tools, or do I need to build something myself?

4. I am basically attempting to build a very fast search interface for the existing data. The volume I mentioned is more of a static one (the data is already there). The SQL statements I mentioned are the daily updates coming in. The good thing is that there is no history, so the overall volume is not growing, but I need to apply the update statements. One workaround I had in mind (though not great for performance) is to apply the updates to a copy of the RDBMS, and then feed the RDBMS extract to Solr. Sounds like overkill, but I don't have another idea right now. Perhaps business discussions would yield something.

@All - Some more questions, guys.

1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, but I am not sure about the direction for handling multiple table structures. Would it be one big XML, wherein those three tables (assuming it's three) would show up as three different tag-trees, nullable?

My source provides me a single flat file per table (tab delimited).

2. Further, loading into Solr can use some perf tuning. Any tips? Best practices?

3. Also, is there a way to specify an XSLT at the server side and make it the default, i.e. whenever a response is returned, that XSLT is applied to the response automatically?

4. And last question for the day :) There was one post saying that the spatial support is really basic in Solr and is going to be improved in the next versions. Can you help me get a definitive yes or no on spatial support? In the current form, does it work or not? I would store lat and long, and would need to make them searchable...

Looks like I'm close to my solution..
:) --raghav -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, September 28, 2010 1:05 AM To: solr-user@lucene.apache.org Subject: Re: Is Solr right for my business situation ? Inline. On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote: > When do you need to deploy? > > As I understand it, the spatial search in Solr is being rewritten and is > slated for Solr 4.0, the release after next. It will be in 3.x, the next release > > The existing spatial search has some serious problems and is deprecated. > > Right now, I think the only way to get spatial search in Solr is to deploy a > nightly snapshot from the active development on trunk. If you are deploying a > year from now, that might change. > > There is not any support for SQL-like statements or for joins. The best > practice for Solr is to think of your data as a single table, essentially > creating a view from your database. The rows become Solr documents, the > columns become Solr fields. There is now group-by capabilities in trunk as well, which may or may not help. > > wunder > > On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote: > >> I am sure these kind of questions keep coming to you guys, but I want to >> raise the same question in a different context...my own business situation. >> I am very very new to solr and though I have tried to read through the >> documentation, I have nowhere near completing the whole read. >> >> The need is like this - >> >> We have a huge rdbms database/table. A single table perhaps houses 100+ >> million rows. Though oracle is doing a fine job of handling the insertion >> and updation of data, the querying is where our main concerns lie. Since we >> have spatial data, the index building takes hours and hours for such tables. >> >> That's when we thought of moving away from standard rdbms and thought of >> trying something different and fast. 
>> My last week has been spent in a journey reading through bigtable to hadoop >> to hbase, to hive and then finally landed on solr. As far as I am in my >> tests, it looks pretty good, but I have a few unanswered questions still. >> Trying this group for them :) (I am sure I can find some answers if I >> read/google more on the topic, but now I m being lazy and feel asking the >> people who are already using it/or perhaps developing it is a better bet). >> >> 1. Can I get my solr instance to load data (fresh data for indexing) from a >> stream (imagine a mq kind of queue, or similar) ? Yes, with a little bit of work. >> 2. Can I host my solr instance to use hbase as the database/file system >> (read HDFS) ? Probably, but I doubt it will be fast. Local disk is usually the best. 100+ M rows is large but not unreasonable. >> 3. are there somewhere any reports available (as in benchmarks ) for a
Re: Is Solr right for my business situation ?
Ah, I totally overlooked that news: spatial search in 3.x! :-D :-D Any idea yet when this will be released? Awesome to hear that it has been moved forward! :)

-- View this message in context: http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592448.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is Solr right for my business situation ?
Wow, that is a relief! I was going to have to look at ElasticSearch instead.

Dennis Gearon

Signature Warning
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Mon, 9/27/10, Grant Ingersoll wrote:

> From: Grant Ingersoll
> Subject: Re: Is Solr right for my business situation ?
> To: solr-user@lucene.apache.org
> Date: Monday, September 27, 2010, 12:35 PM
>
> Inline.
>
> On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:
>
> > When do you need to deploy?
> >
> > As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next.
>
> It will be in 3.x, the next release.
>
> > The existing spatial search has some serious problems and is deprecated.
> >
> > Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk. If you are deploying a year from now, that might change.
> >
> > There is not any support for SQL-like statements or for joins. The best practice for Solr is to think of your data as a single table, essentially creating a view from your database. The rows become Solr documents, the columns become Solr fields.
>
> There are now group-by capabilities in trunk as well, which may or may not help.
>
> > wunder
> >
> > On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
> >
> >> I am sure these kinds of questions keep coming to you guys, but I want to raise the same question in a different context...my own business situation. I am very very new to solr and though I have tried to read through the documentation, I am nowhere near completing the whole read.
> >>
> >> The need is like this -
> >>
> >> We have a huge rdbms database/table. A single table perhaps houses 100+ million rows. Though oracle is doing a fine job of handling the insertion and updation of data, the querying is where our main concerns lie. Since we have spatial data, the index building takes hours and hours for such tables.
> >>
> >> That's when we thought of moving away from standard rdbms and thought of trying something different and fast. My last week has been spent in a journey reading through bigtable to hadoop to hbase, to hive and then finally landed on solr. As far as I am in my tests, it looks pretty good, but I have a few unanswered questions still. Trying this group for them :) (I am sure I can find some answers if I read/google more on the topic, but now I'm being lazy and feel asking the people who are already using it/or perhaps developing it is a better bet).
> >>
> >> 1. Can I get my solr instance to load data (fresh data for indexing) from a stream (imagine a mq kind of queue, or similar) ?
>
> Yes, with a little bit of work.
>
> >> 2. Can I host my solr instance to use hbase as the database/file system (read HDFS) ?
>
> Probably, but I doubt it will be fast. Local disk is usually the best. 100+ M rows is large but not unreasonable.
>
> >> 3. are there somewhere any reports available (as in benchmarks ) for a solr instance's performance ?
>
> You can probably search the web for these. I've personally seen several installs w/ 1B+ docs and subsecond search and faceting and heard of others. You might look at the stuff the Hathi trust has put up.
>
> >> 4. are there any APIs available which might help me apply ANSI sql kind of statements to my solr data ?
>
> No. Question back: what kinds of things are you trying to do?
>
> >> It would be great if people could help share their experience in the area... if it's too much trouble writing all of it, perhaps url would be easier... I welcome all kinds of help here... any advice/suggestions are good ...
> >>
> >> Looking forward to your viewpoints..
> >>
> >> --raghav..
> >>
> >> **
> >> This message may contain confidential or proprietary information intended only for the use of the addressee(s) named above or may contain information that is legally privileged. If you are not the intended addressee, or the person responsible for delivering it to the intended addressee, you are hereby notified that reading, disseminating, distributing or copying this message is strictly prohibited. If you have received this message by mistake, please immediately notify us by replying to the message and delete the original message and any copies immediately thereafter.
> >>
> >> Thank you.
> >> **
> >> CLLD
>
> --
> Grant Ingersoll
> http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
Re: Is Solr right for my business situation ?
@Walter Underwood:

Walter Underwood wrote:
> Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk.

Could you give me the link to this trunk? I need it very much! Thanks!

-- View this message in context: http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592330.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is Solr right for my business situation ?
Right, I know. I was curious about its current closeness to being in the main distro, not a patch. Among other things, when those who know better decide it goes in the core distro, that makes me more comfortable that they've decided it works acceptably, and also makes me more comfortable that it will continue to be supported in _future_ versions without someone having to prepare a new patch.

Ravi Julapalli wrote:
> Hi Jonathan,
>
> Field collapsing is available in 1.4 by applying patch https://issues.apache.org/jira/browse/SOLR-236
>
> -Ravi
>
> From: Jonathan Rochkind
> To: "solr-user@lucene.apache.org"
> Sent: Mon, September 27, 2010 9:18:20 PM
> Subject: Re: Is Solr right for my business situation ?
>
> Grant Ingersoll wrote:
>> There are now group-by capabilities in trunk as well, which may or may not help.
>
> Really, the field collapsing stuff has been committed to trunk finally? Or are you talking about something else? If it's the field collapsing stuff, and it's been committed to trunk, does that mean it'll be in the 3.0 release?
>
> Jonathan
Re: Is Solr right for my business situation ?
Hi Jonathan,

Field collapsing is available in 1.4 by applying patch https://issues.apache.org/jira/browse/SOLR-236

-Ravi

From: Jonathan Rochkind
To: "solr-user@lucene.apache.org"
Sent: Mon, September 27, 2010 9:18:20 PM
Subject: Re: Is Solr right for my business situation ?

> Grant Ingersoll wrote:
>> There are now group-by capabilities in trunk as well, which may or may not help.
>
> Really, the field collapsing stuff has been committed to trunk finally? Or are you talking about something else? If it's the field collapsing stuff, and it's been committed to trunk, does that mean it'll be in the 3.0 release?
>
> Jonathan
Re: Is Solr right for my business situation ?
Grant Ingersoll wrote:
> There are now group-by capabilities in trunk as well, which may or may not help.

Really, the field collapsing stuff has been committed to trunk finally? Or are you talking about something else? If it's the field collapsing stuff, and it's been committed to trunk, does that mean it'll be in the 3.0 release?

Jonathan
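For anyone following along who hasn't met the term: field collapsing (the group-by capability mentioned above) roughly means grouping hits that share a field value and keeping only the best few per group. The sketch below only illustrates that idea in plain Python. It is not Solr's implementation or its API, and the `owner` and `score` fields are made up for the example.

```python
def collapse(hits, field, per_group=1):
    """Keep the per_group best-scoring hits for each distinct value of field."""
    groups = {}
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        bucket = groups.setdefault(hit[field], [])
        if len(bucket) < per_group:
            bucket.append(hit)
    # Groups come out ordered by their best hit (dicts preserve insertion order).
    return [hit for bucket in groups.values() for hit in bucket]


# Hypothetical search hits: two share the same owner.
hits = [
    {"id": 1, "owner": "alice", "score": 0.9},
    {"id": 2, "owner": "alice", "score": 0.8},
    {"id": 3, "owner": "bob", "score": 0.7},
]
collapsed = collapse(hits, "owner")
```

With `per_group=1` each owner appears once in `collapsed`; raising `per_group` lets more hits per owner through.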
Re: Is Solr right for my business situation ?
Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

> When do you need to deploy?
>
> As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next.

It will be in 3.x, the next release.

> The existing spatial search has some serious problems and is deprecated.
>
> Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk. If you are deploying a year from now, that might change.
>
> There is not any support for SQL-like statements or for joins. The best practice for Solr is to think of your data as a single table, essentially creating a view from your database. The rows become Solr documents, the columns become Solr fields.

There are now group-by capabilities in trunk as well, which may or may not help.

> wunder
>
> On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
>
>> I am sure these kinds of questions keep coming to you guys, but I want to raise the same question in a different context...my own business situation. I am very very new to solr and though I have tried to read through the documentation, I am nowhere near completing the whole read.
>>
>> The need is like this -
>>
>> We have a huge rdbms database/table. A single table perhaps houses 100+ million rows. Though oracle is doing a fine job of handling the insertion and updation of data, the querying is where our main concerns lie. Since we have spatial data, the index building takes hours and hours for such tables.
>>
>> That's when we thought of moving away from standard rdbms and thought of trying something different and fast. My last week has been spent in a journey reading through bigtable to hadoop to hbase, to hive and then finally landed on solr. As far as I am in my tests, it looks pretty good, but I have a few unanswered questions still. Trying this group for them :) (I am sure I can find some answers if I read/google more on the topic, but now I'm being lazy and feel asking the people who are already using it/or perhaps developing it is a better bet).
>>
>> 1. Can I get my solr instance to load data (fresh data for indexing) from a stream (imagine a mq kind of queue, or similar) ?

Yes, with a little bit of work.

>> 2. Can I host my solr instance to use hbase as the database/file system (read HDFS) ?

Probably, but I doubt it will be fast. Local disk is usually the best. 100+ M rows is large but not unreasonable.

>> 3. are there somewhere any reports available (as in benchmarks ) for a solr instance's performance ?

You can probably search the web for these. I've personally seen several installs w/ 1B+ docs and subsecond search and faceting and heard of others. You might look at the stuff the Hathi trust has put up.

>> 4. are there any APIs available which might help me apply ANSI sql kind of statements to my solr data ?

No. Question back: what kinds of things are you trying to do?

>> It would be great if people could help share their experience in the area... if it's too much trouble writing all of it, perhaps url would be easier... I welcome all kinds of help here... any advice/suggestions are good ...
>>
>> Looking forward to your viewpoints..
>>
>> --raghav..

--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
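On question 1 (loading from a stream), the "little bit of work" could look roughly like a consumer that drains the queue, batches the records, and renders each batch as a Solr XML `<add>` message for the update handler. This is only a sketch under assumptions: the stdlib `queue.Queue` stands in for whatever MQ is actually used, the `id` and `name` fields are hypothetical, and the HTTP POST itself is left out.

```python
import queue
from xml.sax.saxutils import escape, quoteattr


def to_add_xml(records):
    """Render a batch of dicts as one Solr XML <add> message."""
    docs = []
    for rec in records:
        fields = "".join(
            f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
            for name, value in rec.items()
        )
        docs.append(f"<doc>{fields}</doc>")
    return "<add>" + "".join(docs) + "</add>"


def drain_in_batches(q, batch_size):
    """Yield lists of up to batch_size records until the queue is empty."""
    batch = []
    while True:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Stand-in for a real message queue, filled with hypothetical records.
incoming = queue.Queue()
for i in range(3):
    incoming.put({"id": i, "name": f"parcel & lot {i}"})

messages = [to_add_xml(b) for b in drain_in_batches(incoming, batch_size=2)]
# Each message would then be POSTed to the update handler and followed by a
# <commit/>; that network step is deliberately omitted from this sketch.
```

Batching matters here mostly because sending one document per request tends to be much slower than sending a few hundred or thousand at a time; the right batch size is something to measure rather than guess.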
Re: Is Solr right for my business situation ?
When do you need to deploy?

As I understand it, the spatial search in Solr is being rewritten and is slated for Solr 4.0, the release after next.

The existing spatial search has some serious problems and is deprecated.

Right now, I think the only way to get spatial search in Solr is to deploy a nightly snapshot from the active development on trunk. If you are deploying a year from now, that might change.

There is not any support for SQL-like statements or for joins. The best practice for Solr is to think of your data as a single table, essentially creating a view from your database. The rows become Solr documents, the columns become Solr fields.

wunder

On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:

> I am sure these kinds of questions keep coming to you guys, but I want to raise the same question in a different context...my own business situation. I am very very new to solr and though I have tried to read through the documentation, I am nowhere near completing the whole read.
>
> The need is like this -
>
> We have a huge rdbms database/table. A single table perhaps houses 100+ million rows. Though oracle is doing a fine job of handling the insertion and updation of data, the querying is where our main concerns lie. Since we have spatial data, the index building takes hours and hours for such tables.
>
> That's when we thought of moving away from standard rdbms and thought of trying something different and fast. My last week has been spent in a journey reading through bigtable to hadoop to hbase, to hive and then finally landed on solr. As far as I am in my tests, it looks pretty good, but I have a few unanswered questions still. Trying this group for them :) (I am sure I can find some answers if I read/google more on the topic, but now I'm being lazy and feel asking the people who are already using it/or perhaps developing it is a better bet).
>
> 1. Can I get my solr instance to load data (fresh data for indexing) from a stream (imagine a mq kind of queue, or similar) ?
> 2. Can I host my solr instance to use hbase as the database/file system (read HDFS) ?
> 3. are there somewhere any reports available (as in benchmarks ) for a solr instance's performance ?
> 4. are there any APIs available which might help me apply ANSI sql kind of statements to my solr data ?
>
> It would be great if people could help share their experience in the area... if it's too much trouble writing all of it, perhaps url would be easier... I welcome all kinds of help here... any advice/suggestions are good ...
>
> Looking forward to your viewpoints..
>
> --raghav..
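To make the "single table / view" advice above a bit more concrete: with one tab-delimited file per table, the flattening can be done before indexing by joining the files into one flat record per row. A minimal sketch follows, assuming two made-up tables (`properties` and `owners`) with hypothetical column names; the real schema and join keys will differ.

```python
import csv
import io

# Hypothetical tab-delimited extracts, one per source table.
PROPERTIES_TSV = "prop_id\towner_id\tlat\tlng\n1\t10\t40.7\t-74.0\n2\t11\t34.1\t-118.2\n"
OWNERS_TSV = "owner_id\towner_name\n10\tAlice\n11\tBob\n"


def read_tsv(text):
    """Parse tab-delimited text into a list of row dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def flatten(properties, owners):
    """Join each property row with its owner row: one flat dict per document."""
    owners_by_id = {o["owner_id"]: o for o in owners}
    docs = []
    for prop in properties:
        doc = dict(prop)
        # Left join: a property with no matching owner keeps only its own columns.
        doc.update(owners_by_id.get(prop["owner_id"], {}))
        docs.append(doc)
    return docs


docs = flatten(read_tsv(PROPERTIES_TSV), read_tsv(OWNERS_TSV))
```

Each dict in `docs` then corresponds to one Solr document, and each key to one field in schema.xml. For 100+ million rows the join would more likely be done in the database (the "Mother of all Queries" approach) than in memory like this.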