RE: Is Solr right for my business situation ?

2010-09-30 Thread Dennis Gearon
You need to be able to query the database with the 'Mother of all Queries', 
i.e. one that completely flattens all tables into each row. 

In other words, the JOIN section of the query will have EVERY table in it, and 
depending on your schema, some of them twice or more.

Doing that with separate CSV files (one per table) would require you to load them 
into your OWN database first, then query against that, as above.
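A minimal sketch of that flattening step in Python, assuming three hypothetical tab-delimited extracts (`properties`, `owners`, `sales`) that share key columns. The data is inlined so the example runs on its own. It does in memory what the all-tables JOIN would do in the database: one output row per sale, with every related column copied into it.

```python
import csv
import io

# Hypothetical tab-delimited extracts, one per table (inlined so the sketch is self-contained).
PROPERTIES_TSV = "property_id\taddress\towner_id\n1\t12 Elm St\t10\n2\t9 Oak Ave\t11\n"
OWNERS_TSV = "owner_id\towner_name\n10\tAcme Corp\n11\tJane Doe\n"
SALES_TSV = "sale_id\tproperty_id\tprice\n100\t1\t250000\n101\t2\t310000\n102\t1\t265000\n"


def read_tsv(text):
    """Parse a tab-delimited extract with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def flatten(properties, owners, sales):
    """In-memory equivalent of
    SELECT * FROM sales JOIN properties USING (property_id) JOIN owners USING (owner_id).
    Rows without a match are dropped, as with an inner join."""
    props_by_id = {p["property_id"]: p for p in properties}
    owners_by_id = {o["owner_id"]: o for o in owners}
    flat = []
    for sale in sales:
        prop = props_by_id.get(sale["property_id"])
        if prop is None:
            continue
        owner = owners_by_id.get(prop["owner_id"])
        if owner is None:
            continue
        # The shared key columns carry identical values, so merging loses nothing.
        flat.append({**owner, **prop, **sale})
    return flat


if __name__ == "__main__":
    rows = flatten(read_tsv(PROPERTIES_TSV), read_tsv(OWNERS_TSV), read_tsv(SALES_TSV))
    for row in rows:
        print(row)
```

Sales of the same property repeat that property's and owner's columns; that redundancy is the price of denormalizing.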

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/29/10, Sharma, Raghvendra sraghven...@corelogic.com wrote:

 From: Sharma, Raghvendra sraghven...@corelogic.com
 Subject: RE: Is Solr right for my business situation ?
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Date: Wednesday, September 29, 2010, 9:40 AM

RE: Is Solr right for my business situation ?

2010-09-30 Thread Sharma, Raghvendra
Thanks for the ideas.

I think that after reading enough documentation and articles about Solr and XML 
indexing in general, I have come to understand that there is no escaping 
denormalization.

However, one tiny thought remains... perhaps my last shot at avoiding 
denormalization (of course it's going to be a costly affair).

I was reading about how Solr can handle multiple cores and therefore multiple 
indexes. Can a single search interface send queries to these three cores? In 
that case, who would do the load balancing and the merging of the results? And 
would I be running three instances of Solr on my system(s), or can a single one 
handle that?



-----Original Message-----
From: Dennis Gearon [mailto:gear...@sbcglobal.net] 
Sent: Thursday, September 30, 2010 9:25 PM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?


RE: Is Solr right for my business situation ?

2010-09-30 Thread Markus Jelsma
Recent versions support sharding and handle both the distribution of your query 
and the merging of the result sets. The problem is that it won't help you join 
across separate `tables`: the fields you query need to be present in each shard, 
or you'll end up with an HTTP 400 "undefined field" error.

 

Indeed, there is no escape.
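A minimal sketch of such a distributed request, assuming two hypothetical cores (`core0`, `core1`) on one local Solr instance and a hypothetical `city` field. The `shards` parameter lists each shard as `host:port/path` without `http://`; the core that receives the request fans the query out and merges the results. As noted above, every field used in the query must exist in every shard's schema.

```python
from urllib.parse import urlencode


def distributed_select_url(entry_core, shards, query, rows=10):
    """Build a Solr distributed-search URL.

    entry_core: base URL of the core that receives the request.
    shards: "host:port/path" strings (no scheme), one per core to search.
    Every field referenced in `query` must be defined in each shard's schema,
    otherwise the request fails with an "undefined field" error.
    """
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    return f"{entry_core}/select?{urlencode(params)}"


if __name__ == "__main__":
    url = distributed_select_url(
        "http://localhost:8983/solr/core0",
        ["localhost:8983/solr/core0", "localhost:8983/solr/core1"],
        "city:Denver",
    )
    print(url)
```

In this sketch a single Solr process hosts both cores and acts as the entry point, so no separate instance per index is needed.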
 
-----Original message-----
From: Sharma, Raghvendra sraghven...@corelogic.com
Sent: Thu 30-09-2010 20:07
To: solr-user@lucene.apache.org; 
Subject: RE: Is Solr right for my business situation ?


RE: Is Solr right for my business situation ?

2010-09-29 Thread Sharma, Raghvendra
Some questions.  

1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, 
but I am not sure about the direction for handling multiple table structures. 
Would it be one big XML document, wherein those three tables (assuming it's 
three) would show up as three different, nullable tag-trees?

My source provides me a single flat file per table (tab-delimited).

Do you think having multiple indexes could be a solution in this case, or do I 
really need to spend effort on denormalizing the data?

2. Further, loading into Solr could use some performance tuning. Any tips or best 
practices?

3. Also, is there a way to specify an XSLT on the server side and make it the 
default, i.e. whenever a response is returned, that XSLT is applied to it 
automatically?

4. And the last question for the day :) There was one post saying that the 
spatial support in Solr is really basic and is going to be improved in the next 
versions. Can you people help me get a definitive yes or no on spatial support: 
in its current form, does it work or not? I would store latitude and longitude, 
and would need to make them searchable.
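For question 3, Solr's XSLT response writer is selected per request with `wt=xslt`, and `tr` names a stylesheet in the core's `conf/xslt/` directory. A minimal Python sketch of such a request URL follows; `example.xsl` and the base URL are assumptions. To make the transform the server-side default, the same two parameters can instead be placed in the `defaults` list of the request handler in solrconfig.xml, so clients need not send them.

```python
from urllib.parse import urlencode


def xslt_select_url(solr_base, query, stylesheet="example.xsl"):
    """Request URL whose response Solr transforms with an XSLT stylesheet.

    wt=xslt selects the XSLT response writer; tr names a file that is
    assumed to exist under the core's conf/xslt/ directory.
    """
    params = {"q": query, "wt": "xslt", "tr": stylesheet}
    return f"{solr_base}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(xslt_select_url("http://localhost:8983/solr", "ipod"))
```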

--raghav..

-----Original Message-----
From: Sharma, Raghvendra [mailto:sraghven...@corelogic.com] 
Sent: Tuesday, September 28, 2010 11:45 AM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?


Re: Is Solr right for my business situation ?

2010-09-29 Thread Erick Erickson
If at all possible, denormalize the data. Any time you find yourself trying to 
make Solr behave like a database, the probability is high that you're misusing 
Solr or the DB.

Best
Erick
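To make that concrete, each denormalized row becomes one Solr document and each column one field. A minimal sketch that renders rows in Solr's XML update format (`<add><doc><field name="...">`), assuming hypothetical field names with `id` as the schema's uniqueKey. The output would be POSTed to the core's `/update` handler, followed by a commit.

```python
import xml.etree.ElementTree as ET


def rows_to_add_xml(rows):
    """Render flattened rows as a Solr XML <add> message.

    Each row (dict) becomes a <doc>; each non-empty column becomes
    <field name="column">value</field>. ElementTree escapes &, < and >.
    """
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for name, value in row.items():
            if value is None or value == "":
                continue  # omit empty columns instead of indexing blank values
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(rows_to_add_xml([{"id": "sale-100", "address": "12 Elm St", "price": 250000}]))
```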

On Wed, Sep 29, 2010 at 12:40 PM, Sharma, Raghvendra 
sraghven...@corelogic.com wrote:


Re: Is Solr right for my business situation ?

2010-09-29 Thread Lance Norskog
Some of these are big questions; try asking them in separate emails.

On Wed, Sep 29, 2010 at 9:40 AM, Sharma, Raghvendra
sraghven...@corelogic.com wrote:

RE: Is Solr right for my business situation ?

2010-09-28 Thread Sharma, Raghvendra
Thanks for the responses, people.

@Grant  

1. Can you point me in some direction on that: loading data from an incoming 
stream? Do I need some third-party tools, or do I need to build something myself?

4. I am basically attempting to build a very fast search interface for the 
existing data. The volume I mentioned is mostly static (the data is already 
there). The SQL statements I mentioned are the daily updates coming in. The good 
thing is that no history is kept, so the overall volume is not growing, but I 
need to apply the update statements.

One workaround I had in mind (though its performance is not so great) is to apply 
the updates to a copy of the RDBMS, and then feed the RDBMS extract to Solr. It 
sounds like overkill, but I don't have another idea right now. Perhaps business 
discussions will yield something.
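One way to feed that extract to Solr without writing a custom loader is the CSV update handler, which also accepts tab-delimited files. A minimal sketch that builds such a request, assuming the handler is registered at `/update/csv` (as in the example solrconfig.xml), that `stream.file` is allowed (it requires remote streaming to be enabled in solrconfig.xml), and that `/data/extract.tsv` is a hypothetical path on the Solr server whose header row matches the schema's field names.

```python
from urllib.parse import urlencode


def csv_load_url(solr_base, server_path, commit=True):
    """URL that asks Solr's CSV update handler to load a tab-delimited file.

    The file is read by the Solr server itself (stream.file), so the path is
    on the server's disk; the first line is taken as the list of field names.
    """
    params = {
        "stream.file": server_path,
        "separator": "\t",  # tab-delimited input; urlencode sends it as %09
        "commit": "true" if commit else "false",
    }
    return f"{solr_base}/update/csv?{urlencode(params)}"


if __name__ == "__main__":
    print(csv_load_url("http://localhost:8983/solr", "/data/extract.tsv"))
```

Rows whose uniqueKey already exists in the index replace the earlier document, which covers the daily updates; deletions would still need separate delete requests.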

@All -

Some more questions, guys.

1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, 
but I am not sure about the direction for handling multiple table structures. 
Would it be one big XML document, wherein those three tables (assuming it's 
three) would show up as three different, nullable tag-trees?

My source provides me a single flat file per table (tab-delimited).

2. Further, loading into Solr could use some performance tuning. Any tips or best 
practices?

3. Also, is there a way to specify an XSLT on the server side and make it the 
default, i.e. whenever a response is returned, that XSLT is applied to it 
automatically?

4. And the last question for the day :) There was one post saying that the 
spatial support in Solr is really basic and is going to be improved in the next 
versions. Can you people help me get a definitive yes or no on spatial support: 
in its current form, does it work or not? I would store latitude and longitude, 
and would need to make them searchable.

Looks like I'm close to my solution. :)

--raghav

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Tuesday, September 28, 2010 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Is Solr right for my business situation ?

Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

 When do you need to deploy?
 
 As I understand it, the spatial search in Solr is being rewritten and is 
 slated for Solr 4.0, the release after next.

It will be in 3.x, the next release

 
 The existing spatial search has some serious problems and is deprecated.
 
 Right now, I think the only way to get spatial search in Solr is to deploy a 
 nightly snapshot from the active development on trunk. If you are deploying a 
 year from now, that might change.
 
 There is not any support for SQL-like statements or for joins. The best 
 practice for Solr is to think of your data as a single table, essentially 
 creating a view from your database. The rows become Solr documents, the 
 columns become Solr fields.

There are now group-by capabilities in trunk as well, which may or may not help.

 
 wunder
 
 On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
 
 I am sure these kinds of questions keep coming to you guys, but I want to 
 raise the same question in a different context: my own business situation.
 I am very new to Solr, and though I have tried to read through the 
 documentation, I am nowhere near finishing it.
 
 The need is like this - 
 
 We have a huge RDBMS database/table. A single table perhaps houses 100+ 
 million rows. Though Oracle is doing a fine job of handling the insertion 
 and updating of data, the querying is where our main concerns lie. Since we 
 have spatial data, the index building takes hours and hours for such tables.
 
 That's when we thought of moving away from a standard RDBMS and trying 
 something different and fast. 
 I spent last week reading through Bigtable, Hadoop, HBase and Hive, and 
 finally landed on Solr. As far as I have got in my tests, it looks pretty 
 good, but I still have a few unanswered questions. Trying this group for 
 them :) (I am sure I could find some answers if I read/googled more on the 
 topic, but right now I'm being lazy and feel that asking the people who are 
 already using it, or perhaps developing it, is a better bet.)
 
 1. Can I get my Solr instance to load data (fresh data for indexing) from a 
 stream (imagine an MQ kind of queue, or similar)?

Yes, with a little bit of work.

 2. Can I set up my Solr instance to use HBase as the database/file system 
 (read: HDFS)?

Probably, but I doubt it will be fast.  Local disk is usually the best.  100+ M 
rows is large but not unreasonable.

 3. Are there any reports available anywhere (as in benchmarks) on a Solr 
 instance's performance?

You can probably search the web for these.  I've personally seen several 
installs with 1B+ docs and subsecond search and faceting, and heard of others.  
You might look at the stuff HathiTrust has put up.

 4. Are there any APIs available which might help me

RE: Is Solr right for my business situation ?

2010-09-28 Thread Jonathan Rochkind
Staging the data in a non-Solr store sounds like a potentially reasonable 
idea to me. You might want to consider a NoSQL store of some kind, MongoDB 
perhaps, instead of an RDBMS.

The way to think about Solr is not as a store or a database -- it's an index 
for serving your application. That's also the way to think about how to get 
your multiple tables in there -- denormalize, denormalize, denormalize.  You 
need to think about what you actually need to search over, and build your index 
to serve that efficiently, rather than thinking about normalization or data 
modelling the way we are used to with RDBMSs. It's a different way of 
thinking.
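Applied to a tab-delimited, one-file-per-table export like the one raghav describes, "denormalize" could mean folding the child rows into their parent row before indexing, so that each Solr document carries everything a search needs. A minimal Python sketch; the `company_id`, `name` and `city` columns and the `loc_` prefix are hypothetical, and repeated child values are kept as lists, which would correspond to multiValued fields in schema.xml.

```python
import csv
import io
from typing import Any, Dict, List

Row = Dict[str, str]
Doc = Dict[str, Any]


def read_tsv(text: str) -> List[Row]:
    """Parse tab-delimited text (first line = column names) into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def denormalize(parents: List[Row], children: List[Row],
                key: str, prefix: str) -> List[Doc]:
    """Return one flat document per parent row.

    Child rows sharing the parent's `key` value are folded in: each child
    column (except the key) is stored as `prefix + column`, collected in a
    list because one parent can have many children.
    """
    by_key: Dict[str, List[Row]] = {}
    for child in children:
        by_key.setdefault(child[key], []).append(child)

    docs: List[Doc] = []
    for parent in parents:
        doc: Doc = dict(parent)
        for child in by_key.get(parent[key], []):
            for col, val in child.items():
                if col != key:
                    doc.setdefault(prefix + col, []).append(val)
        docs.append(doc)
    return docs


companies = read_tsv("company_id\tname\n1\tAcme\n2\tGlobex\n")
offices = read_tsv("company_id\tcity\n1\tBoston\n1\tDenver\n")
for doc in denormalize(companies, offices, "company_id", "loc_"):
    print(doc)
# {'company_id': '1', 'name': 'Acme', 'loc_city': ['Boston', 'Denver']}
# {'company_id': '2', 'name': 'Globex'}
```

The join happens once, at export time, which is the trade Jonathan describes: queries no longer need joins because each document already holds the joined data.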

A Solr index basically gives you one collection of documents. But the documents 
can all have different fields -- so you _could_ (but probably don't want to) 
essentially put all your tables in there with unique fields -- they're all in 
the same index, they're all just documents, but some have a table1_title and 
table1_author, and others have no data in those fields but a table2_productName 
and a table2_price.  Then if you want to query on just one type of thing, you 
just query on those fields.  Except... you don't get any joins, which is why 
you probably don't want to do that after all; it probably won't serve your 
needs.
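The "all tables in one index" layout described here (and advised against where joins are needed) could be produced with something like the following sketch. The `id` and `type` field names and the `table-pk` scheme for keeping ids unique across tables are assumptions; the schema would also have to declare the prefixed fields, for example via dynamic fields. Restricting a search to one table would then be a filter query such as `fq=type:table1`.

```python
from typing import Dict


def to_solr_doc(table: str, row: Dict[str, str], pk: str) -> Dict[str, str]:
    """Map one row of `table` to a Solr document in a shared index.

    - `id` must be unique across *all* tables, so the table name is prepended.
    - `type` records the source table, so searches can be limited to it.
    - every column becomes `<table>_<column>`, so tables never share a field.
    """
    doc = {"id": f"{table}-{row[pk]}", "type": table}
    for col, val in row.items():
        doc[f"{table}_{col}"] = val
    return doc


book = to_solr_doc("table1", {"id": "7", "title": "Dune", "author": "Herbert"}, "id")
product = to_solr_doc("table2", {"id": "7", "productName": "Lamp", "price": "19.99"}, "id")
print(book["id"], product["id"])  # table1-7 table2-7
print(sorted(book))  # ['id', 'table1_author', 'table1_id', 'table1_title', 'type']
```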

Figuring out the right way to model your data in Solr can be tricky, and it is 
sometimes hard to do exactly what you want. Solr isn't an RDBMS, and in some 
ways isn't as powerful as an RDBMS -- in the sense of being as flexible with 
what kinds of queries you can run on any given data.   What it does is give you 
very fast access to inverted index lookups, set combinations and faceting 
that would be very hard to do efficiently in an RDBMS. It is a trade-off.  But 
there's not really a general answer to "how do I take these dozen RDBMS tables 
and store them in Solr the best way?" -- it depends on what kinds of searching 
you need to support and the nature of your data.

From: Sharma, Raghvendra [sraghven...@corelogic.com]
Sent: Tuesday, September 28, 2010 2:15 AM
To: solr-user@lucene.apache.org
Subject: RE: Is Solr right for my business situation ?

Thanks for the responses people.

@Grant

1. Can you point me in some direction on that, i.e. loading data from an incoming 
stream? Do I need some third-party tools, or do I need to build something myself?

4. I am basically attempting to build a very fast search interface for the 
existing data. The volume I mentioned is mostly static (the data is already 
there). The SQL statements I mentioned are the daily updates coming in. The good 
thing is that history is not retained, so the overall volume is not growing, but 
I need to apply the update statements.

One workaround I had in mind (though its performance is not great) is to apply 
the updates to a copy of the RDBMS, and then feed the RDBMS extract to Solr. 
It sounds like overkill, but I don't have another idea right now. Perhaps business 
discussions will yield something.

@All -

Some more questions, guys.

1. I have about 3-5 tables. Designing schema.xml for a single table looks OK, 
but I am not sure about the direction for handling multiple table structures. 
Would it be one big XML, wherein those three tables (assuming it's three) would 
show up as three different, nullable tag-trees?

My source provides me a single flat file per table (tab delimited).

2. Further, loading into Solr could use some performance tuning. Any tips or best 
practices?

3. Also, is there a way to specify an XSLT on the server side and make it the 
default, i.e. whenever a response is returned, that XSLT is applied to the 
response automatically?

4. And the last question for the day :) There was one post saying that the 
spatial support in Solr is really basic and is going to be improved in the next 
versions. Can you people help me get a definitive yes or no on spatial 
support: in its current form, does it work or not? I would store lat and 
long, and would need to make them searchable.

Looks like I'm close to my solution. :)

--raghav

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, September 28, 2010 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Is Solr right for my business situation ?

Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

 When do you need to deploy?

 As I understand it, the spatial search in Solr is being rewritten and is 
 slated for Solr 4.0, the release after next.

It will be in 3.x, the next release


 The existing spatial search has some serious problems and is deprecated.

 Right now, I think the only way to get spatial search in Solr is to deploy a 
 nightly snapshot from the active development on trunk. If you are deploying a 
 year from now, that might change.

 There is not any support for SQL-like statements or for joins. The best 
 practice for Solr is to think of your data as a single table

Re: Is Solr right for my business situation ?

2010-09-27 Thread Walter Underwood
When do you need to deploy?

As I understand it, the spatial search in Solr is being rewritten and is slated 
for Solr 4.0, the release after next.

The existing spatial search has some serious problems and is deprecated.

Right now, I think the only way to get spatial search in Solr is to deploy a 
nightly snapshot from the active development on trunk. If you are deploying a 
year from now, that might change.

There is not any support for SQL-like statements or for joins. The best 
practice for Solr is to think of your data as a single table, essentially 
creating a view from your database. The rows become Solr documents, the columns 
become Solr fields.
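To make "the rows become Solr documents, the columns become Solr fields" concrete, here is a minimal Python sketch that turns rows of the flattened view (one dict per row) into Solr's XML `<add>` update message. NULL columns are simply omitted and list values become repeated fields (multiValued); posting the message to the update handler and committing are left out, and the column names in the usage line are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, Iterable


def rows_to_add_xml(rows: Iterable[Dict[str, Any]]) -> str:
    """Build a Solr XML update message: one <doc> per row, one <field> per column."""
    add = ET.Element("add")
    for row in rows:
        doc = ET.SubElement(add, "doc")
        for col, val in row.items():
            if val is None:  # NULL column: leave the field out entirely
                continue
            # A list becomes a repeated field (a multiValued field in schema.xml).
            values = val if isinstance(val, list) else [val]
            for v in values:
                field = ET.SubElement(doc, "field", name=col)
                field.text = str(v)
    # ElementTree escapes &, < and > in the text.
    return ET.tostring(add, encoding="unicode")


print(rows_to_add_xml([{"id": 1, "name": "Acme & Co", "fax": None}]))
# <add><doc><field name="id">1</field><field name="name">Acme &amp; Co</field></doc></add>
```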

wunder

On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:

 I am sure these kinds of questions keep coming to you guys, but I want to 
 raise the same question in a different context: my own business situation.
 I am very new to Solr, and though I have tried to read through the 
 documentation, I am nowhere near finishing it.
 
 The need is like this - 
 
 We have a huge RDBMS database/table. A single table perhaps houses 100+ 
 million rows. Though Oracle is doing a fine job of handling the insertion and 
 updating of data, the querying is where our main concerns lie. Since we have 
 spatial data, the index building takes hours and hours for such tables.
 
 That's when we thought of moving away from a standard RDBMS and trying 
 something different and fast. 
 I spent last week reading through Bigtable, Hadoop, HBase and Hive, and 
 finally landed on Solr. As far as I have got in my tests, it looks pretty 
 good, but I still have a few unanswered questions. Trying this group for 
 them :) (I am sure I could find some answers if I read/googled more on the 
 topic, but right now I'm being lazy and feel that asking the people who are 
 already using it, or perhaps developing it, is a better bet.)
 
 1. Can I get my Solr instance to load data (fresh data for indexing) from a 
 stream (imagine an MQ kind of queue, or similar)?
 2. Can I set up my Solr instance to use HBase as the database/file system 
 (read: HDFS)?
 3. Are there any reports available anywhere (as in benchmarks) on a Solr 
 instance's performance?
 4. Are there any APIs available which might help me apply ANSI SQL-like 
 statements to my Solr data?
 
 It would be great if people could share their experience in this area. If 
 it's too much trouble writing all of it, perhaps a URL would be easier. I 
 welcome all kinds of help here; any advice or suggestions are good.
 
 Looking forward to your viewpoints.
 
 --raghav..
 **
  
 This message may contain confidential or proprietary information intended 
 only for the use of the addressee(s) named above or may contain information 
 that is legally privileged. If you are not the intended addressee, or the 
 person responsible for delivering it to the intended addressee, you are 
 hereby notified that reading, disseminating, distributing or copying this 
 message is strictly prohibited. If you have received this message by 
 mistake, please immediately notify us by replying to the message and delete 
 the original message and any copies immediately thereafter. 
 
 Thank you. 
 **
  
 CLLD
 






Re: Is Solr right for my business situation ?

2010-09-27 Thread Grant Ingersoll
Inline.

On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:

 When do you need to deploy?
 
 As I understand it, the spatial search in Solr is being rewritten and is 
 slated for Solr 4.0, the release after next.

It will be in 3.x, the next release

 
 The existing spatial search has some serious problems and is deprecated.
 
 Right now, I think the only way to get spatial search in Solr is to deploy a 
 nightly snapshot from the active development on trunk. If you are deploying a 
 year from now, that might change.
 
 There is not any support for SQL-like statements or for joins. The best 
 practice for Solr is to think of your data as a single table, essentially 
 creating a view from your database. The rows become Solr documents, the 
 columns become Solr fields.

There are now group-by capabilities in trunk as well, which may or may not help.

 
 wunder
 
 On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra wrote:
 
 I am sure these kinds of questions keep coming to you guys, but I want to 
 raise the same question in a different context: my own business situation.
 I am very new to Solr, and though I have tried to read through the 
 documentation, I am nowhere near finishing it.
 
 The need is like this - 
 
 We have a huge RDBMS database/table. A single table perhaps houses 100+ 
 million rows. Though Oracle is doing a fine job of handling the insertion 
 and updating of data, the querying is where our main concerns lie. Since we 
 have spatial data, the index building takes hours and hours for such tables.
 
 That's when we thought of moving away from a standard RDBMS and trying 
 something different and fast. 
 I spent last week reading through Bigtable, Hadoop, HBase and Hive, and 
 finally landed on Solr. As far as I have got in my tests, it looks pretty 
 good, but I still have a few unanswered questions. Trying this group for 
 them :) (I am sure I could find some answers if I read/googled more on the 
 topic, but right now I'm being lazy and feel that asking the people who are 
 already using it, or perhaps developing it, is a better bet.)
 
 1. Can I get my Solr instance to load data (fresh data for indexing) from a 
 stream (imagine an MQ kind of queue, or similar)?

Yes, with a little bit of work.

 2. Can I set up my Solr instance to use HBase as the database/file system 
 (read: HDFS)?

Probably, but I doubt it will be fast.  Local disk is usually the best.  100+ M 
rows is large but not unreasonable.

 3. Are there any reports available anywhere (as in benchmarks) on a Solr 
 instance's performance?

You can probably search the web for these.  I've personally seen several 
installs with 1B+ docs and subsecond search and faceting, and heard of others.  
You might look at the stuff HathiTrust has put up.

 4. Are there any APIs available which might help me apply ANSI SQL-like 
 statements to my Solr data?

No.  Question back?  What kinds of things are you trying to do?

 
 It would be great if people could share their experience in this area. If 
 it's too much trouble writing all of it, perhaps a URL would be easier. I 
 welcome all kinds of help here; any advice or suggestions are good.
 
 Looking forward to your viewpoints.
 
 --raghav..

--
Grant Ingersoll
http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8



Re: Is Solr right for my business situation ?

2010-09-27 Thread Jonathan Rochkind

Grant Ingersoll wrote:


There are now group-by capabilities in trunk as well, which may or may not help.
  
Really, the field collapsing stuff has been committed to trunk finally? 
Or are you talking about something else?


If it's the field collapsing stuff, and it's been committed to trunk, 
does that mean it'll be in the 3.0 release?


Jonathan

  


Re: Is Solr right for my business situation ?

2010-09-27 Thread Ravi Julapalli
Hi Jonathan,

Field collapsing is available in 1.4 by applying the patch 
https://issues.apache.org/jira/browse/SOLR-236
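For readers unfamiliar with the term: field collapsing (the "group-by" Grant mentions) returns at most a few top-ranked documents per distinct value of a chosen field, for example one hit per site. The following is only a conceptual Python sketch of that behaviour over an already-ranked list, not how the SOLR-236 patch or trunk implement it; the `site` field is hypothetical.

```python
from typing import Any, Dict, List


def collapse(ranked: List[Dict[str, Any]], field: str,
             limit: int = 1) -> List[Dict[str, Any]]:
    """Keep at most `limit` documents per distinct value of `field`,
    preserving the original ranking order. Documents lacking the field
    are grouped together under the value None."""
    seen: Dict[Any, int] = {}
    kept: List[Dict[str, Any]] = []
    for doc in ranked:
        value = doc.get(field)
        count = seen.get(value, 0)
        if count < limit:
            kept.append(doc)
        seen[value] = count + 1
    return kept


hits = [{"id": "a", "site": "x"}, {"id": "b", "site": "x"},
        {"id": "c", "site": "y"}, {"id": "d", "site": "x"}]
print([d["id"] for d in collapse(hits, "site")])           # ['a', 'c']
print([d["id"] for d in collapse(hits, "site", limit=2)])  # ['a', 'b', 'c']
```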

-Ravi





From: Jonathan Rochkind rochk...@jhu.edu
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 9:18:20 PM
Subject: Re: Is Solr right for my business situation ?

Grant Ingersoll wrote:
 
 There are now group-by capabilities in trunk as well, which may or may not help.
  
Really, the field collapsing stuff has been committed to trunk finally? Or are 
you talking about something else?

If it's the field collapsing stuff, and it's been committed to trunk, does that 
mean it'll be in the 3.0 release?

Jonathan

  


  

Re: Is Solr right for my business situation ?

2010-09-27 Thread Jonathan Rochkind
Right, I know; I was curious about how close it is to being in the 
main distro rather than a patch.  Among other things, when those who know 
better decide it goes in the core distro, that makes me more comfortable 
that they've decided it works acceptably, and also makes me more 
comfortable that it will continue to be supported in _future_ versions 
without someone having to prepare a new patch.


Ravi Julapalli wrote:

Hi Jonathan,

Field collapsing is available in 1.4 by applying the patch 
https://issues.apache.org/jira/browse/SOLR-236


-Ravi





From: Jonathan Rochkind rochk...@jhu.edu
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Mon, September 27, 2010 9:18:20 PM
Subject: Re: Is Solr right for my business situation ?

Grant Ingersoll wrote:
  
There are now group-by capabilities in trunk as well, which may or may not help.
  
 

Really, the field collapsing stuff has been committed to trunk finally? Or are 
you talking about something else?


If it's the field collapsing stuff, and it's been committed to trunk, does that 
mean it'll be in the 3.0 release?


Jonathan

  
 




  
  


Re: Is Solr right for my business situation ?

2010-09-27 Thread PeterKerk

@Walter Underwood:

Walter Underwood wrote:
 
 Right now, I think the only way to get spatial search in Solr is to deploy
 a nightly snapshot from the active development on trunk.
 

Could you give me the link to this trunk? I need it very much!

Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592330.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Is Solr right for my business situation ?

2010-09-27 Thread Dennis Gearon
Wow, that is a relief!

I was going to have to look at ElasticSearch instead.


Dennis Gearon



--- On Mon, 9/27/10, Grant Ingersoll gsing...@apache.org wrote:

 From: Grant Ingersoll gsing...@apache.org
 Subject: Re: Is Solr right for my business situation ?
 To: solr-user@lucene.apache.org
 Date: Monday, September 27, 2010, 12:35 PM
 Inline.
 
 On Sep 27, 2010, at 1:26 PM, Walter Underwood wrote:
 
  When do you need to deploy?
  
  As I understand it, the spatial search in Solr is
 being rewritten and is slated for Solr 4.0, the release
 after next.
 
 It will be in 3.x, the next release
 
  
  The existing spatial search has some serious problems
 and is deprecated.
  
  Right now, I think the only way to get spatial search
 in Solr is to deploy a nightly snapshot from the active
 development on trunk. If you are deploying a year from now,
 that might change.
  
  There is not any support for SQL-like statements or
 for joins. The best practice for Solr is to think of your
 data as a single table, essentially creating a view from
 your database. The rows become Solr documents, the columns
 become Solr fields.
 
 There are now group-by capabilities in trunk as well, which
 may or may not help.
 
  
  wunder
  
  On Sep 27, 2010, at 9:34 AM, Sharma, Raghvendra
 wrote:
  
  I am sure these kinds of questions keep coming to
 you guys, but I want to raise the same question in a
 different context: my own business situation.
  I am very new to Solr, and though I have tried
 to read through the documentation, I am nowhere near
 finishing it.
  
  The need is like this - 
  
  We have a huge RDBMS database/table. A single
 table perhaps houses 100+ million rows. Though Oracle is
 doing a fine job of handling the insertion and updating of
 data, the querying is where our main concerns lie. 
 Since we have spatial data, the index building takes hours
 and hours for such tables.
  
  That's when we thought of moving away from
 a standard RDBMS and trying something different and
 fast. 
  I spent last week reading through Bigtable, Hadoop,
 HBase and Hive, and finally landed on Solr. As far as I
 have got in my tests, it looks pretty good, but I still
 have a few unanswered questions.
 Trying this group for them :) (I am sure I could
 find some answers if I read/googled more on the topic, but
 right now I'm being lazy and feel that asking the people who
 are already using it, or perhaps developing it, is a better bet.)
  
  1. Can I get my Solr instance to load data (fresh
 data for indexing) from a stream (imagine an MQ kind of
 queue, or similar)?
 
 Yes, with a little bit of work.
 
  2. Can I set up my Solr instance to use HBase as the
 database/file system (read: HDFS)?
 
 Probably, but I doubt it will be fast.  Local disk is
 usually the best.  100+ M rows is large but not
 unreasonable.
 
  3. Are there any reports available anywhere (as
 in benchmarks) on a Solr instance's performance?
 
 You can probably search the web for these.  I've
 personally seen several installs with 1B+ docs and subsecond
 search and faceting, and heard of others.  You might
 look at the stuff HathiTrust has put up.
 
  4. Are there any APIs available which might help
 me apply ANSI SQL-like statements to my Solr data?
 
 No.  Question back?  What kinds of things are you
 trying to do?
 
  
  It would be great if people could share their
 experience in this area. If it's too much trouble writing
 all of it, perhaps a URL would be easier. I welcome all
 kinds of help here; any advice or suggestions are good.
  
  Looking forward to your viewpoints.
  
  --raghav..
 
 
 --
 Grant Ingersoll
 http://lucenerevolution.org Apache Lucene/Solr
 Conference, Boston Oct 7-8
 



Re: Is Solr right for my business situation ?

2010-09-27 Thread PeterKerk

Ah, I totally overlooked that news: spatial search in 3.x! :-D :-D

Any idea yet when this will be released? 

Awesome to hear that it has been moved forward! :)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-Solr-right-for-our-project-tp1589927p1592448.html
Sent from the Solr - User mailing list archive at Nabble.com.