Re: Solr + Federated Search Question
Thanks Ahmet. Yay! New term :) Although it does look like "federated" and "metasearch" can be used interchangeably.

Alejandro

On Thu, Oct 2, 2014 at 2:37 PM, Ahmet Arslan wrote:

> Hi Alejandro,
>
> So your example is better called "metasearch". Here is a quotation from a
> book.
>
> "Instead of retrieving information from a single information source using
> one search engine, one can utilize multiple search engines or a single
> search engine retrieving documents from a plethora of document collections.
> A scenario where multiple engines are used is known as metasearch, while
> the scenario where a single engine retrieves from multiple collections is
> known as federation. In both these scenarios, the final result of the
> retrieval effort needs to be a single, unified ranking of documents, based
> on several ranked lists."
>
> Ahmet
>
>
> On Thursday, October 2, 2014 7:29 PM, Alejandro Calbazana <
> acalbaz...@gmail.com> wrote:
> Ahmet,Jeff,
>
> Thanks. Some terms are a bit overloaded. By "federated", I do mean the
> ability to query multiple, disparate, repositories. So, no. All of my
> data would not necessarily be in Solr. Solr would be one of several -
> databases, filesystems, document stores, etc... that I would like to
> "plug-in". The content in each repository would be of different types (the
> shape/schema of the content would differ significantly).
>
> Thanks,
>
> Alejandro
>
>
> On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky wrote:
>
> > Alejandro, you'll have to clarify how you are using the term "federated
> > search". I mean, technically Ahmet is correct in that Solr queries can be
> > fanned out to shards and the results from each shard aggregated
> > ("federated") into a single result list, but... more traditionally,
> > "federated" refers to "disparate" databases or search engines.
> >
> > See:
> > http://en.wikipedia.org/wiki/Federated_search
> >
> > So, please tell us a little more about what you are really trying to do.
> >
> > I mean, is all of your data in Solr, in multiple collections, or on
> > multiple Solr servers, or... is only some of your data in Solr and some is
> > in other search engines?
> >
> > Another approach taken with Solr is that indeed all of your source data
> > may be in "disparate databases", but you perform an ETL (Extract,
> > Transform, and Load) process to ingest all of that data into Solr and then
> > simply directly search the data within Solr.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Ahmet Arslan
> > Sent: Wednesday, October 1, 2014 9:35 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr + Federated Search Question
> >
> > Hi,
> >
> > Federation is possible. Solr has distributed search support with shards
> > parameter.
> >
> > Ahmet
> >
> >
> > On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana <
> > acalbaz...@gmail.com> wrote:
> > Hello,
> >
> > I have a general question about Solr in a federated search context. I
> > understand that Solr does not do federated search and that different tools
> > are often used to incorporate Solr indexes into a federated/enterprise
> > search solution. Does anyone have recommendations on any products (open
> > source or otherwise) that address this space?
> >
> > Thanks,
> >
> > Alejandro
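The quotation in this message says that both metasearch and federation end with "a single, unified ranking of documents, based on several ranked lists". As a rough illustration of that merging step, here is a minimal sketch using reciprocal rank fusion. Nobody in the thread proposed this technique; it is only one common way to combine ranked lists. The function name, the constant k=60, and the sample document ids are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by several sources rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from two different repositories.
solr_hits = ["doc-a", "doc-b", "doc-c"]
database_hits = ["doc-b", "doc-d"]

fused = reciprocal_rank_fusion([solr_hits, database_hits])
print(fused)
```

A rank-based merge like this sidesteps the fact that relevance scores from disparate repositories are usually not comparable with each other.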
Re: Solr + Federated Search Question
Alexandre,

Thanks. I will have a look.

Alejandro

On Wed, Oct 1, 2014 at 3:03 PM, Alexandre Rafalovitch wrote:

> http://project.carrot2.org/ is worth having a look at. It supports
> Solr well. In fact, a subset of it is shipped with Solr
>
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 1 October 2014 09:29, Alejandro Calbazana wrote:
> > Hello,
> >
> > I have a general question about Solr in a federated search context. I
> > understand that Solr does not do federated search and that different tools
> > are often used to incorporate Solr indexes into a federated/enterprise
> > search solution. Does anyone have recommendations on any products (open
> > source or otherwise) that address this space?
> >
> > Thanks,
> >
> > Alejandro
Re: Solr + Federated Search Question
Ahmet,Jeff,

Thanks. Some terms are a bit overloaded. By "federated", I do mean the ability to query multiple, disparate, repositories. So, no. All of my data would not necessarily be in Solr. Solr would be one of several - databases, filesystems, document stores, etc... that I would like to "plug-in". The content in each repository would be of different types (the shape/schema of the content would differ significantly).

Thanks,

Alejandro

On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky wrote:

> Alejandro, you'll have to clarify how you are using the term "federated
> search". I mean, technically Ahmet is correct in that Solr queries can be
> fanned out to shards and the results from each shard aggregated
> ("federated") into a single result list, but... more traditionally,
> "federated" refers to "disparate" databases or search engines.
>
> See:
> http://en.wikipedia.org/wiki/Federated_search
>
> So, please tell us a little more about what you are really trying to do.
>
> I mean, is all of your data in Solr, in multiple collections, or on
> multiple Solr servers, or... is only some of your data in Solr and some is
> in other search engines?
>
> Another approach taken with Solr is that indeed all of your source data
> may be in "disparate databases", but you perform an ETL (Extract,
> Transform, and Load) process to ingest all of that data into Solr and then
> simply directly search the data within Solr.
>
> -- Jack Krupansky
>
> -Original Message- From: Ahmet Arslan
> Sent: Wednesday, October 1, 2014 9:35 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr + Federated Search Question
>
> Hi,
>
> Federation is possible. Solr has distributed search support with shards
> parameter.
>
> Ahmet
>
>
> On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana <
> acalbaz...@gmail.com> wrote:
> Hello,
>
> I have a general question about Solr in a federated search context. I
> understand that Solr does not do federated search and that different tools
> are often used to incorporate Solr indexes into a federated/enterprise
> search solution. Does anyone have recommendations on any products (open
> source or otherwise) that address this space?
>
> Thanks,
>
> Alejandro
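For readers unfamiliar with the "shards parameter" mentioned in the quoted replies, the sketch below shows roughly what such a distributed request looks like. It only builds the URL and does not contact a server. The host names, ports, and core names are placeholders, and the helper name is made up for this example.

```python
from urllib.parse import urlencode


def distributed_query_url(core_url, query, shards):
    """Build a Solr select URL that fans the query out to several shards.

    `shards` is a list of host:port/path entries; Solr expects them as a
    single comma-separated value in the `shards` request parameter.
    """
    params = {"q": query, "shards": ",".join(shards)}
    return core_url + "/select?" + urlencode(params)


# Placeholder hosts and cores, for illustration only.
url = distributed_query_url(
    "http://localhost:8983/solr/core1",
    "title:solr",
    ["localhost:8983/solr/core1", "localhost:7574/solr/core2"],
)
print(url)
```

Note that this only aggregates results across Solr shards that share a compatible schema. It does not reach the databases, filesystems, or document stores described above.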
Solr + Federated Search Question
Hello,

I have a general question about Solr in a federated search context. I understand that Solr does not do federated search and that different tools are often used to incorporate Solr indexes into a federated/enterprise search solution. Does anyone have recommendations on any products (open source or otherwise) that address this space?

Thanks,

Alejandro
Re: Computing Results So That They are Returned in Search Results
So here is my use case with a little more detail.

I'm working with recurring events. Each event has an expression associated with it that defines its recurrence pattern. For example, monthly, daily, yearly... The event has metadata associated with it that is searchable. When a user performs a search, they can match on various metadata fields, but the query can also span a range of dates. If a match occurs, I'd like to unwind the expression into the instances specified by the pattern and return these "virtual" instances as results.

Right now, I'm post processing data to hammer out the results that fit the window of time specified in the query, but this moves sorting and pagination out of the Solr tier. I'd like to see if I can get it to stay there :) Post processing also prohibits me from faceting which would be extremely useful. I'm trying to avoid heavy post processing if I can.

Given the nature of the data, it's not really feasible for me to pre-assemble instance data and index since I don't know the window of time a user will be looking at.

Thanks,

Alejandro

On Wed, Oct 30, 2013 at 6:35 PM, Upayavira wrote:

> Also note that function queries only return numbers (given their origin
> in scoring). They cannot be used to create virtual string or text
> fields.
>
> Upayavira
>
> On Wed, Oct 30, 2013, at 05:19 PM, Jack Krupansky wrote:
> > A function query is simply returning a calculated result based on
> > existing data - no new fields required.
> >
> > Did you actually want to precompute a value, store it in the index, and
> > then query on it? If so, you could do that indexing with a custom or
> > scripted update processor.
> >
> > Flesh out an example of exactly what you want.
> >
> > -- Jack Krupansky
> >
> > -Original Message-
> > From: Alejandro Calbazana
> > Sent: Wednesday, October 30, 2013 12:46 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Computing Results So That They are Returned in Search
> > Results
> >
> > Sounds really close to what I'm looking for, but this sounds like it
> > would result in a new field on a document (or a new value for a field
> > defined to hold the result of a function). Would it be possible for a
> > function query to produce a new document so that I can associate the
> > computed value with it?
> >
> > Thanks,
> >
> > Alejandro
> >
> >
> > On Wed, Oct 30, 2013 at 12:05 PM, Jack Krupansky wrote:
> >
> > > You could create a custom "value source" and then use it in a function
> > > query embedded in your return fields list (fl).
> > >
> > > So, the function query could use a function (value source) that takes a
> > > field, fetches its value, performs some arbitrary calculation, and then
> > > returns that value.
> > >
> > > fl=id,name,my-func(field1),my-func(field2)
> > >
> > > -- Jack Krupansky
> > >
> > > -Original Message- From: Alejandro Calbazana
> > > Sent: Wednesday, October 30, 2013 10:10 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Computing Results So That They are Returned in Search Results
> > >
> > > I'd like to throw out a design question and see if it's possible to
> > > solve this with Solr.
> > >
> > > I have a set of data that is computed that I'd like to make searchable.
> > > Ideally, I'd like to have all documents indexed and call it a day, but
> > > the nature of the data is such that it needs to be computed given a
> > > definition. I'm interested in searching on definitions and then creating
> > > results on the fly that are calculated based on something embedded in
> > > the definition.
> > >
> > > Is it possible to embed this calculation logic into Solr's result
> > > handling process? I know this sounds exotic, but the nature of the data
> > > is such that I can't index these calculated documents because I don't
> > > know what the boundary is and specifying an arbitrary number isn't ideal.
> > >
> > > Has anyone run across something like this?
> > >
> > > Thanks,
> > >
> > > Alejandr
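To make the "unwinding" step described at the top of this message concrete, here is a minimal sketch of expanding a recurrence into the instances that fall inside a query window. It is not the poster's implementation. The function name and the frequency labels are assumptions, and only fixed-length patterns (daily, weekly) are handled; monthly and yearly patterns need calendar-aware arithmetic that is left out here.

```python
from datetime import date, timedelta

# Hypothetical frequency labels; only fixed-length steps are supported.
STEP = {"daily": timedelta(days=1), "weekly": timedelta(weeks=1)}


def expand_occurrences(first, frequency, window_start, window_end):
    """Return occurrence dates of a recurring event inside the window.

    `first` is the date of the first occurrence. Both window bounds
    are inclusive.
    """
    step = STEP[frequency]
    current = first
    # Skip occurrences that fall before the requested window.
    while current < window_start:
        current += step
    occurrences = []
    # Collect every occurrence up to and including the window end.
    while current <= window_end:
        occurrences.append(current)
        current += step
    return occurrences


instances = expand_occurrences(
    first=date(2013, 10, 1),
    frequency="weekly",
    window_start=date(2013, 10, 30),
    window_end=date(2013, 11, 20),
)
print(instances)
```

Because the window is only known at query time, this expansion has to run per request, which is why it currently sits in post processing outside Solr's sorting, paging, and faceting.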
Re: Computing Results So That They are Returned in Search Results
Sounds really close to what I'm looking for, but this sounds like it would result in a new field on a document (or a new value for a field defined to hold the result of a function). Would it be possible for a function query to produce a new document so that I can associate the computed value with it?

Thanks,

Alejandro

On Wed, Oct 30, 2013 at 12:05 PM, Jack Krupansky wrote:

> You could create a custom "value source" and then use it in a function
> query embedded in your return fields list (fl).
>
> So, the function query could use a function (value source) that takes a
> field, fetches its value, performs some arbitrary calculation, and then
> returns that value.
>
> fl=id,name,my-func(field1),my-func(field2)
>
> -- Jack Krupansky
>
> -Original Message- From: Alejandro Calbazana
> Sent: Wednesday, October 30, 2013 10:10 AM
> To: solr-user@lucene.apache.org
> Subject: Computing Results So That They are Returned in Search Results
>
> I'd like to throw out a design question and see if it's possible to solve
> this with Solr.
>
> I have a set of data that is computed that I'd like to make searchable.
> Ideally, I'd like to have all documents indexed and call it a day, but
> the nature of the data is such that it needs to be computed given a
> definition. I'm interested in searching on definitions and then creating
> results on the fly that are calculated based on something embedded in the
> definition.
>
> Is it possible to embed this calculation logic into Solr's result handling
> process? I know this sounds exotic, but the nature of the data is such
> that I can't index these calculated documents because I don't know what the
> boundary is and specifying an arbitrary number isn't ideal.
>
> Has anyone run across something like this?
>
> Thanks,
>
> Alejandr
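As a small illustration of the fl suggestion quoted above, the sketch below builds a select URL whose field list includes a function query. Since "my-func" in the quoted message stands for a custom value source that would have to be written and registered, the example substitutes Solr's built-in sum() function. The core URL, the field names, and the "total" alias are placeholders.

```python
from urllib.parse import urlencode


def select_url(core_url, query, fields):
    """Build a Solr select URL with an explicit return-field list (fl)."""
    params = {"q": query, "fl": ",".join(fields)}
    return core_url + "/select?" + urlencode(params)


# sum() is a built-in function; a custom value source would appear in
# the same position once it is registered with Solr.
url = select_url(
    "http://localhost:8983/solr/collection1",
    "*:*",
    ["id", "name", "total:sum(field1,field2)"],
)
print(url)
```

As the question above points out, this adds a computed value to each matching document. It does not create additional documents.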
Computing Results So That They are Returned in Search Results
I'd like to throw out a design question and see if it's possible to solve this with Solr.

I have a set of data that is computed that I'd like to make searchable. Ideally, I'd like to have all documents indexed and call it a day, but the nature of the data is such that it needs to be computed given a definition. I'm interested in searching on definitions and then creating results on the fly that are calculated based on something embedded in the definition.

Is it possible to embed this calculation logic into Solr's result handling process? I know this sounds exotic, but the nature of the data is such that I can't index these calculated documents because I don't know what the boundary is and specifying an arbitrary number isn't ideal.

Has anyone run across something like this?

Thanks,

Alejandr
Many Dynamic Fields + Indexing Strategy
Hi,

I have an application that has a fair number of dynamic fields in addition to static fields. The use case is that a customer can create any number of dynamic fields and associate them with domain objects that we then pull into an indexed document. I have no way to know these fields in advance and the expectation is that these fields are searchable using a field/value query. It is a multi-tenant environment and it is possible that there could be a high volume of dynamic fields created.

My question is whether there is a reasonable indexing strategy that can accommodate such a use case. My concern is that I can end up with a large number of dynamic fields, which would slow down both querying and full indexing. In testing, I created unique dynamic fields and got into the 50K - 100K range before my JVM began to behave poorly and go OOM. I understand why this happens, but I'm interested in how to protect against it.

My only thought at the moment is to split my single index into multiple cores, one per tenant. Has anyone else had this requirement? How did you handle it?

My schema is pretty much what I've described: a handful of static fields with the stock dynamic field pattern definitions. I am using Solr 4.2.1.

Thanks,

Al
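The "one core per tenant" idea mentioned above could be wired into an application roughly as follows. This is only a sketch of the routing step; it does not create cores or talk to Solr. The "tenant_" naming scheme, the helper names, and the sample tenant id are all assumptions.

```python
import re


def core_for_tenant(tenant_id):
    """Map a tenant id to a hypothetical per-tenant core name.

    Characters outside letters, digits, and underscore are replaced so
    the name is safe to use in a URL path.
    """
    safe = re.sub(r"[^A-Za-z0-9_]", "_", tenant_id)
    return "tenant_" + safe


def tenant_select_url(solr_root, tenant_id):
    """Return the select endpoint of the core that holds this tenant's data."""
    return f"{solr_root}/{core_for_tenant(tenant_id)}/select"


print(tenant_select_url("http://localhost:8983/solr", "acme-co"))
```

With this layout each core only carries the dynamic fields of its own tenant, so the field count per index is bounded by what a single customer creates rather than by the total across all customers.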
tlog after commit
Quick question... Should I still see tlog files after a hard commit?

I'm testing soft and hard commits, and I was under the impression that tlog files would be removed after a hard commit, whereas after a soft commit I would still see them.

Thanks,

Al
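For anyone repeating this kind of test, the sketch below shows one way to trigger each commit type explicitly and then inspect the transaction log directory. It only builds the request URLs and lists files; it does not send anything to Solr. The core URL and data directory are placeholders, and the assumption that log files sit in a "tlog" subdirectory of the core's data directory with names starting with "tlog." should be checked against the installation at hand.

```python
from pathlib import Path
from urllib.parse import urlencode


def commit_url(core_url, soft=False):
    """Build an update URL that asks Solr for a soft or a hard commit."""
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return core_url + "/update?" + urlencode(params)


def list_tlogs(data_dir):
    """Return transaction log file names found under <data_dir>/tlog.

    The directory layout and file prefix are assumptions (see above).
    """
    tlog_dir = Path(data_dir) / "tlog"
    if not tlog_dir.is_dir():
        return []
    return sorted(p.name for p in tlog_dir.iterdir() if p.name.startswith("tlog."))


hard = commit_url("http://localhost:8983/solr/collection1")
soft = commit_url("http://localhost:8983/solr/collection1", soft=True)
print(hard)
print(soft)
```

Comparing the output of list_tlogs before and after each request shows directly which files each commit type leaves behind.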
Federated Search Design Question
Hi,

I have a general design question about federated search that I'd like to get some thoughts on. I have several line of business applications that manage their own data. There is a need to search across these LOB apps, but each of them has a different authorization scheme for allowing users access to data. None of this data lives in Solr at the moment.

Ideally, everyone would push their data to Solr and we'd rationalize a common ACL model for authorization. Everything would be relatively straightforward. Unfortunately, I'm not going to be able to solve the ACL problem in my timeline.

As an alternative, one consideration is to use Solr as sort of a cache where data is pulled from individual endpoints and stored. A final query would be made against results stored in Solr for combined results. Has anyone used Solr in this way? I understand that this might be an unusual usage, results are likely going to be thrown away as queries change, and there is overhead in committing. If results were pushed into memory, that might be enough for this purpose.

If there are alternatives, I'm open to suggestions.

Thanks!

Al
DIH + Solr Cloud
Hi,

Quick question about data import handlers in Solr cloud. Does anyone use more than one instance to support the DIH process? Or is the typical setup to have one box set up as only the DIH and keep this responsibility outside of the Solr cloud environment? I'm just trying to get a picture of how this is typically deployed.

Thanks!

Alejandro
Data Import Handler Help
Hi,

I'm looking for a bit of guidance in implementing a data import handler for mongodb. I am using https://github.com/sucode/solrMongoDBImporter/blob/master/README.md as a starting point, and I can get full imports working properly with a few adjustments to the source.

The problem comes in when I try delta imports. After adding code to support delta queries and looking at how the sql import handler works, I get delta reads but the counts grow out of control. It's as if DocBuilder does not know when to stop processing.

Example: I have one doc to be read but I get 2 docs added/updated.

Has anyone seen this before? Using 4.2.0.

Thanks