Re: Solr + Federated Search Question

2014-10-02 Thread Alejandro Calbazana
Thanks Ahmet.  Yay!  New term :)  Although it does look like "federated"
and "metasearch" can be used interchangeably.

Alejandro

On Thu, Oct 2, 2014 at 2:37 PM, Ahmet Arslan 
wrote:

> Hi Alejandro,
>
> So your example is better called "metasearch". Here is a quotation from a
> book.
>
> "Instead of retrieving information from a single information source using
> one search engine, one can utilize multiple search engines or a single
> search engine retrieving documents from a plethora of document collections.
> A scenario where multiple engines are used is known as metasearch, while
> the scenario where a single engine retrieves from multiple collections is
> known as federation. In both these scenarios, the final result of the
> retrieval effort needs to be a single, unified ranking of documents, based
> on several ranked lists."
>
> Ahmet
>
>
> On Thursday, October 2, 2014 7:29 PM, Alejandro Calbazana <
> acalbaz...@gmail.com> wrote:
> Ahmet,Jeff,
>
> Thanks.  Some terms are a bit overloaded.  By "federated", I do mean the
> ability to query multiple, disparate, repositories.  So, no.  All of my
> data would not necessarily be in Solr.  Solr would be one of several -
> databases, filesystems, document stores, etc...  that I would like to
> "plug-in".  The content in each repository would be of different types (the
> shape/schema of the content would differ significantly).
>
> Thanks,
>
> Alejandro
>
>
>
>
> On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky 
> wrote:
>
> > Alejandro, you'll have to clarify how you are using the term "federated
> > search". I mean, technically Ahmet is correct in that Solr queries can be
> > fanned out to shards and the results from each shard aggregated
> > ("federated") into a single result list, but... more traditionally,
> > "federated" refers to "disparate" databases or search engines.
> >
> > See:
> > http://en.wikipedia.org/wiki/Federated_search
> >
> > So, please tell us a little more about what you are really trying to do.
> >
> > I mean, is all of your data in Solr, in multiple collections, or on
> > multiple Solr servers, or... is only some of your data in Solr and some is
> > in other search engines?
> >
> > Another approach taken with Solr is that indeed all of your source data
> > may be in "disparate databases", but you perform an ETL (Extract,
> > Transform, and Load) process to ingest all of that data into Solr and then
> > simply directly search the data within Solr.
> >
> > -- Jack Krupansky
> >
> > -Original Message- From: Ahmet Arslan
> > Sent: Wednesday, October 1, 2014 9:35 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr + Federated Search Question
> >
> > Hi,
> >
> > Federation is possible. Solr has distributed search support with the
> > shards parameter.
> >
> > Ahmet
> >
> >
> >
> > On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana <
> > acalbaz...@gmail.com> wrote:
> > Hello,
> >
> > I have a general question about Solr in a federated search context.  I
> > understand that Solr does not do federated search and that different tools
> > are often used to incorporate Solr indexes into a federated/enterprise
> > search solution.  Does anyone have recommendations on any products (open
> > source or otherwise) that address this space?
> >
> > Thanks,
> >
> > Alejandro
> >
>
>
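
The passage Ahmet quotes says that both metasearch and federation end in "a single, unified ranking of documents, based on several ranked lists." One common way to build such a ranking is reciprocal rank fusion (RRF). That technique is chosen here purely for illustration and was not proposed in this thread. A minimal sketch, assuming each repository returns an ordered list of document IDs (the source names and IDs are made up):

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from three repositories (IDs are illustrative only).
solr_hits = ["doc-a", "doc-b", "doc-c"]
database_hits = ["doc-b", "doc-d"]
filesystem_hits = ["doc-c", "doc-b", "doc-e"]

merged = reciprocal_rank_fusion([solr_hits, database_hits, filesystem_hits])
print(merged)
```

RRF only needs rank positions, so it sidesteps the fact that scores from different engines are not comparable. It does assume that documents appearing in more than one source share an ID.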


Re: Solr + Federated Search Question

2014-10-02 Thread Alejandro Calbazana
Alexandre,

Thanks.  I will have a look.

Alejandro

On Wed, Oct 1, 2014 at 3:03 PM, Alexandre Rafalovitch 
wrote:

> http://project.carrot2.org/ is worth having a look at. It supports
> Solr well. In fact, a subset of it is shipped with Solr
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 1 October 2014 09:29, Alejandro Calbazana  wrote:
> > Hello,
> >
> > I have a general question about Solr in a federated search context.  I
> > understand that Solr does not do federated search and that different tools
> > are often used to incorporate Solr indexes into a federated/enterprise
> > search solution.  Does anyone have recommendations on any products (open
> > source or otherwise) that address this space?
> >
> > Thanks,
> >
> > Alejandro
>


Re: Solr + Federated Search Question

2014-10-02 Thread Alejandro Calbazana
Ahmet,Jeff,

Thanks.  Some terms are a bit overloaded.  By "federated", I do mean the
ability to query multiple, disparate, repositories.  So, no.  All of my
data would not necessarily be in Solr.  Solr would be one of several -
databases, filesystems, document stores, etc...  that I would like to
"plug-in".  The content in each repository would be of different types (the
shape/schema of the content would differ significantly).

Thanks,

Alejandro

On Wed, Oct 1, 2014 at 9:47 AM, Jack Krupansky 
wrote:

> Alejandro, you'll have to clarify how you are using the term "federated
> search". I mean, technically Ahmet is correct in that Solr queries can be
> fanned out to shards and the results from each shard aggregated
> ("federated") into a single result list, but... more traditionally,
> "federated" refers to "disparate" databases or search engines.
>
> See:
> http://en.wikipedia.org/wiki/Federated_search
>
> So, please tell us a little more about what you are really trying to do.
>
> I mean, is all of your data in Solr, in multiple collections, or on
> multiple Solr servers, or... is only some of your data in Solr and some is
> in other search engines?
>
> Another approach taken with Solr is that indeed all of your source data
> may be in "disparate databases", but you perform an ETL (Extract,
> Transform, and Load) process to ingest all of that data into Solr and then
> simply directly search the data within Solr.
>
> -- Jack Krupansky
>
> -Original Message- From: Ahmet Arslan
> Sent: Wednesday, October 1, 2014 9:35 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr + Federated Search Question
>
> Hi,
>
> Federation is possible. Solr has distributed search support with the
> shards parameter.
>
> Ahmet
>
>
>
> On Wednesday, October 1, 2014 4:29 PM, Alejandro Calbazana <
> acalbaz...@gmail.com> wrote:
> Hello,
>
> I have a general question about Solr in a federated search context.  I
> understand that Solr does not do federated search and that different tools
> are often used to incorporate Solr indexes into a federated/enterprise
> search solution.  Does anyone have recommendations on any products (open
> source or otherwise) that address this space?
>
> Thanks,
>
> Alejandro
>
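
Ahmet's pointer to the shards parameter can be sketched as a request URL. The host names and core name below are placeholders, not values from this thread. The shards parameter takes a comma-separated list of host:port/path entries, and the node that receives the request merges the per-shard results.

```python
from urllib.parse import urlencode


def distributed_query_url(base_url, query, shards, rows=10):
    """Build a Solr select URL that fans the query out to the given shards."""
    params = {
        "q": query,
        "rows": rows,
        # Comma-separated shard addresses, written without the http:// scheme.
        "shards": ",".join(shards),
    }
    return f"{base_url}/select?{urlencode(params)}"


url = distributed_query_url(
    "http://solr1.example.com:8983/solr/core1",
    "title:federated",
    ["solr1.example.com:8983/solr/core1", "solr2.example.com:8983/solr/core1"],
)
print(url)
```

This only fans out across Solr indexes with compatible schemas, which is why, as Jack points out, it does not cover the "disparate" repositories Alejandro has in mind.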


Solr + Federated Search Question

2014-10-01 Thread Alejandro Calbazana
Hello,

I have a general question about Solr in a federated search context.  I
understand that Solr does not do federated search and that different tools
are often used to incorporate Solr indexes into a federated/enterprise
search solution.  Does anyone have recommendations on any products (open
source or otherwise) that address this space?

Thanks,

Alejandro


Re: Computing Results So That They are Returned in Search Results

2013-10-30 Thread Alejandro Calbazana
So here is my use case with a little more detail.  I'm working with
recurring events.  Each event has an expression associated with it that
defines its recurrence pattern.  For example, monthly, daily, yearly...
The event has metadata associated with it that is searchable.  When a user
performs a search, they can match on various metadata fields, but the query
can also span a range of dates.  If a match occurs, I'd like to unwind the
expression into the instances specified by the pattern and return these
"virtual" instances as results.

Right now, I'm post processing data to hammer out the results that fit the
window of time specified in the query, but this moves sorting and
pagination out of the Solr tier.  I'd like to see if I can get it to stay
there :)  Post processing also prohibits me from faceting which would be
extremely useful.

I'm trying to avoid heavy post processing if I can.  Given the nature of
the data, it's not really feasible for me to pre-assemble instance data and
index it, since I don't know the window of time a user will be looking at.

Thanks,

Alejandro
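
The post-processing step described above, unwinding a recurrence pattern into the instances that fall inside the query window, might look roughly like this. It is a sketch only: the pattern names (daily, weekly, monthly, yearly) and both functions are hypothetical, and real recurrence expressions are usually richer.

```python
import calendar
from datetime import date, timedelta


def add_months(start, months):
    """Return start shifted by a number of months, clamping the day if needed."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def expand(start, pattern, window_start, window_end):
    """List occurrences of a recurring event inside [window_start, window_end]."""
    occurrences = []
    step = 0
    while True:
        if pattern == "daily":
            current = start + timedelta(days=step)
        elif pattern == "weekly":
            current = start + timedelta(weeks=step)
        elif pattern == "monthly":
            current = add_months(start, step)
        elif pattern == "yearly":
            current = add_months(start, 12 * step)
        else:
            raise ValueError(f"unsupported pattern: {pattern}")
        if current > window_end:
            break
        if current >= window_start:
            occurrences.append(current)
        step += 1
    return occurrences


instances = expand(date(2013, 1, 31), "monthly", date(2013, 10, 1), date(2013, 12, 31))
print(instances)
```

Because these instances only exist after the query has returned, Solr cannot sort, page, or facet on them, which is the limitation the message describes.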


On Wed, Oct 30, 2013 at 6:35 PM, Upayavira  wrote:

> Also note that function queries only return numbers (given their origin
> in scoring). They cannot be used to create virtual string or text
> fields.
>
> Upayavira
>
> On Wed, Oct 30, 2013, at 05:19 PM, Jack Krupansky wrote:
> > A function query is simply returning a calculated result based on
> > existing
> > data - no new fields required.
> >
> > Did you actually want to precompute a value, store it in the index, and
> > then
> > query on it? If so, you could do that indexing with a custom or scripted
> > update processor.
> >
> > Flesh out an example of exactly what you want.
> >
> > -- Jack Krupansky
> >
> > -Original Message-
> > From: Alejandro Calbazana
> > Sent: Wednesday, October 30, 2013 12:46 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Computing Results So That They are Returned in Search
> > Results
> >
> > Sounds really close to what I'm looking for, but this sounds like it
> > would
> > result in a new field on a document (or a new value for a field defined
> > to
> > hold the result of a function).  Would it be possible for a function
> > query
> > to produce a new document so that I can associate the computed value with
> > it?
> >
> > Thanks,
> >
> > Alejandro
> >
> >
> > On Wed, Oct 30, 2013 at 12:05 PM, Jack Krupansky
> > wrote:
> >
> > > You could create a custom "value source" and then use it in a function
> > > query embedded in your return fields list (fl).
> > >
> > > So, the function query could use a function (value source) that takes a
> > > field, fetches its value, performs some arbitrary calculation, and then
> > > returns that value.
> > >
> > > fl=id,name,my-func(field1),my-func(field2)
> > >
> > > -- Jack Krupansky
> > >
> > > -Original Message- From: Alejandro Calbazana
> > > Sent: Wednesday, October 30, 2013 10:10 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Computing Results So That They are Returned in Search Results
> > >
> > > I'd like to throw out a design question and see if it's possible to solve
> > > this with Solr.
> > >
> > > I have a set of data that is computed that I'd like to make searchable.
> > > Ideally, I'd like to have all documents indexed and call it a day, but
> > > the nature of the data is such that it needs to be computed given a
> > > definition.  I'm interested in searching on definitions and then creating
> > > results on the fly that are calculated based on something embedded in the
> > > definition.
> > >
> > > Is it possible to embed this calculation logic into Solr's result handling
> > > process?  I know this sounds exotic, but the nature of the data is such
> > > that I can't index these calculated documents because I don't know what the
> > > boundary is and specifying an arbitrary number isn't ideal.
> > >
> > > Has anyone run across something like this?
> > >
> > > Thanks,
> > >
> > > Alejandro
> > >
> >
>


Re: Computing Results So That They are Returned in Search Results

2013-10-30 Thread Alejandro Calbazana
Sounds really close to what I'm looking for, but this sounds like it would
result in a new field on a document (or a new value for a field defined to
hold the result of a function).  Would it be possible for a function query
to produce a new document so that I can associate the computed value with
it?

Thanks,

Alejandro


On Wed, Oct 30, 2013 at 12:05 PM, Jack Krupansky wrote:

> You could create a custom "value source" and then use it in a function
> query embedded in your return fields list (fl).
>
> So, the function query could use a function (value source) that takes a
> field, fetches its value, performs some arbitrary calculation, and then
> returns that value.
>
> fl=id,name,my-func(field1),my-func(field2)
>
> -- Jack Krupansky
>
> -Original Message- From: Alejandro Calbazana
> Sent: Wednesday, October 30, 2013 10:10 AM
> To: solr-user@lucene.apache.org
> Subject: Computing Results So That They are Returned in Search Results
>
> I'd like to throw out a design question and see if it's possible to solve
> this with Solr.
>
> I have a set of data that is computed that I'd like to make searchable.
> Ideally, I'd like to have all documents indexed and call it a day, but
> the nature of the data is such that it needs to be computed given a
> definition.  I'm interested in searching on definitions and then creating
> results on the fly that are calculated based on something embedded in the
> definition.
>
> Is it possible to embed this calculation logic into Solr's result handling
> process?  I know this sounds exotic, but the nature of the data is such
> that I can't index these calculated documents because I don't know what the
> boundary is and specifying an arbitrary number isn't ideal.
>
> Has anyone run across something like this?
>
> Thanks,
>
> Alejandro
>
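
Jack's suggestion of putting a function query in the return-field list (fl) can be tried with Solr's built-in functions before writing a custom value source. A sketch, assuming numeric fields named price and tax exist; those names and the base URL are placeholders. The `total:` prefix is an alias for the computed pseudo-field.

```python
from urllib.parse import urlencode


def function_field_url(base_url, query, fields, functions):
    """Build a select URL whose fl mixes stored fields and aliased functions."""
    # Each function is emitted as alias:function(args), e.g. total:sum(price,tax).
    computed = [f"{alias}:{expr}" for alias, expr in functions.items()]
    params = {"q": query, "fl": ",".join(list(fields) + computed)}
    return f"{base_url}/select?{urlencode(params)}"


url = function_field_url(
    "http://localhost:8983/solr/collection1",
    "*:*",
    ["id", "name"],
    {"total": "sum(price,tax)"},
)
print(url)
```

A function in fl yields one value per matching document. It does not create additional documents, which is the gap Alejandro is asking about.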


Computing Results So That They are Returned in Search Results

2013-10-30 Thread Alejandro Calbazana
I'd like to throw out a design question and see if it's possible to solve
this with Solr.

I have a set of data that is computed that I'd like to make searchable.
Ideally, I'd like to have all documents indexed and call it a day, but
the nature of the data is such that it needs to be computed given a
definition.  I'm interested in searching on definitions and then creating
results on the fly that are calculated based on something embedded in the
definition.

Is it possible to embed this calculation logic into Solr's result handling
process?  I know this sounds exotic, but the nature of the data is such
that I can't index these calculated documents because I don't know what the
boundary is and specifying an arbitrary number isn't ideal.

Has anyone run across something like this?

Thanks,

Alejandro


Many Dynamic Fields + Indexing Strategy

2013-10-29 Thread Alejandro Calbazana
Hi,

I have an application that has a fair number of dynamic fields in addition
to static fields.  The use case is that a customer can create any number of
dynamic fields and associate them with domain objects that we then pull
into an indexed document.  I have no way to know these fields in advance
and the expectation is that these fields are searchable using a field/value
query.  It is a multi-tenant environment and it is possible that there
could be a high volume of dynamic fields created.

My question is whether there is a reasonable indexing strategy that can be
used to accommodate such a use case.  My concern is that I can end up with a
large number of dynamic fields, which would slow querying and full indexing
to a crawl.  In some testing, I created unique dynamic fields and got into
the 50K - 100K range before my JVM began to behave poorly and go OOM.  I
understand why this happens but I'm interested in how to protect against
it.

My only thought at the moment is to split my single index into multiple
cores - one per tenant.  Has anyone else had this requirement?  How did you
handle it?

My schema is pretty much what I've described.  A handful of static fields
with the stock dynamic field pattern definitions.  I am using Solr 4.2.1.

Thanks,

Al
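
The one-core-per-tenant idea mentioned above could be sketched as a small routing helper. Everything here is an assumption for illustration: the base URL, the core naming scheme (tenant_<id>), and the premise that a core has already been created for each tenant.

```python
import re


def core_name_for_tenant(tenant_id):
    """Derive a core name from a tenant ID, keeping only safe characters."""
    safe = re.sub(r"[^A-Za-z0-9_]", "_", tenant_id)
    return f"tenant_{safe}"


def update_url_for_tenant(solr_base_url, tenant_id):
    """Return the update endpoint of the core that holds this tenant's documents."""
    return f"{solr_base_url.rstrip('/')}/{core_name_for_tenant(tenant_id)}/update"


print(update_url_for_tenant("http://localhost:8983/solr/", "acme-corp"))
```

With this layout each core would carry only its own tenant's dynamic fields, so the field count per index is bounded by the busiest tenant rather than by the total across tenants. Whether that is enough depends on how many fields a single tenant can create.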


tlog after commit

2013-09-17 Thread Alejandro Calbazana
Quick question...  Should I still see tlog files after a hard commit?

I'm trying to test soft commits and hard commits and I was under the
impression that tlog files would be removed after a hard commit, whereas
after a soft commit I would still see them.

Thanks,

Al


Federated Search Design Question

2013-09-13 Thread Alejandro Calbazana
Hi,

I have a general design question about federated search that I'd like to
get some thoughts on.

I have several line-of-business (LOB) applications that manage their own
data.  There is a need to search across these LOB apps, but each of them has
a different authorization scheme for allowing users access to data.
None of this data lives in Solr at the moment.

Ideally, everyone would push their data to Solr and we'd rationalize a
common ACL model for authorization.  Everything would be relatively
straightforward.  Unfortunately, I'm not going to be able to solve the ACL
problem in my timeline.

As an alternative, one consideration is to use Solr as sort of a cache
where data is pulled from individual endpoints and stored. A final query
would be made against results stored in Solr for combined results.

Has anyone used Solr in this way?  I understand that this might be an
unusual usage, results are likely going to be thrown away as queries
change, and there is overhead in committing.  If results were pushed into
memory, that might be enough for this purpose.

If there are alternatives, I'm open to suggestions.

Thanks!

Al


DIH + Solr Cloud

2013-09-03 Thread Alejandro Calbazana
Hi,

Quick question about data import handlers in Solr cloud.  Does anyone use
more than one instance to support the DIH process?  Or is the typical setup
to have one box set up as only the DIH and keep this responsibility outside
of the Solr cloud environment?  I'm just trying to get a picture of how this
is typically deployed.

Thanks!

Alejandro


Data Import Handler Help

2013-08-07 Thread Alejandro Calbazana
Hi,

I'm looking for a bit of guidance in implementing a data import handler for
mongodb.

I am using
https://github.com/sucode/solrMongoDBImporter/blob/master/README.md as a
starting point, and I can get full imports working properly with a few
adjustments to the source.   The problem comes in when I try delta
imports.  After adding code to support delta queries and looking at how the
SQL import handler works, I get delta reads but the counts grow out of
control.  It's as if DocBuilder does not know when to stop processing.
Example: I have one doc to be read but I get 2 docs added/updated.

Has anyone seen this before?  Using 4.2.0.

Thanks