Re: Approach for Merge Database and Files

2018-06-27 Thread Erick Erickson
Seriously consider de-normalizing the data at index time. Your
indexing client just accesses both the DB and the file system, selects
the relevant data from each then indexes that data as a _single_
document.

Then there's no joining necessary at query time.

It's a common pattern to do extra work at index time on the theory
that you query much more often than you index.
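
A minimal sketch of what such an indexing client could look like, assuming the DB rows and the files share a file URL as the key. The table name, column names, the db_/fm_ field prefixes and the sample values are all made up for illustration; sending the resulting document to Solr is left out.

```python
import sqlite3


def fetch_db_metadata(conn, file_url):
    """Return the database metadata for file_url as db_-prefixed fields."""
    row = conn.execute(
        "SELECT metadata_db1, metadata_db2 FROM files WHERE file_url = ?",
        (file_url,),
    ).fetchone()
    if row is None:
        return {}
    return {"db_metadata1": row[0], "db_metadata2": row[1]}


def build_document(conn, file_url, file_metadata, content):
    """Merge file data and DB data into one flat document (one doc per file)."""
    doc = {"id": file_url, "file_url": file_url, "fm_content": content}
    doc.update({f"fm_{name}": value for name, value in file_metadata.items()})
    doc.update(fetch_db_metadata(conn, file_url))
    return doc


# Hypothetical stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (file_url TEXT PRIMARY KEY, metadata_db1 TEXT, metadata_db2 TEXT)"
)
conn.execute(
    "INSERT INTO files VALUES (?, ?, ?)", ("file:///docs/a.txt", "invoice", "2018")
)

doc = build_document(
    conn,
    "file:///docs/a.txt",
    {"metadata1": "report", "metadata2": "es"},
    "text extracted from the file",
)
print(doc)
```

Because every field lives on the same document, a query can mix db_* and fm_* clauses freely.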

Best,
Erick


Re: Approach for Merge Database and Files

2018-06-27 Thread Angel Addati
Thank you Erick. I have read a lot on the web and I am still a little lost. In
summary, I need to perform a join (SQL join). So far, I have evaluated
the following alternatives:

   1. Alias (first response). I think it doesn't solve the problem, because
   as I understand it, an alias is useful for a union (SQL union) of the
   results. So I don't see how it can help me.
   2. Join query. It is not a SQL join, because it only shows data for the
   first item and only filters by the second item. So I discarded it.
   3. Block join. I don't fully understand it (the functionality and where
   I need to configure it). But I think it doesn't solve my need, because it
   defines a parent/child relationship.
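
To make the limitation in option 2 concrete, here is a small sketch of how a Solr join query could be built. It assumes the DB records and the file records are separate documents in the same collection that share a file_url field; the field names are hypothetical. Only the {!join from=... to=...} syntax is Solr's.

```python
from urllib.parse import urlencode


def join_query_params(from_field, to_field, inner_query, fields):
    """Build request parameters for a Solr join query.

    The inner query selects documents on the "from" side; Solr then returns
    the documents whose "to" field matches the collected "from" values.
    """
    return {
        "q": f"{{!join from={from_field} to={to_field}}}{inner_query}",
        "fl": ",".join(fields),
    }


# Filter by file content, but get back only the DB-side documents.
params = join_query_params(
    "file_url", "file_url", "fm_content:invoice", ["id", "db_metadata1"]
)
print(urlencode(params))
```

The response contains only documents from the "to" side, so fields of the file documents that matched the inner query are not available in the results. That is the difference from a SQL join.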


Do you have any pointers on what to investigate?  Thank you for your help!


*Angel Adrián Addati*




Re: Approach for Merge Database and Files

2018-06-26 Thread Erick Erickson
bq.  I don't know if the best approach is combine in index time or in query time

It Depends (tm). What is your goal? Let's say you have db_f1 and fm_f2
(db = from the database, fm = from the file data).

If you want to form a Solr query like

db_f1:something fm_f2:something_else

you don't have much choice, you've got to do it at index time or your
search time will be horrible.

OTOH, if you want to search, say, _only_ on the db_* data or _only_ on
the file data and enrich the results returned to the user with data
from the other source, that's perfectly reasonable, although you
should really do some prototyping to see if it meets your SLA. This
presupposes that you're only returning a few rows. For example, use
Solr to get the top 10 docs based on file data and have your app layer
reach out to the DB to enrich just those 10 docs.
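
A rough sketch of that app-layer enrichment, assuming each Solr hit carries a file_url that is also the key in the database. The hits list stands in for a real Solr response, and the table and column names are invented for the example.

```python
import sqlite3


def enrich_hits(conn, hits):
    """Attach database metadata to a short list of Solr hits, keeping their order."""
    urls = [hit["file_url"] for hit in hits]
    if not urls:
        return []
    # One round trip to the DB for all hits instead of one query per hit.
    placeholders = ",".join("?" for _ in urls)
    rows = conn.execute(
        "SELECT file_url, metadata_db1, metadata_db2 FROM files "
        f"WHERE file_url IN ({placeholders})",
        urls,
    ).fetchall()
    by_url = {url: {"db_metadata1": m1, "db_metadata2": m2} for url, m1, m2 in rows}
    return [{**hit, **by_url.get(hit["file_url"], {})} for hit in hits]


# Hypothetical stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (file_url TEXT PRIMARY KEY, metadata_db1 TEXT, metadata_db2 TEXT)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?)",
    [
        ("file:///docs/a.txt", "invoice", "2018"),
        ("file:///docs/b.txt", "contract", "2017"),
    ],
)

# Hypothetical stand-in for the top hits returned by Solr, in ranked order.
hits = [
    {"file_url": "file:///docs/b.txt", "fm_title": "B"},
    {"file_url": "file:///docs/a.txt", "fm_title": "A"},
]
enriched = enrich_hits(conn, hits)
print(enriched)
```

This only stays cheap while the number of rows per page is small, and it cannot help with queries that need to match on both sources at once.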

In general, you should always consider doing as much pre-processing at
index time as you can on the theory that what you want is fast
searches and you'll search over a doc many, many more times than you
index it.

Best,
Erick




Re: Approach for Merge Database and Files

2018-06-26 Thread Angel Addati
Thank you both.

*"From your problem description, it looks like you want to gather the data
from the DB and filesystem and combine them into a Solr document at index
time, then index that document. " *

Exactly. I don't know whether the best approach is to combine at index time or
at query time. But I need to search and show results for the combined items.
I'm investigating the alias suggestion. Do you think it solves the problem, or
do you know of another approach?

PS: I need to index both the information in the file and the information in the
database, because each has important content and metadata.

Regards...

* - - -*
*Angel Adrián Addati*




Re: Approach for Merge Database and Files

2018-06-26 Thread Erick Erickson
From your problem description, it looks like you want to gather the
data from the DB and filesystem and combine them into a Solr document
at index time, then index that document.

Put enough information in Solr to fetch the document as necessary;
often people don't put the entire file in Solr, especially if it's,
say, a PDF or Word document.

Best,
Erick



Re: Approach for Merge Database and Files

2018-06-26 Thread Peter Gylling Jørgensen
Hi,

I would create a search alias that contains the latest versions of the
different collections.

See:
https://lucene.apache.org/solr/guide/7_3/collections-api.html#collections-api

Then you use this alias to search for results.

You get better results if you define the same schema for all collections.
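
As a rough illustration, the alias could be created with the Collections API CREATEALIAS action. The sketch below only builds the request URL; the host, alias name and collection names are placeholders.

```python
from urllib.parse import urlencode


def create_alias_url(solr_base, alias, collections):
    """Build a Collections API URL that points one alias at several collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{solr_base}/admin/collections?{urlencode(params)}"


# Placeholder host and names; adjust to the actual cluster.
url = create_alias_url(
    "http://localhost:8983/solr", "search_all", ["db_records", "file_records"]
)
print(url)
```

Queries sent to the alias are then run against all of the collections behind it and the results are merged into one list.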

Best Regards
Peter Gylling Jørgensen
Findability Consultant
Mail: peter.jorgen...@findwise.com
Mobile: +45 42442890


On 26 Jun 2018 at 13:55, angeladdati <angeladd...@gmail.com> wrote:

Hi:

I have two sources to index:
Database: MetadataDB1, MetadataDB2, File Url...
Files: MetadataF1, MetadataF2, File Url, Content...

I index the database and the files. When I search, I need to search and show
the merged result: Database + Files (MetadataDB1, MetadataDB2, MetadataF1,
MetadataF2, File Url, Content, ...).


Is it possible?

Regards!

Angel



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html