Re: Multi-language solr1.3 what would you reckon?

2008-10-15 Thread sunnyfr

OK, MultiCore is indeed handy for avoiding one big index that manages
every language, but when you have one modification to make, you have to
make it on all of the cores.

Another point is that it's complicated to boost one language more than
another. For example, with an Italian video search, if we don't have that
many Italian videos, it might be more interesting to bring back English
ones.

And there are languages like Slovak that are not managed by the website,
but people can still come from there... so those videos will be stored in
core0, which will hold every language that is not English, Spanish,
German or French. This kind of garbage core for every unmanaged language
seems hard to manage to me.

What do you think?

--
View this message in context:
http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19991949.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-language solr1.3 what would you reckon?

2008-10-15 Thread sunnyfr

Hi,

Sorry, I didn't get your example. Can you send it to me again?
Thanks,

--
View this message in context:
http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19990348.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-language solr1.3 what would you reckon?

2008-10-15 Thread Hannes Carl Meyer

It should work, but if you want to handle multiple languages in ONE index,
you end up with a lot of filters and fields handled by different analyzers
in a SINGLE configuration.

Re: Multi-language solr1.3 what would you reckon?

2008-10-15 Thread sunnyfr

But about stopwords and stemming: is it a real issue if one core has
several stemmers and stopword lists (with different names)? It should
work, shouldn't it?

Re: Multi-language solr1.3 what would you reckon?

2008-10-15 Thread Hannes Carl Meyer

Hi,

yes, if you don't handle a specific language (stopwords, stemming, etc.),
you should create a general core.

In my project I support 10 languages, and documents in unsupported
languages are logged and discarded right away!
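A minimal Python sketch of the two policies described here. A document goes to its language's core. For an unsupported language, it goes either to a general catch-all core or down a log-and-discard path. The language codes and core names (`core_en`, `core_general`, etc.) are hypothetical, and the sketch assumes each document's language has already been detected upstream.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("solr_indexer")

# Hypothetical mapping from ISO 639-1 code to the core whose schema is tuned
# (stopwords, stemming) for that language.
SUPPORTED_CORES: Dict[str, str] = {
    "en": "core_en",
    "fr": "core_fr",
    "de": "core_de",
    "es": "core_es",
}


def core_for_document(lang: str, general_core: Optional[str] = "core_general") -> Optional[str]:
    """Return the core a document should be indexed into, or None to drop it.

    Pass general_core=None to log and discard unsupported languages instead of
    sending them to a catch-all core with generic analysis.
    """
    core = SUPPORTED_CORES.get(lang.lower())
    if core is not None:
        return core
    if general_core is not None:
        # Catch-all core: the document stays searchable, without language tuning.
        return general_core
    # Log-and-discard policy.
    logger.warning("unsupported language %r: document discarded", lang)
    return None
```

Which policy fits depends on whether content in unmanaged languages should still be searchable at all.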

Boosting across multiple cores is indeed a problem. One idea would be to
merge the result sets from core0 and core1 and sort them by score.
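A minimal sketch of that merge. It assumes each core was queried with `fl=*,score`, so every returned document carries a `score`. One caveat: Lucene scores depend on per-index term statistics, so raw scores from different cores are only roughly comparable. The optional per-core weights are a hypothetical knob for preferring one language over another, not a Solr feature.

```python
from typing import Dict, List, Optional


def merge_by_score(
    results_by_core: Dict[str, List[dict]],
    weights: Optional[Dict[str, float]] = None,
    rows: int = 10,
) -> List[dict]:
    """Merge per-core result lists into one list ordered by (weighted) score."""
    weights = weights or {}
    merged = []
    for core, docs in results_by_core.items():
        weight = weights.get(core, 1.0)  # 1.0 means no preference for this core
        for doc in docs:
            # Copy the document and record which core it came from.
            merged.append({**doc, "core": core, "merged_score": doc["score"] * weight})
    # Highest score first; the sort is stable, so ties keep core order.
    merged.sort(key=lambda d: d["merged_score"], reverse=True)
    return merged[:rows]
```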

Regards

Hannes

Re: Multi-language solr1.3 what would you reckon?

2008-10-14 Thread sunnyfr

is it ???

sunnyfr wrote:

 Ok, so actually multi-core is multi-index?
 Cheers for these links.

 Hannes Carl Meyer-2 wrote:

 Nope, your schema defines a single index with all languages being
 stored. The other way would be MultiCore/MultipleIndexes as described
 here:
 http://wiki.apache.org/solr/CoreAdmin and
 http://wiki.apache.org/solr/MultipleIndexes#head-e517417ef9b96e32168b2cf35ab6ff393f360d59

 On Mon, Oct 13, 2008 at 5:05 PM, sunnyfr [EMAIL PROTECTED] wrote:

 But I don't get it: if you look at my schema.xml, isn't that what I've
 done, multiple indexes? So I was right?

 Hannes Carl Meyer-2 wrote:

  Hi Ralf,

  you should also check the example inside the Solr 1.3 download
  package!

  Managing multiple languages in multiple indexes really makes sense in
  terms of configuration effort (look at your big kahuna configuration
  file!) and performance, and it gives you an additional scalability
  feature (because you index/search in multiple cores, which could
  theoretically be placed on different machines).

  But from the perspective of the search client, you will have to
  execute searches on multiple cores simultaneously. If this is
  feasible, you should really think about using multiple indexes.

  Regards,

  Hannes

  On Mon, Oct 13, 2008 at 4:14 PM, Kraus, Ralf | pixelhouse GmbH
  [EMAIL PROTECTED] wrote:

  Hannes Carl Meyer wrote:

  Hi,

  is it really necessary to put it all into one index? You could also
  use the Solr MultiCore/MultipleIndexes feature and separate by
  language.

  Is there a good webpage with info about the multi-index feature?
  I know http://wiki.apache.org/solr/MultipleIndexes but there is not
  enough info :-(

  Greets -Ralf-

 --
 View this message in context:
 http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19956421.html
 Sent from the Solr - User mailing list archive at Nabble.com.

-- 
View this message in context: 
http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19970307.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-language solr1.3 what would you reckon?

2008-10-14 Thread sunnyfr

Thanks for this explanation, but just to make sure I get it right:

One core per language, so with the same fields and schema, and only the
language-specific part and management being different?
And one core that covers every language not handled by Solr, like
Russian or ???
So different requests to the database.
OK.

What I don't really get: when you search for the word 'chien' on the
English website, I want to get back results from French videos, because
'chien' is French. So if it doesn't find any English video with 'chien',
I need my French videos then.

Exactly the same for the users core: if somebody searches for 'chien' and
there is a user with exactly that username, I would like to show it.
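For the users core, an exact match is easiest if the username is indexed in a non-tokenized field (for example a `solr.StrField` type). Below is a minimal sketch that builds such a query. It assumes a hypothetical `username` field of that type and escapes only backslashes and double quotes, which should be enough inside a quoted Lucene phrase.

```python
def exact_username_query(username: str, field: str = "username") -> str:
    """Build a Lucene query matching `field` exactly against `username`.

    Inside a quoted phrase, backslash and double quote are the characters that
    need escaping; the backslash is escaped first so it is not doubled later.
    """
    escaped = username.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'
```

For example, `exact_username_query("chien")` gives `username:"chien"`, which could be sent as the `q` parameter to the users core.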

Thanks for your time, really.

-- 
View this message in context: 
http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19974618.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-language solr1.3 what would you reckon?

2008-10-14 Thread Hannes Carl Meyer

I attached an example for you.

The challenge with MultiCore lies in the client's search logic. It helps
if you know which language the person wants to search in. If not, you
have to send separate requests to the different cores. The usual logic
would be:

1. search for chien in core0 (English)
2. if #1 returned zero results, search for chien in core1 (French)
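Those two steps could be sketched in Python as follows. The base URL is a hypothetical Tomcat setup, and `solr_select` is an illustrative helper that calls a core's `/select` handler with `wt=json`. The `search` parameter lets you swap in any other client or a stub.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical base URL for a Solr 1.3 multicore setup running in Tomcat.
SOLR_BASE = "http://localhost:8080/solr"

Search = Callable[[str, str], List[dict]]


def solr_select(core: str, query: str) -> List[dict]:
    """Query one core's /select handler (JSON response) and return its docs."""
    params = urllib.parse.urlencode({"q": query, "wt": "json", "fl": "*,score"})
    with urllib.request.urlopen(f"{SOLR_BASE}/{core}/select?{params}") as resp:
        return json.load(resp)["response"]["docs"]


def search_with_fallback(
    query: str, cores: Sequence[str], search: Search = solr_select
) -> Tuple[Optional[str], List[dict]]:
    """Try each core in order; return the first core with hits and its docs."""
    for core in cores:
        docs = search(core, query)
        if docs:
            return core, docs
    return None, []
```

`search_with_fallback("chien", ["core0", "core1"])` would query core0 (English) first and only hit core1 (French) when core0 returns nothing.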

---

In your client you could even parallelize the requests to minimize waiting
time.
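Parallelizing that fallback could look like the sketch below. Every core is queried at once through a thread pool, and the preference order is applied afterwards. Total latency is then roughly that of the slowest core rather than the sum of all of them. `search` is any callable taking `(core, query)`, for example an HTTP helper against each core's `/select` handler. This callable is an assumption, not something from the thread.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Search = Callable[[str, str], List[dict]]


def search_cores_parallel(query: str, cores: Sequence[str], search: Search) -> Dict[str, List[dict]]:
    """Send the same query to every core concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=max(1, len(cores))) as pool:
        futures = {core: pool.submit(search, core, query) for core in cores}
        # result() blocks until that core has answered (or re-raises its error).
        return {core: future.result() for core, future in futures.items()}


def first_non_empty(
    results: Dict[str, List[dict]], preference: Sequence[str]
) -> Tuple[Optional[str], List[dict]]:
    """Apply the 'first core with hits wins' rule to the collected results."""
    for core in preference:
        if results.get(core):
            return core, results[core]
    return None, []
```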

*One feature I haven't tried yet is DistributedSearch (and how it would
help with multiple cores)*; find it here:
http://wiki.apache.org/solr/DistributedSearch
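For reference, a distributed request is an ordinary `/select` request with a `shards` parameter. That parameter lists the cores as comma-separated `host:port/path` entries, without `http://`. Below is a minimal sketch that builds such a URL; the host, port and core names are hypothetical. Since nobody in the thread had tried this, check the DistributedSearch page for its requirements, such as a unique key field. It is also worth testing whether cores with different per-language analyzers behave as you expect.

```python
from typing import Sequence
from urllib.parse import urlencode


def distributed_select_url(entry_core_url: str, shards: Sequence[str], query: str, rows: int = 10) -> str:
    """Build a /select URL that fans the query out to every listed shard."""
    params = {"q": query, "rows": rows, "shards": ",".join(shards)}
    # urlencode percent-escapes ':', '/' and ','; Solr decodes them server-side.
    return f"{entry_core_url}/select?{urlencode(params)}"
```

For example, `distributed_select_url("http://localhost:8080/solr/core0", ["localhost:8080/solr/core0", "localhost:8080/solr/core1"], "chien")` sends one request that searches both cores.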

Regards,

Hannes

Solr1.3 MultiCore Scenario

Re: Multi-language solr1.3 what would you reckon?

2008-10-14 Thread Hannes Carl Meyer

Sorry, yes MultiCore means multiple indexes!

Regards,

Hannes

Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread Hannes Carl Meyer

Hi,

is it really necessary to put it all into one index? You could also use the
Solr MultiCore/MultipleIndexes feature and separate by language.

Regards,

Hannes

On Mon, Oct 13, 2008 at 3:20 PM, sunnyfr [EMAIL PROTECTED] wrote:


 Hi,

 I would like to properly manage a multi-language search engine
 and would like your advice on what I have done.

 Solr1.3
 tomcat55

 http://www.nabble.com/file/p19954805/schema.xml schema.xml

 Thanks a lot,

 --
 View this message in context:
 http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19954805.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread John E. McBride

Fairly nebulous requirements, but I was recently involved in a
multilingual search platform.

The approach, translated to Solr 1.3, would be to use multicore: one
core per geography. Then a schema.xml per core, each with a different
language for the Porter stemmer, stopwords, etc., taken from Snowball.
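To make "one core per geography" concrete: in Solr 1.3 the cores are declared in a `solr.xml` file at the Solr home, and each core's `instanceDir` holds its own `conf/schema.xml`. The sketch below generates such a file. Its layout is modeled on the multicore example shipped with Solr 1.3, so compare it with the copy in your download. The core names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def multicore_solr_xml(core_names: Sequence[str], admin_path: str = "/admin/cores") -> str:
    """Return a solr.xml declaring one core per name, each in its own instanceDir."""
    root = ET.Element("solr", {"persistent": "false"})
    cores = ET.SubElement(root, "cores", {"adminPath": admin_path})
    for name in core_names:
        # Each instanceDir (e.g. ./de/conf/) carries that language's schema.xml,
        # stopwords and stemmer settings.
        ET.SubElement(cores, "core", {"name": name, "instanceDir": name})
    return ET.tostring(root, encoding="unicode")
```

`print(multicore_solr_xml(["en", "de", "fr"]))` prints the file contents for three language cores.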

Then on the German front end you make requests to the de core, and on the
English front end you make requests to the English core.
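A minimal sketch of that front-end routing. It reduces the front end's locale to a language code and picks the matching core's `/select` URL. The core names, the base URL and the fallback to English for unmapped locales are all assumptions for illustration.

```python
from typing import Dict

# Hypothetical core names, one per geography/front end.
CORE_BY_LANGUAGE: Dict[str, str] = {"de": "de", "en": "en", "fr": "fr"}


def select_url_for_locale(
    locale: str, base: str = "http://localhost:8080/solr", default_core: str = "en"
) -> str:
    """Map a front end's locale (e.g. 'de_AT', 'fr-CA') to its core's /select URL."""
    language = locale.replace("_", "-").split("-")[0].lower()
    core = CORE_BY_LANGUAGE.get(language, default_core)
    return f"{base}/{core}/select"
```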

This is much simpler than handling every language in one index. For
example, German queries need to be run through the German query
filters, etc. If you have all languages in one schema, then you will
have to add some front-end logic to map the query to the correct field.
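By contrast, with everything in one index the front end has to pick the field. This minimal sketch assumes a hypothetical naming convention: one field per language (`text_en`, `text_fr`, ...) plus a `text_general` field for everything else, each with its own language-specific field type in schema.xml. The user's query is inserted unescaped, so any query-syntax characters in it are still interpreted.

```python
SUPPORTED_LANGUAGES = {"en", "fr", "de", "es"}


def field_for_language(lang: str, prefix: str = "text") -> str:
    """Pick the per-language field, falling back to a generic one."""
    lang = lang.lower()
    return f"{prefix}_{lang}" if lang in SUPPORTED_LANGUAGES else f"{prefix}_general"


def fielded_query(user_query: str, lang: str) -> str:
    """Restrict the user's query to the field analyzed for that language."""
    return f"{field_for_language(lang)}:({user_query})"
```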

You have failed to consider internationalisation of the query side of
the process: your field types merely have analysis filters.

Additionally, if the data source for each geography is different, it
makes sense to separate the indexes and, consequently, the ingestion
mechanisms and schedules.

Just a few thoughts.

John

Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread sunnyfr

Hi,

Thanks, guys, for your answers, but I don't think I can use one core per
language, because, for example, if somebody connects from Italy and there
are not many Italian books, then by default I will show the few Italian
books but all the English ones as well.
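One way to get that behavior with separate cores: query the Italian core first, and if it returns fewer documents than a page holds, fill the rest of the page from the English core. Below is a minimal sketch operating on already-fetched result lists. The `id` key is an assumption about the schema's unique key, and the page size is arbitrary.

```python
from typing import List


def top_up(primary: List[dict], fallback: List[dict], rows: int = 10) -> List[dict]:
    """Show primary-language hits first, then fill remaining slots from fallback."""
    page = list(primary[:rows])
    seen = {doc["id"] for doc in page}
    for doc in fallback:
        if len(page) >= rows:
            break
        if doc["id"] not in seen:  # skip documents already on the page
            page.append(doc)
            seen.add(doc["id"])
    return page
```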

Do you have an example?
I'm quite lost with this.

-- 
View this message in context: 
http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19955092.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread John E. McBride

Well, it's the section shown below that would change from geography
to geography.

Parameterise the EnglishPorterFilterFactory and protwords.

You could introduce logic in the front end that checks whether the number
of results is zero and then makes a call to the English index, but does
that make logical sense? Why would a search in Italian bring up anything
from the English index?

I think you need to explain your application in a little more detail.

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is left to
         allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
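One way to parameterise that section per core is to render it from a template. This sketch is an assumption rather than anything posted in the thread: it uses SnowballPorterFilterFactory with a `language` attribute in place of EnglishPorterFilterFactory, a single analyzer for both index and query, and hypothetical per-language file names such as `stopwords_de.txt`.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical mapping: core name -> Snowball language name.
CORE_LANGUAGES = {"en": "English", "de": "German", "fr": "French", "it": "Italian"}

# Simplified field type; only the stop-word file and stemmer language vary per core.
TEMPLATE = """<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words={stopwords}/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language={language}/>
  </analyzer>
</fieldType>"""


def field_type_for(core: str) -> str:
    """Render the 'text' field type for one core's schema.xml."""
    language = CORE_LANGUAGES[core]
    return TEMPLATE.format(
        stopwords=quoteattr(f"stopwords_{core}.txt"),  # hypothetical file name
        language=quoteattr(language),                  # quoteattr adds the quotes
    )


if __name__ == "__main__":
    print(field_type_for("de"))
```

Every core keeps the same field name `text`, so client queries stay identical and only the core in the request URL changes.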

sunnyfr wrote:

Hi,

Thanks guys for your answers, but I don't think I can use one core per
language, because, for example, if somebody connects from Italy and there
aren't many Italian books, then by default I would show the few Italian
books but also all the English ones.

Do you have an example?
I'm quite lost.



John E. McBride wrote:
  
Fairly nebulous requirements, but I recently was involved in a 
multilingual search platform.


The approach, translated to Solr 1.3, would be to use multicore - one
core per geography.  Then a schema.xml per core, each with a different
language for the Porter stemmer, stopwords etc. - taken from Snowball.


Then on the German front end you make requests to the de core, and on the
English front end you make requests to the English core.
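A sketch of that routing, assuming the Solr 1.3 multicore URL layout `/solr/<core>/select`. The host, port, site names and the site-to-core mapping are placeholders.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # placeholder host/port

# Hypothetical mapping from front-end site to core name.
SITE_TO_CORE = {"de.example.com": "de", "www.example.com": "en"}


def select_url(site: str, query: str, rows: int = 10) -> str:
    """Build the select URL for the core that serves this front end."""
    core = SITE_TO_CORE[site]
    params = urlencode({"q": query, "rows": rows})
    return f"{SOLR_BASE}/{core}/select?{params}"


if __name__ == "__main__":
    print(select_url("de.example.com", "hund"))
```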


This is much simpler than sorting every language in the one index; for
example, German queries will need to be run through the German query
filters etc.  If you have all languages in one schema, then you will
have to do some front-end logic to map the query to the correct field.
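For the single-schema alternative, that front-end logic could be as small as choosing a per-language field. The field names `text_en`, `text_fr`, ... follow the naming sunnyfr proposes later in the thread; the catch-all `text` field for unsupported languages is an assumption.

```python
# Hypothetical per-language fields in one shared schema.
LANGUAGE_FIELDS = {"en": "text_en", "fr": "text_fr", "de": "text_de", "es": "text_es"}
DEFAULT_FIELD = "text"  # assumed catch-all field for other languages


def field_query(lang: str, user_query: str) -> str:
    """Prefix each whitespace-separated term with the field for `lang`.

    Simplified: no escaping of Lucene special characters, no phrase handling.
    """
    field = LANGUAGE_FIELDS.get(lang, DEFAULT_FIELD)
    return " ".join(f"{field}:{term}" for term in user_query.split())


if __name__ == "__main__":
    print(field_query("fr", "chien noir"))
```

Because Solr applies the query analyzer of the field named in the query, targeting `text_fr` is what makes a French query go through the French filters.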


You have failed to consider internationalisation of the query side of
the process - your field type merely has analysis filters.

Additionally, if the data source for each different geography is 
different it makes sense to separate the indexes and subsequently the 
ingestion mechanisms and schedules.


Just a few thoughts.

John


Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread sunnyfr


What is the problem with the way I've done it?
Does that mean there are some fields tied to languages that search won't
handle?
There are too many languages: the application is for video, and
we will support around 10 languages in search, but our database has
around 25 languages.
Should I create a core "text" plus others like text_en, text_fr, text_es,
and store all the videos in languages not handled by the search engine
in "text"?

Because even on the English website, users should be able to find
French videos if they enter the French word chien (dog).
I don't know if I'm being clear.

And would "text" then handle all the other languages that are not handled
in the other cores?
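The arrangement asked about here, one core per supported language plus a catch-all core for everything else, can be sketched on the indexing side as a routing function. The language codes and the `other` core name are hypothetical, and each video's language is assumed to be known from the database.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical: the languages the search engine handles, one core each.
MANAGED = {"en", "fr", "de", "es", "it"}
CATCH_ALL = "other"  # single core for the remaining database languages


def core_for(lang: str) -> str:
    """Return the core a video in `lang` should be indexed into."""
    return lang if lang in MANAGED else CATCH_ALL


def batch_by_core(videos: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group videos into per-core batches ready to be posted for indexing."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for video in videos:
        batches[core_for(video["lang"])].append(video)
    return dict(batches)
```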

thanks



-- 
View this message in context: 
http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19955411.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread Kraus, Ralf | pixelhouse GmbH


Hannes Carl Meyer wrote:

Hi,

is it really necessary to put it all into one index? You could also use the
Solr MultiCore/MultipleIndexes feature and separate by language.
  

Is there a good web page with information about the multi-index feature?
I know http://wiki.apache.org/solr/MultipleIndexes but there is not
enough information there :-(



Greets -Ralf-

Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread John E. McBride


In your schema you define each field type as follows:

<fieldtype name="text_it" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
</fieldtype>

etc

However, you have not defined the query filters - if you do not do this,
then you will not get any matches for searches in different languages.


For example, in English, if you index the sentence "the joyful boy played
tennis", this would typically get stored as "joy boy play tennis" due to
the analysis filters. If you then made a query for "joyful" without
applying the same filters on the query side, you would get no matches.
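To make the point concrete, here is a toy analyzer (stop-word removal plus a crude suffix stripper, not the Porter algorithm) showing that an unanalysed query term misses while the same term run through the same chain matches. For reference, the Solr documentation states that an `<analyzer>` element without a `type` attribute, as in `text_it` above, is used at both index and query time; the mismatch bites when a field type declares separate index and query analyzers that disagree.

```python
STOPWORDS = {"the", "a", "an"}
SUFFIXES = ("ful", "ed", "ing")  # toy rules, NOT the Porter stemmer


def analyze(text):
    """Lowercase, split on whitespace, drop stop words, strip one suffix."""
    terms = []
    for word in text.lower().split():
        if word in STOPWORDS:
            continue
        for suffix in SUFFIXES:
            # keep at least a 3-letter stem so e.g. "bed" is left alone
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                break
        terms.append(word)
    return terms


def matches(indexed_terms, query, analyze_query=True):
    """True if every query term is present in the indexed terms."""
    terms = analyze(query) if analyze_query else query.split()
    return all(term in indexed_terms for term in terms)


indexed = set(analyze("the joyful boy played tennis"))
print(sorted(indexed))                                  # ['boy', 'joy', 'play', 'tennis']
print(matches(indexed, "joyful", analyze_query=False))  # False
print(matches(indexed, "joyful"))                       # True
```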


You will also want to get some multilingual stop word lists from the
Snowball website, e.g. http://snowball.tartarus.org/algorithms/german/stop.txt.
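Snowball's stop lists put a comment after `|` on each line (for example `aber | but` in the German list). Solr 1.3's StopFilterFactory may expect a plain word per line (worth checking for your version), so the files might need converting first. A sketch of that conversion, offered as an assumption rather than a required step:

```python
def snowball_to_solr(snowball_text: str) -> str:
    """Convert a Snowball stop list ('|' starts a comment) to one word per line."""
    words = []
    for line in snowball_text.splitlines():
        content = line.split("|", 1)[0]  # drop the comment part of the line
        words.extend(content.split())    # a line may hold several words
    return "\n".join(words) + "\n"


if __name__ == "__main__":
    sample = " | German stop words\naber           |  but\nalle |  all\n\n"
    print(snowball_to_solr(sample), end="")
```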



Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread Hannes Carl Meyer

Hi Ralf,

you should also check the example inside the Solr 1.3 download package!

The management of multiple languages inside multiple indexes really makes
sense in terms of configuration effort (look at your big kahuna
configuration file!) and performance, and it gives an additional
scalability feature (since you index/search in multiple cores, which could
theoretically be placed on different machines).

But, from the perspective of the search client, you will have to execute
search processes on multiple cores simultaneously. If this is feasible, you
should really think about using multiple indexes.
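A sketch of searching several cores simultaneously from the client, assuming a hypothetical `search(core, query)` function that returns the hits from one core. Results are kept per core rather than merged, since relevance scores from separate indexes are not directly comparable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical client call: returns the list of hits from one core.
SearchFn = Callable[[str, str], List[dict]]


def search_cores(search: SearchFn, query: str,
                 cores: Sequence[str]) -> Dict[str, List[dict]]:
    """Send the same query to every core concurrently and collect results per core."""
    with ThreadPoolExecutor(max_workers=max(1, len(cores))) as pool:
        futures = {core: pool.submit(search, core, query) for core in cores}
        # result() blocks until that core's request has finished
        return {core: future.result() for core, future in futures.items()}
```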

Regards,

Hannes

On Mon, Oct 13, 2008 at 4:14 PM, Kraus, Ralf | pixelhouse GmbH 
[EMAIL PROTECTED] wrote:

 Is there a good web page with information about the multi-index feature?
 I know http://wiki.apache.org/solr/MultipleIndexes but there is not enough
 information there :-(


 Greets -Ralf-

Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread sunnyfr


But I don't get it: if you look at my schema.xml, isn't that what I've done,
multiple indexes?
So I was right?



-- 
View this message in context: 
http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19956421.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread Hannes Carl Meyer

Nope, your schema defines a single index in which all languages are stored.
The other way would be MultiCore/MultipleIndexes as described here:
http://wiki.apache.org/solr/CoreAdmin and
http://wiki.apache.org/solr/MultipleIndexes#head-e517417ef9b96e32168b2cf35ab6ff393f360d59
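For orientation, a Solr 1.3 multicore setup is declared in a `solr.xml` file at the Solr home that lists one `<core>` per index, as in the multicore example shipped with the 1.3 download. The sketch below generates such a file for one core per language; the core names and directory layout are placeholders, and only the `name` and `instanceDir` attributes are shown.

```python
import xml.etree.ElementTree as ET


def multicore_solr_xml(core_names):
    """Build a solr.xml listing one core per name, each in its own instanceDir."""
    solr = ET.Element("solr", persistent="false")
    cores = ET.SubElement(solr, "cores", adminPath="/admin/cores")
    for name in core_names:
        # Each instanceDir would hold that core's conf/schema.xml and solrconfig.xml.
        ET.SubElement(cores, "core", name=name, instanceDir=name)
    return ET.tostring(solr, encoding="unicode")


if __name__ == "__main__":
    print(multicore_solr_xml(["en", "fr", "de"]))
```

So "multi-core" and "multiple indexes" describe the same thing here: each core is a separate index with its own schema.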

On Mon, Oct 13, 2008 at 5:05 PM, sunnyfr [EMAIL PROTECTED] wrote:


 But I don't get it: if you look at my schema.xml, isn't that what I've done,
 multiple indexes?
 So I was right?



Re: Multi-language solr1.3 what would you reckon?

2008-10-13 Thread sunnyfr


OK, so multi-core actually means multiple indexes?
Cheers for these links.


Hannes Carl Meyer-2 wrote:
 
 Nope, your schema defines a single index in which all languages are stored.
 The other way would be MultiCore/MultipleIndexes as described here:
 http://wiki.apache.org/solr/CoreAdmin and
 http://wiki.apache.org/solr/MultipleIndexes#head-e517417ef9b96e32168b2cf35ab6ff393f360d59
 

-- 
View this message in context: 
http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19957842.html
Sent from the Solr - User mailing list archive at Nabble.com.
