Re: Multi-language solr1.3 what would you reckon?
OK, MultiCore is indeed handy for avoiding one big index that handles every language, but whenever you need to make a change you have to make it in every core. It is also complicated to boost one language more than another. For example, for a video search on the Italian site, if we don't have many Italian videos it might be more useful to bring back English ones as well.

There are also languages such as Slovak that the website does not support, although people can still come from there ... so those videos would be stored in core0, which would hold every language other than English, Spanish, German ... French. That makes it a kind of garbage core for all the unsupported languages ... and I think it might be hard to manage. What do you think?

--
View this message in context: http://www.nabble.com/Multi-language-solr1.3-what-would-you-reckon--tp19954805p19991949.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Multi-language solr1.3 what would you reckon?
Hi,

Sorry, I didn't get your example. Can you send it to me again?

Thanks,
Re: Multi-language solr1.3 what would you reckon?
It should work, but if you want to handle multiple languages in ONE index you end up with a lot of filters and fields handled by different analyzers in a SINGLE configuration.
Re: Multi-language solr1.3 what would you reckon?
But about stopwords and stemming: is it a real problem if one core has several stemmers and stopword lists (each under a different name)? It should work, shouldn't it?
Re: Multi-language solr1.3 what would you reckon?
Hi,

yes, if you don't handle a specific language (stopwords, stemming etc.), you should create a general core. In my project I'm supporting 10 languages, and if I get content in an unsupported language it is logged and discarded right away!

Boosting across multiple cores is indeed a problem. An idea would be to merge the result sets from core0 and core1 and sort by score?

Regards
Hannes
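The idea of merging result sets from two cores and sorting by score could be sketched on the client roughly as follows. This is a minimal illustration, not a tested recipe: the core names, the document dictionaries and the per-core `boosts` weights are hypothetical, and it assumes each core's response included the `score` field. Scores computed in separate indexes are not strictly comparable, so the weights should be treated as a tuning knob rather than a precise ranking.

```python
def merge_by_score(results_per_core, boosts=None, limit=10):
    """Merge per-core result lists into one list sorted by (weighted) score.

    results_per_core: dict mapping core name -> list of docs, each doc a dict
                      with at least "id" and "score" (hypothetical shape).
    boosts:           optional dict mapping core name -> multiplier, e.g. to
                      favour the visitor's own language.
    """
    boosts = boosts or {}
    merged = []
    for core, docs in results_per_core.items():
        weight = boosts.get(core, 1.0)
        for doc in docs:
            # Copy the doc, remember where it came from, apply the core weight.
            merged.append({**doc, "core": core, "score": doc["score"] * weight})
    merged.sort(key=lambda doc: doc["score"], reverse=True)
    return merged[:limit]


# Made-up results standing in for two Solr responses.
results = {
    "italian": [{"id": "it-1", "score": 0.8}],
    "english": [{"id": "en-1", "score": 1.2}, {"id": "en-2", "score": 0.9}],
}

plain = merge_by_score(results)
boosted = merge_by_score(results, boosts={"italian": 2.0})
print([doc["id"] for doc in plain])
print([doc["id"] for doc in boosted])
```

With a weight above 1.0 for the visitor's language, the few Italian hits float to the top while the English ones still fill the rest of the page, which is the behaviour asked for earlier in the thread.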
Re: Multi-language solr1.3 what would you reckon?
Is it???

sunnyfr wrote:
OK, so multi-core is actually multi-index? Cheers for these links.

Hannes Carl Meyer-2 wrote:
Nope, your schema defines a single index with all languages stored in it. The other way would be MultiCore/MultipleIndexes, as described here: http://wiki.apache.org/solr/CoreAdmin and http://wiki.apache.org/solr/MultipleIndexes#head-e517417ef9b96e32168b2cf35ab6ff393f360d59

On Mon, Oct 13, 2008 at 5:05 PM, sunnyfr [EMAIL PROTECTED] wrote:
But I don't get it: if you look at my schema.xml, that's what I've done, multi index? So I was right?

Hannes Carl Meyer-2 wrote:
Hi Ralf,

you should also check the example inside the Solr 1.3 download package!

Managing multiple languages in multiple indexes really makes sense in terms of configuration effort (look at your big kahuna configuration file!) and performance, and it gives you an additional scalability feature (you index and search in multiple cores, which could theoretically be placed on different machines). But, from the perspective of the search client, you will have to execute searches on multiple cores simultaneously. If this is feasible, you should really think about using multiple indexes.

Regards,
Hannes

On Mon, Oct 13, 2008 at 4:14 PM, Kraus, Ralf | pixelhouse GmbH [EMAIL PROTECTED] wrote:
Hannes Carl Meyer wrote:
Hi, is it really necessary to put it all into one index? You could also use the Solr MultiCore/MultipleIndexes feature and separate by language.

Is there a good webpage with info about the multi-index feature? I know http://wiki.apache.org/solr/MultipleIndexes but there is not enough info :-(

Greets
-Ralf-
Re: Multi-language solr1.3 what would you reckon?
Thanks for this explanation, but just to make sure I understand it properly: one core per language, so the fields and the schema are the same and only the language-specific part and its management differ? And one core that holds every language not managed by Solr, like Russian or ...? So different requests to the database, OK.

What I don't really get is this: when you search for the word 'chien' on the English website, I want to get back results from French videos, because 'chien' is French. So if it doesn't find any English video with 'chien', I need my French videos instead.

It is exactly the same for the users core: if somebody searches for 'chien' and there is a user with exactly that username, I would like to show it.

Thanks for your time, really,
Re: Multi-language solr1.3 what would you reckon?
I attached an example for you.

The challenge with MultiCore is in the client's search logic. It would help if you know which language the person wants to search in. If not, you would have to send multiple requests to the multiple cores. Ordinary logic would be:

1. search chien in core0 (english)
2. if #1 returned zero results, search for chien in core1 (french)

In your client you could even parallelize the requests to minimize waiting time.

*One feature I didn't try yet is the DistributedSearch (and how it will help with multiple cores)*, find it here: http://wiki.apache.org/solr/DistributedSearch

Regards,
Hannes

Solr1.3 MultiCore Scenario

core0 (french)        core1 (english)       ...   core8 (russian)
|schema.xml           |schema.xml                 |schema.xml
|- analyzers          |- analyzers                |- analyzers
|-- FrenchAnalyzer    |-- EnglishAnalyzer         |-- RussianAnalyzer
|-- FrenchStops       |-- EnglishStops            |-- RussianStops
|- fields             |- fields                   |- fields
|-- title             |-- title                   |-- title
|-- description       |-- description             |-- description
|-- id                |-- id                      |-- id
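The two-step logic above, and the suggestion to parallelize it, might look something like this on the client. It is only a sketch under stated assumptions: `search_core` is a hypothetical callable that takes a core name and a query and returns a list of hits (in practice it would issue an HTTP request to that core), and the tiny in-memory index stands in for real Solr responses.

```python
from concurrent.futures import ThreadPoolExecutor


def search_with_fallback(query, cores, search_core):
    """Try the cores one by one in priority order; stop at the first with hits."""
    for core in cores:
        docs = search_core(core, query)
        if docs:
            return core, docs
    return None, []


def search_in_parallel(query, cores, search_core):
    """Query all cores concurrently, then pick the first non-empty result
    using the same priority order as the sequential version."""
    if not cores:
        return None, []
    with ThreadPoolExecutor(max_workers=len(cores)) as pool:
        # pool.map returns results in the same order as `cores`.
        results = list(pool.map(lambda core: search_core(core, query), cores))
    for core, docs in zip(cores, results):
        if docs:
            return core, docs
    return None, []


# Hypothetical stand-in for real requests to the English and French cores.
FAKE_INDEX = {
    "english": {"dog": ["en-video-1"]},
    "french": {"chien": ["fr-video-7"]},
}


def fake_search_core(core, query):
    return FAKE_INDEX.get(core, {}).get(query, [])


print(search_with_fallback("chien", ["english", "french"], fake_search_core))
print(search_in_parallel("chien", ["english", "french"], fake_search_core))
```

The sequential version sends fewer requests when the first core usually has hits; the parallel version always queries every core but waits only as long as the slowest one.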
Re: Multi-language solr1.3 what would you reckon?
Sorry, yes, MultiCore means multiple indexes!

Regards,
Hannes
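Because each core is its own index, each one is queried at its own URL. The sketch below builds such URLs, plus a single request that uses the `shards` parameter described on the DistributedSearch wiki page mentioned elsewhere in this thread. The base URL and core names are assumptions for illustration. Distributed search also expects the cores to have compatible schemas and document ids that are unique across cores, so check that page before relying on it.

```python
from urllib.parse import urlencode

# Assumed base URL of a multicore Solr instance; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def core_select_url(core, query, rows=10, base=SOLR_BASE):
    """Build the search URL for a single core (a single index)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}/{core}/select?{params}"


def distributed_select_url(cores, query, rows=10, base=SOLR_BASE):
    """Build one request that asks Solr to search several cores via 'shards'."""
    host_path = base.split("://", 1)[1]  # shard entries carry no URL scheme
    shards = ",".join(f"{host_path}/{core}" for core in cores)
    params = urlencode(
        {"q": query, "rows": rows, "wt": "json", "shards": shards},
        safe=":/,",  # keep the shard list readable instead of percent-encoding it
    )
    # The request is sent to the first core, which fans out to all listed shards.
    return f"{base}/{cores[0]}/select?{params}"


print(core_select_url("core1", "chien"))
print(distributed_select_url(["core0", "core1"], "chien"))
```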
Re: Multi-language solr1.3 what would you reckon?
Hi,

is it really necessary to put it all into one index? You could also use the Solr MultiCore/MultipleIndexes feature and separate by language.

Regards,
Hannes

On Mon, Oct 13, 2008 at 3:20 PM, sunnyfr [EMAIL PROTECTED] wrote:
Hi,

I would like to set up a multi-language search engine properly, and I would like your advice on what I have done so far.

Solr1.3
tomcat55

http://www.nabble.com/file/p19954805/schema.xml schema.xml

Thanks a lot,
Re: Multi-language solr1.3 what would you reckon?
Fairly nebulous requirements, but I was recently involved in a multilingual search platform.

The approach, translated to Solr 1.3, would be to use multicore: one core per geography. Then a schema.xml per core, each with a different language in the Porter algorithm, stopwords etc., taken from Snowball.

Then on the German front end you make requests to the de core, and on the English front end you make requests to the English core.

This is much simpler than storing every language in the one index; for example, German queries will need to be run through the German query filters etc. If you have all languages in one schema, then you will have to do some front-end logic to map the query to the correct field.

You have failed to consider internationalisation of the query side of the process: your field types merely have analysis filters.

Additionally, if the data source for each geography is different, it makes sense to separate the indexes and subsequently the ingestion mechanisms and schedules.

Just a few thoughts.

John
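To make the single-index alternative concrete, the "front end logic to map the query to the correct field" could be as small as the following sketch. The field names (`title_en`, `description_fr`, and so on) are hypothetical; they assume a schema with one analyzed field per language. The helper does no query escaping, so it is an illustration rather than something to paste into production.

```python
# Hypothetical per-language fields in a single shared schema.
FIELDS_BY_LANGUAGE = {
    "en": ["title_en", "description_en"],
    "fr": ["title_fr", "description_fr"],
    "de": ["title_de", "description_de"],
}


def build_query(terms, language, default_language="en"):
    """Return a Lucene-syntax query restricted to the fields of one language.

    Falls back to the default language when the requested one is unknown.
    """
    fields = FIELDS_BY_LANGUAGE.get(language, FIELDS_BY_LANGUAGE[default_language])
    return " OR ".join(f"{field}:({terms})" for field in fields)


print(build_query("chien", "fr"))
print(build_query("hund", "sk"))  # unsupported language falls back to English fields
```

With one core per language this mapping disappears, because the front end picks a core and the schema of that core already applies the right analyzers.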
Re: Multi-language solr1.3 what would you reckon?
Hi, Thanks, guys, for your answers, but I don't think I can use one core per language. For example, if somebody connects from Italy and there are not that many Italian books, then by default I want to show the few Italian books but all the English ones as well. Do you have an example? I'm quite lost about it.
Re: Multi-language solr1.3 what would you reckon?
Well, it's the section shown below that would change from geography to geography. Parameterise the EnglishPorterFilterFactory and protwords.

You could introduce logic in the front end that checks whether the number of results is zero and, if so, makes a call to the English-language core, but does that make logical sense? Why would a search in the Italian language bring up anything in the English index? I think you need to explain your application in a little more detail.

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
             enablePositionIncrements=true ensures that a 'gap' is left to
             allow for accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>
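To make "parameterise the stemmer and protwords" a little more concrete, here is a rough sketch of the lines that might differ between an English core and a German core. The file name stopwords_de.txt is hypothetical. The German fragment leaves out the protected-words file because I am not certain that the Snowball factory in Solr 1.3 accepts a protected attribute.

```xml
<!-- English core, conf/schema.xml: the filters as they appear in the message above -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>

<!-- German core, conf/schema.xml: same position in the chain, language-specific settings -->
<!-- stopwords_de.txt is a hypothetical file name -->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_de.txt"/>
<filter class="solr.SnowballPorterFilterFactory" language="German"/>
```

The rest of the analyzer chain can stay identical across cores, and the same substitution would be made in both the index and the query analyzer.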
Re: Multi-language solr1.3 what would you reckon?
What is the problem with the way I've done it? Does it mean that some videos are in languages that the search won't manage? There are a lot of languages: the application is for video, and we will manage around 10 languages, but our database has around 25.

Should I create a core "text" and others like text_en, text_fr, text_es, with all the videos that are not in one of the languages managed by the search engine stored in "text"? Because even on the English website, users who enter a French word such as "chien" (dog) should be able to find French videos. I don't know if I'm being clear. And should "text" then handle all the other languages that are not managed in the other cores?

Thanks
Re: Multi-language solr1.3 what would you reckon?
Hannes Carl Meyer wrote:
Hi, is it really necessary to put it all into one index? You could also use the Solr MultiCore/MultipleIndexes feature and separate by language.

Is there a good web page with information about the multi-index feature? I know http://wiki.apache.org/solr/MultipleIndexes but there is not enough info there :-(

Greetings, -Ralf-
Re: Multi-language solr1.3 what would you reckon?
In your schema you define each field type as follows:

    <fieldtype name="text_it" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.ISOLatin1AccentFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
      </analyzer>
    </fieldtype>

etc.

However, you have not defined the query filters. If you do not do this, you will not get any matches for searches in different languages. For example, in English, if you index the sentence "the joyful boy played tennis", it would typically be stored as "joy boy play tennis" because of the analysis filters. If you then made a query for "joyful" without applying the same filters on the query side, you would get no matches.

You will also want to get some multilingual stop word lists from the Snowball website, for example the German list: http://snowball.tartarus.org/algorithms/german/stop.txt
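As an illustration of applying the same filters on both sides, here is a sketch of the text_it type with the index and query analyzers spelled out. The stop filter and the file name stopwords_it.txt are additions for the example and are not in the original schema. As far as I know, a single analyzer element without a type attribute is already used for both indexing and querying in Solr, so writing the two out is mainly useful when they need to differ (for instance, synonyms at query time only).

```xml
<fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <!-- stop words are removed before accent folding so accented entries in the list still match -->
    <!-- stopwords_it.txt is a hypothetical file, one word per line -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_it.txt"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
  <analyzer type="query">
    <!-- same chain as at index time, so query terms reduce to the same stems -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_it.txt"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
</fieldType>
```

Note that the stop lists on the Snowball site carry inline comments, so they would probably need to be reduced to one word per line before being used as a words file here.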
Re: Multi-language solr1.3 what would you reckon?
Hi Ralf, you should also check the example inside the Solr 1.3 download package!

Managing multiple languages in multiple indexes really makes sense in terms of configuration effort (look at your big kahuna configuration file!) and performance, and it gives you an additional scalability option, since you index and search in multiple cores which could theoretically be placed on different machines.

But from the perspective of the search client, you will have to run searches on multiple cores simultaneously. If this is feasible, you should really think about using multiple indexes.

Regards, Hannes
Re: Multi-language solr1.3 what would you reckon?
But I don't get it: if you look at my schema.xml, isn't that what I've done, multi-index? So I was right?
Re: Multi-language solr1.3 what would you reckon?
Nope, your schema defines a single index with all languages stored in it. The other way would be MultiCore/MultipleIndexes, as described here: http://wiki.apache.org/solr/CoreAdmin and http://wiki.apache.org/solr/MultipleIndexes#head-e517417ef9b96e32168b2cf35ab6ff393f360d59
Re: Multi-language solr1.3 what would you reckon?
OK, so multi-core is actually multi-index? Cheers for these links.