Re: How to use polish stemmer - Stempel - in schema.xml?
In the end I chose hunspell-solr as the Polish language analyzer. It understands Polish and is much easier to install. But look out! It does not work with the current nightly build; it works well with Solr 1.4.1. And hey, I got Ukrainian out of the box too. I am thinking of replacing the SnowballPorterFilters for all the required languages with *.aff and *.dic support. Thanks for the help, everyone!

On Wed, 2010-11-24 at 19:00 +0100, Jakub Godawa wrote:
> Yes, from the current nightly release setting up Stempel is quite easy. All I did was:
>
>   svn co https://svn.apache.org/repos/asf/lucene/dev/trunk ./lucene-solr
>   cd lucene-solr/solr
>   ant example
>   cp ./contrib/analysis-extras/lucene-libs/lucene-analyzers-stempel-4.0-SNAPSHOT.jar ./lib
>   cp ./contrib/analysis-extras/build/apache-solr-analysis-extras-4.0-SNAPSHOT.jar ./lib
>
> in solrconfig.xml:
>
>   <lib path="../../lib/apache-solr-analysis-extras-4.0-SNAPSHOT.jar" />
>   <lib path="../../lib/lucene-analyzers-stempel-4.0-SNAPSHOT.jar" />
>
> in schema.xml:
>
>   <!-- Polish -->
>   <fieldType name="text_pl" class="solr.TextField">
>     <analyzer>
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.LowerCaseFilterFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory" />
>       <filter class="solr.StempelPolishStemFilterFactory" language="Polish" />
>     </analyzer>
>   </fieldType>
>
> The end.
>
> Anyway, I don't know whether it is the Polish stemmer or a badly configured fieldType, but the results are just wrong. Example:
>
>   index for type text_pl: bilety
>   query for type text_pl: bilet
>
>   Index Analyzer
>   org.apache.solr.analysis.StempelPolishStemFilterFactory {language=Polish, luceneMatchVersion=LUCENE_24}
>   term position     1
>   term text         bilić
>   term type         word
>   source start,end  0,6
>   payload
>
>   Query Analyzer
>   org.apache.solr.analysis.StempelPolishStemFilterFactory {language=Polish, luceneMatchVersion=LUCENE_24}
>   term position     1
>   term text         binąć
>   term type         word
>   source start,end  0,5
>   payload
>
> But I expected both to come out as the base form: bilet and bilet. Any clues how to make it work like Polish?
Re: How to use polish stemmer - Stempel - in schema.xml?
Yes, from the current nightly release setting up Stempel is quite easy. All I did was:

  svn co https://svn.apache.org/repos/asf/lucene/dev/trunk ./lucene-solr
  cd lucene-solr/solr
  ant example
  cp ./contrib/analysis-extras/lucene-libs/lucene-analyzers-stempel-4.0-SNAPSHOT.jar ./lib
  cp ./contrib/analysis-extras/build/apache-solr-analysis-extras-4.0-SNAPSHOT.jar ./lib

in solrconfig.xml:

  <lib path="../../lib/apache-solr-analysis-extras-4.0-SNAPSHOT.jar" />
  <lib path="../../lib/lucene-analyzers-stempel-4.0-SNAPSHOT.jar" />

in schema.xml:

  <!-- Polish -->
  <fieldType name="text_pl" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" />
      <filter class="solr.StempelPolishStemFilterFactory" language="Polish" />
    </analyzer>
  </fieldType>

The end.

Anyway, I don't know whether it is the Polish stemmer or a badly configured fieldType, but the results are just wrong. Example:

  index for type text_pl: bilety
  query for type text_pl: bilet

  Index Analyzer
  org.apache.solr.analysis.StempelPolishStemFilterFactory {language=Polish, luceneMatchVersion=LUCENE_24}
  term position     1
  term text         bilić
  term type         word
  source start,end  0,6
  payload

  Query Analyzer
  org.apache.solr.analysis.StempelPolishStemFilterFactory {language=Polish, luceneMatchVersion=LUCENE_24}
  term position     1
  term text         binąć
  term type         word
  source start,end  0,5
  payload

But I expected both to come out as the base form: bilet and bilet. Any clues how to make it work like Polish? Maybe someone has good experience with hunspell-solr and Polish dictionaries? Thanks for letting me know!

Cheers,
Jakub Godawa.

On Mon, 2010-11-15 at 08:35 -0500, Robert Muir wrote:
> https://issues.apache.org/jira/browse/SOLR-2237
>
> On Mon, Nov 15, 2010 at 5:04 AM, Jakub Godawa jakub.god...@gmail.com wrote:
> > I tried to reach the authors twice, but with no luck. I've seen some posts where people were finally able to launch it (without much pain). I don't know.
Re: How to use polish stemmer - Stempel - in schema.xml?
On Wed, 2010-11-24 at 19:00 +0100, Jakub Godawa wrote:
> Yes, from the current nightly release setting up Stempel is quite easy.

Thanks to Robert Muir :)
Re: How to use polish stemmer - Stempel - in schema.xml?
I tried to reach the authors twice, but with no luck. I've seen some posts where people were finally able to launch it (without much pain). I don't know. If any pro would be so nice as to run Stempel on his/her machine and paste me a verbose step-by-step solution, I would really appreciate it.

Cheers,
Jakub Godawa.

2010/11/13 Lance Norskog goks...@gmail.com:
> I don't know if the Stempel jar includes the Java source. At this point I think you should ask the author of Stempel to make a Solr front-end for it. It's very simple for him.
simple dismax with OR
Hi!

I have a dismax handler that searches through two fields:

  <requestHandler name="en" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">name_en^1.0 answe_en^1.5</str>
    </lst>
  </requestHandler>

Now I have a document that has "Various appliances can be installed here" in the answen_en field, indexed with the English analyzer.

When I query "installation" I get that doc, which is OK.
When I query "How to install something?" I get nothing, which is bad, because there is a match highlighted on the analysis page.

I've read that dismax doesn't read q.op (the query default operator). How should I set up my dismax to handle that?

Cheers,
Jakub Godawa.
Re: simple dismax with OR
Thank you, that works well.

2010/11/15 Matti Oinas matti.oi...@gmail.com:
> Define the mm (Minimum 'Should' Match) value for dismax. The default is 100%, so every clause must match.
>
> http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
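To make Matti's answer a bit more concrete, here is a rough sketch of how a simple mm value translates into a required number of matching clauses. It is an illustration only, not Solr's code: it handles just a plain integer or a plain percentage (percentages round down, negative values say how many clauses may be missing) and ignores the conditional forms such as 2<75%. The class and method names are made up.

```java
// Simplified illustration of dismax "mm" (minimum should match) arithmetic.
// Assumption: only plain integers ("2", "-1") and plain percentages ("75%", "-25%").
final class MinShouldMatch {

    /** Returns how many of the optional clauses must match for the given mm spec. */
    static int required(String mm, int optionalClauses) {
        String spec = mm.trim();
        int calc;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1).trim());
            // Integer division truncates toward zero, i.e. percentages round down.
            calc = (optionalClauses * percent) / 100;
        } else {
            calc = Integer.parseInt(spec);
        }
        // A negative value means "this many clauses may be missing".
        int result = calc < 0 ? optionalClauses + calc : calc;
        // Clamp to the valid range 0..optionalClauses.
        return Math.max(0, Math.min(optionalClauses, result));
    }

    public static void main(String[] args) {
        // With two query clauses, 100% demands both; mm=1 behaves like OR.
        System.out.println(required("100%", 2));
        System.out.println(required("1", 2));
    }
}
```

So if a query is analyzed into two clauses and only one of them matches the document, the default of 100% rejects it, while an mm of 1 accepts it.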
Re: How to use polish stemmer - Stempel - in schema.xml?
Am I not doing that in point 4? I am compiling the whole folder that was extracted before, but now with that new class file.

2010/11/12 Lance Norskog goks...@gmail.com:
> I think you have to compile all of the Stempel source, including your filter factory, into one jar at the same time. Everybody does this; I don't know how different Java versions make class file binaries.
Re: How to use polish stemmer - Stempel - in schema.xml?
Hi! Sorry for such a break, but I was moving house... Anyway:

1. I took the ~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java file and modified it in Vim (saved as StempelFilterFactory.java) like this:

     package org.getopt.solr.analysis;

     import org.apache.lucene.analysis.TokenStream;
     import org.apache.lucene.analysis.standard.StandardFilter;

     public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
       public StempelFilter create(TokenStream input) {
         return new StempelFilter(input);
       }
     }

2. Then I put the file into the extracted stempel-1.0.jar, in ./org/getopt/solr/analysis/

3. Then I created a class from it:

     jar -cf StempelTokenFilterFactory.class StempelFilterFactory.java

4. Then I created a new stempel-1.0.jar archive:

     jar -cf stempel-1.0.jar -C ./stempel-1.0/ .

5. Then in schema.xml I've put:

     <fieldType name="text_pl" class="solr.TextField">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
       </analyzer>
     </fieldType>

6. I started the Solr server and received the following error:

     2010-11-11 11:50:56 org.apache.solr.common.SolrException log
     SEVERE: java.lang.ClassFormatError: Incompatible magic value 1347093252 in class file org/getopt/solr/analysis/StempelTokenFilterFactory
         at java.lang.ClassLoader.defineClass1(Native Method)
         at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
         ...

Question: What is wrong? :) I use jar (fastjar) 0.98 to create jars. I googled that error, but no answer gave me an idea of what is wrong in my .java file. Please help, as I believe I am close to the end of this subject.

Cheers,
Jakub Godawa.

2010/11/3 Lance Norskog goks...@gmail.com:
> Here's the problem: Solr is a little dumb about these Filter classes, and so you have to make a Factory object for the Stempel Filter. There are a lot of other FilterFactory classes. You would have to just copy one and change the names to Stempel, and it might actually work.
>
> This will take some Solr programming - perhaps the author can help you?
>
> On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa jakub.god...@gmail.com wrote:
> > Sorry, I am not a Java programmer at all. I would appreciate more verbose (or step-by-step) help.
> >
> > 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
> > > So you call org.getopt.solr.analysis.StempelTokenFilterFactory. In this case I would assume a file StempelTokenFilterFactory.class in your directory org/getopt/solr/analysis/. And a class which extends the BaseTokenFilterFactory, right?
> > >
> > >   ...
> > >   public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware {
> > >   ...
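A note on the ClassFormatError reported in this thread, offered as a sketch rather than a confirmed diagnosis. A compiled Java class file starts with the magic number 0xCAFEBABE, whereas 1347093252 is 0x504B0304, the bytes "PK\3\4" that open a ZIP (and therefore jar) archive. That suggests the "jar -cf StempelTokenFilterFactory.class ..." step produced an archive that merely carries a .class name, instead of a class compiled with javac. The small program below only decodes the number; the class name MagicValue is made up for illustration.

```java
// Decodes the "magic value" from a ClassFormatError message into its four bytes.
final class MagicValue {

    /** Returns the 32-bit magic value as 8 lowercase hex digits. */
    static String toHex(int magic) {
        return String.format("%08x", magic);
    }

    /** Renders the four bytes: printable ASCII as-is, everything else as \xNN. */
    static String toAscii(int magic) {
        StringBuilder sb = new StringBuilder();
        for (int shift = 24; shift >= 0; shift -= 8) {
            int b = (magic >>> shift) & 0xFF;
            if (b >= 0x20 && b < 0x7F) {
                sb.append((char) b);
            } else {
                sb.append(String.format("\\x%02x", b));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int reported = 1347093252;   // value taken from the stack trace in this thread
        int classFile = 0xCAFEBABE;  // magic number of a compiled .class file
        System.out.println(toHex(reported) + " = " + toAscii(reported));
        System.out.println(toHex(classFile) + " is what the class loader expects");
    }
}
```

If that reading is right, the usual way to get a real class file is to compile the source with javac, with the Solr, Lucene and Stempel jars on the classpath, and only then package the result with jar.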
Re: How to use polish stemmer - Stempel - in schema.xml?
Thank you Bernd! I couldn't make it run though. Here is my problem:

1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar

2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a directive:

     <lib path="../lib/stempel-1.0.jar" />

3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType:

     (...)
     <!-- Polish -->
     <fieldType name="text_pl" class="solr.TextField">
       <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="org.getopt.stempel.lucene.StempelFilter" />
         <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt" /> -->
       </analyzer>
     </fieldType>
     (...)

4. The jar file is loaded, but I got an error:

     SEVERE: Could not start SOLR. Check solr/home property
     java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
         at java.lang.ClassLoader.defineClass1(Native Method)
         at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
         (...)

5. A different class gave me this one:

     SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory'
         at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
         at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
         (...)

The question is: how do I make <fieldType /> and <filter /> work with Stempel? :)

Cheers,
Jakub Godawa.

2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
> Hi Jakub,
>
> I have ported the KStemmer for use in the most recent Solr trunk version. My stemmer is located in the lib directory of Solr, solr/lib/KStemmer-2.00.jar, because it belongs to Solr.
>
> Write it as a FilterFactory and use it as a filter like:
>
>   <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
>
> This is what my fieldType looks like:
>
>   <fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="index">
>       <tokenizer class="solr.WhitespaceTokenizerFactory" />
>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
>       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
>       <filter class="solr.LowerCaseFilterFactory" />
>       <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
>       <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>     </analyzer>
>     <analyzer type="query">
>       <tokenizer class="solr.WhitespaceTokenizerFactory" />
>       <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>       <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
>       <filter class="solr.LowerCaseFilterFactory" />
>       <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
>       <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>     </analyzer>
>   </fieldType>
>
> Regards,
> Bernd
>
> Am 28.10.2010 14:56, schrieb Jakub Godawa:
> > Hi! There is a Polish stemmer, http://www.getopt.org/stempel/, and I have problems connecting it with Solr 1.4.1.
> >
> > Questions:
> >
> > 1. Where EXACTLY do I put the stempel-1.0.jar file?
> >
> > 2. How do I register the file, so I can build a fieldType like:
> >
> >      <fieldType name="text_pl" class="solr.TextField">
> >        <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
> >      </fieldType>
> >
> > 3. Is that the right approach to make it work?
> >
> > Thanks for a verbose explanation,
> > Jakub.
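Both errors in points 4 and 5 are class-loading failures, so a quick way to narrow them down is to ask whether a given class name resolves at all. The sketch below is plain Java and makes no claim about how Solr's own resource loader arranges its class loaders; it only probes the classpath it is run with. ClassProbe is a hypothetical name.

```java
// Probes whether a class name can be resolved on the current classpath.
// Assumption: run with the jars under test on the classpath (java -cp ...).
final class ClassProbe {

    /** Returns true if the named class can be found without initializing it. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError and ClassFormatError.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "org.getopt.stempel.lucene.StempelFilter",
            "org.getopt.solr.analysis.StempelTokenFilterFactory"
        };
        for (String name : names) {
            System.out.println(name + " -> " + isLoadable(name));
        }
    }
}
```

Run with stempel-1.0.jar and the Lucene jars on the classpath, a false for the factory name would simply mean the jar does not ship such a class.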
Re: How to use polish stemmer - Stempel - in schema.xml?
Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib.

What is still a problem is that even though the files are loaded:

  2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
  INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader

I am not able to use the FilterFactory... maybe I am attempting it in a wrong way?

Cheers,
Jakub Godawa.

2010/11/2 Erick Erickson erickerick...@gmail.com:
> The Polish stemmer jar file needs to be findable by Solr; if you copy it to solr_home/lib and restart Solr you should be set. Alternatively, you can add another lib directive to the solrconfig.xml file (there are several examples in that file already).
>
> I'm a little confused about not being able to find TokenFilter, is that still a problem?
>
> HTH
> Erick
Re: How to use polish stemmer - Stempel - in schema.xml?
This is what stempel-1.0.jar consists of after jar -xf:

    jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
    org/:
    egothor  getopt

    org/egothor:
    stemmer

    org/egothor/stemmer:
    Cell.class     Diff.class    Gener.class  MultiTrie2.class  Optimizer2.class  Reduce.class        Row.class    TestAll.class  TestLoad.class  Trie$StrEnum.class
    Compile.class  DiffIt.class  Lift.class   MultiTrie.class   Optimizer.class   Reduce$Remap.class  Stock.class  Test.class     Trie.class

    org/getopt:
    stempel

    org/getopt/stempel:
    Benchmark.class  lucene  Stemmer.class

    org/getopt/stempel/lucene:
    StempelAnalyzer.class  StempelFilter.class

    jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
    META-INF/:
    MANIFEST.MF

    jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
    res:
    tables

    res/tables:
    readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out  stemmer_200.out  stemmer_500.out  stemmer_700.out

2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:

Hi Jakub,

if you unzip your stempel-1.0.jar, do you have the required directory structure and file in there?

    org/getopt/stempel/lucene/StempelFilter.class

Regards,
Bernd

On 02.11.2010 at 13:54, Jakub Godawa wrote:

Erick, I've put the jar files like that before. I also added the directive and put the file in instanceDir/lib.

What is still a problem is that even though the files are loaded:

    2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
    INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader

I am not able to use the FilterFactory... maybe I am attempting it the wrong way?

Cheers,
Jakub Godawa.

2010/11/2 Erick Erickson erickerick...@gmail.com:

The Polish stemmer jar file needs to be findable by Solr. If you copy it to solr_home/lib and restart Solr, you should be set. Alternatively, you can add another lib directive to the solrconfig.xml file (there are several examples in that file already).

I'm a little confused about not being able to find TokenFilter. Is that still a problem?

HTH
Erick

On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com wrote:

Thank you Bernd! I couldn't make it run though. Here is my problem:

1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar

2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a directive:

    <lib path="../lib/stempel-1.0.jar" />

3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType:

    (...)
    <!-- Polish -->
    <fieldType name="text_pl" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="org.getopt.stempel.lucene.StempelFilter" />
        <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt" /> -->
      </analyzer>
    </fieldType>
    (...)

4. The jar file is loaded, but I got an error:

    SEVERE: Could not start SOLR. Check solr/home property
    java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        (...)

5. Different class gave me that one:

    SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory'
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
        (...)

Question is: How to make <fieldType /> and <filter /> work with that Stempel? :)

Cheers,
Jakub Godawa.

2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de:

Hi Jakub,

I have ported the KStemmer for use in the most recent Solr trunk version. My stemmer is located in the lib directory of Solr, solr/lib/KStemmer-2.00.jar, because it belongs to Solr.
Re: How to use polish stemmer - Stempel - in schema.xml?
Sorry, I am not a Java programmer at all. I would appreciate more verbose (or step-by-step) help.

2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:

So you call org.getopt.solr.analysis.StempelTokenFilterFactory. In this case I would assume a file StempelTokenFilterFactory.class in your directory org/getopt/solr/analysis/, and a class which extends BaseTokenFilterFactory, right?

    ...
    public class StempelTokenFilterFactory extends BaseTokenFilterFactory implements ResourceLoaderAware {
    ...

On 02.11.2010 at 14:20, Jakub Godawa wrote:

This is what stempel-1.0.jar consists of after jar -xf:

    jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
    org/:
    egothor  getopt

    org/egothor:
    stemmer

    org/egothor/stemmer:
    Cell.class     Diff.class    Gener.class  MultiTrie2.class  Optimizer2.class  Reduce.class        Row.class    TestAll.class  TestLoad.class  Trie$StrEnum.class
    Compile.class  DiffIt.class  Lift.class   MultiTrie.class   Optimizer.class   Reduce$Remap.class  Stock.class  Test.class     Trie.class

    org/getopt:
    stempel

    org/getopt/stempel:
    Benchmark.class  lucene  Stemmer.class

    org/getopt/stempel/lucene:
    StempelAnalyzer.class  StempelFilter.class

    jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
    META-INF/:
    MANIFEST.MF

    jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
    res:
    tables

    res/tables:
    readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out  stemmer_200.out  stemmer_500.out  stemmer_700.out

2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:

Hi Jakub,

if you unzip your stempel-1.0.jar, do you have the required directory structure and file in there?

    org/getopt/stempel/lucene/StempelFilter.class

Regards,
Bernd

On 02.11.2010 at 13:54, Jakub Godawa wrote:

Erick, I've put the jar files like that before.
How to use polish stemmer - Stempel - in schema.xml?
Hi!

There is a Polish stemmer, http://www.getopt.org/stempel/, and I have problems connecting it with Solr 1.4.1.

Questions:

1. Where EXACTLY do I put the stempel-1.0.jar file?
2. How do I register the file, so I can build a fieldType like:

    <fieldType name="text_pl" class="solr.TextField">
      <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
    </fieldType>

3. Is that the right approach to make it work?

Thanks for a verbose explanation,
Jakub.
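
For question 2, it may help to know that the class attribute on an analyzer element is expected to name a Lucene Analyzer, not a filter factory. A minimal sketch of that variant follows. It assumes that org.getopt.stempel.lucene.StempelAnalyzer (a class that appears in the jar listing posted in the replies) has a public no-argument constructor and is binary-compatible with the Lucene version bundled in Solr 1.4.1; neither assumption is confirmed anywhere in this thread. The field name content_pl is hypothetical.

```xml
<!-- schema.xml fragment. Assumes stempel-1.0.jar is already visible to Solr,
     e.g. copied to solr_home/lib as suggested in the replies. -->

<!-- Polish: the whole analysis chain is delegated to a Lucene Analyzer class,
     so no <tokenizer> or <filter> children are used here.
     Assumption: StempelAnalyzer has a public no-arg constructor. -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer class="org.getopt.stempel.lucene.StempelAnalyzer"/>
</fieldType>

<!-- Hypothetical field using the type above. -->
<field name="content_pl" type="text_pl" indexed="true" stored="true"/>
```

The filter-based form (a tokenizer followed by filter elements) needs a class that is a Solr token filter factory, which the stempel-1.0.jar listing does not appear to contain.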
Re: Implementing Search Suggestion on Solr
I am a real rookie at Solr, but try this: http://solr.pl/2010/10/18/solr-and-autocomplete-part-1/?lang=en

2010/10/27 Pablo Recio pre...@yaco.es

Hi,

I don't want to be annoying, but I'm looking for a way to do that. I repeat the question: is there a way to implement Search Suggestion manually?

Thanks in advance.
Regards,

2010/10/18 Pablo Recio Quijano pre...@yaco.es

Hi!

I'm trying to implement some kind of Search Suggestion on a search engine I have implemented. These search suggestions should not be automatic like the ones described for the SpellCheckComponent [1]. I'm looking for something like:

    SAS oppositions = Public job offers for some-company

So I will have to define it manually. I was thinking about synonyms [2], but I don't know if it's the proper way to do it, because semantically those terms are not synonyms.

Any ideas or suggestions?

Regards,

[1] http://wiki.apache.org/solr/SpellCheckComponent
[2] http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
Re: Step by step tutorial for multi-language indexing and search
Hi Erick, thanks for your help! I need some technical help though... let me put it this way:

1. I deleted everything in the index with:

    curl http://localhost:8983/solr/update -F stream.body='<delete><query>*:*</query></delete>'
    curl http://localhost:8983/solr/update -F stream.body='<commit />'

2. I created 2 documents with fields: name_en, answer_en, name_es, answer_es

3. I made a query through the admin page, with this response:

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">9</int>
        <lst name="params">
          <str name="indent">on</str>
          <str name="start">0</str>
          <str name="q">Jakub</str>
          <str name="version">2.2</str>
          <str name="rows">10</str>
        </lst>
      </lst>
      <result name="response" numFound="2" start="0">
        <doc>
          <arr name="answer_en_t"><str>My name is Jakub</str></arr>
          <arr name="answer_es_t"><str>Me llamo Jakub.</str></arr>
          <arr name="id"><str>Question:1</str></arr>
          <arr name="name_en_t"><str>What is your name?</str></arr>
          <arr name="name_es_t"><str>Como te llamas?</str></arr>
          <arr name="pk_s"><str>1</str></arr>
          <arr name="spell">
            <str>What is your name?</str>
            <str>My name is Jakub</str>
            <str>Como te llamas?</str>
            <str>Me llamo Jakub.</str>
          </arr>
        </doc>
        <doc>
          <arr name="answer_en_t"><str>I am in the kitchen Jakub!</str></arr>
          <arr name="answer_es_t"><str>Estoy en la cocina.</str></arr>
          <arr name="id"><str>Question:2</str></arr>
          <arr name="name_en_t"><str>Where are you?</str></arr>
          <arr name="name_es_t"><str>Donde estas?</str></arr>
          <arr name="pk_s"><str>2</str></arr>
          <arr name="spell">
            <str>Where are you?</str>
            <str>I am in the kitchen Jakub!</str>
            <str>Donde estas?</str>
            <str>Estoy en la cocina.</str>
          </arr>
        </doc>
      </result>
    </response>

4. Now I needed two dismax handlers to make it work in two separate languages. Let's say I just want to look up the *_en fields, so I created a dismax handler:

    <requestHandler name="/English" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="echoParams">explicit</str>
        <float name="tie">0.01</float>
        <str name="qf">name_en_t^0.5 answer_en_t^1.0</str>
      </lst>
    </requestHandler>

5. Hitting the URL http://localhost:8982/solr/English/?q=Jakub gives me an error:

    there are more terms than documents in field name_en_t, but it's impossible to sort on tokenized fields

6. I know that I should create a separate dismax handler for Spanish.

My questions:

1. Why are those fields named with *_t? I saw in schema.xml that they are created dynamically. Can/should I create my own predefined fields in schema.xml? Is this the place where you define HOW the field should be interpreted by the indexer?
2. Why is the error in step 5 thrown? I know that you cannot sort on tokenized fields, but I don't see myself trying to index or tokenize anything.
3. How should it be changed to work properly?

Thank you, and I ask for patience, as this can help many rookies like me to get started.

Jakub.

2010/10/21 Erick Erickson erickerick...@gmail.com

See below. But also search the archives for multilanguage; this topic has been discussed many times before. Lucid Imagination maintains a Solr-powered (of course) searchable list at: http://www.lucidimagination.com/search/

On Wed, Oct 20, 2010 at 9:03 AM, Jakub Godawa jakub.god...@gmail.com wrote:

Hi everyone! (my first post)

I am new, but really curious about the usefulness of Lucene/Solr for document search in web applications. I use Ruby on Rails to create one, with the plugin acts_as_solr_reloaded, which makes the connection between the web app and Solr easy.

So I am at a point where I know that a good solution is to prepare multi-language documents with fields like: question_en, answer_en, question_fr, answer_fr, question_pl, answer_pl... etc.

I need to create an index that would work with 6 languages: English, French, German, Russian, Ukrainian and Polish.

My questions are:

1. Is it doable to have just one search field that behaves like Google's for all those documents? It can be an option to indicate a language to search.

This depends on what you mean by do-able. Are you going to allow a French user to search an English document (etc.)? But the real answer is yes, you can if you .. There'll be tradeoffs. Take a look at the dismax handler. It's kind of hard to grok all at once, but you can cause it to search across multiple fields. That is, the user types language, and you can turn it into a complex query under the covers like lang_en:language lang_fr:language lang_ru:language, etc. You can also apply boosts. Note that this has obvious problems with, say, Russian. Half your job will be figuring out what will satisfy the user.

You could also have a different dismax handler defined for various languages. Say the user was coming from Spanish. Consider a browseES handler. See solrconfig.xml for the default dismax handler. The Solr book mentioned above describes this.

2. How should I begin changing the solr/conf/schema.xml (or other) file to tailor it to my needs? As I am a real rookie here, I am still a bit confused about fields, fieldTypes
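
The "one dismax handler per language" idea, and the cross-field variant Erick describes, can be sketched in solrconfig.xml as follows. This is only an illustration that mirrors the /English handler from this message: the handler names /Spanish and /AllLanguages are hypothetical, the boosts are copied or chosen arbitrarily, and the field names are the ones shown in the query response above.

```xml
<!-- solrconfig.xml fragment (sketch, not a tested configuration) -->

<!-- Spanish-only search: same defaults as /English, but qf points at the *_es_t fields. -->
<requestHandler name="/Spanish" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">name_es_t^0.5 answer_es_t^1.0</str>
  </lst>
</requestHandler>

<!-- Single search box across both languages: dismax tries the user's terms
     against every field listed in qf. The boosts here are arbitrary. -->
<requestHandler name="/AllLanguages" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">name_en_t^0.5 answer_en_t^1.0 name_es_t^0.5 answer_es_t^1.0</str>
  </lst>
</requestHandler>
```

Each handler would be queried the same way as /English, with only the handler name in the URL path changed. The web application can pick the handler from the user's locale.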
Step by step tutorial for multi-language indexing and search
Hi everyone! (my first post)

I am new, but really curious about the usefulness of Lucene/Solr for document search in web applications. I use Ruby on Rails to create one, with the plugin acts_as_solr_reloaded, which makes the connection between the web app and Solr easy.

So I am at a point where I know that a good solution is to prepare multi-language documents with fields like: question_en, answer_en, question_fr, answer_fr, question_pl, answer_pl... etc.

I need to create an index that would work with 6 languages: English, French, German, Russian, Ukrainian and Polish.

My questions are:

1. Is it doable to have just one search field that behaves like Google's for all those documents? It can be an option to indicate a language to search.

2. How should I begin changing the solr/conf/schema.xml (or other) file to tailor it to my needs? As I am a real rookie here, I am still a bit confused about fields, fieldTypes and their connection with a particular field (e.g. answer_fr), and about the tokenizers and analyzers. If someone can provide a basic step-by-step tutorial on how to make it work in two languages, I would be more than happy.

3. Are all those languages supported (officially/unofficially) by Lucene/Solr?

Thank you for your help,
Jakub Godawa.
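
As a starting point for question 2, the relationship between fieldTypes, fields, tokenizers and filters for two of the languages might look like the fragment below. It is a sketch under assumptions, not a complete schema: the type names text_en and text_fr are hypothetical, the field names come from the message above, and the analysis chain (standard tokenizer, lowercasing, Snowball stemming) is just one reasonable choice built from factories that ship with Solr.

```xml
<!-- schema.xml fragment (sketch): one analyzed fieldType per language -->
<types>
  <!-- A fieldType names an analysis chain: one tokenizer, then filters in order. -->
  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <!-- A field picks its analysis chain through the type attribute. -->
  <field name="question_en" type="text_en" indexed="true" stored="true"/>
  <field name="answer_en"   type="text_en" indexed="true" stored="true"/>
  <field name="question_fr" type="text_fr" indexed="true" stored="true"/>
  <field name="answer_fr"   type="text_fr" indexed="true" stored="true"/>
</fields>
```

Because the same analyzer is applied at index time and at query time here, a French query against answer_fr is stemmed the same way as the indexed French text.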
Re: Step by step tutorial for multi-language indexing and search
2010/10/20 Dennis Gearon gear...@sbcglobal.net

There's approximately a 100% chance that you are going to go through a server-side language (PHP, Ruby, Perl, Java, VB/ASP/.NET [cough, cough]) before you get to Solr/Lucene. I'd recommend it anyway.

I use a server-side language (Ruby) as I build the web application.

This code should look at the user's browser locale (en_US, pl_PL, es_CO, etc.). The server-side language would then choose which language to search by and display.

As I said, I may provide the locale as an addition to the search query.

NOW, that being said, are you going to have the exact same content for all languages, just translated? The temptation would be to translate to a common language like English, then do the search, then get the translation. I wouldn't recommend it, but I'm no expert. Translation of single words can be OK, but multiword ideas and especially sentences don't work so well that way.

I would rather not yield to that temptation. I know that Solr/Lucene can work with many languages and I think it has a purpose, like the semantic diversity of languages.

What's more, you often don't translate things literally even if they are just translations. You will probably have separate content for that reason, AND another. Different cultures are interested in different things and only have common ground on certain things like international news (but with different opinions) and medical news. So different content for different cultures speaking different languages.

I need to treat each culture separately regarding the subject of the query.

Are you trying to address different languages in some place like the US or Great Britain, with LOTS of different languages spoken in minority cultures? Only then would you want a geographically centered server and information-gathering organization. If you were going to have search for other countries, then I'd recommend those resources be geographically close to their source culture.
No, I am not trying to address minority cultures.

Thanks for the answer,
Jakub Godawa.

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from ' http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

EARTH has a Right To Life, otherwise we all die.

--- On Wed, 10/20/10, Jakub Godawa jakub.god...@gmail.com wrote:

From: Jakub Godawa jakub.god...@gmail.com
Subject: Step by step tutorial for multi-language indexing and search
To: solr-user@lucene.apache.org
Date: Wednesday, October 20, 2010, 6:03 AM

Hi everyone! (my first post)

I am new, but really curious about the usefulness of Lucene/Solr for document search in web applications. I use Ruby on Rails to create one, with the plugin acts_as_solr_reloaded, which makes the connection between the web app and Solr easy.

So I am at a point where I know that a good solution is to prepare multi-language documents with fields like: question_en, answer_en, question_fr, answer_fr, question_pl, answer_pl... etc.

I need to create an index that would work with 6 languages: English, French, German, Russian, Ukrainian and Polish.

My questions are:

1. Is it doable to have just one search field that behaves like Google's for all those documents? It can be an option to indicate a language to search.

2. How should I begin changing the solr/conf/schema.xml (or other) file to tailor it to my needs? As I am a real rookie here, I am still a bit confused about fields, fieldTypes and their connection with a particular field (e.g. answer_fr), and about the tokenizers and analyzers. If someone can provide a basic step-by-step tutorial on how to make it work in two languages, I would be more than happy.

3. Are all those languages supported (officially/unofficially) by Lucene/Solr?

Thank you for your help,
Jakub Godawa.