Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-25 Thread Jakub Godawa
In the end I chose hunspell-solr as the Polish language analyzer. It
understands Polish and is much easier to install. But look out! It does
not work with the current nightly build; it works well with Solr 1.4.1!

It just works well, and hey! I got Ukrainian out of the box too. I am
thinking of replacing the SnowballPorterFilters for all required languages
with *.aff and *.dic support.

Thanks for the help everyone!


Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-24 Thread Jakub Godawa
Yes, from the current nightly release setting up Stempel is quite easy.

All I did was:

svn co https://svn.apache.org/repos/asf/lucene/dev/trunk ./lucene-solr

cd lucene-solr/solr
ant example

cp ./contrib/analysis-extras/lucene-libs/lucene-analyzers-stempel-4.0-SNAPSHOT.jar ./lib
cp ./contrib/analysis-extras/build/apache-solr-analysis-extras-4.0-SNAPSHOT.jar ./lib

in solrconfig.xml:

<lib path="../../lib/apache-solr-analysis-extras-4.0-SNAPSHOT.jar" />
<lib path="../../lib/lucene-analyzers-stempel-4.0-SNAPSHOT.jar" />

in schema.xml:

<!-- Polish -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" />

    <filter class="solr.StempelPolishStemFilterFactory" language="Polish" />
  </analyzer>
</fieldType>

The end.

Anyway, I don't know whether it is the Polish stemmer or a badly configured
fieldType, but the results are just wrong.

example:

index for type text_pl: bilety
query for type text_pl: bilet
 
Index Analyzer

org.apache.solr.analysis.StempelPolishStemFilterFactory
{language=Polish, luceneMatchVersion=LUCENE_24}
term position:     1
term text:         bilić
term type:         word
source start,end:  0,6
payload:

Query Analyzer

org.apache.solr.analysis.StempelPolishStemFilterFactory
{language=Polish, luceneMatchVersion=LUCENE_24}
term position:     1
term text:         binąć
term type:         word
source start,end:  0,5
payload:


But I would expect the result to be bilet in both cases, since that is the
base form.

Any clues how to make it work properly for Polish? Maybe someone has good
experience with hunspell-solr and Polish dictionaries?

Thanks for letting me know!

Cheers,
Jakub Godawa.




On Mon, 2010-11-15 at 08:35 -0500, Robert Muir wrote:
 https://issues.apache.org/jira/browse/SOLR-2237
 
Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-24 Thread Jakub Godawa
On Wed, 2010-11-24 at 19:00 +0100, Jakub Godawa wrote:
 Yes, from the current nightly release setting up Stempel is quite easy.
Thanks to Robert Muir :)



Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-15 Thread Jakub Godawa
I tried to reach the authors twice, but with no luck. I've seen some
posts where people were finally able to launch it (without much pain).
I don't know. If any pro would be so nice as to try running Stempel on
his/her machine and paste me a verbose step-by-step solution, I
would really appreciate it.

Cheers,
Jakub Godawa.

2010/11/13 Lance Norskog goks...@gmail.com:
 I don't know if the Stempel jar includes the Java source. At this point I
 think you should ask the author of Stempel to make a Solr front-end for it.
 It's very simple for him.

simple dismax with OR

2010-11-15 Thread Jakub Godawa
Hi! I have my dismax that is searching through two fields.

<requestHandler name="en" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">
      name_en^1.0 answe_en^1.5
    </str>
  </lst>
</requestHandler>

Now I have a document that has "Various appliances can be installed
here" in the answen_en field, indexed with the English analyzer.
When I query "installation" I get that doc as a result, which is OK.
When I query "How to install something?" I get nothing, which is bad,
because there is a match highlighted on the analysis page.

I've read that dismax doesn't read q.op (the query default operator).
How should I configure my dismax handler to handle that?

Cheers,
Jakub Godawa.


Re: simple dismax with OR

2010-11-15 Thread Jakub Godawa
Thank you, that works well.

2010/11/15 Matti Oinas matti.oi...@gmail.com:
 Define the mm (Minimum 'Should' Match) value for dismax. The default is
 100%, so every clause must match.

 http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
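
To make the effect of mm concrete, here is a minimal Java sketch of how a
plain integer or a percentage translates into the number of optional clauses
that must match. It is not Solr's code: it only follows the documented rules
for these two simple forms (the full mm syntax also allows conditional
expressions), and the class and method names are made up for illustration.

```java
public class MinShouldMatch {

    /**
     * Returns how many of the optional clauses must match.
     * Simplified: accepts only a plain integer ("2", "-1") or a
     * percentage ("75%", "-25%"); negative values mean "this many
     * (or this share) may be missing".
     */
    static int required(String mm, int optionalClauses) {
        String spec = mm.trim();
        int result;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            // Integer division truncates toward zero, so percentages round down.
            int calc = (optionalClauses * percent) / 100;
            result = percent < 0 ? optionalClauses + calc : calc;
        } else {
            int n = Integer.parseInt(spec);
            result = n < 0 ? optionalClauses + n : n;
        }
        // Clamp to the valid range 0..optionalClauses.
        return Math.max(0, Math.min(optionalClauses, result));
    }

    public static void main(String[] args) {
        int clauses = 4; // e.g. a four-word query, assuming no stopword is dropped
        for (String mm : new String[] {"100%", "50%", "1"}) {
            System.out.println("mm=" + mm + " -> " + required(mm, clauses)
                    + " of " + clauses + " clauses must match");
        }
    }
}
```

With the default of 100%, all four words of a four-word query have to match,
which is why the long question found nothing; a lower value such as mm=1 lets
a document match on a single word.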





Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-12 Thread Jakub Godawa
Am I not doing that in point no. 4? I am compiling the whole folder
that was extracted before, but now with that new class file.

2010/11/12 Lance Norskog goks...@gmail.com:
 I think you have to compile all of the stempel source including your
 filter factory into one jar at the same time. Everybody does this; I
 don't know how different Java versions make class file binaries.

Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-11 Thread Jakub Godawa
Hi! Sorry for such a break, but I was moving house... anyway:

1. I took the file
~/apache-solr/src/java/org/apache/solr/analysis/StandardFilterFactory.java
and modified it in Vim (saving it as StempelFilterFactory.java) like this:

package org.getopt.solr.analysis;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardFilter;

public class StempelTokenFilterFactory extends BaseTokenFilterFactory {
  public StempelFilter create(TokenStream input) {
    return new StempelFilter(input);
  }
}

2. Then I put the file into the extracted stempel-1.0.jar, under
./org/getopt/solr/analysis/
3. Then I created a class from it:
jar -cf StempelTokenFilterFactory.class StempelFilterFactory.java
4. Then I created a new stempel-1.0.jar archive:
jar -cf stempel-1.0.jar -C ./stempel-1.0/ .
5. Then in schema.xml I've put:

<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" />
  </analyzer>
</fieldType>

6. I started the Solr server and I received the following error:

2010-11-11 11:50:56 org.apache.solr.common.SolrException log
SEVERE: java.lang.ClassFormatError: Incompatible magic value 1347093252 in class file org/getopt/solr/analysis/StempelTokenFilterFactory
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
...

Question: What is wrong? :) I use jar (fastjar) 0.98 to create jars.
I googled that error, but no answer gave me an idea of what is wrong
in my .java file.

Please help, as I believe I am close to the end of that subject.

Cheers,
Jakub Godawa.
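
The number in the ClassFormatError above is itself a clue. 1347093252 is
0x504B0304 in hex, the bytes "PK\3\4" that begin a ZIP (and therefore JAR)
archive, whereas a compiled class file must begin with 0xCAFEBABE. That
suggests the file made in step 3 is an archive that merely carries a .class
name, since jar packages files and does not compile them. A minimal sketch
that decodes the value; the class and method names are made up for
illustration:

```java
public class MagicValue {

    /** Magic number that starts every compiled Java class file. */
    static final int CLASS_MAGIC = 0xCAFEBABE;

    /** Local file header signature that starts a ZIP or JAR archive ("PK\3\4"). */
    static final int ZIP_MAGIC = 0x504B0304;

    /** Names the file type that a given 4-byte magic value belongs to. */
    static String describe(int magic) {
        if (magic == CLASS_MAGIC) {
            return "Java class file";
        }
        if (magic == ZIP_MAGIC) {
            return "ZIP/JAR archive";
        }
        return "unknown";
    }

    /** Formats the magic value as eight hex digits. */
    static String toHex(int magic) {
        return String.format("0x%08X", magic);
    }

    public static void main(String[] args) {
        int reported = 1347093252; // value from the ClassFormatError message
        // prints: 0x504B0304 -> ZIP/JAR archive
        System.out.println(toHex(reported) + " -> " + describe(reported));
    }
}
```

Compiling the .java file with javac, with the Solr and Lucene jars on the
classpath, would be the usual way to get a real class file; the exact jar
paths depend on the installation.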

2010/11/3 Lance Norskog goks...@gmail.com:
 Here's the problem: Solr is a little dumb about these Filter classes,
 and so you have to make a Factory object for the Stempel Filter.

 There are a lot of other FilterFactory classes. You would have to just
 copy one and change the names to Stempel and it might actually work.

 This will take some Solr programming- perhaps the author can help you?

 On Tue, Nov 2, 2010 at 7:08 AM, Jakub Godawa jakub.god...@gmail.com wrote:
 Sorry, I am not a Java programmer at all. I would appreciate more
 verbose (or step-by-step) help.

 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:

 So you call org.getopt.solr.analysis.StempelTokenFilterFactory.
 In this case I would assume a file StempelTokenFilterFactory.class
 in your directory org/getopt/solr/analysis/.

 And a class which extends the BaseTokenFilterFactory, right?
 ...
 public class StempelTokenFilterFactory extends BaseTokenFilterFactory 
 implements ResourceLoaderAware {
 ...



 On 02.11.2010 14:20, Jakub Godawa wrote:
 This is what stempel-1.0.jar consists of after jar -xf:

 jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
 org/:
 egothor  getopt

 org/egothor:
 stemmer

 org/egothor/stemmer:
 Cell.class        Compile.class     Diff.class          DiffIt.class
 Gener.class       Lift.class        MultiTrie.class     MultiTrie2.class
 Optimizer.class   Optimizer2.class  Reduce.class        Reduce$Remap.class
 Row.class         Stock.class       Test.class          TestAll.class
 TestLoad.class    Trie.class        Trie$StrEnum.class

 org/getopt:
 stempel

 org/getopt/stempel:
 Benchmark.class  lucene  Stemmer.class

 org/getopt/stempel/lucene:
 StempelAnalyzer.class  StempelFilter.class
 jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
 META-INF/:
 MANIFEST.MF
 jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
 res:
 tables

 res/tables:
 readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
 stemmer_200.out  stemmer_500.out  stemmer_700.out

 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
 Hi Jakub,

 if you unzip your stempel-1.0.jar do you have the
 required directory structure and file in there?
 org/getopt/stempel/lucene/StempelFilter.class

 Regards,
 Bernd
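
The check Bernd asks for can also be done without unpacking the archive. A
minimal sketch using the standard java.util.jar API; the class name JarCheck
and the default jar path are only illustrative assumptions, and the two entry
names are the ones mentioned in this thread:

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarCheck {

    /** Returns true if the jar at jarPath contains an entry with exactly this name. */
    static boolean hasEntry(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Path is an assumption; pass the real location as the first argument.
        String jarPath = args.length > 0 ? args[0] : "stempel-1.0.jar";
        String[] wanted = {
            "org/getopt/stempel/lucene/StempelFilter.class",
            "org/getopt/solr/analysis/StempelTokenFilterFactory.class"
        };
        for (String name : wanted) {
            System.out.println(name + " present: " + hasEntry(jarPath, name));
        }
    }
}
```

Going by the directory listing above, the first entry exists in the original
jar and the second does not, which fits the "Error loading class" message for
the factory.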


Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-02 Thread Jakub Godawa
Thank you Bernd! I couldn't make it run though. Here is my problem:

1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
directive: <lib path="../lib/stempel-1.0.jar" />
3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is a fieldType:

(...)
  <!-- Polish -->
  <fieldType name="text_pl" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="org.getopt.stempel.lucene.StempelFilter" />
      <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory" protected="protwords.txt" /> -->
    </analyzer>
  </fieldType>
(...)

4. The jar file is loaded, but I get an error:
SEVERE: Could not start SOLR. Check solr/home property
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
  at java.lang.ClassLoader.defineClass1(Native Method)
  at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
  at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
(...)

5. The other class (the commented-out factory) gave me this one:
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.getopt.solr.analysis.StempelTokenFilterFactory'
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
  at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
(...)

The question is: how do I make <fieldType /> and <filter /> work with Stempel? :)

Cheers,
Jakub Godawa.

2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
 Hi Jakub,

 I have ported the KStemmer for use in the most recent Solr trunk version.
 My stemmer is located in the lib directory of Solr,
 solr/lib/KStemmer-2.00.jar,
 because it belongs to Solr.

 Write it as a FilterFactory and use it as a filter like this:
 <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />

 This is what my fieldType looks like:

     <fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory" />
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" />
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory" />
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" />
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory" protected="protwords.txt" />
         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
       </analyzer>
     </fieldType>

 Regards,
 Bernd



 On 28.10.2010 14:56, Jakub Godawa wrote:
 Hi!
 There is a Polish stemmer, http://www.getopt.org/stempel/, and I have
 problems connecting it with Solr 1.4.1.
 Questions:

 1. Where EXACTLY do I put the stempel-1.0.jar file?
 2. How do I register the file, so I can build a fieldType like:

 <fieldType name="text_pl" class="solr.TextField">
   <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
 </fieldType>

 3. Is that the right approach to make it work?

 Thanks for verbose explanation,
 Jakub.



Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-02 Thread Jakub Godawa
Erick, I've put the jar files there like that before. I also added the
<lib> directive and put the file in instanceDir/lib.

The problem remains even though the files are loaded:
2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar' to classloader

I am not able to use the FilterFactory... maybe I am attempting it in
a wrong way?

Cheers,
Jakub Godawa.

2010/11/2 Erick Erickson erickerick...@gmail.com:
 The Polish stemmer jar file needs to be findable by Solr; if you copy
 it to solr_home/lib and restart Solr you should be set.

 Alternatively, you can add another <lib> directive to the solrconfig.xml
 file (there are several examples in that file already).

 I'm a little confused about not being able to find TokenFilter; is that
 still a problem?

 HTH
 Erick

 On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com wrote:

 Thank you Bernd! I couldn't make it run though. Here is my problem:

 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
 directive: <lib path="../lib/stempel-1.0.jar" />
 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:

 (...)
  <!-- Polish -->
  <fieldType name="text_pl" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="org.getopt.stempel.lucene.StempelFilter" />
      <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory"
                   protected="protwords.txt" /> -->
    </analyzer>
  </fieldType>
 (...)

 4. jar file is loaded but I got an error:
 SEVERE: Could not start SOLR. Check solr/home property
 java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
      at java.lang.ClassLoader.defineClass1(Native Method)
      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
      at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 (...)

 5. Different class gave me that one:
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.getopt.solr.analysis.StempelTokenFilterFactory'
      at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
      at
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
 (...)

 Question is: how do I make <fieldType /> and <filter /> work with
 Stempel? :)

 Cheers,
 Jakub Godawa.

 2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
  Hi Jakub,
 
  I have ported the KStemmer for use in most recent Solr trunk version.
  My stemmer is located in the lib directory of Solr
 solr/lib/KStemmer-2.00.jar
  because it belongs to Solr.
 
  Write it as a FilterFactory and use it as a filter like:
  <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
          protected="protwords.txt" />
 
  This is what my fieldType looks like:
 
     <fieldType name="text_kstem" class="solr.TextField"
                positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory" />
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords.txt" enablePositionIncrements="false" />
         <filter class="solr.WordDelimiterFilterFactory"
                 generateWordParts="1" generateNumberParts="1"
                 catenateWords="1" catenateNumbers="1"
                 catenateAll="0" splitOnCaseChange="1" />
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
                 protected="protwords.txt" />
         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory" />
         <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords.txt" />
         <filter class="solr.WordDelimiterFilterFactory"
                 generateWordParts="1" generateNumberParts="1"
                 catenateWords="0" catenateNumbers="0"
                 catenateAll="0" splitOnCaseChange="1" />
         <filter class="solr.LowerCaseFilterFactory" />
         <filter class="de.ubbielefeld.solr.analysis.KStemFilterFactory"
                 protected="protwords.txt" />
         <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
       </analyzer>
     </fieldType>
 
  Regards,
  Bernd
 
 
 
  Am 28.10.2010 14:56, schrieb Jakub Godawa:
  Hi!
  There is a polish stemmer http://www.getopt.org/stempel/ and I have
  problems connecting it with solr 1.4.1
  Questions:
 
  1. Where EXACTLY do I put the stempel-1.0.jar file?
  2. How do I register the file, so I can build a fieldType like:
 
  <fieldType name="text_pl" class="solr.TextField">
    <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
  </fieldType>
 
  3. Is that the right approach to make it work?
 
  Thanks for verbose explanation,
  Jakub.
 




Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-02 Thread Jakub Godawa
This is what stempel-1.0.jar consists of after jar -xf:

jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
org/:
egothor  getopt

org/egothor:
stemmer

org/egothor/stemmer:
Cell.class     Diff.class    Gener.class  MultiTrie2.class
Optimizer2.class  Reduce.class        Row.class    TestAll.class
TestLoad.class  Trie$StrEnum.class
Compile.class  DiffIt.class  Lift.class   MultiTrie.class
Optimizer.class   Reduce$Remap.class  Stock.class  Test.class
Trie.class

org/getopt:
stempel

org/getopt/stempel:
Benchmark.class  lucene  Stemmer.class

org/getopt/stempel/lucene:
StempelAnalyzer.class  StempelFilter.class
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
META-INF/:
MANIFEST.MF
jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
res:
tables

res/tables:
readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
stemmer_200.out  stemmer_500.out  stemmer_700.out
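
The listing above shows an analyzer and a filter under org/getopt/stempel/lucene, but no Solr factory class. One option that might avoid writing a factory is to point the analyzer element at the Lucene analyzer class directly, which schema.xml permits for analyzers with a no-argument constructor. A minimal sketch, assuming StempelAnalyzer has such a constructor and is compatible with the Lucene version bundled with Solr 1.4.1 (neither is confirmed in this thread):

```xml
<!-- schema.xml: sketch only. Assumes org.getopt.stempel.lucene.StempelAnalyzer
     has a public no-argument constructor; this is not verified here. -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer class="org.getopt.stempel.lucene.StempelAnalyzer"/>
</fieldType>
```

With this form the tokenizing and filtering is determined by the analyzer class itself, so extra tokenizer or filter elements are not combined with it.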

2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
 Hi Jakub,

 if you unzip your stempel-1.0.jar do you have the
 required directory structure and file in there?
 org/getopt/stempel/lucene/StempelFilter.class

 Regards,
 Bernd

 Am 02.11.2010 13:54, schrieb Jakub Godawa:
 Erick I've put the jar files like that before. I also added the
 directive and put the file in instanceDir/lib

 What is still a problem is that even the files are loaded:
 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader 
 replaceClassLoader
 INFO: Adding 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
 to classloader

 I am not able to use the FilterFactory... maybe I am attempting it in
 a wrong way?

 Cheers,
 Jakub Godawa.

 2010/11/2 Erick Erickson erickerick...@gmail.com:
 The polish stemmer jar file needs to be findable by Solr, if you copy
 it to solr_home/lib and restart solr you should be set.

 Alternatively, you can add another lib directive to the solrconfig.xml
 file
 (there are several examples in that file already).

 I'm a little confused about not being able to find TokenFilter, is that
 still
 a problem?

 HTH
 Erick

 On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com wrote:

 Thank you Bernd! I couldn't make it run though. Here is my problem:

 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
 directive: <lib path="../lib/stempel-1.0.jar" />
 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:

 (...)
  <!-- Polish -->
  <fieldType name="text_pl" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="org.getopt.stempel.lucene.StempelFilter" />
      <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory"
                   protected="protwords.txt" /> -->
    </analyzer>
  </fieldType>
 (...)

 4. jar file is loaded but I got an error:
 SEVERE: Could not start SOLR. Check solr/home property
 java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
      at java.lang.ClassLoader.defineClass1(Native Method)
      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
      at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 (...)

 5. Different class gave me that one:
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.getopt.solr.analysis.StempelTokenFilterFactory'
      at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
      at
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
 (...)

 Question is: how do I make <fieldType /> and <filter /> work with
 Stempel? :)

 Cheers,
 Jakub Godawa.

 2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
 Hi Jakub,

 I have ported the KStemmer for use in most recent Solr trunk version.
 My stemmer is located in the lib directory of Solr
 solr/lib/KStemmer-2.00.jar
 because it belongs to Solr.

 Write it as FilterFactory and use it as Filter like:
 filter class=de.ubbielefeld.solr.analysis.KStemFilterFactory
 protected=protwords.txt /

 This is how my fieldType looks like:

    fieldType name=text_kstem class=solr.TextField
 positionIncrementGap=100
      analyzer type=index
        tokenizer class=solr.WhitespaceTokenizerFactory /
        filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt enablePositionIncrements=false /
        filter class=solr.WordDelimiterFilterFactory
 generateWordParts=1 generateNumberParts=1 catenateWords=1
 catenateNumbers=1
 catenateAll=0 splitOnCaseChange=1 /
        filter class=solr.LowerCaseFilterFactory /
        filter class=de.ubbielefeld.solr.analysis.KStemFilterFactory
 protected=protwords.txt /
        filter class=solr.RemoveDuplicatesTokenFilterFactory /
      /analyzer
      analyzer type=query
        tokenizer class=solr.WhitespaceTokenizerFactory /
        filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt /
        filter class

Re: How to use polish stemmer - Stempel - in schema.xml?

2010-11-02 Thread Jakub Godawa
Sorry, I am not a Java programmer at all. I would appreciate more
detailed (or step-by-step) help.

2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:

 So you call org.getopt.solr.analysis.StempelTokenFilterFactory.
 In this case I would assume a file StempelTokenFilterFactory.class
 in your directory org/getopt/solr/analysis/.

 And a class which extends the BaseTokenFilterFactory, right?
 ...
 public class StempelTokenFilterFactory extends BaseTokenFilterFactory
     implements ResourceLoaderAware {
 ...



 Am 02.11.2010 14:20, schrieb Jakub Godawa:
 This is what stempel-1.0.jar consist of after jar -xf:

 jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R org/
 org/:
 egothor  getopt

 org/egothor:
 stemmer

 org/egothor/stemmer:
 Cell.class     Diff.class    Gener.class  MultiTrie2.class
 Optimizer2.class  Reduce.class        Row.class    TestAll.class
 TestLoad.class  Trie$StrEnum.class
 Compile.class  DiffIt.class  Lift.class   MultiTrie.class
 Optimizer.class   Reduce$Remap.class  Stock.class  Test.class
 Trie.class

 org/getopt:
 stempel

 org/getopt/stempel:
 Benchmark.class  lucene  Stemmer.class

 org/getopt/stempel/lucene:
 StempelAnalyzer.class  StempelFilter.class
 jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R META-INF/
 META-INF/:
 MANIFEST.MF
 jgod...@ubuntu:~/apache-solr-1.4.1/ifaq/lib$ ls -R res
 res:
 tables

 res/tables:
 readme.txt  stemmer_1000.out  stemmer_100.out  stemmer_2000.out
 stemmer_200.out  stemmer_500.out  stemmer_700.out

 2010/11/2 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
 Hi Jakub,

 if you unzip your stempel-1.0.jar do you have the
 required directory structure and file in there?
 org/getopt/stempel/lucene/StempelFilter.class

 Regards,
 Bernd

 Am 02.11.2010 13:54, schrieb Jakub Godawa:
 Erick I've put the jar files like that before. I also added the
 directive and put the file in instanceDir/lib

 What is still a problem is that even the files are loaded:
 2010-11-02 13:20:48 org.apache.solr.core.SolrResourceLoader 
 replaceClassLoader
 INFO: Adding 
 'file:/home/jgodawa/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar'
 to classloader

 I am not able to use the FilterFactory... maybe I am attempting it in
 a wrong way?

 Cheers,
 Jakub Godawa.

 2010/11/2 Erick Erickson erickerick...@gmail.com:
 The polish stemmer jar file needs to be findable by Solr, if you copy
 it to solr_home/lib and restart solr you should be set.

 Alternatively, you can add another lib directive to the solrconfig.xml
 file
 (there are several examples in that file already).

 I'm a little confused about not being able to find TokenFilter, is that
 still
 a problem?

 HTH
 Erick

 On Tue, Nov 2, 2010 at 8:07 AM, Jakub Godawa jakub.god...@gmail.com 
 wrote:

 Thank you Bernd! I couldn't make it run though. Here is my problem:

 1. There is a file ~/apache-solr-1.4.1/ifaq/lib/stempel-1.0.jar
 2. In ~/apache-solr-1.4.1/ifaq/solr/conf/solrconfig.xml there is a
 directive: <lib path="../lib/stempel-1.0.jar" />
 3. In ~/apache-solr-1.4.1/ifaq/solr/conf/schema.xml there is fieldType:

 (...)
  <!-- Polish -->
  <fieldType name="text_pl" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="org.getopt.stempel.lucene.StempelFilter" />
      <!-- <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory"
                   protected="protwords.txt" /> -->
    </analyzer>
  </fieldType>
 (...)

 4. jar file is loaded but I got an error:
 SEVERE: Could not start SOLR. Check solr/home property
 java.lang.NoClassDefFoundError: org/apache/lucene/analysis/TokenFilter
      at java.lang.ClassLoader.defineClass1(Native Method)
      at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
      at
 java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
 (...)

 5. Different class gave me that one:
 SEVERE: org.apache.solr.common.SolrException: Error loading class
 'org.getopt.solr.analysis.StempelTokenFilterFactory'
      at
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:375)
      at
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:390)
 (...)

 Question is: how do I make <fieldType /> and <filter /> work with
 Stempel? :)

 Cheers,
 Jakub Godawa.

 2010/10/29 Bernd Fehling bernd.fehl...@uni-bielefeld.de:
 Hi Jakub,

 I have ported the KStemmer for use in most recent Solr trunk version.
 My stemmer is located in the lib directory of Solr
 solr/lib/KStemmer-2.00.jar
 because it belongs to Solr.

 Write it as FilterFactory and use it as Filter like:
 filter class=de.ubbielefeld.solr.analysis.KStemFilterFactory
 protected=protwords.txt /

 This is how my fieldType looks like:

    fieldType name=text_kstem class=solr.TextField
 positionIncrementGap=100
      analyzer type=index
        tokenizer class=solr.WhitespaceTokenizerFactory /
        filter class=solr.StopFilterFactory ignoreCase=true
 words=stopwords.txt enablePositionIncrements=false /
        filter

How to use polish stemmer - Stempel - in schema.xml?

2010-10-28 Thread Jakub Godawa
Hi!
There is a Polish stemmer, Stempel (http://www.getopt.org/stempel/), and I have
problems connecting it with Solr 1.4.1.
Questions:

1. Where EXACTLY do I put the stempel-1.0.jar file?
2. How do I register the file, so I can build a fieldType like:

<fieldType name="text_pl" class="solr.TextField">
  <analyzer class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
</fieldType>

3. Is that the right approach to make it work?

Thanks in advance for a detailed explanation,
Jakub.
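
For questions 1 and 2, Solr 1.4 lets solrconfig.xml name extra jars with lib elements, and those paths are resolved relative to the core's instance directory. The sketch below uses illustrative paths (the directory layout is an assumption, not taken from a working setup). In schema.xml a filter element is expected to name a factory class, so the factory shown is hypothetical and would have to exist in one of the loaded jars:

```xml
<!-- solrconfig.xml, inside <config>: load one jar, or every jar in a directory.
     Both paths are examples only. -->
<lib path="../lib/stempel-1.0.jar" />
<lib dir="./lib" />

<!-- schema.xml, inside <types> -->
<fieldType name="text_pl" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- hypothetical factory class; it must be present in a loaded jar -->
    <filter class="org.getopt.solr.analysis.StempelTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

The class attribute on the analyzer element itself is meant for a complete Lucene analyzer, which is why a filter factory is listed as a filter inside the analyzer here.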


Re: Implementing Search Suggestion on Solr

2010-10-27 Thread Jakub Godawa
I am a real rookie at solr, but try this:
http://solr.pl/2010/10/18/solr-and-autocomplete-part-1/?lang=en

2010/10/27 Pablo Recio pre...@yaco.es

 Hi,

 I don't want to be annoying, but I'm looking for a way to do that.

 I repeat the question: is there a way to implement Search Suggestion
 manually?

 Thanks in advance.
 Regards,

 2010/10/18 Pablo Recio Quijano pre...@yaco.es

  Hi!
 
  I'm trying to implement some kind of Search Suggestion on a search engine
 I
  have implemented. This search suggestions should not be automatically
 like
  the one described for the SpellCheckComponent [1]. I'm looking something
  like:
 
  SAS oppositions = Public job offers for some-company
 
  So I will have to define it manually. I was thinking about synonyms [2]
 but
  I don't know if it's the proper way to do it, because semantically those
  terms are not synonyms.
 
  Any ideas or suggestions?
 
  Regards,
 
  [1] http://wiki.apache.org/solr/SpellCheckComponent
  [2]
 
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
 



Re: Step by step tutorial for multi-language indexing and search

2010-10-24 Thread Jakub Godawa
Hi Erick, thanks for your help!

I need some technical help though... let me put it that way:

1. I deleted everything in the index with:
curl http://localhost:8983/solr/update -F stream.body='<delete><query>*:*</query></delete>'
curl http://localhost:8983/solr/update -F stream.body='<commit />'

2. I created 2 documents with fields: name_en, answer_en, name_es, answer_es
3. I made a query through admin page, with response:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">9</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">Jakub</str>
      <str name="version">2.2</str>
      <str name="rows">10</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <arr name="answer_en_t"><str>My name is Jakub</str></arr>
      <arr name="answer_es_t"><str>Me llamo Jakub.</str></arr>
      <arr name="id"><str>Question:1</str></arr>
      <arr name="name_en_t"><str>What is your name?</str></arr>
      <arr name="name_es_t"><str>Como te llamas?</str></arr>
      <arr name="pk_s"><str>1</str></arr>
      <arr name="spell">
        <str>What is your name?</str>
        <str>My name is Jakub</str>
        <str>Como te llamas?</str>
        <str>Me llamo Jakub.</str>
      </arr>
    </doc>
    <doc>
      <arr name="answer_en_t"><str>I am in the kitchen Jakub!</str></arr>
      <arr name="answer_es_t"><str>Estoy en la cocina.</str></arr>
      <arr name="id"><str>Question:2</str></arr>
      <arr name="name_en_t"><str>Where are you?</str></arr>
      <arr name="name_es_t"><str>Donde estas?</str></arr>
      <arr name="pk_s"><str>2</str></arr>
      <arr name="spell">
        <str>Where are you?</str>
        <str>I am in the kitchen Jakub!</str>
        <str>Donde estas?</str>
        <str>Estoy en la cocina.</str>
      </arr>
    </doc>
  </result>
</response>

4. Now I needed two dismax handlers to make it work in two separate languages.
Let's say I just want to look up in the *_en fields, so I created a dismax
handler:

<requestHandler name="/English" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">
      name_en_t^0.5 answer_en_t^1.0
    </str>
  </lst>
</requestHandler>


5. Hitting the URL http://localhost:8982/solr/English/?q=Jakub gives me an
error:

there are more terms than documents in field name_en_t, but it's
impossible to sort on tokenized fields

6. I know that I should create a separate dismax for Spanish.
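
Step 6 could mirror the English handler. A sketch, assuming the Spanish fields are indexed as name_es_t and answer_es_t (the names seen in the response in step 3) and that /Spanish is a free handler name; the boosts are simply copied from the English handler and are not tuned:

```xml
<!-- solrconfig.xml: hypothetical Spanish counterpart of the /English handler -->
<requestHandler name="/Spanish" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">name_es_t^0.5 answer_es_t^1.0</str>
  </lst>
</requestHandler>
```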

My questions:
1. Why are those fields named with *_t? I saw in schema.xml that they are
created dynamically. Can/should I create my own predefined fields in schema.xml?
Is this the place where you define HOW a field should be interpreted by the
indexer?
2. Why is the error in no. 5 being thrown? I know that you cannot sort
on tokenized fields, but I don't see myself trying to index or tokenize
anything.
3. How should it be changed to work properly?

Thank you, and I ask for patience, as this can help many rookies like me to
get started.
Jakub.
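
On question 1: a *_t suffix typically comes from a dynamicField rule in schema.xml, and explicitly named fields can be declared alongside it. A sketch, assuming a fieldType called text_en is defined elsewhere in the schema (that name is made up for illustration, as are the explicit field names):

```xml
<!-- schema.xml, inside <fields> -->
<!-- a dynamic rule of the kind that matches name_en_t, answer_en_t, ... -->
<dynamicField name="*_t" type="text" indexed="true" stored="true"/>

<!-- explicitly declared fields; text_en is a hypothetical fieldType -->
<field name="name_en"   type="text_en" indexed="true" stored="true"/>
<field name="answer_en" type="text_en" indexed="true" stored="true"/>
```

The type attribute is what ties a field to a fieldType, and the fieldType's analyzer decides how the text is tokenized at index and query time.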

2010/10/21 Erick Erickson erickerick...@gmail.com

 See below:

 But also search the archives for multilanguage, this topic has been
 discussed
 many times before. Lucid Imagination maintains a Solr-powered (of course)
 searchable
 list at: http://www.lucidimagination.com/search/

 http://www.lucidimagination.com/search/

 On Wed, Oct 20, 2010 at 9:03 AM, Jakub Godawa jakub.god...@gmail.com
 wrote:

  Hi everyone! (my first post)
 
  I am new, but really curious about usefullness of lucene/solr in
 documents
  search from the web applications. I use Ruby on Rails to create one, with
  plugin acts_as_solr_reloaded that makes connection between web app and
  solr easy.
 
  So I am in a point, where I know that good solution is to prepare
  multi-language documents with fields like:
  question_en, answer_en,
  question_fr, answer_fr,
  question_pl,  answer_pl... etc.
 
  I need to create an index that would work with 6 languages: english,
  french,
  german, russian, ukrainian and polish.
 
  My questions are:
  1. Is it doable to have just one search field that behaves like Google's
  for
  all those documents? It can be an option to indicate a language to
 search.
 

 This depends on what you mean by do-able. Are you going to allow a French
 user to search an English document ( etc)? But the real answer is yes,
 you
 can
 if you .. There'll be tradeoffs.

 Take a look at the dismax handler. It's kind of hard to grok all at once,
 but you
 can cause it to search across multiple fields. That is, the user types
 language,
 and you can turn it into a complex query under the covers like
 lang_en:language lang_fr:language lang_ru:language, etc. You can also
 apply boosts. Note that this has obvious problems with, say, Russian. Half
 your
 job will be figuring out what will satisfy the user.

 You could also have a #different# dismax handler defined for various
 languages. Say
 the user was coming from Spanish. Consider a browseES handler. See
 solrconfig.xml
 for the default dismax handler. The Solr book mentioned above describes
 this.


  2. How should I begin changing the solr/conf/schema.xml (or other) file
 to
  tailor it to my needs? As I am a real rookie here, I am still a bit
  confused
  about fields, fieldTypes

Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Jakub Godawa
Hi everyone! (my first post)

I am new, but really curious about the usefulness of Lucene/Solr for document
search in web applications. I use Ruby on Rails to create one, with the
acts_as_solr_reloaded plugin, which makes the connection between the web app
and Solr easy.

I am at the point where I know that a good solution is to prepare
multi-language documents with fields like:
question_en, answer_en,
question_fr, answer_fr,
question_pl, answer_pl... etc.

I need to create an index that would work with 6 languages: English, French,
German, Russian, Ukrainian and Polish.

My questions are:
1. Is it doable to have just one search field that behaves like Google's for
all those documents? Indicating a language to search could be an option.
2. How should I begin changing the solr/conf/schema.xml (or other) file to
tailor it to my needs? As I am a real rookie here, I am still a bit confused
about fields, fieldTypes and their connection with a particular field (e.g.
answer_fr) and the tokenizers and analyzers. If someone can provide a
basic step-by-step tutorial on how to make it work in two languages I would
be more than happy.
3. Are all those languages supported (officially/unofficially) by
Lucene/Solr?

Thank you for help,
Jakub Godawa.
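
For question 2, one common pattern is one fieldType per language, with each field bound to it through its type attribute. A sketch for French only, assuming the stock solr.SnowballPorterFilterFactory and the field names from the list above; the fieldType name text_fr and the filter order are illustrative choices, not a tuned configuration:

```xml
<!-- schema.xml, inside <types>: hypothetical French text type -->
<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  </analyzer>
</fieldType>

<!-- schema.xml, inside <fields>: bind the French fields to that type -->
<field name="question_fr" type="text_fr" indexed="true" stored="true"/>
<field name="answer_fr"   type="text_fr" indexed="true" stored="true"/>
```

The same shape would be repeated per language with a different language value or a different stemming filter, depending on what is available for that language.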


Re: Step by step tutorial for multi-language indexing and search

2010-10-20 Thread Jakub Godawa
2010/10/20 Dennis Gearon gear...@sbcglobal.net

 There's approximately a 100% chance that you are going to go through a
 server-side language (PHP, Ruby, Perl, Java, VB/ASP/.NET [cough, cough])
 before you get to Solr/Lucene. I'd recommend it anyway.


I use a server-side language (Ruby) as I build the web application.


 This code should look at the user's browser locale (en_US, pl_PL,
 es_CO, etc.). The server-side language would then choose which language to
 search by and display.


As I said, I may provide the locale as an addition to the search query.


 NOW, that being said, are you going to have the exact same content for all
 languages, just translated? The temptation would be to translate to a common
 language like English, then do the search, then get the translation. I
 wouldn't recommend it, but I'm no expert. Translation of single words can be
 OK, but multiword ideas and especially sentences don't work so well that
 way.


I would rather not yield to that temptation. I know that Solr/Lucene can work
with many languages and I think it has a purpose, such as the languages'
semantic diversity. What's more, you often don't translate things literally
even if they are just translations.


 You probably will have separate content for that reason, AND another.
 Different cultures are interested in different things and only have common
 ground on certain things like international news (but with different
 opinions) and medical news. So different content for different cultures
 speaking different languages.


I need to treat each culture separately regarding the subject of the query.


 Are you trying to address different languages in some place like the US or
 Great Britain, with LOTS of different languages spoken in minority cultures?
 Only then would you want a geographically centered server and information
 gathering organization. If you were going to have search for other
 countries, then I'd recommend those resources be geographically close to
 their source culture.


No, I am not trying to address minority cultures.

Thanks for answer,
Jakub Godawa.

Dennis Gearon

 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a
 better idea to learn from others’ mistakes, so you do not have to make them
 yourself. from '
 http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'

 EARTH has a Right To Life,
  otherwise we all die.


 --- On Wed, 10/20/10, Jakub Godawa jakub.god...@gmail.com wrote:

  From: Jakub Godawa jakub.god...@gmail.com
  Subject: Step by step tutorial for multi-language indexing and search
  To: solr-user@lucene.apache.org
  Date: Wednesday, October 20, 2010, 6:03 AM
  Hi everyone! (my first post)
 
  I am new, but really curious about usefullness of
  lucene/solr in documents
  search from the web applications. I use Ruby on Rails to
  create one, with
  plugin acts_as_solr_reloaded that makes connection
  between web app and
  solr easy.
 
  So I am in a point, where I know that good solution is to
  prepare
  multi-language documents with fields like:
  question_en, answer_en,
  question_fr, answer_fr,
  question_pl,  answer_pl... etc.
 
  I need to create an index that would work with 6 languages:
  english, french,
  german, russian, ukrainian and polish.
 
  My questions are:
  1. Is it doable to have just one search field that behaves
  like Google's for
  all those documents? It can be an option to indicate a
  language to search.
  2. How should I begin changing the solr/conf/schema.xml (or
  other) file to
  tailor it to my needs? As I am a real rookie here, I am
  still a bit confused
  about fields, fieldTypes and their connection with
  particular field (ex.
  answer_fr) and the tokenizers and analyzers. If someone
  can provide a
  basic step by step tutorial on how to make it work in two
  languages I would
  be more that happy.
  3. Do all those languages are supported
  (officially/unofficialy) by
  lucene/solr?
 
  Thank you for help,
  Jakub Godawa.