Re: Custom TokenFilter
Hi Erick,

For me, this ClassCastException was caused by using the wrong base class. In the fieldType declaration (schema.xml), I had put:

    <tokenizer class="com.tamingtext.texttamer.solr.SentenceTokenizerFactory"/>

But instead of extending TokenizerFactory in my class, I extended TokenFilterFactory, like this:

    public class SentenceTokenizerFactory extends TokenFilterFactory

So when Solr tries to load my class, it expects a TokenizerFactory subclass but finds a TokenFilterFactory subclass.

Regards,
Andry
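A minimal, self-contained sketch of why the wrong base class produces exactly this error. The stack trace shows SolrResourceLoader.findClass calling Class.asSubclass, which throws ClassCastException when the loaded class is not a subtype of the requested one. The two abstract classes below are hypothetical stand-ins, not the real Lucene TokenizerFactory and TokenFilterFactory from lucene-analyzers-common (those take constructor arguments and declare abstract create methods); only the class hierarchy matters here.

```java
// Hypothetical stand-ins for the Lucene base classes: only the hierarchy is modeled.
abstract class TokenizerFactory {}
abstract class TokenFilterFactory {}

// The mistake described above: a "tokenizer" factory that extends the filter base class.
class WrongSentenceTokenizerFactory extends TokenFilterFactory {}

// The fix: extend the tokenizer base class instead.
class FixedSentenceTokenizerFactory extends TokenizerFactory {}

final class AsSubclassDemo {
    // Roughly mirrors the check in the stack trace for a <tokenizer class="..."/> entry:
    // narrow the loaded class to the expected plugin type.
    static Class<? extends TokenizerFactory> asTokenizerFactory(Class<?> loaded) {
        return loaded.asSubclass(TokenizerFactory.class); // throws ClassCastException on mismatch
    }

    // Returns true if the class would pass the asSubclass check, false otherwise.
    static boolean loadsAsTokenizer(Class<?> loaded) {
        try {
            asTokenizerFactory(loaded);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(loadsAsTokenizer(WrongSentenceTokenizerFactory.class)); // false
        System.out.println(loadsAsTokenizer(FixedSentenceTokenizerFactory.class)); // true
    }
}
```

Running it prints `false` for the factory that extends the filter base class and `true` for the fixed one.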
Re: Custom TokenFilter
Thanks Erick,

I'm working on Solr 4.10.2, and all my dependency jars seem to be compatible with this version. I can't figure out which one is causing this issue.

Thanks,
Regards

On Tuesday, March 24, 2015 at 23:45, Erick Erickson erickerick...@gmail.com wrote:

bq: 13 more
Caused by: java.lang.ClassCastException: class com.tamingtext.texttamer.solr.

This usually means you have jar files from different versions of Solr in your classpath.

Best,
Erick

On Tue, Mar 24, 2015 at 2:38 PM, Test Test andymish...@yahoo.fr wrote:

Hi there,

I'm trying to create my own TokenizerFactory (from the Taming Text book). After setting up schema.xml and adding the path in solrconfig.xml, I started Solr and got this error message:

Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text: Plugin init failure for [schema.xml] analyzer/tokenizer: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory. Schema file is .../conf/schema.xml
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:595)
    at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
    at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
    at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
    at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:90)
    at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
    ... 7 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text: Plugin init failure for [schema.xml] analyzer/tokenizer: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:486)
    ... 12 more
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/tokenizer: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
    at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
    at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
    ... 13 more
Caused by: java.lang.ClassCastException: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory
    at java.lang.Class.asSubclass(Class.java:3208)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:474)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:593)
    at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342)
    at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335)
    at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)

Can anyone help?
Thanks. Regards.
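When a trace nests several `Caused by:` sections, the innermost one is usually the real failure; here that is the ClassCastException Erick quotes in his reply. A small sketch for pulling the root cause out programmatically, using only `Throwable.getCause()`. The class and method names are illustrative, and plain RuntimeExceptions stand in for SolrException. The identity set is a guard against a cause chain that loops back on itself.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class RootCause {
    // Follows getCause() links to the innermost throwable, stopping if the chain repeats.
    static Throwable of(Throwable t) {
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Throwable current = t;
        while (current.getCause() != null && seen.add(current)) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Nesting shaped like the trace above (messages shortened).
        Exception e = new RuntimeException("Plugin init failure for [schema.xml] fieldType text",
            new RuntimeException("Plugin init failure for [schema.xml] analyzer/tokenizer",
                new ClassCastException("class com.tamingtext.texttamer.solr.SentenceTokenizerFactory")));
        // prints "java.lang.ClassCastException: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory"
        System.out.println(of(e));
    }
}
```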
Re: Custom TokenFilter
Images don't come through the mailing list, so I can't see your image. Whether or not all the jars in the directory you're working on are consistent is the least of your problems. Are the libs to be found in any _other_ place specified on your classpath?

Best,
Erick
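One way to answer that question from inside the running JVM is to ask a class which class loader defined it and which jar or directory its bytes came from; if the same Lucene class turns up from two different locations, its jar is on the classpath twice. This sketch uses only `Class.getClassLoader()` and `ProtectionDomain.getCodeSource()`. How you call it from Solr (for example, logging `ClassOrigin.describe(...)` for a Lucene base class from your factory's constructor) is an assumption about your setup, and the code source can be `null`, as it is for JDK classes.

```java
import java.net.URL;
import java.security.CodeSource;

final class ClassOrigin {
    // Describes which class loader defined a class and where its bytes were loaded from.
    static String describe(Class<?> c) {
        ClassLoader loader = c.getClassLoader();                   // null means the bootstrap loader
        CodeSource source = c.getProtectionDomain().getCodeSource(); // may be null
        URL location = (source == null) ? null : source.getLocation();
        return c.getName()
            + " loader=" + (loader == null ? "bootstrap" : loader.toString())
            + " location=" + (location == null ? "unknown" : location.toString());
    }

    // Usage: java ClassOrigin <fully.qualified.ClassName> ...
    public static void main(String[] args) throws ClassNotFoundException {
        for (String name : args) {
            System.out.println(describe(Class.forName(name)));
        }
    }
}
```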
Re: Custom TokenFilter
Hi again,

Sorry about the image. Here are all my dependency jars:

- commons-cli-2.0-mahout.jar
- commons-compress-1.9.jar
- commons-io-2.4.jar
- commons-logging-1.2.jar
- httpclient-4.4.jar
- httpcore-4.4.jar
- httpmime-4.4.jar
- junit-4.10.jar
- log4j-1.2.17.jar
- lucene-analyzers-common-4.10.2.jar
- lucene-benchmark-4.10.2.jar
- lucene-core-4.10.2.jar
- mahout-core-0.9.jar
- noggit-0.5.jar
- opennlp-maxent-3.0.3.jar
- opennlp-tools-1.5.3.jar
- slf4j-api-1.7.9.jar
- slf4j-simple-1.7.10.jar
- solr-solrj-4.10.2.jar

I have put them into a specific directory (contrib/tamingtext/dependency), and the jar containing my class into another directory (contrib/tamingtext/lib). I added these paths in solrconfig.xml:

    <lib dir="../../../contrib/tamingtext/lib" regex=".*\.jar" />
    <lib dir="../../../contrib/tamingtext/dependency" regex=".*\.jar" />

Thanks in advance,
Regards.
Re: Custom TokenFilter
Hi again,

I think I finally found where this problem comes from. I didn't extend the right class: instead of using a Tokenizer, I'm using a TokenFilter.

Erick, thanks for your replies.
Regards.
Re: Custom TokenFilter
Hi again,

I have tried removing all the redundant jar files. Then I relaunched Solr, but it fails immediately with the same error. It's very strange.

Regards,
Re: Custom TokenFilter
Wait, you didn't put, say, lucene-core-4.10.2.jar into your contrib/tamingtext/dependency directory did you? That means you have Lucene (and solr and solrj and ...) in your class path twice since they're _already_ in your classpath by default since you're running Solr. All your jars should be in your aggregate classpath exactly once. Having them in twice would explain the cast exception. not need these in the tamingtext/dependency subdirectory, just the things that are _not_ in Solr already.. Best, Erick On Wed, Mar 25, 2015 at 12:21 PM, Test Test andymish...@yahoo.fr wrote: Re, Sorry about the image.So, there are all my dependencies jar in listing below : - commons-cli-2.0-mahout.jar - commons-compress-1.9.jar - commons-io-2.4.jar - commons-logging-1.2.jar - httpclient-4.4.jar - httpcore-4.4.jar - httpmime-4.4.jar - junit-4.10.jar - log4j-1.2.17.jar - lucene-analyzers-common-4.10.2.jar - lucene-benchmark-4.10.2.jar - lucene-core-4.10.2.jar - mahout-core-0.9.jar - noggit-0.5.jar - opennlp-maxent-3.0.3.jar - opennlp-tools-1.5.3.jar - slf4j-api-1.7.9.jar - slf4j-simple-1.7.10.jar - solr-solrj-4.10.2.jar I have put them into a specific repository (contrib/tamingtext/dependency).And my jar containing my class into another repository (contrib/tamingtext/lib).I added these paths in solrconfig.xml - lib dir=../../../contrib/tamingtext/lib regex=.*\.jar / - lib dir=../../../contrib/tamingtext/dependency regex=.*\.jar / Thanks for advance Regards. 
On Wednesday, March 25, 2015 at 17:12, Erick Erickson erickerick...@gmail.com wrote:

Images don't come through the mailing list, so I can't see your image. Whether or not all the jars in the directory you're working on are consistent is the least of your problems. Are the libs to be found in any _other_ place specified on your classpath?

Best,
Erick

On Wed, Mar 25, 2015 at 12:36 AM, Test Test andymish...@yahoo.fr wrote:

Thanks Erick,

I'm working on Solr 4.10.2, and all my dependency jars seem to be compatible with this version.

[image: inline image]

I can't figure out which one causes this issue.

Thanks,
Regards

On Tuesday, March 24, 2015 at 23:45, Erick Erickson erickerick...@gmail.com wrote:

bq: 13 more Caused by: java.lang.ClassCastException: class com.tamingtext.texttamer.solr.

This usually means you have jar files from different versions of Solr in your classpath.

Best,
Erick
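Erick's question about whether the libs can be found anywhere else on the classpath can be checked mechanically. The sketch below is a hypothetical helper, not part of Solr: given the jar file names found in each lib directory, it reports every artifact that appears in more than one place. The directory names in `main` are assumptions for illustration (in practice you would list Solr's own webapp lib directory plus every `<lib dir=...>` from solrconfig.xml), and stripping the version number with a regex is a heuristic that will not match every naming scheme.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateJarCheck {
    // Heuristic: "lucene-core-4.10.2.jar" -> "lucene-core". Names that don't fit
    // "<name>-<version>.jar" are compared by their full file name instead.
    private static final Pattern VERSIONED = Pattern.compile("(.+?)-(\\d[\\w.]*)\\.jar");

    static String artifactOf(String jarName) {
        Matcher m = VERSIONED.matcher(jarName);
        return m.matches() ? m.group(1) : jarName;
    }

    /** Returns each artifact found more than once, mapped to the "dir/jar" entries where it was found. */
    static Map<String, List<String>> findDuplicates(Map<String, List<String>> jarsByDir) {
        Map<String, List<String>> locations = new TreeMap<>();
        for (Map.Entry<String, List<String>> dir : jarsByDir.entrySet()) {
            for (String jar : dir.getValue()) {
                locations.computeIfAbsent(artifactOf(jar), k -> new ArrayList<>())
                         .add(dir.getKey() + "/" + jar);
            }
        }
        locations.values().removeIf(found -> found.size() < 2); // keep only repeats
        return locations;
    }

    public static void main(String[] args) {
        // Hypothetical listings; in practice, list Solr's own lib directory and
        // every <lib dir=...> from solrconfig.xml.
        Map<String, List<String>> jarsByDir = new LinkedHashMap<>();
        jarsByDir.put("solr-webapp/WEB-INF/lib",
                List.of("lucene-core-4.10.2.jar", "solr-solrj-4.10.2.jar", "noggit-0.5.jar"));
        jarsByDir.put("contrib/tamingtext/dependency",
                List.of("lucene-core-4.10.2.jar", "solr-solrj-4.10.2.jar", "opennlp-tools-1.5.3.jar"));

        // Prints one line each for lucene-core and solr-solrj, listing both locations.
        findDuplicates(jarsByDir).forEach((artifact, where) ->
                System.out.println(artifact + " -> " + where));
    }
}
```

Anything this reports is a candidate for removal from the contrib directory, since Solr already provides it.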
Re: Custom TokenFilter
Thanks for letting us know the resolution; the problem was bugging me.

Erick

On Wed, Mar 25, 2015 at 4:21 PM, Test Test andymish...@yahoo.fr wrote:

Hi again,

Finally, I think I found where this problem comes from. I didn't extend the right class: instead of a Tokenizer, I'm using a TokenFilter.

Erick, thanks for your replies.

Regards.
Re: Custom TokenFilter
bq: 13 more Caused by: java.lang.ClassCastException: class com.tamingtext.texttamer.solr.

This usually means you have jar files from different versions of Solr in your classpath.

Best,
Erick

On Tue, Mar 24, 2015 at 2:38 PM, Test Test andymish...@yahoo.fr wrote:

Hi there,

I'm trying to create my own TokenizerFactory (from the Taming Text book). After setting up schema.xml and adding the lib path in solrconfig.xml, I start Solr and get this error message:

    Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text: Plugin init failure for [schema.xml] analyzer/tokenizer: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory. Schema file is .../conf/schema.xml
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:595)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:166)
        at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
        at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
        at org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:90)
        at org.apache.solr.core.ConfigSetService.getConfig(ConfigSetService.java:62)
        ... 7 more
    Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType text: Plugin init failure for [schema.xml] analyzer/tokenizer: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:486)
        ... 12 more
    Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/tokenizer: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
        at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:362)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
        ... 13 more
    Caused by: java.lang.ClassCastException: class com.tamingtext.texttamer.solr.SentenceTokenizerFactory
        at java.lang.Class.asSubclass(Class.java:3208)
        at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:474)
        at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:593)
        at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:342)
        at org.apache.solr.schema.FieldTypePluginLoader$2.create(FieldTypePluginLoader.java:335)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)

Can someone help? Thanks.

Regards.
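The innermost `Caused by:` comes from `java.lang.Class.asSubclass`, called from `SolrResourceLoader.findClass`: Solr loads the class named in the `<tokenizer class="...">` element and then narrows it to the factory type it expects. The plain-Java sketch below reproduces that failure with hypothetical stub classes standing in for Lucene's `TokenizerFactory` and `TokenFilterFactory`, covering the cause the poster eventually found (a factory declared as a tokenizer but extending the filter factory base class). Erick's duplicate-jar explanation can lead to the same exception by another route, since two copies of a class loaded by different class loaders are distinct `Class` objects; the sketch does not simulate that case.

```java
// Hypothetical stand-ins for Lucene's factory base classes (in Lucene 4.x the real
// ones are org.apache.lucene.analysis.util.TokenizerFactory / TokenFilterFactory).
abstract class TokenizerFactoryStub {}
abstract class TokenFilterFactoryStub {}

// Mirrors the bug: configured as a <tokenizer>, but extends the filter base class.
class SentenceTokenizerFactory extends TokenFilterFactoryStub {}

// Mirrors the fix: extends the tokenizer base class instead.
class FixedSentenceTokenizerFactory extends TokenizerFactoryStub {}

public class AsSubclassDemo {
    /**
     * Approximates the check in the trace: take a loaded class and narrow it to the
     * expected plugin type. Class.asSubclass throws ClassCastException on a mismatch.
     */
    static boolean loadsAsTokenizerFactory(Class<?> loaded) {
        try {
            Class<? extends TokenizerFactoryStub> narrowed = loaded.asSubclass(TokenizerFactoryStub.class);
            System.out.println("OK: " + narrowed.getName());
            return true;
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        loadsAsTokenizerFactory(SentenceTokenizerFactory.class);      // fails, like the trace above
        loadsAsTokenizerFactory(FixedSentenceTokenizerFactory.class); // succeeds
    }
}
```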
Re: custom TokenFilter
If you are writing a custom TokenStream, I recommend using some of the resources in Lucene's test-framework.jar to test it. These find lots of bugs, including thread-safety bugs!

For a filter, I recommend using the assertions in BaseTokenStreamTestCase: assertTokenStreamContents, assertAnalyzesTo, and especially checkRandomData:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java

When testing your filter, for even more checks, don't use the Whitespace or Keyword tokenizer; use MockTokenizer, which has more checks:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/analysis/MockTokenizer.java

For some examples, you can look at the tests in modules/analysis.

And of course, enable assertions (-ea) when testing!

On Thu, Feb 9, 2012 at 6:30 PM, Jamie Johnson jej2...@gmail.com wrote:

I need to take user input and index it in a unique fashion: essentially the value is some string (say "abcdefghijk") and needs to be converted into a set of tokens (say "1 2 3 4"). I have currently implemented a custom TokenFilter to do this; is this appropriate? When I am indexing things slowly (i.e. one at a time) this works fine, but when I send 10,000 things to Solr (all in one thread) I am noticing exceptions where it seems that the generated instance variable is being used by several threads. Is my implementation appropriate, or is there another, more appropriate way to do this? Are TokenFilters reused? Would it be more appropriate to convert the stream to a single space-separated token and then run that through a WhitespaceTokenizer? Any guidance on this would be greatly appreciated.

    class CustomFilter extends TokenFilter {
      private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
      private final PositionIncrementAttribute posAtt = addAttribute(PositionIncrementAttribute.class);

      protected CustomFilter(TokenStream input) {
        super(input);
      }

      Iterator<AttributeSource> replacement;

      @Override
      public boolean incrementToken() throws IOException {
        if (generated == null) {
          // setup generated
          if (!input.incrementToken()) {
            return false;
          }
          //clearAttributes();
          List<String> cells = StaticClass.generateTokens(termAtt.toString());
          generated = new ArrayList<AttributeSource>(cells.size());
          boolean first = true;
          for (String cell : cells) {
            AttributeSource newTokenSource = this.cloneAttributes();
            CharTermAttribute newTermAtt = newTokenSource.addAttribute(CharTermAttribute.class);
            newTermAtt.setEmpty();
            newTermAtt.append(cell);
            OffsetAttribute newOffsetAtt = newTokenSource.addAttribute(OffsetAttribute.class);
            PositionIncrementAttribute newPosIncAtt = newTokenSource.addAttribute(PositionIncrementAttribute.class);
            newOffsetAtt.setOffset(0, 0);
            newPosIncAtt.setPositionIncrement(first ? 1 : 0);
            generated.add(newTokenSource);
            first = false;
            generated.add(newTokenSource);
          }
        }
        if (!generated.isEmpty()) {
          copy(this, generated.remove(0));
          return true;
        }
        return false;
      }

      private void copy(AttributeSource target, AttributeSource source) {
        if (target != source)
          source.copyTo(target);
      }

      private LinkedList<AttributeSource> buffer;
      private LinkedList<AttributeSource> matched;
      private boolean exhausted;

      private AttributeSource nextTok() throws IOException {
        if (buffer != null && !buffer.isEmpty()) {
          return buffer.removeFirst();
        } else {
          if (!exhausted && input.incrementToken()) {
            return this;
          } else {
            exhausted = true;
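To make the reuse question concrete, here is a plain-Java model of the queue-and-replay pattern the filter above is aiming for. It deliberately avoids the Lucene API so it runs without any jars: `expand` is a hypothetical stand-in for `StaticClass.generateTokens`, and the `term`/`posInc` fields play the role of the term and position-increment attributes. It differs from the quoted code in three ways: the queue is refilled whenever it runs dry (the quoted `generated == null` check fills it only once, so later input tokens are never read), each expansion is queued once rather than twice, and `reset` clears the per-instance state. As far as I know, Lucene reuses one analysis chain per thread across documents rather than sharing an instance between threads, so leftover state from the previous document is a likely suspect; in a real TokenFilter the equivalent cleanup would go in a `reset()` override that calls `super.reset()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Plain-Java model (not Lucene code) of a filter that stacks expansions at one position. */
public class StackingExpander {
    private final Function<String, List<String>> expand;      // hypothetical generateTokens stand-in
    private final Deque<String> pending = new ArrayDeque<>(); // per-instance state, like `generated`
    private Iterator<String> input = Collections.emptyIterator();
    private boolean startOfGroup;

    // "Attributes" of the current token.
    String term;
    int posInc;

    StackingExpander(Function<String, List<String>> expand) {
        this.expand = expand;
    }

    /** Like TokenStream.reset(): the instance is reused, so clear all leftover state. */
    void reset(Iterator<String> newInput) {
        input = newInput;
        pending.clear();
        startOfGroup = false;
    }

    /** Like incrementToken(): false only when upstream and the queue are both exhausted. */
    boolean incrementToken() {
        while (pending.isEmpty()) {                      // refill every time the queue runs dry
            if (!input.hasNext()) {
                return false;
            }
            pending.addAll(expand.apply(input.next()));  // each expansion queued exactly once
            startOfGroup = true;
        }
        term = pending.removeFirst();
        posInc = startOfGroup ? 1 : 0;                   // first advances, the rest stack on it
        startOfGroup = false;
        return true;
    }

    /** Drains the stream into "term/posInc" strings for printing or testing. */
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (incrementToken()) {
            out.add(term + "/" + posInc);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical expansion rule: one output token per character.
        StackingExpander f = new StackingExpander(s -> Arrays.asList(s.split("")));
        f.reset(List.of("ab", "cd").iterator());
        System.out.println(f.drain()); // [a/1, b/0, c/1, d/0]
        f.reset(List.of("xyz").iterator()); // same instance reused for the next "document"
        System.out.println(f.drain()); // [x/1, y/0, z/0]
    }
}
```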
Re: custom TokenFilter
Thanks Robert, I'll take a look there. Does it sound like I'm on the right track with what I'm implementing? In other words, is a TokenFilter appropriate, or is there something else that would be a better fit for what I've described?

On Thu, Feb 9, 2012 at 6:44 PM, Robert Muir rcm...@gmail.com wrote:

If you are writing a custom TokenStream, I recommend using some of the resources in Lucene's test-framework.jar to test it. These find lots of bugs, including thread-safety bugs!

For a filter, I recommend using the assertions in BaseTokenStreamTestCase: assertTokenStreamContents, assertAnalyzesTo, and especially checkRandomData:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/analysis/BaseTokenStreamTestCase.java

When testing your filter, for even more checks, don't use the Whitespace or Keyword tokenizer; use MockTokenizer, which has more checks:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/test-framework/src/java/org/apache/lucene/analysis/MockTokenizer.java

For some examples, you can look at the tests in modules/analysis.

And of course, enable assertions (-ea) when testing!
Re: custom TokenFilter
On Thu, Feb 9, 2012 at 8:28 PM, Jamie Johnson jej2...@gmail.com wrote:

Thanks Robert, I'll take a look there. Does it sound like I'm on the right track with what I'm implementing? In other words, is a TokenFilter appropriate, or is there something else that would be a better fit for what I've described?

I can't say for sure, to be honest, because it's a bit too abstract. I don't know the reasoning behind trying to convert "abcdefghijk" to "1 2 3 4", and I'm not sure I really understand what that means either.

But in general: if you are taking the whole content of a field and making it into tokens, then it's best implemented as a tokenizer.

-- lucidimagination.com
Re: custom TokenFilter
Again, thanks. I'll take a stab at that. Are you aware of any resources/examples of how to do this? I figured I'd start with WhitespaceTokenizer, but wasn't sure if there was a simpler place to start.

On Thu, Feb 9, 2012 at 8:44 PM, Robert Muir rcm...@gmail.com wrote:

I can't say for sure, to be honest, because it's a bit too abstract. I don't know the reasoning behind trying to convert "abcdefghijk" to "1 2 3 4", and I'm not sure I really understand what that means either.

But in general: if you are taking the whole content of a field and making it into tokens, then it's best implemented as a tokenizer.
Re: custom TokenFilter
On Thu, Feb 9, 2012 at 8:54 PM, Jamie Johnson jej2...@gmail.com wrote:

Again, thanks. I'll take a stab at that. Are you aware of any resources/examples of how to do this? I figured I'd start with WhitespaceTokenizer, but wasn't sure if there was a simpler place to start.

Well, the easiest option is to build what you need out of existing resources, if you can...

But if you need to write your own, and your input is not massive documents (i.e. you have no problem processing the whole field in RAM at once), you could look at PatternTokenizer for an example:
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java
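A minimal sketch of the approach Robert describes, assuming the field comfortably fits in memory: read the whole `Reader` into a string, then carve tokens out of it while recording each token's start and end character offsets. This is a stdlib-only model, not PatternTokenizer's source; the `Token` class and `tokenize` methods are hypothetical, and emitting regex matches is only one possible rule (a custom mapping like the one discussed in this thread would replace the matching loop).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeFieldTokenizer {
    /** A token's text plus its [start, end) character offsets in the original field. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /** Reads the entire field into memory; only sensible for modest input sizes. */
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** Emits every match of tokenPattern as a token, with offsets into the original text. */
    static List<Token> tokenize(String text, Pattern tokenPattern) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = tokenPattern.matcher(text);
        while (m.find()) {
            tokens.add(new Token(m.group(), m.start(), m.end()));
        }
        return tokens;
    }

    static List<Token> tokenize(Reader reader, Pattern tokenPattern) throws IOException {
        return tokenize(readAll(reader), tokenPattern);
    }

    public static void main(String[] args) throws IOException {
        List<Token> tokens = tokenize(new StringReader("abc  de f"), Pattern.compile("\\S+"));
        System.out.println(tokens); // [abc[0,3), de[5,7), f[8,9)]
    }
}
```

Because the whole value is visible at once, the tokenizer can decide how many tokens to produce and where they sit, which is what makes it a better fit than a filter for a whole-field conversion.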
Re: custom TokenFilter
Thanks Robert, that worked perfectly for the index side of the house. Now, on the query side I have a similar Tokenizer, but it's not operating quite the way I want it to. The query tokenizer generates the tokens properly, except that I'm ending up with a phrase query, i.e. field:"1 2 3 4", when I really want field:1 OR field:2 OR field:3 OR field:4. Is there something in the tokenizer that needs to be set for it to generate this type of query, or is it something in the query parser?

On Thu, Feb 9, 2012 at 9:02 PM, Robert Muir rcm...@gmail.com wrote:

Well, the easiest option is to build what you need out of existing resources, if you can...

But if you need to write your own, and your input is not massive documents (i.e. you have no problem processing the whole field in RAM at once), you could look at PatternTokenizer for an example:
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/pattern/PatternTokenizer.java
Re: custom TokenFilter
I think I figured it out: the tokens just needed the same position attribute.

On Thu, Feb 9, 2012 at 10:38 PM, Jamie Johnson jej2...@gmail.com wrote:

Thanks Robert, that worked perfectly for the index side of the house. Now, on the query side I have a similar Tokenizer, but it's not operating quite the way I want it to. The query tokenizer generates the tokens properly, except that I'm ending up with a phrase query, i.e. field:"1 2 3 4", when I really want field:1 OR field:2 OR field:3 OR field:4. Is there something in the tokenizer that needs to be set for it to generate this type of query, or is it something in the query parser?
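Jamie's fix works because of how the query parser combines several tokens produced from one query term. The sketch below is a simplified, hypothetical model of that decision in plain Java, not Lucene's QueryParser code: when every token after the first has a position increment of 0 they share a single position, and the model builds an OR of the terms; when they occupy successive positions, it builds a phrase query if phrase auto-generation is on (in Solr this is, as far as I know, the `autoGeneratePhraseQueries` field type setting) and an OR otherwise, assuming OR is the default operator.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Simplified model of how a parser might combine the tokens produced for one query term. */
public class QueryShapeModel {
    static final class Tok {
        final String term;
        final int posInc;

        Tok(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /**
     * One distinct position -> OR of terms. Several positions -> phrase when
     * autoPhrase is on, otherwise OR (assuming OR is the default operator).
     */
    static String buildQuery(String field, List<Tok> tokens, boolean autoPhrase) {
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0).term;
        }
        int positions = 0;
        for (Tok t : tokens) {
            if (t.posInc > 0) {
                positions++; // a positive increment starts a new position
            }
        }
        if (positions > 1 && autoPhrase) {
            String phrase = tokens.stream().map(t -> t.term).collect(Collectors.joining(" "));
            return field + ":\"" + phrase + "\"";
        }
        return tokens.stream().map(t -> field + ":" + t.term).collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        List<Tok> sequential = List.of(new Tok("1", 1), new Tok("2", 1), new Tok("3", 1), new Tok("4", 1));
        List<Tok> stacked = List.of(new Tok("1", 1), new Tok("2", 0), new Tok("3", 0), new Tok("4", 0));
        System.out.println(buildQuery("field", sequential, true)); // field:"1 2 3 4"
        System.out.println(buildQuery("field", stacked, true));    // field:1 OR field:2 OR field:3 OR field:4
    }
}
```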