[jira] [Commented] (SOLR-3350) TextField's parseFieldQuery method not using analyzer's enablePosIncr parameter
[ https://issues.apache.org/jira/browse/SOLR-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252285#comment-13252285 ]

Tommaso Teofili commented on SOLR-3350:
---

Hi Robert,

For TextField, setting enablePositionIncrements to true and then evaluating an always-true condition seems just wrong (code-wise), so we should discuss whether the issue is in the true constant or in the code switching on it.
It should be clear what a mixed configuration like the one above yields as an overall enablePositionIncrements property (true, false, not set), if that is needed in the field type implementation (maybe by traversing objects from the QParser to the SchemaField, or in some more convenient way, if one exists).
Depending on how we choose to fix the code, if a Solr type using TextField has a tokenizer or some filters with enablePositionIncrements set to false, there are different options:
- option 1: raise a configuration error
- option 2: log a warning message
- option 3: don't care (as it is now)

TextField's parseFieldQuery method not using analyzer's enablePosIncr parameter
---
Key: SOLR-3350
URL: https://issues.apache.org/jira/browse/SOLR-3350
Project: Solr
Issue Type: Bug
Components: Schema and Analysis
Affects Versions: 3.5, 4.0
Reporter: Tommaso Teofili
Priority: Minor

The parseFieldQuery method of the TextField class just sets
{code}
...
boolean enablePositionIncrements = true;
...
{code}
while that value should be taken from the Analyzer's configuration.
The flag is then evaluated in two places:
{code}
...
if (enablePositionIncrements) {
  mpq.add((Term[]) multiTerms.toArray(new Term[0]), position);
} else {
  mpq.add((Term[]) multiTerms.toArray(new Term[0]));
}
return mpq;
...
...
if (enablePositionIncrements) {
  position += positionIncrement;
  pq.add(new Term(field, term), position);
} else {
  pq.add(new Term(field, term));
}
...
{code}

-- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
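The three options discussed in the SOLR-3350 comment could be sketched roughly as follows. This is only an illustration: PosIncrResolver, OnConflict and resolve are hypothetical names that exist neither in Solr nor in Lucene, each list element stands for one tokenizer/filter setting (null meaning "not set"), and falling back to true on a mixed chain is an assumption that mirrors the currently hard-coded value.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; none of these names come from Solr or Lucene. */
final class PosIncrResolver {

    /** What to do when the analysis chain disagrees (options 1, 2 and 3 in the comment). */
    enum OnConflict { ERROR, WARN, IGNORE }

    /**
     * Each element is one tokenizer/filter setting: TRUE, FALSE, or null for "not set".
     * Returns the overall flag; true when nothing is set or (by assumption) when mixed.
     */
    static boolean resolve(List<Boolean> settings, OnConflict policy) {
        boolean sawTrue = false;
        boolean sawFalse = false;
        for (Boolean s : settings) {
            if (s == null) {
                continue; // not set: does not influence the result
            }
            if (s) {
                sawTrue = true;
            } else {
                sawFalse = true;
            }
        }
        if (sawTrue && sawFalse) {
            switch (policy) {
                case ERROR: // option 1: configuration error
                    throw new IllegalStateException("mixed enablePositionIncrements settings");
                case WARN:  // option 2: warn and carry on
                    System.err.println("WARN: mixed enablePositionIncrements settings, using true");
                    break;
                case IGNORE: // option 3: don't care
                    break;
            }
            return true;
        }
        return !sawFalse; // all false -> false; all true or nothing set -> true
    }

    public static void main(String[] args) {
        System.out.println(resolve(Arrays.asList(true, null, false), OnConflict.WARN));
    }
}
```

With something like this in place, parseFieldQuery could read the resolved flag instead of the constant; how the settings would actually be collected from the analyzer chain is left open here, as it is in the comment.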
[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy
[ https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241049#comment-13241049 ]

Tommaso Teofili commented on SOLR-2983:
---

and merged to branch_rx on r1306733

Unable to load custom MergePolicy
-
Key: SOLR-2983
URL: https://issues.apache.org/jira/browse/SOLR-2983
Project: Solr
Issue Type: Bug
Reporter: Mathias Herberts
Assignee: Tommaso Teofili
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2983.patch, SOLR-2983_2.patch

As part of a recent upgrade to Solr 3.5.0 we encountered an error related to our use of LinkedIn's ZoieMergePolicy.
It seems the code that loads a custom MergePolicy was at some point moved into SolrIndexConfig.java from SolrIndexWriter.java, but as this code was copied verbatim it now contains a bug:

  try {
    policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName, null, new Class[]{IndexWriter.class}, new Object[]{this});
  } catch (Exception e) {
    policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName);
  }

'this' is no longer an IndexWriter but a SolrIndexConfig, therefore the call to newInstance will always throw an exception and the catch clause will be executed.
If the custom MergePolicy does not have a default constructor (which is the case of ZoieMergePolicy), the second attempt to create the MergePolicy will also fail and Solr won't start.
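The failure mode described in SOLR-2983 can be reproduced in isolation with plain java.lang.reflect. The sketch below is an assumption-laden stand-in: Writer, Config and WriterOnlyPolicy are hypothetical classes playing the roles of IndexWriter, SolrIndexConfig and a policy without a default constructor, and load() only mimics the try/catch shape of the quoted code, not Solr's actual resource loader.

```java
import java.lang.reflect.Constructor;

/** Hypothetical reproduction with plain reflection; not Solr's resource loader. */
final class MergePolicyLoadingSketch {

    /** Stand-ins for IndexWriter and SolrIndexConfig. */
    static class Writer {}
    static class Config {}

    /** A policy with only a one-argument constructor, like the ZoieMergePolicy case. */
    static class WriterOnlyPolicy {
        WriterOnlyPolicy(Writer writer) {}
    }

    /** Same shape as the quoted code: try the (Writer) constructor, then fall back to no-arg. */
    static Object load(Class<?> policyClass, Object arg) throws Exception {
        try {
            Constructor<?> c = policyClass.getDeclaredConstructor(Writer.class);
            return c.newInstance(arg); // IllegalArgumentException if arg is not a Writer
        } catch (Exception e) {
            // Fallback fails with NoSuchMethodException when there is no default constructor.
            return policyClass.getDeclaredConstructor().newInstance();
        }
    }

    /** Returns "loaded", or the simple name of the exception that escaped load(). */
    static String describe(Object arg) {
        try {
            load(WriterOnlyPolicy.class, arg);
            return "loaded";
        } catch (Exception e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new Writer())); // correct argument type
        System.out.println(describe(new Config())); // wrong 'this', as in the bug
    }
}
```

Passing the wrong object makes the first attempt fail on the argument type, and the exception that finally surfaces comes from the fallback, which hides the real cause.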
[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy
[ https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235413#comment-13235413 ] Tommaso Teofili commented on SOLR-2983: --- I just noticed also the toIndexWriter method should be explicitly tested, going to work on it and attach a new patch Unable to load custom MergePolicy - Key: SOLR-2983 URL: https://issues.apache.org/jira/browse/SOLR-2983 Project: Solr Issue Type: Bug Reporter: Mathias Herberts Assignee: Tommaso Teofili Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-2983.patch
[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy
[ https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236111#comment-13236111 ] Tommaso Teofili commented on SOLR-2983: --- committed on trunk at r1304098 Unable to load custom MergePolicy - Key: SOLR-2983 URL: https://issues.apache.org/jira/browse/SOLR-2983 Project: Solr Issue Type: Bug Reporter: Mathias Herberts Assignee: Tommaso Teofili Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-2983.patch, SOLR-2983_2.patch
[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy
[ https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233731#comment-13233731 ] Tommaso Teofili commented on SOLR-2983: --- Agreed, I'm going to do the needed updates to changes.txt and upgrade/backcompat. Unable to load custom MergePolicy - Key: SOLR-2983 URL: https://issues.apache.org/jira/browse/SOLR-2983 Project: Solr Issue Type: Bug Reporter: Mathias Herberts Assignee: Tommaso Teofili Priority: Minor Fix For: 3.6, 4.0 Attachments: SOLR-2983.patch
[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
[ https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230987#comment-13230987 ] Tommaso Teofili commented on LUCENE-3869: - Ok thanks Robert, let me know if you find a specific scenario where this happens more frequently so that I can try it out as well. possible hang in UIMATypeAwareAnalyzerTest -- Key: LUCENE-3869 URL: https://issues.apache.org/jira/browse/LUCENE-3869 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: Robert Muir Just testing an unrelated patch, I was hung (with 100% cpu) in UIMATypeAwareAnalyzerTest. I'll attach stacktrace at the moment of the hang. The fact we get a seed in the actual stacktraces for cases like this is awesome! Thanks Dawid! I don't think it reproduces 100%, but I'll try beasting this seed to see if i can reproduce the hang: should be 'ant test -Dtestcase=UIMATypeAwareAnalyzerTest -Dtests.seed=-262aada3325aa87a:-44863926cf5c87e9:5c8c471d901b98bd' from what I can see.
[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
[ https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229939#comment-13229939 ] Tommaso Teofili commented on LUCENE-3869: - I tried to reproduce that many times (same command/seed) but with no luck so far, which environment are you running Robert? possible hang in UIMATypeAwareAnalyzerTest -- Key: LUCENE-3869 URL: https://issues.apache.org/jira/browse/LUCENE-3869 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: Robert Muir
[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest
[ https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229241#comment-13229241 ] Tommaso Teofili commented on LUCENE-3869: - Thanks Robert, I'm taking a look possible hang in UIMATypeAwareAnalyzerTest -- Key: LUCENE-3869 URL: https://issues.apache.org/jira/browse/LUCENE-3869 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: Robert Muir
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228941#comment-13228941 ] Tommaso Teofili commented on SOLR-3013: --- yes, this is committed but it's not resolved yet as it needs to be adapted to 3.x as well. Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters.
[jira] [Commented] (SOLR-3204) solr-commons-csv must not use the org.apache.commons.csv package
[ https://issues.apache.org/jira/browse/SOLR-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224678#comment-13224678 ] Tommaso Teofili commented on SOLR-3204: --- bq. Well the carrot is similar to the uima case, in both cases we have their committers also as committers working on integrations within on our project, and they have voiced no problem with how things work so far, so why break it? also starting from 3.5.0 the UIMA dependencies' jars are released artifacts (see SOLR-2746 and thus http://mvnrepository.com/artifact/org.apache.solr/solr-uima/3.5.0 ) solr-commons-csv must not use the org.apache.commons.csv package Key: SOLR-3204 URL: https://issues.apache.org/jira/browse/SOLR-3204 Project: Solr Issue Type: Bug Components: Build Affects Versions: 3.5 Reporter: Emmanuel Bourg Priority: Blocker Fix For: 3.6 Attachments: SOLR-3204.patch, SOLR-3204.patch, SOLR-3204.patch, apache-solr-commons-csv-1.0-SNAPSHOT-r966014.jar, rule.txt, rule.txt, solr-csv.patch The solr-commons-csv artifact reused the code from the Apache Commons CSV project but the package wasn't changed to something else than org.apache.commons.csv in the process. This creates a compatibility issue as the Apache Commons team works toward an official release of Commons CSV. It prevents Commons CSV from using its own org.apache.commons.csv package, or forces the renaming of all the classes to avoid a classpath conflict.
[jira] [Commented] (SOLR-3204) solr-commons-csv must not use the org.apache.commons.csv package
[ https://issues.apache.org/jira/browse/SOLR-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224688#comment-13224688 ] Tommaso Teofili commented on SOLR-3204: --- bq. The issue we had caused by our separate release of unreleased package under the solr group-id was that maven is seeing our repackaged dependency under another artifact id - so it cannot prevent that a project adds solr-commons-xx, version-foo and commony-xx, version-bar, because it is two different things. yes, that's also my understanding of this issue with unreleased dependencies' jars. solr-commons-csv must not use the org.apache.commons.csv package Key: SOLR-3204 URL: https://issues.apache.org/jira/browse/SOLR-3204 Project: Solr Issue Type: Bug Components: Build Affects Versions: 3.5 Reporter: Emmanuel Bourg Priority: Blocker Fix For: 3.6 Attachments: SOLR-3204.patch, SOLR-3204.patch, SOLR-3204.patch, apache-solr-commons-csv-1.0-SNAPSHOT-r966014.jar, rule.txt, rule.txt, solr-csv.patch
[jira] [Commented] (SOLR-2693) Solr 4.0 - start and rows parameter addup together
[ https://issues.apache.org/jira/browse/SOLR-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224708#comment-13224708 ] Tommaso Teofili commented on SOLR-2693: --- bq. I can't reproduce this using the current trunk example. same here, Marcin could you please say something more about the configuration used where this is happening? Solr 4.0 - start and rows parameter addup together -- Key: SOLR-2693 URL: https://issues.apache.org/jira/browse/SOLR-2693 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0 Environment: centos 5.4, tomcat 7, Java 1.6.0_14 Reporter: Marcin Priority: Blocker Hi guys, I have a weird problem with rows and start parameters, start simply is not working but any number of it is being added to rows so i.e. when used like that start=10 rows=20 then 30 rows will be returned beginning from the 1st result. any ideas ? cheers
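To make the SOLR-2693 report concrete, here is a small sketch contrasting the usual paging semantics (skip start hits, then return at most rows hits) with the behaviour the reporter describes. PagingSketch and both method names are hypothetical, and an in-memory list stands in for a result set; nothing here is Solr code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the two behaviours; not Solr code. */
final class PagingSketch {

    /** Expected semantics: skip 'start' hits, then return at most 'rows' hits. */
    static <T> List<T> expectedPage(List<T> hits, int start, int rows) {
        int from = Math.min(Math.max(start, 0), hits.size());
        int to = (int) Math.min((long) from + Math.max(rows, 0), hits.size());
        return new ArrayList<>(hits.subList(from, to));
    }

    /** Reported behaviour: 'start' is added to 'rows' and nothing is skipped. */
    static <T> List<T> reportedPage(List<T> hits, int start, int rows) {
        long wanted = (long) Math.max(start, 0) + Math.max(rows, 0);
        int to = (int) Math.min(wanted, hits.size());
        return new ArrayList<>(hits.subList(0, to));
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            hits.add(i);
        }
        // start=10, rows=20: 20 hits beginning at the 11th versus 30 hits from the 1st
        System.out.println(expectedPage(hits, 10, 20).size());
        System.out.println(reportedPage(hits, 10, 20).size());
    }
}
```

Since the comment above says the problem does not reproduce on the trunk example, the second method only restates the report and is not a diagnosis.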
[jira] [Commented] (SOLR-2253) Solr should be able to keep on truckin' if a shard fails during a distributed search
[ https://issues.apache.org/jira/browse/SOLR-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224760#comment-13224760 ] Tommaso Teofili commented on SOLR-2253: --- I think we can mark this one as duplicate of SOLR-1143 Solr should be able to keep on truckin' if a shard fails during a distributed search Key: SOLR-2253 URL: https://issues.apache.org/jira/browse/SOLR-2253 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4.1 Environment: All Reporter: Rich Cariens Priority: Critical Attachments: SOLR-2253.patch Original Estimate: 1h Remaining Estimate: 1h Solr 1.4.x currently abandons searches if a shard fails during a distributed search. A trivial patch to the SearchHandler class would allow the user to tell Solr to keep on trucking in these cases. Solr can indicate that the search response is partial via existing response header conventions, as well as include details about which shard failed.
[jira] [Commented] (SOLR-3197) Allow firstSearcher and newSearcher listeners to run in multiple threads
[ https://issues.apache.org/jira/browse/SOLR-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221539#comment-13221539 ]

Tommaso Teofili commented on SOLR-3197:
---

An alternative would be to use the CachedThreadPool as default as it makes it possible to reuse cached threads (see http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool() )

Allow firstSearcher and newSearcher listeners to run in multiple threads
Key: SOLR-3197
URL: https://issues.apache.org/jira/browse/SOLR-3197
Project: Solr
Issue Type: Improvement
Reporter: Lance Norskog

SolrCore submits all listeners (firstSearcher and newSearcher) to a Java ExecutorService, but uses a single-threaded one.
SolrCore.java around line 965 in the trunk:
{code}
final ExecutorService searcherExecutor = Executors.newSingleThreadExecutor();
{code}
SolrCore.java around line 1280 in the trunk runs first the, and then the first and new searchers, all with the searcherExecutor object created at line 965.
Would it work if we changed this ExecutorService to a thread pool version? This seems like it should work:
{code}
java.util.concurrent.Executors.newFixedThreadPool(int nThreads, ThreadFactory threadFactory);
{code}
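The practical difference between the executors mentioned in SOLR-3197 can be observed with a small experiment. The sketch below is not SolrCore code: SearcherExecutorSketch and overlappingTasks are hypothetical names, and each submitted task is a stand-in for a listener that checks whether all the other tasks started while it was still running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical experiment; the tasks are stand-ins for searcher listeners. */
final class SearcherExecutorSketch {

    /**
     * Submits n tasks and counts how many of them saw all n tasks started
     * while they were still running (each task waits up to 300 ms).
     */
    static int overlappingTasks(ExecutorService executor, int n) {
        final CountDownLatch arrived = new CountDownLatch(n);
        List<Future<Boolean>> results = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                results.add(executor.submit(() -> {
                    arrived.countDown();
                    return arrived.await(300, TimeUnit.MILLISECONDS);
                }));
            }
            int overlapping = 0;
            for (Future<Boolean> result : results) {
                if (result.get()) {
                    overlapping++;
                }
            }
            return overlapping;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // Single thread: tasks run one after another, so only the last sees the latch at zero.
        System.out.println(overlappingTasks(Executors.newSingleThreadExecutor(), 2));
        // Cached pool: a thread per pending task, so both tasks overlap.
        System.out.println(overlappingTasks(Executors.newCachedThreadPool(), 2));
    }
}
```

A cached pool creates threads on demand and reuses idle ones, whereas newFixedThreadPool caps the number of threads. Whether the listeners are actually safe to run concurrently is a separate question that this sketch does not address.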
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219906#comment-13219906 ] Tommaso Teofili commented on SOLR-3013: --- thanks Steven, now fixing Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219962#comment-13219962 ] Tommaso Teofili commented on SOLR-3013: --- it should be ok now. Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218975#comment-13218975 ] Tommaso Teofili commented on LUCENE-3731: - I think we can mark this one as resolved, just I'd keep this only for trunk and backport the whole thing to 3.x once SOLR-3013 is resolved and committed to trunk too. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_rsrel.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (SOLR-3140) Make omitNorms default for all numeric field types
[ https://issues.apache.org/jira/browse/SOLR-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219612#comment-13219612 ] Tommaso Teofili commented on SOLR-3140: --- bq. Is there a better place to set this default than in init() in the new base class? I agree that's the method responsible for doing this kind of stuff bq. I don't think so? if you search on a multivalued string field like keywords or tags it's reasonable to want length normalization to be a factor to prevent keyword stuffing. good point Make omitNorms default for all numeric field types -- Key: SOLR-3140 URL: https://issues.apache.org/jira/browse/SOLR-3140 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Jan Høydahl Labels: omitNorms Fix For: 4.0 Attachments: SOLR-3140.patch Today norms are enabled for all Solr field types by default, while in Lucene norms are omitted for the numeric types. Propose to make the Solr defaults the same as in Lucene, so that if someone occasionally wants index-side boost for a numeric field type they must say omitNorms=false. This lets us simplify the example schema too.
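The idea of setting the default in init() of a new base class, as discussed for SOLR-3140, might look roughly like the sketch below. It is an assumption-heavy illustration: FieldTypeSketch and NumericFieldTypeSketch are hypothetical classes, the attribute map stands in for the schema attributes of a field type, and none of this mirrors Solr's real FieldType API.

```java
import java.util.Map;

/** Hypothetical stand-in for a Solr field type; not Solr's real FieldType API. */
class FieldTypeSketch {
    protected boolean omitNorms = false; // norms enabled by default, as today

    /** Applies an explicit omitNorms attribute from the schema, if one is present. */
    protected void init(Map<String, String> args) {
        String explicit = args.get("omitNorms");
        if (explicit != null) {
            omitNorms = Boolean.parseBoolean(explicit);
        }
    }

    /** Convenience helper for the illustration: initialise and report the flag. */
    static boolean omitNormsAfterInit(FieldTypeSketch type, Map<String, String> args) {
        type.init(args);
        return type.omitNorms;
    }
}

/** Hypothetical base class for numeric types that flips the default in init(). */
class NumericFieldTypeSketch extends FieldTypeSketch {
    @Override
    protected void init(Map<String, String> args) {
        omitNorms = true;  // proposed default for numeric types
        super.init(args);  // an explicit omitNorms=false in the schema still wins
    }
}
```

Setting the default before delegating to the parent keeps explicit schema settings authoritative, and non-numeric types such as multivalued string fields keep their norms unless configured otherwise.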
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219615#comment-13219615 ] Tommaso Teofili commented on SOLR-3013: --- Now that LUCENE-3731 has been resolved I'll proceed with adding the needed factories for the Tokenizers in Solr. Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219624#comment-13219624 ] Tommaso Teofili commented on SOLR-3013: --- Solr factories committed in r1295330 Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch
[jira] [Commented] (SOLR-3140) Make omitNorms default for all numeric field types
[ https://issues.apache.org/jira/browse/SOLR-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219625#comment-13219625 ] Tommaso Teofili commented on SOLR-3140: --- maybe something like PrimitiveFieldType (that should recall Java primitive types) Make omitNorms default for all numeric field types -- Key: SOLR-3140 URL: https://issues.apache.org/jira/browse/SOLR-3140 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Jan Høydahl Labels: omitNorms Fix For: 4.0 Attachments: SOLR-3140.patch Today norms are enabled for all Solr field types by default, while in Lucene norms are omitted for the numeric types. Propose to make the Solr defaults the same as in Lucene, so that if someone occasionally wants index-side boost for a numeric field type they must say omitNorms=false. This lets us simplify the example schema too.
[jira] [Commented] (SOLR-3140) Make omitNorms default for all numeric field types
[ https://issues.apache.org/jira/browse/SOLR-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218600#comment-13218600 ] Tommaso Teofili commented on SOLR-3140: --- yes big +1 Make omitNorms default for all numeric field types -- Key: SOLR-3140 URL: https://issues.apache.org/jira/browse/SOLR-3140 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Jan Høydahl Labels: omitNorms Fix For: 4.0 Today norms are enabled for all Solr field types by default, while in Lucene norms are omitted for the numeric types. Propose to make the Solr defaults the same as in Lucene, so that if someone occasionally wants index-side boost for a numeric field type they must say omitNorms=false. This lets us simplify the example schema too.
[jira] [Commented] (SOLR-3174) Visualize Cluster State
[ https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218602#comment-13218602 ] Tommaso Teofili commented on SOLR-3174: --- yes this'd be a very nice improvement Visualize Cluster State --- Key: SOLR-3174 URL: https://issues.apache.org/jira/browse/SOLR-3174 Project: Solr Issue Type: New Feature Reporter: Ryan McKinley It would be great to visualize the cluster state in the new UI. See Mark's wish: https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216459#comment-13216459 ] Tommaso Teofili commented on LUCENE-3731: - The two methods analyzeText() and analyzeInput() are confusing, so the first one should just be renamed to initializeIterator(), as its main purpose is to prepare the FSIterator which holds the annotations that will be used inside the incrementToken() method. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_rsrel.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
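The naming point in the comment above can be sketched without any Lucene or UIMA dependency. The class below is a hypothetical stand-in, not the code from the patch: a plain java.util.Iterator plays the role of the FSIterator, and whitespace splitting stands in for the annotations produced by the analysis engine. It only shows why initializeIterator() describes the step better than analyzeText(): the method prepares the iterator once, and incrementToken() then consumes it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;

/** Hypothetical stand-in for a UIMA based tokenizer; not the real class. */
final class IteratorTokenizerSketch {
  private final String input;
  private Iterator<String> iterator; // plays the role of the FSIterator
  private String term;               // last token handed out

  IteratorTokenizerSketch(String input) {
    this.input = input;
  }

  /** Prepares the iterator over the "annotations" (here: whitespace separated words). */
  private void initializeIterator() {
    String trimmed = input.trim();
    iterator = trimmed.isEmpty()
        ? Collections.<String>emptyIterator()
        : Arrays.asList(trimmed.split("\\s+")).iterator();
  }

  /** Advances to the next token, initializing the iterator lazily on first use. */
  boolean incrementToken() {
    if (iterator == null) {
      initializeIterator();
    }
    if (!iterator.hasNext()) {
      return false;
    }
    term = iterator.next();
    return true;
  }

  String term() {
    return term;
  }
}
```

A caller would loop with `while (t.incrementToken()) { ... t.term() ... }`, which mirrors how a token stream is normally consumed.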
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214093#comment-13214093 ] Tommaso Teofili commented on LUCENE-3731: - After some more testing I think the CasPool is good just for scenarios where the pool serves different CASes to different clients (the tokenizers), so it is not really helpful in the current implementation; however it may be useful if we abstract the operation of obtaining and releasing a CAS outside the BaseTokenizer. In the meantime I noticed the AEProviderFactory getAEProvider() methods have a keyPrefix parameter that came from the Solr implementation and was intended to hold the core name, so at the moment I think it'd be better to have (also) methods which don't need that parameter for the Lucene uses. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_rsrel.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
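One way to read the keyPrefix suggestion is as a pair of overloads, where the variant without keyPrefix simply delegates with no prefix. The sketch below is an assumption made for illustration only: SketchAEProvider and SketchAEProviderFactory are hypothetical names, the cache key layout is invented, and the real AEProviderFactory signatures are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical provider: just records what it was created with. */
final class SketchAEProvider {
  final String aePath;
  final Map<String, Object> runtimeParameters;

  SketchAEProvider(String aePath, Map<String, Object> runtimeParameters) {
    this.aePath = aePath;
    this.runtimeParameters = runtimeParameters;
  }
}

/** Hypothetical factory that caches one provider per (keyPrefix, aePath) pair. */
final class SketchAEProviderFactory {
  private final Map<String, SketchAEProvider> cache = new HashMap<>();

  /** Solr style lookup: keyPrefix would hold the core name. */
  synchronized SketchAEProvider getAEProvider(
      String keyPrefix, String aePath, Map<String, Object> runtimeParameters) {
    // Assumed key layout: an absent prefix is treated as the empty string.
    String key = (keyPrefix == null ? "" : keyPrefix) + "|" + aePath;
    return cache.computeIfAbsent(key, k -> new SketchAEProvider(aePath, runtimeParameters));
  }

  /** Plain Lucene style lookup: no core name is needed, so no prefix is passed. */
  SketchAEProvider getAEProvider(String aePath, Map<String, Object> runtimeParameters) {
    return getAEProvider(null, aePath, runtimeParameters);
  }
}
```

With this shape, Lucene callers never see the Solr specific argument, while Solr callers keep separate providers per core.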
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209247#comment-13209247 ] Tommaso Teofili commented on LUCENE-3731: - Right, everything seems ok now. I also tried to comment out the {noformat} <property name="tests.threadspercpu" value="0"/> {noformat} line in build.xml in order to execute tests in parallel. Multiple parallel test executions, also with -Dtests.multiplier=100, passed flawlessly with Java6; will see if that is the case for Java7 too. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209301#comment-13209301 ] Tommaso Teofili commented on LUCENE-3731: - Some improvement in performance came from releasing the CAS and AE on the close() call:
{noformat}
@Override
public void close() throws IOException {
  super.close();
  // release UIMA resources
  cas.release();
  ae.destroy();
}
{noformat}
Now investigating the use of CASPool for improving throughput in high-usage scenarios. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209490#comment-13209490 ] Tommaso Teofili commented on LUCENE-3731: - bq. But the question is: is it safe to use CAS/AE after you call release()/destroy() on them? No, it isn't, so you're right: those calls should not be inside the close() method. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_rsrel.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
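The problem acknowledged in this reply can be reproduced with a small stand-alone sketch. It does not use the real Lucene or UIMA classes: FakeCas and ReusableTokenizerSketch are hypothetical stand-ins, and the assumption is that a tokenizer instance is closed after one document and then reset for the next one, so a resource released in close() is gone when the instance is reused. The same reasoning would apply to destroying the analysis engine.

```java
/** Hypothetical stand-in for a UIMA CAS: unusable once released. */
final class FakeCas {
  private boolean released;

  void setDocumentText(String text) {
    if (released) {
      throw new IllegalStateException("CAS already released");
    }
  }

  void release() {
    released = true;
  }
}

/** Hypothetical tokenizer that is reused across documents. */
final class ReusableTokenizerSketch {
  private final FakeCas cas = new FakeCas();
  private final boolean releaseOnClose;

  ReusableTokenizerSketch(boolean releaseOnClose) {
    this.releaseOnClose = releaseOnClose;
  }

  /** Called before each document is tokenized. */
  void reset(String text) {
    cas.setDocumentText(text);
  }

  /** Called after each document; releasing here breaks later reuse. */
  void close() {
    if (releaseOnClose) {
      cas.release();
    }
  }

  /** Returns true if a second document can still be processed after close(). */
  static boolean canReuseAfterClose(boolean releaseOnClose) {
    ReusableTokenizerSketch tokenizer = new ReusableTokenizerSketch(releaseOnClose);
    tokenizer.reset("first document");
    tokenizer.close();
    try {
      tokenizer.reset("second document");
      return true;
    } catch (IllegalStateException e) {
      return false;
    }
  }
}
```

In this sketch the variant that releases in close() fails on the second document, while the variant that keeps the CAS alive can be reused; where the final release should happen instead is left open here, as it is in the thread.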
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208393#comment-13208393 ] Tommaso Teofili commented on LUCENE-3731: - Ok, I noticed this was due to an issue on the UIMA side. I think the best option (as those are used just for testing) is to use a dummy implementation of both UIMA based whitespace tokenizer and PoS tagger thus also avoiding the log lines when executing tests using Maven. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208460#comment-13208460 ] Tommaso Teofili commented on LUCENE-3731: - fix for the issues reported by Steven committed in r1244474 Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208753#comment-13208753 ] Tommaso Teofili commented on LUCENE-3731: - Thanks Robert for taking care of this, nice improvement :) I agree on the OverridingParams extending the base one, it was also my intent to do that. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208766#comment-13208766 ] Tommaso Teofili commented on LUCENE-3731: - bq. OK, if there is no objection I will commit this one. +1, I'll post later my progress on the other possible performance improvements I'm testing. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208076#comment-13208076 ] Tommaso Teofili commented on LUCENE-3731: - committed on trunk in r1244236 Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208145#comment-13208145 ] Tommaso Teofili commented on LUCENE-3731: - Thank you very much Steven for reporting. The
{noformat}
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer initialize
INFO: Whitespace tokenizer successfully initialized
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer typeSystemInit
INFO: Whitespace tokenizer typesystem initialized
{noformat}
messages are due to the UIMA WhitespaceTokenizer Annotator, which logs the initialization/processing/etc. calls. They are printed many times because the testRandomStrings test method runs lots of tricky tests on the UIMATokenizer, which require the above calls to be executed repeatedly. I'll take a look at the other failures, which didn't show up in the tests I had done till now. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch, LUCENE-3731_4.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206997#comment-13206997 ] Tommaso Teofili commented on LUCENE-3731: - bq. Hi Tommaso, I think it would be cleaner to set the final offset in end() instead? ok, +1. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, LUCENE-3731_3.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (SOLR-3049) UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported
[ https://issues.apache.org/jira/browse/SOLR-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200362#comment-13200362 ] Tommaso Teofili commented on SOLR-3049: --- Hi Harsh, I think there should be a more general way of mapping typed parameters, just need to dig a little deeper to find it. However in the meantime I'll try and test your patch, thanks! UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported - Key: SOLR-3049 URL: https://issues.apache.org/jira/browse/SOLR-3049 Project: Solr Issue Type: Bug Components: update Reporter: Harsh P Priority: Minor Labels: uima, update_request_handler Attachments: SOLR-3049.patch solrconfig.xml file has an option to override certain UIMA runtime parameters in the UpdateRequestProcessorChain section. There are certain UIMA annotators like RegexAnnotator which define runtimeParameters value as an Array which is not currently supported in the Solr-UIMA interface. In java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java, private Object getRuntimeValue(AnalysisEngineDescription desc, String attributeName) function defines override for UIMA analysis engine runtimeParameters as they are passed to UIMA Analysis Engine. runtimeParameters which are currently supported in the Solr-UIMA interface are: String Integer Boolean Float I have made a hack to fix this issue to add Array support. I would like to submit that as a patch if no one else is working on fixing this issue.
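As a rough illustration of what a more general mapping of typed parameters could look like, the sketch below converts a raw configuration string into one of the four types listed in the issue, and applies the same conversion element by element for multi-valued parameters. Everything here is hypothetical: RuntimeValueSketch is not the getRuntimeValue method from OverridingParamsAEProvider, it is not the attached patch, and the comma separated notation for arrays is an assumption made only for this example.

```java
/** Hypothetical converter for overridden runtime parameter values. */
final class RuntimeValueSketch {

  private RuntimeValueSketch() {}

  /**
   * Converts a raw string to the declared type. For multi-valued parameters the raw
   * value is assumed to be comma separated and each element is converted on its own.
   * An Object[] is returned for simplicity; a real engine may expect typed arrays.
   */
  static Object convert(String type, boolean multiValued, String raw) {
    if (!multiValued) {
      return convertOne(type, raw.trim());
    }
    String[] parts = raw.split(",");
    Object[] values = new Object[parts.length];
    for (int i = 0; i < parts.length; i++) {
      values[i] = convertOne(type, parts[i].trim());
    }
    return values;
  }

  /** Single-value conversion for the four types named in the issue. */
  private static Object convertOne(String type, String raw) {
    switch (type) {
      case "String":
        return raw;
      case "Integer":
        return Integer.valueOf(raw);
      case "Boolean":
        return Boolean.valueOf(raw);
      case "Float":
        return Float.valueOf(raw);
      default:
        throw new IllegalArgumentException("Unsupported parameter type: " + type);
    }
  }
}
```

Keeping the per-element conversion in one place means array support does not need a second copy of the type switch, which is the kind of generality the comment seems to be asking for.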
[jira] [Commented] (LUCENE-3744) Add support for type whitelist in TypeTokenFilter
[ https://issues.apache.org/jira/browse/LUCENE-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199603#comment-13199603 ] Tommaso Teofili commented on LUCENE-3744: - applied on trunk in r1240034, applied on branch-3.x in r1240035 Add support for type whitelist in TypeTokenFilter - Key: LUCENE-3744 URL: https://issues.apache.org/jira/browse/LUCENE-3744 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Santiago M. Mola Assignee: Tommaso Teofili Priority: Trivial Attachments: LUCENE-3744_2.patch, TypeTokenFilter-whitelist.patch, TypeTokenFilter_whitelst_lucene_and_solr.patch A usual use case for TypeTokenFilter is allowing only a set of token types. That is, listing allowed types, instead of filtered ones. I'm attaching a patch to add a useWhitelist option for that.
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199771#comment-13199771 ] Tommaso Teofili commented on LUCENE-3731: - right Uwe, thanks so much for the quick review :) Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (SOLR-3093) Remove unused features boolTofilterOptimizer and HashDocSet
[ https://issues.apache.org/jira/browse/SOLR-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199784#comment-13199784 ] Tommaso Teofili commented on SOLR-3093: --- bq. There is some code which tries to use it but I believe that since 1.4 there are more efficient ways to do the same. Should we also fail-fast if found in config or only print a warning? IMHO we should print a warning for 3.x and fail fast from 4 on. Remove unused features boolTofilterOptimizer and HashDocSet --- Key: SOLR-3093 URL: https://issues.apache.org/jira/browse/SOLR-3093 Project: Solr Issue Type: Improvement Reporter: Jan Høydahl Fix For: 3.6, 4.0 SolrConfig.java still tries to parse boolTofilterOptimizer But the only user of this param was SolrIndexSearcher.java line 366-381 which is commented out. Probably the whole logic should be ripped out, and we fail hard if we find this config option in solrconfig.xml Also, the HashDocSet config option is old and no longer used or needed? There is some code which tries to use it but I believe that since 1.4 there are more efficient ways to do the same. Should we also fail-fast if found in config or only print a warning?
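The policy proposed in this comment (warn on 3.x, fail fast from 4 on) can be written down as a small decision function. The sketch below is only an illustration under stated assumptions: DeprecatedOptionCheck is a hypothetical helper, not code from SolrConfig.java, and passing the major version as an int is a simplification chosen for the example.

```java
/** Hypothetical check for configuration options that are no longer used. */
final class DeprecatedOptionCheck {

  private DeprecatedOptionCheck() {}

  /**
   * Returns a warning message when the removed option is present on a 3.x release,
   * returns null when the option is absent, and throws when it is present on 4 or later.
   */
  static String check(int majorVersion, String optionName, boolean optionPresent) {
    if (!optionPresent) {
      return null; // nothing to report
    }
    String message = "Config option '" + optionName + "' is no longer used";
    if (majorVersion >= 4) {
      // fail fast: refuse to start with a configuration that has no effect
      throw new IllegalStateException(message + "; please remove it from solrconfig.xml");
    }
    // 3.x: keep starting up, but tell the user
    return message + " and will be rejected from 4.0 on";
  }
}
```

A caller on 3.x would pass the returned message to its logger, while on 4.x the exception would abort configuration loading.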
[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199893#comment-13199893 ] Tommaso Teofili commented on LUCENE-3731: - Hey Robert, that's super, thanks! I'm going to collect your suggestions in a new patch shortly. Create a analysis/uima module for UIMA based tokenizers/analyzers - Key: LUCENE-3731 URL: https://issues.apache.org/jira/browse/LUCENE-3731 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Tommaso Teofili Assignee: Tommaso Teofili Fix For: 3.6, 4.0 Attachments: LUCENE-3731.patch As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored out in a separate module (modules/analysis/uima) as they can be used in plain Lucene. Then the solr/contrib/uima will contain only the related factories.
[jira] [Commented] (LUCENE-3744) Add support for type whitelist in TypeTokenFilter
[ https://issues.apache.org/jira/browse/LUCENE-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13197834#comment-13197834 ] Tommaso Teofili commented on LUCENE-3744: - Hello Santiago, would you mind also providing unit tests for the whitelist usage? Add support for type whitelist in TypeTokenFilter - Key: LUCENE-3744 URL: https://issues.apache.org/jira/browse/LUCENE-3744 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Reporter: Santiago M. Mola Priority: Trivial Attachments: TypeTokenFilter-whitelist.patch A usual use case for TypeTokenFilter is allowing only a set of token types. That is, listing allowed types, instead of filtered ones. I'm attaching a patch to add a useWhitelist option for that.
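The whitelist semantics described in the issue, and the kind of behaviour a unit test would need to pin down, can be sketched without Lucene. The code below is a hypothetical model, not the attached patch: TypeFilterSketch and its Token class are invented for the example, and only the useWhitelist name is taken from the issue text. A token is kept when its type is in the set and the whitelist is on, or when its type is not in the set and the whitelist is off.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of filtering tokens by type, with an optional whitelist mode. */
final class TypeFilterSketch {

  /** Minimal token: a term plus its type label. */
  static final class Token {
    final String term;
    final String type;

    Token(String term, String type) {
      this.term = term;
      this.type = type;
    }
  }

  private TypeFilterSketch() {}

  /**
   * With useWhitelist == false the given types are removed (blacklist behaviour);
   * with useWhitelist == true only the given types are kept.
   */
  static List<String> filter(List<Token> tokens, Set<String> types, boolean useWhitelist) {
    List<String> kept = new ArrayList<>();
    for (Token token : tokens) {
      // membership and mode must agree for the token to pass
      if (useWhitelist == types.contains(token.type)) {
        kept.add(token.term);
      }
    }
    return kept;
  }
}
```

A test for the whitelist usage would then assert both modes against the same input, so that a regression in either direction is caught.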
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196999#comment-13196999 ] Tommaso Teofili commented on SOLR-3013: --- Considering the needed refactoring to put the tokenizers/analyzers in a dedicated Lucene analysis module I think the 'ae' package for creating AnalysisEngines should be moved to that module as well, so that there is a common mechanism for instantiating AnalysisEngines both in Lucene and Solr. Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters.
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195986#comment-13195986 ] Tommaso Teofili commented on SOLR-3013: --- Chris, Robert, thanks for your comments; I'll integrate your suggestions in a new patch. I agree with the module proposal, as it was part of a follow-up issue/discussion I was going to raise. Maybe I can create a new issue in Lucene for a new module under modules/analysis/uima containing just the Lucene UIMA tokenizers, and then create a new patch for this issue which contains only the factories. Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters.
[jira] [Commented] (SOLR-3049) UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported
[ https://issues.apache.org/jira/browse/SOLR-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195719#comment-13195719 ] Tommaso Teofili commented on SOLR-3049: --- Good catch. If you could provide that patch, I will take care of reviewing and committing it, if that is OK. UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported - Key: SOLR-3049 URL: https://issues.apache.org/jira/browse/SOLR-3049 Project: Solr Issue Type: Bug Components: update Reporter: Harsh P Priority: Minor Labels: uima, update_request_handler The solrconfig.xml file has an option to override certain UIMA runtime parameters in the UpdateRequestProcessorChain section. Certain UIMA annotators, such as RegexAnnotator, define a runtimeParameters value as an Array, which is not currently supported in the Solr-UIMA interface. In java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java, the function private Object getRuntimeValue(AnalysisEngineDescription desc, String attributeName) defines the override for UIMA analysis engine runtimeParameters as they are passed to the UIMA Analysis Engine. The runtimeParameters types currently supported in the Solr-UIMA interface are: String, Integer, Boolean, Float. I have made a hack to fix this issue by adding Array support. I would like to submit that as a patch if no one else is working on fixing this issue.
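To make the gap described in SOLR-3049 concrete, here is a minimal sketch of how an override value might be converted for the four scalar types listed above, plus an array case. It is not the code in OverridingParamsAEProvider: the class RuntimeParamSketch, its methods, and the multiValued flag are hypothetical names chosen for illustration, and the real patch may handle arrays differently.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in, not the OverridingParamsAEProvider code from Solr.
final class RuntimeParamSketch {

    // Maps a declared parameter type name to the Java class used for its values.
    static Class<?> classFor(String type) {
        if ("String".equals(type)) return String.class;
        if ("Integer".equals(type)) return Integer.class;
        if ("Boolean".equals(type)) return Boolean.class;
        if ("Float".equals(type)) return Float.class;
        throw new IllegalArgumentException("unsupported type: " + type);
    }

    // Converts one raw override value to the declared scalar type.
    static Object convertScalar(String type, Object raw) {
        String s = String.valueOf(raw);
        Class<?> c = classFor(type);
        if (c == Integer.class) return Integer.valueOf(s);
        if (c == Boolean.class) return Boolean.valueOf(s);
        if (c == Float.class) return Float.valueOf(s);
        return s;
    }

    // Converts a raw override; multi-valued parameters become a typed array
    // (assumption: the caller knows whether the parameter is multi-valued).
    static Object convert(String type, boolean multiValued, Object raw) {
        if (!multiValued) return convertScalar(type, raw);
        List<?> items = (raw instanceof List) ? (List<?>) raw : Collections.singletonList(raw);
        Object array = Array.newInstance(classFor(type), items.size());
        for (int i = 0; i < items.size(); i++) {
            Array.set(array, i, convertScalar(type, items.get(i)));
        }
        return array;
    }

    public static void main(String[] args) {
        Object single = convert("Integer", false, "42");
        Object many = convert("String", true, Arrays.asList("a+", "b*"));
        System.out.println(single);
        System.out.println(Arrays.toString((String[]) many));
    }
}
```

Building a typed array (String[], Integer[], and so on) rather than an Object[] is a design choice in this sketch, made so that the receiving side can cast the value to the declared element type.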
[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195861#comment-13195861 ] Tommaso Teofili commented on SOLR-3013: --- If no one objects I'll commit this shortly. Add UIMA based tokenizers / filters that can be used in the schema.xml -- Key: SOLR-3013 URL: https://issues.apache.org/jira/browse/SOLR-3013 Project: Solr Issue Type: Improvement Components: update Affects Versions: 3.5 Reporter: Tommaso Teofili Assignee: Tommaso Teofili Priority: Minor Labels: uima, update_request_handler Fix For: 3.6, 4.0 Attachments: SOLR-3013.patch Add UIMA based tokenizers / filters that can be declared and used directly inside the schema.xml. Thus instead of using the UIMA UpdateRequestProcessor one could directly define per-field NLP capable tokenizers / filters.
[jira] [Commented] (SOLR-3054) Add a TypeTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190643#comment-13190643 ] Tommaso Teofili commented on SOLR-3054: --- bq. I looked up the other TokenFilters that filter tokens, unfortunately all of them default to enablePosIncr=false. I am not sure what the right solution is here? Consistency or correctness? In the first patch I went for consistency, but your comment made me realize that enablePosIncr should be true by default; as a user, I'd expect it to be true by default. bq. I would only remove the try-catch blocks in the test methods and let the test method declare the exception. It then gets reported by JUnit with a failure automatically. OK. bq. The question is, the wordset is initialized to be empty if missing. Does it make sense? I would maybe make the types file mandatory, as without it the filter makes no sense. Right, I need to fix that. Add a TypeTokenFilterFactory Key: SOLR-3054 URL: https://issues.apache.org/jira/browse/SOLR-3054 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Tommaso Teofili Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: SOLR-3054.patch, SOLR-3054_2.patch Create a TypeTokenFilterFactory to make the TypeTokenFilter (filtering tokens depending on token types, see LUCENE-3671) available in Solr too.
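The consistency-versus-correctness question on enablePosIncr is easier to judge with the positions written out. The following is a sketch only, using plain Java stand-ins rather than Lucene classes: PosIncrSketch and its positions method are hypothetical, and every input token is assumed to carry a position increment of 1. It shows where the surviving tokens end up when removed tokens do or do not leave a gap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in that models position bookkeeping in a token-removing filter.
final class PosIncrSketch {

    // Returns the absolute positions of the tokens that survive filtering.
    // Assumption: each input token has a position increment of 1.
    static List<Integer> positions(List<String> tokenTypes, Set<String> stopTypes,
                                   boolean enablePosIncr) {
        List<Integer> kept = new ArrayList<>();
        int position = -1;   // so that the first kept token lands on position 0
        int skipped = 0;     // removed tokens since the last kept token
        for (String type : tokenTypes) {
            if (stopTypes.contains(type)) {
                skipped++;
                continue;
            }
            // With position increments enabled, removed tokens leave a gap.
            int increment = 1 + (enablePosIncr ? skipped : 0);
            position += increment;
            kept.add(position);
            skipped = 0;
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("word", "num", "num", "word");
        Set<String> stop = Collections.singleton("num");
        System.out.println(positions(types, stop, true));   // [0, 3]
        System.out.println(positions(types, stop, false));  // [0, 1]
    }
}
```

With the flag off, the two surviving tokens look adjacent, so a phrase query could match text in which they were originally separated by the removed tokens.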
[jira] [Commented] (SOLR-3054) Add a TypeTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190749#comment-13190749 ] Tommaso Teofili commented on SOLR-3054: --- Thanks Uwe Add a TypeTokenFilterFactory Key: SOLR-3054 URL: https://issues.apache.org/jira/browse/SOLR-3054 Project: Solr Issue Type: New Feature Components: Schema and Analysis Reporter: Tommaso Teofili Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: SOLR-3054.patch, SOLR-3054_2.patch, SOLR-3054_3.patch Create a TypeTokenFilterFactory to make the TypeTokenFilter (filtering tokens depending on token types, see LUCENE-3671) available in Solr too.
[jira] [Commented] (LUCENE-3671) Add a TypeTokenFilter
[ https://issues.apache.org/jira/browse/LUCENE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190425#comment-13190425 ] Tommaso Teofili commented on LUCENE-3671: - Thanks Uwe for taking care of it :) Add a TypeTokenFilter - Key: LUCENE-3671 URL: https://issues.apache.org/jira/browse/LUCENE-3671 Project: Lucene - Java Issue Type: New Feature Components: core/queryparser Reporter: Santiago M. Mola Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3671.patch, LUCENE-3671_2.patch, LUCENE-3671_3.patch It would be convenient to have a TypeTokenFilter that filters tokens by their type, either with an exclude or include list. This might be a stupid thing to provide for people who use Lucene directly, but it would be very useful to later expose it to Solr and other Lucene-backed search solutions.
[jira] [Commented] (LUCENE-3671) Add a TypeTokenFilter
[ https://issues.apache.org/jira/browse/LUCENE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190544#comment-13190544 ] Tommaso Teofili commented on LUCENE-3671: - Sure Uwe, I'll open a new one for the related Solr factory. Add a TypeTokenFilter - Key: LUCENE-3671 URL: https://issues.apache.org/jira/browse/LUCENE-3671 Project: Lucene - Java Issue Type: New Feature Components: core/queryparser Reporter: Santiago M. Mola Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3671.patch, LUCENE-3671_2.patch, LUCENE-3671_3.patch It would be convenient to have a TypeTokenFilter that filters tokens by their type, either with an exclude or include list. This might be a stupid thing to provide for people who use Lucene directly, but it would be very useful to later expose it to Solr and other Lucene-backed search solutions.
[jira] [Commented] (LUCENE-3671) Add a TypeTokenFilter
[ https://issues.apache.org/jira/browse/LUCENE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189206#comment-13189206 ] Tommaso Teofili commented on LUCENE-3671: - A very basic TypeTokenFilter can be implemented by extending FilteringTokenFilter, with the accept() method checking whether typeAttribute.type() matches an entry in a set of stop types. Add a TypeTokenFilter - Key: LUCENE-3671 URL: https://issues.apache.org/jira/browse/LUCENE-3671 Project: Lucene - Java Issue Type: New Feature Components: core/queryparser Reporter: Santiago M. Mola Attachments: LUCENE-3671.patch It would be convenient to have a TypeTokenFilter that filters tokens by their type, either with an exclude or include list. This might be a stupid thing to provide for people who use Lucene directly, but it would be very useful to later expose it to Solr and other Lucene-backed search solutions.
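The shape of that suggestion can be sketched without Lucene on the classpath. In the sketch below, Token, FilteringSketch and TypeFilter are hypothetical stand-ins for a token with a type attribute, a filtering base class, and the proposed filter. They are not the Lucene API, and the useWhitelist flag is borrowed from the include-list idea in the issue description rather than from any committed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins; none of these classes are Lucene classes.
final class TypeFilterSketch {

    // A token reduced to its text and its type attribute.
    static final class Token {
        final String text;
        final String type;
        Token(String text, String type) { this.text = text; this.type = type; }
    }

    // Base class that owns the loop; subclasses only decide accept().
    static abstract class FilteringSketch {
        abstract boolean accept(Token token);

        List<String> filter(List<Token> tokens) {
            List<String> kept = new ArrayList<>();
            for (Token t : tokens) {
                if (accept(t)) kept.add(t.text);
            }
            return kept;
        }
    }

    // Keeps or drops tokens depending on whether their type is in the set.
    static final class TypeFilter extends FilteringSketch {
        private final Set<String> types;
        private final boolean useWhitelist;

        TypeFilter(Set<String> types, boolean useWhitelist) {
            this.types = types;
            this.useWhitelist = useWhitelist;
        }

        @Override
        boolean accept(Token token) {
            // Exclude mode: accept types NOT in the set. Include mode: accept types in it.
            return useWhitelist == types.contains(token.type);
        }
    }

    // Runs the filter over a fixed sample with "num" as the only listed type.
    static List<String> demo(boolean useWhitelist) {
        List<Token> tokens = Arrays.asList(
            new Token("solr", "word"), new Token("35", "num"), new Token("rocks", "word"));
        Set<String> types = new HashSet<>(Arrays.asList("num"));
        return new TypeFilter(types, useWhitelist).filter(tokens);
    }

    public static void main(String[] args) {
        System.out.println(demo(false)); // [solr, rocks]
        System.out.println(demo(true));  // [35]
    }
}
```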
[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy
[ https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187733#comment-13187733 ] Tommaso Teofili commented on SOLR-2983: --- It looks like a cyclic dependency problem: SolrIndexWriter uses the SolrIndexConfig.toIndexWriterConfig method to create an IndexWriterConfig, which is passed to the basic Lucene IndexWriter constructor, while at the same time SolrIndexConfig.toIndexWriterConfig may need an IndexWriter to instantiate the MergePolicy (the try clause). Unable to load custom MergePolicy - Key: SOLR-2983 URL: https://issues.apache.org/jira/browse/SOLR-2983 Project: Solr Issue Type: Bug Reporter: Mathias Herberts Priority: Critical Fix For: 3.6, 4.0 As part of a recent upgrade to Solr 3.5.0 we encountered an error related to our use of LinkedIn's ZoieMergePolicy. It seems the code that loads a custom MergePolicy was at some point moved into SolrIndexConfig.java from SolrIndexWriter.java, but as this code was copied verbatim it now contains a bug: try { policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName, null, new Class[]{IndexWriter.class}, new Object[]{this}); } catch (Exception e) { policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName); } 'this' is no longer an IndexWriter but a SolrIndexConfig, therefore the call to newInstance will always throw an exception and the catch clause will be executed. If the custom MergePolicy does not have a default constructor (which is the case of ZoieMergePolicy), the second attempt to create the MergePolicy will also fail and Solr won't start.
[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy
[ https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187762#comment-13187762 ] Tommaso Teofili commented on SOLR-2983: --- That probably depends on migrations from old APIs. Apart from that, I agree that the setter (and SetOnce) facilities are the best way to inject an IW into the MergePolicy. The above try/catch clause therefore has little meaning, and IMHO it may be better to just keep the policy instantiation like this: MergePolicy policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName); Unable to load custom MergePolicy - Key: SOLR-2983 URL: https://issues.apache.org/jira/browse/SOLR-2983 Project: Solr Issue Type: Bug Reporter: Mathias Herberts Priority: Critical Fix For: 3.6, 4.0 As part of a recent upgrade to Solr 3.5.0 we encountered an error related to our use of LinkedIn's ZoieMergePolicy. It seems the code that loads a custom MergePolicy was at some point moved into SolrIndexConfig.java from SolrIndexWriter.java, but as this code was copied verbatim it now contains a bug: try { policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName, null, new Class[]{IndexWriter.class}, new Object[]{this}); } catch (Exception e) { policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName); } 'this' is no longer an IndexWriter but a SolrIndexConfig, therefore the call to newInstance will always throw an exception and the catch clause will be executed. If the custom MergePolicy does not have a default constructor (which is the case of ZoieMergePolicy), the second attempt to create the MergePolicy will also fail and Solr won't start.
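The failure mode reported in SOLR-2983 can be reproduced in isolation with plain reflection. This is a sketch under stated assumptions: Writer, Config, WriterOnlyPolicy and DefaultCtorPolicy are hypothetical stand-ins for IndexWriter, SolrIndexConfig and two kinds of merge policy, and the load method only mimics the try/catch from the issue. It does not use Solr's resource loader.

```java
// Hypothetical stand-ins; this mimics the try/catch from the issue with plain reflection.
final class MergePolicyLoadSketch {

    public static class Writer {}   // plays the role of IndexWriter
    public static class Config {}   // plays the role of SolrIndexConfig

    // A policy that can only be built from a Writer (no default constructor).
    public static class WriterOnlyPolicy {
        public WriterOnlyPolicy(Writer writer) {}
    }

    // A policy with a default constructor only.
    public static class DefaultCtorPolicy {
        public DefaultCtorPolicy() {}
    }

    // First tries the constructor taking a Writer, passing 'self' as the argument,
    // then falls back to the default constructor, as in the reported code.
    static String load(Class<?> policyClass, Object self) {
        try {
            policyClass.getConstructor(Writer.class).newInstance(self);
            return "writer-constructor";
        } catch (Exception first) {
            try {
                policyClass.getConstructor().newInstance();
                return "default-constructor";
            } catch (Exception second) {
                return "failed";
            }
        }
    }

    public static void main(String[] args) {
        // 'self' is a Config, not a Writer: the first attempt cannot succeed.
        System.out.println(load(DefaultCtorPolicy.class, new Config())); // default-constructor
        System.out.println(load(WriterOnlyPolicy.class, new Config()));  // failed
        System.out.println(load(WriterOnlyPolicy.class, new Writer()));  // writer-constructor
    }
}
```

The middle case is the one the reporter hit: the first attempt fails because the argument has the wrong type, and the fallback fails because the policy has no default constructor.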