Re: Eclipse project files...
Paolo Castagna wrote: Hi, could you be more precise about 'import project from source tree'? I think he suggested File > New > Java Project > Create project from existing source? Koji -- http://www.rondhuit.com/en/
[jira] Closed: (SOLR-1879) Error loading class 'Solr.ASCIIFoldingFilterFactory'
[ https://issues.apache.org/jira/browse/SOLR-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi closed SOLR-1879. Resolution: Not A Problem Adlene, please use the solr-user mailing list for getting help: http://lucene.apache.org/solr/mailing_lists.html Error loading class 'Solr.ASCIIFoldingFilterFactory' Key: SOLR-1879 URL: https://issues.apache.org/jira/browse/SOLR-1879 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Environment: Windows XP, Apache Tomcat 6 Reporter: adlene sifi I am trying to use the Solr.ASCIIFoldingFilterFactory filter as follows:
{code}
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="french_stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <filter class="Solr.ASCIIFoldingFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
  ...
</fieldType>
{code}
However, I receive the following error message when restarting the Apache Tomcat server:
{code}
GRAVE: org.apache.solr.common.SolrException: Error loading class 'Solr.ASCIIFoldingFilterFactory'
	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
	at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
	...
Caused by: java.lang.ClassNotFoundException: Solr.ASCIIFoldingFilterFactory
	at java.net.URLClassLoader$1.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	... 40 more
{code}
Could you please help me with that? Thanks a lot, Adlene -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
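[Editorial note: the likely cause here, implied by the "Not A Problem" resolution, is the capitalized "Solr." prefix. Solr's resource loader only recognizes the lowercase "solr." shorthand, so the capitalized name is looked up as a literal class name and fails with ClassNotFoundException. A corrected filter line would presumably be:
{code}
<filter class="solr.ASCIIFoldingFilterFactory"/>
{code}
]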
[jira] Created: (SOLR-1878) RelaxQueryComponent - A new SearchComponent that relaxes the main in a semiautomatic way
RelaxQueryComponent - A new SearchComponent that relaxes the main in a semiautomatic way Key: SOLR-1878 URL: https://issues.apache.org/jira/browse/SOLR-1878 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor I have the following use case: Imagine that you visit a web page for searching for an apartment for rent. You choose parameters, usually by marking check boxes, and this makes AND queries:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
If the conditions are too tight, Solr may return few or zero leasehold properties. Because this is not good for the site visitors and also for the owners, the owner may want to recommend that visitors relax the conditions to something like:
{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
or:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}
And if the relaxed query gets a larger numFound than the original, the web page can provide a link with a comment: "if you can pay an additional $100, ${numFound} properties will be found!". Today, I need to implement a client for this scenario, but this way requires two round trips to show one page and introduces a consistency problem (and is laborious, of course!). I'm thinking of a new SearchComponent that can be used with QueryComponent. It does an additional search when numFound of the main query is less than a threshold. Clients can specify via request parameters how the query can be relaxed. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
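[Editorial note: for readers curious what such a component might look like, below is a minimal, untested sketch against Solr's SearchComponent API. The relax.q and relax.threshold parameter names are hypothetical, invented for illustration; no patch exists yet on this issue, so this is a sketch of the idea, not the implementation:
{code}
import java.io.IOException;

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocList;
import org.apache.solr.search.DocListAndSet;
import org.apache.solr.search.QParser;

public class RelaxQueryComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing to prepare; QueryComponent runs the main query first
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    SolrParams params = rb.req.getParams();
    // hypothetical parameters: the relaxed query and the numFound threshold
    String relaxedQ = params.get("relax.q");
    int threshold = params.getInt("relax.threshold", -1);
    if (relaxedQ == null || threshold < 0) return;

    DocListAndSet results = rb.getResults();
    if (results == null || results.docList.matches() >= threshold) return;

    try {
      // re-run the search with the relaxed query, same filters and sort
      Query q = QParser.getParser(relaxedQ, null, rb.req).getQuery();
      DocList relaxed = rb.req.getSearcher().getDocList(
          q, rb.getFilters(), rb.getSortSpec().getSort(), 0, 0, rb.getFieldFlags());
      // report how many docs the relaxed query would match
      rb.rsp.add("relaxedNumFound", relaxed.matches());
    } catch (ParseException e) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.getMessage(), e);
    }
  }

  @Override
  public String getDescription() { return "relaxes the main query when too few results"; }
  @Override
  public String getSource() { return "$URL$"; }
  @Override
  public String getSourceId() { return "$Id$"; }
  @Override
  public String getVersion() { return "$Revision$"; }
}
{code}
]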
[jira] Updated: (SOLR-1878) RelaxQueryComponent - A new SearchComponent that relaxes the main query in a semiautomatic way
[ https://issues.apache.org/jira/browse/SOLR-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1878: - Summary: RelaxQueryComponent - A new SearchComponent that relaxes the main query in a semiautomatic way (was: RelaxQueryComponent - A new SearchComponent that relaxes the main in a semiautomatic way) Description: I have the following use case: Imagine that you visit a web page for searching for an apartment for rent. You choose parameters, usually by marking check boxes, and this makes AND queries:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
If the conditions are too tight, Solr may return few or zero leasehold properties. Because this is not good for the site visitors and also for the owners, the owner may want to recommend that visitors relax the conditions to something like:
{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
or:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}
And if the relaxed query gets a larger numFound than the original, the web page can provide a link with a comment: "if you can pay an additional $100, ${numFound} properties will be found!". Today, I need to implement a Solr client for this scenario, but this way requires two round trips to show one page and introduces a consistency problem (and is laborious, of course!). I'm thinking of a new SearchComponent that can be used with QueryComponent. It does an additional search when numFound of the main query is less than a threshold. Clients can specify via request parameters how the query can be relaxed.
was: I have the following use case: Imagine that you visit a web page for searching for an apartment for rent. You choose parameters, usually by marking check boxes, and this makes AND queries:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
If the conditions are too tight, Solr may return few or zero leasehold properties. Because this is not good for the site visitors and also for the owners, the owner may want to recommend that visitors relax the conditions to something like:
{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
or:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}
And if the relaxed query gets a larger numFound than the original, the web page can provide a link with a comment: "if you can pay an additional $100, ${numFound} properties will be found!". Today, I need to implement a client for this scenario, but this way requires two round trips to show one page and introduces a consistency problem (and is laborious, of course!). I'm thinking of a new SearchComponent that can be used with QueryComponent. It does an additional search when numFound of the main query is less than a threshold. Clients can specify via request parameters how the query can be relaxed.
RelaxQueryComponent - A new SearchComponent that relaxes the main query in a semiautomatic way -- Key: SOLR-1878 URL: https://issues.apache.org/jira/browse/SOLR-1878 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor I have the following use case: Imagine that you visit a web page for searching for an apartment for rent. You choose parameters, usually by marking check boxes, and this makes AND queries:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
If the conditions are too tight, Solr may return few or zero leasehold properties.
Because this is not good for the site visitors and also for the owners, the owner may want to recommend that visitors relax the conditions to something like:
{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
or:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}
And if the relaxed query gets a larger numFound than the original, the web page can provide a link with a comment: "if you can pay an additional $100, ${numFound} properties will be found!". Today, I need to implement a Solr client for this scenario, but this way requires two round trips to show one page and introduces a consistency problem (and is laborious, of course!). I'm thinking of a new SearchComponent that can be used with QueryComponent. It does an additional search when numFound of the main query is less than a threshold. Clients can specify via request parameters how the query can be relaxed. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] Created: (SOLR-1868) Cutover to flex APIs
Michael McCandless (JIRA) wrote: Cutover to flex APIs Key: SOLR-1868 URL: https://issues.apache.org/jira/browse/SOLR-1868 Project: Solr Issue Type: Bug Reporter: Michael McCandless Fix For: 3.1 We need to fix Solr to use flex APIs! Hello, I'm a latecomer on the flex issue, but I'd like to learn about it and, if possible, contribute something. But I chickened out when I saw LUCENE-1458 and its friend issues: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12314439 I think I should read FlexibleIndexing on the wiki, but I'd appreciate it if someone could recommend pointers for flex latecomers, if any. Thank you! Koji -- http://www.rondhuit.com/en/
Re: (SOLR-1868) Cutover to flex APIs
for Solr is to cutover to FieldsEnum, TermsEnum, DocsEnum, DocsAndPositionsEnum instead of TermEnum, TermDocs, TermPositions. Hi Mike, These lines really help me, thanks! Koji -- http://www.rondhuit.com/en/
[jira] Updated: (SOLR-860) moreLikeThis Debug
[ https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-860: Attachment: SOLR-860.patch With the attached patch, the BooleanQueries constructed by MLT and the MLT helper function can be seen in the debug area. Sample request and response:
{code}
http://localhost:8983/solr/select/?q=solr+ipod&indent=on&mlt=on&mlt.fl=features&mlt.mintf=1&mlt.count=2&debugQuery=on&wt=json
{code}
{code}
"debug": {
  "moreLikeThis": {
    "IW-02": {
      "rawMLTQuery": "",
      "boostedMLTQuery": "",
      "realMLTQuery": "+() -id:IW-02"},
    "SOLR1000": {
      "rawMLTQuery": "",
      "boostedMLTQuery": "",
      "realMLTQuery": "+() -id:SOLR1000"},
    "F8V7067-APL-KIT": {
      "rawMLTQuery": "",
      "boostedMLTQuery": "",
      "realMLTQuery": "+() -id:F8V7067-APL-KIT"},
    "MA147LL/A": {
      "rawMLTQuery": "features:2 features:0 features:lcd features:x features:3",
      "boostedMLTQuery": "features:2 features:0 features:lcd features:x features:3",
      "realMLTQuery": "+(features:2 features:0 features:lcd features:x features:3) -id:MA147LL/A"}},
}
{code}
moreLikeThis Debug -- Key: SOLR-860 URL: https://issues.apache.org/jira/browse/SOLR-860 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Environment: Gentoo Linux, Solr 1.4, tomcat webserver Reporter: Jeff Assignee: Koji Sekiguchi Fix For: 1.5 Attachments: SOLR-860.patch The moreLikeThis SearchComponent currently has no way to debug or see information on the process. This means that if moreLikeThis suggests another document, there is no way to actually see why it picked it in order to hone the searching. Adding an explain would be extremely useful in determining the reasons why Solr is recommending the items. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-860) moreLikeThis Debug
[ https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-860: Component/s: (was: search) SearchComponents - other Priority: Minor (was: Major) Fix Version/s: (was: 1.5) 3.1 moreLikeThis Debug -- Key: SOLR-860 URL: https://issues.apache.org/jira/browse/SOLR-860 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.3 Environment: Gentoo Linux, Solr 1.4, tomcat webserver Reporter: Jeff Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.1 Attachments: SOLR-860.patch The moreLikeThis SearchComponent currently has no way to debug or see information on the process. This means that if moreLikeThis suggests another document, there is no way to actually see why it picked it in order to hone the searching. Adding an explain would be extremely useful in determining the reasons why Solr is recommending the items. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-860) moreLikeThis Debug
[ https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-860: --- Assignee: Koji Sekiguchi moreLikeThis Debug -- Key: SOLR-860 URL: https://issues.apache.org/jira/browse/SOLR-860 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Environment: Gentoo Linux, Solr 1.4, tomcat webserver Reporter: Jeff Assignee: Koji Sekiguchi Fix For: 1.5 The moreLikeThis SearchComponent currently has no way to debug or see information on the process. This means that if moreLikeThis suggests another document, there is no way to actually see why it picked it in order to hone the searching. Adding an explain would be extremely useful in determining the reasons why Solr is recommending the items. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-860) moreLikeThis Debug
[ https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851072#action_12851072 ] Koji Sekiguchi commented on SOLR-860: - At minimum, I'd like to see what the BooleanQuery constructed by MLT looks like. Can ResponseBuilder.addDebugInfo() be used for it? moreLikeThis Debug -- Key: SOLR-860 URL: https://issues.apache.org/jira/browse/SOLR-860 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Environment: Gentoo Linux, Solr 1.4, tomcat webserver Reporter: Jeff Assignee: Koji Sekiguchi Fix For: 1.5 The moreLikeThis SearchComponent currently has no way to debug or see information on the process. This means that if moreLikeThis suggests another document, there is no way to actually see why it picked it in order to hone the searching. Adding an explain would be extremely useful in determining the reasons why Solr is recommending the items. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1703) Sorting by function problems on multicore (more than one core)
[ https://issues.apache.org/jira/browse/SOLR-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1703: - Description: When using sort by function (for example the dist function) on multicore with more than one core (on multicore with one core, i.e. the example deployment, the problem doesn't exist), there is a problem with not using the right schema. I think the problem is in this portion of code in QueryParsing.java:
{code}
public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException {
  SolrCore core = SolrCore.getSolrCore();
  return (FunctionQuery) (QParser.getParser(func, "func", new LocalSolrQueryRequest(core, new HashMap())).parse());
  // return new FunctionQuery(parseValSource(new StrParser(func), schema));
}
{code}
The code above uses a deprecated method to get the core, sometimes getting the wrong core and making it impossible to find the right fields in the index.
was: When using sort by function (for example the dist function) on multicore with more than one core (on multicore with one core, i.e. the example deployment, the problem doesn't exist), there is a problem with not using the right schema. I think the problem is in this portion of code in QueryParsing.java:
public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException {
  SolrCore core = SolrCore.getSolrCore();
  return (FunctionQuery) (QParser.getParser(func, "func", new LocalSolrQueryRequest(core, new HashMap())).parse());
  // return new FunctionQuery(parseValSource(new StrParser(func), schema));
}
The code above uses a deprecated method to get the core, sometimes getting the wrong core and making it impossible to find the right fields in the index.
Sorting by function problems on multicore (more than one core) -- Key: SOLR-1703 URL: https://issues.apache.org/jira/browse/SOLR-1703 Project: Solr Issue Type: Bug Components: multicore, search Affects Versions: 1.5 Environment: Linux (debian, ubuntu), 64bits Reporter: Rafał Kuć When using sort by function (for example the dist function) on multicore with more than one core (on multicore with one core, i.e. the example deployment, the problem doesn't exist), there is a problem with not using the right schema. I think the problem is in this portion of code in QueryParsing.java:
{code}
public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException {
  SolrCore core = SolrCore.getSolrCore();
  return (FunctionQuery) (QParser.getParser(func, "func", new LocalSolrQueryRequest(core, new HashMap())).parse());
  // return new FunctionQuery(parseValSource(new StrParser(func), schema));
}
{code}
The code above uses a deprecated method to get the core, sometimes getting the wrong core and making it impossible to find the right fields in the index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
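[Editorial note: a hedged sketch of one possible direction (not the committed fix) would be to thread the request's own core through instead of using the deprecated static lookup, e.g. by giving parseFunction access to a SolrQueryRequest:
{code}
// hypothetical variant: SolrQueryRequest.getCore() returns the core that
// received the request, so the right schema is used even with many cores
public static FunctionQuery parseFunction(String func, SolrQueryRequest req)
    throws ParseException {
  return (FunctionQuery) QParser.getParser(func, "func", req).parse();
}
{code}
]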
Re: intersection of the results of multiple queries
Seffie Schwartz wrote: hi - Is there any way to get the intersection of the results of multiple queries without iterating through each result set? seff How about using multiple fq parameters? Koji -- http://www.rondhuit.com/en/
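[Editorial note: to illustrate Koji's suggestion, each fq parameter is applied as a filter and the results are intersected, so a request like the following (field names made up for illustration) returns only documents matching all three filter queries; each fq is also cached independently in the filterCache, so repeated intersections are cheap:

    q=*:*&fq=category:book&fq=price:[* TO 20]&fq=inStock:true
]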
[jira] Updated: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1297: - Attachment: SOLR-1297-2.patch When I set a *bit* complex function as the sort parameter, I got this error:
{panel}
Must declare sort field or function
org.apache.solr.common.SolrException: Must declare sort field or function
	at org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376)
	at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281)
	at org.apache.solr.search.QueryParsingTest.testSort(QueryParsingTest.java:105)
{panel}
Attached are the fix and a test case. Enable sorting by Function Query Key: SOLR-1297 URL: https://issues.apache.org/jira/browse/SOLR-1297 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: SOLR-1297-2.patch, SOLR-1297.patch It would be nice if one could sort by FunctionQuery. See also SOLR-773, where this was first mentioned by Yonik as part of the generic solution to geo-search -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
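[Editorial note: with this feature, a function query can appear directly in the sort parameter, for example (popularity and price are hypothetical numeric fields):
{code}
sort=div(popularity,price) desc, score desc
{code}
]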
[jira] Commented: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839879#action_12839879 ] Koji Sekiguchi commented on SOLR-1268: -- bq. When using Dismax, the fast vector highlighter fails to return any highlighting when there is more than one field in qf (e.g. qf=Name Company)... Right. See https://issues.apache.org/jira/browse/LUCENE-2243 . Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, SOLR-1268.patch, SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833527#action_12833527 ] Koji Sekiguchi commented on SOLR-1773: -- Oops, I've glanced at the SOLR-236-related issues, but I wasn't aware of its existence. I'll look into SOLR-1682. Thanks! :) Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
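[Editorial note: the first pass of the two-pass algorithm quoted in the description can be sketched in isolation with plain Java collections. This is an illustrative standalone sketch, not Solr code; Entry and the score-ordered TreeSet stand in for Solr's internal structures:
{code}
import java.util.*;

public class FirstPassCollapse {

  static class Entry {
    final String group;
    int doc;
    float score;
    Entry(String group, int doc, float score) {
      this.group = group; this.doc = doc; this.score = score;
    }
  }

  // returns the top n collapse groups, best first (collapseCount=1 semantics)
  static Collection<Entry> collapse(Iterable<Entry> docs, int n) {
    // score-ascending, so first() is the weakest retained group head;
    // doc id breaks ties so equal scores can coexist in the TreeSet
    TreeSet<Entry> queue = new TreeSet<>(
        Comparator.comparingDouble((Entry e) -> e.score).thenComparingInt(e -> e.doc));
    Map<String, Entry> byGroup = new HashMap<>();
    for (Entry d : docs) {
      if (queue.size() >= n && d.score <= queue.first().score) continue; // can't make the cut
      Entry cur = byGroup.get(d.group);
      if (cur == null) {                  // group not yet in the top set
        byGroup.put(d.group, d);
        queue.add(d);
        if (queue.size() > n) {           // evict the weakest group head
          byGroup.remove(queue.pollFirst().group);
        }
      } else if (d.score > cur.score) {   // better doc for an existing group:
        queue.remove(cur);                // remove, update, re-add to re-sort
        cur.doc = d.doc;
        cur.score = d.score;
        queue.add(cur);
      }
    }
    return queue.descendingSet();
  }

  public static void main(String[] args) {
    List<Entry> docs = Arrays.asList(
        new Entry("siteA", 1, 3.0f), new Entry("siteB", 2, 2.5f),
        new Entry("siteA", 3, 4.0f), new Entry("siteC", 4, 1.0f));
    for (Entry e : collapse(docs, 2)) {
      System.out.println(e.group + " doc=" + e.doc + " score=" + e.score);
    }
    // prints: siteA doc=3 score=4.0, then siteB doc=2 score=2.5
  }
}
{code}
]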
[jira] Issue Comment Edited: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833527#action_12833527 ] Koji Sekiguchi edited comment on SOLR-1773 at 2/14/10 8:19 AM: --- Oops, I've glanced at the SOLR-236-related issues, but from its description I thought it was for finalizing the response format. I'll look into SOLR-1682. Thanks! :)
was (Author: koji): Oops, I've glanced at the SOLR-236-related issues, but I wasn't aware of its existence. I'll look into SOLR-1682. Thanks! :)
Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Clover 2.6.3
Why don't we move to Clover 2.6.3?

Index: build.xml
===================================================================
--- build.xml (revision 909743)
+++ build.xml (working copy)
@@ -429,7 +429,7 @@
          description="Instrument the Unit tests using Clover. Requires a Clover license and clover.jar in the ANT classpath. To use, specify -Drun.clover=true on the command line."/>
   <target name="clover.setup" if="clover.enabled">
-    <taskdef resource="clovertasks"/>
+    <taskdef resource="cloverlib.xml"/>
     <mkdir dir="${clover.db.dir}"/>
     <clover-setup initString="${clover.db.dir}/solr_coverage.db">
       <fileset dir="src/common"/>

Koji -- http://www.rondhuit.com/en/
[jira] Updated: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1773: - Attachment: SOLR-1773.patch A first draft, untested patch; use for PoC only. In this patch, I hard-coded the sort field using a java.util.Comparator. Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833495#action_12833495 ] Koji Sekiguchi commented on SOLR-1773: -- Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group|
|collapse.fl|comma- or space-delimited list of fields to return|
Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
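[Editorial note: with the parameters listed in the comment above, a request against the Solr example data might look like this (manu and name are fields from the example schema):
{code}
http://localhost:8983/solr/select/?q=ipod&collapse=on&collapse.field=manu&collapse.limit=2&collapse.fl=name
{code}
]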
[jira] Updated: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1773: - Attachment: LOADTEST.patch A very rough/simple load-test patch attached. Average QTime over 1,000 random queries:
||num docs in index||SOLR-236||SOLR-1773||
|1M|321 ms|185 ms|
|10M|2,914 ms (*)|1,642 ms|
(*) I needed to set -Xmx1024m in this case (512m for the other cases) to avoid OOM. SOLR-1773 is 43% faster. Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833495#action_12833495 ] Koji Sekiguchi edited comment on SOLR-1773 at 2/14/10 4:51 AM: --- Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported to specify sort criteria in collapse group

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group|
|collapse.fl|comma- or space-delimited list of fields to return|
was (Author: koji): Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group|
|collapse.fl|comma- or space-delimited list of fields to return|
Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833495#action_12833495 ] Koji Sekiguchi edited comment on SOLR-1773 at 2/14/10 4:54 AM: --- Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported to specify sort criteria in collapse group

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group. default is 0.|
|collapse.fl|comma- or space-delimited list of fields to return. multiValued field and TrieField are not supported yet|
was (Author: koji): Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported to specify sort criteria in collapse group

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group|
|collapse.fl|comma- or space-delimited list of fields to return|
Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Attachment: SOLR-1268.patch The patch includes:
# eliminate the hl.useHighlighter parameter
# introduce the hl.useFastVectorHighlighter parameter; the default is false
Therefore, the Highlighter will be used unless hl.useFastVectorHighlighter is set to true. I'll commit in a few days. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, SOLR-1268.patch, SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
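[Editorial note: per the related comments on this issue, a field highlighted with FVH must be indexed with term vectors, positions, and offsets all enabled, and the request must opt in; something like:
{code}
<field name="features" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
{code}
{code}
q=solr&hl=on&hl.fl=features&hl.useFastVectorHighlighter=true
{code}
]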
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829522#action_12829522 ] Koji Sekiguchi commented on SOLR-236: - The following snippet is in CollapseComponent.doProcess():
{code}
DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
    collapseResult == null ? rb.getFilters() : null,
    collapseResult.getCollapsedDocset(),
    rb.getSortSpec().getSort(),
    rb.getSortSpec().getOffset(),
    rb.getSortSpec().getCount(),
    rb.getFieldFlags());
{code}
The 2nd line implies that collapseResult may be null. If it is null, don't we get an NPE on the 3rd line? Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called Field collapsing, used in order to collapse a group of results with similar values for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection: http://www.fastsearch.com/glossary.aspx?m=48&amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type: normal (default value) or adjacent, and collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for the current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1753) StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search
[ https://issues.apache.org/jira/browse/SOLR-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1753: - Affects Version/s: (was: 1.5) Fix Version/s: 1.5 StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search - Key: SOLR-1753 URL: https://issues.apache.org/jira/browse/SOLR-1753 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: Windows Reporter: Janne Majaranta Assignee: Koji Sekiguchi Fix For: 1.5 Attachments: SOLR-1753.patch When using the StatsComponent with a sharded request and getting statistics over facets, a NullPointerException is thrown. Stacktrace:
java.lang.NullPointerException
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:54)
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:82)
	at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:116)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Unknown Source)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1753) StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search
[ https://issues.apache.org/jira/browse/SOLR-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829914#action_12829914 ] Koji Sekiguchi commented on SOLR-1753: -- Patch looks good! Will commit shortly. StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search - Key: SOLR-1753 URL: https://issues.apache.org/jira/browse/SOLR-1753 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: Windows Reporter: Janne Majaranta Assignee: Koji Sekiguchi Fix For: 1.5 Attachments: SOLR-1753.patch When using the StatsComponent with a sharded request and getting statistics over facets, a NullPointerException is thrown. Stacktrace:
java.lang.NullPointerException
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:54)
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:82)
	at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:116)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Unknown Source)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1753) StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search
[ https://issues.apache.org/jira/browse/SOLR-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1753. -- Resolution: Fixed Committed revision 906781. Thanks Janne! StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search - Key: SOLR-1753 URL: https://issues.apache.org/jira/browse/SOLR-1753 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: Windows Reporter: Janne Majaranta Assignee: Koji Sekiguchi Fix For: 1.5 Attachments: SOLR-1753.patch When using the StatsComponent with a sharded request and getting statistics over facets, a NullPointerException is thrown. Stacktrace:
java.lang.NullPointerException
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:54)
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:82)
	at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:116)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Unknown Source)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Attachment: SOLR-1268-0_fragsize.patch Hmm, FVH doesn't work appropriately when fragsize=Integer.MAX_VALUE (see test0FragSize() in the attached patch; it indicates FVH cannot produce the whole snippet when fragsize=Integer.MAX_VALUE). Now I think the (traditional) Highlighter should be the default even if the highlighting field's termVectors/termPositions/termOffsets are all true; FVH will be used only when hl.useFastVectorHighlighter is set to true. The hl.useFastVectorHighlighter parameter accepts per-field overrides. Plus, FVH doesn't support a fragsize of 0. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
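For reference, per-field overrides use Solr's usual f.<fieldname>. parameter prefix, so a request could enable FVH globally but fall back to the traditional Highlighter for one field. A hypothetical request fragment (field names invented):
{code}
hl=true&hl.fl=title,content&hl.useFastVectorHighlighter=true&f.content.hl.useFastVectorHighlighter=false
{code}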
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828039#action_12828039 ] Koji Sekiguchi commented on SOLR-236: - A random comment: don't we need to check that collapse.field is indexed in checkCollapseField()? {code} protected void checkCollapseField(IndexSchema schema) { SchemaField schemaField = schema.getFieldOrNull(collapseField); if (schemaField == null) { throw new RuntimeException("Could not collapse, because collapse field does not exist in the schema."); } if (schemaField.multiValued()) { throw new RuntimeException("Could not collapse, because collapse field is multivalued"); } if (schemaField.getType().isTokenized()) { throw new RuntimeException("Could not collapse, because collapse field is tokenized"); } } {code} When I accidentally specified an unindexed field for collapse.field, I got an unexpected result without any errors. Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called Field collapsing, used in order to collapse a group of results with a similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results; collapse.type, normal (default value) or adjacent; collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for the current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
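A sketch of the extra guard suggested above, in the same style as the quoted checkCollapseField() (an illustration of the suggestion, not the committed fix):
{code}
if (!schemaField.indexed()) {
  throw new RuntimeException("Could not collapse, because collapse field is not indexed");
}
{code}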
Re: configure FastVectorHighlighter in trunk
Marc Sturlese wrote: I think it fails when using defType dismax with more than one field. It doesn't work in the default Solr example either. I have added the default .xml files with docs; using the standard requestHandler it works, but it doesn't when using the dismax requestHandler. Ah, I see. FVH doesn't support DisjunctionMaxQuery. I should have noticed it when you indicated that you used dismax in the previous mail. Sorry about that. I'll open an issue in Lucene and try to write a patch. Thank you, Koji -- http://www.rondhuit.com/en/
Re: configure FastVectorHighlighter in trunk
Koji Sekiguchi wrote: Marc Sturlese wrote: I think it fails when using defType dismax with more than one field. It doesn't work in the default Solr example either. I have added the default .xml files with docs; using the standard requestHandler it works, but it doesn't when using the dismax requestHandler. Ah, I see. FVH doesn't support DisjunctionMaxQuery. I should have noticed it when you indicated that you used dismax in the previous mail. Sorry about that. I'll open an issue in Lucene and try to write a patch. Thank you, Koji Opened: https://issues.apache.org/jira/browse/LUCENE-2243 Koji -- http://www.rondhuit.com/en/
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Attachment: SOLR-1268-0_fragsize.patch {quote} I have noticed an exception is thrown when using fragSize = 0 (which should return the whole field highlighted): fragCharSize(0) is too small. It must be 18 or higher. java.lang.IllegalArgumentException: fragCharSize(0) is too small. It must be 18 or higher {quote} Thanks, Marc. Solr 1.4 uses NullFragmenter, which highlights the whole content when you set fragsize to 0. But FVH doesn't have such a feature because it uses a different algorithm. In the attached patch, Solr sets fragsize to Integer.MAX_VALUE if the user tries to set 0 while FVH is used. This prevents the runtime error. I think it is necessary at the Solr level because Solr automatically switches to FVH when the highlighting field's termVectors/termPositions/termOffsets are all true, unless hl.useHighlighter is set to true. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
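The clamping described above might look roughly like this, using the getFieldInt lookup quoted elsewhere in this thread (an assumed shape, not the attached patch itself):
{code}
// FVH rejects fragsize=0 ("fragCharSize(0) is too small"), so when FVH is in
// use, map the user's 0 ("whole field") to Integer.MAX_VALUE instead.
int fragSize = params.getFieldInt(fieldName, HighlightParams.FRAGSIZE, 100);
if (fragSize == 0) {
  fragSize = Integer.MAX_VALUE;
}
{code}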
Re: configure FastVectorHighlighter in trunk
Marc Sturlese wrote: Can you give me the following info to reproduce the problem? * field data: all fields are plain English text analyzed with the same analyzer. I meant I'd like to know your concrete data... Koji -- http://www.rondhuit.com/en/
Re: configure FastVectorHighlighter in trunk
Can you give me the following info to reproduce the problem? * field data * query string * field definition in schema.xml **I also have noticed that setting the snippet fragment size to 0 (which in normal highlighting returns the whole field highlighted) gives an error. Hmm, I should check it. Can you open a JIRA issue? Thank you, Koji -- http://www.rondhuit.com/en/ Marc Sturlese wrote: I am having some trouble making it work. I am debugging the code and I see that when the FastVectorHighlighter constructor is called, the parameters it receives are OK // get FastVectorHighlighter instance out of the processing loop FastVectorHighlighter fvh = new FastVectorHighlighter( // FVH cannot process hl.usePhraseHighlighter parameter per-field basis params.getBool( HighlightParams.USE_PHRASE_HIGHLIGHTER, true ), // FVH cannot process hl.requireFieldMatch parameter per-field basis params.getBool( HighlightParams.FIELD_MATCH, false ), getFragListBuilder( params ), getFragmentsBuilder( params ) ); The query here is OK as well: FieldQuery fieldQuery = fvh.getFieldQuery( query ); But I can't see what's in fieldQuery (just a memory reference; I don't know how to do something similar to toString()). The problem I see is in: String[] snippets = highlighter.getBestFragments( fieldQuery, req.getSearcher().getReader(), docId, fieldName, params.getFieldInt( fieldName, HighlightParams.FRAGSIZE, 100 ), params.getFieldInt( fieldName, HighlightParams.SNIPPETS, 1 ) ); snippets ends up as an empty array, so it jumps to: alternateField( docSummaries, params, doc, fieldName ); In solrconfig.xml I added: fragListBuilder name=simple class=org.apache.solr.highlight.SimpleFragListBuilder default=false/ fragmentsBuilder name=colored class=org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder default=false/ Maybe I am missing something... any idea? Using doHighlightingByHighlighter, highlighting works perfectly. **I also have noticed that setting the snippet fragment size to 0 (which in normal highlighting returns the whole field highlighted) gives an error. Koji Sekiguchi-2 wrote: Marc Sturlese wrote: How do I activate FastVectorHighlighter in trunk? Which of those params sets it up? !-- Configure the standard fragListBuilder -- fragListBuilder name=simple class=org.apache.solr.highlight.SimpleFragListBuilder default=true/ !-- Configure the standard fragmentsBuilder -- fragmentsBuilder name=colored class=org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder default=true/ fragmentsBuilder name=scoreOrder class=org.apache.solr.highlight.ScoreOrderFragmentsBuilder default=true/ Thanks in advance. You do not need to activate it. DefaultSolrHighlighter, which is the default SolrHighlighter impl, automatically uses FVH when you specify, through the hl.fl parameter, fields whose termVectors, termPositions and termOffsets are all true. If you want to use the multi-colored tag feature, you need to specify MultiColored*FragmentsBuilder in solrconfig.xml. Koji -- http://www.rondhuit.com/en/
Re: configure FastVectorHighlighter in trunk
Marc Sturlese wrote: How do I activate FastVectorHighlighter in trunk? Which of those params sets it up? !-- Configure the standard fragListBuilder -- fragListBuilder name=simple class=org.apache.solr.highlight.SimpleFragListBuilder default=true/ !-- Configure the standard fragmentsBuilder -- fragmentsBuilder name=colored class=org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder default=true/ fragmentsBuilder name=scoreOrder class=org.apache.solr.highlight.ScoreOrderFragmentsBuilder default=true/ Thanks in advance. You do not need to activate it. DefaultSolrHighlighter, which is the default SolrHighlighter impl, automatically uses FVH when you specify, through the hl.fl parameter, fields whose termVectors, termPositions and termOffsets are all true. If you want to use the multi-colored tag feature, you need to specify MultiColored*FragmentsBuilder in solrconfig.xml. Koji -- http://www.rondhuit.com/en/
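For reference, a field that would trigger FVH automatically needs all three term vector options enabled in schema.xml; a hypothetical example (field and type names invented):
{code}
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
{code}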
Re: how to sort facets?
David Rühr wrote: Hi, we built a filter with the faceting feature. In our facet list the order is by match count (facet.sort=count), but we need to sort by manufacturer (facet.sort=manufacturer). URL manipulation doesn't change anything; why? select?fl=*%2Cscore&fq=type%3Apage&spellcheck=true&facet=true&facet.mincount=1&facet.sort=manufacturer&bf=log(supplier_faktor)&facet.field=supplier&facet.field=manufacturer&version=1.2&q=kind&start=0&rows=10 So long, David Try facet.sort=index. facet.sort accepts only count or index. http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort Koji -- http://www.rondhuit.com/en/
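Concretely, the request above would carry facet.sort=index instead of facet.sort=manufacturer; a reduced example:
{code}
select?q=kind&facet=true&facet.field=supplier&facet.field=manufacturer&facet.mincount=1&facet.sort=index
{code}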
[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting
[ https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804087#action_12804087 ] Koji Sekiguchi commented on SOLR-1731: -- So why don't you use uni-grams on both index and query for the sku field? {code} fieldType name=text_1g class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=mapping.txt/ tokenizer class=solr.NGramTokenizerFactory minGramSize=1 maxGramSize=1/ filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.NGramTokenizerFactory minGramSize=1 maxGramSize=1/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType {code} {quote} As far as my application cares, those are all equivalent and should just be indexed as: a1280c {quote} To eliminate space/period/hyphen, mapping.txt would look like: {code} " " => "" "." => "" "-" => "" {code} ArrayIndexOutOfBoundsException when highlighting Key: SOLR-1731 URL: https://issues.apache.org/jira/browse/SOLR-1731 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.4 Reporter: Tim Underwood Priority: Minor I'm seeing a java.lang.ArrayIndexOutOfBoundsException when trying to highlight for certain queries. The error seems to be an issue with the combination of the ShingleFilterFactory, PositionFilterFactory and the LengthFilterFactory. Here's my fieldType definition: fieldType name=textSku class=solr.TextField positionIncrementGap=100 omitNorms=true analyzer type=index tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.LengthFilterFactory min=2 max=100/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.ShingleFilterFactory maxShingleSize=8 outputUnigrams=true/ filter class=solr.PositionFilterFactory / filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.LengthFilterFactory min=2 max=100/ !-- works if this is commented out -- /analyzer /fieldType Here's the field definition: field name=sku_new type=textSku indexed=true stored=true omitNorms=true/ Here's a sample doc: add doc field name=id1/field field name=sku_newA 1280 C/field /doc /add Doing a query for sku_new:A 1280 C and requesting highlighting throws the exception (full stack trace below): http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=* If I comment out the LengthFilterFactory from my query analyzer section, everything seems to work. Commenting out just the PositionFilterFactory also makes the exception go away and seems to work for this specific query. 
Full stack trace: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216
[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting
[ https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803976#action_12803976 ] Koji Sekiguchi commented on SOLR-1731: -- Can't you use WhitespaceTokenizer for the index? ArrayIndexOutOfBoundsException when highlighting Key: SOLR-1731 URL: https://issues.apache.org/jira/browse/SOLR-1731 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.4 Reporter: Tim Underwood Priority: Minor I'm seeing a java.lang.ArrayIndexOutOfBoundsException when trying to highlight for certain queries. The error seems to be an issue with the combination of the ShingleFilterFactory, PositionFilterFactory and the LengthFilterFactory. Here's my fieldType definition: fieldType name=textSku class=solr.TextField positionIncrementGap=100 omitNorms=true analyzer type=index tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.LengthFilterFactory min=2 max=100/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.ShingleFilterFactory maxShingleSize=8 outputUnigrams=true/ filter class=solr.PositionFilterFactory / filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.LengthFilterFactory min=2 max=100/ !-- works if this is commented out -- /analyzer /fieldType Here's the field definition: field name=sku_new type=textSku indexed=true stored=true omitNorms=true/ Here's a sample doc: add doc field name=id1/field field name=sku_newA 1280 C/field /doc /add Doing a query for sku_new:A 1280 C and requesting highlighting throws the exception (full stack trace below): http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=* If I comment out the LengthFilterFactory from my query analyzer section, everything seems to work. Commenting out just the PositionFilterFactory also makes the exception go away and seems to work for this specific query. 
Full stack trace: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821
Upgrading Lucene jars
I'd like to upgrade all Lucene jars to the latest 2.9 branch (r900222). If there are no objections, I'll commit tomorrow. I'm now testing the Lucene 2.9 branch and Solr trunk with the latest 2.9. Thank you, Koji -- http://www.rondhuit.com/en/
Re: Build failed in Hudson: Solr-trunk #1027
http://hudson.zones.apache.org/hudson/job/Solr-trunk/1027/testReport/org.apache.solr.request/TestWriterPerf/testPerf/ The cause of this failure is that the undefined field t1 is set in hl.fl in the test code. Before FastVectorHighlighter was committed, it seems undefined fields were ignored. I think I should ignore them in FVH, too. I'm looking into it... Koji -- http://www.rondhuit.com/en/ Apache Hudson Server wrote: See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1027/ -- [...truncated 2343 lines...] [junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.49 sec [junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.574 sec [junit] Running org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.977 sec [junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.506 sec [junit] Running org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.618 sec [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 17.669 sec [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 33.972 sec [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 39.944 sec [junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.917 sec [junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.375 sec [junit] Running org.apache.solr.client.solrj.response.AnlysisResponseBaseTest [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.43 sec [junit] Running org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.488 sec [junit] Running org.apache.solr.client.solrj.response.FieldAnalysisResponseTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.507 sec [junit] Running org.apache.solr.client.solrj.response.QueryResponseTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.768 sec [junit] Running org.apache.solr.client.solrj.response.TermsResponseTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.705 sec [junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 13.645 sec [junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.408 sec [junit] Running org.apache.solr.common.SolrDocumentTest [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.549 sec [junit] Running org.apache.solr.common.params.ModifiableSolrParamsTest [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.48 sec [junit] Running org.apache.solr.common.params.SolrParamTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.431 sec [junit] Running org.apache.solr.common.util.ContentStreamTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.682 sec [junit] Running 
org.apache.solr.common.util.DOMUtilTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.565 sec [junit] Running org.apache.solr.common.util.FileUtilsTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.433 sec [junit] Running org.apache.solr.common.util.IteratorChainTest [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.446 sec [junit] Running org.apache.solr.common.util.NamedListTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.396 sec [junit] Running org.apache.solr.common.util.TestFastInputStream [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.547 sec [junit] Running org.apache.solr.common.util.TestHash [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.698 sec [junit] Running org.apache.solr.common.util.TestNamedListCodec [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.891 sec [junit] Running org.apache.solr.common.util.TestXMLEscaping [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.381 sec [junit] Running org.apache.solr.core.AlternateDirectoryTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.552 sec [junit]
Re: Build failed in Hudson: Solr-trunk #1027
Koji Sekiguchi wrote: http://hudson.zones.apache.org/hudson/job/Solr-trunk/1027/testReport/org.apache.solr.request/TestWriterPerf/testPerf/ The cause of this failure is that the undefined field t1 is set in hl.fl in the test code. Before FastVectorHighlighter was committed, it seems undefined fields were ignored. I think I should ignore them in FVH, too. I'm looking into it... Koji Committed revision 897611. Koji -- http://www.rondhuit.com/en/
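The guard implied here could be as simple as filtering hl.fl down to schema-defined fields before FVH processes them; a hypothetical sketch (not the committed revision 897611):
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.schema.IndexSchema;

// Keep only the requested highlight fields that actually exist in the
// schema, so an undefined field like t1 is silently skipped as before.
static List<String> definedFieldsOnly(IndexSchema schema, String[] requested) {
  List<String> defined = new ArrayList<String>();
  for (String name : requested) {
    if (schema.getFieldOrNull(name) != null) {
      defined.add(name);
    }
  }
  return defined;
}
{code}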
[jira] Updated: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1696: - Attachment: SOLR-1696.patch A new patch attached. Just synced with trunk, plus a warning log when the deprecated syntax is found (the idea Chris mentioned above). Deprecate old highlighting syntax and move configuration to HighlightComponent Key: SOLR-1696 URL: https://issues.apache.org/jira/browse/SOLR-1696 Project: Solr Issue Type: Improvement Components: highlighter Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1696.patch, SOLR-1696.patch There is no reason why we should have a custom syntax for highlighter configuration. It can be treated like any other SearchComponent and all the configuration can go in there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
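The warning could take roughly this shape (a hypothetical sketch; the config lookup and message wording are assumptions, not the attached patch):
{code}
import org.w3c.dom.Node;

// If the deprecated top-level highlighting element is still present in
// solrconfig.xml, warn instead of failing so old configs keep working.
Node legacyHl = solrConfig.getNode("highlighting", false); // signature assumed
if (legacyHl != null) {
  log.warn("Deprecated highlighting configuration found in solrconfig.xml; "
      + "move it into the HighlightComponent searchComponent configuration.");
}
{code}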
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12798271#action_12798271 ] Koji Sekiguchi commented on SOLR-1653: -- Thanks, Paul! I've just committed revision 897357. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch, SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1268. -- Resolution: Fixed Committed revision 897383. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12798312#action_12798312 ] Koji Sekiguchi commented on SOLR-1696: -- I've just committed SOLR-1268. Now I'm working on a patch for this issue, synced with trunk... Deprecate old highlighting syntax and move configuration to HighlightComponent Key: SOLR-1696 URL: https://issues.apache.org/jira/browse/SOLR-1696 Project: Solr Issue Type: Improvement Components: highlighter Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1696.patch There is no reason why we should have a custom syntax for highlighter configuration. It can be treated like any other SearchComponent and all the configuration can go in there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797841#action_12797841 ] Koji Sekiguchi commented on SOLR-1696: -- Noble, thank you for opening this and attaching the patch! Are you planning to commit this shortly? I ask because I'm ready to commit SOLR-1268, which uses the old-style config. If you commit this first, I'll rewrite SOLR-1268. Or I can assign SOLR-1696 to myself. Deprecate old highlighting syntax and move configuration to HighlightComponent Key: SOLR-1696 URL: https://issues.apache.org/jira/browse/SOLR-1696 Project: Solr Issue Type: Improvement Components: highlighter Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1696.patch There is no reason why we should have a custom syntax for highlighter configuration. It can be treated like any other SearchComponent and all the configuration can go in there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796147#action_12796147 ] Koji Sekiguchi commented on SOLR-1268: -- I'll commit in a few days if nobody objects. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796075#action_12796075 ] Koji Sekiguchi commented on SOLR-1268: -- In this patch I'm introducing new fragListBuilder/ and fragmentsBuilder/ sub-tags of highlighting/ in solrconfig.xml, rather than using searchComponent/. I think we can open a separate ticket for moving the highlighting/ settings to searchComponent/, if needed. FYI: http://old.nabble.com/highlighting-setting-in-solrconfig.xml-td26984003.html Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Attachment: SOLR-1268.patch First draft, untested patch attached. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1670) synonymfilter/map repeat bug
[ https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792920#action_12792920 ] Koji Sekiguchi commented on SOLR-1670: -- bq. the test for 'repeats' has a flaw, it uses this assertTokEqual construct which does not really validate that two lists of tokens are equal, it just stops at the shortest one. I agree with you regarding this part. But I'm not sure that the following size() should be 1 in your patch: {code} +assertEquals(1, getTokList(map, "a b", false).size()); {code} If what repeats implies is intentionally repeating the same term, I think it can boost tf. synonymfilter/map repeat bug Key: SOLR-1670 URL: https://issues.apache.org/jira/browse/SOLR-1670 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir Attachments: SOLR-1670_test.patch As part of converting tests for SOLR-1657, I ran into a problem with SynonymFilter: the test for 'repeats' has a flaw, it uses this assertTokEqual construct which does not really validate that two lists of tokens are equal, it just stops at the shortest one. {code} // repeats map.add(strings("a b"), tokens("ab"), orig, merge); map.add(strings("a b"), tokens("ab"), orig, merge); assertTokEqual(getTokList(map, "a b", false), tokens("ab")); /* in reality the result from getTokList is "ab ab ab"! */ {code} When converted to assertTokenStreamContents this problem surfaced. Attached is an additional assertion to the existing testcase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
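A stricter assertion than assertTokEqual would compare lengths as well as contents, so extra repeated tokens surface; a minimal sketch, assuming the test's getTokList yields a List of Lucene 2.9 Tokens and JUnit's assertEquals is available:
{code}
import java.util.List;
import org.apache.lucene.analysis.Token;
import static junit.framework.Assert.assertEquals;

// Fail if the token list is longer or shorter than expected, not just
// when a compared prefix differs.
static void assertTokensEqual(List<Token> actual, String... expected) {
  assertEquals("wrong number of tokens", expected.length, actual.size());
  for (int i = 0; i < expected.length; i++) {
    assertEquals(expected[i], actual.get(i).term());
  }
}
{code}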
[jira] Commented: (SOLR-1670) synonymfilter/map repeat bug
[ https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792928#action_12792928 ] Koji Sekiguchi commented on SOLR-1670: -- Robert, sorry, I wanted to say that I agree with you that the test for 'repeats' has a flaw. The point about boosting tf was just an input, though I don't know whether it is an intentional feature or a side effect. Why don't you fix the flaws in the SynonymFilter test in this ticket first, then fix SOLR-1674? (I've not looked into SOLR-1674 yet.) synonymfilter/map repeat bug Key: SOLR-1670 URL: https://issues.apache.org/jira/browse/SOLR-1670 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir Attachments: SOLR-1670_test.patch As part of converting tests for SOLR-1657, I ran into a problem with SynonymFilter: the test for 'repeats' has a flaw, it uses this assertTokEqual construct which does not really validate that two lists of tokens are equal, it just stops at the shortest one. {code} // repeats map.add(strings("a b"), tokens("ab"), orig, merge); map.add(strings("a b"), tokens("ab"), orig, merge); assertTokEqual(getTokList(map, "a b", false), tokens("ab")); /* in reality the result from getTokList is "ab ab ab"! */ {code} When converted to assertTokenStreamContents this problem surfaced. Attached is an additional assertion to the existing testcase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1653. -- Resolution: Fixed Committed revision 890798. Thanks Shalin and Noble for taking time to review the patch. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch, SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790056#action_12790056 ] Koji Sekiguchi commented on SOLR-1653: -- Ok. I'll show you some samples ;-) ||INPUT||groupedPattern||replaceGroups||OUTPUT||comment|| |see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of a word| |see-ing looking|(\w+)ing|1|see-ing look|same as above; the 2nd parentheses can be omitted| |No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literals; do not forget to set blockDelimiters to something other than period when you use a period in groupedPattern| |abc-1234-5678|(\w+)-(\d+)-(\d+)|3,{-},1,{-},2|5678-abc-1234|change the order of the groups| add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790056#action_12790056 ] Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:30 AM: Ok. I'll show you some samples ;-) ||INPUT||groupedPattern||replaceGroups||OUTPUT||comment|| |see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of a word| |see-ing looking|(\w+)ing|1|see-ing look|same as above; the 2nd parentheses can be omitted| |No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literals; do not forget to set blockDelimiters to something other than period when you use a period in groupedPattern| |abc=1234=5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order of the groups| was (Author: koji): Ok. I'll show you some samples ;-) ||INPUT||groupedPattern||replaceGroups||OUTPUT||comment|| |see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of a word| |see-ing looking|(\w+)ing|1|see-ing look|same as above; the 2nd parentheses can be omitted| |No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literals; do not forget to set blockDelimiters to something other than period when you use a period in groupedPattern| |abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order of the groups| add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790127#action_12790127 ] Koji Sekiguchi commented on SOLR-1653: -- bq. I guess this can be achieved with the matcher#replaceAll() directly You'd be right if we didn't need to correct the offsets of the output char stream. I need to process one match at a time. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
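Processing one match at a time, as described, typically means driving the Matcher manually instead of calling replaceAll(); a minimal sketch of the shape (assumed, not the actual SOLR-1653 implementation):
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Visit each match individually so an offset-correction entry can be
// recorded per match; matcher.replaceAll() would hide the match positions.
static String replaceRecordingOffsets(Pattern pattern, String replacement, CharSequence input) {
  Matcher m = pattern.matcher(input);
  StringBuffer sb = new StringBuffer();
  while (m.find()) {
    // here the real CharFilter would note m.start(), m.end() and the
    // replacement length to build its offset-correction table
    m.appendReplacement(sb, replacement);
  }
  m.appendTail(sb);
  return sb.toString();
}
{code}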
[jira] Updated: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1653: - Attachment: SOLR-1653.patch Excuse me: because I tried to correct offsets per group within a match when I started the first patch, I introduced my own syntax. But yes, now that I've implemented the offset correction per match, I can use the standard syntax. Here is the new patch. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory pattern=([nN][oO]\.)\s*(\d+) replaceWith=$1$2/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} If there are no objections, I'll commit later today. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch, SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790572#action_12790572 ] Koji Sekiguchi commented on SOLR-1653: -- I see that the existing PatternReplaceFilter (not CharFilter) uses pattern, but with replacement, not replaceWith. I think I'll use pattern and replacement. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch, SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1653) add PatternReplaceCharFilter
add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Fix For: 1.5 Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1653: - Attachment: SOLR-1653.patch add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-1653: Assignee: Koji Sekiguchi add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789957#action_12789957 ] Koji Sekiguchi commented on SOLR-1653: -- I'll commit in a few days. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Upgrading Lucene jars
Shalin Shekhar Mangar wrote: I need to upgrade contrib-spellcheck jar for SOLR-785. Should I go ahead and upgrade all Lucene jars to the latest 2.9 branch code? +1. Koji -- http://www.rondhuit.com/en/
[jira] Commented: (SOLR-1606) Integrate Near Realtime
[ https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786448#action_12786448 ] Koji Sekiguchi commented on SOLR-1606: -- Jason, I got a failure when running TestRefreshReader. Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1606.patch We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1607) use a proper key other than IndexReader for ExternalFileField and QueryElevationComponent to work properly when reopenReaders is set to true
use a proper key other than IndexReader for ExternalFileField and QueryElevationComponent to work properly when reopenReaders is set to true Key: SOLR-1607 URL: https://issues.apache.org/jira/browse/SOLR-1607 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 With the reopenReaders feature introduced in 1.4, this prevents the external_[fieldname] and elevate.xml files in dataDir from being reloaded when a commit is submitted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
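For context, reopenReaders is a solrconfig.xml setting; a minimal sketch of the configuration that triggers the behavior described above (placing it inside the mainIndex section follows the 1.4 example config):
{code:title=solrconfig.xml}
<mainIndex>
  <!-- when true, Solr reopens the existing reader on commit; the issue above reports
       that external_[fieldname] and elevate.xml are then not reloaded -->
  <reopenReaders>true</reopenReaders>
</mainIndex>
{code}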
[jira] Updated: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1489: - Attachment: SOLR-1489.patch Attached patch fixes the above failure, but I got another failure (no expires header): {code} Testcase: testCacheVetoHandler took 3.29 sec Testcase: testCacheVetoException took 1.395 sec FAILED We got no Expires header junit.framework.AssertionFailedError: We got no Expires header at org.apache.solr.servlet.CacheHeaderTest.checkVetoHeaders(CacheHeaderTest.java:73) at org.apache.solr.servlet.CacheHeaderTest.testCacheVetoException(CacheHeaderTest.java:59) Testcase: testLastModified took 1.485 sec Testcase: testEtag took 1.577 sec Testcase: testCacheControl took 1.035 sec {code} A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Assignee: Koji Sekiguchi Priority: Critical Attachments: error_utf8-example.xml, jetty-6.1.22.jar, jetty-util-6.1.22.jar, jettybugsample.war, jsp-2.1.zip, servlet-api-2.5-20081211.jar, SOLR-1489.patch A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1601) Schema browser does not indicate presence of charFilter
[ https://issues.apache.org/jira/browse/SOLR-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1601: - Component/s: Schema and Analysis Affects Version/s: 1.4 Fix Version/s: 1.5 Assignee: Koji Sekiguchi Schema browser does not indicate presence of charFilter --- Key: SOLR-1601 URL: https://issues.apache.org/jira/browse/SOLR-1601 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Jake Brownell Assignee: Koji Sekiguchi Priority: Trivial Fix For: 1.5 My schema has a field defined as: {noformat} fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType {noformat} and when I view the field in the schema browser, I see: {noformat} Tokenized: true Class Name: org.apache.solr.schema.TextField Index Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} Query Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt expand: true ignoreCase: true } org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} {noformat} It's not a big deal, but I expected to see some indication of the 
charFilter that is in place. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1601) Schema browser does not indicate presence of charFilter
[ https://issues.apache.org/jira/browse/SOLR-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1601: - Attachment: SOLR-1601.patch Will commit shortly. Schema browser does not indicate presence of charFilter --- Key: SOLR-1601 URL: https://issues.apache.org/jira/browse/SOLR-1601 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Jake Brownell Assignee: Koji Sekiguchi Priority: Trivial Fix For: 1.5 Attachments: SOLR-1601.patch My schema has a field defined as: {noformat} fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType {noformat} and when I view the field in the schema browser, I see: {noformat} Tokenized: true Class Name: org.apache.solr.schema.TextField Index Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} Query Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt expand: true ignoreCase: true } org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} {noformat} It's not a big deal, but I expected to see some indication of the charFilter that is in place. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1601) Schema browser does not indicate presence of charFilter
[ https://issues.apache.org/jira/browse/SOLR-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1601. -- Resolution: Fixed Committed revision 884180. Thanks, Jake. Schema browser does not indicate presence of charFilter --- Key: SOLR-1601 URL: https://issues.apache.org/jira/browse/SOLR-1601 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Jake Brownell Assignee: Koji Sekiguchi Priority: Trivial Fix For: 1.5 Attachments: SOLR-1601.patch My schema has a field defined as: {noformat} fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType {noformat} and when I view the field in the schema browser, I see: {noformat} Tokenized: true Class Name: org.apache.solr.schema.TextField Index Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} Query Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt expand: true ignoreCase: true } org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} {noformat} It's not a big deal, but I expected to see some indication of the charFilter that 
is in place. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782335#action_12782335 ] Koji Sekiguchi commented on SOLR-1489: -- Thanks, Ohtani-san. Using these new jetty jars (6.1.22), I run ant test, but I got a failure: {code:title=TEST-org.apache.solr.servlet.CacheHeaderTest.txt} Testcase: testCacheVetoHandler took 2.469 sec Testcase: testCacheVetoException took 1.25 sec FAILED null expected:[no-cache, ]no-store but was:[must-revalidate,no-cache,]no-store junit.framework.ComparisonFailure: null expected:[no-cache, ]no-store but was:[must-revalidate,no-cache,]no-store at org.apache.solr.servlet.CacheHeaderTest.checkVetoHeaders(CacheHeaderTest.java:65) at org.apache.solr.servlet.CacheHeaderTest.testCacheVetoException(CacheHeaderTest.java:59) Testcase: testLastModified took 1.188 sec Testcase: testEtag took 1.11 sec Testcase: testCacheControl took 1.391 sec {code} According to SOLR-632, the cache header related test was failed when we used jetty-6.1.11, Lars filed https://jira.codehaus.org/browse/JETTY-646. Now the issue has been fixed, I thought jetty-6.1.22 should work. I've not looked into the details of cache header test, though. A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Assignee: Koji Sekiguchi Priority: Critical Attachments: error_utf8-example.xml, jetty-6.1.22.jar, jetty-util-6.1.22.jar, jettybugsample.war, jsp-2.1.zip, servlet-api-2.5-20081211.jar A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779814#action_12779814 ] Koji Sekiguchi commented on SOLR-1489: -- Ok, http://jira.codehaus.org/browse/JETTY-1122 has been marked as fixed and jetty 6.1.22 released. Ohtani-san, can you test the new jetty with your test case to see the bug is gone? Thanks. A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Assignee: Koji Sekiguchi Priority: Critical Attachments: error_utf8-example.xml, jettybugsample.war A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader
[ https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12773213#action_12773213 ] Koji Sekiguchi commented on SOLR-1506: -- bq. Commit doesn't work because reopen isn't supported by MultiReader. Regarding MultiReader and reopen, I've set reopenReaders to false: {code:title=solrconfig.xml} reopenReadersfalse/reopenReaders : indexReaderFactory name=IndexReaderFactory class=mypackage.MultiReaderFactory/ {code} Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1506.patch, SOLR-1506.patch I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer
[ https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769741#action_12769741 ] Koji Sekiguchi commented on SOLR-822: - bq. Please update the Wiki for this feature. Done. :) CharFilter - normalize characters before tokenizer -- Key: SOLR-822 URL: https://issues.apache.org/jira/browse/SOLR-822 Project: Solr Issue Type: New Feature Components: Analysis Affects Versions: 1.3 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.4 Attachments: character-normalization.JPG, japanese-h-to-k-mapping.txt, sample_mapping_ja.txt, sample_mapping_ja.txt, SOLR-822-for-1.3.patch, SOLR-822-renameMethod.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch A new plugin which can be placed in front of tokenizer/. {code:xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.MappingCharFilterFactory mapping=mapping_ja.txt / tokenizer class=solr.MappingCJKTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType {code} charFilter/ can be multiple (chained). I'll post a JPEG file to show character normalization sample soon. MOTIVATION: In Japan, there are two types of tokenizers -- N-gram (CJKTokenizer) and Morphological Analyzer. When we use morphological analyzer, because the analyzer uses Japanese dictionary to detect terms, we need to normalize characters before tokenization. I'll post a patch soon, too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
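For readers unfamiliar with the mapping file referenced above, here is an illustrative sketch of the rule format such a file (e.g. sample_mapping_ja.txt) uses; the individual entries below are made-up examples, not the attached sample file.
{code:title=mapping_ja.txt}
# "source" => "target", one rule per line; \uXXXX escapes may be used
# e.g. normalize half-width katakana to full-width before a morphological tokenizer
"ｱ" => "ア"
"ｶﾞ" => "ガ"
# accented Latin characters can be folded the same way
"\u00C0" => "A"
{code}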
[jira] Updated: (SOLR-561) Solr replication by Solr (for windows also)
[ https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-561: Component/s: (was: replication (scripts)) replication (java) change component from scripts to java Solr replication by Solr (for windows also) --- Key: SOLR-561 URL: https://issues.apache.org/jira/browse/SOLR-561 Project: Solr Issue Type: New Feature Components: replication (java) Affects Versions: 1.4 Environment: All Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: deletion_policy.patch, SOLR-561-core.patch, SOLR-561-fixes.patch, SOLR-561-fixes.patch, SOLR-561-fixes.patch, SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch The current replication strategy in solr involves shell scripts . The following are the drawbacks with the approach * It does not work with windows * Replication works as a separate piece not integrated with solr. * Cannot control replication from solr admin/JMX * Each operation requires manual telnet to the host Doing the replication in java has the following advantages * Platform independence * Manual steps can be completely eliminated. Everything can be driven from solrconfig.xml . ** Adding the url of the master in the slaves should be good enough to enable replication. Other things like frequency of snapshoot/snappull can also be configured . All other information can be automatically obtained. * Start/stop can be triggered from solr/admin or JMX * Can get the status/progress while replication is going on. It can also abort an ongoing replication * No need to have a login into the machine * From a development perspective, we can unit test it This issue can track the implementation of solr replication in java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
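To illustrate the point that adding the url of the master in the slaves should be good enough, here is a hedged sketch of what a slave-side configuration along these lines could look like. The handler and parameter names below follow the Java replication handler this issue tracks, but treat them as assumptions, and the host name is a placeholder.
{code:title=solrconfig.xml (slave)}
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- the master core to pull index files from -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- how often the slave polls the master for a newer index version -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
{code}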
[jira] Updated: (SOLR-551) Solr replication should include the schema also
[ https://issues.apache.org/jira/browse/SOLR-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-551: Component/s: (was: replication (scripts)) replication (java) change component from scripts to java Solr replication should include the schema also --- Key: SOLR-551 URL: https://issues.apache.org/jira/browse/SOLR-551 Project: Solr Issue Type: Improvement Components: replication (java) Affects Versions: 1.4 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 The current Solr replication just copy the data directory . So if the schema changes and I do a re-index it will blissfully copy the index and the slaves will fail because of incompatible schema. So the steps we follow are * Stop rsync on slaves * Update the master with new schema * re-index data * forEach slave ** Kill the slave ** clean the data directory ** install the new schema ** restart ** do a manual snappull The amount of work the admin needs to do is quite significant (depending on the no:of slaves). These are manual steps and very error prone The solution : Make the replication mechanism handle the schema replication also. So all I need to do is to just change the master and the slaves synch automatically What is a good way to implement this? We have an idea along the following lines This should involve changes to the snapshooter and snappuller scripts and the snapinstaller components Everytime the snapshooter takes a snapshot it must keep the timestamps of schema.xml and elevate.xml (all the files which might affect the runtime behavior in slaves) For subsequent snapshots if the timestamps of any of them is changed it must copy the all of them also for replication. The snappuller copies the new directory as usual The snapinstaller checks if these config files are present , if yes, * It can create a temporary core * install the changed index and configuration * load it completely and swap it out with the original core -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
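In the same spirit, a hedged sketch of how the master side could declare which configuration files travel with the index, so that a schema change reaches the slaves automatically; the confFiles parameter name matches the Java replication handler, and the file list is illustrative.
{code:title=solrconfig.xml (master)}
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- replicate after every commit (optimize is another common trigger) -->
    <str name="replicateAfter">commit</str>
    <!-- configuration files shipped to slaves alongside the index -->
    <str name="confFiles">schema.xml,elevate.xml,stopwords.txt</str>
  </lst>
</requestHandler>
{code}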
[jira] Resolved: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1099. -- Resolution: Fixed Committed revision 827032. Thanks. FieldAnalysisRequestHandler --- Key: SOLR-1099 URL: https://issues.apache.org/jira/browse/SOLR-1099 Project: Solr Issue Type: New Feature Components: Analysis Affects Versions: 1.3 Reporter: Uri Boness Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: AnalisysRequestHandler_refactored.patch, analysis_request_handlers_incl_solrj.patch, AnalysisRequestHandler_refactored1.patch, FieldAnalysisRequestHandler_incl_test.patch, SOLR-1099-ordered-TokenizerChain.patch, SOLR-1099.patch, SOLR-1099.patch, SOLR-1099.patch The FieldAnalysisRequestHandler provides the analysis functionality of the web admin page as a service. This handler accepts a filetype/fieldname parameter and a value and as a response returns a breakdown of the analysis process. It is also possible to send a query value which will use the configured query analyzer as well as a showmatch parameter which will then mark every matched token as a match. If this handler is added to the code base, I also recommend to rename the current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have them both inherit from one AnalysisRequestHandlerBase class which provides the common functionality of the analysis breakdown and its translation to named lists. This will also enhance the current AnalysisRequestHandler which right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reopened SOLR-1099: -- Assignee: Koji Sekiguchi (was: Shalin Shekhar Mangar) Hmm, I think the order of Tokenizer/TokenFilters in response is unconsidered. For example, I cannot take out Tokenizer/TokenFilters from ruby response in order... FieldAnalysisRequestHandler --- Key: SOLR-1099 URL: https://issues.apache.org/jira/browse/SOLR-1099 Project: Solr Issue Type: New Feature Components: Analysis Affects Versions: 1.3 Reporter: Uri Boness Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: AnalisysRequestHandler_refactored.patch, analysis_request_handlers_incl_solrj.patch, AnalysisRequestHandler_refactored1.patch, FieldAnalysisRequestHandler_incl_test.patch, SOLR-1099.patch, SOLR-1099.patch, SOLR-1099.patch The FieldAnalysisRequestHandler provides the analysis functionality of the web admin page as a service. This handler accepts a filetype/fieldname parameter and a value and as a response returns a breakdown of the analysis process. It is also possible to send a query value which will use the configured query analyzer as well as a showmatch parameter which will then mark every matched token as a match. If this handler is added to the code base, I also recommend to rename the current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have them both inherit from one AnalysisRequestHandlerBase class which provides the common functionality of the analysis breakdown and its translation to named lists. This will also enhance the current AnalysisRequestHandler which right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1099: - Attachment: SOLR-1099-ordered-TokenizerChain.patch I'd like to use NamedList rather than SimpleOrderedMap. If there is no objections, I'll commit soon. All tests pass. FieldAnalysisRequestHandler --- Key: SOLR-1099 URL: https://issues.apache.org/jira/browse/SOLR-1099 Project: Solr Issue Type: New Feature Components: Analysis Affects Versions: 1.3 Reporter: Uri Boness Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: AnalisysRequestHandler_refactored.patch, analysis_request_handlers_incl_solrj.patch, AnalysisRequestHandler_refactored1.patch, FieldAnalysisRequestHandler_incl_test.patch, SOLR-1099-ordered-TokenizerChain.patch, SOLR-1099.patch, SOLR-1099.patch, SOLR-1099.patch The FieldAnalysisRequestHandler provides the analysis functionality of the web admin page as a service. This handler accepts a filetype/fieldname parameter and a value and as a response returns a breakdown of the analysis process. It is also possible to send a query value which will use the configured query analyzer as well as a showmatch parameter which will then mark every matched token as a match. If this handler is added to the code base, I also recommend to rename the current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have them both inherit from one AnalysisRequestHandlerBase class which provides the common functionality of the analysis breakdown and its translation to named lists. This will also enhance the current AnalysisRequestHandler which right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1515) Javadoc typo in SolrQueryResponse
[ https://issues.apache.org/jira/browse/SOLR-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1515: - Fix Version/s: (was: 1.5) 1.4 Javadoc typo in SolrQueryResponse - Key: SOLR-1515 URL: https://issues.apache.org/jira/browse/SOLR-1515 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Environment: my local MacBook pro Reporter: Chris A. Mattmann Priority: Trivial Fix For: 1.4 Attachments: SOLR-1515.101709.Mattmann.patch.txt There is a minute typo in the javadoc for o.a.s.request.SolrQueryResponse.java. This patch fixes that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1515) Javadoc typo in SolrQueryResponse
[ https://issues.apache.org/jira/browse/SOLR-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1515. -- Resolution: Fixed Committed revision 826321. Thanks. Javadoc typo in SolrQueryResponse - Key: SOLR-1515 URL: https://issues.apache.org/jira/browse/SOLR-1515 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Environment: my local MacBook pro Reporter: Chris A. Mattmann Priority: Trivial Fix For: 1.4 Attachments: SOLR-1515.101709.Mattmann.patch.txt There is a minute typo in the javadoc for o.a.s.request.SolrQueryResponse.java. This patch fixes that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Code Freeze, Release Process, etc.
Grant Ingersoll wrote: OK, so we are in code freeze right now. I'm going to follow the Release process at http://wiki.apache.org/solr/HowToRelease I will put up an RC now, then people can try it out, etc. I would then like to have a goal of putting up an official set of artifacts to be voted on next Monday. In the interim, we should review docs, etc. and update the wiki where possible. How does that sound? -Grant Sounds great! Koji -- http://www.rondhuit.com/en/
Re: 1.4.0 RC
Yonik Seeley wrote: On Tue, Oct 13, 2009 at 8:12 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Oct 13, 2009 at 8:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : http://people.apache.org/~gsingers/solr/1.4.0-RC/ I suspect we're going to want to wait for Lucene 2.9.1 - particularly because of LUCENE-1974. I know I was lobbying for not using non-released versions of Lucene due to the increase in flux, but I really meant non-bugfix branches. Seems safe to use an unreleased 2.9.1 branch? If there are no objections, I'll update to the fixed 2.9.1 branch. We can figure out whether to wait for 2.9.1 or not later when we know the schedule. +1 to update to the fixed 2.9 branch and proceed to release RC. Koji -- http://www.rondhuit.com/en/
Re: rollback and cumulative_add
Koji Sekiguchi wrote: Hello, I found that rollback resets adds and docsPending count, but doesn't reset cumulative_adds. $ cd example/exampledocs # comment out the line of commit/ so avoid committing in post.sh $ ./post.sh *.xml = docsPending=19, adds=19, cumulative_adds=19 # do rollback $ curl http://localhost:8983/solr/update?rollback=true = rollbacks=1, docsPending=0, adds=0, cumulative_adds=19 Is this correct behavior? Koji (forwarded dev list) I think this is a bug that was introduced by me when I contributed the first patch for the rollback and the bug was inherited by the successive patches. I'll reopen SOLR-670 and attach the fix soon: https://issues.apache.org/jira/browse/SOLR-670 Koji -- http://www.rondhuit.com/
[jira] Updated: (SOLR-670) UpdateHandler must provide a rollback feature
[ https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-670: Attachment: SOLR-670-revert-cumulative-counts.patch The fix and test case. I'll commit soon. UpdateHandler must provide a rollback feature - Key: SOLR-670 URL: https://issues.apache.org/jira/browse/SOLR-670 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Noble Paul Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: SOLR-670-revert-cumulative-counts.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch Lucene IndexWriter already has a rollback method. There should be a counterpart for the same in _UpdateHandler_ so that users can do a rollback over http -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-670) UpdateHandler must provide a rollback feature
[ https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-670. - Resolution: Fixed Committed revision 824380. UpdateHandler must provide a rollback feature - Key: SOLR-670 URL: https://issues.apache.org/jira/browse/SOLR-670 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Noble Paul Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: SOLR-670-revert-cumulative-counts.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch Lucene IndexWriter already has a rollback method. There should be a counterpart for the same in _UpdateHandler_ so that users can do a rollback over http -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1504) empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.
empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co. --- Key: SOLR-1504 URL: https://issues.apache.org/jira/browse/SOLR-1504 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.4 If you have the following mapping rule in mapping.txt: {code} # destination can be empty NULL = {code} you can get AIOOBE by specifying NULL for either index or query data in the input form of analysis.jsp (and co. i.e. DocumentAnalysisRequestHandler and FieldAnalysisRequestHandler). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1504) empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.
[ https://issues.apache.org/jira/browse/SOLR-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1504: - Attachment: SOLR-1504.patch A patch for the fix. Will commit soon. empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co. --- Key: SOLR-1504 URL: https://issues.apache.org/jira/browse/SOLR-1504 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.4 Attachments: SOLR-1504.patch If you have the following mapping rule in mapping.txt: {code} # destination can be empty NULL = {code} you can get AIOOBE by specifying NULL for either index or query data in the input form of analysis.jsp (and co. i.e. DocumentAnalysisRequestHandler and FieldAnalysisRequestHandler). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1504) empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.
[ https://issues.apache.org/jira/browse/SOLR-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1504. -- Resolution: Fixed Committed revision 824045. empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co. --- Key: SOLR-1504 URL: https://issues.apache.org/jira/browse/SOLR-1504 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.4 Attachments: SOLR-1504.patch If you have the following mapping rule in mapping.txt: {code} # destination can be empty NULL = {code} you can get AIOOBE by specifying NULL for either index or query data in the input form of analysis.jsp (and co. i.e. DocumentAnalysisRequestHandler and FieldAnalysisRequestHandler). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Down to 5
Hi Shalin, What about FastVectorHighlighter? https://issues.apache.org/jira/browse/SOLR-1268 If we're targeting an RC this week, I'd like to push it to 1.5 because there are no patches. But perhaps you think 13 votes makes it worth considering? Koji
[jira] Assigned: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-1268: Assignee: Koji Sekiguchi Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Fix Version/s: 1.5 Marking it for 1.5 because there are no patches. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Down to 5
+1. Grant Ingersoll wrote: Coming along: https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230versionId=12313351showOpenIssuesOnly=true If we can finish these up this week, I can generate RCs next week. Thoughts? -Grant
[jira] Commented: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761900#action_12761900 ] Koji Sekiguchi commented on SOLR-1489: -- Good catch, Otani-san! I can reproduce the problem with the data and the filter you attached when running it on Jetty. And thank you for opening the JIRA ticket in Jetty. Now that we are close to releasing 1.4, I don't want this to be a blocker because this is not a Solr bug, as you said. You can run Solr on arbitrary servlet containers other than Jetty if you'd like. I'd like to keep this open and keep watching http://jira.codehaus.org/browse/JETTY-1122 . Thanks. A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Priority: Critical Attachments: error_utf8-example.xml, jettybugsample.war A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-1489: Assignee: Koji Sekiguchi A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Assignee: Koji Sekiguchi Priority: Critical Attachments: error_utf8-example.xml, jettybugsample.war A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1481) phps writer ignores omitHeader parameter
phps writer ignores omitHeader parameter Key: SOLR-1481 URL: https://issues.apache.org/jira/browse/SOLR-1481 Project: Solr Issue Type: Bug Components: search Reporter: Koji Sekiguchi Priority: Trivial Fix For: 1.4 My co-worker found this one. I'm expecting a patch will be attached soon by him. :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: svn commit: r819314 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java src/test/org/apache/solr/highlight/HighlighterTest.java
Also make both options default to true. If so, isn't this line (from HighlightComponent) needed to be also true by default? boolean rewrite = !(Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER)) Boolean.valueOf(req.getParams().get(HighlightParams.HIGHLIGHT_MULTI_TERM))); I think MultiTermQueries are converted to ConstantScoreQuery by rewrite? Koji markrmil...@apache.org wrote: Author: markrmiller Date: Sun Sep 27 13:58:30 2009 New Revision: 819314 URL: http://svn.apache.org/viewvc?rev=819314view=rev Log: SOLR-1221: Change Solr Highlighting to use the SpanScorer with MultiTerm expansion by default Modified: lucene/solr/trunk/CHANGES.txt lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java Modified: lucene/solr/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?rev=819314r1=819313r2=819314view=diff == --- lucene/solr/trunk/CHANGES.txt (original) +++ lucene/solr/trunk/CHANGES.txt Sun Sep 27 13:58:30 2009 @@ -503,8 +503,8 @@ 45. SOLR-1078: Fixes to WordDelimiterFilter to avoid splitting or dropping international non-letter characters such as non spacing marks. (yonik) -46. SOLR-825: Enables highlighting for range/wildcard/fuzzy/prefix queries if using hl.usePhraseHighlighter=true -and hl.highlightMultiTerm=true. (Mark Miller) +46. SOLR-825, SOLR-1221: Enables highlighting for range/wildcard/fuzzy/prefix queries if using hl.usePhraseHighlighter=true +and hl.highlightMultiTerm=true. Also make both options default to true. (Mark Miller) 47. SOLR-1174: Fix Logging admin form submit url for multicore. (Jacob Singh via shalin) Modified: lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java URL: http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java?rev=819314r1=819313r2=819314view=diff == --- lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java (original) +++ lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java Sun Sep 27 13:58:30 2009 @@ -144,7 +144,7 @@ */ private QueryScorer getSpanQueryScorer(Query query, String fieldName, TokenStream tokenStream, SolrQueryRequest request) throws IOException { boolean reqFieldMatch = request.getParams().getFieldBool(fieldName, HighlightParams.FIELD_MATCH, false); -Boolean highlightMultiTerm = request.getParams().getBool(HighlightParams.HIGHLIGHT_MULTI_TERM); +Boolean highlightMultiTerm = request.getParams().getBool(HighlightParams.HIGHLIGHT_MULTI_TERM, true); if(highlightMultiTerm == null) { highlightMultiTerm = false; } @@ -306,8 +306,9 @@ } Highlighter highlighter; -if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER))) { - // wrap CachingTokenFilter around TokenStream for reuse +if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER, true))) { + // TODO: this is not always necessary - eventually we would like to avoid this wrap + // when it is not needed. 
tstream = new CachingTokenFilter(tstream); // get highlighter Modified: lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java URL: http://svn.apache.org/viewvc/lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java?rev=819314r1=819313r2=819314view=diff == --- lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java (original) +++ lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java Sun Sep 27 13:58:30 2009 @@ -585,6 +585,7 @@ args.put(hl.fl, t_text); args.put(hl.fragsize, 40); args.put(hl.snippets, 10); +args.put(hl.usePhraseHighlighter, false); TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory( standard, 0, 200, args);
[jira] Resolved: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others
[ https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1423. -- Resolution: Fixed Committed revision 816502. Thanks, Uwe! Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others Key: SOLR-1423 URL: https://issues.apache.org/jira/browse/SOLR-1423 Project: Solr Issue Type: Task Components: Analysis Affects Versions: 1.4 Reporter: Uwe Schindler Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: SOLR-1423-FieldType.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch Because of some backwards compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter API a little bit. Tokenizer now only has a input field of type java.io.Reader (as before the CharStream code). To correct offsets, it is now needed to call the Tokenizer.correctOffset(int) method, which delegates to the CharStream (if input is subclass of CharStream), else returns an uncorrected offset. Normally it is enough to change all occurences of input.correctOffset() to this.correctOffset() in Tokenizers. It should also be checked, if custom Tokenizers in Solr do correct their offsets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others
[ https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756923#action_12756923 ] Koji Sekiguchi commented on SOLR-1423: -- The patch looks good! Will commit shortly. Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others Key: SOLR-1423 URL: https://issues.apache.org/jira/browse/SOLR-1423 Project: Solr Issue Type: Task Components: Analysis Affects Versions: 1.4 Reporter: Uwe Schindler Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: SOLR-1423-FieldType.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch Because of some backwards compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter API a little bit. Tokenizer now only has a input field of type java.io.Reader (as before the CharStream code). To correct offsets, it is now needed to call the Tokenizer.correctOffset(int) method, which delegates to the CharStream (if input is subclass of CharStream), else returns an uncorrected offset. Normally it is enough to change all occurences of input.correctOffset() to this.correctOffset() in Tokenizers. It should also be checked, if custom Tokenizers in Solr do correct their offsets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.