[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596925#action_12596925 ]

Otis Gospodnetic commented on SOLR-556:
---
Lars - could you please try the patch in SOLR-553 and see if it fixes the problem you described here?

Highlighting of multi-valued fields returns snippets which span multiple different values
-
Key: SOLR-556
URL: https://issues.apache.org/jira/browse/SOLR-556
Project: Solr
Issue Type: Bug
Components: highlighter
Affects Versions: 1.3
Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Priority: Minor
Attachments: solr-highlight-multivalued.patch

When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values foo and bar and search term ba the highlighter will create the snippet foo<em>ba</em>r. Furthermore it sometimes returns smaller snippets than it should, e.g. with value foobar and search term oo it will create the snippet <em>oo</em> regardless of hl.fragsize. I have been unable to determine the real cause for this, or indeed what actually goes on at all.

To reproduce the problem, I've used the following steps:
* create an index with multi-valued fields, one document should have at least 3 values for these fields (in my case strings of length between 5 and 15 Japanese characters -- as far as I can tell plain old ASCII should produce the same effect though)
* search for part of a value in such a field with highlighting enabled, the additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result though)
* highlighted snippets should show effects described above

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
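To make the two symptoms in the SOLR-556 report concrete, here is a toy sketch in Python. It is not Solr's highlighter and makes no claim about the actual cause, which the reporter says is unknown. The function names are hypothetical, and the "concatenated" variant only reproduces the reported output shape for comparison with the per-value result one would expect.

```python
def highlight_per_value(values, term, pre="<em>", post="</em>"):
    """Expected behaviour: each value is highlighted on its own,
    so a snippet can never contain text from two different values."""
    return [v.replace(term, pre + term + post) for v in values if term in v]


def highlight_concatenated(values, term, pre="<em>", post="</em>"):
    """Toy reproduction of the reported symptom: treating all values as
    one string lets a snippet span a value boundary. This is an assumption
    made for illustration, not a description of Solr internals."""
    joined = "".join(values)
    return joined.replace(term, pre + term + post)


# Values "foo" and "bar", search term "ba":
print(highlight_concatenated(["foo", "bar"], "ba"))  # foo<em>ba</em>r  (reported)
print(highlight_per_value(["foo", "bar"], "ba"))     # ['<em>ba</em>r']  (expected)

# Value "foobar", search term "oo": the whole value fits in hl.fragsize=70,
# so the expected snippet keeps the surrounding text.
print(highlight_per_value(["foobar"], "oo"))         # ['f<em>oo</em>bar']
```

Comparing the two outputs against what a patched build returns may be a quick way to check whether a fix addresses both symptoms.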
[jira] Commented: (SOLR-491) highlight doesn't work with range search
[ https://issues.apache.org/jira/browse/SOLR-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596926#action_12596926 ]

Otis Gospodnetic commented on SOLR-491:
---
Using a highlighter to determine which field(s) matched the query seems like the wrong thing to do. Maybe the highlighter brings you closer to having this information, but it feels like a hack to me.

However, Xuesong, please try the patch in SOLR-553 and see if you can get by without getting the error. SOLR-553 makes use of LUCENE-794, which should handle ConstantScoreRangeQuery. If it does work for you, please close this issue or leave a comment and we'll close it. Thanks.

highlight doesn't work with range search
Key: SOLR-491
URL: https://issues.apache.org/jira/browse/SOLR-491
Project: Solr
Issue Type: Bug
Components: highlighter
Affects Versions: 1.3
Environment: windows xp sp2 jboss4.0.5
Reporter: Xuesong Luo
Priority: Minor

I need to do a range search on an integer field, which is defined as type sint. It works fine without highlighting. However, if I turn on highlighting, I get the following error:

2008-02-25 16:54:53,524 ERROR [STDERR] Feb 25, 2008 4:54:53 PM org.apache.solr.core.SolrCore execute
INFO: [xluo] /select/rows=10start=0hl.fl=bookCountindent=onq=bookCount:5hl=trueversion=2.2 0 0
2008-02-25 16:54:53,524 ERROR [STDERR] Feb 25, 2008 4:54:53 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NumberFormatException: For input string:
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:403)
at java.lang.Long.parseLong(Long.java:461)
at org.apache.solr.util.NumberUtils.long2sortableStr(NumberUtils.java:52)
at org.apache.solr.schema.SortableLongField.toInternal(SortableLongField.java:49)
at org.apache.solr.schema.FieldType$DefaultAnalyzer$1.next(FieldType.java:315)
at org.apache.solr.highlight.TokenOrderingFilter.next(SolrHighlighter.java:439)
at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)

I also tried a range search on a date field and got a similar error when highlighting was enabled. I posted the problem on the solr user list; here is what Hoss said:

--
I'm not sure if I really understand what it would mean to highlight a numeric field, highlighting a range query probably won't ever work because of the way range queries are implemented in Solr ... but at the very least there should be a better error message in this case. (and the case of a simple single value numeric lookup should probably work)
--

The reason I need to highlight the numeric or date field is that I have to loop through the search results to apply a role permission check on those fields. If the searcher doesn't have permission to see the numeric/date field of the user in the search result list, that field should be set to null when returned. If the searcher doesn't have permission on all matching fields, then the whole record should not be returned. How can I find out which field is the matching field if searching on multiple fields? The only easy way I can think of is: if the field is highlighted, it's a matching field.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg09239.html
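The permission check described in the SOLR-491 report can be sketched as follows. This is only an illustration of the workaround the reporter proposes (which Otis considers a hack), written in Python against an already-parsed response. It assumes the highlighting section maps document id to field name to a list of snippets, and it reads "doesn't have permission on all matching fields" as "lacks permission on every matching field". The function names, the id field name and the sample data are hypothetical.

```python
def matched_fields(highlighting):
    """Treat any field that came back with at least one snippet as a match."""
    return {
        doc_id: {field for field, snippets in fields.items() if snippets}
        for doc_id, fields in highlighting.items()
    }


def apply_permissions(docs, highlighting, allowed_fields, id_field="id"):
    """Null out fields the searcher may not see; drop a record entirely
    when none of its matching fields are permitted (assumed reading)."""
    matches = matched_fields(highlighting)
    visible = []
    for doc in docs:
        hits = matches.get(doc[id_field], set())
        if hits and not (hits & allowed_fields):
            continue  # every matching field is off-limits for this searcher
        visible.append({
            name: (value if name == id_field or name in allowed_fields else None)
            for name, value in doc.items()
        })
    return visible


docs = [
    {"id": "1", "title": "a", "bookCount": 5},
    {"id": "2", "title": "b", "bookCount": 7},
]
highlighting = {
    "1": {"title": ["<em>a</em>"]},
    "2": {"bookCount": ["<em>7</em>"]},
}
print(apply_permissions(docs, highlighting, allowed_fields={"title"}))
# [{'id': '1', 'title': 'a', 'bookCount': None}]
```

The sketch only works once highlighting succeeds for the fields in question, which is exactly what fails for range queries in the report above.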
[jira] Updated: (SOLR-255) RemoteSearchable for Solr(use RMI)
[ https://issues.apache.org/jira/browse/SOLR-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated SOLR-255:
--
Assignee: Otis Gospodnetic

I re-read all the comments here and from what I understand, this patch doesn't really add anything that SOLR-303 hasn't already given us. Is that correct? If that's correct, I'll close this issue.

RemoteSearchable for Solr(use RMI)
--
Key: SOLR-255
URL: https://issues.apache.org/jira/browse/SOLR-255
Project: Solr
Issue Type: Improvement
Components: search
Reporter: Toru Matsuzawa
Assignee: Otis Gospodnetic
Attachments: solr-multi20070606.zip, solr-multi20070724..zip

I experimentally implemented Lucene's RemoteSearchable for Solr. I referred to FederatedSearch and used RMI. Two or more Searchers can be referenced through SolrIndexSearcher. These query-only indexes can be specified in solrconfig.xml by listing them under a searchIndex tag:

<searchIndex>
  <lst>E:\sample\data1</lst>
  <lst>E:\sample\data2</lst>
  <lst>rmi://localhost</lst>
</searchIndex>

The index in the dataDir is also used as Solr's default index for updates and queries. When a document in an index listed under searchIndex is updated, that document is deleted from that index and the updated document is stored in the index in the dataDir.

SolrRemoteSearchable (the searcher for remote access) is started from SolrCore by specifying <remoteSearcher>true</remoteSearcher> in solrconfig.xml. (It is registered in RMI.) (-Djava.security.policy should be set when you start the VM.)

Not all of the operational cases have been tested because Solr has so many features. Moreover, no unit tests have been written because I built this through trial and error. Some changes are required in Lucene to run this. I need your comments on this, although it might be hard to review without unit tests. I am especially concerned about the following:
- Am I on the right track with this issue?
- Is the extent of the Lucene modifications tolerable?
- Are there any ideas for implementing this feature without modifying Lucene?
- Does this idea contribute to improving Solr?
- This implementation may partially overlap with Multiple Solr Cores. What should be done?
- Are there any other considerations about this issue which I have overlooked?
[jira] Commented: (SOLR-319) changes SynonymFilterFactoryto Analyze synonyms file
[ https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596938#action_12596938 ]

Otis Gospodnetic commented on SOLR-319:
---
I think patch is ripe for a commit. Koji, want to commit your own baby? :)

changes SynonymFilterFactoryto Analyze synonyms file
--
Key: SOLR-319
URL: https://issues.apache.org/jira/browse/SOLR-319
Project: Solr
Issue Type: Improvement
Reporter: Koji Sekiguchi
Priority: Minor
Attachments: SOLR-319.patch, SOLR-319.patch, SOLR-319.patch

WHAT:
Currently, SynonymFilterFactory works very well with N-gram tokenizer (CJKTokenizer, for example). But we have to take care of the statement in synonyms.txt. For example, if I use CJKTokenizer (work as bi-gram for CJK chars) and want C1C2C3 maps to C4C5C6, I have to write the rule as follows:

C1C2 C2C3 => C4C5 C5C6

But I want to write it C1C2C3=>C4C5C6. This patch allows it. It is also helpful for sharing synonyms.txt.

HOW:
tokenFactory attribute is added to <filter class="solr.SynonymFilterFactory"/>. If the attribute is specified, SynonymFilterFactory uses the TokenizerFactory to create Tokenizer. Then SynonymFilterFactory uses the Tokenizer to get tokens from the rules in synonyms.txt file.

sample-1: CJKTokenizer

<fieldtype name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ja.txt" ignoreCase="true" expand="true" tokenFactory="solr.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

sample-2: NGramTokenizer

<fieldtype name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
    <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ngram.txt" ignoreCase="true" expand="true" tokenFactory="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

backward compatibility:
Yes. If you omit tokenFactory attribute from <filter class="solr.SynonymFilterFactory"/> tag, it works as usual.
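The rule rewriting that SOLR-319 describes can be illustrated with a small Python sketch. It is not the patch itself: it assumes a plain character bi-gram tokenizer standing in for CJKTokenizer, handles only a single "=>" mapping per rule, and uses ASCII letters in place of the CJK characters C1..C6. The function names are hypothetical.

```python
def bigrams(text):
    """Overlapping character bi-grams, e.g. 'abc' -> ['ab', 'bc']."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def expand_rule(rule, tokenize=bigrams):
    """Run both sides of a 'lhs => rhs' synonym rule through the tokenizer,
    which is what lets the rule be written in its natural, untokenized form."""
    lhs, rhs = (side.strip() for side in rule.split("=>", 1))
    return f"{' '.join(tokenize(lhs))} => {' '.join(tokenize(rhs))}"


# 'abc' and 'def' stand in for C1C2C3 and C4C5C6.
print(expand_rule("abc => def"))  # ab bc => de ef
```

Passing a different tokenize function plays the role of the tokenFactory attribute: the same synonyms.txt can then be shared between field types that tokenize differently.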
[jira] Issue Comment Edited: (SOLR-319) changes SynonymFilterFactoryto Analyze synonyms file
[ https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12596938#action_12596938 ] otis edited comment on SOLR-319 at 5/14/08 1:51 PM: I think this patch is ripe for a commit. Koji, want to commit your own baby? :) was (Author: otis): I think patch is ripe for a commit. Koji, want to commit your own baby? :)
[jira] Updated: (SOLR-470) DateField throws error on iso8601 date
[ https://issues.apache.org/jira/browse/SOLR-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-470:
--
Attachment: SOLR-470.patch

builds on the previous patch to include fixes for the formatting issues, code clean up, better tests (LegacyDateFieldTest is now standalone, and can be dropped into Solr 1.2 to regress against the old DateField), and better documentation ... addresses SOLR-552 and SOLR-544 as well.

I think this is ready to commit, but I'd like to see some positive feedback before proceeding -- both on the implementation, and the documentation changes. (something approximating the class docs for LegacyDateField should show up in CHANGES.txt's upgrading section as well).

DateField throws error on iso8601 date
--
Key: SOLR-470
URL: https://issues.apache.org/jira/browse/SOLR-470
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 1.3
Reporter: patrick o'leary
Assignee: Hoss Man
Fix For: 1.3
Attachments: SOLR-470.patch, SOLR-470.patch, SOLR-470.patch, SOLR-470.patch

A correct ISO 8601 date, 2006-01-01T12:01:00Z, throws an error:

Unparseable date: 2006-01-01T12:01:00Z
at org.apache.solr.schema.DateField.toObject(DateField.java:173)
at org.apache.solr.schema.DateField.toObject(DateField.java:83)

The ThreadLocalDateFormat requires fractional seconds (yyyy-MM-dd'T'HH:mm:ss.SSS) to parse with SimpleDateFormat, whereas the Javadoc states they're optional.
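The behaviour the SOLR-470 report asks for, fractional seconds being optional, can be sketched in Python as below. This is not the DateField code from the patch. It is a minimal illustration that tries a pattern with fractional seconds first and falls back to one without, assuming input always ends in a literal Z for UTC. The function name is hypothetical.

```python
from datetime import datetime, timezone

# Tried in order: with fractional seconds, then without.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")


def parse_utc_date(text):
    """Parse 'yyyy-MM-ddTHH:mm:ss[.SSS]Z', treating the fraction as optional."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unparseable date: {text}")


print(parse_utc_date("2006-01-01T12:01:00Z").isoformat())
# 2006-01-01T12:01:00+00:00
print(parse_utc_date("2006-01-01T12:01:00.123Z").microsecond)
# 123000
```

A single fixed pattern that includes .SSS would reject the first input, which is the failure reported in the issue.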
[jira] Commented: (SOLR-319) changes SynonymFilterFactoryto Analyze synonyms file
[ https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12596989#action_12596989 ] Koji Sekiguchi commented on SOLR-319: - Thanks, Otis. I will commit this in a week if there is no objection. BTW, I cannot assign myself on JIRA... looks like I have no permission?
Adding Koji to JIRA
I *think* Yonik or Hoss have to manually give you JIRA privileges... Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Forwarded Message From: Koji Sekiguchi (JIRA) [EMAIL PROTECTED] Koji Sekiguchi commented on SOLR-319: - Thanks, Otis. I will commit this in a week if there is no objection. BTW, I cannot assign myself on JIRA... looks like I have no permission?
[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12596999#action_12596999 ] Lars Kotthoff commented on SOLR-556: I've applied SOLR-553 and confirmed that this problem is not fixed, regardless of the setting of usePhraseHighlighter.
[jira] Commented: (SOLR-470) DateField throws error on iso8601 date
[ https://issues.apache.org/jira/browse/SOLR-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597006#action_12597006 ]

Noble Paul commented on SOLR-470:
-
The patch did not apply properly on trunk. SVN could not fetch the given version of LegacyDateField. The implementation looks fine.
[jira] Updated: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Kotthoff updated SOLR-556: --- Attachment: solr-highlight-multivalued-example.xml Attaching test file with example document, relevant part of schema.xml and example query.