[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

2008-05-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596925#action_12596925
 ] 

Otis Gospodnetic commented on SOLR-556:
---

Lars - could you please try the patch in SOLR-553 and see if it fixes the 
problem you described here?


 Highlighting of multi-valued fields returns snippets which span multiple 
 different values
 -

 Key: SOLR-556
 URL: https://issues.apache.org/jira/browse/SOLR-556
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Priority: Minor
 Attachments: solr-highlight-multivalued.patch


 When highlighting multi-valued fields, the highlighter sometimes returns 
 snippets which span multiple values, e.g. with values "foo" and "bar" and 
 search term "ba" the highlighter will create the snippet "foo<em>ba</em>r". 
 Furthermore it sometimes returns smaller snippets than it should, e.g. with 
 value "foobar" and search term "oo" it will create the snippet "<em>oo</em>" 
 regardless of hl.fragsize.
 I have been unable to determine the real cause for this, or indeed what 
 actually goes on at all. To reproduce the problem, I've used the following 
 steps:
 * create an index with multi-valued fields, one document should have at least 
 3 values for these fields (in my case strings of length between 5 and 15 
 Japanese characters -- as far as I can tell plain old ASCII should produce 
 the same effect though)
 * search for part of a value in such a field with highlighting enabled, the 
 additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, 
 hl.mergeContiguous=true (changing the parameters does not seem to have any 
 effect on the result though)
 * highlighted snippets should show effects described above
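
The report does not identify the cause, so the following is only an illustration of the symptom, not a diagnosis. It is a minimal Python sketch (all function names are hypothetical and nothing here is Solr code) contrasting a highlighter that treats the values of a multi-valued field as one continuous string with one that handles each value separately. With values "foo" and "bar" and the term "ba", the first reproduces the cross-value snippet described above.

```python
def highlight(text: str, term: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every occurrence of term in text with the highlight tags."""
    return text.replace(term, pre + term + post)


def highlight_joined(values: list[str], term: str) -> str:
    """Highlight over all values glued together, ignoring value boundaries."""
    return highlight("".join(values), term)


def highlight_per_value(values: list[str], term: str) -> list[str]:
    """Highlight each value on its own and keep only the values that match."""
    return [highlight(v, term) for v in values if term in v]


values = ["foo", "bar"]
print(highlight_joined(values, "ba"))     # snippet crosses the value boundary
print(highlight_per_value(values, "ba"))  # snippet stays inside one value
```

The per-value variant is the behaviour the reporter appears to expect: a snippet never contains text from a neighbouring value.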

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-491) highlight doesn't work with range search

2008-05-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596926#action_12596926
 ] 

Otis Gospodnetic commented on SOLR-491:
---

Using a highlighter to determine which field(s) matched the query seems like 
the wrong thing to do.  Maybe the highlighter brings you closer to having this 
information, but it feels like a hack to me.

However, Xuesong, please try the patch in SOLR-553 and see if you can get by 
without getting the error.  SOLR-553 makes use of LUCENE-794, which should 
handle ConstantScoreRangeQuery.  If it does work for you, please close this 
issue or leave a comment and we'll close it.  Thanks.


 highlight doesn't work with range search
 

 Key: SOLR-491
 URL: https://issues.apache.org/jira/browse/SOLR-491
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.3
 Environment: windows xp sp2  jboss4.0.5 
Reporter: Xuesong Luo
Priority: Minor

 I need to do a range search on an integer field, which is defined as type sint. 
 It works fine without highlighting. However, when I turn on highlighting, I get 
 the following error:
 2008-02-25 16:54:53,524 ERROR [STDERR] Feb 25, 2008 4:54:53 PM org.apache.solr.core.SolrCore execute
 INFO: [xluo] /select/rows=10&start=0&hl.fl=bookCount&indent=on&q=bookCount:5&hl=true&version=2.2 0 0
 2008-02-25 16:54:53,524 ERROR [STDERR] Feb 25, 2008 4:54:53 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.NumberFormatException: For input string:
  at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
  at java.lang.Long.parseLong(Long.java:403)
  at java.lang.Long.parseLong(Long.java:461)
  at org.apache.solr.util.NumberUtils.long2sortableStr(NumberUtils.java:52)
  at org.apache.solr.schema.SortableLongField.toInternal(SortableLongField.java:49)
  at org.apache.solr.schema.FieldType$DefaultAnalyzer$1.next(FieldType.java:315)
  at org.apache.solr.highlight.TokenOrderingFilter.next(SolrHighlighter.java:439)
  at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
  
 I also tried a range search on a date field and got a similar error when using 
 highlighting. I posted the problem to the solr-user list; here is what Hoss said:
 --
 I'm not sure if i really understand what it would mean to highlight a numeric 
 field,  hilighting a range query probably won't ever work 
 because of the way range queries are implemented in Solr ... but at the very 
 least there should be a better error message in this case.  (and the 
 case of a simple single value numeric lookup should probably work)
 --
 The reason I need to highlight the numeric or date field is that I have to loop 
 through the search results to apply a role permission check on those fields. If 
 the searcher doesn't have permission to see the numeric/date field of the 
 user in the search result list, that field should be set to null when 
 returned. If the searcher doesn't have permission on all matching fields, then 
 the whole record should not be returned. How can I find out which field is 
 the matching field when searching on multiple fields? The only easy way I can 
 think of is that if the field is highlighted, it's a matching field.
 http://www.mail-archive.com/[EMAIL PROTECTED]/msg09239.html
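
To make the use case concrete, here is a rough Python sketch of the post-processing the reporter describes, applied on the client side to an already parsed response. It assumes the highlighting section is a mapping of document id to field name to snippets; the function name, the `permitted` set and the sample documents are hypothetical. It also reads "doesn't have permission on all matching fields" as "has permission on none of the matching fields", which is an interpretation rather than something the report spells out.

```python
def filter_results(docs, highlighting, permitted, id_field="id"):
    """Apply the role check using highlighted fields as the 'matching' fields.

    docs         -- list of result documents (dicts)
    highlighting -- {doc_id: {field_name: [snippets]}}
    permitted    -- set of field names the searcher may see
    """
    kept = []
    for doc in docs:
        matched = set(highlighting.get(doc[id_field], {}))
        # Drop the record when none of the matching fields is visible.
        if matched and not (matched & permitted):
            continue
        # Null out every field the searcher may not see (the id is kept).
        kept.append({
            name: (value if name in permitted or name == id_field else None)
            for name, value in doc.items()
        })
    return kept


docs = [
    {"id": "1", "name": "a", "bookCount": 5},
    {"id": "2", "name": "b", "bookCount": 7},
]
highlighting = {
    "1": {"bookCount": ["<em>5</em>"]},
    "2": {"name": ["<em>b</em>"]},
}
print(filter_results(docs, highlighting, permitted={"name"}))
```

This only works if the highlighter can actually process every queried field, which is exactly what fails for the range query on the sint field.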




[jira] Updated: (SOLR-255) RemoteSearchable for Solr(use RMI)

2008-05-14 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated SOLR-255:
--

Assignee: Otis Gospodnetic

I re-read all the comments here and from what I understand, this patch doesn't 
really add anything that SOLR-303 hasn't already given us.

Is that correct?  If that's correct, I'll close this issue.


 RemoteSearchable for Solr(use RMI)
 --

 Key: SOLR-255
 URL: https://issues.apache.org/jira/browse/SOLR-255
 Project: Solr
  Issue Type: Improvement
  Components: search
Reporter: Toru Matsuzawa
Assignee: Otis Gospodnetic
 Attachments: solr-multi20070606.zip, solr-multi20070724..zip


 I experimentally implemented Lucene's RemoteSearchable for Solr.
 I referred to FederatedSearch and used RMI.
 Two or more Searchers can be referenced from SolrIndexSearcher.
 These query-only indexes can be specified in solrconfig.xml
 by listing them under a <searchIndex> tag:
   <searchIndex>
     <lst>E:\sample\data1</lst>
     <lst>E:\sample\data2</lst>
     <lst>rmi://localhost</lst>
   </searchIndex>
 The index in the dataDir is also used as Solr's default index
 for updates and queries.
 When a document in an index specified under <searchIndex> is updated,
 that document is deleted from that index and the updated document is
 stored in the index in the dataDir.
 SolrRemoteSearchable (the searcher for remote access) is started from
 SolrCore by specifying <remoteSearcher>true</remoteSearcher> in
 solrconfig.xml. (It is registered in RMI.)
 (-Djava.security.policy should be set when you start the VM.)
 Not all operational cases have been tested
 because Solr has so many features.
 Moreover, no unit tests have been written
 because I built this through trial and error.
 Some changes are required in Lucene to run this.
 I need your comments on this, although it might be hard to review
 without unit tests.
 I am especially concerned about the following:
 - Am I on the right track with this issue?
 - Is the extent of the Lucene modifications tolerable?
 - Are there any ideas for implementing this feature without modifying Lucene?
 - Does this idea contribute to improving Solr?
 - This implementation may partially overlap with Multiple Solr Cores.
   What should be done?
 - Are there any other considerations about this issue that I have
   overlooked?




[jira] Commented: (SOLR-319) changes SynonymFilterFactoryto Analyze synonyms file

2008-05-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596938#action_12596938
 ] 

Otis Gospodnetic commented on SOLR-319:
---

I think patch is ripe for a commit.  Koji, want to commit your own baby? :)


 changes SynonymFilterFactoryto Analyze synonyms file
 --

 Key: SOLR-319
 URL: https://issues.apache.org/jira/browse/SOLR-319
 Project: Solr
  Issue Type: Improvement
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: SOLR-319.patch, SOLR-319.patch, SOLR-319.patch


 WHAT:
 Currently, SynonymFilterFactory works very well with N-gram tokenizers 
 (CJKTokenizer, for example).
 But we have to take care with how rules are written in synonyms.txt.
 For example, if I use CJKTokenizer (which works as a bi-gram tokenizer for 
 CJK chars) and want C1C2C3 to map to C4C5C6,
 I have to write the rule as follows:
 C1C2 C2C3 => C4C5 C5C6
 But I want to write it as C1C2C3=>C4C5C6. This patch allows it. It is also 
 helpful for sharing synonyms.txt.
 HOW:
 A tokenFactory attribute is added to <filter 
 class="solr.SynonymFilterFactory"/>.
 If the attribute is specified, SynonymFilterFactory uses the TokenizerFactory 
 to create a Tokenizer.
 Then SynonymFilterFactory uses the Tokenizer to get tokens from the rules in 
 the synonyms.txt file.
 sample-1: CJKTokenizer
 <fieldtype name="text_cjk" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.CJKTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ja.txt"
             ignoreCase="true" expand="true" tokenFactory="solr.CJKTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.CJKTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldtype>
 sample-2: NGramTokenizer
 <fieldtype name="text_ngram" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
     <filter class="solr.SynonymFilterFactory" synonyms="ngram_synonym_test_ngram.txt"
             ignoreCase="true" expand="true"
             tokenFactory="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="2"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldtype>
 backward compatibility:
 Yes. If you omit the tokenFactory attribute from the <filter 
 class="solr.SynonymFilterFactory"/> tag, it works as usual.
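
The effect of running the tokenizer over each side of a synonym rule can be sketched in a few lines of Python. This is not the patch itself: `bigrams` is a hypothetical stand-in for what a bi-gram tokenizer produces, and `expand_rule` is an assumed helper that rewrites a compact rule (using the `=>` separator from synonyms.txt) into the bi-gram form that currently has to be written by hand. The letters A to F stand in for the characters C1 to C6 of the description.

```python
def bigrams(text: str) -> list[str]:
    """Overlapping character bi-grams; a single character is returned as-is."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def expand_rule(rule: str) -> str:
    """Tokenize both sides of an 'lhs=>rhs' rule into bi-grams."""
    lhs, rhs = (side.strip() for side in rule.split("=>"))
    return " ".join(bigrams(lhs)) + " => " + " ".join(bigrams(rhs))


print(expand_rule("ABC=>DEF"))
```

With this kind of expansion done by the factory, the same synonyms.txt can be shared between field types that use different tokenizers.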




[jira] Issue Comment Edited: (SOLR-319) changes SynonymFilterFactoryto Analyze synonyms file

2008-05-14 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596938#action_12596938
 ] 

otis edited comment on SOLR-319 at 5/14/08 1:51 PM:


I think this patch is ripe for a commit.  Koji, want to commit your own baby? :)


  was (Author: otis):
I think patch is ripe for a commit.  Koji, want to commit your own baby? :)

  



[jira] Updated: (SOLR-470) DateField throws error on iso8601 date

2008-05-14 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-470:
--

Attachment: SOLR-470.patch

This builds on the previous patch to include fixes for the formatting issues, 
code clean up, better tests (LegacyDateFieldTest is now standalone, and can be 
dropped into Solr 1.2 to regress against the old DateField), and better 
documentation. It addresses SOLR-552 and SOLR-544 as well.


I think this is ready to commit, but I'd like to see some positive feedback 
before proceeding -- both on the implementation, and the documentation changes. 
(Something approximating the class docs for LegacyDateField should show up in 
the upgrading section of CHANGES.txt as well.)

 DateField throws error on iso8601 date
 --

 Key: SOLR-470
 URL: https://issues.apache.org/jira/browse/SOLR-470
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: patrick o'leary
Assignee: Hoss Man
 Fix For: 1.3

 Attachments: SOLR-470.patch, SOLR-470.patch, SOLR-470.patch, 
 SOLR-470.patch


 A correct ISO 8601 date 2006-01-01T12:01:00Z throws an error.
 Unparseable date: 2006-01-01T12:01:00Z at 
 org.apache.solr.schema.DateField.toObject(DateField.java:173) at 
 org.apache.solr.schema.DateField.toObject(DateField.java:83)
 The ThreadLocalDateFormat requires fractional seconds 
 yyyy-MM-dd'T'HH:mm:ss.SSS
 to parse with simple date format, whereas the Javadoc states they're optional.
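
For illustration only, here is a small Python sketch of the behaviour the Javadoc promises: a parser that accepts the timestamp with or without fractional seconds. It does not reflect how DateField or the attached patch is implemented; the function name and the fallback order are assumptions.

```python
from datetime import datetime, timezone

# Patterns tried in order: with fractional seconds first, then without.
_FORMATS = ("%Y-%m-%dT%H:%M:%S.%fZ", "%Y-%m-%dT%H:%M:%SZ")


def parse_utc_timestamp(text: str) -> datetime:
    """Parse 'yyyy-MM-ddTHH:mm:ss[.SSS]Z' where the fraction is optional."""
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"Unparseable date: {text}")


print(parse_utc_timestamp("2006-01-01T12:01:00Z"))
print(parse_utc_timestamp("2006-01-01T12:01:00.500Z"))
```

A single fixed pattern that includes the fraction would reject the first input, which mirrors the error in the report.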




[jira] Commented: (SOLR-319) changes SynonymFilterFactoryto Analyze synonyms file

2008-05-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596989#action_12596989
 ] 

Koji Sekiguchi commented on SOLR-319:
-

Thanks, Otis. I will commit this in a week if there is no objection.
BTW, I cannot assign myself on JIRA... looks like I have no permission?




Adding Koji to JIRA

2008-05-14 Thread Otis Gospodnetic
I *think* Yonik or Hoss have to manually give you JIRA privileges...


Otis 
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Forwarded Message 
 From: Koji Sekiguchi (JIRA) [EMAIL PROTECTED]

 Koji Sekiguchi commented on SOLR-319:
 -
 
 Thanks, Otis. I will commit this in a week if there is no objection.
 BTW, I cannot assign myself on JIRA... looks like I have no permission?


[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

2008-05-14 Thread Lars Kotthoff (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596999#action_12596999
 ] 

Lars Kotthoff commented on SOLR-556:


I've applied SOLR-553 and confirmed that this problem is not fixed, regardless 
of the setting of usePhraseHighlighter.




[jira] Commented: (SOLR-470) DateField throws error on iso8601 date

2008-05-14 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597006#action_12597006
 ] 

Noble Paul commented on SOLR-470:
-

The patch did not apply properly on trunk. SVN could not fetch the given version 
of LegacyDateField.

The implementation looks fine




[jira] Updated: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values

2008-05-14 Thread Lars Kotthoff (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Kotthoff updated SOLR-556:
---

Attachment: solr-highlight-multivalued-example.xml

Attaching test file with example document, relevant part of schema.xml and 
example query.

 Highlighting of multi-valued fields returns snippets which span multiple 
 different values
 -

 Key: SOLR-556
 URL: https://issues.apache.org/jira/browse/SOLR-556
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.3
 Environment: Tomcat 5.5
Reporter: Lars Kotthoff
Priority: Minor
 Attachments: solr-highlight-multivalued-example.xml, 
 solr-highlight-multivalued.patch

