[jira] Assigned: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-10 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-1213:
---

Assignee: Doron Cohen

 MultiFieldQueryParser ignores slop parameter
 

 Key: LUCENE-1213
 URL: https://issues.apache.org/jira/browse/LUCENE-1213
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Reporter: Trejkaz
Assignee: Doron Cohen
 Attachments: multifield-fix.patch


 MultiFieldQueryParser.getFieldQuery(String, String, int) calls 
 super.getFieldQuery(String, String), thus obliterating any slop parameter 
 present in the query.
 It should probably be changed to call super.getFieldQuery(String, String, 
 int), except doing only that will result in a recursive loop which is a 
 side-effect of what may be a deeper problem in MultiFieldQueryParser -- 
 getFieldQuery(String, String, int) is documented as delegating to 
 getFieldQuery(String, String), yet what it actually does is the exact 
 opposite.  This also causes problems for subclasses which need to override 
 getFieldQuery(String, String) to provide different behaviour.
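The delegation fix described above can be sketched with simplified stand-ins. These are NOT the real Lucene QueryParser classes; the class names and the string "query" representation below are assumptions for illustration only. The point is the direction of delegation: the three-argument overload should call the (possibly subclass-overridden) two-argument one and then re-apply the slop, rather than the other way around.

```java
// Simplified stand-ins for the delegation pattern described above -- not the
// real Lucene QueryParser API; names and the "field:text~slop" string form
// are illustrative assumptions only.
class BaseParser {
    String getFieldQuery(String field, String text) {
        return field + ":" + text;
    }

    String getFieldQuery(String field, String text, int slop) {
        // Documented direction: the 3-arg form delegates to the 2-arg form,
        // then applies the slop on top.
        return getFieldQuery(field, text) + "~" + slop;
    }
}

class MultiFieldParser extends BaseParser {
    // The bug: the 3-arg override called the superclass 2-arg form directly,
    // silently dropping the slop. The fix: delegate to the (possibly
    // subclass-overridden) 2-arg form, then re-apply the slop -- and never
    // call the 3-arg form from the 2-arg form, which would recurse forever.
    @Override
    String getFieldQuery(String field, String text, int slop) {
        String q = getFieldQuery(field, text);
        return q + "~" + slop;  // slop survives
    }
}

public class SlopDemo {
    public static void main(String[] args) {
        System.out.println(new MultiFieldParser().getFieldQuery("body", "quick fox", 3));
        // -> body:quick fox~3
    }
}
```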

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Updated: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter

2008-03-10 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-1213:


Attachment: multifield-fix.patch

Trejkaz, thanks for the patch.

Attached a slightly compacted fix (refactoring the slop-applying into a separate
method).
Also added a test that fails without this fix.

All tests pass; if there are no comments I will commit this in a day or two.





[jira] Commented: (LUCENE-1210) IndexWriter & ConcurrentMergeScheduler deadlock case if starting a merge hits an exception

2008-03-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576927#action_12576927
 ] 

Michael McCandless commented on LUCENE-1210:


Yes, I agree.  At some point soon we should do a 2.3.2 point release, and I'll 
port this issue back for that.

 IndexWriter & ConcurrentMergeScheduler deadlock case if starting a merge hits 
 an exception
 --

 Key: LUCENE-1210
 URL: https://issues.apache.org/jira/browse/LUCENE-1210
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3, 2.3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.4


 If you're using CMS (the default) and mergeInit hits an exception (eg
 OOME), we are not properly clearing IndexWriter's internal tracking of
 running merges.  This causes IW.close() to hang while it incorrectly
 waits for these non-started merges to finish.




[jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush

2008-03-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576941#action_12576941
 ] 

Michael McCandless commented on LUCENE-1208:


Agreed.  I'm thinking these issues should be ported to 2.3.2:

  LUCENE-1191
  LUCENE-1197
  LUCENE-1198
  LUCENE-1199
  LUCENE-1200
  LUCENE-1208 (this issue)
  LUCENE-1210



 Deadlock case in IndexWriter on exception just before flush
 ---

 Key: LUCENE-1208
 URL: https://issues.apache.org/jira/browse/LUCENE-1208
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3, 2.3.1
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.4

 Attachments: LUCENE-1208.patch


 If a document hits a non-aborting exception, eg something goes wrong
 in tokenStream.next(), and, that document had triggered a flush
 (due to RAM or doc count) then DocumentsWriter will deadlock because
 that thread marks the flush as pending but fails to clear it on
 exception.
 I have a simple test case showing this, and a fix fixing it.
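The deadlock mechanism quoted above follows a general pattern: a "pending" flag set before risky work must be cleared in a finally block, or every thread that waits on it hangs. A minimal sketch, using a simplified stand-in rather than the actual DocumentsWriter code:

```java
// Minimal sketch of the flush-pending pattern described above. This is a
// simplified stand-in, not the actual DocumentsWriter code: the flag set
// before the risky work must be cleared in a finally block, otherwise
// close() would wait forever for a flush that never started.
public class FlushFlagDemo {
    private boolean flushPending = false;

    boolean isFlushPending() {
        return flushPending;
    }

    void flush(boolean failLikeABadTokenStream) {
        flushPending = true;
        try {
            if (failLikeABadTokenStream) {
                // Corresponds to a non-aborting exception, e.g. from tokenStream.next()
                throw new RuntimeException("tokenStream.next() failed");
            }
            // ... write the segment ...
        } finally {
            flushPending = false;  // cleared even on exception -- no deadlock
        }
    }

    public static void main(String[] args) {
        FlushFlagDemo w = new FlushFlagDemo();
        try {
            w.flush(true);
        } catch (RuntimeException expected) {
            // the document failed, but the flush state was still cleared
        }
        System.out.println("flushPending after failure: " + w.isFlushPending());
        // -> flushPending after failure: false
    }
}
```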




[jira] Commented: (LUCENE-1210) IndexWriter & ConcurrentMergeScheduler deadlock case if starting a merge hits an exception

2008-03-10 Thread Michele Bini (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576924#action_12576924
 ] 

Michele Bini commented on LUCENE-1210:
--

Uhm, shouldn't the patch be committed in the 2.3 branch, too, as it affects 
2.3.1?





[jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush

2008-03-10 Thread Michele Bini (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576933#action_12576933
 ] 

Michele Bini commented on LUCENE-1208:
--

As with LUCENE-1210, shouldn't the patch be committed in the 2.3 branch, too, 
as it affects 2.3.1? Other issues, such as the speedups in LUCENE-1211, 
although useful, can be left out as they are not bugs. But fixes for deadlocks 
seem worthwhile for 2.3.x, too.





Re: [jira] Commented: (LUCENE-1026) Provide a simple way to concurrently access a Lucene index from multiple threads

2008-03-10 Thread Mark Miller

You make a good point. I think I will prob make this change.

Asgeir Frimannsson (JIRA) wrote:

 [ 
https://issues.apache.org/jira/browse/LUCENE-1026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576858#action_12576858
 ]

Asgeir Frimannsson commented on LUCENE-1026:


Is there any specific reason why this IndexAccessor is limited to FSDirectory 
based indexes? I see FSDirectory.getFile() is used as a unique key in the list 
of IndexAccessors in the factory. However, it seems more natural to use 
dir.getLockID() for this purpose. Then it would be possible to use a generic 
Directory rather than the file-system specific FSDirectory.
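The keying change suggested above can be sketched with stand-in types. These are NOT the real Lucene Directory/FSDirectory classes; the `getLockID()` shape and the factory names below are assumptions for illustration. Keying the accessor cache by a directory-provided lock ID instead of a filesystem File lets any Directory implementation participate:

```java
// Sketch of keying IndexAccessors by Directory.getLockID() instead of
// FSDirectory.getFile(), using stand-in types (not the real Lucene API).
import java.util.HashMap;
import java.util.Map;

abstract class Directory {
    // stand-in for org.apache.lucene.store.Directory.getLockID()
    abstract String getLockID();
}

class RAMDirectory extends Directory {
    private final String id;
    RAMDirectory(String id) { this.id = id; }
    String getLockID() { return "ram:" + id; }
}

class IndexAccessor {
    final Directory dir;
    IndexAccessor(Directory dir) { this.dir = dir; }
}

public class AccessorFactoryDemo {
    private static final Map<String, IndexAccessor> accessors =
            new HashMap<String, IndexAccessor>();

    // Any Directory works: the lock ID is the identity, not a File path.
    public static synchronized IndexAccessor getAccessor(Directory dir) {
        String key = dir.getLockID();
        IndexAccessor a = accessors.get(key);
        if (a == null) {
            a = new IndexAccessor(dir);
            accessors.put(key, a);
        }
        return a;
    }

    public static void main(String[] args) {
        Directory d = new RAMDirectory("idx1");
        System.out.println(getAccessor(d) == getAccessor(d));
        // -> true
    }
}
```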

   

Provide a simple way to concurrently access a Lucene index from multiple threads


 Key: LUCENE-1026
 URL: https://issues.apache.org/jira/browse/LUCENE-1026
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index, Search
Reporter: Mark Miller
Priority: Minor
 Attachments: DefaultIndexAccessor.java, 
DefaultMultiIndexAccessor.java, IndexAccessor-02.04.2008.zip, 
IndexAccessor-02.07.2008.zip, IndexAccessor-02.28.2008.zip, 
IndexAccessor-1.26.2008.zip, IndexAccessor-2.15.2008.zip, IndexAccessor.java, 
IndexAccessor.zip, IndexAccessorFactory.java, MultiIndexAccessor.java, 
shai-IndexAccessor-2.zip, shai-IndexAccessor.zip, shai-IndexAccessor3.zip, 
SimpleSearchServer.java, StopWatch.java, TestIndexAccessor.java


For building interactive indexes accessed through a network/internet (multiple 
threads).
This builds upon the LuceneIndexAccessor patch. That patch was not very newbie 
friendly and did not properly handle MultiSearchers (or at the least made it 
easy to get into trouble).
This patch simplifies things and provides out of the box support for sharing 
the IndexAccessors across threads. There is also a simple test class and 
example SearchServer to get you started.
Future revisions will be zipped.
Works pretty solid as is, but could use the ability to warm new Searchers.
 


   





[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support PhraseQuery, SpanQuery, ConstantScoreRangeQuery

2008-03-10 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: SpanHighlighter-02-10-2008.patch

Another attempt at putting this to bed.

Added the MultiPhraseQuery support patch above - thanks!
Updated some code to stop using deprecated methods.
Made highlighting ConstantScoreRangeQuerys optional, defaulting to false.

- Mark

 Extend contrib Highlighter to properly support PhraseQuery, SpanQuery,  
 ConstantScoreRangeQuery
 ---

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: MultiPhraseQueryExtraction.patch, 
 SpanHighlighter-01-26-2008.patch, SpanHighlighter-01-28-2008.patch, 
 SpanHighlighter-02-10-2008.patch, spanhighlighter.patch, 
 spanhighlighter10.patch, spanhighlighter11.patch, spanhighlighter12.patch, 
 spanhighlighter2.patch, spanhighlighter3.patch, spanhighlighter5.patch, 
 spanhighlighter6.patch, spanhighlighter7.patch, spanhighlighter8.patch, 
 spanhighlighter9.patch, spanhighlighter_24_January_2008.patch, 
 spanhighlighter_patch_4.zip


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys, PhraseQuery, and  ConstantScoreRangeQuery. New Query types are 
 easy to add. There is also a new Fragmenter that attempts to fragment without 
 breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Created: (LUCENE-1214) Possible hidden exception on SegmentInfos commit

2008-03-10 Thread Mark Miller (JIRA)
Possible hidden exception on SegmentInfos commit


 Key: LUCENE-1214
 URL: https://issues.apache.org/jira/browse/LUCENE-1214
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3.1
Reporter: Mark Miller
Priority: Trivial


I am not sure if this is that big of a deal, but I just ran into it and thought 
I might mention it.

SegmentInfos.commit removes the Segments File if it hits an exception. If it 
cannot remove the Segments file (because it's not there or on Windows something 
has a hold of it), another Exception is thrown about not being able to delete 
the Segments file. Because of this, you lose the first exception, which might 
have useful info, including why the segments file might not be there to delete.

- Mark
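The exception-masking problem Mark describes has a standard remedy: if cleanup after a failure also fails, swallow the cleanup error and rethrow the original exception, which explains the root cause. A minimal sketch (the exception messages and method shape below are illustrative, not Lucene's actual code):

```java
// Minimal sketch of the exception-masking pattern described above and the
// usual remedy. The method shape and messages are illustrative assumptions,
// not the actual SegmentInfos.commit code.
public class CommitDemo {
    public static void commit(boolean writeFails, boolean deleteFails) throws Exception {
        try {
            if (writeFails) {
                throw new java.io.IOException("disk full while writing segments_N");
            }
        } catch (Exception original) {
            try {
                if (deleteFails) {
                    throw new java.io.IOException("cannot delete segments_N");
                }
            } catch (Exception ignored) {
                // swallow: the delete failure would otherwise mask the root cause
            }
            throw original;  // the informative exception survives
        }
    }

    public static void main(String[] args) {
        try {
            commit(true, true);
        } catch (Exception e) {
            System.out.println(e.getMessage());  // the *original* cause
            // -> disk full while writing segments_N
        }
    }
}
```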




[jira] Assigned: (LUCENE-1214) Possible hidden exception on SegmentInfos commit

2008-03-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1214:
--

Assignee: Michael McCandless





[jira] Commented: (LUCENE-1214) Possible hidden exception on SegmentInfos commit

2008-03-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576970#action_12576970
 ] 

Michael McCandless commented on LUCENE-1214:


Good catch, Mark.  It seems like we should ignore any exception while trying to 
delete the partially written segments_N file, and throw the original exception. 
 I'll do that.

How did you hit these two exceptions?





Re: Going to Java 5. Was: Re: A bit of planning

2008-03-10 Thread Doron Cohen
On Thu, Jan 17, 2008 at 4:01 PM, DM Smith [EMAIL PROTECTED] wrote:


 On Jan 17, 2008, at 1:38 AM, Chris Hostetter wrote:

  : I'd like to recommend that 3.0 contain the new Java 5 API changes
  and what it
  : replaces be marked deprecated. 3.0 would also remove what was
  deprecated in
  : 2.9. Then in 3.1 we remove the deprecations.
 
  FWIW: This would violate the compatibility requirements, since code
  that
  compiles against 3.0 (with deprecation warnings) wouldn't compile
  against
  3.1 -- but then again: there has been some mention of revisiting the
  entire
  back compatibility commitments of Lucene, and now certainly seems
  like the time
  to discuss that before too much work is done in any particular
  direction
  in an attempt to head towards 2.9/3.0.

 Any way that it goes, my point is that it needs to be a two step
 process. The additional step needs to address the language differences.

 Maybe after 2.9, we add 2.9.5 (or whatever) that introduces the Java 5
 APIs, with appropriate deprecations. 2.9.5 would require Java 1.5.


Since going to Java 5 is a major change, I think it is not too wild to
go from 3.0 straight to 4.0..?  Main (and perhaps only) change would be
moving to Java 5. This way we don't break any back.comp requirements.


[jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush

2008-03-10 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577073#action_12577073
 ] 

Michael Busch commented on LUCENE-1208:
---

We had seen this deadlock problem in our tests. I reran all tests with Lucene 
2.3.1 + LUCENE-1208 and didn't see the problem again so far!





Re: [jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush

2008-03-10 Thread Michael Busch
Michael McCandless (JIRA) wrote:
 [ 
 https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576941#action_12576941
  ] 
 
 Michael McCandless commented on LUCENE-1208:
 
 
 Agreed.  I'm thinking these issues should be ported to 2.3.2:
 
   LUCENE-1191
   LUCENE-1197
   LUCENE-1198
   LUCENE-1199
   LUCENE-1200
   LUCENE-1208 (this issue)
   LUCENE-1210
 

+1

-Michael




Re: [jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush

2008-03-10 Thread Michael McCandless


OK I'll backport.

Mike

Michael Busch wrote:


Michael McCandless (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576941#action_12576941 ]


Michael McCandless commented on LUCENE-1208:


Agreed.  I'm thinking these issues should be ported to 2.3.2:

  LUCENE-1191
  LUCENE-1197
  LUCENE-1198
  LUCENE-1199
  LUCENE-1200
  LUCENE-1208 (this issue)
  LUCENE-1210



+1

-Michael








Re: Going to Java 5. Was: Re: A bit of planning

2008-03-10 Thread Grant Ingersoll
We voted to make 3.0 Java 1.5, full well knowing that it will break  
the back compat. requirements.  I don't see the point of postponing it  
or dragging it out.



On Mar 10, 2008, at 12:02 PM, Doron Cohen wrote:

On Thu, Jan 17, 2008 at 4:01 PM, DM Smith [EMAIL PROTECTED]  
wrote:




On Jan 17, 2008, at 1:38 AM, Chris Hostetter wrote:


: I'd like to recommend that 3.0 contain the new Java 5 API changes
and what it
: replaces be marked deprecated. 3.0 would also remove what was
deprecated in
: 2.9. Then in 3.1 we remove the deprecations.

FWIW: This would violate the compatibility requirements, since code
that
compiles against 3.0 (with deprecation warnings) wouldn't compile
against
3.1 -- but then again: there has been some mention of revisiting the
entire
back compatibility commitments of Lucene, and now certainly seems
like the time
to discuss that before too much work is done in any particular
direction
in an attempt to head towards 2.9/3.0.


Any way that it goes, my point is that it needs to be a two step
process. The additional step needs to address the language differences.

Maybe after 2.9, we add 2.9.5 (or whatever) that introduces the Java 5
APIs, with appropriate deprecations. 2.9.5 would require Java 1.5.


Since going to Java 5 is a major change, I think it is not too wild to
go from 3.0 straight to 4.0..?  Main (and perhaps only) change would be
moving to Java 5. This way we don't break any back.comp requirements.







Re: Going to Java 5. Was: Re: A bit of planning

2008-03-10 Thread DM Smith

Grant Ingersoll wrote:
We voted to make 3.0 Java 1.5, full well knowing that it will break 
the back compat. requirements.  I don't see the point of postponing it 
or dragging it out.


I thought his suggestion was to skip 3.0 as a designator and instead use 
4.0. If so, the schedule would not change.





On Mar 10, 2008, at 12:02 PM, Doron Cohen wrote:


On Thu, Jan 17, 2008 at 4:01 PM, DM Smith [EMAIL PROTECTED] wrote:



On Jan 17, 2008, at 1:38 AM, Chris Hostetter wrote:


: I'd like to recommend that 3.0 contain the new Java 5 API changes
and what it
: replaces be marked deprecated. 3.0 would also remove what was
deprecated in
: 2.9. Then in 3.1 we remove the deprecations.

FWIW: This would violate the compatibility requirements, since code
that
compiles against 3.0 (with deprecation warnings) wouldn't compile
against
back compatibility commitments of Lucene, and now certainly seems
entire
back compatibility commitments of Lucene, and now certainly seems
like the time
to discuss that before too much work is done in any particular
direction
in an attempt to head towards 2.9/3.0.


Any way that it goes, my point is that it needs to be a two step
process. The additional step needs to address the language differences.

Maybe after 2.9, we add 2.9.5 (or whatever) that introduces the Java 5
APIs, with appropriate deprecations. 2.9.5 would require Java 1.5.



Since going to Java 5 is a major change, I think it is not too wild to
go from 3.0 straight to 4.0..?  Main (and perhaps only) change would be
moving to Java 5. This way we don't break any back.comp requirements.





How to add a jar to a contrib build.xml

2008-03-10 Thread fsanchez
Hi all, 

perhaps this is a simple question, but I don't know how to do it.

I'm developing on a new contrib subfolder. My development needs to use
classes in another contrib subfolder. How do I add the corresponding JAR
to the build.xml file?

thanks in advance.

-- 
Felipe 




Re: Going to Java 5. Was: Re: A bit of planning

2008-03-10 Thread Doron Cohen
On Mon, Mar 10, 2008 at 9:21 PM, DM Smith [EMAIL PROTECTED] wrote:

 Grant Ingersoll wrote:
  We voted to make 3.0 Java 1.5, full well knowing that it will break
  the back compat. requirements.  I don't see the point of postponing it
  or dragging it out.

 I thought his suggestion was to skip 3.0 as a designator and instead use
 4.0. If so, the schedule would not change.


Right, that's what I meant:
  * 2.9 with deprecations,
  * 3.0 removing deprecated stuff but still Java 1.4,
  * 4.0 first Java 5 version
But I am catching up now on a looong list of discussions and missed
this vote, so I am ok with taking this back and proceeding as voted.
- Doron


Re: How to add a jar to a contrib build.xml

2008-03-10 Thread Mark Miller
Here is how the span highlighter I have been working on uses the Memory 
contrib (I think I copied this from another contrib that has a dependency):


<?xml version="1.0"?>

<project name="highlighter" default="buildHighlighter">

  <description>
    Hits highlighter
  </description>

  <import file="../contrib-build.xml"/>

  <property name="memory.jar"
            location="../../build/contrib/memory/lucene-memory-${version}.jar"/>

  <path id="classpath">
    <pathelement path="${lucene.jar}"/>
    <pathelement path="${memory.jar}"/>
    <pathelement path="${project.classpath}"/>
  </path>

  <target name="buildHighlighter" depends="buildMemory,default"/>

  <target name="buildMemory">
    <echo>Highlighter building dependency ${memory.jar}</echo>
    <ant antfile="../memory/build.xml" target="default" inheritall="false"/>
  </target>

</project>

[EMAIL PROTECTED] wrote:

Hi all,

perhaps this is a simple question, but I don't know how to do it.

I'm developing on a new contrib subfolder. My development needs to use
classes in another contrib subfolder. How do I add the corresponding JAR
to the build.xml file?

thanks in advance.

   





Re: Going to Java 5. Was: Re: A bit of planning

2008-03-10 Thread Grant Ingersoll
All it takes is one line in the announcement saying "Version 3.0 uses 
Java 1.5."  I don't think the significance will be lost on anyone.   
Everyone knows what Java 1.5 is.  I'm -1 on calling it 4.0.  People 
will then ask "where is 3.0?"  I am +1 for sticking w/ the plan we voted 
for as described on http://wiki.apache.org/lucene-java/Java_1%2e5_Migration 
(last edited 10/1/2007).  It's not like we are springing this on 
anyone.  In fact, I'd be more than happy to announce it on the user 
list to let people know ahead of time.





On Mar 10, 2008, at 3:52 PM, Doron Cohen wrote:

On Mon, Mar 10, 2008 at 9:21 PM, DM Smith [EMAIL PROTECTED]  
wrote:



Grant Ingersoll wrote:

We voted to make 3.0 Java 1.5, full well knowing that it will break
the back compat. requirements.  I don't see the point of postponing it
or dragging it out.


I thought his suggestion was to skip 3.0 as a designator and instead use
4.0. If so, the schedule would not change.



Right, that's what I meant:
 * 2.9 with deprecations,
 * 3.0 removing deprecated stuff but still Java 1.4,
 * 4.0 first Java 5 version
But I am catching up now on a looong list of discussions and missed
this vote, so I am ok with taking this back and proceeding as voted.
- Doron







Re: Going to Java 5. Was: Re: A bit of planning

2008-03-10 Thread DM Smith

Grant Ingersoll wrote:
All it takes is one line in the announcement saying "Version 3.0 uses 
Java 1.5."  I don't think the significance will be lost on anyone.  
Everyone knows what Java 1.5 is.  I'm -1 on calling it 4.0.  People 
will then ask "where is 3.0?"  I am +1 for sticking w/ the plan we voted 
for as described on 
http://wiki.apache.org/lucene-java/Java_1%2e5_Migration (last edited 
10/1/2007).  It's not like we are springing this on anyone.  In fact, 
I'd be more than happy to announce it on the user list to let people 
know ahead of time.


I'm fine with the plan as far as I understand it, but can you clarify 
something for me?


While 3.0 won't be backward compatible in that it requires Java 5.0, 
will it be otherwise backward compatible? That is, if I compile with 
2.9, eliminate all deprecations and use Java 5, can I drop 3.0 in and 
expect it to work without any further changes?


I think that is what I am reading wrt the plan.

DM





[jira] Commented: (LUCENE-584) Decouple Filter from BitSet

2008-03-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577207#action_12577207
 ] 

Mark Miller commented on LUCENE-584:


I think there is still an issue here. The code below just broke for me.

java.lang.ClassCastException: org.apache.lucene.util.OpenBitSet cannot be cast 
to java.util.BitSet
at 
org.apache.lucene.search.CachingWrapperFilter.bits(CachingWrapperFilter.java:55)
at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:177)
at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:152)
at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)

{code}
  public void testChainedCachedQueryFilter() throws IOException, ParseException {
    String path = "c:/TestIndex";
    Analyzer analyzer = new WhitespaceAnalyzer();
    IndexWriter writer = new IndexWriter(path, analyzer, true);

    Document doc = new Document();
    doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the big bad fox", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the big bad pig", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the horrific girl", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the dirty boy", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);
    doc = new Document();
    doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the careful bad fox", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);

    writer.addDocument(doc);

    Searcher searcher = null;

    searcher = new IndexSearcher(path);

    QueryParser qp = new QueryParser("field", new KeywordAnalyzer());
    Query query = qp.parse("content:fox");
    QueryWrapperFilter queryFilter = new QueryWrapperFilter(query);
    CachingWrapperFilter cwf = new CachingWrapperFilter(queryFilter);

    TopDocs hits = searcher.search(query, cwf, 1);
    System.out.println("hits: " + hits.totalHits);

    queryFilter = new QueryWrapperFilter(qp.parse("category:red"));
    CachingWrapperFilter fcwf = new CachingWrapperFilter(queryFilter);
    Filter[] chain = new Filter[2];
    chain[0] = cwf;
    chain[1] = fcwf;
    ChainedFilter cf = new ChainedFilter(chain, ChainedFilter.AND);

    hits = searcher.search(new MatchAllDocsQuery(), cf, 1);

    System.out.println("red: " + hits.totalHits);

    queryFilter = new QueryWrapperFilter(qp.parse("category:blue"));
    CachingWrapperFilter fbcwf = new CachingWrapperFilter(queryFilter);
    chain = new Filter[2];
    chain[0] = cwf;
    chain[1] = fbcwf;
    cf = new ChainedFilter(chain, ChainedFilter.AND);

    hits = searcher.search(new MatchAllDocsQuery(), cf, 1);

    System.out.println("blue: " + hits.totalHits);

  }
{code}



 Decouple Filter from BitSet
 ---

 Key: LUCENE-584
 URL: https://issues.apache.org/jira/browse/LUCENE-584
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Affects Versions: 2.1
Reporter: Peter Schäfer
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.4

 Attachments: bench-diff.txt, bench-diff.txt, CHANGES.txt.patch, 
 ContribQueries20080111.patch, lucene-584-take2.patch, 
 lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, 
 lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, 
 lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, 
 Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, 
 Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch


 {code}
 package org.apache.lucene.search;
 public abstract class Filter implements java.io.Serializable 
 {
   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
 }
 public interface AbstractBitSet 
 {
   public boolean get(int index);
 }
 {code}
 It would be useful if the method =Filter.bits()= returned an abstract 
 interface, instead of =java.util.BitSet=.
 Use case: there is a very large index, and, depending on the user's 
 privileges, only a small portion of the index is actually visible.
 Sparsely populated =java.util.BitSet=s are not efficient and waste lots of 
 memory. It would be desirable to have an alternative BitSet implementation 
 with a smaller memory footprint.
 Though it _is_ possible to derive classes from =java.util.BitSet=, it was 
 obviously not designed for that purpose.
 That's why I propose to use an interface instead. The 
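The proposal above can be sketched in isolation. The following is a minimal, self-contained illustration of the idea, not Lucene's actual code: the interface name comes from the snippet above, while =DenseBitSet= and =SparseBitSet= are hypothetical names. A dense implementation wraps =java.util.BitSet=, while a sparse one stores only the matching doc ids, so memory scales with the number of visible documents rather than the index size.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical abstraction: filter results expose only membership,
// so implementations can pick a dense or sparse representation.
interface AbstractBitSet {
    boolean get(int index);
}

// Dense variant: wraps java.util.BitSet; fine when many docs match.
class DenseBitSet implements AbstractBitSet {
    private final java.util.BitSet bits;
    DenseBitSet(java.util.BitSet bits) { this.bits = bits; }
    public boolean get(int index) { return bits.get(index); }
}

// Sparse variant: stores only the matching doc ids, so memory is
// proportional to the number of set bits, not the index size.
class SparseBitSet implements AbstractBitSet {
    private final Set<Integer> docs = new HashSet<Integer>();
    void set(int index) { docs.add(index); }
    public boolean get(int index) { return docs.contains(index); }
}
```

Callers that only ever ask "is doc N in the filter?" work identically against either implementation, which is the whole point of decoupling =Filter= from a concrete =BitSet=.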

[jira] Issue Comment Edited: (LUCENE-584) Decouple Filter from BitSet

2008-03-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12577207#action_12577207
 ] 

[EMAIL PROTECTED] edited comment on LUCENE-584 at 3/10/08 2:48 PM:
-

I think there is still an issue here. The code below just broke for me.

java.lang.ClassCastException: org.apache.lucene.util.OpenBitSet cannot be cast to java.util.BitSet
    at org.apache.lucene.search.CachingWrapperFilter.bits(CachingWrapperFilter.java:55)
    at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:177)
    at org.apache.lucene.misc.ChainedFilter.bits(ChainedFilter.java:152)
    at org.apache.lucene.search.Filter.getDocIdSet(Filter.java:49)

{code}
  public void testChainedCachedQueryFilter() throws IOException, ParseException {
    String path = "c:/TestIndex";
    Analyzer analyzer = new WhitespaceAnalyzer();
    IndexWriter writer = new IndexWriter(path, analyzer, true);

    Document doc = new Document();
    doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the big bad fox", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);

    doc = new Document();
    doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the big bad pig", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);

    doc = new Document();
    doc.add(new Field("category", "red", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the horrific girl", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);

    doc = new Document();
    doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the dirty boy", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);

    doc = new Document();
    doc.add(new Field("category", "blue", Store.YES, Index.TOKENIZED));
    doc.add(new Field("content", "the careful bad fox", Store.NO, Index.TOKENIZED));
    writer.addDocument(doc);

    writer.close();

    Searcher searcher = new IndexSearcher(path);

    QueryParser qp = new QueryParser("field", new KeywordAnalyzer());
    Query query = qp.parse("content:fox");
    QueryWrapperFilter queryFilter = new QueryWrapperFilter(query);
    CachingWrapperFilter cwf = new CachingWrapperFilter(queryFilter);

    TopDocs hits = searcher.search(query, cwf, 1);
    System.out.println("hits: " + hits.totalHits);

    queryFilter = new QueryWrapperFilter(qp.parse("category:red"));
    CachingWrapperFilter fcwf = new CachingWrapperFilter(queryFilter);
    Filter[] chain = new Filter[2];
    chain[0] = cwf;
    chain[1] = fcwf;
    ChainedFilter cf = new ChainedFilter(chain, ChainedFilter.AND);

    hits = searcher.search(new MatchAllDocsQuery(), cf, 1);
    System.out.println("red: " + hits.totalHits);

    queryFilter = new QueryWrapperFilter(qp.parse("category:blue"));
    CachingWrapperFilter fbcwf = new CachingWrapperFilter(queryFilter);
    chain = new Filter[2];
    chain[0] = cwf;
    chain[1] = fbcwf;
    cf = new ChainedFilter(chain, ChainedFilter.AND);

    hits = searcher.search(new MatchAllDocsQuery(), cf, 1);
    System.out.println("blue: " + hits.totalHits);
  }
{code}




[jira] Commented: (LUCENE-1214) Possible hidden exception on SegmentInfos commit

2008-03-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12577220#action_12577220
 ] 

Mark Miller commented on LUCENE-1214:
-

I am still trying to work that out... some craziness that started after I 
updated Lucene to trunk, but I also made other fundamental changes, and Windows 
Vista may be haunting me too...

The gist of it is that Lucene is failing when it tries to create an index file 
(it creates the directory fine). I don't think it's Lucene-related at the moment, 
but I haven't gotten to the bottom of it either.

Oddly, if I stop using the NoLockFactory (I manually manage a single Writer), 
things work... still digging, though.

 Possible hidden exception on SegmentInfos commit
 

 Key: LUCENE-1214
 URL: https://issues.apache.org/jira/browse/LUCENE-1214
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.3.1
Reporter: Mark Miller
Assignee: Michael McCandless
Priority: Trivial
 Attachments: LUCENE-1214.patch


 I am not sure if this is that big of a deal, but I just ran into it and 
 thought I might mention it.
 SegmentInfos.commit removes the segments file if it hits an exception. If it 
 cannot remove the segments file (because it's not there, or on Windows 
 something has a hold of it), another exception is thrown about not being able 
 to delete the segments file. Because of this, you lose the first exception, 
 which might have had useful info, including why the segments file might not be 
 there to delete.
 - Mark
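The masking problem described above can be reproduced in isolation. This is an illustrative sketch of the pattern, not Lucene's actual code; all names here are hypothetical:

```java
// Sketch of the exception-masking pattern: if cleanup inside the catch
// block itself throws, the original failure is lost to the caller.
public class MaskingDemo {
    static void commit() {
        try {
            // Simulate the commit hitting a real problem.
            throw new RuntimeException("original failure: disk full");
        } catch (RuntimeException first) {
            try {
                deleteSegmentsFile(); // cleanup may itself fail...
            } catch (RuntimeException cleanup) {
                // ...and rethrowing it discards 'first' entirely.
                throw cleanup;
            }
            throw first;
        }
    }

    static void deleteSegmentsFile() {
        // Simulate Windows (or a missing file) refusing the delete.
        throw new RuntimeException("cannot delete segments file");
    }

    public static void main(String[] args) {
        try {
            commit();
        } catch (RuntimeException e) {
            // The surfaced message is the cleanup failure, not the root cause.
            System.out.println(e.getMessage()); // prints: cannot delete segments file
        }
    }
}
```

A fix along the lines of the attached patch would keep the first exception and rethrow it after the failed cleanup, rather than letting the delete failure replace it.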

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Going to Java 5. Was: Re: A bit of planning

2008-03-10 Thread Chris Hostetter
: I'm fine with the plan as far as I understand it, but can you clarify
: something for me?
: 
: While 3.0 won't be backward compatible in that it requires Java 5.0, will it
: be otherwise backward compatible? That is, if I compile with 2.9, eliminate
: all deprecations and use Java 5, can I drop 3.0 in and expect it to work
: without any further changes?

I think that point is still up in the air, and will depend largely on what 
type of APIs start shaping up for 3.0.  I suspect that when the time 
comes, 2.9 may contain deprecations that refer forward to APIs that will 
be available in 3.0 but won't exist in 2.9 ... so true drop-in 
compatibility may not be possible.

Then again: the main reason I suspect that is that I'm anticipating APIs 
that use generics.  I know that some weird things happen with generics and 
bytecode, so it may actually be possible to introduce non-generic 
(non-typesafe) versions of those APIs in 2.9 that people can compile 
against that will be bytecode compatible with 3.0 -- I'm not sure.

(Similar questions may come up with enums and other misc language 
features, however.)


-Hoss





[jira] Created: (LUCENE-1215) Support of Unicode Collation

2008-03-10 Thread Hiroaki Kawai (JIRA)
Support of Unicode Collation


 Key: LUCENE-1215
 URL: https://issues.apache.org/jira/browse/LUCENE-1215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Analysis
Reporter: Hiroaki Kawai
 Attachments: NormalizerTokenFilter.java

New in Java 6, we have java.text.Normalizer, which supports Unicode Standard 
Annex #15 normalization.
http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html
http://www.unicode.org/unicode/reports/tr15/

The normalization defined there has four variants: C, D, KC, and KD. Canonical 
Decomposition or Compatibility Decomposition will normalize the representation 
of a String, and search results will be improved.

I'd like to submit a TokenFilter supporting this feature! :-)
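As a concrete example of what the KC (NFKC) variant buys for search: half-width katakana folds onto its full-width form, so both spellings index to the same term. A minimal sketch using plain java.text.Normalizer directly, separate from the attached TokenFilter:

```java
import java.text.Normalizer;

public class NfkcDemo {
    public static void main(String[] args) {
        // HALFWIDTH KATAKANA LETTER KA (U+FF76) normalizes to
        // KATAKANA LETTER KA (U+30AB) under NFKC, so a query using
        // either form can match a document using the other.
        String half = "\uFF76";
        String full = "\u30AB";
        String normalized = Normalizer.normalize(half, Normalizer.Form.NFKC);
        System.out.println(normalized.equals(full)); // prints: true
    }
}
```

Running the same normalization inside a TokenFilter, on each token's text, is all that is needed to get this behavior at both index and query time.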





[jira] Updated: (LUCENE-1215) Support of Unicode Collation

2008-03-10 Thread Hiroaki Kawai (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroaki Kawai updated LUCENE-1215:
--

Attachment: NormalizerTokenFilter.java

 Support of Unicode Collation
 

 Key: LUCENE-1215
 URL: https://issues.apache.org/jira/browse/LUCENE-1215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Analysis
Reporter: Hiroaki Kawai
 Attachments: NormalizerTokenFilter.java


 New in Java 6, we have java.text.Normalizer, which supports Unicode Standard 
 Annex #15 normalization.
 http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html
 http://www.unicode.org/unicode/reports/tr15/
 The normalization defined there has four variants: C, D, KC, and KD. Canonical 
 Decomposition or Compatibility Decomposition will normalize the representation 
 of a String, and search results will be improved.
 I'd like to submit a TokenFilter supporting this feature! :-)




[jira] Commented: (LUCENE-1032) CJKAnalyzer should convert half width katakana to full width katakana

2008-03-10 Thread Hiroaki Kawai (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12577309#action_12577309
 ] 

Hiroaki Kawai commented on LUCENE-1032:
---

I think this feature should be merged into 
https://issues.apache.org/jira/browse/LUCENE-1215

Unicode compatibility decomposition will fix this issue. :-)


 CJKAnalyzer should convert half width katakana to full width katakana
 -

 Key: LUCENE-1032
 URL: https://issues.apache.org/jira/browse/LUCENE-1032
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 2.0.0
Reporter: Andrew Lynch

 Some of our Japanese customers are reporting errors when performing searches 
 using half width characters.
 The desired behavior is that a document containing half width characters 
 should be returned when performing a search using full width equivalents or 
 when searching by the half width character itself.
 Currently, a search will not return any matches for half width characters.
 Here is a test case outlining desired behavior (this may require a new 
 Analyzer).
 {code}
 public class TestJapaneseEncodings extends TestCase
 {
     byte[] fullWidthKa = new byte[]{(byte) 0xE3, (byte) 0x82, (byte) 0xAB};
     byte[] halfWidthKa = new byte[]{(byte) 0xEF, (byte) 0xBD, (byte) 0xB6};

     public void testAnalyzerWithHalfWidth() throws IOException
     {
         Reader r1 = new StringReader(makeHalfWidthKa());
         TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
         assertNotNull(stream);
         Token token = stream.next();
         assertNotNull(token);
         assertEquals(makeFullWidthKa(), token.termText());
     }

     public void testAnalyzerWithFullWidth() throws IOException
     {
         Reader r1 = new StringReader(makeFullWidthKa());
         TokenStream stream = new CJKAnalyzer().tokenStream("foo", r1);
         assertEquals(makeFullWidthKa(), stream.next().termText());
     }

     private String makeFullWidthKa() throws UnsupportedEncodingException
     {
         return new String(fullWidthKa, "UTF-8");
     }

     private String makeHalfWidthKa() throws UnsupportedEncodingException
     {
         return new String(halfWidthKa, "UTF-8");
     }
 }
 {code}




[jira] Commented: (LUCENE-1215) Support of Unicode Collation

2008-03-10 Thread Andrew Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12577312#action_12577312
 ] 

Andrew Lynch commented on LUCENE-1215:
--

This will be quite useful. I used the Normalizer to implement my own custom 
analyzer for https://issues.apache.org/jira/browse/LUCENE-1032. 
There is actually a Normalizer equivalent in older versions of the Sun JDK, 
sun.text.Normalizer, but that obviously wouldn't be portable across VMs. 

I ended up using reflection to check whether java.text.Normalizer was present, 
falling back to sun.text.Normalizer, and finally performing no normalization 
if neither could be found, to preserve compatibility with non-Java-6 / non-Sun 
JDKs.
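A rough sketch of that reflection-based fallback is below. The class and method names (NormalizerShim, nfkc) are illustrative, not Andrew's actual code, and the sun.text.Normalizer leg is omitted for brevity; here the fallback is simply no normalization:

```java
import java.lang.reflect.Method;

// Detect java.text.Normalizer via reflection; fall back to returning
// the input unchanged when it is unavailable (pre-Java-6 VMs).
public class NormalizerShim {
    private static final Method NORMALIZE = lookup();

    private static Method lookup() {
        try {
            Class<?> normalizer = Class.forName("java.text.Normalizer");
            Class<?> form = Class.forName("java.text.Normalizer$Form");
            return normalizer.getMethod("normalize", CharSequence.class, form);
        } catch (Exception e) {
            return null; // not running on Java 6+
        }
    }

    public static String nfkc(String s) {
        if (NORMALIZE == null) {
            return s; // no normalizer available: pass text through unchanged
        }
        try {
            // Fetch the NFKC enum constant reflectively as well.
            Object form = Class.forName("java.text.Normalizer$Form")
                               .getField("NFKC").get(null);
            return (String) NORMALIZE.invoke(null, s, form);
        } catch (Exception e) {
            return s;
        }
    }
}
```

Because everything is resolved reflectively, the class loads and compiles against any JDK; only at runtime does it decide whether normalization actually happens.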

 Support of Unicode Collation
 

 Key: LUCENE-1215
 URL: https://issues.apache.org/jira/browse/LUCENE-1215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Analysis
Reporter: Hiroaki Kawai
 Attachments: NormalizerTokenFilter.java


 New in Java 6, we have java.text.Normalizer, which supports Unicode Standard 
 Annex #15 normalization.
 http://java.sun.com/javase/6/docs/api/java/text/Normalizer.html
 http://www.unicode.org/unicode/reports/tr15/
 The normalization defined there has four variants: C, D, KC, and KD. Canonical 
 Decomposition or Compatibility Decomposition will normalize the representation 
 of a String, and search results will be improved.
 I'd like to submit a TokenFilter supporting this feature! :-)
