[jira] Created: (LUCENE-2244) Improve StandardTokenizer's understanding of non ASCII punctuation and quotes
Improve StandardTokenizer's understanding of non ASCII punctuation and quotes

Key: LUCENE-2244
URL: https://issues.apache.org/jira/browse/LUCENE-2244
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 3.0
Reporter: Andi Vajda

In the vein of LUCENE-1126 and LUCENE-1390, StandardTokenizerImpl.jflex should do a better job of understanding non-ASCII punctuation characters. For example, its understanding of the single-quote character ' is currently limited to that character only: it sets a token's type to APOSTROPHE only if ' itself was used. In the attached patch, I added all the characters that ASCIIFoldingFilter would change into '.

I'm not sure that this is the right approach, so I didn't write a complete patch for all the other hardcoded characters used in jflex rules such as ., - which have some variants in ASCIIFoldingFilter that could be used as well. Maybe a better approach would be to make it possible to insert an ASCIIFoldingFilter-like reader, as a character filter, in front of StandardTokenizer?

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
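The closing suggestion in the report, a folding reader placed in front of the tokenizer, might look roughly like the sketch below. It uses only java.io. The class name QuoteFoldingReader, the helper foldAll, and the three-character mapping are assumptions made for illustration; they are not Lucene classes and not ASCIIFoldingFilter's actual table.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Lucene: folds a few non-ASCII quote
// characters to the ASCII apostrophe before a tokenizer reads the text.
final class QuoteFoldingReader extends FilterReader {

    QuoteFoldingReader(Reader in) {
        super(in);
    }

    // Illustrative subset only; a real table would be much larger.
    static char fold(char c) {
        switch (c) {
            case '\u2018': // LEFT SINGLE QUOTATION MARK
            case '\u2019': // RIGHT SINGLE QUOTATION MARK
            case '\uFF07': // FULLWIDTH APOSTROPHE
                return '\'';
            default:
                return c;
        }
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c < 0 ? c : fold((char) c);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // When n is -1 (end of stream) the loop body never runs.
        for (int i = off; i < off + n; i++) {
            buf[i] = fold(buf[i]);
        }
        return n;
    }

    // Convenience for trying the reader out on a whole string.
    static String foldAll(String text) {
        StringBuilder out = new StringBuilder();
        char[] buf = new char[64];
        try (Reader r = new QuoteFoldingReader(new StringReader(text))) {
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Because every replacement here is one character for one character, token offsets computed downstream still line up with the original text. A mapping that changes length would need offset correction, which this sketch does not attempt.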
[jira] Updated: (LUCENE-2244) Improve StandardTokenizer's understanding of non ASCII punctuation and quotes
[ https://issues.apache.org/jira/browse/LUCENE-2244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andi Vajda updated LUCENE-2244:

Attachment: StandardTokenizerImpl.jflex.diff

A patch expanding the understanding of the single-quote character to the characters that ASCIIFoldingFilter turns into '.

Improve StandardTokenizer's understanding of non ASCII punctuation and quotes
Key: LUCENE-2244
URL: https://issues.apache.org/jira/browse/LUCENE-2244
Project: Lucene - Java
Issue Type: Bug
Components: Analysis
Affects Versions: 3.0
Reporter: Andi Vajda
Attachments: StandardTokenizerImpl.jflex.diff
[jira] Resolved: (LUCENE-1580) ISOLatin1AccentFilter does not handle Turkish (UTF-8) chars correctly.
[ https://issues.apache.org/jira/browse/LUCENE-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andi Vajda resolved LUCENE-1580.

Resolution: Duplicate

See https://issues.apache.org/jira/browse/LUCENE-1390

ISOLatin1AccentFilter does not handle Turkish (UTF-8) chars correctly.
Key: LUCENE-1580
URL: https://issues.apache.org/jira/browse/LUCENE-1580
Project: Lucene - Java
Issue Type: Bug
Reporter: Digy
Priority: Minor
Attachments: ISOLatin1AccentFilter.patch

The mappings below are missing:
Ğ -> G
ğ -> g
İ -> I
ı -> i
Ş -> S
ş -> s

DIGY
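For reference, the six mappings listed in that report can be written as a small table-style fold. This is only a sketch of the requested behaviour; TurkishFoldingSketch is a hypothetical name, and the code is not taken from the attached ISOLatin1AccentFilter.patch.

```java
// Hypothetical illustration of the six Turkish mappings requested in
// LUCENE-1580; not the code from the attached patch.
final class TurkishFoldingSketch {

    // Maps one character; anything not listed is returned unchanged.
    static char fold(char c) {
        switch (c) {
            case '\u011E': return 'G'; // LATIN CAPITAL LETTER G WITH BREVE
            case '\u011F': return 'g'; // LATIN SMALL LETTER G WITH BREVE
            case '\u0130': return 'I'; // LATIN CAPITAL LETTER I WITH DOT ABOVE
            case '\u0131': return 'i'; // LATIN SMALL LETTER DOTLESS I
            case '\u015E': return 'S'; // LATIN CAPITAL LETTER S WITH CEDILLA
            case '\u015F': return 's'; // LATIN SMALL LETTER S WITH CEDILLA
            default:       return c;
        }
    }

    // Applies the per-character fold to a whole string.
    static String fold(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            sb.append(fold(s.charAt(i)));
        }
        return sb.toString();
    }
}
```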
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654160#action_12654160 ]

Andi Vajda commented on LUCENE-1390:

Thanks Mark!

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Assignee: Mark Miller
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch

The ISOLatin1AccentFilter removes accents from accented characters in the ISO Latin 1 character set. It does what it does and there is no bug with it. It would be nicer, though, if there were a more comprehensive version of this code that covered not just ISO-Latin-1 (ISO-8859-1) but the entire Latin-1 Supplement and Latin Extended-A Unicode blocks.

See: http://en.wikipedia.org/wiki/Latin-1_Supplement_unicode_block
See: http://en.wikipedia.org/wiki/Latin_Extended-A_unicode_block

That way, all languages using roman characters are covered.

A new class, ISOLatinAccentFilter, is attached. It is intended to supersede ISOLatin1AccentFilter, which should be deprecated.
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652875#action_12652875 ]

Andi Vajda commented on LUCENE-1390:

Ah, I see now what you're asking for. Sorry about the misunderstanding. I believe I had picked 'e' for schwa because it looks closest to that letter. I have no objections to switching to 'a' instead if that's more correct. This Wikipedia article seems to agree: http://en.wikipedia.org/wiki/Schwa_(Cyrillic) This other Wikipedia article, http://en.wikipedia.org/wiki/Schwa, is less clear about this, but using 'a' instead of 'e' doesn't seem to contradict it.

Steven, I can amend the patch, but you said you had more changes coming. If that's the case, could you please add this change as well? If not, is it OK for me to add this change and call for this bug to be committed to trunk and closed?

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ISOLatinAccentFilter.java
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652911#action_12652911 ]

Andi Vajda commented on LUCENE-1390:

Great, I'll include Robert's change and try to convince a committer to finalize it.

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ISOLatinAccentFilter.java
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653045#action_12653045 ]

Andi Vajda commented on LUCENE-1390:

This class includes all of ISOLatin1AccentFilter. Still, a difference in behaviour could be seen when using the new filter, since characters are now converted that weren't before. If that sort of lack of backwards compatibility is something we don't want to impose on the 3.0 release, then the ISOLatin1AccentFilter class needs to be preserved.

Thanks for volunteering to finalize this bug!

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Assignee: Mark Miller
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ISOLatinAccentFilter.java
[jira] Updated: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andi Vajda updated LUCENE-1390:

Attachment: ASCIIFoldingFilter.patch

This latest version supersedes the previous one and maps all schwa characters to 'A' or 'a', depending on their case. U+0259, lowercase schwa, was missing and has been added.

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Assignee: Mark Miller
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ISOLatinAccentFilter.java
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653123#action_12653123 ]

Andi Vajda commented on LUCENE-1390:

Mark, I attached a new version of the patch with Robert's change.

As for the deprecation of ISOLatin1AccentFilter.java, I don't have a definite opinion on this. It's pretty much redundant with what this new class does. If the maintenance overhead is not too bad, then keeping the duplication around may be worth the effort to preserve some backwards compat.

Thanks for taking this from here!

Andi..

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Assignee: Mark Miller
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch
[jira] Updated: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andi Vajda updated LUCENE-1390:

Attachment: (was: ISOLatinAccentFilter.java)

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Assignee: Mark Miller
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653139#action_12653139 ]

Andi Vajda commented on LUCENE-1390:

Yep, I'm leaning that way too.

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Assignee: Mark Miller
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652694#action_12652694 ]

Andi Vajda commented on LUCENE-1390:

Could you please attach a patch for the change you requested? I'm not sure it's displaying correctly here. You seem to be asking about a change to the mapping of AE and E+acute, which is unexpected.

Thanks!

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ISOLatinAccentFilter.java
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643152#action_12643152 ]

Andi Vajda commented on LUCENE-1390:

Wow, Steve, I'm impressed. This is quite an improvement over my earlier patches and even more of an improvement over ISOLatin1AccentFilter. Thank you for doing this!

What's next? Does any Lucene committer watching this bug have objections to checking this in?

One (minor) missing piece of the patch is the deprecation of ISOLatin1AccentFilter itself.

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Priority: Minor
Fix For: 2.9
Attachments: ASCIIFoldingFilter.patch, ASCIIFoldingFilter.patch, ISOLatinAccentFilter.java
[jira] Updated: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andi Vajda updated LUCENE-1390:

Attachment: ISOLatinAccentFilter.java

ISOLatinAccentFilter.java again, now with Unicode Latin Extended B as well.

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Attachments: ISOLatinAccentFilter.java
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12632458#action_12632458 ]

Andi Vajda commented on LUCENE-1390:

I think that would be a whole lot of typing :) Not a bad idea, still.

I'm in the process of entering the 1E00 - 1EFF range. The Extended-C and D blocks also have relevant things to include, but I'm hoping to stop at the Extended Additional block currently in progress.

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Priority: Minor
Fix For: 2.9
Attachments: ISOLatinAccentFilter.java
[jira] Created: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter

Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda

The ISOLatin1AccentFilter removes accents from accented characters in the ISO Latin 1 character set. It does what it does and there is no bug with it. It would be nicer, though, if there were a more comprehensive version of this code that covered not just ISO-Latin-1 (ISO-8859-1) but the entire Latin-1 Supplement and Latin Extended-A Unicode blocks.

See: http://en.wikipedia.org/wiki/Latin-1_Supplement_unicode_block
See: http://en.wikipedia.org/wiki/Latin_Extended-A_unicode_block

That way, all languages using roman characters are covered.

A new class, ISOLatinAccentFilter, is attached. It is intended to supersede ISOLatin1AccentFilter, which should be deprecated.
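To make the scope of the two Unicode blocks named in the description concrete, here is a small JDK-only sketch. It is not the attached ISOLatinAccentFilter: the class name LatinFoldingSketch is hypothetical, and stripAccents uses a different technique (Unicode NFD decomposition via java.text.Normalizer, then dropping combining marks) rather than a hand-written mapping table.

```java
import java.text.Normalizer;

// Hypothetical illustration; not the ISOLatinAccentFilter attached to the issue.
final class LatinFoldingSketch {

    // True if c lies in one of the two blocks named in the description.
    static boolean inCoveredBlocks(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == Character.UnicodeBlock.LATIN_1_SUPPLEMENT
            || block == Character.UnicodeBlock.LATIN_EXTENDED_A;
    }

    // Swapped-in technique: decompose to NFD, then drop non-spacing marks.
    // Characters with no canonical decomposition are left as they are.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder(decomposed.length());
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (Character.getType(c) != Character.NON_SPACING_MARK) {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Decomposition alone leaves characters such as the dotless i (U+0131) untouched because they have no canonical decomposition, which is one reason an explicit mapping table like the one proposed in this issue can cover more ground.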
[jira] Updated: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andi Vajda updated LUCENE-1390:

Attachment: ISOLatinAccentFilter.java

The new ISOLatinAccentFilter class, superseding ISOLatin1AccentFilter.

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Attachments: ISOLatinAccentFilter.java
[jira] Commented: (LUCENE-1390) add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
[ https://issues.apache.org/jira/browse/LUCENE-1390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631946#action_12631946 ]

Andi Vajda commented on LUCENE-1390:

Makes sense. I did look at that block and it looked much more remote from the purpose of this class. But you're right, many of these could be handled as well. And I agree that they should be handled in order to claim to be doing a complete job. So far, I've claimed that this class handles Latin 1 and Latin Extended A, which should cover most, if not all, European/Turkish languages using Latin script, and thus goes much further than the ISOLatin1AccentFilter in that respect.

add ISOLatinAccentFilter and deprecate ISOLatin1AccentFilter
Key: LUCENE-1390
URL: https://issues.apache.org/jira/browse/LUCENE-1390
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Environment: any
Reporter: Andi Vajda
Attachments: ISOLatinAccentFilter.java
[jira] Commented: (LUCENE-1339) Add IndexReader.acquire() and release() methods using IndexReader's ref counting
[ https://issues.apache.org/jira/browse/LUCENE-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615001#action_12615001 ]

Andi Vajda commented on LUCENE-1339:

That would work just as well!

Andi..

Add IndexReader.acquire() and release() methods using IndexReader's ref counting
Key: LUCENE-1339
URL: https://issues.apache.org/jira/browse/LUCENE-1339
Project: Lucene - Java
Issue Type: New Feature
Reporter: Andi Vajda
Fix For: 2.3.2
Attachments: lucene-1339.patch

From: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200807.mbox/[EMAIL PROTECTED]

I have a server where a bunch of threads are handling search requests. I have another process that updates the index used by the search server and that asks the search server to reopen its index reader after the updates have completed. When I reopen() the index reader, I also close the old one (if the reopen() yielded a new instance). This causes problems for the other threads that are currently in the middle of a search request.

I'd like to propose the addition of two methods, acquire() and release() (attached to this bug report), that increment/decrement the ref count that IndexReader instances currently maintain for related purposes. That ref count prevents the index reader from actually being closed until it reaches zero. My server's search threads, by acquiring and releasing the index reader, can thus be sure that the index reader they're currently using is good until they're done with the current request, i.e., until they release() it.
[jira] Created: (LUCENE-1339) Add IndexReader.acquire() and release() methods using IndexReader's ref counting
Add IndexReader.acquire() and release() methods using IndexReader's ref counting

Key: LUCENE-1339
URL: https://issues.apache.org/jira/browse/LUCENE-1339
Project: Lucene - Java
Issue Type: New Feature
Reporter: Andi Vajda
Fix For: 2.3.2

From: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200807.mbox/[EMAIL PROTECTED]

I have a server where a bunch of threads are handling search requests. I have another process that updates the index used by the search server and that asks the search server to reopen its index reader after the updates have completed. When I reopen() the index reader, I also close the old one (if the reopen() yielded a new instance). This causes problems for the other threads that are currently in the middle of a search request.

I'd like to propose the addition of two methods, acquire() and release() (attached to this bug report), that increment/decrement the ref count that IndexReader instances currently maintain for related purposes. That ref count prevents the index reader from actually being closed until it reaches zero. My server's search threads, by acquiring and releasing the index reader, can thus be sure that the index reader they're currently using is good until they're done with the current request, i.e., until they release() it.
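The acquire()/release() idea described in this report can be sketched with a plain reference counter. The class below is a hypothetical stand-in written for illustration; it is not Lucene's IndexReader and not the contents of lucene-1339.patch. It assumes the owner holds one initial reference and calls release() where it would otherwise close the reader.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an index reader; not Lucene's IndexReader.
final class RefCountedResource {

    // Starts at 1: the reference held by whoever opened the resource.
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    // Take a reference before using the resource for a request.
    // Fails if the last reference has already been released.
    void acquire() {
        while (true) {
            int n = refCount.get();
            if (n <= 0) {
                throw new IllegalStateException("resource already closed");
            }
            if (refCount.compareAndSet(n, n + 1)) {
                return;
            }
        }
    }

    // Drop a reference; whoever drops the last one really closes.
    void release() {
        int n = refCount.decrementAndGet();
        if (n == 0) {
            closed = true; // a real reader would free files and buffers here
        } else if (n < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
    }

    boolean isClosed() {
        return closed;
    }
}
```

In the scenario from the report, a search thread would call acquire() at the start of a request and release() in a finally block. The updater's release of the old reader after reopen() then only takes effect once every in-flight search has finished.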
[jira] Updated: (LUCENE-1339) Add IndexReader.acquire() and release() methods using IndexReader's ref counting
[ https://issues.apache.org/jira/browse/LUCENE-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andi Vajda updated LUCENE-1339: --- Attachment: lucene-1339.patch Add IndexReader.acquire() and release() methods using IndexReader's ref counting Key: LUCENE-1339 URL: https://issues.apache.org/jira/browse/LUCENE-1339 Project: Lucene - Java Issue Type: New Feature Reporter: Andi Vajda Fix For: 2.3.2 Attachments: lucene-1339.patch From: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200807.mbox/[EMAIL PROTECTED] I have a server where a bunch of threads are handling search requests. I have another process that updates the index used by the search server and that asks the search server to reopen its index reader after the updates have completed. When I reopen() the index reader, I also close the old one (if the reopen() yielded a new instance). This causes problems for the other threads that are currently in the middle of a search request. I'd like to propose the addition of two methods, acquire() and release() (attached to this bug report), that increment/decrement the ref count that IndexReader instances currently maintain for related purposes. That ref count prevents the index reader from being actually closed until it reaches zero. By acquiring and releasing the index reader, my server's search threads can be sure that the index reader they're currently using is good until they're done with the current request, i.e., until they release() it.
[jira] Created: (LUCENE-1234) BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access
BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access -- Key: LUCENE-1234 URL: https://issues.apache.org/jira/browse/LUCENE-1234 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.3.1 Reporter: Andi Vajda Priority: Trivial Currently, BoostingTermScorer, an inner class of BoostingTermQuery, is not accessible from outside the search.payloads package, making it difficult to write an extension of BoostingTermQuery. The other inner classes are protected already, as they should be.
[jira] Updated: (LUCENE-1234) BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access
[ https://issues.apache.org/jira/browse/LUCENE-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andi Vajda updated LUCENE-1234: --- Attachment: patches-lucene-2.3.1 patch against lucene-2.3.1 sources BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access -- Key: LUCENE-1234 URL: https://issues.apache.org/jira/browse/LUCENE-1234 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.3.1 Reporter: Andi Vajda Priority: Trivial Attachments: patches-lucene-2.3.1 Currently, BoostingTermScorer, an inner class of BoostingTermQuery, is not accessible from outside the search.payloads package, making it difficult to write an extension of BoostingTermQuery. The other inner classes are protected already, as they should be.
[jira] Commented: (LUCENE-1234) BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access
[ https://issues.apache.org/jira/browse/LUCENE-1234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578976#action_12578976 ] Andi Vajda commented on LUCENE-1234: The inaccessible class is called BoostingSpanScorer. The method I'd like to override there is the score() method. BoostingTermQuery's BoostingSpanScorer class should be protected instead of package access -- Key: LUCENE-1234 URL: https://issues.apache.org/jira/browse/LUCENE-1234 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.3.1 Reporter: Andi Vajda Priority: Trivial Attachments: patches-lucene-2.3.1 Currently, BoostingTermScorer, an inner class of BoostingTermQuery, is not accessible from outside the search.payloads package, making it difficult to write an extension of BoostingTermQuery. The other inner classes are protected already, as they should be.
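To make the request in this thread concrete, here is a small sketch of what a protected nested scorer class allows. It is not Lucene code: TermQueryLike, SpanScorerLike, CustomQuery and CappedScorer are hypothetical names, and the scoring arithmetic is made up purely to show score() being overridden.

```java
// Hypothetical query class; stands in for something like BoostingTermQuery.
class TermQueryLike {
    // Declared 'protected' rather than package-private, so a subclass of
    // TermQueryLike living in another package could still extend it.
    protected static class SpanScorerLike {
        protected float payloadScore = 2.0f;

        public float score() {
            return 1.0f * payloadScore;
        }
    }

    // Factory hook so subclasses can substitute their own scorer.
    protected SpanScorerLike createScorer() {
        return new SpanScorerLike();
    }

    public float scoreDocument() {
        return createScorer().score();
    }
}

// An extension that overrides score(), as the comment above wants to do.
class CustomQuery extends TermQueryLike {
    protected static class CappedScorer extends SpanScorerLike {
        @Override
        public float score() {
            // Cap the payload-boosted score at an arbitrary ceiling.
            return Math.min(super.score(), 1.5f);
        }
    }

    @Override
    protected SpanScorerLike createScorer() {
        return new CappedScorer();
    }
}
```

Because both classes sit in one file here, the difference between protected and package access is not visible when compiling the sketch; it only matters once CustomQuery is moved to a different package, which is the situation the issue describes.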
[jira] Commented: (LUCENE-1182) SimilarityDelegator is missing a delegating scorePayload() method
[ https://issues.apache.org/jira/browse/LUCENE-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12570903#action_12570903 ] Andi Vajda commented on LUCENE-1182: Err, I meant to say the handy SimilarityDelegator class. SimilarityDelegator is missing a delegating scorePayload() method - Key: LUCENE-1182 URL: https://issues.apache.org/jira/browse/LUCENE-1182 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.3 Reporter: Andi Vajda Priority: Minor The handy SimilarityDelegator method is missing a scorePayload() delegating method. The fix is trivial: add the code below at the end of the class: public float scorePayload(String fieldName, byte[] payload, int offset, int length) { return delegee.scorePayload(fieldName, payload, offset, length); }
[jira] Created: (LUCENE-1182) SimilarityDelegator is missing a delegating scorePayload() method
SimilarityDelegator is missing a delegating scorePayload() method - Key: LUCENE-1182 URL: https://issues.apache.org/jira/browse/LUCENE-1182 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.3 Reporter: Andi Vajda Priority: Minor The handy SimilarityDelegator method is missing a scorePayload() delegating method. The fix is trivial: add the code below at the end of the class: public float scorePayload(String fieldName, byte[] payload, int offset, int length) { return delegee.scorePayload(fieldName, payload, offset, length); }
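Why a missing delegating method matters can be shown with a self-contained sketch. The classes below are hypothetical stand-ins, not Lucene's Similarity or SimilarityDelegator; only the scorePayload signature and the field name delegee are taken from the report.

```java
// Hypothetical base class standing in for a Similarity-like API.
class BaseSimilarity {
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        return 1.0f; // default: payloads do not affect the score
    }
}

// A custom implementation that reads a boost from the first payload byte.
class PayloadAwareSimilarity extends BaseSimilarity {
    @Override
    float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        return length == 0 ? 1.0f : payload[offset];
    }
}

// Delegator: every overridable method must forward to the delegee.
class DelegatingSimilarity extends BaseSimilarity {
    private final BaseSimilarity delegee;

    DelegatingSimilarity(BaseSimilarity delegee) {
        this.delegee = delegee;
    }

    @Override
    float tf(float freq) {
        return delegee.tf(freq);
    }

    // Without this override the base-class default (1.0f) would be used
    // and the delegee's payload scoring would be ignored without warning.
    @Override
    float scorePayload(String fieldName, byte[] payload, int offset, int length) {
        return delegee.scorePayload(fieldName, payload, offset, length);
    }
}
```

With the override in place, wrapping a PayloadAwareSimilarity returns the delegee's value; removing it would make the wrapper fall back to the base default, which is the kind of bug the report describes.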
[jira] Commented: (LUCENE-722) DEFAULT spelled DEFALT in MoreLikeThis.java
[ http://issues.apache.org/jira/browse/LUCENE-722?page=comments#action_12451809 ] Andi Vajda commented on LUCENE-722: --- Yes, you fixed it in one place but this file is actually duplicated in the Lucene source tree. The bug I filed was about the other occurrence, in the 'queries' contrib module since it seems to be the one that is current as implied in the 'queries' module readme.txt file. DEFAULT spelled DEFALT in MoreLikeThis.java --- Key: LUCENE-722 URL: http://issues.apache.org/jira/browse/LUCENE-722 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.0.0 Environment: all Reporter: Andi Vajda Priority: Minor Fix For: 2.1 DEFAULT is spelled DEFALT in contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Reopened: (LUCENE-722) DEFAULT spelled DEFALT in MoreLikeThis.java
[ http://issues.apache.org/jira/browse/LUCENE-722?page=all ] Andi Vajda reopened LUCENE-722: --- contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java is still wrong. DEFAULT spelled DEFALT in MoreLikeThis.java --- Key: LUCENE-722 URL: http://issues.apache.org/jira/browse/LUCENE-722 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.0.0 Environment: all Reporter: Andi Vajda Priority: Minor Fix For: 2.1 DEFAULT is spelled DEFALT in contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-722) DEFAULT spelled DEFALT in MoreLikeThis.java
DEFAULT spelled DEFALT in MoreLikeThis.java --- Key: LUCENE-722 URL: http://issues.apache.org/jira/browse/LUCENE-722 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.0.0 Environment: all Reporter: Andi Vajda Priority: Minor DEFAULT is spelled DEFALT in contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-722) DEFAULT spelled DEFALT in MoreLikeThis.java
[ http://issues.apache.org/jira/browse/LUCENE-722?page=comments#action_12451697 ] Andi Vajda commented on LUCENE-722: --- http://svn.osafoundation.org/pylucene/trunk/patches.lucene contains a patch (among others) to fix this. DEFAULT spelled DEFALT in MoreLikeThis.java --- Key: LUCENE-722 URL: http://issues.apache.org/jira/browse/LUCENE-722 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.0.0 Environment: all Reporter: Andi Vajda Priority: Minor DEFAULT is spelled DEFALT in contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-676) Promote solr's PrefixFilter into Java Lucene's core
[ http://issues.apache.org/jira/browse/LUCENE-676?page=all ] Andi Vajda updated LUCENE-676: -- Attachment: TestPrefixFilter.java Here is another attachment by Yura providing the requested unit test. Promote solr's PrefixFilter into Java Lucene's core --- Key: LUCENE-676 URL: http://issues.apache.org/jira/browse/LUCENE-676 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.0.1 Reporter: Andi Vajda Priority: Trivial Attachments: PrefixFilter.java, TestPrefixFilter.java Solr's PrefixFilter class is not specific to Solr and seems to be of interest to core lucene users (PyLucene in this case). Promoting it into the Lucene core would be helpful.
[jira] Updated: (LUCENE-676) Promote solr's PrefixFilter into Java Lucene's core
[ http://issues.apache.org/jira/browse/LUCENE-676?page=all ] Andi Vajda updated LUCENE-676: -- Attachment: PrefixFilter.java Attached is a version of PrefixFilter that could be added to the Lucene core as submitted by Yura Smolsky, a PyLucene user. Promote solr's PrefixFilter into Java Lucene's core --- Key: LUCENE-676 URL: http://issues.apache.org/jira/browse/LUCENE-676 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.0.1 Reporter: Andi Vajda Priority: Trivial Attachments: PrefixFilter.java Solr's PrefixFilter class is not specific to Solr and seems to be of interest to core lucene users (PyLucene in this case). Promoting it into the Lucene core would be helpful. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-507) CLONE -[PATCH] remove unused variables
[ http://issues.apache.org/jira/browse/LUCENE-507?page=comments#action_12376874 ] Andi Vajda commented on LUCENE-507: --- My apologies, I didn't notice this until it was mentioned today. The //required by gcj comment is not something I added or need. The few patches for gcj support that were added at my request are listed as such in the Lucene sources. The main one has to do with gcj's bug 15411 in Searcher.java, the other with naming a method 'delete'. In general, it is easier to use javac or jikes to compile the .java sources to .class files and then use gcj on the resulting .class (or .jar) files to produce native binaries. This way, one works around a number of bugs in the gcj Java compiler front-end. Still, there are some patches I need to apply to Lucene in order for it to run when compiled with gcj. Some are in QueryParser.java and the first of those could be applied to the actual .jj file instead, see here: http://svn.osafoundation.org/pylucene/trunk/patches.lucene The next patches in the file above are because of limitations in gcjh (the Java to C++ header file generator) or because exception catching doesn't seem to work well with gcj on Windows. Throwing and catching exceptions in Java is not such an efficient coding practice when there isn't an actual error; maybe the code in FieldInfos.java could be changed accordingly (see patch file above)? As for the last patch, well, the Java runtime that comes with gcj 3.x doesn't implement regex, so PyLucene calls into Python's regex support instead.
CLONE -[PATCH] remove unused variables -- Key: LUCENE-507 URL: http://issues.apache.org/jira/browse/LUCENE-507 Project: Lucene - Java Type: Improvement Components: Search Versions: unspecified Environment: Operating System: other Platform: Other Reporter: Steven Tamm Assignee: Lucene Developers Priority: Minor Attachments: Unused.patch Seems I'm the only person who has the unused variable warning turned on in Eclipse :-) This patch removes those unused variables and imports (for now only in the search package). This doesn't introduce changes in functionality, but it should be reviewed anyway: there might be cases where the variables *should* be used, but they are not because of a bug. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
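The comment's remark about throwing and catching exceptions when there is no actual error can be illustrated in general terms. The sketch below is not the FieldInfos.java patch referred to above, whose contents are not shown in this thread; FieldTable and its methods are hypothetical names used only to contrast the two styles.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical name-to-number table; not Lucene's FieldInfos.
class FieldTable {
    private final Map<String, Integer> numbers = new HashMap<>();

    void add(String name, int number) {
        numbers.put(name, number);
    }

    // Exception-driven style: an unknown field is reported by throwing,
    // so callers that merely probe for a field pay for a throw and catch.
    int numberOrThrow(String name) {
        Integer n = numbers.get(name);
        if (n == null) {
            throw new IllegalArgumentException("unknown field: " + name);
        }
        return n;
    }

    // Sentinel style: an unknown field is an expected outcome and is
    // reported as -1, with no exception involved.
    int numberOrMinusOne(String name) {
        Integer n = numbers.get(name);
        return n == null ? -1 : n;
    }
}
```

A caller that only wants to know whether a field exists can test the sentinel value directly, which also avoids relying on exception handling on platforms where it is unreliable.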
[jira] Commented: (LUCENE-555) Index Corruption
[ http://issues.apache.org/jira/browse/LUCENE-555?page=comments#action_12376319 ] Andi Vajda commented on LUCENE-555: --- There is an implementation of the Lucene index store that is backed by Berkeley DB. Take a look at the 'db' contrib area: http://svn.apache.org/repos/asf/lucene/java/trunk/contrib/db/ Using this, you can bracket index changes with a transaction. Should the cord be pulled, you can use Berkeley DB's recovery mechanisms. Index Corruption Key: LUCENE-555 URL: http://issues.apache.org/jira/browse/LUCENE-555 Project: Lucene - Java Type: Bug Components: Index Versions: 1.9 Environment: Linux FC4, Java 1.4.9 Reporter: dan Priority: Critical Index Corruption output java.io.FileNotFoundException: ../_aki.fnm (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:204) at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:425) at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:434) at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:324) at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:56) at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:144) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:129) at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:110) at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:674) at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658) at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517) input - I open an index, I read, I write, I optimize, and eventually the above happens. The index is unusable. - This has happened to me somewhere between 20 and 30 times now - on indexes of different shapes and sizes. - I don't know the reason. But, the following requirement applies regardless. requirement - Like all modern database programs, there has to be a way to repair an index. Period.
[jira] Resolved: (LUCENE-536) JEDirectory delete issue
[ http://issues.apache.org/jira/browse/LUCENE-536?page=all ] Andi Vajda resolved LUCENE-536: --- Resolution: Fixed Your changes were integrated and committed (rev 394214). Please, please, please, in the future when sending fixes in, send a proper patch as generated by svn diff. Thanks. JEDirectory delete issue Key: LUCENE-536 URL: http://issues.apache.org/jira/browse/LUCENE-536 Project: Lucene - Java Type: Bug Components: Store Reporter: Aaron Donovan Priority: Minor Attachments: File.java, File.java, JEStoreTest.java, JEStoreTest.java JEDirectory is not deleting files properly. Blocks are left behind due to an error in cursor operations. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Resolved: (LUCENE-482) JE Directory Implementation
[ http://issues.apache.org/jira/browse/LUCENE-482?page=all ] Andi Vajda resolved LUCENE-482: --- Fix Version: 1.9 Resolution: Fixed Assign To: Andi Vajda Fixed in rev 366041: the 'db' contrib area structure was rearranged to accommodate multiple implementations, and the new Berkeley DB JE contribution by Aaron Donovan was added. JE Directory Implementation --- Key: LUCENE-482 URL: http://issues.apache.org/jira/browse/LUCENE-482 Project: Lucene - Java Type: New Feature Components: Store Versions: 1.9 Reporter: Aaron Donovan Assignee: Andi Vajda Priority: Minor Fix For: 1.9 Attachments: contrib.zip I've created a port of DbDirectory to JE