Re: Index Optimization
thanks hossman. will post it at java-user hossman wrote: > > > 1) > http://wiki.apache.org/lucene-java/LuceneFAQ#head-adee7c1d869aa20101733944da79e15a1a2e7dfa > > FAQ: "Why do I have a deletable file (and old segment files remain) after > running optimize?" > > 2) http://people.apache.org/~hossman/#java-dev > > Please Use "[EMAIL PROTECTED]" Not "[EMAIL PROTECTED]" > > Your question is better suited for the [EMAIL PROTECTED] mailing list ... > not the [EMAIL PROTECTED] list. java-dev is for discussing development of > the internals of the Lucene Java library ... it is *not* the appropriate > place to ask questions about how to use the Lucene Java library when > developing your own applications. > > If you have further questions about this topic, please send them to the > java-user mailing list, where you are likely to get more/better responses > since that list also has a larger number of subscribers. > > : I managed to optimize my index successfully. The problem that I'm having > now > : is when I check the index using Lucene Index Toolbox there are a few > files > : in the index itself is deletable. I understand that optimize method will > : merge the index files but How come there is still deletable index files > in > : it? What I do now is delete it manually. Is there by any chance that I > can > : delete it automatically? Any code that I can refer to? > > > > -Hoss > > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > -- View this message in context: http://www.nabble.com/Index-Optimization-tp15996107p15996877.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Index Optimization
1) http://wiki.apache.org/lucene-java/LuceneFAQ#head-adee7c1d869aa20101733944da79e15a1a2e7dfa FAQ: "Why do I have a deletable file (and old segment files remain) after running optimize?" 2) http://people.apache.org/~hossman/#java-dev Please Use "[EMAIL PROTECTED]" Not "[EMAIL PROTECTED]" Your question is better suited for the [EMAIL PROTECTED] mailing list ... not the [EMAIL PROTECTED] list. java-dev is for discussing development of the internals of the Lucene Java library ... it is *not* the appropriate place to ask questions about how to use the Lucene Java library when developing your own applications. If you have further questions about this topic, please send them to the java-user mailing list, where you are likely to get more/better responses since that list also has a larger number of subscribers. : I managed to optimize my index successfully. The problem that I'm having now : is when I check the index using Lucene Index Toolbox there are a few files : in the index itself is deletable. I understand that optimize method will : merge the index files but How come there is still deletable index files in : it? What I do now is delete it manually. Is there by any chance that I can : delete it automatically? Any code that I can refer to? -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Index Optimization
I managed to optimize my index successfully. The problem that I'm having now is when I check the index using Lucene Index Toolbox there are a few files in the index itself is deletable. I understand that optimize method will merge the index files but How come there is still deletable index files in it? What I do now is delete it manually. Is there by any chance that I can delete it automatically? Any code that I can refer to? -- View this message in context: http://www.nabble.com/Index-Optimization-tp15996107p15996107.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: How to add a jar to a contrib build.xml
: Here is how the span highlighter I have been working on uses the Memory : contrib (I think I copied this from another contrib that has a dependency): You might want to take a look at contrib/xml-query-parser/build.xml as a slightly better example of this. It tests whether the dependency has already been built, which saves some overhead. -Hoss
[jira] Updated: (LUCENE-1223) lazy fields don't enforce binary vs string value
[ https://issues.apache.org/jira/browse/LUCENE-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1223: --- Attachment: LUCENE-1223.patch Attached patch that just propagates the "binary" value from when we scanned the fields, into the LazyField, recording it as isBinary. Then I enforce isBinary before returning a binaryValue() and !isBinary before returning a stringValue(). I'll commit in a day or two. > lazy fields don't enforce binary vs string value > > > Key: LUCENE-1223 > URL: https://issues.apache.org/jira/browse/LUCENE-1223 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3, 2.3.1 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.4 > > Attachments: LUCENE-1223.patch > > > If you have a binary field, and load it lazy, and then ask that field > for its stringValue, it will incorrectly give you a String back (and > then will refuse to give a binaryValue). And, vice-versa. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
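The check described in the patch comment above can be sketched in a few lines. This is a minimal illustration, not Lucene's LazyField: the class name LazyValueSketch, the constructor, and the choice to throw IllegalStateException are all assumptions. The message does not say whether the real patch throws or returns null on a mismatched accessor.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a lazily loaded field; NOT Lucene's LazyField.
final class LazyValueSketch {
    private final boolean isBinary; // recorded once, when the field is scanned
    private final byte[] raw;       // stored bytes, decoded only on demand

    LazyValueSketch(byte[] raw, boolean isBinary) {
        this.raw = raw;
        this.isBinary = isBinary;
    }

    // Only a binary field may hand out bytes.
    byte[] binaryValue() {
        if (!isBinary) {
            // Assumption: throwing; the real patch may behave differently.
            throw new IllegalStateException("field is not binary");
        }
        return raw;
    }

    // Only a non-binary field may hand out a String.
    String stringValue() {
        if (isBinary) {
            throw new IllegalStateException("field is binary");
        }
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

The point of keeping the flag next to the data is that both accessors consult the same recorded fact instead of guessing from the runtime type of the stored value.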
[jira] Created: (LUCENE-1223) lazy fields don't enforce binary vs string value
lazy fields don't enforce binary vs string value Key: LUCENE-1223 URL: https://issues.apache.org/jira/browse/LUCENE-1223 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3.1, 2.3 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.4 If you have a binary field, and load it lazy, and then ask that field for its stringValue, it will incorrectly give you a String back (and then will refuse to give a binaryValue). And, vice-versa. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577637#action_12577637 ] Michael McCandless commented on LUCENE-1217: OK the new patch passes all tests -- thanks! One unrelated thing I noticed: it looks like you can get a binary LazyField and then ask for its stringValue(), and vice-versa. Ie we are failing to check in binaryValue() that the field is in fact binary even though when we create the LazyField we know whether it is. I'll open a separate issue for this. > use isBinary cached variable instead of instanceof in Filed > --- > > Key: LUCENE-1217 > URL: https://issues.apache.org/jira/browse/LUCENE-1217 > Project: Lucene - Java > Issue Type: Improvement > Components: Other >Reporter: Eks Dev >Assignee: Michael McCandless >Priority: Trivial > Attachments: Lucene-1217-take1.patch, LUCENE-1217.patch > > > Filed class can hold three types of values, > See: AbstractField.java protected Object fieldsData = null; > currently, mainly RTTI (instanceof) is used to determine the type of the > value stored in particular instance of the Field, but for binary value we > have mixed RTTI and cached variable "boolean isBinary" > This patch makes consistent use of cached variable isBinary. > Benefit: consistent usage of method to determine run-time type for binary > case (reduces chance to get out of sync on cached variable). It should be > slightly faster as well. > Thinking aloud: > Would it not make sense to maintain type with some integer/byte"poor man's > enum" (Interface with a couple of constants) > code:java{ > public static final interface Type{ > public static final byte BOOLEAN = 0; > public static final byte STRING = 1; > public static final byte READER = 2; > > } > } > and use that instead of isBinary + instanceof? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1219: Attachment: LUCENE-1219.patch This one keeps the addition of new methods localized to AbstractField and does not change the Fieldable interface. It looks like it could work this way with a few instanceof checks in FieldsWriter. This patch depends on LUCENE-1217. It will not give you any benefit if you implement Fieldable directly without extending AbstractField, so I would suggest eventually changing Fieldable to support all these methods that operate with offset/length. Or someone clever finds some way to change an interface without breaking backwards compatibility :) > support array/offset/ length setters for Field with binary data > --- > > Key: LUCENE-1219 > URL: https://issues.apache.org/jira/browse/LUCENE-1219 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Eks Dev >Assignee: Michael McCandless >Priority: Minor > Attachments: LUCENE-1219.patch, LUCENE-1219.patch, LUCENE-1219.patch > > > currently Field/Fieldable interface supports only compact, zero based byte > arrays. This forces end users to create and copy content of new objects > before passing them to Lucene as such fields are often of variable size. > Depending on use case, this can bring far from negligible performance > improvement. > this approach extends Fieldable interface with 3 new methods > getOffset(); gettLenght(); and getBinaryValue() (this only returns reference > to the array) > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
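To make the array/offset/length idea concrete, here is a minimal sketch of a value holder that exposes a slice of a shared array without copying it. BinarySliceSketch and asReadOnlyView are hypothetical names invented for illustration; they are not part of Fieldable or AbstractField, and the method spelling getLength is an assumption that differs from the spelling in the issue text.

```java
import java.nio.ByteBuffer;

// Hypothetical holder for a slice of a larger byte array; NOT the Lucene API.
final class BinarySliceSketch {
    private final byte[] buffer; // shared array, never copied
    private final int offset;    // first valid byte
    private final int length;    // number of valid bytes

    BinarySliceSketch(byte[] buffer, int offset, int length) {
        if (offset < 0 || length < 0 || offset > buffer.length - length) {
            throw new IndexOutOfBoundsException("offset=" + offset + " length=" + length);
        }
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    // Returns a reference to the backing array, not a compacted copy.
    byte[] getBinaryValue() { return buffer; }
    int getOffset() { return offset; }
    int getLength() { return length; }

    // A consumer can read just the slice through a view over the same array.
    ByteBuffer asReadOnlyView() {
        return ByteBuffer.wrap(buffer, offset, length).asReadOnlyBuffer();
    }
}
```

A caller that reuses one large buffer for many variable-sized values would pass the same array each time and only change offset and length, which is the copy the issue is trying to avoid.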
[jira] Updated: (LUCENE-1035) Optional Buffer Pool to Improve Search Performance
[ https://issues.apache.org/jira/browse/LUCENE-1035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ning Li updated LUCENE-1035: Attachment: LUCENE-1035.contrib.patch Re-do as a contrib package. Creating BufferPooledDirectory with your customized file name filter for readers allows you to decide which files you want to use the caching layer for. The package includes some tests. I also modified and tested the core tests with the caching layer in a private setting and all tests passed. > Optional Buffer Pool to Improve Search Performance > -- > > Key: LUCENE-1035 > URL: https://issues.apache.org/jira/browse/LUCENE-1035 > Project: Lucene - Java > Issue Type: Improvement > Components: Store >Reporter: Ning Li > Attachments: LUCENE-1035.contrib.patch, LUCENE-1035.patch > > > Index in RAMDirectory provides better performance over that in FSDirectory. > But many indexes cannot fit in memory or applications cannot afford to > spend that much memory on index. On the other hand, because of locality, > a reasonably sized buffer pool may provide good improvement over FSDirectory. > This issue aims at providing such an optional buffer pool layer. In cases > where it fits, i.e. a reasonable hit ratio can be achieved, it should provide > a good improvement over FSDirectory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Ideas to refactor Filed
: I think, if you give it the same name, it just grays out the old ones. See : https://issues.apache.org/jira/browse/LUCENE-550 for an example. : : Thus, I prefer #3, but am fine with #2 as well. #3 makes it easier, IMO, to : find the latest. use the same name if the patch serves the same purpose (in the majority of issues, there is a linear evolution of a single patch). when doing this Jira recognizes that the patches supersede each other, and always presents the latest at the top of the list with the others greyed out. use different names for patches that serve different purposes (ie: one patch may go through several iterations using one approach; someone may then post a different patch with a different name which attempts to solve the same problem with a completely different approach; someone else may then post a third patch with a third name which provides unit tests that work against both of the other patches ... at which point all three different "patches" may be updated many times as they evolve in attempting to find the best ultimate solution). if you use different names for different iterations of the same "logical patch" it's not easy to see in jira which one is the "newest", because jira orders patches with different names lexicographically. you have to go to the "Manage Attachments" screen or view the full history of the issue to get any sense of when each differently named patch was added. -Hoss
[jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1217: Attachment: Lucene-1217-take1.patch new patch, fixes isBinary status in LazyField > use isBinary cached variable instead of instanceof in Filed > --- > > Key: LUCENE-1217 > URL: https://issues.apache.org/jira/browse/LUCENE-1217 > Project: Lucene - Java > Issue Type: Improvement > Components: Other >Reporter: Eks Dev >Assignee: Michael McCandless >Priority: Trivial > Attachments: Lucene-1217-take1.patch, LUCENE-1217.patch > > > Filed class can hold three types of values, > See: AbstractField.java protected Object fieldsData = null; > currently, mainly RTTI (instanceof) is used to determine the type of the > value stored in particular instance of the Field, but for binary value we > have mixed RTTI and cached variable "boolean isBinary" > This patch makes consistent use of cached variable isBinary. > Benefit: consistent usage of method to determine run-time type for binary > case (reduces chance to get out of sync on cached variable). It should be > slightly faster as well. > Thinking aloud: > Would it not make sense to maintain type with some integer/byte"poor man's > enum" (Interface with a couple of constants) > code:java{ > public static final interface Type{ > public static final byte BOOLEAN = 0; > public static final byte STRING = 1; > public static final byte READER = 2; > > } > } > and use that instead of isBinary + instanceof? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577601#action_12577601 ] Eks Dev commented on LUCENE-1217: - hah, this bug just justified this patch :) sorry, I should have run tests before... nothing is trivial enough. The problem was indeed isBinary that went out of sync in LazyField, new patch follows > use isBinary cached variable instead of instanceof in Filed > --- > > Key: LUCENE-1217 > URL: https://issues.apache.org/jira/browse/LUCENE-1217 > Project: Lucene - Java > Issue Type: Improvement > Components: Other >Reporter: Eks Dev >Assignee: Michael McCandless >Priority: Trivial > Attachments: LUCENE-1217.patch > > > Filed class can hold three types of values, > See: AbstractField.java protected Object fieldsData = null; > currently, mainly RTTI (instanceof) is used to determine the type of the > value stored in particular instance of the Field, but for binary value we > have mixed RTTI and cached variable "boolean isBinary" > This patch makes consistent use of cached variable isBinary. > Benefit: consistent usage of method to determine run-time type for binary > case (reduces chance to get out of sync on cached variable). It should be > slightly faster as well. > Thinking aloud: > Would it not make sense to maintain type with some integer/byte"poor man's > enum" (Interface with a couple of constants) > code:java{ > public static final interface Type{ > public static final byte BOOLEAN = 0; > public static final byte STRING = 1; > public static final byte READER = 2; > > } > } > and use that instead of isBinary + instanceof? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577598#action_12577598 ] Michael McCandless commented on LUCENE-1217: Actually seeing a test failure with this: [junit] Testcase: testLazyFields(org.apache.lucene.index.TestFieldsReader): FAILED [junit] bytes is null and it shouldn't be [junit] junit.framework.AssertionFailedError: bytes is null and it shouldn't be [junit] at org.apache.lucene.index.TestFieldsReader.testLazyFields(TestFieldsReader.java:132) > use isBinary cached variable instead of instanceof in Filed > --- > > Key: LUCENE-1217 > URL: https://issues.apache.org/jira/browse/LUCENE-1217 > Project: Lucene - Java > Issue Type: Improvement > Components: Other >Reporter: Eks Dev >Assignee: Michael McCandless >Priority: Trivial > Attachments: LUCENE-1217.patch > > > Filed class can hold three types of values, > See: AbstractField.java protected Object fieldsData = null; > currently, mainly RTTI (instanceof) is used to determine the type of the > value stored in particular instance of the Field, but for binary value we > have mixed RTTI and cached variable "boolean isBinary" > This patch makes consistent use of cached variable isBinary. > Benefit: consistent usage of method to determine run-time type for binary > case (reduces chance to get out of sync on cached variable). It should be > slightly faster as well. > Thinking aloud: > Would it not make sense to maintain type with some integer/byte"poor man's > enum" (Interface with a couple of constants) > code:java{ > public static final interface Type{ > public static final byte BOOLEAN = 0; > public static final byte STRING = 1; > public static final byte READER = 2; > > } > } > and use that instead of isBinary + instanceof? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577597#action_12577597 ] Eks Dev commented on LUCENE-1219: - I do not know for sure if this is something we could not live with. Adding new interface sounds equally bad, would work nicely, but I do not like it as it makes code harder to follow with too many interfaces ... I'll have another look at it to see if there is a way to do it without interface changes. Any ideas? > support array/offset/ length setters for Field with binary data > --- > > Key: LUCENE-1219 > URL: https://issues.apache.org/jira/browse/LUCENE-1219 > Project: Lucene - Java > Issue Type: Improvement > Components: Index >Reporter: Eks Dev >Assignee: Michael McCandless >Priority: Minor > Attachments: LUCENE-1219.patch, LUCENE-1219.patch > > > currently Field/Fieldable interface supports only compact, zero based byte > arrays. This forces end users to create and copy content of new objects > before passing them to Lucene as such fields are often of variable size. > Depending on use case, this can bring far from negligible performance > improvement. > this approach extends Fieldable interface with 3 new methods > getOffset(); gettLenght(); and getBinaryValue() (this only returns reference > to the array) > -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577591#action_12577591 ] Eks Dev commented on LUCENE-1217: - Thanks for looking into it! Subclassing now with backwards compatibility would be clumsy; I was thinking about it but could not find a clean way to do it. >>Or we could wait until Java 5 (3.0) and use real enums? Yes, that is the ultimate solution, but my line of thought was that a "poor man's enum" -> Java 5 enum migration would be trivial later... but "do not change working code" kicks in here :) > use isBinary cached variable instead of instanceof in Filed > --- > > Key: LUCENE-1217 > URL: https://issues.apache.org/jira/browse/LUCENE-1217 > Project: Lucene - Java > Issue Type: Improvement > Components: Other >Reporter: Eks Dev >Assignee: Michael McCandless >Priority: Trivial > Attachments: LUCENE-1217.patch > > > Filed class can hold three types of values, > See: AbstractField.java protected Object fieldsData = null; > currently, mainly RTTI (instanceof) is used to determine the type of the > value stored in particular instance of the Field, but for binary value we > have mixed RTTI and cached variable "boolean isBinary" > This patch makes consistent use of cached variable isBinary. > Benefit: consistent usage of method to determine run-time type for binary > case (reduces chance to get out of sync on cached variable). It should be > slightly faster as well. > Thinking aloud: > Would it not make sense to maintain type with some integer/byte"poor man's > enum" (Interface with a couple of constants) > code:java{ > public static final interface Type{ > public static final byte BOOLEAN = 0; > public static final byte STRING = 1; > public static final byte READER = 2; > > } > } > and use that instead of isBinary + instanceof? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. 
Re: [jira] Updated: (LUCENE-1198) Exception in DocumentsWriter.ThreadState.init leads to corruption
: Thanks Hoss! I did the easy book-keeping part ... you're the guy fixing the bugs and merging them into the release branches :) -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1222) IndexWriter.doAfterFlush not being called when there are no deletions flushed
[ https://issues.apache.org/jira/browse/LUCENE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1222: - Fix Version/s: 2.4 2.3.2 targeted for 2.3.2 bug fix release > IndexWriter.doAfterFlush not being called when there are no deletions flushed > - > > Key: LUCENE-1222 > URL: https://issues.apache.org/jira/browse/LUCENE-1222 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3, 2.3.1 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.3.2, 2.4 > > > It should be called when flushing either added docs or deletions. The fix is > trivial. I'll commit shortly to trunk & 2.3.2. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1199) NullPointerException in IndexModifier.close()
[ https://issues.apache.org/jira/browse/LUCENE-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1199: - Fix Version/s: 2.3.2 targeted for 2.3.2 bug fix release > NullPointerException in IndexModifier.close() > - > > Key: LUCENE-1199 > URL: https://issues.apache.org/jira/browse/LUCENE-1199 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.0.0, 2.3.1 >Reporter: James William Dumay > Fix For: 2.3.2, 2.4 > > > We upgraded from Lucene 2.0.0. to 2.3.1 hoping this would resolve this issue. > http://jira.codehaus.org/browse/MRM-715 > Trace is as below for Lucene 2.3.1: > java.lang.NullPointerException > at org.apache.lucene.index.IndexModifier.close(IndexModifier.java:576) > at > org.apache.maven.archiva.indexer.lucene.LuceneRepositoryContentIndex.closeQuietly(LuceneRepositoryContentIndex.java:416) > at > org.apache.maven.archiva.indexer.lucene.LuceneRepositoryContentIndex.modifyRecord(LuceneRepositoryContentIndex.java:152) > at > org.apache.maven.archiva.consumers.lucene.IndexContentConsumer.processFile(IndexContentConsumer.java:169) > at > org.apache.maven.archiva.repository.scanner.functors.ConsumerProcessFileClosure.execute(ConsumerProcessFileClosure.java:51) > at > org.apache.commons.collections.functors.IfClosure.execute(IfClosure.java:117) > at > org.apache.commons.collections.CollectionUtils.forAllDo(CollectionUtils.java:388) > at > org.apache.maven.archiva.repository.scanner.RepositoryContentConsumers.executeConsumers(RepositoryContentConsumers.java:283) > at > org.apache.maven.archiva.proxy.DefaultRepositoryProxyConnectors.transferFile(DefaultRepositoryProxyConnectors.java:597) > at > org.apache.maven.archiva.proxy.DefaultRepositoryProxyConnectors.fetchFromProxies(DefaultRepositoryProxyConnectors.java:157) > at > org.apache.maven.archiva.web.repository.ProxiedDavServer.applyServerSideRelocation(ProxiedDavServer.java:447) > at > 
org.apache.maven.archiva.web.repository.ProxiedDavServer.fetchContentFromProxies(ProxiedDavServer.java:354) > at > org.apache.maven.archiva.web.repository.ProxiedDavServer.process(ProxiedDavServer.java:189) > at > org.codehaus.plexus.webdav.servlet.multiplexed.MultiplexedWebDavServlet.service(MultiplexedWebDavServlet.java:119) > at > org.apache.maven.archiva.web.repository.RepositoryServlet.service(RepositoryServlet.java:155) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:803) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1210) IndexWriter & ConcurrentMergeScheduler deadlock case if starting a merge hits an exception
[ https://issues.apache.org/jira/browse/LUCENE-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1210: - Fix Version/s: 2.3.2 targeted for 2.3.2 bug fix release > IndexWriter & ConcurrentMergeScheduler deadlock case if starting a merge hits > an exception > -- > > Key: LUCENE-1210 > URL: https://issues.apache.org/jira/browse/LUCENE-1210 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3, 2.3.1 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.3.2, 2.4 > > > If you're using CMS (the default) and mergeInit hits an exception (eg > OOME), we are not properly clearing IndexWriter's internal tracking of > running merges. This causes IW.close() to hang while it incorrectly > waits for these non-started merges to finish. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1200) IndexWriter.addIndexes* can deadlock in rare cases
[ https://issues.apache.org/jira/browse/LUCENE-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1200: - Fix Version/s: 2.3.2 targeted for 2.3.2 bug fix release > IndexWriter.addIndexes* can deadlock in rare cases > -- > > Key: LUCENE-1200 > URL: https://issues.apache.org/jira/browse/LUCENE-1200 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.4 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.3.2, 2.4 > > Attachments: LUCENE-1200.patch > > > In somewhat rare cases it's possible for addIndexes to deadlock > because it is a synchronized method. > Normally the merges that are necessary for addIndexes are done > serially (with the primary thread) because they involve segments from > an external directory. However, if mergeFactor of these merges > complete then a merge becomes necessary for the merged segments, which > are not external, and so it can run in the background. If too many BG > threads need to run (currently > 4) then the "pause primary thread" > approach adopted in LUCENE-1164 will deadlock, because the addIndexes > method is holding a lock on IndexWriter. > This was appearing as a intermittant deadlock in the > TestIndexWriterMerging test case. > This issue is not present in 2.3 (it was caused by LUCENE-1164). > The solution is to shrink the scope of synchronization: don't > synchronize on the whole method & wrap synchronized(this) in the right > places inside the methods. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
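The fix described in the issue above (stop synchronizing the whole method and wrap synchronized(this) only where shared state is touched) follows a general Java pattern, sketched here. NarrowSyncSketch, slowMerge, and the list of segment names are hypothetical and greatly simplified; this is not the IndexWriter code, only an illustration of narrowing the lock scope.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical writer; NOT Lucene's IndexWriter.
final class NarrowSyncSketch {
    private final List<String> segments = new ArrayList<>();

    // Deliberately not declared synchronized: the slow work runs without
    // holding the monitor, so other threads can still enter this object.
    void addIndexes(List<String> external) {
        for (String name : external) {
            String merged = slowMerge(name); // no lock held here
            synchronized (this) {            // lock only around shared state
                segments.add(merged);
            }
        }
    }

    synchronized int segmentCount() {
        return segments.size();
    }

    // Stand-in for expensive merge work (assumption for illustration).
    private static String slowMerge(String name) {
        return "merged-" + name;
    }
}
```

With the monitor released during the slow step, a background thread that needs the same lock is no longer blocked for the full duration of the call, which is the situation the deadlock report describes.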
[jira] Updated: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush
[ https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1208: - Fix Version/s: 2.3.2 targeted for 2.3.2 bug fix release > Deadlock case in IndexWriter on exception just before flush > --- > > Key: LUCENE-1208 > URL: https://issues.apache.org/jira/browse/LUCENE-1208 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3, 2.3.1 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.3.2, 2.4 > > Attachments: LUCENE-1208.patch > > > If a document hits a non-aborting exception, eg something goes wrong > in tokenStream.next(), and, that document had triggered a flush > (due to RAM or doc count) then DocumentsWriter will deadlock because > that thread marks the flush as pending but fails to clear it on > exception. > I have a simple test case showing this, and a fix fixing it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
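The failure mode in the issue above is a "pending" flag that is set before risky work and never cleared when that work throws. A common remedy is to clear it in a finally block. The sketch below assumes a hypothetical FlushPendingSketch class and a Runnable standing in for document analysis; the actual LUCENE-1208 patch is not shown in this message and may be structured differently.

```java
// Hypothetical writer state; NOT Lucene's DocumentsWriter.
final class FlushPendingSketch {
    private boolean flushPending;
    private int flushCount;

    synchronized boolean isFlushPending() { return flushPending; }
    synchronized int flushCount() { return flushCount; }

    // processDocument stands in for analysis work that may throw.
    void addDocument(Runnable processDocument, boolean triggersFlush) {
        synchronized (this) {
            if (triggersFlush) {
                flushPending = true; // other threads would wait on this flag
            }
        }
        boolean success = false;
        try {
            processDocument.run();
            success = true;
        } finally {
            synchronized (this) {
                if (triggersFlush) {
                    if (success) {
                        flushCount++;     // the flush itself would happen here
                    }
                    flushPending = false; // cleared on success AND on exception
                    notifyAll();          // wake any threads waiting on the flag
                }
            }
        }
    }
}
```

Because the flag is reset on every exit path, a thread that waits for the pending flush to finish cannot be left waiting forever after a failed document.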
Re: [jira] Updated: (LUCENE-1198) Exception in DocumentsWriter.ThreadState.init leads to corruption
Thanks Hoss! Mike On Mar 11, 2008, at 3:28 PM, Hoss Man (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1198? page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1198: - Fix Version/s: 2.3.2 targeted for 2.3.2 bug fix release Exception in DocumentsWriter.ThreadState.init leads to corruption - Key: LUCENE-1198 URL: https://issues.apache.org/jira/browse/ LUCENE-1198 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Fix For: 2.3.2, 2.4 Attachments: LUCENE-1198.patch If an exception is hit in the init method, DocumentsWriter incorrectly increments numDocsInRAM when in fact the document is not added. Spinoff of this thread: http://markmail.org/message/e76hgkgldxhakuaa The root cause that led to the exception in init was actually due to incorrect use of Lucene's APIs (one thread still modifying the Document while IndexWriter.addDocument is adding it) but still we should protect against any exceptions coming out of init. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1198) Exception in DocumentsWriter.ThreadState.init leads to corruption
[ https://issues.apache.org/jira/browse/LUCENE-1198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1198: - Fix Version/s: 2.3.2 targeted for 2.3.2 bug fix release > Exception in DocumentsWriter.ThreadState.init leads to corruption > - > > Key: LUCENE-1198 > URL: https://issues.apache.org/jira/browse/LUCENE-1198 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.3.2, 2.4 > > Attachments: LUCENE-1198.patch > > > If an exception is hit in the init method, DocumentsWriter incorrectly > increments numDocsInRAM when in fact the document is not added. > Spinoff of this thread: > http://markmail.org/message/e76hgkgldxhakuaa > The root cause that led to the exception in init was actually due to > incorrect use of Lucene's APIs (one thread still modifying the > Document while IndexWriter.addDocument is adding it) but still we > should protect against any exceptions coming out of init. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1197) IndexWriter can flush too early when flushing by RAM usage
[ https://issues.apache.org/jira/browse/LUCENE-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated LUCENE-1197: - Fix Version/s: 2.3.2 targeted for 2.3.2 bug fix release > IndexWriter can flush too early when flushing by RAM usage > -- > > Key: LUCENE-1197 > URL: https://issues.apache.org/jira/browse/LUCENE-1197 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > Fix For: 2.3.2, 2.4 > > > There is a silly bug in how DocumentsWriter tracks its RAM usage: > whenever term vectors are enabled, it incorrectly counts the space > used by term vectors towards flushing, when in fact this space is > recycled per document. > This is not a functionality bug. All it causes is flushes to happen > too frequently, and, IndexWriter will use less RAM than you asked it > to. To work around it you can simply give it a bigger RAM buffer. > I will commit a fix shortly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1191) If IndexWriter hits OutOfMemoryError it should not commit
[ https://issues.apache.org/jira/browse/LUCENE-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-1191:
    Fix Version/s: 2.3.2

targeted for 2.3.2 bug fix release

> If IndexWriter hits OutOfMemoryError it should not commit
>
> Key: LUCENE-1191
> URL: https://issues.apache.org/jira/browse/LUCENE-1191
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.4
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.3.2, 2.4
> Attachments: LUCENE-1191.patch
>
> While progress has been made making IndexWriter robust to OOME, I
> think there is still a real risk that an OOME at a bad time could put
> IndexWriter into a bad state such that if close() is called and
> somehow it succeeds without hitting another OOME, it risks
> messing up the index.
> I'd like to detect if OOME has been hit in any of the methods that
> alter IW's state, and if so, do not commit changes to the index. If
> close is called after hitting OOME, I think writer should instead
> abort.
> Attached patch just adds try/catch clauses to catch OOME, note that
> it was hit, and re-throw it. Then, sync() refuses to commit a new
> segments_N if OOME was hit, and close instead calls abort when OOME
> was hit. All tests pass. I plan to commit in a day or two.
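The approach described in LUCENE-1191 (catch the error, note that it was hit, re-throw it, then refuse to commit and abort on close) might look roughly like the sketch below. `OomAwareWriterSketch` and its counters are hypothetical and stand in for `IndexWriter` state; throwing `IllegalStateException` from `commit()` is an assumption made here to model "refuses to commit", not something the issue specifies.

```java
/** Hypothetical writer that refuses to commit after an OutOfMemoryError; not Lucene code. */
final class OomAwareWriterSketch {
    private boolean hitOOM;
    private int pendingDocs;
    private int committedDocs;

    /** Runs the indexing work and records whether it died with an OutOfMemoryError. */
    void addDocument(Runnable indexingWork) {
        try {
            indexingWork.run();
            pendingDocs++;
        } catch (OutOfMemoryError oom) {
            hitOOM = true; // remember that internal state may be inconsistent
            throw oom;     // re-throw so the caller still sees the error
        }
    }

    /** Makes pending documents durable, unless an OutOfMemoryError was seen earlier. */
    void commit() {
        if (hitOOM) {
            throw new IllegalStateException("OutOfMemoryError was hit; refusing to commit");
        }
        committedDocs += pendingDocs;
        pendingDocs = 0;
    }

    /** Commits on a healthy writer, aborts on one that has seen an OutOfMemoryError. */
    void close() {
        if (hitOOM) {
            abort();
        } else {
            commit();
        }
    }

    /** Discards everything that has not been committed yet. */
    void abort() {
        pendingDocs = 0;
    }

    int committedDocs() { return committedDocs; }

    int pendingDocs() { return pendingDocs; }
}
```

A caller that catches the re-thrown error and later calls `close()` keeps the last good commit rather than writing possibly inconsistent state.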
[jira] Updated: (LUCENE-1207) Allow spell check input to be part of the results
[ https://issues.apache.org/jira/browse/LUCENE-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated LUCENE-1207:
    Lucene Fields: [New, Patch Available] (was: [Patch Available, New])
    Fix Version/s: (was: 2.3.1)

this was not actually part of (the already released) 2.3.1 -- removing "Fix Version"

> Allow spell check input to be part of the results
>
> Key: LUCENE-1207
> URL: https://issues.apache.org/jira/browse/LUCENE-1207
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Reporter: Karl Wettin
> Priority: Trivial
> Attachments: canSuggestSelf.patch
>
> As a threshold marker, to see if the word seems to exist at all, or whatnot.
[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF
[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577563#action_12577563 ]

Yonik Seeley commented on LUCENE-1221:

If there is a real character that doesn't appear in a property name, it would be much safer to use that. Using non-unicode chars or reserved chars is pretty dicey since you never know what methods might throw an exception because of it.

> DocumentsWriter truncates term text at \uFFFF
>
> Key: LUCENE-1221
> URL: https://issues.apache.org/jira/browse/LUCENE-1221
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 2.3, 2.3.1
> Reporter: Marcel Reutegger
> Priority: Minor
> Attachments: OddTermTest.java
>
> When a Term text contains the unicode 'character' \uFFFF, DocumentsWriter
> will truncate the text and only write the text up to the \uFFFF character.
> This has been introduced with changes for LUCENE-843 to reduce memory usage
> and improve performance.
> This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.
[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF
[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577561#action_12577561 ]

Marcel Reutegger commented on LUCENE-1221:

I'll see if I can build some kind of filter index reader that translates existing terms on the fly to use a new separator, while new terms are written with the new separator.

> DocumentsWriter truncates term text at \uFFFF
[jira] Commented: (LUCENE-1219) support array/offset/ length setters for Field with binary data
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577551#action_12577551 ]

Michael McCandless commented on LUCENE-1219:

Hmm ... one problem is Fieldable is an interface, and this patch adds methods to the interface, which I believe breaks our backwards compatibility requirement.

> support array/offset/ length setters for Field with binary data
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Eks Dev
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-1219.patch, LUCENE-1219.patch
>
> currently Field/Fieldable interface supports only compact, zero based byte
> arrays. This forces end users to create and copy content of new objects
> before passing them to Lucene as such fields are often of variable size.
> Depending on use case, this can bring far from negligible performance
> improvement.
> this approach extends Fieldable interface with 3 new methods
> getOffset(); getLength(); and getBinaryValue() (this only returns reference
> to the array)
[jira] Commented: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577547#action_12577547 ]

Michael McCandless commented on LUCENE-1217:

Patch looks good. I will commit shortly. Thanks Eks Dev.

{quote}
Would it not make sense to maintain type with some integer/byte "poor man's enum" (Interface with a couple of constants)
{quote}

Or we could wait until Java 5 (3.0) and use real enums?

Or ... maybe we should have subclasses of Field (TextField, BinaryField, ReaderField, TokenStreamField) which override the corresponding method (and the base Field.java would still implement these methods but return null)? Though this would be a rather large change...

> use isBinary cached variable instead of instanceof in Filed
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Other
> Reporter: Eks Dev
> Assignee: Michael McCandless
> Priority: Trivial
> Attachments: LUCENE-1217.patch
>
> Field class can hold three types of values,
> See: AbstractField.java protected Object fieldsData = null;
> currently, mainly RTTI (instanceof) is used to determine the type of the
> value stored in particular instance of the Field, but for binary value we
> have mixed RTTI and cached variable "boolean isBinary"
> This patch makes consistent use of cached variable isBinary.
> Benefit: consistent usage of method to determine run-time type for binary
> case (reduces chance to get out of sync on cached variable). It should be
> slightly faster as well.
> Thinking aloud:
> Would it not make sense to maintain type with some integer/byte "poor man's
> enum" (Interface with a couple of constants)
> {code:java}
> public static interface Type {
>   public static final byte BOOLEAN = 0;
>   public static final byte STRING = 1;
>   public static final byte READER = 2;
> }
> {code}
> and use that instead of isBinary + instanceof?
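For comparison with the byte-constant interface quoted above, here is a rough sketch of the "real enums" alternative on Java 5 or later. `TypedFieldSketch` and `ValueType` are hypothetical names, not Lucene classes; only `fieldsData` and `isBinary` are taken from the issue text. The type is cached once at construction, so no `instanceof` check is needed afterwards.

```java
import java.io.Reader;

/** Hypothetical field holder that caches its value type in an enum; not Lucene code. */
final class TypedFieldSketch {
    enum ValueType { STRING, READER, BINARY }

    private final Object fieldsData;
    private final ValueType type;

    TypedFieldSketch(String value) { this(value, ValueType.STRING); }

    TypedFieldSketch(Reader value) { this(value, ValueType.READER); }

    TypedFieldSketch(byte[] value) { this(value, ValueType.BINARY); }

    private TypedFieldSketch(Object data, ValueType type) {
        this.fieldsData = data;
        this.type = type;
    }

    ValueType type() { return type; }

    boolean isBinary() { return type == ValueType.BINARY; }

    /** Each accessor returns null unless the field holds that kind of value. */
    String stringValue() { return type == ValueType.STRING ? (String) fieldsData : null; }

    Reader readerValue() { return type == ValueType.READER ? (Reader) fieldsData : null; }

    byte[] binaryValue() { return type == ValueType.BINARY ? (byte[]) fieldsData : null; }
}
```

Compared with loose byte constants, the compiler rejects values outside the enum, and a `switch` over `ValueType` can be checked for missing cases.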
[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF
[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577544#action_12577544 ]

Marcel Reutegger commented on LUCENE-1221:

> How/why are you seeing/using this character in Jackrabbit

To avoid an excessive amount of Lucene fields we prefix term values with the JCR property name and put everything under the same Lucene field name. The 0xFFFF separates the property name from the property value. See: JCR-106. That was before Lucene 2.1, when each field had a separate norm file.

> DocumentsWriter truncates term text at \uFFFF
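To make the clash concrete, the sketch below models the prefixing scheme Marcel describes together with the truncation reported in this issue. `PrefixedTermSketch` and all of its method names are hypothetical; this is neither Jackrabbit nor Lucene code, and `truncateAtSentinel` only imitates the reported symptom, not the actual `DocumentsWriter` internals.

```java
/** Hypothetical model of "property name + U+FFFF + value" terms; not Jackrabbit or Lucene code. */
final class PrefixedTermSketch {
    /** U+FFFF, used as the separator between property name and value. */
    static final char SEPARATOR = '\uFFFF';

    /** Builds a single term so that all properties can share one field name. */
    static String encode(String propertyName, String value) {
        return propertyName + SEPARATOR + value;
    }

    /** Returns the part before the separator, or the whole term if there is none. */
    static String propertyName(String term) {
        int i = term.indexOf(SEPARATOR);
        return i < 0 ? term : term.substring(0, i);
    }

    /** Returns the part after the separator, or an empty string if there is none. */
    static String value(String term) {
        int i = term.indexOf(SEPARATOR);
        return i < 0 ? "" : term.substring(i + 1);
    }

    /** Imitates the reported symptom: everything from the first U+FFFF on is dropped. */
    static String truncateAtSentinel(String term) {
        int i = term.indexOf(SEPARATOR);
        return i < 0 ? term : term.substring(0, i);
    }

    public static void main(String[] args) {
        String term = encode("title", "hello");
        System.out.println(propertyName(term) + " / " + value(term));
        // After truncation only the property name is left; the value is gone.
        System.out.println(truncateAtSentinel(term));
    }
}
```

Under this model every value stored for the same property collapses to the same truncated term, which is why the change blocks an upgrade for an index built this way.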
[jira] Resolved: (LUCENE-1222) IndexWriter.doAfterFlush not being called when there are no deletions flushed
[ https://issues.apache.org/jira/browse/LUCENE-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1222. Resolution: Fixed > IndexWriter.doAfterFlush not being called when there are no deletions flushed > - > > Key: LUCENE-1222 > URL: https://issues.apache.org/jira/browse/LUCENE-1222 > Project: Lucene - Java > Issue Type: Bug > Components: Index >Affects Versions: 2.3, 2.3.1 >Reporter: Michael McCandless >Assignee: Michael McCandless >Priority: Minor > > It should be called when flushing either added docs or deletions. The fix is > trivial. I'll commit shortly to trunk & 2.3.2. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-1222) IndexWriter.doAfterFlush not being called when there are no deletions flushed
IndexWriter.doAfterFlush not being called when there are no deletions flushed - Key: LUCENE-1222 URL: https://issues.apache.org/jira/browse/LUCENE-1222 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3.1, 2.3 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor It should be called when flushing either added docs or deletions. The fix is trivial. I'll commit shortly to trunk & 2.3.2. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF
[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577513#action_12577513 ]

Michael McCandless commented on LUCENE-1221:

Hmmm ... 0xFFFF is one of the "invalid for interchange but may freely be used internal to an implementation" UTF-16 characters (from http://unicode.org/faq/utf_bom.html#6), so I assumed it was safe to use internally in DocumentsWriter. But apparently you are using it. How/why are you seeing/using this character in Jackrabbit?

Note that with LUCENE-510 (not yet fixed but in progress), there may be similar issues whereby the treatment of other kinds of invalid UTF-16 strings changes.

> DocumentsWriter truncates term text at \uFFFF
[jira] Updated: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF
[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcel Reutegger updated LUCENE-1221:
    Attachment: OddTermTest.java

Test to reproduce the issue.

> DocumentsWriter truncates term text at \uFFFF
[jira] Created: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF
DocumentsWriter truncates term text at \uFFFF

Key: LUCENE-1221
URL: https://issues.apache.org/jira/browse/LUCENE-1221
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3.1, 2.3
Reporter: Marcel Reutegger
Priority: Minor

When a Term text contains the unicode 'character' \uFFFF, DocumentsWriter will truncate the text and only write the text up to the \uFFFF character. This has been introduced with changes for LUCENE-843 to reduce memory usage and improve performance.

This change in behavior prevents us (Jackrabbit) from upgrading to Lucene 2.3.
Re: Ideas to refactor Filed
Thanks, I get it now, matter of taste :)

I would opt for #3 if you fix bugs from the previous patch, decorate javadoc, etc., but leave things mainly as they are. #2 is better to mark an interface change, an approach change, or something more substantial.

----- Original Message -----
From: Grant Ingersoll <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Tuesday, 11 March, 2008 4:47:16 PM
Subject: Re: Ideas to refactor Filed

I think, if you give it the same name, it just grays out the old ones. See https://issues.apache.org/jira/browse/LUCENE-550 for an example.

Thus, I prefer #3, but am fine with #2 as well. #3 makes it easier, IMO, to find the latest.

-Grant

On Mar 11, 2008, at 10:26 AM, Michael McCandless wrote:

> I like #2.
>
> I don't think we should delete/replace attachments in Jira. The
> history can be useful.
>
> Mike
>
> eks dev wrote:
>
>> Michael, others
>>
>> what is Lucene/Jira best practice for new versions of the same patch:
>>
>> 1. delete existing / add new patch with the same name
>> 2. add new patch with some funky version e.g. "Jira-1219-take3.patch"
>> 3. just add new patch with the same name
>>
>> ?
[jira] Resolved: (LUCENE-1220) PDF search is not working
[ https://issues.apache.org/jira/browse/LUCENE-1220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved LUCENE-1220.
    Resolution: Invalid

Lucene knows nothing about PDFs. It is up to your application to handle PDFs. See Tika or PDFBox or other tools for how to do that.

> PDF search is not working
>
> Key: LUCENE-1220
> URL: https://issues.apache.org/jira/browse/LUCENE-1220
> Project: Lucene - Java
> Issue Type: Test
> Reporter: Akshya kumar
>
> I uploaded a PDF file to my repository and tried a full-text search. It is not
> able to search in PDF, MS PowerPoint, or HTML files, while it is able to search
> in MS Word, text, and MS Excel files. Can you suggest a solution for getting results?
> Following is my XPath query.
> String str = "Documentum";
> String sQuery = "//element(*,nt:unstructured)[jcr:contains(jcr:content,' " + str + " ')]/rep:excerpt(.)";
> Query q = qm.createQuery(sQuery, Query.XPATH);
Re: Ideas to refactor Filed
I think, if you give it the same name, it just grays out the old ones. See https://issues.apache.org/jira/browse/LUCENE-550 for an example.

Thus, I prefer #3, but am fine with #2 as well. #3 makes it easier, IMO, to find the latest.

-Grant

On Mar 11, 2008, at 10:26 AM, Michael McCandless wrote:

> I like #2.
>
> I don't think we should delete/replace attachments in Jira. The
> history can be useful.
>
> Mike
>
> eks dev wrote:
>
>> Michael, others
>>
>> what is Lucene/Jira best practice for new versions of the same patch:
>>
>> 1. delete existing / add new patch with the same name
>> 2. add new patch with some funky version e.g. "Jira-1219-take3.patch"
>> 3. just add new patch with the same name
>>
>> ?

--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
[jira] Created: (LUCENE-1220) PDF search is not working
PDF search is not working

Key: LUCENE-1220
URL: https://issues.apache.org/jira/browse/LUCENE-1220
Project: Lucene - Java
Issue Type: Test
Reporter: Akshya kumar

I uploaded a PDF file to my repository and tried a full-text search. It is not able to search in PDF, MS PowerPoint, or HTML files, while it is able to search in MS Word, text, and MS Excel files. Can you suggest a solution for getting results?

Following is my XPath query.

String str = "Documentum";
String sQuery = "//element(*,nt:unstructured)[jcr:contains(jcr:content,' " + str + " ')]/rep:excerpt(.)";
Query q = qm.createQuery(sQuery, Query.XPATH);
Re: Ideas to refactor Filed
I like #2.

I don't think we should delete/replace attachments in Jira. The history can be useful.

Mike

eks dev wrote:

> Michael, others
>
> what is Lucene/Jira best practice for new versions of the same patch:
>
> 1. delete existing / add new patch with the same name
> 2. add new patch with some funky version e.g. "Jira-1219-take3.patch"
> 3. just add new patch with the same name
>
> ?
Re: Ideas to refactor Filed
Michael, others

what is Lucene/Jira best practice for new versions of the same patch:

1. delete existing / add new patch with the same name
2. add new patch with some funky version e.g. "Jira-1219-take3.patch"
3. just add new patch with the same name

?
[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eks Dev updated LUCENE-1219:
    Attachment: LUCENE-1219.patch

Michael McCandless had some nice ideas on how to make the performance penalty of the getValue() change negligible for legacy usage; this patch includes them:
- deprecates getValue() method
- returns direct reference if offset==0 && length == data.length

> support array/offset/ length setters for Field with binary data
Re: Ideas to refactor Filed
The tip with extra checks is good, deprecating is even better. I will update the patch.

----- Original Message -----
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Tuesday, 11 March, 2008 2:45:56 PM
Subject: Re: Ideas to refactor Filed

Hello! Responses below:

eks dev wrote:

> Moin Moin Michael,
>
> for the first issue I have created LUCENE-1217, and for the second
> one I have some questions.
>
> if we maintain length and offset internally in Field then we have
> one, imo, theoretical "legacy performance problem" as we need to
> create new byte[length] and copy in order to preserve compatibility
> (users expect this method to return compact array with 0 offset)
> I am talking about.
> public byte[] binaryValue();

Actually, if offset==0 and dataLength==array.length, can't we return the array itself? This way legacy apps, which will pass both these checks, would see tiny (because of these added checks) performance loss?

Also, in a search setting, where doc was created from stored fields, I think both those checks would be true as well (unless FieldsReader is changed to share byte[] arrays between fields).

I think we should then deprecate binaryValue() in favor of getBinaryValue()?

> would that be acceptable, it is very small penalty and there will
> be a way to avoid it? Anyhow, if one is using
> public void setValue(byte[] value), it is to be expected that this
> user already has a reference to value. This makes this
> question rather theoretical, no?
>
> we could then create new methods, getOffset() getLength()
> getBinaryValue() that enable full spectrum and replace all uses
> that expect 0-offset array.

Mike
Re: Ideas to refactor Filed
Hello! Responses below:

eks dev wrote:

> Moin Moin Michael,
>
> for the first issue I have created LUCENE-1217, and for the second
> one I have some questions.
>
> if we maintain length and offset internally in Field then we have
> one, imo, theoretical "legacy performance problem" as we need to
> create new byte[length] and copy in order to preserve compatibility
> (users expect this method to return compact array with 0 offset)
> I am talking about.
> public byte[] binaryValue();

Actually, if offset==0 and dataLength==array.length, can't we return the array itself? This way legacy apps, which will pass both these checks, would see tiny (because of these added checks) performance loss?

Also, in a search setting, where doc was created from stored fields, I think both those checks would be true as well (unless FieldsReader is changed to share byte[] arrays between fields).

I think we should then deprecate binaryValue() in favor of getBinaryValue()?

> would that be acceptable, it is very small penalty and there will
> be a way to avoid it? Anyhow, if one is using
> public void setValue(byte[] value), it is to be expected that this
> user already has a reference to value. This makes this
> question rather theoretical, no?
>
> we could then create new methods, getOffset() getLength()
> getBinaryValue() that enable full spectrum and replace all uses
> that expect 0-offset array.

Mike
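The fast path discussed in this exchange could be sketched as follows. `BinaryFieldSketch` is a hypothetical stand-in for `Field`, and the accessor names simply follow the ones floated in the thread (`getOffset()`, `getLength()`, `getBinaryValue()`), which may differ from what was eventually committed. The legacy `binaryValue()` copies only when the stored slice is not already a compact, zero-based array.

```java
import java.util.Arrays;

/** Hypothetical binary field that stores an array slice; not Lucene's Field class. */
final class BinaryFieldSketch {
    private final byte[] data;
    private final int offset;
    private final int length;

    BinaryFieldSketch(byte[] data, int offset, int length) {
        if (offset < 0 || length < 0 || offset > data.length - length) {
            throw new IllegalArgumentException("slice is outside the array");
        }
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    /** New-style accessors: no copy, the caller must respect offset and length. */
    byte[] getBinaryValue() { return data; }

    int getOffset() { return offset; }

    int getLength() { return length; }

    /** Legacy accessor: callers expect a compact array that starts at index 0. */
    byte[] binaryValue() {
        if (offset == 0 && length == data.length) {
            return data; // fast path: the slice already is the whole array
        }
        return Arrays.copyOfRange(data, offset, offset + length);
    }
}
```

Legacy callers that pass whole arrays pay only for the two comparisons, while callers that hand in a slice can avoid the copy by switching to the new accessors.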
[jira] Assigned: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1217:
    Assignee: Michael McCandless

> use isBinary cached variable instead of instanceof in Filed
[jira] Assigned: (LUCENE-1219) support array/offset/ length setters for Field with binary data
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1219:
------------------------------------------

Assignee: Michael McCandless

> support array/offset/ length setters for Field with binary data
> ---------------------------------------------------------------
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Eks Dev
> Assignee: Michael McCandless
> Priority: Minor
> Attachments: LUCENE-1219.patch
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte arrays. This forces end users to create new objects and copy content into them before passing them to Lucene, as such fields are often of variable size. Depending on the use case, avoiding this copy can bring a far from negligible performance improvement.
> This approach extends the Fieldable interface with 3 new methods: getOffset(), getLength() and getBinaryValue() (which only returns a reference to the array).
[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eks Dev updated LUCENE-1219:
----------------------------

Attachment: LUCENE-1219.patch

> support array/offset/ length setters for Field with binary data
> ---------------------------------------------------------------
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Eks Dev
> Priority: Minor
> Attachments: LUCENE-1219.patch
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte arrays. This forces end users to create new objects and copy content into them before passing them to Lucene, as such fields are often of variable size. Depending on the use case, avoiding this copy can bring a far from negligible performance improvement.
> This approach extends the Fieldable interface with 3 new methods: getOffset(), getLength() and getBinaryValue() (which only returns a reference to the array).
[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eks Dev updated LUCENE-1219:
----------------------------

Attachment: (was: LUCENE-1219.patch)

> support array/offset/ length setters for Field with binary data
> ---------------------------------------------------------------
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Eks Dev
> Priority: Minor
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte arrays. This forces end users to create new objects and copy content into them before passing them to Lucene, as such fields are often of variable size. Depending on the use case, avoiding this copy can bring a far from negligible performance improvement.
> This approach extends the Fieldable interface with 3 new methods: getOffset(), getLength() and getBinaryValue() (which only returns a reference to the array).
[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eks Dev updated LUCENE-1219:
----------------------------

Attachment: LUCENE-1219.patch

All tests pass with this patch. Some polish is needed and probably more testing.

TODOs:
- someone pedantic should check whether these new set/get methods should be named better
- check if there are more places where this new feature could/should be used. I think I have changed all of them but one: the direct subclass FieldForMerge in FieldsReader. This is code I do not know, so I did not touch it...
- javadoc is poor

This should be enough to get us started.

The only "pseudo-issue" I see is that public byte[] binaryValue(); now creates a byte[] and copies the content into it; a reference to the original array can now be fetched via the getBinaryValue() method. This is to preserve compatibility, as users expect a compact, zero-based array from this method and we now keep offset/length in Field. It is a "pseudo-issue" because users should already have a reference to this array, so this method is rather superfluous for end users.

> support array/offset/ length setters for Field with binary data
> ---------------------------------------------------------------
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Eks Dev
> Priority: Minor
> Attachments: LUCENE-1219.patch
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte arrays. This forces end users to create new objects and copy content into them before passing them to Lucene, as such fields are often of variable size. Depending on the use case, avoiding this copy can bring a far from negligible performance improvement.
> This approach extends the Fieldable interface with 3 new methods: getOffset(), getLength() and getBinaryValue() (which only returns a reference to the array).
Re: [jira] Commented: (LUCENE-1208) Deadlock case in IndexWriter on exception just before flush
OK I've backported fixes for these issues to the 2.3 branch!

Mike

Michael Busch wrote:

> Michael McCandless (JIRA) wrote:
>> [ https://issues.apache.org/jira/browse/LUCENE-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576941#action_12576941 ]
>>
>> Michael McCandless commented on LUCENE-1208:
>>
>> Agreed. I'm thinking these issues should be ported to 2.3.2:
>>
>> LUCENE-1191
>> LUCENE-1197
>> LUCENE-1198
>> LUCENE-1199
>> LUCENE-1200
>> LUCENE-1208 (this issue)
>> LUCENE-1210
>
> +1
>
> -Michael
[jira] Created: (LUCENE-1219) support array/offset/ length setters for Field with binary data
support array/offset/ length setters for Field with binary data
---------------------------------------------------------------

Key: LUCENE-1219
URL: https://issues.apache.org/jira/browse/LUCENE-1219
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Eks Dev
Priority: Minor

Currently the Field/Fieldable interface supports only compact, zero-based byte arrays. This forces end users to create new objects and copy content into them before passing them to Lucene, as such fields are often of variable size. Depending on the use case, avoiding this copy can bring a far from negligible performance improvement.

This approach extends the Fieldable interface with 3 new methods: getOffset(), getLength() and getBinaryValue() (which only returns a reference to the array).
Re: Ideas to refactor Filed
Moin Moin Michael,

for the first issue I have created LUCENE-1217, and for the second one I have some questions.

If we maintain length and offset internally in Field, then we have one, imo theoretical, "legacy performance problem": we need to create a new byte[length] and copy into it in order to preserve compatibility (users expect this method to return a compact array with 0 offset). I am talking about:

public byte[] binaryValue();

Would that be acceptable? It is a very small penalty and there will be a way to avoid it.

Anyhow, if one is using public void setValue(byte[] value), it is to be expected that this user already has a reference to value. This makes this question rather theoretical, no?

We could then create new methods getOffset(), getLength() and getBinaryValue() that enable the full spectrum, and replace all uses that expect a 0-offset array.

----- Original Message ----
From: Michael McCandless <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Wednesday, 5 March, 2008 10:09:26 AM
Subject: Re: Ideas to refactor Filed

Good morning!

eks dev wrote:

> I have noticed two potential enhancements in Field, and I am not sure if I read it correctly, so better to ask before creating a Jira issue :)
>
> 1. Field uses two methods to determine the type of fieldsData: sometimes the boolean isBinary, and sometimes instanceof byte[]. The proposal is to reduce it to one method, either by removing isBinary and using instanceof byte[], or by replacing the one instanceof with isBinary. I do not know which one should be faster?

This makes sense. Is this for the binaryValue() method? I would expect the explicit isBinary would be fastest.

> 2. The second enhancement would be to add the length of the char[]/byte[] to the setValue(...) methods, e.g.
> public void setValue(byte[] value, int length) //maybe offset as well?
> This would enable users to save some allocations

This also makes sense. I think adding offset and length makes sense.

Mike
[jira] Updated: (LUCENE-1218) PassTokenizerFilter that pass text in a Token
[ https://issues.apache.org/jira/browse/LUCENE-1218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiroaki Kawai updated LUCENE-1218:
----------------------------------

Attachment: PassTokenizer.java

> PassTokenizerFilter that pass text in a Token
> ---------------------------------------------
>
> Key: LUCENE-1218
> URL: https://issues.apache.org/jira/browse/LUCENE-1218
> Project: Lucene - Java
> Issue Type: New Feature
> Components: Analysis
> Reporter: Hiroaki Kawai
> Priority: Minor
> Attachments: PassTokenizer.java
>
> The PassTokenizer passes the whole text through as a TokenStream that contains a single token.
> This Tokenizer (attached) is very useful for debugging: you can use it to test a TokenFilter that does sub-tokenization within a token. It is also useful for debugging a tokenization process that depends on the TokenFilter order.
> This Tokenizer is not suitable for processing a large text field.
[jira] Created: (LUCENE-1218) PassTokenizerFilter that pass text in a Token
PassTokenizerFilter that pass text in a Token
---------------------------------------------

Key: LUCENE-1218
URL: https://issues.apache.org/jira/browse/LUCENE-1218
Project: Lucene - Java
Issue Type: New Feature
Components: Analysis
Reporter: Hiroaki Kawai
Priority: Minor
Attachments: PassTokenizer.java

The PassTokenizer passes the whole text through as a TokenStream that contains a single token.

This Tokenizer (attached) is very useful for debugging: you can use it to test a TokenFilter that does sub-tokenization within a token. It is also useful for debugging a tokenization process that depends on the TokenFilter order.

This Tokenizer is not suitable for processing a large text field.
[jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eks Dev updated LUCENE-1217:
----------------------------

Attachment: LUCENE-1217.patch

> use isBinary cached variable instead of instanceof in Filed
> -----------------------------------------------------------
>
> Key: LUCENE-1217
> URL: https://issues.apache.org/jira/browse/LUCENE-1217
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Other
> Reporter: Eks Dev
> Priority: Trivial
> Attachments: LUCENE-1217.patch
>
> The Field class can hold three types of values; see AbstractField.java:
> protected Object fieldsData = null;
> Currently, mainly RTTI (instanceof) is used to determine the type of the value stored in a particular instance of Field, but for binary values we have a mix of RTTI and the cached variable "boolean isBinary".
> This patch makes consistent use of the cached variable isBinary.
> Benefit: consistent usage of one method to determine the run-time type for the binary case (reduces the chance of getting out of sync with the cached variable). It should be slightly faster as well.
> Thinking aloud:
> Would it not make sense to maintain the type with some integer/byte "poor man's enum" (an interface with a couple of constants)
> {code:java}
> public static interface Type {
>     public static final byte BOOLEAN = 0;
>     public static final byte STRING = 1;
>     public static final byte READER = 2;
> }
> {code}
> and use that instead of isBinary + instanceof?
[jira] Created: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed
use isBinary cached variable instead of instanceof in Filed
-----------------------------------------------------------

Key: LUCENE-1217
URL: https://issues.apache.org/jira/browse/LUCENE-1217
Project: Lucene - Java
Issue Type: Improvement
Components: Other
Reporter: Eks Dev
Priority: Trivial

The Field class can hold three types of values; see AbstractField.java:

protected Object fieldsData = null;

Currently, mainly RTTI (instanceof) is used to determine the type of the value stored in a particular instance of Field, but for binary values we have a mix of RTTI and the cached variable "boolean isBinary".

This patch makes consistent use of the cached variable isBinary.

Benefit: consistent usage of one method to determine the run-time type for the binary case (reduces the chance of getting out of sync with the cached variable). It should be slightly faster as well.

Thinking aloud:
Would it not make sense to maintain the type with some integer/byte "poor man's enum" (an interface with a couple of constants)

{code:java}
public static interface Type {
    public static final byte BOOLEAN = 0;
    public static final byte STRING = 1;
    public static final byte READER = 2;
}
{code}

and use that instead of isBinary + instanceof?
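As an illustration of the "poor man's enum" idea from this issue, here is a minimal sketch in which a byte tag is set once at construction and every accessor dispatches on that tag instead of on instanceof. `TypedValue` and its factory methods are hypothetical names, not Lucene's AbstractField, and the first constant is called BINARY here on the assumption that this is what the issue's BOOLEAN constant is meant to denote.

```java
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical value holder; illustrates the idea only, not Lucene's AbstractField. */
final class TypedValue {
    // Byte constants naming what fieldsData holds (assumed names).
    static final byte BINARY = 0;
    static final byte STRING = 1;
    static final byte READER = 2;

    private final Object fieldsData;
    private final byte type; // assigned together with fieldsData, so the two cannot drift apart

    private TypedValue(Object fieldsData, byte type) {
        this.fieldsData = fieldsData;
        this.type = type;
    }

    static TypedValue of(byte[] value) { return new TypedValue(value, BINARY); }
    static TypedValue of(String value) { return new TypedValue(value, STRING); }
    static TypedValue of(Reader value) { return new TypedValue(value, READER); }

    boolean isBinary() { return type == BINARY; }

    // Each accessor checks the tag; no instanceof is needed anywhere.
    byte[] binaryValue() { return type == BINARY ? (byte[]) fieldsData : null; }
    String stringValue() { return type == STRING ? (String) fieldsData : null; }
    Reader readerValue() { return type == READER ? (Reader) fieldsData : null; }
}

class TypedValueDemo {
    public static void main(String[] args) {
        TypedValue text = TypedValue.of("hello");
        System.out.println(text.isBinary());    // false
        System.out.println(text.stringValue()); // hello

        TypedValue stream = TypedValue.of(new StringReader("body"));
        System.out.println(stream.readerValue() != null); // true
    }
}
```

Making both fields final is what removes the "out of sync" risk the issue mentions; a mutable holder with setters would need to update the tag in every setter.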
[jira] Resolved: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
[ https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-1213.
--------------------------------

Resolution: Fixed
Lucene Fields: [Patch Available] (was: [New])

Committed, thanks Trejkaz!

> MultiFieldQueryParser ignores slop parameter
> --------------------------------------------
>
> Key: LUCENE-1213
> URL: https://issues.apache.org/jira/browse/LUCENE-1213
> Project: Lucene - Java
> Issue Type: Bug
> Components: QueryParser
> Reporter: Trejkaz
> Assignee: Doron Cohen
> Attachments: multifield-fix.patch, multifield-fix.patch
>
> MultiFieldQueryParser.getFieldQuery(String, String, int) calls super.getFieldQuery(String, String), thus obliterating any slop parameter present in the query.
> It should probably be changed to call super.getFieldQuery(String, String, int), except doing only that will result in a recursive loop which is a side-effect of what may be a deeper problem in MultiFieldQueryParser -- getFieldQuery(String, String, int) is documented as delegating to getFieldQuery(String, String), yet what it actually does is the exact opposite. This also causes problems for subclasses which need to override getFieldQuery(String, String) to provide different behaviour.
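The delegation problem described in this issue can be illustrated with toy classes. This is a sketch under assumptions, not the committed patch: `Q`, `PhraseQ`, `OrQ`, `BaseParser`, `MultiParser` and `applySlop` are hypothetical stand-ins for the Lucene query and parser classes. The subclass builds each per-field query through the two-argument superclass method and then applies the slop through a separate helper, so the slop is kept and no call re-enters the three-argument override.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy query types; hypothetical stand-ins, not Lucene's Query classes. */
abstract class Q {}

final class PhraseQ extends Q {
    final String field;
    final String text;
    int slop;
    PhraseQ(String field, String text) { this.field = field; this.text = text; }
}

final class OrQ extends Q {
    final List<Q> clauses = new ArrayList<>();
}

/** Toy base parser: the three-argument form delegates to the two-argument form. */
class BaseParser {
    Q getFieldQuery(String field, String text) { return new PhraseQ(field, text); }

    Q getFieldQuery(String field, String text, int slop) {
        Q q = getFieldQuery(field, text); // virtual call: a subclass override would run here
        applySlop(q, slop);
        return q;
    }

    /** Slop handling kept in one place so subclasses can reuse it. */
    static void applySlop(Q q, int slop) {
        if (q instanceof PhraseQ) ((PhraseQ) q).slop = slop;
    }
}

/** Toy multi-field parser: a null field means "search all configured fields". */
class MultiParser extends BaseParser {
    private final String[] fields;
    MultiParser(String... fields) { this.fields = fields.clone(); }

    @Override
    Q getFieldQuery(String field, String text) { return getFieldQuery(field, text, 0); }

    @Override
    Q getFieldQuery(String field, String text, int slop) {
        if (field != null) return single(field, text, slop);
        OrQ or = new OrQ();
        for (String f : fields) or.clauses.add(single(f, text, slop));
        return or;
    }

    private Q single(String field, String text, int slop) {
        // Calling super's THREE-argument form here would loop forever: it calls the
        // two-argument override above, which calls the three-argument override again.
        Q q = super.getFieldQuery(field, text);
        applySlop(q, slop);
        return q;
    }
}

class MultiParserDemo {
    public static void main(String[] args) {
        OrQ q = (OrQ) new MultiParser("title", "body").getFieldQuery(null, "quick fox", 2);
        for (Q clause : q.clauses) {
            PhraseQ p = (PhraseQ) clause;
            System.out.println(p.field + " slop=" + p.slop);
        }
    }
}
```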
[jira] Issue Comment Edited: (LUCENE-1213) MultiFieldQueryParser ignores slop parameter
[ https://issues.apache.org/jira/browse/LUCENE-1213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576880#action_12576880 ]

doronc edited comment on LUCENE-1213 at 3/11/08 1:00 AM:
----------------------------------------------------------

Trejkaz thanks for the patch.
Attached a slightly compacted fix (refactoring slop-applying to a separate method).
Also added a test that fails without this fix.
All tests pass, if there are no comments I will commit this in a day or two.

was (Author: doronc):
Trekaj thanks for the patch.
Attached a slightly compacted fix (refactoring slop-applying to a separate method).
Also added a test that fails without this fix.
All tests pass, if there are no comments I will commit this in a day or two.

> MultiFieldQueryParser ignores slop parameter
> --------------------------------------------
>
> Key: LUCENE-1213
> URL: https://issues.apache.org/jira/browse/LUCENE-1213
> Project: Lucene - Java
> Issue Type: Bug
> Components: QueryParser
> Reporter: Trejkaz
> Assignee: Doron Cohen
> Attachments: multifield-fix.patch, multifield-fix.patch
>
> MultiFieldQueryParser.getFieldQuery(String, String, int) calls super.getFieldQuery(String, String), thus obliterating any slop parameter present in the query.
> It should probably be changed to call super.getFieldQuery(String, String, int), except doing only that will result in a recursive loop which is a side-effect of what may be a deeper problem in MultiFieldQueryParser -- getFieldQuery(String, String, int) is documented as delegating to getFieldQuery(String, String), yet what it actually does is the exact opposite. This also causes problems for subclasses which need to override getFieldQuery(String, String) to provide different behaviour.
[jira] Commented: (LUCENE-584) Decouple Filter from BitSet
[ https://issues.apache.org/jira/browse/LUCENE-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577326#action_12577326 ]

Paul Elschot commented on LUCENE-584:
-------------------------------------

From the traceback I suppose this happened at the end, using the ChainedFilter?
Iirc ChainedFilter is from contrib/..., and it is mentioned at LUCENE-1187 as one of the things to be done. Could you contribute this code as a contrib/... test case there?
Sorry, I don't remember exactly which contrib module ChainedFilter is from.

> Decouple Filter from BitSet
> ---------------------------
>
> Key: LUCENE-584
> URL: https://issues.apache.org/jira/browse/LUCENE-584
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Affects Versions: 2.1
> Reporter: Peter Schäfer
> Assignee: Michael Busch
> Priority: Minor
> Fix For: 2.4
>
> Attachments: bench-diff.txt, bench-diff.txt, CHANGES.txt.patch, ContribQueries20080111.patch, lucene-584-take2.patch, lucene-584-take3-part1.patch, lucene-584-take3-part2.patch, lucene-584-take4-part1.patch, lucene-584-take4-part2.patch, lucene-584-take5-part1.patch, lucene-584-take5-part2.patch, lucene-584.patch, Matcher-20070905-2default.patch, Matcher-20070905-3core.patch, Matcher-20071122-1ground.patch, Some Matchers.zip, Test20080111.patch
>
> {code}
> package org.apache.lucene.search;
> public abstract class Filter implements java.io.Serializable
> {
>   public abstract AbstractBitSet bits(IndexReader reader) throws IOException;
> }
> public interface AbstractBitSet
> {
>   public boolean get(int index);
> }
> {code}
> It would be useful if the method =Filter.bits()= returned an abstract interface, instead of =java.util.BitSet=.
> Use case: there is a very large index, and, depending on the user's privileges, only a small portion of the index is actually visible.
> Sparsely populated =java.util.BitSet=s are not efficient and waste lots of memory. It would be desirable to have an alternative BitSet implementation with a smaller memory footprint.
> Though it _is_ possible to derive classes from =java.util.BitSet=, it was obviously not designed for that purpose.
> That's why I propose to use an interface instead. The default implementation could still delegate to =java.util.BitSet=.
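The interface proposed in this issue could be backed by different representations, along the following lines. This is only a sketch under assumptions: `AbstractBitSet` and its get(int) method come from the issue text, while `DenseBits` and `SortedIntBits` are hypothetical implementations invented here for illustration and are not part of any Lucene patch.

```java
import java.util.Arrays;
import java.util.BitSet;

/** The interface from the issue description. */
interface AbstractBitSet {
    boolean get(int index);
}

/** Default implementation that simply delegates to java.util.BitSet. */
final class DenseBits implements AbstractBitSet {
    private final BitSet bits;
    DenseBits(BitSet bits) { this.bits = bits; }
    @Override public boolean get(int index) { return bits.get(index); }
}

/**
 * Sparse alternative: keeps only the set document ids in a sorted int[].
 * Memory is 4 bytes per set bit regardless of the highest id; lookup is O(log n).
 */
final class SortedIntBits implements AbstractBitSet {
    private final int[] sortedIds;

    SortedIntBits(int... ids) {
        sortedIds = ids.clone();
        Arrays.sort(sortedIds);
    }

    @Override public boolean get(int index) {
        return Arrays.binarySearch(sortedIds, index) >= 0;
    }
}

class AbstractBitSetDemo {
    public static void main(String[] args) {
        // Two visible documents in an index with millions of documents.
        AbstractBitSet sparse = new SortedIntBits(5_000_000, 42);
        System.out.println(sparse.get(42));        // true
        System.out.println(sparse.get(43));        // false
        System.out.println(sparse.get(5_000_000)); // true

        BitSet raw = new BitSet();
        raw.set(7);
        AbstractBitSet dense = new DenseBits(raw);
        System.out.println(dense.get(7));          // true
    }
}
```

A java.util.BitSet with bit 5,000,000 set allocates storage for every bit up to that index, whereas the sorted array above stores two ints; the trade-off is a binary search per lookup instead of a single word access.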