[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563746#action_12563746 ]

Doron Cohen commented on LUCENE-1157:

{quote}
So, wouldn't it work to have Changes.html (and the stylesheets too) live in trunk/docs/ ?
{quote}

Yes, I agree: they should move so that Grant's job copies them. But I would like to make them part of the javadocs, so that there's no need to recompile with each change and no need to check in Changes.html.

I'll revert this and continue tomorrow.

> Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)
>
> Key: LUCENE-1157
> URL: https://issues.apache.org/jira/browse/LUCENE-1157
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Website
> Reporter: Doron Cohen
> Assignee: Doron Cohen
> Fix For: 2.4
> Attachments: lucene-1157-take2.patch, lucene-1157-take3.patch, lucene-1157.patch
>
> Background in http://www.nabble.com/formatable-changes-log-tt15078749.html

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563743#action_12563743 ]

Steven Rowe commented on LUCENE-1157:

bq. there are unidentifiable characters in Changes.html. They are also in CHANGES.txt. I'm sure I read something about why they are added but cannot find it now.

The first three bytes of CHANGES.txt are a UTF-8 BOM (byte-order mark). In Unicode encodings whose code units span more than one byte, e.g. UTF-16, the character U+FEFF is placed at the beginning of a stream to denote the endianness of the character serialization. UTF-8 is non-endian (invariant byte order given a character); the use of the BOM in UTF-8, where it is serialized as the three bytes EF BB BF, is solely to indicate that the encoding of the stream is UTF-8.

Microsoft's tools like to put BOMs at the beginnings of UTF-8 encoded files.
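To make the BOM explanation above concrete, here is a minimal Java sketch that detects and drops the three-byte UTF-8 BOM from a byte array. The class and method names (`Utf8Bom`, `hasBom`, `stripBom`) are made up for illustration and are not part of Lucene or of the patch under discussion.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Lucene: detects and strips a UTF-8 BOM.
final class Utf8Bom {
    // U+FEFF serialized in UTF-8 is always the byte sequence EF BB BF.
    static final byte[] BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True if the data starts with the three BOM bytes.
    static boolean hasBom(byte[] data) {
        return data.length >= BOM.length
                && data[0] == BOM[0]
                && data[1] == BOM[1]
                && data[2] == BOM[2];
    }

    // Returns the data without a leading BOM; the input is returned as-is if there is none.
    static byte[] stripBom(byte[] data) {
        return hasBom(data) ? Arrays.copyOfRange(data, BOM.length, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFFLucene Change Log".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasBom(withBom));
        System.out.println(new String(stripBom(withBom), StandardCharsets.UTF_8));
    }
}
```

A generator that copies CHANGES.txt into HTML could apply something like `stripBom` to the input before writing, so the mark does not show up as stray characters in the page.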
[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563725#action_12563725 ]

Doron Cohen commented on LUCENE-1157:

Seems that Changes.html should not be in svn at all. Instead, it should have the same status as the javadocs: both are generated documentation.

Instead of creating it as part of compile-core, I'll create it as part of javadocs-core. Instead of being created as part of committing, it would be created as part of the nightly build and copied to the site by Grant's scripts.

I'll go on with this tomorrow.
[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563724#action_12563724 ]

Steven Rowe commented on LUCENE-1157:

According to http://wiki.apache.org/lucene-java/HowToUpdateTheWebsite , anything checked into trunk/docs/ will be automatically mirrored to the live website by a cron job running under Grant's account.

So, wouldn't it work to have Changes.html (and the stylesheets too) live in trunk/docs/ ?
[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563720#action_12563720 ]

Doron Cohen commented on LUCENE-1157:

OK, I checked in the creation of Changes.html from CHANGES.txt. Thanks Steven!

The web site update part seems trickier than I thought.
- Adding a link in the site to http://svn.apache.org/viewvc/lucene/java/trunk/Changes.html?view=co does not work so well, because of the way that page is served by ViewVC.
- Linking to http://svn.apache.org/repos/asf/lucene/java/trunk/Changes.html isn't working either, because svn returns the source of that file.
- In addition, there are unidentifiable characters in Changes.html. They are also in CHANGES.txt. I'm sure I read something about why they are added but cannot find it now.

Ideas?
Re: JBoss Cache as a store
: Is there a set of tests in the Lucene sources I could use to test the
: "JBCDirectory", as I call it? Perhaps some way I could change the "index
: store provider" and re-run some existing tests, and perhaps add some clustered
: tests specific to my plugin?

I think most of the existing tests have the Directory impl hardcoded in them ... the best thing to do might be to refactor the existing tests so Directory creation comes from an overridable function in a subclass... come to think of it, Karl may have already done this as part of his InstantiatedIndex patch (check jira) but i'm not sure ... the conversation sounds familiar, but i think he was looking at facading the entire IndexReader impl not just the directory, so any refactoring approach he might have taken may not have gone far enough to work in this case.

It would certainly be nice if there was an easy way to run every test in the test suite against an arbitrary Directory implementation.

: Finally, regarding hosting, I am happy to contribute this to Lucene (alongside
: the JEDirectory, etc) but if licensing (JBoss Cache is LGPL, although the
: plugin code can be ASL if need be) or language levels (the plugin depends on
: JBoss Cache 2.x, which requires JDK 5) are a problem then I'm happy to host
: the plugin externally.

contribs can require 1.5 already ... and soon the trunk will move to 1.5 so that's not really an issue, the licensing may be, but it depends on how the integration with JBoss winds up working (ie: i don't know if having the build scripts download JBoss at build time to compile against them is allowed or not)

-Hoss
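The refactoring Hoss describes (store creation coming from an overridable function in a subclass) could look roughly like the sketch below. To stay self-contained it does not use Lucene at all: `ByteStore` is a hypothetical stand-in for a storage abstraction such as Directory, and none of these names are real Lucene or JUnit API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a storage abstraction; NOT org.apache.lucene.store.Directory.
interface ByteStore {
    void write(String name, byte[] data);
    byte[] read(String name);
}

// One concrete implementation, playing the role of an in-memory store.
final class InMemoryStore implements ByteStore {
    private final Map<String, byte[]> files = new HashMap<>();

    @Override
    public void write(String name, byte[] data) {
        files.put(name, data.clone());
    }

    @Override
    public byte[] read(String name) {
        byte[] data = files.get(name);
        return data == null ? null : data.clone();
    }
}

// Shared checks live here; the store under test comes from the factory method.
abstract class StoreContractTest {
    // Subclasses override this to plug in their own implementation.
    protected abstract ByteStore newStore();

    boolean roundTripWorks() {
        ByteStore store = newStore();
        byte[] payload = {1, 2, 3};
        store.write("segments", payload);
        return Arrays.equals(payload, store.read("segments"));
    }

    boolean missingFileIsNull() {
        return newStore().read("no-such-file") == null;
    }
}

// Running the whole contract against another store takes one small subclass.
final class InMemoryStoreTest extends StoreContractTest {
    @Override
    protected ByteStore newStore() {
        return new InMemoryStore();
    }
}
```

With this shape, a JBCDirectory-style implementation would only need its own subclass that overrides the factory method; the checks themselves stay in one place.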
[jira] Commented: (LUCENE-997) Add search timeout support to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563698#action_12563698 ] Paul Elschot commented on LUCENE-997: - The idea of System.currentTimeMillis() is to guard against misbehaviour of the java wait() method and against unexpected delays because of thread scheduling during the jump back for the loop around the wait() call. One way to reduce such misbehaviour under heavy load is by increasing the scheduling priority of the timing thread, but I don't think that is necessary. Also System.currentTimeMillis() is obviously correct, whereas timeout += resolution is never more than an assumption about correct wait() behaviour. Clock changes by NTP are normally so slow that they don't really matter for query time outs. > Add search timeout support to Lucene > > > Key: LUCENE-997 > URL: https://issues.apache.org/jira/browse/LUCENE-997 > Project: Lucene - Java > Issue Type: New Feature >Reporter: Sean Timm >Priority: Minor > Attachments: HitCollectorTimeoutDecorator.java, > LuceneTimeoutTest.java, LuceneTimeoutTest.java, MyHitCollector.java, > timeout.patch, timeout.patch, timeout.patch, timeout.patch, > TimerThreadTest.java > > > This patch is based on Nutch-308. > This patch adds support for a maximum search time limit. After this time is > exceeded, the search thread is stopped, partial results (if any) are returned > and the total number of results is estimated. > This patch tries to minimize the overhead related to time-keeping by using a > version of safe unsynchronized timer. > This was also discussed in an e-mail thread. > http://www.nabble.com/search-timeout-tf3410206.html#a9501029 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
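As an illustration of the point Paul makes, here is a minimal sketch of a timer thread that re-reads `System.currentTimeMillis()` on every tick rather than adding the resolution to a counter. It is not the code from timeout.patch: the class name `CoarseClock` is invented, and it uses `Thread.sleep` where the patch reportedly uses `wait()`.

```java
// Hypothetical sketch, not the timeout.patch implementation.
final class CoarseClock extends Thread {
    private final long resolutionMillis;
    // volatile so that readers on other threads always see a complete, recent value.
    private volatile long now = System.currentTimeMillis();
    private volatile boolean running = true;

    CoarseClock(long resolutionMillis) {
        super("coarse-clock");
        this.resolutionMillis = resolutionMillis;
        setDaemon(true);
    }

    @Override
    public void run() {
        while (running) {
            // Re-read the real clock: a late wake-up or scheduling delay cannot
            // accumulate as drift, which it would with "now += resolutionMillis".
            now = System.currentTimeMillis();
            try {
                Thread.sleep(resolutionMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Cheap read for hot loops: no system call, just a volatile load.
    long currentTimeMillis() {
        return now;
    }

    void shutdown() {
        running = false;
        interrupt();
    }
}
```

The value returned by `currentTimeMillis()` can lag the real clock by roughly one resolution interval plus any scheduling delay, but it does not fall further and further behind under load.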
[jira] Resolved: (LUCENE-1158) DateTools UTC/GMT mismatch
[ https://issues.apache.org/jira/browse/LUCENE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber resolved LUCENE-1158. -- Resolution: Fixed Fix Version/s: 2.4 Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Patch applied. > DateTools UTC/GMT mismatch > -- > > Key: LUCENE-1158 > URL: https://issues.apache.org/jira/browse/LUCENE-1158 > Project: Lucene - Java > Issue Type: Bug > Components: Javadocs >Affects Versions: 2.3 >Reporter: Daniel Naber >Priority: Minor > Fix For: 2.4 > > Attachments: datetools.diff > > > Post from Antony Bowesman on java-user: > - > I just noticed that although the Javadocs for Lucene 2.2 state that the dates > for DateTools use UTC as a timezone, they are actually using GMT. > Should either the Javadocs be corrected or the code corrected to use UTC > instead. > - > I'm attaching a patch that changes the javadoc and will commit it, unless > someone knows a reason the javadoc is correct and the code should be changed > to UTC. To my understanding, there's no significant difference between UTC > and GMT. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
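For the UTC/GMT question, a small check with the JDK's own classes shows that both zone IDs format the same instant identically. The pattern below is only assumed to resemble what DateTools uses at millisecond resolution, and the class name `UtcVsGmt` is made up for this sketch.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; the pattern is an assumption, not taken from DateTools.
final class UtcVsGmt {
    // Formats an instant in the given zone using a sortable, digits-only pattern.
    static String format(long epochMillis, String zoneId) {
        SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmssSSS", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone(zoneId));
        return f.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long t = System.currentTimeMillis();
        System.out.println(format(t, "GMT"));
        System.out.println(format(t, "UTC"));
    }
}
```

Both IDs resolve to a zone with a zero offset and no daylight saving time in the JDK, which is consistent with fixing the Javadoc rather than the code.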
[jira] Commented: (LUCENE-997) Add search timeout support to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563632#action_12563632 ]

Sean Timm commented on LUCENE-997:

I could go either way on the System.currentTimeMillis() versus a TimerThread issue. I agree nanoTime is the correct implementation when using 1.5.

It doesn't seem like on Linux running ntp it matters much either way. NTP tries to perform smoothing and makes clock changes slowly over a longer period of time when it can rather than have an abrupt change, but YMMV if your system is having clock issues. On a really overloaded Windows box, the TimerThread implementation will not behave well as demonstrated above. I can't speak to any other platforms.
[jira] Commented: (LUCENE-1151) Fix StandardAnalyzer to not mis-identify HOST as ACRONYM by default
[ https://issues.apache.org/jira/browse/LUCENE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563625#action_12563625 ]

Grant Ingersoll commented on LUCENE-1151:

Here's the thread on JFlex for completeness, not that it affects this patch: http://sourceforge.net/mailarchive/forum.php?thread_name=272037D7-6EA1-4D19-902F-B425A5309C2A%40apache.org&forum_name=jflex-users

> Fix StandardAnalyzer to not mis-identify HOST as ACRONYM by default
>
> Key: LUCENE-1151
> URL: https://issues.apache.org/jira/browse/LUCENE-1151
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Analysis
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
> Attachments: LUCENE-1151.patch
>
> Coming out of the discussion around back compatibility, it seems best to
> default StandardAnalyzer to properly fix LUCENE-1068, while preserving the
> ability to get the back-compatible behavior in the rare event that it's
> desired.
> This just means changing replaceInvalidAcronym = false to = true, and
> adding a clear entry to CHANGES.txt that this very slight non back compatible
> change took place.
> Spinoff from here:
> http://www.gossamer-threads.com/lists/lucene/java-dev/57517#57517
> I'll commit that change in a day or two.
[jira] Updated: (LUCENE-997) Add search timeout support to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Timm updated LUCENE-997:

Attachment: timeout.patch

This is a minor update to timeout.patch which fixes the comment about updates to 32-bit-sized variables being atomic and instead talks about volatile longs, as pointed out by Andrzej. It also computes the time out moment up front to save a subtraction on each document collection as suggested by Paul.
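The "compute the time out moment up front" change can be sketched as follows: the deadline is calculated once in the constructor, so each collected document costs one comparison instead of a subtraction plus a comparison. This is an illustration under assumptions, not the patch itself; `DeadlineCollector`, `Clock`, and `TimeExceeded` are invented names, and the real patch works against Lucene's HitCollector.

```java
// Hypothetical sketch, not the code from timeout.patch.
final class DeadlineCollector {
    // Minimal clock abstraction so the collector can be driven by a timer thread or a fake.
    interface Clock {
        long millis();
    }

    // Thrown when the time budget is used up; carries the number of hits collected so far.
    static final class TimeExceeded extends RuntimeException {
        final int collectedSoFar;

        TimeExceeded(int collectedSoFar) {
            super("time allowed exceeded after " + collectedSoFar + " documents");
            this.collectedSoFar = collectedSoFar;
        }
    }

    private final Clock clock;
    private final long deadline; // absolute moment, computed once
    private int collected;

    DeadlineCollector(Clock clock, long timeAllowedMillis) {
        this.clock = clock;
        this.deadline = clock.millis() + timeAllowedMillis;
    }

    // Called once per matching document.
    void collect(int doc) {
        // Single comparison per document; no (now - start) subtraction in the hot path.
        if (clock.millis() > deadline) {
            throw new TimeExceeded(collected);
        }
        collected++;
    }

    int collected() {
        return collected;
    }
}
```

On the volatile point: the Java Language Specification allows a write to a non-volatile `long` to be performed as two separate 32-bit writes, so a time value shared between a timer thread and search threads should be declared `volatile`.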
Re: JBoss Cache as a store
Hi Manik,

>>> Is there a set of tests in the Lucene sources I could use to test the "JBCDirectory", as I call it?

You would probably need to adapt the existing JUnit tests in contrib/benchmark and src/test for performance and functionality testing, respectively. They use the existing RAMDirectory and FSDirectory implementations of Directory, so you'll need to change the test code to use your JBCDirectory instead.

Cheers,
Mark

----- Original Message ----
From: Manik Surtani <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Tuesday, 29 January, 2008 3:38:17 PM
Subject: Re: JBoss Cache as a store

Bump. Anyone?

On 24 Jan 2008, at 14:07, Manik Surtani wrote:

> Hi guys
>
> I've just written a plugin for Lucene to use JBoss Cache as an index
> store. The benefits of something like this are:
>
> 1. Faster access to indexes as they will be in memory
> 2. Indexes replicated across a cluster of servers
> 3. Indexes "persisted" in clustered memory - faster than
> persistence to disk
>
> The implementation I have is pretty basic for now.
>
> Is there a set of tests in the Lucene sources I could use to test
> the "JBCDirectory", as I call it? Perhaps some way I could
> change the "index store provider" and re-run some existing tests,
> and perhaps add some clustered tests specific to my plugin?
>
> Finally, regarding hosting, I am happy to contribute this to Lucene
> (alongside the JEDirectory, etc) but if licensing (JBoss Cache is
> LGPL, although the plugin code can be ASL if need be) or language
> levels (the plugin depends on JBoss Cache 2.x, which requires JDK 5)
> are a problem then I'm happy to host the plugin externally.
>
> Cheers,
> --
> Manik Surtani
> Lead, JBoss Cache
> [EMAIL PROTECTED]

--
Manik Surtani
Lead, JBoss Cache
[EMAIL PROTECTED]
[jira] Created: (LUCENE-1161) Punctuation handling in StandardTokenizer (and WikipediaTokenizer)
Punctuation handling in StandardTokenizer (and WikipediaTokenizer)

Key: LUCENE-1161
URL: https://issues.apache.org/jira/browse/LUCENE-1161
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Reporter: Grant Ingersoll
Priority: Minor

It would be useful, in the StandardTokenizer, to be able to have more control over how in-word punctuation is handled. For instance, it is not always desirable to split on dashes or other punctuation. In other cases, one may want to output the split tokens plus a collapsed version of the token that removes the punctuation.

For example, Solr's WordDelimiterFilter provides some nice capabilities here, but it can't do its job when using the StandardTokenizer because the StandardTokenizer already makes the decision on how to handle it without giving the user any choice.

I think, in JFlex, we can have a back-compatible way of letting users make decisions about punctuation that occurs inside of a token, such as e-bay or i-pod, thus allowing for matches on iPod and eBay.
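The "split tokens plus a collapsed version" behavior described in the issue can be illustrated with plain string handling. This sketch is independent of Lucene and of Solr's WordDelimiterFilter: the class `InWordPunctuation` and its `expand` method are hypothetical, and only the dash is treated as in-word punctuation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not StandardTokenizer or WordDelimiterFilter code.
final class InWordPunctuation {
    // Returns the dash-separated parts of a token, followed by the collapsed
    // form with dashes removed when the token actually contained a split.
    static List<String> expand(String token) {
        List<String> out = new ArrayList<>();
        for (String part : token.split("-")) {
            if (!part.isEmpty()) {
                out.add(part);
            }
        }
        if (out.size() > 1) {
            // Collapsed variant, e.g. "e-bay" also yields "ebay".
            out.add(String.join("", out));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand("e-bay"));
        System.out.println(expand("i-pod"));
    }
}
```

Combined with lowercasing at both index and query time, the collapsed form is what would let a query for eBay match text that was written as e-bay.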
Re: [jira] Commented: (LUCENE-1151) Fix StandardAnalyzer to not mis-identify HOST as ACRONYM by default
On Jan 29, 2008, at 12:10 PM, Michael McCandless (JIRA) wrote:

> [ https://issues.apache.org/jira/browse/LUCENE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563576#action_12563576 ]
>
> Michael McCandless commented on LUCENE-1151:
>
> Very good question ... I don't know. It would be awesome (and,
> amazing) if JFlex enabled some kind of inheritance.

I asked on the JFlex user list (http://sourceforge.net/mailarchive/forum.php?forum_name=jflex-users ) but I don't see it in the docs anywhere.

> Since WikipediaTokenizer doesn't have the backwards compat
> requirement of StandardTokenizer, you can presumably just fix
> ACRONYM in WikipediaTokenizer to not match host names (ie hardwire
> to "true")?

Yes.
[jira] Commented: (LUCENE-1151) Fix StandardAnalyzer to not mis-identify HOST as ACRONYM by default
[ https://issues.apache.org/jira/browse/LUCENE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563576#action_12563576 ]

Michael McCandless commented on LUCENE-1151:

Very good question ... I don't know. It would be awesome (and, amazing) if JFlex enabled some kind of inheritance.

Since WikipediaTokenizer doesn't have the backwards compat requirement of StandardTokenizer, you can presumably just fix ACRONYM in WikipediaTokenizer to not match host names (ie hardwire to "true")?
Re: JBoss Cache as a store
Bump. Anyone?

On 24 Jan 2008, at 14:07, Manik Surtani wrote:

> Hi guys
>
> I've just written a plugin for Lucene to use JBoss Cache as an index
> store. The benefits of something like this are:
>
> 1. Faster access to indexes as they will be in memory
> 2. Indexes replicated across a cluster of servers
> 3. Indexes "persisted" in clustered memory - faster than
> persistence to disk
>
> The implementation I have is pretty basic for now.
>
> Is there a set of tests in the Lucene sources I could use to test
> the "JBCDirectory", as I call it? Perhaps some way I could
> change the "index store provider" and re-run some existing tests,
> and perhaps add some clustered tests specific to my plugin?
>
> Finally, regarding hosting, I am happy to contribute this to Lucene
> (alongside the JEDirectory, etc) but if licensing (JBoss Cache is
> LGPL, although the plugin code can be ASL if need be) or language
> levels (the plugin depends on JBoss Cache 2.x, which requires JDK 5)
> are a problem then I'm happy to host the plugin externally.
>
> Cheers,
> --
> Manik Surtani
> Lead, JBoss Cache
> [EMAIL PROTECTED]

--
Manik Surtani
Lead, JBoss Cache
[EMAIL PROTECTED]
[jira] Commented: (LUCENE-1151) Fix StandardAnalyzer to not mis-identify HOST as ACRONYM by default
[ https://issues.apache.org/jira/browse/LUCENE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12563548#action_12563548 ]

Grant Ingersoll commented on LUCENE-1151:

Not necessarily related, but can you think of a way that we can keep WikipediaTokenizer and StandardTokenizer in sync for these kinds of things? I guess I need to go look in JFlex to see if there is a way to do inheritance. Essentially, I want the WikiTokenizer to be StandardTokenizer plus handle the Wiki syntax appropriately.
[jira] Updated: (LUCENE-1156) Wikipedia Document Generation Changes
[ https://issues.apache.org/jira/browse/LUCENE-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-1156:

Attachment: LUCENE-1156.patch

This patch fixes the redirect problem and adds an option to discard image-only documents (default is to keep them). It does not strip the template pages, nor does it expand them.

Patch applies from contrib/benchmark

> Wikipedia Document Generation Changes
>
> Key: LUCENE-1156
> URL: https://issues.apache.org/jira/browse/LUCENE-1156
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/benchmark, contrib/wikipedia
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: LUCENE-1156.patch
>
> The EnwikiDocMaker currently produces a fair number of documents that are in
> the download, but are of dubious use in terms of both benchmarking and
> indexing.
> These issues are:
> # Redirects (it currently only handles REDIRECT and redirect, but there are
> documents that use Redirect)
> # Template files appear to be useless. These are marked by the term
> Template: at the beginning of the body. See for example:
> http://en.wikipedia.org/wiki/Template:=)
> # Image-only pages, as in
> http://en.wikipedia.org/wiki/Image:Sciencefieldnewark.jpg.jpg These are
> about as useful as the Redirects and Templates
> # Files pending deletion: This one is a bit trickier to handle, but they are
> generally marked by "Wikipedia:Votes for deletion" or some variation of that
> depending on where along it is in being deleted
> I think I can implement this such that it is backward compatible, if there is
> such a need when it comes to the contrib/benchmark suite.
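The redirect problem in the issue description comes down to a case-sensitive match. A sketch of a case-insensitive check is below; it is not the EnwikiDocMaker code, `WikiPageFilter` and its methods are hypothetical, and the `#REDIRECT` marker (with the leading `#`) is an assumption based on MediaWiki syntax rather than something stated in the issue.

```java
// Hypothetical helper, not part of contrib/benchmark.
final class WikiPageFilter {
    // Case-insensitive prefix test that works without allocating a lowercased copy.
    static boolean startsWithIgnoreCase(String text, String prefix) {
        return text.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    // Matches REDIRECT, redirect, Redirect and any other capitalization.
    // Assumption: redirect pages begin with the MediaWiki marker "#REDIRECT".
    static boolean isRedirect(String body) {
        return startsWithIgnoreCase(body.trim(), "#REDIRECT");
    }

    // Per the issue description, template pages start with "Template:".
    static boolean isTemplate(String body) {
        return body.trim().startsWith("Template:");
    }

    public static void main(String[] args) {
        System.out.println(isRedirect("#Redirect [[Apache Lucene]]"));
        System.out.println(isTemplate("Template:=)"));
    }
}
```

A document maker could consult checks like these before emitting a document, with a flag per category so the existing behavior stays available for back compatibility.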
[jira] Updated: (LUCENE-1154) System Reqs page should be release specific
[ https://issues.apache.org/jira/browse/LUCENE-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1154: Fix Version/s: 3.0 We can keep the existing one until 3.0 is released. > System Reqs page should be release specific > --- > > Key: LUCENE-1154 > URL: https://issues.apache.org/jira/browse/LUCENE-1154 > Project: Lucene - Java > Issue Type: Bug > Components: Website >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Trivial > Fix For: 3.0 > > > The System Requirements page, currently under the Main->Resources section of > the website should be part of a given version's documentation, since it will > be changing for a given release. > I will "deprecate" the existing one, but leave it in place(with a message) to > cover the existing releases that don't have this, but will also add it to the > release docs for future releases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Resolved: (LUCENE-1132) Highlighter Documentation updates
[ https://issues.apache.org/jira/browse/LUCENE-1132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-1132. - Resolution: Fixed Lucene Fields: (was: [New]) Committed revision 616305. > Highlighter Documentation updates > - > > Key: LUCENE-1132 > URL: https://issues.apache.org/jira/browse/LUCENE-1132 > Project: Lucene - Java > Issue Type: Bug > Components: contrib/* >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Trivial > Attachments: LUCENE-1132.patch > > > Various places in the Highlighter documentation refer to bytes (i.e. > SimpleFragmenter) when it should be chars. See > http://www.gossamer-threads.com/lists/lucene/java-user/56986 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1151) Fix StandardAnalyzer to not mis-identify HOST as ACRONYM by default
[ https://issues.apache.org/jira/browse/LUCENE-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1151:

Attachment: LUCENE-1151.patch

Attached patch that fixes the original bug (LUCENE-1068) by default, but offers system property & static method to keep backwards compatible yet buggy behavior.

I'll commit in a day or two.
[jira] Resolved: (LUCENE-1150) The token types of the standard tokenizer is not accessible
[ https://issues.apache.org/jira/browse/LUCENE-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved LUCENE-1150.
----------------------------------------

Resolution: Fixed
Fix Version/s: 2.4

I just committed this. Thanks for opening this, Nicolas!

> The token types of the standard tokenizer is not accessible
> -----------------------------------------------------------
>
> Key: LUCENE-1150
> URL: https://issues.apache.org/jira/browse/LUCENE-1150
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis
> Affects Versions: 2.3
> Reporter: Nicolas Lalevée
> Assignee: Michael McCandless
> Fix For: 2.4
> Attachments: LUCENE-1150.patch, LUCENE-1150.take2.patch
>
>
> The StandardTokenizerImpl not being public, these token types are not
> accessible:
> {code:java}
> public static final int ALPHANUM    = 0;
> public static final int APOSTROPHE  = 1;
> public static final int ACRONYM     = 2;
> public static final int COMPANY     = 3;
> public static final int EMAIL       = 4;
> public static final int HOST        = 5;
> public static final int NUM         = 6;
> public static final int CJ          = 7;
> /**
>  * @deprecated this solves a bug where HOSTs that end with '.' are identified
>  *             as ACRONYMs. It is deprecated and will be removed in the next
>  *             release.
>  */
> public static final int ACRONYM_DEP = 8;
> public static final String [] TOKEN_TYPES = new String [] {
>     "<ALPHANUM>",
>     "<APOSTROPHE>",
>     "<ACRONYM>",
>     "<COMPANY>",
>     "<EMAIL>",
>     "<HOST>",
>     "<NUM>",
>     "<CJ>",
>     "<ACRONYM_DEP>"
> };
> {code}
> So no custom TokenFilter can be based on the token type. Actually, even the
> StandardFilter cannot be written outside the
> org.apache.lucene.analysis.standard package.
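To illustrate what "a custom TokenFilter based on the token type" means once the type labels are reachable, here is a minimal, library-free sketch. `TypedToken` and `TokenTypeFilter` are hypothetical stand-ins, not Lucene classes, and the labels `"<HOST>"`, `"<ALPHANUM>"` and `"<EMAIL>"` are assumed to be the strings held in `TOKEN_TYPES`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a token: just its text and its type label. */
final class TypedToken {
    final String text;
    final String type;

    TypedToken(String text, String type) {
        this.text = text;
        this.type = type;
    }
}

final class TokenTypeFilter {
    private TokenTypeFilter() {}

    /** Returns the texts of the tokens whose type label equals wantedType, in order. */
    static List<String> keepOnly(List<TypedToken> tokens, String wantedType) {
        List<String> kept = new ArrayList<>();
        for (TypedToken token : tokens) {
            if (wantedType.equals(token.type)) {
                kept.add(token.text);
            }
        }
        return kept;
    }
}

class TokenTypeFilterDemo {
    public static void main(String[] args) {
        List<TypedToken> tokens = Arrays.asList(
                new TypedToken("lucene.apache.org", "<HOST>"),
                new TypedToken("search", "<ALPHANUM>"),
                new TypedToken("dev@example.org", "<EMAIL>"));
        // Keep only host names; the label is an assumption of this sketch.
        System.out.println(TokenTypeFilter.keepOnly(tokens, "<HOST>"));
    }
}
```

A filter living outside the tokenizer's package has to compare against such labels (or the matching int constants), which is why their visibility matters.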
[jira] Updated: (LUCENE-1160) MergeException from CMS threads should record the Directory
[ https://issues.apache.org/jira/browse/LUCENE-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1160:
---------------------------------------

Attachment: LUCENE-1160.patch

> MergeException from CMS threads should record the Directory
> -----------------------------------------------------------
>
> Key: LUCENE-1160
> URL: https://issues.apache.org/jira/browse/LUCENE-1160
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
> Attachments: LUCENE-1160.patch
>
>
> When you hit an unhandled exception in ConcurrentMergeScheduler, it
> throws a MergePolicy.MergeException, but there's no easy way to figure
> out which index caused this (if you have more than one).
> I plan to add the Directory to the MergeException. I also made a few
> other small changes to ConcurrentMergeScheduler:
>   * Added handleMergeException method, which is called on exception,
>     so that you can subclass ConcurrentMergeScheduler to do something
>     when an exception occurs.
>   * Added getMergeThread() method so you can override how the threads
>     are created (eg, if you want to make them in a different thread
>     group, use a pool, change priorities, etc.).
>   * Added doMerge(...) to actually do this merge, so you can do
>     something before starting and after finishing a merge.
>   * Changed private -> protected on a few attrs
> I plan to commit in a day or two.
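The extension points listed above can be pictured with a small, single-threaded model. It is a sketch only: `SketchMergeException`, `SketchMergeScheduler` and `RecordingScheduler` are hypothetical classes that borrow the hook names from the description, the directory is reduced to a plain string, and none of this is the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exception that remembers which index directory failed. */
class SketchMergeException extends RuntimeException {
    private final String directory;

    SketchMergeException(Throwable cause, String directory) {
        super("merge failed in " + directory, cause);
        this.directory = directory;
    }

    String getDirectory() {
        return directory;
    }
}

/** Hypothetical scheduler exposing two overridable hooks. */
class SketchMergeScheduler {
    private final String directory;

    SketchMergeScheduler(String directory) {
        this.directory = directory;
    }

    /** Runs one merge and routes any failure to handleMergeException. */
    final void merge(Runnable mergeWork) {
        try {
            doMerge(mergeWork);
        } catch (RuntimeException e) {
            handleMergeException(new SketchMergeException(e, directory));
        }
    }

    /** Override to do something before and after the merge itself. */
    protected void doMerge(Runnable mergeWork) {
        mergeWork.run();
    }

    /** Default in this sketch: rethrow. Override to log, alert, and so on. */
    protected void handleMergeException(SketchMergeException e) {
        throw e;
    }
}

/** Subclass that records failing directories instead of rethrowing. */
class RecordingScheduler extends SketchMergeScheduler {
    final List<String> failedDirectories = new ArrayList<>();

    RecordingScheduler(String directory) {
        super(directory);
    }

    @Override
    protected void handleMergeException(SketchMergeException e) {
        failedDirectories.add(e.getDirectory());
    }
}

class MergeFailureDemo {
    public static void main(String[] args) {
        RecordingScheduler scheduler = new RecordingScheduler("/tmp/index-a");
        scheduler.merge(() -> { throw new IllegalStateException("disk full"); });
        System.out.println("failed: " + scheduler.failedDirectories);
    }
}
```

With the directory carried on the exception, an application running several indexes can tell from the handler alone which one needs attention.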
[jira] Updated: (LUCENE-1160) MergeException from CMS threads should record the Directory
[ https://issues.apache.org/jira/browse/LUCENE-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1160:
---------------------------------------

Attachment: (was: LUCENE-1150.patch)

> MergeException from CMS threads should record the Directory
> -----------------------------------------------------------
>
> Key: LUCENE-1160
> URL: https://issues.apache.org/jira/browse/LUCENE-1160
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
> Attachments: LUCENE-1160.patch
>
>
> When you hit an unhandled exception in ConcurrentMergeScheduler, it
> throws a MergePolicy.MergeException, but there's no easy way to figure
> out which index caused this (if you have more than one).
> I plan to add the Directory to the MergeException. I also made a few
> other small changes to ConcurrentMergeScheduler:
>   * Added handleMergeException method, which is called on exception,
>     so that you can subclass ConcurrentMergeScheduler to do something
>     when an exception occurs.
>   * Added getMergeThread() method so you can override how the threads
>     are created (eg, if you want to make them in a different thread
>     group, use a pool, change priorities, etc.).
>   * Added doMerge(...) to actually do this merge, so you can do
>     something before starting and after finishing a merge.
>   * Changed private -> protected on a few attrs
> I plan to commit in a day or two.
[jira] Updated: (LUCENE-1160) MergeException from CMS threads should record the Directory
[ https://issues.apache.org/jira/browse/LUCENE-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1160:
---------------------------------------

Attachment: LUCENE-1150.patch

> MergeException from CMS threads should record the Directory
> -----------------------------------------------------------
>
> Key: LUCENE-1160
> URL: https://issues.apache.org/jira/browse/LUCENE-1160
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: 2.3
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 2.4
> Attachments: LUCENE-1150.patch
>
>
> When you hit an unhandled exception in ConcurrentMergeScheduler, it
> throws a MergePolicy.MergeException, but there's no easy way to figure
> out which index caused this (if you have more than one).
> I plan to add the Directory to the MergeException. I also made a few
> other small changes to ConcurrentMergeScheduler:
>   * Added handleMergeException method, which is called on exception,
>     so that you can subclass ConcurrentMergeScheduler to do something
>     when an exception occurs.
>   * Added getMergeThread() method so you can override how the threads
>     are created (eg, if you want to make them in a different thread
>     group, use a pool, change priorities, etc.).
>   * Added doMerge(...) to actually do this merge, so you can do
>     something before starting and after finishing a merge.
>   * Changed private -> protected on a few attrs
> I plan to commit in a day or two.
[jira] Created: (LUCENE-1160) MergeException from CMS threads should record the Directory
MergeException from CMS threads should record the Directory
-----------------------------------------------------------

Key: LUCENE-1160
URL: https://issues.apache.org/jira/browse/LUCENE-1160
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
Fix For: 2.4
Attachments: LUCENE-1150.patch

When you hit an unhandled exception in ConcurrentMergeScheduler, it
throws a MergePolicy.MergeException, but there's no easy way to figure
out which index caused this (if you have more than one).

I plan to add the Directory to the MergeException. I also made a few
other small changes to ConcurrentMergeScheduler:

  * Added handleMergeException method, which is called on exception,
    so that you can subclass ConcurrentMergeScheduler to do something
    when an exception occurs.
  * Added getMergeThread() method so you can override how the threads
    are created (eg, if you want to make them in a different thread
    group, use a pool, change priorities, etc.).
  * Added doMerge(...) to actually do this merge, so you can do
    something before starting and after finishing a merge.
  * Changed private -> protected on a few attrs

I plan to commit in a day or two.