Re: [Lucene.Net] 3.0.3
I can only guarantee that these 31 bugs here (in the 3.0.3 version): https://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/CHANGES.txt are part of the code. It's possible that others are too, but we'd really need to check the others listed there to be sure that they are also included. However, that's only a difference of 9 bugs, so I think we're very close to a 3.0.3 release, depending on how many of the issues related to changing the API we want to get done.

Thanks,
Christopher

On Sat, Feb 4, 2012 at 10:03 AM, Prescott Nasser geobmx...@hotmail.com wrote:

So, Chris, if you did this as a direct port of the Java version ( https://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/ ), does that mean that all of the LUCENE JIRA issues ( https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+LUCENE+AND+fixVersion+%3D+%223.0.3%22+AND+status+%3D+Closed+ORDER+BY+priority+DESC&mode=hide ) are part of this code already? That would make 3.0.3 well on its way to release...

~P

From: bode...@apache.org
To: lucene-net-...@incubator.apache.org
Date: Wed, 25 Jan 2012 12:35:25 +0100
Subject: Re: [Lucene.Net] 3.0.3

On 2012-01-25, Michael Herndon wrote:

Do we have a standard of copy or tag of Java's version source that we're doing a compare against? I only see the 3_1 and above in the tags.

Likely because the svn location has changed in between. I think it must be https://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/

Stefan
Re: [Lucene.Net] 3.0.3
2653, 2055, 2776, 2732, 2688, 2616, 2524, 2398, 2284, 2278, 2277, and 2249 are all in JIRA but aren't on that list in the CHANGES.txt file. It also looks like that file in SVN has some issues that aren't listed in JIRA. Anyway, it's possible that the issues listed here have already been ported as part of that changeset. I'm basing that on the fact that the last time these bugs were updated was Dec 1st 2010, which was before the code was released. However, we should still check to make sure.

Thanks,
Christopher
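
The comparison discussed in this thread (issues that JIRA reports for 3.0.3 versus issues named in CHANGES.txt) amounts to a set difference over issue numbers. A minimal sketch in Python, assuming both sources have been saved as plain text; the function names and the sample strings below are made up for illustration and are not the real JIRA export or the real CHANGES.txt contents:

```python
import re

# Matches issue keys such as "LUCENE-2653".
ISSUE_KEY = re.compile(r"\bLUCENE-(\d+)\b")


def issue_numbers(text):
    """Return the set of LUCENE issue numbers mentioned in text."""
    return {int(n) for n in ISSUE_KEY.findall(text)}


def missing_from_changes(jira_text, changes_text):
    """Issue numbers found in the JIRA text but not in the CHANGES text, sorted."""
    return sorted(issue_numbers(jira_text) - issue_numbers(changes_text))


# Hypothetical inputs for illustration only.
jira_export = "LUCENE-2653 closed\nLUCENE-2055 closed\nLUCENE-2650 closed"
changes_txt = "* LUCENE-2650: some fix (hypothetical entry)"

print(missing_from_changes(jira_export, changes_txt))
```

Swapping the two arguments gives the opposite direction, that is, issues that appear in CHANGES.txt but not in JIRA.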
[jira] [Updated] (SOLR-3056) Introduce Japanese field type in schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen updated SOLR-3056:

Attachment: SOLR-3056_schema40.patch

Introduce Japanese field type in schema.xml

Key: SOLR-3056
URL: https://issues.apache.org/jira/browse/SOLR-3056
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
Attachments: SOLR-3056_move.patch, SOLR-3056_schema40.patch, SOLR-3056_schema40.patch, SOLR-3056_schema40.patch

Kuromoji (LUCENE-3305) is now on both trunk and branch_3x (thanks again Robert, Uwe and Simon). It would be very good to get a default field type defined for Japanese in {{schema.xml}} so we can get good Japanese out-of-the-box support in Solr.

I've been playing with the below configuration today, which I think is a reasonable starting point for Japanese. There's a lot to be said about various considerations necessary when searching Japanese, but perhaps a wiki page is more suitable to cover the wider topic?

In order to make the below {{text_ja}} field type work, Kuromoji itself and its analyzers need to be seen by the Solr classloader. However, these are currently in contrib and I'm wondering if we should consider moving them to core to make them directly available. If there are concerns with additional memory usage, etc. for non-Japanese users, we can make sure resources are loaded lazily and only when needed in factory-land.

Any thoughts?

{code:xml}
<!-- Text field type is suitable for Japanese text using morphological analysis

     NOTE: Please copy files
       contrib/analysis-extras/lucene-libs/lucene-kuromoji-x.y.z.jar
       dist/apache-solr-analysis-extras-x.y.z.jar
     to your Solr lib directory (i.e. example/solr/lib) before starting Solr.
     (x.y.z refers to a version number)

     If you would like to optimize for precision, set the default operator to AND with
       <solrQueryParser defaultOperator="AND"/>
     below (this file). Use OR if you would like to optimize for recall (default).
-->
<fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
  <analyzer>
    <!-- Kuromoji Japanese morphological analyzer/tokenizer

         Use search-mode to get a noun-decompounding effect useful for search.

         Example:
           関西国際空港 (Kansai International Airport) becomes 関西 (Kansai) 国際 (International) 空港 (airport)
           so we get a match for 空港 (airport) as we would expect from a good search engine

         Valid values for mode are:
           normal: default segmentation
           search: segmentation useful for search (extra compound splitting)
           extended: search mode with unigramming of unknown words (experimental)

         NOTE: Search mode improves segmentation for search at the expense of part-of-speech accuracy
    -->
    <tokenizer class="solr.KuromojiTokenizerFactory" mode="search"/>
    <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
    <filter class="solr.KuromojiBaseFormFilterFactory"/>
    <!-- Optionally remove tokens with certain part-of-speeches
    <filter class="solr.KuromojiPartOfSpeechStopFilterFactory" tags="stopTags.txt" enablePositionIncrements="true"/> -->
    <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
    <filter class="solr.CJKWidthFilterFactory"/>
    <!-- Lower-case romaji characters -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
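
The last two filters of the {{text_ja}} chain (width normalization, then lower-casing) can be approximated outside Solr to see what they do to a token. The sketch below is in Python and is not the Lucene implementation: it applies full Unicode NFKC, whereas the configuration comment describes the width filter as covering only a subset of NFKC, so this stand-in may change more characters than the real filter would. The function names are hypothetical.

```python
import unicodedata


def normalize_width(token):
    """Fold full-width romaji to half-width and half-width kana to full-width.

    Assumption: full NFKC is used here as a stand-in for the NFKC subset
    that the width filter in the configuration is described as applying.
    """
    return unicodedata.normalize("NFKC", token)


def analyze_tail(tokens):
    """Apply width normalization followed by lower-casing to each token."""
    return [normalize_width(t).lower() for t in tokens]


# Full-width "Ｓｏｌｒ", half-width katakana "ｶﾀｶﾅ", and an unaffected kanji token.
print(analyze_tail(["Ｓｏｌｒ", "ｶﾀｶﾅ", "空港"]))
```

Tokenization and base-form reduction are left out because they depend on the Kuromoji dictionary and cannot be reproduced in a few lines.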
[jira] [Commented] (SOLR-3056) Introduce Japanese field type in schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200688#comment-13200688 ]

Christian Moen commented on SOLR-3056:

Stopwords and stoptags for Solr are now tracked in SOLR-3097 and a patch is available.
[jira] [Commented] (SOLR-3056) Introduce Japanese field type in schema.xml
[ https://issues.apache.org/jira/browse/SOLR-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200689#comment-13200689 ]

Christian Moen commented on SOLR-3056:

Updated patch for {{schema.xml}} on {{trunk}}.

The field type {{text_ja}} now uses a {{KuromojiPartOfSpeechStopFilter}} and {{StopFilter}} for stopping and their configuration uses the stop sets in the SOLR-3097 patch. Hence, SOLR-3097 should be applied before or at the same time as this patch.
[jira] [Updated] (LUCENE-3751) Align default Japanese configurations for Lucene and Solr
[ https://issues.apache.org/jira/browse/LUCENE-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Moen updated LUCENE-3751:

Attachment: LUCENE-3751.patch

Align default Japanese configurations for Lucene and Solr

Key: LUCENE-3751
URL: https://issues.apache.org/jira/browse/LUCENE-3751
Project: Lucene - Java
Issue Type: Improvement
Components: modules/analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
Attachments: LUCENE-3751.patch

The {{KuromojiAnalyzer}} in Lucene should have the same default configuration as the {{text_ja}} field type introduced in {{schema.xml}} by SOLR-3056.
[jira] [Commented] (LUCENE-3751) Align default Japanese configurations for Lucene and Solr
[ https://issues.apache.org/jira/browse/LUCENE-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200690#comment-13200690 ]

Christian Moen commented on LUCENE-3751:

Patch for {{trunk}} is attached.

The behavior of {{KuromojiAnalyzer}} is now the same as field type {{text_ja}} in Solr's example {{schema.xml}} (see SOLR-3056), including the order of the filters. I think it makes sense to have the {{LowerCaseFilter}} late in the chain as it might make sense to use a case-based {{StopFilter}}. It perhaps doesn't matter much in {{KuromojiAnalyzer}}'s case since the defaults don't do this anyway, but I thought it was good practice to align the configuration anyway.

I've also clarified an error message and a javadoc.
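
Why the position of lower-casing relative to a case-based stop filter matters can be shown with two toy filters. This is an illustrative Python sketch, not Lucene code; the stop set and tokens are hypothetical, chosen so that the lower-case word "us" is a stop word while the acronym "US" should survive.

```python
def stop_filter(tokens, stop_words):
    """Drop tokens that match a stop word exactly (case-sensitive)."""
    return [t for t in tokens if t not in stop_words]


def lower_case_filter(tokens):
    """Lower-case every token."""
    return [t.lower() for t in tokens]


STOP_WORDS = {"us"}            # hypothetical case-sensitive stop set
tokens = ["US", "us", "team"]  # hypothetical token stream

# Lower-casing late: the stop filter still sees the original case.
stop_then_lower = lower_case_filter(stop_filter(tokens, STOP_WORDS))

# Lower-casing early: the case distinction is gone before stopping.
lower_then_stop = stop_filter(lower_case_filter(tokens), STOP_WORDS)

print(stop_then_lower)
print(lower_then_stop)
```

With lower-casing last, the acronym is kept (and then lower-cased); with lower-casing first, both forms are removed.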
[jira] [Commented] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration
[ https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200732#comment-13200732 ]

Robert Muir commented on LUCENE-3745:

Thanks for doing this, it will be much nicer to have a properly built configuration here!

I agree with the overall approach of leaning towards the conservative side: if someone wants, they can always be more aggressive (and use the data on this issue as a guide).

Need stopwords and stoptags lists for default Japanese configuration

Key: LUCENE-3745
URL: https://issues.apache.org/jira/browse/LUCENE-3745
Project: Lucene - Java
Issue Type: Improvement
Components: modules/analysis
Reporter: Christian Moen
Attachments: LUCENE-3745.patch, filter_stoptags.py, top-10.txt, top-100-pos.txt, top-pos.txt

Stopwords and stoptags lists for Japanese need to be developed, tested and integrated into Lucene.
[jira] [Resolved] (LUCENE-3726) Default KuromojiAnalyzer to use search mode
[ https://issues.apache.org/jira/browse/LUCENE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3726.

Resolution: Fixed
Fix Version/s: 4.0, 3.6
Assignee: Robert Muir

Thanks Christian: I committed this.

Default KuromojiAnalyzer to use search mode

Key: LUCENE-3726
URL: https://issues.apache.org/jira/browse/LUCENE-3726
Project: Lucene - Java
Issue Type: Improvement
Affects Versions: 3.6, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 3.6, 4.0
Attachments: LUCENE-3726.patch, LUCENE-3726.patch, LUCENE-3726.patch, kuromojieval.tar.gz

Kuromoji supports an option to segment text in a way more suitable for search, by preventing long compound nouns from becoming indexing terms.

In general 'how you segment' can be important depending on the application (see http://nlp.stanford.edu/pubs/acl-wmt08-cws.pdf for some studies on this in Chinese).

The current algorithm penalizes the cost of long runs of kanji, based on some parameters (SEARCH_MODE_PENALTY, SEARCH_MODE_LENGTH, etc).

Some questions (these can be separate future issues if any useful ideas come out):
* should these parameters continue to be static-final, or configurable?
* should POS also play a role in the algorithm (can/should we refine exactly what we decompound)?
* is the Tokenizer the best place to do this, or should we do it in a tokenfilter? or both? With a tokenfilter, one idea would be to also preserve the original indexing term, overlapping it: e.g. ABCD -> AB, CD, ABCD(posInc=0). From my understanding this tends to help with noun compounds in other languages, because IDF of the original term boosts 'exact' compound matches. But does a tokenfilter provide the segmenter enough 'context' to do this properly?

Either way, I think as a start we should turn on what we have by default: it's likely a very easy win.
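
The tokenfilter idea in LUCENE-3726 (emit the parts of a compound and also keep the original term, overlapped with a position increment of 0) can be sketched as follows. This is an illustrative Python model, not the Lucene TokenFilter API; the function names and the (term, position increment) tuple representation are assumptions made for the example, and the token order follows the "AB, CD, ABCD(posInc=0)" example in the issue text.

```python
def decompound_keep_original(term, parts):
    """Return (term, position_increment) pairs for a compound.

    The parts advance the position by 1 each; the original compound is
    emitted last with increment 0, so it shares the last part's position.
    """
    if len(parts) < 2:
        # Nothing was split, so emit the term on its own.
        return [(term, 1)]
    tokens = [(part, 1) for part in parts]
    tokens.append((term, 0))
    return tokens


def absolute_positions(tokens):
    """Turn position increments into absolute positions, starting at 0."""
    position = -1
    result = []
    for term, increment in tokens:
        position += increment
        result.append((term, position))
    return result


tokens = decompound_keep_original("ABCD", ["AB", "CD"])
print(tokens)
print(absolute_positions(tokens))
```

Because the original term stays in the index, a query for the exact compound can still match it directly, which is the IDF-based boost the issue text refers to.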
[jira] [Commented] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration
[ https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200747#comment-13200747 ]

Christian Moen commented on LUCENE-3745:

Thanks a lot for looking at this, Robert. This was the thinking. (I've referred to the issue in the stopwords and stoptags files.)
[jira] [Commented] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration
[ https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200749#comment-13200749 ]

Robert Muir commented on SOLR-3097:

{quote}
(Longer term, I think we should reconsider our overall approach to this across all languages, but that's perhaps a separate discussion.)
{quote}

It is a larger issue... in general we should make it easier to keep the two synchronized, but off the top of my head an idea for a plan was:
* add 'snowball format' support to solr stopfilter so it can read all the lucene stopwords directly
* add an ant task to synchronize the solr example from lucene's resources.
* (of course) add fieldtypes that actually use all these files.

On the other hand, realistically these resources are pretty static (they don't change once added). So for now I don't think it's a huge risk that we don't have an auto-sync process... but we need to tackle these problems to easily integrate European languages anyway.

So I don't think this should block this issue; let's get Japanese up and going for now.

Introduce default Japanese stoptags and stopwords to Solr's example configuration

Key: SOLR-3097
URL: https://issues.apache.org/jira/browse/SOLR-3097
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
Attachments: SOLR-3097.patch

SOLR-3056 discusses introducing a default field type {{text_ja}} for Japanese in {{schema.xml}}. This configuration will be improved by also introducing default stopwords and stoptags configuration for the field type.

I believe this configuration should be easily available and tunable to Solr users and I'm proposing that we introduce the same stopwords and stoptags provided in LUCENE-3745 to Solr example configuration. I'm proposing that files can live in {{solr/example/solr/conf}} as {{stopwords_ja.txt}} and {{stoptags_ja.txt}} alongside {{stopwords_en.txt}} for English.

(Longer term, I think we should reconsider our overall approach to this across all languages, but that's perhaps a separate discussion.)
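
The first item of the plan in the comment, reading 'snowball format' stopword files, can be sketched in a few lines. The Python below assumes the Snowball-style layout as commonly described: several words may share a line separated by whitespace, and a vertical bar starts a comment that runs to the end of the line. That format description, the function name, and the sample text are assumptions for illustration, not taken from this thread or from an actual Lucene resource file.

```python
def parse_snowball_stopwords(text):
    """Collect stop words from Snowball-style text.

    Assumed format: '|' begins a comment to end of line; the remaining
    content may hold zero or more whitespace-separated words.
    """
    words = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # strip any trailing comment
        words.update(content.split())    # empty content adds nothing
    return words


# Hypothetical sample, not a real stopword resource.
sample = """
 | a comment-only line
the  a   | two words on one line
and
"""

print(sorted(parse_snowball_stopwords(sample)))
```

A loader along these lines would let one file serve both sides, which is the synchronization concern the comment raises.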
[jira] [Commented] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration
[ https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200752#comment-13200752 ]

Christian Moen commented on SOLR-3097:

Thanks a lot, Robert.
[jira] [Commented] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration
[ https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200753#comment-13200753 ]

Robert Muir commented on LUCENE-3745:

Let's get my previous ad-hoc lists out of there :)

I'll commit this for now and if there are any concerns we can reopen or refine in further issues.
[jira] [Resolved] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration
[ https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3745.

Resolution: Fixed
Fix Version/s: 4.0, 3.6

Thanks Christian!
[jira] [Updated] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration
[ https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-3097:

Attachment: SOLR-3097.patch

OK, this ant task was easy enough to write... here's my first stab at it.
Welcome David Smiley
I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions.

David, the custom is to say a little bit about yourself, so feel free to give a little background.

Welcome aboard,
Grant
Re: Welcome David Smiley
Welcome David! Happy hacking, Mike McCandless http://blog.mikemccandless.com On Sun, Feb 5, 2012 at 8:46 AM, Grant Ingersoll gsing...@apache.org wrote: I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions. David, custom is to say a little bit about yourself, so feel free to give a little background on yourself. Welcome aboard, Grant
[jira] [Commented] (LUCENE-3752) move preflexrw to lucene3x package
[ https://issues.apache.org/jira/browse/LUCENE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200769#comment-13200769 ] Michael McCandless commented on LUCENE-3752: +1 move preflexrw to lucene3x package -- Key: LUCENE-3752 URL: https://issues.apache.org/jira/browse/LUCENE-3752 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Currently there are a lot of things made public in the lucene3x codec, but all marked internal/experimental/deprecated. A lot of this is just so our test codec (preflexrw) can subclass it. I think we should just move it to the same package, then it can all be package-private.
Re: Welcome David Smiley
Welcome! On Sun, Feb 5, 2012 at 8:46 AM, Grant Ingersoll gsing...@apache.org wrote: I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions. David, custom is to say a little bit about yourself, so feel free to give a little background on yourself. Welcome aboard, Grant -- lucidimagination.com
[jira] [Commented] (LUCENE-3749) Similarity.java javadocs and simplifications for 4.0
[ https://issues.apache.org/jira/browse/LUCENE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200771#comment-13200771 ] Michael McCandless commented on LUCENE-3749: +1 Similarity.java javadocs and simplifications for 4.0 Key: LUCENE-3749 URL: https://issues.apache.org/jira/browse/LUCENE-3749 Project: Lucene - Java Issue Type: Task Affects Versions: 4.0 Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-3749.patch, LUCENE-3749_part2.patch As part of adding additional scoring systems to Lucene, we made a lower-level Similarity, and the existing stuff became e.g. TFIDFSimilarity, which extends it. However, I always feel bad about the complexity introduced here (though I do feel there are some excuses, in that it's a difficult challenge). In order to try to mitigate this, we also exposed an easier API (SimilarityBase) on top of it that makes some assumptions (and trades off some performance) to try to provide something consumable for e.g. experiments. Still, we can clean up a few things in the low-level API: fix outdated documentation and shoot for better/clearer naming etc.
Re: Welcome David Smiley
welcome! :) simon On Sun, Feb 5, 2012 at 3:03 PM, Robert Muir rcm...@gmail.com wrote: Welcome! On Sun, Feb 5, 2012 at 8:46 AM, Grant Ingersoll gsing...@apache.org wrote: I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions. David, custom is to say a little bit about yourself, so feel free to give a little background on yourself. Welcome aboard, Grant -- lucidimagination.com
[jira] [Commented] (LUCENE-3752) move preflexrw to lucene3x package
[ https://issues.apache.org/jira/browse/LUCENE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200784#comment-13200784 ] Simon Willnauer commented on LUCENE-3752: - +1 move preflexrw to lucene3x package -- Key: LUCENE-3752 URL: https://issues.apache.org/jira/browse/LUCENE-3752 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Currently there are a lot of things made public in the lucene3x codec, but all marked internal/experimental/deprecated. A lot of this is just so our test codec (preflexrw) can subclass it. I think we should just move it to the same package, then it can all be package-private.
RE: Welcome David Smiley
Welcome! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Sunday, February 05, 2012 2:46 PM To: dev@lucene.apache.org Subject: Welcome David Smiley I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions. David, custom is to say a little bit about yourself, so feel free to give a little background on yourself. Welcome aboard, Grant
Re: Welcome David Smiley
Welcome David! Dawid On Sun, Feb 5, 2012 at 3:50 PM, Uwe Schindler u...@thetaphi.de wrote: Welcome! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Sunday, February 05, 2012 2:46 PM To: dev@lucene.apache.org Subject: Welcome David Smiley I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions. David, custom is to say a little bit about yourself, so feel free to give a little background on yourself. Welcome aboard, Grant
Re: Welcome David Smiley
Welcome! On 5 February 2012 16:01, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Welcome David! Dawid On Sun, Feb 5, 2012 at 3:50 PM, Uwe Schindler u...@thetaphi.de wrote: Welcome! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Sunday, February 05, 2012 2:46 PM To: dev@lucene.apache.org Subject: Welcome David Smiley I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions. David, custom is to say a little bit about yourself, so feel free to give a little background on yourself. Welcome aboard, Grant -- Kind regards, Martijn van Groningen
[jira] [Updated] (SOLR-1860) improve stopwords list handling
[ https://issues.apache.org/jira/browse/SOLR-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-1860:
Attachment: SOLR-1860.patch

Now that Simon cleaned up WordlistLoader, this is easy. Attached is a patch to support the snowball format (format=snowball) in StopFilterFactory and the common-grams factories. Along with something like the ant task in SOLR-3097, we should be able to move forward with having some default configurations for other languages out of the box.

improve stopwords list handling
Key: SOLR-1860
URL: https://issues.apache.org/jira/browse/SOLR-1860
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
Attachments: SOLR-1860.patch, SOLR-1860.patch

Currently Solr makes it easy to use English stopwords for StopFilter or CommonGramsFilter. Recently in Lucene, we added stopword lists (mostly, but not all, from snowball) to all the language analyzers. So it would be nice if a user could easily specify that they want to use a French stopword list, and use it for StopFilter or CommonGrams. The ones from snowball, however, are formatted differently from the others (although in Lucene we have parsers to deal with this). Additionally, we abstract this from Lucene users by adding a static getDefaultStopSet to all analyzers.

There are two approaches. I think I prefer the first one, but I'm not sure it matters as long as we have good examples (maybe a foreign-language example schema?).

1. The user would specify something like:
<filter class="solr.StopFilterFactory" fromAnalyzer="org.apache.lucene.analysis.FrenchAnalyzer" .../>
This would just grab the CharArraySet from the FrenchAnalyzer's getDefaultStopSet method; who cares where it comes from or how it's loaded.

2. We add support for snowball-formatted stopword lists, and the user could specify something like:
<filter class="solr.StopFilterFactory" words="org/apache/lucene/analysis/snowball/french_stop.txt" format="snowball" ... />
The disadvantage to this is that they have to know where the list is, what format it's in, etc. For example, snowball doesn't provide Romanian or Turkish stopword lists to go along with their stemmers, so we had to add our own.

Let me know what you guys think, and I will create a patch.
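To make the "formatted in a different manner" point concrete, here is a minimal sketch of reading a snowball-style list. It assumes the format works as follows: a `|` starts a comment that runs to the end of the line, and a line may carry several whitespace-separated words. The class name `SnowballStopwords` and its `parse` method are hypothetical, for illustration only; this is not the parser that ships in Lucene or the code in the attached patch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not part of Lucene or Solr. */
final class SnowballStopwords {
    private SnowballStopwords() {}

    /** Collects the words of a snowball-style list, ignoring '|' comments. */
    public static Set<String> parse(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : text.split("\\R")) {
            // Assumption: everything from '|' to the end of the line is a comment.
            int bar = line.indexOf('|');
            if (bar >= 0) {
                line = line.substring(0, bar);
            }
            // A single line may list more than one word.
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }
}
```

A plain one-word-per-line list parses the same way, which is why a `format` switch is only needed to tell the factory whether `|` comments and multi-word lines should be expected.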
Re: Welcome David Smiley
Good to have you aboard Erick On Sun, Feb 5, 2012 at 10:20 AM, Martijn v Groningen martijn.v.gronin...@gmail.com wrote: Welcome! On 5 February 2012 16:01, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Welcome David! Dawid On Sun, Feb 5, 2012 at 3:50 PM, Uwe Schindler u...@thetaphi.de wrote: Welcome! - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Grant Ingersoll [mailto:gsing...@apache.org] Sent: Sunday, February 05, 2012 2:46 PM To: dev@lucene.apache.org Subject: Welcome David Smiley I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions. David, custom is to say a little bit about yourself, so feel free to give a little background on yourself. Welcome aboard, Grant -- Kind regards, Martijn van Groningen
[jira] [Commented] (LUCENE-3752) move preflexrw to lucene3x package
[ https://issues.apache.org/jira/browse/LUCENE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200792#comment-13200792 ] Robert Muir commented on LUCENE-3752: - Thanks for the comments guys, I'll do the svn moves and make it all package-private (except the codec). I think it was especially confusing to see SegmentTerm[Enum/Docs/Positions] that resemble 3.x APIs as public classes in 4.0 (even if they are deprecated/experimental/internal/full of warnings)... they are really internal implementation details :) move preflexrw to lucene3x package -- Key: LUCENE-3752 URL: https://issues.apache.org/jira/browse/LUCENE-3752 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Currently there are a lot of things made public in the lucene3x codec, but all marked internal/experimental/deprecated. A lot of this is just so our test codec (preflexrw) can subclass it. I think we should just move it to the same package, then it can all be package-private.
[jira] [Resolved] (LUCENE-3752) move preflexrw to lucene3x package
[ https://issues.apache.org/jira/browse/LUCENE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3752. - Resolution: Fixed Committed revision 1240750. move preflexrw to lucene3x package -- Key: LUCENE-3752 URL: https://issues.apache.org/jira/browse/LUCENE-3752 Project: Lucene - Java Issue Type: Task Reporter: Robert Muir Fix For: 4.0 Currently there are a lot of things made public in the lucene3x codec, but all marked internal/experimental/deprecated. A lot of this is just so our test codec (preflexrw) can subclass it. I think we should just move it to the same package, then it can all be package-private.
Website Update
I'm hitting a few snags w/ the build system related to bringing over the old content, but am otherwise ready to do the move. Trying to get some help from infra, but it is Sunday morning, so... -Grant - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()
[ https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3746:
Attachment: LUCENE-3746.patch

Updated patch using ManagementFactory.getMemoryMXBean().getHeapMemoryUsage(). The Javadocs are not explicit about this call being atomic, but from the wording it seems almost certain that each call returns a new Usage instance. In this patch this is asserted (with a Java assert), and the assert passes (-ea) in two different JVMs - IBM and Oracle - so this might be correct. I searched for more explicit information on this, with no success.

Annoyingly though, in the IBM JDK, running the tests like this produces this nice warning:

{noformat}
WARNING: test class left thread running: Thread[MemoryPoolMXBean notification dispatcher,6,main]
RESOURCE LEAK: test class left 1 thread(s) running
{noformat}

This makes me reluctant to use the memory bean - I did not find a way to prevent that thread leak. So perhaps a better approach would be to be conservative about the sequence of calls when using Runtime? Something like this:

{code}
long free = rt.freeMemory();
if (free is sufficient)
  return decideBy(free);
long max = rt.maxMemory();
long total = rt.totalMemory();
return decideBy(max - total);
{code}

This is conservative in that 'total' is computed last, and in that total-free is not added to the computed available bytes.

In both approaches, even if atomicity is guaranteed, it is possible that more heap is allocated in another thread between the time the size is computed and the time the bytes are actually allocated, so I'm not sure how safe this check can be made.

suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()
Key: LUCENE-3746
URL: https://issues.apache.org/jira/browse/LUCENE-3746
Project: Lucene - Java
Issue Type: Bug
Components: modules/spellchecker
Reporter: Doron Cohen
Fix For: 3.6, 4.0
Attachments: LUCENE-3746.patch, LUCENE-3746.patch

Follow up on dev thread: [FSTCompletionTest failure At least 0.5MB RAM buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]
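The Runtime-based pseudocode in this comment could be fleshed out roughly as below. This is only a sketch under stated assumptions, not the code in the attached patch: the class name `HeapEstimate`, the method `availableBytes`, and the `needed` parameter are all hypothetical, and the decision step is reduced to returning the byte count the caller would decide on.

```java
/** Hypothetical helper; illustrates the order of Runtime calls only. */
final class HeapEstimate {
    private HeapEstimate() {}

    /** Pure decision logic, separated so it can be checked with fixed numbers. */
    public static long availableBytes(long free, long total, long max, long needed) {
        if (free >= needed) {
            return free; // the already-committed free heap is sufficient
        }
        // Fall back to heap the JVM has not committed yet. The free part of
        // the committed heap is deliberately not added on top, so this
        // under-estimates rather than over-estimates.
        return Math.max(0L, max - total);
    }

    /** Reads the live values in the order proposed: free, then max, then total. */
    public static long availableBytes(long needed) {
        Runtime rt = Runtime.getRuntime();
        long free = rt.freeMemory();
        if (free >= needed) {
            return free;
        }
        long max = rt.maxMemory();
        long total = rt.totalMemory(); // read last, as in the proposal
        return availableBytes(free, total, max, needed);
    }
}
```

As the comment itself notes, the result is only a hint: another thread can allocate between this check and the actual buffer allocation, so a caller still has to cope with the allocation failing.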
[jira] [Updated] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()
[ https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3746: Attachment: LUCENE-3746.patch Updated patch - without MemoryMXBean - computing 'max, total, free' (in that order) and deciding by 'free' or falling back to 'max-free'. This is more conservative than MemoryMXBean, but since the latter is not foolproof either, I prefer the simpler approach. suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory() -- Key: LUCENE-3746 URL: https://issues.apache.org/jira/browse/LUCENE-3746 Project: Lucene - Java Issue Type: Bug Components: modules/spellchecker Reporter: Doron Cohen Fix For: 3.6, 4.0 Attachments: LUCENE-3746.patch, LUCENE-3746.patch, LUCENE-3746.patch Follow up on dev thread: [FSTCompletionTest failure At least 0.5MB RAM buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]
[jira] [Commented] (SOLR-2667) Finish Solr Admin UI
[ https://issues.apache.org/jira/browse/SOLR-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200809#comment-13200809 ] Erick Erickson commented on SOLR-2667: -- re: SOLR-3094. If someone with javascript skills has the time/energy to help out with SOLR-3094, it would be awesome. I'm flying blind here. I can handle the LukeRequestHandler stuff, but it'll take a long time for me to figure out the javascript side. Essentially, this problem makes the new UI unusable for any large index. Finish Solr Admin UI Key: SOLR-2667 URL: https://issues.apache.org/jira/browse/SOLR-2667 Project: Solr Issue Type: Improvement Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 4.0 Attachments: SOLR-2667-110722.patch In SOLR-2399, we added a new admin UI. The issue has gotten too long to follow, so this is a new issue to track remaining tasks. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome David Smiley
Welcome aboard David! On Feb 5, 2012, at 8:46 AM, Grant Ingersoll wrote: I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions. David, custom is to say a little bit about yourself, so feel free to give a little background on yourself. Welcome aboard, Grant - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - Mark Miller lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome David Smiley
Welcome David! Cheers, Tommaso 2012/2/5 Grant Ingersoll gsing...@apache.org I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions. David, custom is to say a little bit about yourself, so feel free to give a little background on yourself. Welcome aboard, Grant - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()
[ https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200834#comment-13200834 ] Dawid Weiss commented on LUCENE-3746: - As for spawning MemoryPoolMXBean -- I wouldn't be worried about it; it's probably a system daemon thread for sending memory threshold notifications (didn't check though). I will peek at the OpenJDK sources and see how the MX bean is implemented to verify whether it's atomic or not (not a guarantee, just curiosity). suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory() -- Key: LUCENE-3746 URL: https://issues.apache.org/jira/browse/LUCENE-3746 Project: Lucene - Java Issue Type: Bug Components: modules/spellchecker Reporter: Doron Cohen Fix For: 3.6, 4.0 Attachments: LUCENE-3746.patch, LUCENE-3746.patch, LUCENE-3746.patch Follow up on dev thread: [FSTCompletionTest failure At least 0.5MB RAM buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]
[jira] [Resolved] (SOLR-1860) improve stopwords list handling
[ https://issues.apache.org/jira/browse/SOLR-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-1860.
Resolution: Fixed
Fix Version/s: 4.0, 3.6

I committed this. I'll open up a new issue (related to SOLR-3097) to provide setups for other languages.

improve stopwords list handling
Key: SOLR-1860
URL: https://issues.apache.org/jira/browse/SOLR-1860
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
Fix For: 3.6, 4.0
Attachments: SOLR-1860.patch, SOLR-1860.patch

Currently Solr makes it easy to use English stopwords for StopFilter or CommonGramsFilter. Recently in Lucene, we added stopword lists (mostly, but not all, from snowball) to all the language analyzers. So it would be nice if a user could easily specify that they want to use a French stopword list, and use it for StopFilter or CommonGrams. The ones from snowball, however, are formatted differently from the others (although in Lucene we have parsers to deal with this). Additionally, we abstract this from Lucene users by adding a static getDefaultStopSet to all analyzers.

There are two approaches. I think I prefer the first one, but I'm not sure it matters as long as we have good examples (maybe a foreign-language example schema?).

1. The user would specify something like:
<filter class="solr.StopFilterFactory" fromAnalyzer="org.apache.lucene.analysis.FrenchAnalyzer" .../>
This would just grab the CharArraySet from the FrenchAnalyzer's getDefaultStopSet method; who cares where it comes from or how it's loaded.

2. We add support for snowball-formatted stopword lists, and the user could specify something like:
<filter class="solr.StopFilterFactory" words="org/apache/lucene/analysis/snowball/french_stop.txt" format="snowball" ... />
The disadvantage to this is that they have to know where the list is, what format it's in, etc. For example, snowball doesn't provide Romanian or Turkish stopword lists to go along with their stemmers, so we had to add our own.

Let me know what you guys think, and I will create a patch.
[jira] [Commented] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()
[ https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200843#comment-13200843 ]

Dawid Weiss commented on LUCENE-3746:

Just checked, and it seems that within a single memory pool the results will be atomic. Unfortunately that call aggregates all memory pools, and (depending on the GC used) this may result in inconsistencies if the calculation happens to be interleaved with garbage collector activity. As stated in the sources of G1, for example:

{noformat}
// 4) Now, there is a very subtle issue with all the above. The
// framework will call get_memory_usage() on the three pools
// asynchronously. As a result, each call might get a different value
// for, say, survivor_num which will yield inconsistent values for
// eden_used, survivor_used, and old_gen_used (as survivor_num is used
// in the calculation of all three). This would normally be
// ok. However, it's possible that this might cause the sum of
// eden_used, survivor_used, and old_gen_used to go over the max heap
// size and this seems to sometimes cause JConsole (and maybe other
// clients) to get confused. There's not a really an easy / clean
// solution to this problem, due to the asynchrounous nature of the
// framework.
{noformat}

Makes sense to me. I wouldn't bother with the management interface then, and would just use the Runtime.* heuristic you proposed.

suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()
Key: LUCENE-3746
URL: https://issues.apache.org/jira/browse/LUCENE-3746
Project: Lucene - Java
Issue Type: Bug
Components: modules/spellchecker
Reporter: Doron Cohen
Fix For: 3.6, 4.0
Attachments: LUCENE-3746.patch, LUCENE-3746.patch, LUCENE-3746.patch

Follow up on dev thread: [FSTCompletionTest failure At least 0.5MB RAM buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]
ToParentBlockJoinQuery vs filtered search
Hello, I'd like to contribute BlockJoinQParserPlugin for Solr. It's not a very big deal, but I'm stuck during writing filtered search test cases. At the first glance it looks like deja vu for another join https://issues.apache.org/jira/browse/SOLR-3062 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java?r1=1238085r2=1239355. But then I realized that it's a question about requirements: What is the expected functionality for ToParentBlockJoinQuery for filtered search IndexSearcher.search(Query, *Filter*, Collector)? whether the given filter is applied to children documents or to the parent documents? Considering Solr's fq= I suppose that there is more sense to apply that filter to parent documents. WDYT? I'm attaching the small amendments to TestBlockJoin to get you my understanding. Thanks in advance. -- Sincerely yours Mikhail Khludnev Lucid Certified Apache Lucene/Solr Developer Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com Index: modules/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java === --- modules/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java (revision 1237200) +++ modules/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java (working copy) @@ -155,6 +155,63 @@ public class TestBlockJoin extends LuceneTestCase { dir.close(); } + public void testSimpleFilter() throws Exception { + + final Directory dir = newDirectory(); + final RandomIndexWriter w = new RandomIndexWriter(random, dir); + + final ListDocument docs = new ArrayListDocument(); + + docs.add(makeJob(java, 2007)); + docs.add(makeJob(python, 2010)); + docs.add(makeResume(Lisa, United Kingdom)); + w.addDocuments(docs); + + docs.clear(); + docs.add(makeJob(ruby, 2005)); + docs.add(makeJob(java, 2006)); + docs.add(makeResume(Frank, United States)); + w.addDocuments(docs); + + IndexReader r = w.getReader(); + w.close(); + IndexSearcher s = newSearcher(r); + + // Create a filter that 
defines parent documents in the index - in this case resumes + Filter parentsFilter = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term(docType, resume; + + // Define child document criteria (finds an example of relevant work experience) + BooleanQuery childQuery = new BooleanQuery(); + childQuery.add(new BooleanClause(new TermQuery(new Term(skill, java)), Occur.MUST)); + childQuery.add(new BooleanClause(NumericRangeQuery.newIntRange(year, 2006, 2011, true, true), Occur.MUST)); + + // Define parent document criteria (find a resident in the UK) + Query parentQuery = new TermQuery(new Term(country, United Kingdom)); + + // Wrap the child document query to 'join' any matches + // up to corresponding parent: + ToParentBlockJoinQuery childJoinQuery = new ToParentBlockJoinQuery(childQuery, parentsFilter, ToParentBlockJoinQuery.ScoreMode.Avg); + + assertEquals(no filter - both passed,s.search(childJoinQuery, 10).totalHits, 2); + assertEquals(dummy filter passes everyone ,s.search(childJoinQuery, parentsFilter, 10).totalHits, 2); + + // not found test + TopDocs ozHabitants = s.search(childJoinQuery , new CachingWrapperFilter( new QueryWrapperFilter(new TermQuery(new Term(country, Oz, 10); + assertEquals(noone live there,0, ozHabitants.totalHits); + + // apply the UK filter by the searcher + TopDocs ukOnly = s.search(childJoinQuery, new CachingWrapperFilter(new QueryWrapperFilter(parentQuery)), 10); + //TopDocs ukOnly = s.search(childJoinQuery, new QueryWrapperFilter(parentQuery), 10); + assertEquals(has filter - single passed,1, ukOnly.totalHits); + assertEquals( Lisa, r.document(ukOnly.scoreDocs[0].doc).get(name)); + // looking for US candidates + TopDocs usThen = s.search(childJoinQuery , new CachingWrapperFilter( new QueryWrapperFilter(new TermQuery(new Term(country, United States, 10); + assertEquals(has filter - single passed, 1,usThen.totalHits); + assertEquals(Frank, r.document(usThen.scoreDocs[0].doc).get(name)); + r.close(); + dir.close(); + } 
+
  private Document getParentDoc(IndexReader reader, Filter parents, int childDocID) throws IOException {
    final AtomicReaderContext[] leaves = ReaderUtil.leaves(reader.getTopReaderContext());
    final int subIndex = ReaderUtil.subIndex(childDocID, leaves);

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Welcome David Smiley
Welcome David!

-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Sunday, February 05, 2012 8:46 AM
To: dev@lucene.apache.org
Subject: Welcome David Smiley

I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions.

David, custom is to say a little bit about yourself, so feel free to give a little background on yourself.

Welcome aboard,
Grant
[jira] [Created] (LUCENE-3753) Restructure the Lucene build system
Restructure the Lucene build system --- Key: LUCENE-3753 URL: https://issues.apache.org/jira/browse/LUCENE-3753 Project: Lucene - Java Issue Type: Improvement Components: general/build Affects Versions: 3.6, 4.0 Reporter: Steven Rowe Assignee: Steven Rowe Split out separate core/, test-framework/, and tools/ modules, each with its own build.xml, under the lucene/ directory, similar to the Solr restructuring done in SOLR-2452. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3753) Restructure the Lucene build system
[ https://issues.apache.org/jira/browse/LUCENE-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-3753:
Attachment: LUCENE-3753.patch

Patch implementing the idea, along with a script to fix existing patches against the old structure to be against the new structure.

Run this svn move script before applying the patch:

{noformat}
svn mv --parents lucene/src/java lucene/core/src/java
svn mv --parents lucene/src/test lucene/core/src/test
svn mv --parents lucene/src/resources lucene/core/src/resources
svn mv lucene/src/site lucene/site
svn mv --parents lucene/src/test-framework/java lucene/test-framework/src/java
svn mv --parents lucene/src/test-framework/resources lucene/test-framework/src/resources
svn mv --parents lucene/src/tools/java lucene/tools/src/java
svn mv --parents lucene/src/tools/javadoc lucene/tools/javadoc
svn mv --parents lucene/src/tools/prettify lucene/tools/prettify
svn rm lucene/src
svn mv --parents dev-tools/maven/lucene/src/pom.xml.template dev-tools/maven/lucene/core/pom.xml.template
svn mv --parents dev-tools/maven/lucene/src/test-framework/pom.xml.template dev-tools/maven/lucene/test-framework/pom.xml.template
svn rm dev-tools/maven/lucene/src
{noformat}

I think this is ready to go.

Restructure the Lucene build system
Key: LUCENE-3753
URL: https://issues.apache.org/jira/browse/LUCENE-3753
Project: Lucene - Java
Issue Type: Improvement
Components: general/build
Affects Versions: 3.6, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
Attachments: LUCENE-3753.patch

Split out separate core/, test-framework/, and tools/ modules, each with its own build.xml, under the lucene/ directory, similar to the Solr restructuring done in SOLR-2452.
[jira] [Commented] (LUCENE-3602) Add join query to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200913#comment-13200913 ]

Martijn van Groningen commented on LUCENE-3602:

Jason: Better late than never... BRH is used to collect the matching "from" terms. The DTI just contains all terms / ords for a field. Comparing DTI ords isn't going to work when a term is in more than one segment or appears in a different field (fromField / toField). So I think the BRH can't be replaced by the DTI. The BRH could be cached per query.

Add join query to Lucene
Key: LUCENE-3602
URL: https://issues.apache.org/jira/browse/LUCENE-3602
Project: Lucene - Java
Issue Type: New Feature
Components: modules/join
Reporter: Martijn van Groningen
Fix For: 3.6, 4.0
Attachments: LUCENE-3602-3x.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch

Solr has had a (pseudo) join query for a while now. I think this should also be available in Lucene.
[jira] [Updated] (LUCENE-3602) Add join query to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated LUCENE-3602: -- Attachment: LUCENE-3602-3x.patch Attached updated version of query time joining for 3x branch. Instead of doing a binary search for each term comparison it seeks / iterates forward. It can't do seeking like we do in trunk, so it isn't as fast as in trunk. However I do think this can be committed to at least have query time join support in 3x. Back porting per segment filtering and the MTQ that is in trunk is quite some work... Add join query to Lucene Key: LUCENE-3602 URL: https://issues.apache.org/jira/browse/LUCENE-3602 Project: Lucene - Java Issue Type: New Feature Components: modules/join Reporter: Martijn van Groningen Fix For: 3.6, 4.0 Attachments: LUCENE-3602-3x.patch, LUCENE-3602-3x.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch Solr has (psuedo) join query for a while now. I think this should also be available in Lucene. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
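To illustrate the "seeks / iterates forward" idea in the comment above, here is a minimal sketch in plain Java. It is not the code from LUCENE-3602-3x.patch; the class name ForwardTermMatcher and the use of sorted string lists in place of Lucene term enumerations are assumptions made purely for illustration. The point is only that two sorted term sequences can be intersected with a single forward pass instead of one binary search per term.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: sorted String lists stand in for the
// sorted "from" terms and the sorted term dictionary of the "to" field.
final class ForwardTermMatcher {

    /**
     * Returns the terms present in both lists.
     * Both inputs must be sorted ascending and free of duplicates.
     */
    static List<String> intersect(List<String> fromTerms, List<String> toTerms) {
        List<String> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        // Each cursor only ever moves forward, so the total work is
        // proportional to fromTerms.size() + toTerms.size().
        while (i < fromTerms.size() && j < toTerms.size()) {
            int cmp = fromTerms.get(i).compareTo(toTerms.get(j));
            if (cmp == 0) {
                matches.add(fromTerms.get(i));
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // "from" term is smaller: advance the "from" cursor
            } else {
                j++; // "to" term is smaller: advance the "to" cursor
            }
        }
        return matches;
    }
}
```

Whether this beats per-term binary search depends on the relative sizes of the two term sets; the comment above only claims it is good enough for 3x, not that it matches trunk.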
[jira] [Commented] (SOLR-1758) schema definition for configuration files (validation, XSD)
[ https://issues.apache.org/jira/browse/SOLR-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200933#comment-13200933 ] Mike Sokolov commented on SOLR-1758: Yes - the schema will have to evolve as the config files evolve, so there will be a need for a new version, probably with each release. I think matching using LuceneMatchVersion makes a lot of sense. For versions that are old enough (eg a different major release), the validator could still run, but produce warnings only. Or else it could simply produce a message saying: warning; stale config version, not validating or something to that effect. I'm not clear on how reasonable it is to maintain an old config version: isn't this the kind of thing that users will *want* to be prompted to revisit? schema definition for configuration files (validation, XSD) --- Key: SOLR-1758 URL: https://issues.apache.org/jira/browse/SOLR-1758 Project: Solr Issue Type: New Feature Reporter: Jorg Heymans Labels: configuration, schema.xml, solrconfig.xml, validation, xsd Fix For: 4.0 Attachments: config-validation-20110523.patch It is too easy to make configuration errors in Solr without getting warnings. We should explore ways of validation configurations. See mailing list discussion at http://search-lucene.com/m/h6xKf1EShE6 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3098) analysis gui hangs if no tokens are output
analysis gui hangs if no tokens are output
Key: SOLR-3098
URL: https://issues.apache.org/jira/browse/SOLR-3098
Project: Solr
Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir

Try entering "the" for text_en.
Re: ToParentBlockJoinQuery vs filtered search
Hi Mikhail,

BlockJoinQParserPlugin sounds cool!

I think you're right: the incoming filter will apply to the "to" document space. So, for ToParentBJQ it's parent docs, and for ToChildBJQ it's child docs. The filter only needs to define the bits for docs in that "to" space... the other bits will not be used.

It looks like that's what your test case is testing for...? Does it pass?

Mike McCandless
http://blog.mikemccandless.com

On Sun, Feb 5, 2012 at 3:25 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Hello,

I'd like to contribute BlockJoinQParserPlugin for Solr. It's not a very big deal, but I got stuck while writing filtered search test cases. At first glance it looks like deja vu of another join: https://issues.apache.org/jira/browse/SOLR-3062 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java?r1=1238085&r2=1239355.

But then I realized that it's a question about requirements: what is the expected functionality of ToParentBlockJoinQuery for filtered search, IndexSearcher.search(Query, *Filter*, Collector)? Is the given filter applied to the child documents or to the parent documents? Considering Solr's fq=, I suppose it makes more sense to apply that filter to parent documents. WDYT?

I'm attaching small amendments to TestBlockJoin to show my understanding.

Thanks in advance.

--
Sincerely yours
Mikhail Khludnev
Lucid Certified Apache Lucene/Solr Developer
Grid Dynamics
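The semantics described in this thread (the filter applies to the parent space for a to-parent join, and bits set on child docs are ignored) can be modelled without Lucene at all. The following is a hedged sketch using java.util.BitSet; the class BlockJoinSketch and its method are hypothetical names, and the model assumes the usual block layout in which children are indexed immediately before their parent. It does not reproduce how ToParentBlockJoinQuery is actually implemented.

```java
import java.util.BitSet;

// Toy model of a to-parent block join. Doc IDs are bit positions.
// Assumed layout: each block is [child, child, ..., parent].
final class BlockJoinSketch {

    /**
     * @param childMatches bits of child docs matching the child query
     * @param parentBits   bits marking every parent doc (the "parentsFilter")
     * @param parentFilter optional search filter over parent docs; may be null
     * @return bits of parent docs that have a matching child and pass the filter
     */
    static BitSet joinToParents(BitSet childMatches, BitSet parentBits, BitSet parentFilter) {
        BitSet result = new BitSet();
        for (int child = childMatches.nextSetBit(0); child >= 0;
                 child = childMatches.nextSetBit(child + 1)) {
            // The parent of a child is the first parent bit at or after it.
            int parent = parentBits.nextSetBit(child);
            if (parent < 0) {
                continue; // no parent follows this child: malformed block, skip
            }
            // Only the filter bit of the parent is consulted; whatever the
            // filter says about child docs never influences the result.
            if (parentFilter == null || parentFilter.get(parent)) {
                result.set(parent);
            }
        }
        return result;
    }
}
```

With two blocks (docs 0-2 and 3-5, parents at 2 and 5), an unfiltered join over child matches {0, 4} yields both parents, while a filter containing only bit 2 yields one, which mirrors the "both passed" / "single passed" assertions in the attached test amendments.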
[jira] [Created] (LUCENE-3754) Store generated archive manifests in per-module output directories
Store generated archive manifests in per-module output directories -- Key: LUCENE-3754 URL: https://issues.apache.org/jira/browse/LUCENE-3754 Project: Lucene - Java Issue Type: Improvement Reporter: Steven Rowe Assignee: Steven Rowe Priority: Minor Currently, generated archive manifests are all stored in the same location, so each module's build overwrites the previously built module's manifest. Locating these files in the per-module build dirs will allow them to be rebuilt only when necessary, rather than every time a module's {{jar}} target is called. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome David Smiley
welcome david!

On Sun, Feb 5, 2012 at 5:46 AM, Grant Ingersoll gsing...@apache.org wrote:

I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions.

David, custom is to say a little bit about yourself, so feel free to give a little background on yourself.

Welcome aboard,
Grant
[jira] [Updated] (LUCENE-3736) ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either slow atomic or fast composite way
[ https://issues.apache.org/jira/browse/LUCENE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3736:
Attachment: LUCENE-3736.patch

Attached is a patch implementing the above proposal using the builder pattern. The builder pattern (sorry Robert) is the only nice setup that allows setting properties, like ignoring stored fields, on the parallel readers while making the built reader unmodifiable!

ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either slow atomic or fast composite way
Key: LUCENE-3736
URL: https://issues.apache.org/jira/browse/LUCENE-3736
Project: Lucene - Java
Issue Type: Sub-task
Components: core/index
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 4.0
Attachments: LUCENE-3736.patch, LUCENE-3736.patch

ParallelReader is now atomic. We should add a sugar wrapper method to allow synchronized composite readers (with same segment sizes) to be aligned with MultiReaders or wrapped by Slow:
- one ParallelReader with Slow wrapped parallel readers, they only need same maxDoc() (and deletions)
- a MultiReader containing all sub-ParallelReaders. This needs CompositeReaders with same docStarts[]
[jira] [Issue Comment Edited] (LUCENE-3736) ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either slow atomic or fast composite way
[ https://issues.apache.org/jira/browse/LUCENE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13198337#comment-13198337 ] Uwe Schindler edited comment on LUCENE-3736 at 2/5/12 11:56 PM: Here just my cleanup work in ParallelReader, nothing new. It's as before, only bugs (missing open checks) fixed and code violations (synthetic accessors, final fields). The next step will be to remove the add() methods, as IndexReaders should not be changed after create. Will work more tomorrow. The plan is: - Move all subreaders to ctor (builder-like API. First build reader-set, then call build) - Rename ParallelReader to ParallelAtomicReader - Add a ParallelCompositeReader with same builder API, but taking any CompositeReader-set and checks them that they are aligned (docStarts identical). The subreaders are ParallelAtomicReaders. was (Author: thetaphi): Here just my cleanup work in ParallelReader, nothing new. It's as before, only bugs (missing open checks) fixed and code violations (synthetic accessors, final fields). The next step will be to remove the add() methods, as IndexReaders should not be changed after create. Will work more tomorrow. The plan is: - Move all subreaders to ctor (builder-like API. First build reader-set, then call build) - Rename ParallelReader to AtomicParallelReader - Add a CompositeParallelReader with same builder API, but taking any CompositeReader-set and checks them that they are aligned (docStarts identical). The subreaders are AtomicParallelReaders. ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either slow atomic or fast composite way -- Key: LUCENE-3736 URL: https://issues.apache.org/jira/browse/LUCENE-3736 Project: Lucene - Java Issue Type: Sub-task Components: core/index Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 Attachments: LUCENE-3736.patch, LUCENE-3736.patch ParallelReader is now atomic. 
We should add a sugar wrapper method to allow synchronized composite readers (with same segment sizes) to be aligned with MultiReaders or wrapped by Slow: - one ParallelReader with Slow wrapped parallel readers, they only need same maxDoc() (and deletions) - a MultiReader containing all sub-ParallelReaders. This needs CompositeReaders with same docStarts[] -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
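The alignment requirement for the planned ParallelCompositeReader ("docStarts identical") can be sketched in plain Java. This is an illustrative model only, not code from LUCENE-3736.patch: the class DocStartsAlignment is a hypothetical name, and each composite reader is represented simply by the maxDoc values of its sub-readers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a composite reader is reduced to the maxDoc() values
// of its sub-readers, in order.
final class DocStartsAlignment {

    /**
     * Computes document start offsets as prefix sums of the sub-reader sizes.
     * The returned array has one extra trailing entry holding the total
     * maxDoc, so equal arrays imply equal segment boundaries and equal size.
     */
    static int[] docStarts(int[] subMaxDocs) {
        int[] starts = new int[subMaxDocs.length + 1];
        for (int i = 0; i < subMaxDocs.length; i++) {
            starts[i + 1] = starts[i] + subMaxDocs[i];
        }
        return starts;
    }

    /** True if every reader in the set has the same docStarts as the first. */
    static boolean aligned(List<int[]> subMaxDocsPerReader) {
        if (subMaxDocsPerReader.isEmpty()) {
            return true;
        }
        int[] first = docStarts(subMaxDocsPerReader.get(0));
        for (int[] subMaxDocs : subMaxDocsPerReader) {
            if (!Arrays.equals(first, docStarts(subMaxDocs))) {
                return false;
            }
        }
        return true;
    }
}
```

Readers with sub-reader sizes {3, 5} and {4, 4} have the same total maxDoc but different boundaries, so under this model they could only be combined through the slow atomic wrapping described in the issue, not as aligned composite readers.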
[jira] [Commented] (LUCENE-3736) ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either slow atomic or fast composite way
[ https://issues.apache.org/jira/browse/LUCENE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200975#comment-13200975 ]

Uwe Schindler commented on LUCENE-3736:

There are some test todos: the tests for parallel readers are very simplistic and have only 2 documents (which is especially stupid for testing composite readers). We should raise the number of documents.

ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either slow atomic or fast composite way
Key: LUCENE-3736
URL: https://issues.apache.org/jira/browse/LUCENE-3736
Project: Lucene - Java
Issue Type: Sub-task
Components: core/index
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 4.0
Attachments: LUCENE-3736.patch, LUCENE-3736.patch

ParallelReader is now atomic. We should add a sugar wrapper method to allow synchronized composite readers (with same segment sizes) to be aligned with MultiReaders or wrapped by Slow:
- one ParallelReader with Slow wrapped parallel readers, they only need same maxDoc() (and deletions)
- a MultiReader containing all sub-ParallelReaders. This needs CompositeReaders with same docStarts[]
Re: Welcome David Smiley
Welcome David!

(12/02/05 22:46), Grant Ingersoll wrote:

I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions.

David, custom is to say a little bit about yourself, so feel free to give a little background on yourself.

Welcome aboard,
Grant

--
http://www.rondhuit.com/en/
[jira] [Commented] (SOLR-1758) schema definition for configuration files (validation, XSD)
[ https://issues.apache.org/jira/browse/SOLR-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200979#comment-13200979 ]

Jan Høydahl commented on SOLR-1758:

A "not validating" warning due to a pre-Solr 4.x version sounds good to me.

Would it make more sense to keep the xsd file(s) inside the WAR instead of in example/solr/conf? This way you don't need to copy all the schemas (for v4.0, 4.1, 4.2 etc) around with your solr config. Then add a JavaOpt which can disable validation: {{-Dsolr.validate=false}}

schema definition for configuration files (validation, XSD)
Key: SOLR-1758
URL: https://issues.apache.org/jira/browse/SOLR-1758
Project: Solr
Issue Type: New Feature
Reporter: Jorg Heymans
Labels: configuration, schema.xml, solrconfig.xml, validation, xsd
Fix For: 4.0
Attachments: config-validation-20110523.patch

It is too easy to make configuration errors in Solr without getting warnings. We should explore ways of validating configurations. See mailing list discussion at http://search-lucene.com/m/h6xKf1EShE6
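The policy floated in the SOLR-1758 comments (validate strictly for current configs, only warn for configs from an older major version, and allow a JVM option to switch validation off) could look roughly like the sketch below. Everything here is an assumption for illustration: solr.validate is only a proposed property name from the comment above, not an existing Solr option, and ConfigValidationPolicy is a hypothetical class.

```java
// Hypothetical decision logic; none of this exists in Solr as written.
final class ConfigValidationPolicy {

    enum Mode { STRICT, WARN_ONLY, DISABLED }

    /**
     * @param validateProperty value of the proposed -Dsolr.validate option,
     *                         or null if the option was not given
     * @param configMajor      major version the config declares (for example
     *                         derived from luceneMatchVersion)
     * @param runtimeMajor     major version of the running release
     */
    static Mode decide(String validateProperty, int configMajor, int runtimeMajor) {
        // An explicit "false" disables validation; an absent option or any
        // value that parses as true leaves it enabled.
        if (validateProperty != null && !Boolean.parseBoolean(validateProperty)) {
            return Mode.DISABLED;
        }
        // Configs written for an older major release are checked leniently:
        // problems are reported as warnings rather than errors.
        return configMajor < runtimeMajor ? Mode.WARN_ONLY : Mode.STRICT;
    }
}
```

A caller would pass System.getProperty("solr.validate") as the first argument. Whether stale configs should be warned about or skipped entirely is exactly the open question in the thread, so the WARN_ONLY branch is one possible reading rather than a settled design.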
Re: Welcome David Smiley
Hearty welcome, we need the committing bandwidth you add to the project!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5. feb. 2012, at 14:46, Grant Ingersoll wrote:

I'm pleased to announce that the Lucene PMC has elected to add David Smiley as a committer to the Lucene/Solr project in recognition of his ongoing contributions.

David, custom is to say a little bit about yourself, so feel free to give a little background on yourself.

Welcome aboard,
Grant
[jira] [Created] (SOLR-3099) Add query operator, index structure, and analyzer for exact match searching
Add query operator, index structure, and analyzer for exact match searching
Key: SOLR-3099
URL: https://issues.apache.org/jira/browse/SOLR-3099
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Reporter: Mike

A project I'm working on requires *exact match* searching with stemming turned off. The users are accustomed to Sphinx search, and thus expect a query like [ =runs ] to return only documents that contain the exact term, "runs", and not the stemmed word "run".

In SOLR-2866, there is similar work, but I believe it is different because it uses a huge synonym file rather than storing the original terms directly in the index. What I'd like instead is two things:

1. An analyzer that says, "store the original form of all words in the index along with the stemmed variations". If necessary, it's fine if this is simply an unstemmed field, but that seems cumbersome schema-wise and performance-wise.
2. An operator in edismax that allows users to query the exact form of the word. Sphinx uses the equals sign (=), and that makes sense logically to me.

This issue is part of a meta issue, SOLR-3028, that is requesting two other operators in edismax (quorum search and word order).
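One way to picture the request is an index that holds both the stemmed form of each token and a marked copy of the original, so that an "=" query can target the original only. The sketch below is a toy model in plain Java, not a Solr analyzer: ExactMatchIndexSketch is a hypothetical class, the "=" prefix inside the index is an assumed encoding, and toyStem is a deliberately naive stand-in for a real stemmer.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: a Set of strings stands in for the term dictionary of a
// single document.
final class ExactMatchIndexSketch {

    /** Assumed marker that distinguishes original forms from stems. */
    static final String EXACT_PREFIX = "=";

    /** Naive stand-in for a stemmer: strips one trailing "s" from longer words. */
    static String toyStem(String token) {
        return token.length() > 3 && token.endsWith("s")
                ? token.substring(0, token.length() - 1)
                : token;
    }

    /** Indexes every token twice: once stemmed, once as a marked original. */
    static Set<String> indexTerms(List<String> tokens) {
        Set<String> terms = new HashSet<>();
        for (String token : tokens) {
            terms.add(toyStem(token));
            terms.add(EXACT_PREFIX + token);
        }
        return terms;
    }

    /** "=term" matches the original form only; a bare term matches via its stem. */
    static boolean matches(Set<String> indexedTerms, String queryTerm) {
        if (queryTerm.startsWith(EXACT_PREFIX)) {
            return indexedTerms.contains(queryTerm);
        }
        return indexedTerms.contains(toyStem(queryTerm));
    }
}
```

Under this model a document containing "run" matches the query runs but not =runs, which is the behaviour the issue asks for. The cost is roughly double the number of indexed terms, comparable to keeping a separate unstemmed field.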
[jira] [Commented] (SOLR-2866) Marked synonym filter for selective token expansion
[ https://issues.apache.org/jira/browse/SOLR-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201002#comment-13201002 ] Mike commented on SOLR-2866: Hi. FYI, I've created a new issue, SOLR-3099, that is requesting that this feature be supported in the index and the edismax parser. I don't *think* the overlap is huge, but that seemed like a better approach to me, so I've created a branch of the conversation over there. Marked synonym filter for selective token expansion --- Key: SOLR-2866 URL: https://issues.apache.org/jira/browse/SOLR-2866 Project: Solr Issue Type: Improvement Components: Schema and Analysis Environment: Solr 3.4 Reporter: Victor van der Wolf Priority: Minor Labels: stemming, synonyms Fix For: 3.6 Attachments: MarkedSynonymFilterFactory.java, SlowMarkedSynonymFilter.java, SlowMarkedSynonymFilterFactory.java Hi everybody, My name is Victor van der Wolf and since recently I work for the Royal Library in the Netherlands. One of my first assignments here was to see if I could implement some stemming algorithm for our websites. Our search engine is solr/lucene 3.4. Basically I had 2 requirements to work with: 1) It should be possible to switch the stemming functionality on and off in the front end 2) No extra storage should be required (no extra indexing). I shortly came to the conclusion that it would be practical to use the SynonymFilter to do that. I got hold of a dutch library and used a stemming algorithm to generate a synonym file on that. Then I thought that I could maybe use 2 different query analyzers under the field type and then call one or the other depending if I want stemming or not, like this q=field:analyzer:search term. Unfortunately this did not seem possible. Then, after some discussions with Erick Erickson, it became clear that a good approach could be to write my own SynonymFilter and apply some kind of token marking to decide it that token should be synonymized or not. 
Well, I did just that and it works like a charm. I would like to contribute this MarkedSynonymFilter class to the project. I used the SynonymFilter class as a starting point and added some extra functionality to that.

First of all, I added 3 new parameters called lookup, preMark and postMark. The preMark and postMark parameters contain a prefix and suffix used to recognize whether a token should be synonymized or not. A simple regex is used to determine this. Then the lookup parameter determines the behaviour of the MarkedSynonymFilter:

lookup=marked - marked tokens will be synonymized
lookup=unmarked - unmarked tokens will be synonymized
lookup=all - all tokens should be synonymized
lookup=none - none of the tokens should be synonymized

I started out writing this based on version 3.3; later I discovered that we were using 3.4 and I had to upgrade it. Unfortunately the whole SynonymFilter code has been revised, and for the moment there is the Slow and the Fast synonym filter, where the Slow one is deprecated. My addition is based on the slow version, I am afraid.

Anyway, I am curious about your comments. Please let me know if I should go forward with this and create a JIRA issue + my code as a patch.

Cheers,
Victor van der Wolf
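The four lookup modes described above reduce to a small decision function. The following sketch is not taken from the attached MarkedSynonymFilter sources; the class MarkedLookupSketch, its enum and its method are hypothetical, and the regex is just one plausible way to test for a preMark/postMark pair around a token.

```java
import java.util.regex.Pattern;

// Hypothetical reconstruction of the decision only; the real filter also has
// to strip the marks and perform the actual synonym expansion.
final class MarkedLookupSketch {

    enum Lookup { MARKED, UNMARKED, ALL, NONE }

    /**
     * Decides whether a token should be sent through synonym expansion.
     * A token counts as marked when it starts with preMark, ends with
     * postMark, and has at least one character in between.
     */
    static boolean shouldExpand(String token, String preMark, String postMark, Lookup lookup) {
        // Pattern.quote keeps marks such as "#" or "*" from being read as regex syntax.
        Pattern marked = Pattern.compile(
                Pattern.quote(preMark) + ".+" + Pattern.quote(postMark));
        boolean isMarked = marked.matcher(token).matches();
        switch (lookup) {
            case MARKED:
                return isMarked;
            case UNMARKED:
                return !isMarked;
            case ALL:
                return true;
            default:
                return false; // NONE
        }
    }
}
```

With preMark and postMark both set to "#", the token "#fietsen#" is expanded under lookup=marked and left alone under lookup=unmarked, which is how a front end could toggle stemming per query term without a second index.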
[jira] [Commented] (SOLR-3100) Add an operator to edismax for term quorum
[ https://issues.apache.org/jira/browse/SOLR-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201003#comment-13201003 ]

Mike commented on SOLR-3100:

Oops. Please ignore the bit about stemming above. Poor copy/paste on my part.

Add an operator to edismax for term quorum
Key: SOLR-3100
URL: https://issues.apache.org/jira/browse/SOLR-3100
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Mike
Original Estimate: 2h
Remaining Estimate: 2h

A project I'm working on requires *term quorum* searching with stemming turned off. The users are accustomed to Sphinx search, and thus expect a query like [ A AND (B C D)/2 ] to return only documents that contain A and at least two of B, C or D.

So this document would match:
a b c

But this one wouldn't:
a b

This can be a useful form of fuzzy searching, and I think we support it via the MM parameter, but we lack a user-facing operator for this. It would be great to add it.
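The matching rule behind [ A AND (B C D)/2 ] is easy to state in code. This is a minimal sketch in plain Java with hypothetical names (TermQuorumSketch, matches); it models a document as a set of terms and says nothing about how edismax or the mm parameter would actually parse or execute such a query.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative model of the quorum semantics only.
final class TermQuorumSketch {

    /**
     * @param docTerms distinct terms in the document
     * @param required term that must be present (the "A" in the example)
     * @param optional candidate terms (the "B C D" group)
     * @param quorum   minimum number of distinct optional terms that must match
     */
    static boolean matches(Set<String> docTerms, String required,
                           List<String> optional, int quorum) {
        if (!docTerms.contains(required)) {
            return false;
        }
        int hits = 0;
        // De-duplicate so that a term repeated in the query counts only once.
        for (String term : new HashSet<>(optional)) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits >= quorum;
    }
}
```

For the two documents in the issue, "a b c" satisfies the quorum of 2 (b and c) while "a b" supplies only one optional term and is rejected.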
[jira] [Commented] (SOLR-3028) Support for additional query operators (feature parity request)
[ https://issues.apache.org/jira/browse/SOLR-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13201006#comment-13201006 ] Mike commented on SOLR-3028: Agreed - now that we're talking through three threads simultaneously, it seems obvious we need three tickets. This one can serve as a meta ticket, I suppose. Therefore: 1. I split off *exact match* into SOLR-3099, and made a comment in SOLR-2866. I think they're different enough to warrant separate issues. 2. I split off *quorum search* into SOLR-3100. 3. I split off *word order* to issue SOLR-3101.. And I'll set depends on flags shortly here, assuming I have the needed permissions. Thanks again for the guidance and help, Hoss. Support for additional query operators (feature parity request) --- Key: SOLR-3028 URL: https://issues.apache.org/jira/browse/SOLR-3028 Project: Solr Issue Type: Improvement Components: search Affects Versions: 4.0 Reporter: Mike Labels: operator, queryparser Original Estimate: 6h Remaining Estimate: 6h I'm migrating my system from Sphinx Search, and there are a couple of operators that are not available to Solr, which are available in Sphinx. I would love to see the following added to the Dismax parser: 1. Exact match. This might be tricky to get right, since it requires work on the index side as well[1], but in Sphinx, you can do a query such as [ =running walking ], and running will have stemming off, while walking will have it on. 2. Term quorum. In Sphinx and some commercial search engines (like Recommind, Westlaw and Lexis), you can do a search such as [ (cat dog goat)/15 ], and find the three words within 15 terms of each other. I think this is possible in the backend via the span query, but there's no front end option for it, so it's quite hard to reveal to users. 3. Word order. 
Being able to say, this term before that one, and this other term before the next is something else in Sphinx that span queries support, but is missing in the query parser. Would be great to get this in too. These seem like the three biggest missing operators in Solr to me. I would love to help move these forward if there is any way I can help. [1] At least, *I* think it does. There's some discussion of one way of doing exact match like support in SOLR-2866. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3101) Add an operator to edismax for word order
Add an operator to edismax for word order - Key: SOLR-3101 URL: https://issues.apache.org/jira/browse/SOLR-3101 Project: Solr Issue Type: New Feature Components: search Reporter: Mike A project I'm working on requires *word order* searching. The users are accustomed to Sphinx search, and expect a query like [ A << B ] to return only documents that contain the term A before the term B. I believe this can currently be done with the surround parser (SOLR-2703), but we lack an operator for it. It would be great to add it, so that word order searches can be combined by users into sophisticated queries. Note that this should also support a query like [ A << A ], which would require that the term be in the document twice (the first instance before the second). This issue is part of a meta issue, SOLR-3028, that is requesting two other operators in edismax (quorum search and exact match).
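The word-order semantics requested in SOLR-3101 can be sketched in plain Java. This is only a toy model of the matching rule, not the surround parser or any edismax code: the class name `WordOrderSketch` and the method `inOrder` are hypothetical, and a document is assumed to be an already-tokenized list of terms.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Solr or Lucene.
class WordOrderSketch {
    /**
     * Returns true if every query term occurs in docTokens in the given order,
     * each one at a strictly later position than the previous one.
     * Terms do not have to be adjacent.
     */
    static boolean inOrder(List<String> docTokens, List<String> queryTerms) {
        int pos = 0;
        for (String term : queryTerms) {
            // Scan forward to the next occurrence of this term.
            while (pos < docTokens.size() && !docTokens.get(pos).equals(term)) {
                pos++;
            }
            if (pos == docTokens.size()) {
                return false; // term not found after the previous match
            }
            pos++; // the next term must come strictly after this position
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a", "x", "b", "a");
        System.out.println(inOrder(doc, Arrays.asList("a", "b"))); // true
        System.out.println(inOrder(doc, Arrays.asList("b", "x"))); // false
        System.out.println(inOrder(doc, Arrays.asList("a", "a"))); // true: the term occurs twice
    }
}
```

Because the scan position only moves forward, a repeated term such as A followed by A needs two separate occurrences, which is the behaviour the issue asks for.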
[jira] [Updated] (SOLR-3099) Add query operator, index structure, and analyzer for exact match searching
[ https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike updated SOLR-3099: --- Issue Type: Sub-task (was: New Feature) Parent: SOLR-3028 Add query operator, index structure, and analyzer for exact match searching - Key: SOLR-3099 URL: https://issues.apache.org/jira/browse/SOLR-3099 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Reporter: Mike Original Estimate: 4h Remaining Estimate: 4h A project I'm working on requires *exact match* searching with stemming turned off. The users are accustomed to Sphinx search, and thus expect a query like [ =runs ] to return only documents that contain the exact term, runs, and not the stemmed word run. In SOLR-2866, there is similar work, but I believe it is different because it uses a huge synonym file rather than storing the original terms directly in the index. What I'd like instead is two things: 1. An analyzer that says, store the original form of all words in the index along with the stemmed variations. If necessary, it's fine if this is simply an unstemmed field, but that seems cumbersome schema-wise and performance-wise. 2. An operator in edismax that allows users to query the exact form of the word. Sphinx uses the equals sign (=), and that makes sense logically to me. This issue is part of a meta issue, SOLR-3028, that is requesting two other operators in edismax (quorum search and word order).
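One way to picture the two pieces requested in SOLR-3099 (index the original form next to the stemmed form, and query it with an equals sign) is the plain-Java sketch below. Everything in it is an assumption made for illustration: the class `ExactMatchSketch`, the marker prefix, and especially the toy `stem` method, which only strips a trailing "s" and stands in for a real stemmer. It is not how Solr analyzers or edismax are implemented.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not a Solr analyzer or query parser.
class ExactMatchSketch {
    // Marker that distinguishes unstemmed index terms from stemmed ones (assumption).
    static final String EXACT_PREFIX = "=";

    /** Toy stemmer (assumption): strips one trailing "s" from words longer than three characters. */
    static String stem(String word) {
        return word.length() > 3 && word.endsWith("s")
                ? word.substring(0, word.length() - 1)
                : word;
    }

    /** Produces the index terms for a text: the stemmed form plus the "="-prefixed original form. */
    static Set<String> indexTerms(String text) {
        Set<String> terms = new HashSet<>();
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            terms.add(stem(token));
            terms.add(EXACT_PREFIX + token);
        }
        return terms;
    }

    /** A "=word" query needs the exact surface form; a bare word matches on its stem. */
    static boolean matches(Set<String> indexed, String queryTerm) {
        String q = queryTerm.toLowerCase();
        return q.startsWith(EXACT_PREFIX) ? indexed.contains(q) : indexed.contains(stem(q));
    }

    public static void main(String[] args) {
        Set<String> runsDoc = indexTerms("he runs daily");
        Set<String> runDoc = indexTerms("a long run");
        System.out.println(matches(runsDoc, "=runs")); // true
        System.out.println(matches(runDoc, "=runs"));  // false
        System.out.println(matches(runDoc, "runs"));   // true, matched via the stem "run"
    }
}
```

Keeping both forms in one term set mirrors the "store the original form along with the stemmed variations" idea; the alternative mentioned in the issue, a separate unstemmed field, would split these into two sets instead.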
[jira] [Updated] (SOLR-3100) Add an operator to edismax for term quorum
[ https://issues.apache.org/jira/browse/SOLR-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike updated SOLR-3100: --- Issue Type: Sub-task (was: New Feature) Parent: SOLR-3028 Add an operator to edismax for term quorum -- Key: SOLR-3100 URL: https://issues.apache.org/jira/browse/SOLR-3100 Project: Solr Issue Type: Sub-task Components: search Reporter: Mike Original Estimate: 2h Remaining Estimate: 2h A project I'm working on requires *term quorum* searching with stemming turned off. The users are accustomed to Sphinx search, and thus expect a query like [ A AND (B C D)/2 ] to return only documents that contain A and at least two of B, C or D. So this document would match: [ a b c ]. But this one wouldn't: [ a b ]. This can be a useful form of fuzzy searching, and I think we support it via the MM parameter, but we lack a user-facing operator for this. It would be great to add it.
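The quorum rule from SOLR-3100 (one required term plus at least N of a group of optional terms) can be written down as a small plain-Java check. This is a sketch of the matching semantics only, under the assumption that a document is a set of lower-cased terms; `QuorumSketch`, `matches`, and `terms` are hypothetical names, and nothing here calls Solr's mm handling.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Solr's minimum-should-match implementation.
class QuorumSketch {
    /**
     * Returns true if the document contains the required term and at least
     * minShouldMatch distinct terms from the optional group.
     */
    static boolean matches(Set<String> docTerms, String required,
                           List<String> optional, int minShouldMatch) {
        if (!docTerms.contains(required)) {
            return false;
        }
        int hits = 0;
        // De-duplicate the optional group so a repeated query term is counted once.
        for (String term : new HashSet<>(optional)) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits >= minShouldMatch;
    }

    /** Splits whitespace-separated text into a set of lower-cased terms. */
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\s+")));
    }

    public static void main(String[] args) {
        List<String> optional = Arrays.asList("b", "c", "d");
        // Models the query [ A AND (B C D)/2 ] from the issue.
        System.out.println(matches(terms("a b c"), "a", optional, 2)); // true
        System.out.println(matches(terms("a b"), "a", optional, 2));   // false
    }
}
```

The two calls in `main` correspond to the two example documents in the issue: the first satisfies the quorum of two, the second only reaches one.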
[jira] [Updated] (SOLR-3101) Add an operator to edismax for word order
[ https://issues.apache.org/jira/browse/SOLR-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike updated SOLR-3101: --- Issue Type: Sub-task (was: New Feature) Parent: SOLR-3028 Add an operator to edismax for word order - Key: SOLR-3101 URL: https://issues.apache.org/jira/browse/SOLR-3101 Project: Solr Issue Type: Sub-task Components: search Reporter: Mike Original Estimate: 4h Remaining Estimate: 4h A project I'm working on requires *word order* searching. The users are accustomed to Sphinx search, and expect a query like [ A << B ] to return only documents that contain the term A before the term B. I believe this can currently be done with the surround parser (SOLR-2703), but we lack an operator for it. It would be great to add it, so that word order searches can be combined by users into sophisticated queries. Note that this should also support a query like [ A << A ], which would require that the term be in the document twice (the first instance before the second). This issue is part of a meta issue, SOLR-3028, that is requesting two other operators in edismax (quorum search and exact match).
Re: ToParentBlockJoinQuery vs filtered search
On Mon, Feb 6, 2012 at 2:25 AM, Michael McCandless luc...@mikemccandless.com wrote: Hi Mikhail, BlockJoinQParserPlugin sounds cool! I think you're right: the incoming filter will apply to the "to" document space. So, for ToParentBJQ it's parent docs, and for ToChildBJQ it's child docs. The filter only needs to define the bits for docs in that "to" space... the other bits will not be used. Thanks for resolving my hesitations. It allows me to move forward. It looks like that's what your test case is testing for...? Does it pass? Of course it doesn't. The first reason is that BlockJoinWeight.scorer() http://svn.apache.org/viewvc/lucene/dev/trunk/modules/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java?view=markup has the opposite intention (btw, are you 100% sure?): * The children query is filtered by the given filter: childWeight.scorer(readerContext, true, false, *acceptDocs*); * The parent filter is not constrained: parentsFilter.getDocIdSet(readerContext, *readerContext.reader().getLiveDocs()*); That's why I asked for the rationale of filtered BJQ search. Another complication I ran into is that AssertingIndexSearcher.wrapFilter() randomly switches from filtered search to FilteredQuery. http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/test-framework/java/org/apache/lucene/search/AssertingIndexSearcher.java This leads to IllegalStateException: "parentFilter must return FixedBitSet; got BitsFilteredDocIdSet". I suppose I can deal with it. Mike McCandless http://blog.mikemccandless.com On Sun, Feb 5, 2012 at 3:25 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, I'd like to contribute BlockJoinQParserPlugin for Solr. It's not a very big deal, but I got stuck while writing filtered search test cases. At first glance it looks like deja vu of another join https://issues.apache.org/jira/browse/SOLR-3062 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java?r1=1238085&r2=1239355.
But then I realized that it's a question about requirements: what is the expected behavior of ToParentBlockJoinQuery in a filtered search, IndexSearcher.search(Query, *Filter*, Collector)? Is the given filter applied to the child documents or to the parent documents? Considering Solr's fq=, I suppose it makes more sense to apply the filter to parent documents. WDYT? I'm attaching small amendments to TestBlockJoin to show you my understanding. Thanks in advance. -- Sincerely yours Mikhail Khludnev Lucid Certified Apache Lucene/Solr Developer Grid Dynamics - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
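The question in this thread, whether the filter belongs to the child space or the parent space, can be made concrete with a toy model of a block index built on `java.util.BitSet`. This is a sketch under stated assumptions, not the Lucene join module: documents are numbered so that each parent directly follows its children, `parentBits` marks the parent doc ids, and `BlockJoinSketch` and `joinToParents` are hypothetical names. The filter is interpreted in the parent ("to") space, as suggested in the reply above.

```java
import java.util.BitSet;

// Hypothetical illustration only; it does not use the Lucene join API.
class BlockJoinSketch {
    /**
     * Joins matching child docs up to their parents.
     *
     * @param parentBits   bit set for every parent doc id; the children of a parent
     *                     are the docs immediately preceding it
     * @param childMatches child docs matched by the child query
     * @param parentFilter accepted docs in the parent ("to") space, or null to accept all
     * @return parent docs that have at least one matching child and pass the filter
     */
    static BitSet joinToParents(BitSet parentBits, BitSet childMatches, BitSet parentFilter) {
        BitSet result = new BitSet();
        for (int child = childMatches.nextSetBit(0); child >= 0;
                child = childMatches.nextSetBit(child + 1)) {
            if (parentBits.get(child)) {
                continue; // a parent doc is not a child; ignore it
            }
            int parent = parentBits.nextSetBit(child); // first parent after this child
            if (parent < 0) {
                continue; // trailing child without a parent
            }
            if (parentFilter == null || parentFilter.get(parent)) {
                result.set(parent);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(2); // block 1: children 0,1 and parent 2
        parents.set(5); // block 2: children 3,4 and parent 5
        BitSet childMatches = new BitSet();
        childMatches.set(0);
        childMatches.set(4);
        BitSet onlySecondParent = new BitSet();
        onlySecondParent.set(5);
        System.out.println(joinToParents(parents, childMatches, null));             // {2, 5}
        System.out.println(joinToParents(parents, childMatches, onlySecondParent)); // {5}
    }
}
```

In this model a filter that only sets child bits would reject every parent, which is one way to see why the two interpretations discussed in the thread give different results.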
[jira] [Commented] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration
[ https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201039#comment-13201039 ] Christian Moen commented on SOLR-3097: -- Thanks, Robert. Is your thinking to use the {{sync-analyzers}} target to automatically copy resources to the right place as part of {{package}}, {{example}}, etc. -- or is this a convenience to make it easier to keep the files in sync when we check them in separately? The {{sync-analyzers}} target works fine for the latter purpose, but needs hookups elsewhere in {{build.xml}} if we want to do this automatically. Happy to follow up on the latter if this is what you'd like to see in the patch. Introduce default Japanese stoptags and stopwords to Solr's example configuration - Key: SOLR-3097 URL: https://issues.apache.org/jira/browse/SOLR-3097 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen Attachments: SOLR-3097.patch, SOLR-3097.patch SOLR-3056 discusses introducing a default field type {{text_ja}} for Japanese in {{schema.xml}}. This configuration will be improved by also introducing default stopwords and stoptags configuration for the field type. I believe this configuration should be easily available and tunable to Solr users and I'm proposing that we introduce the same stopwords and stoptags provided in LUCENE-3745 to the Solr example configuration. I'm proposing that the files can live in {{solr/example/solr/conf}} as {{stopwords_ja.txt}} and {{stoptags_ja.txt}} alongside {{stopwords_en.txt}} for English. (Longer term, I think we should reconsider our overall approach to this across all languages, but that's perhaps a separate discussion.)
[jira] [Commented] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()
[ https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201073#comment-13201073 ] Doron Cohen commented on LUCENE-3746: - Thanks Dawid! {quote} it's probably a system daemon thread for sending memory threshold notifications {quote} Yes this makes sense. Still, the difference between the two JDKs bothered me. After some more digging, I think it is clear now. Here are the threads reported (at the end of the test) with Oracle:
{noformat}
1. Thread[ReaderThread,5,main]
2. Thread[main,5,main]
3. Thread[Reference Handler,10,system]
4. Thread[Signal Dispatcher,9,system]
5. Thread[Finalizer,8,system]
6. Thread[Attach Listener,5,system]
{noformat}
And with IBM JDK:
{noformat}
1. Thread[Attach API wait loop,10,main]
2. Thread[Finalizer thread,5,system]
3. Thread[JIT Compilation Thread,10,system]
4. Thread[main,5,main]
5. Thread[Gc Slave Thread,5,system]
6. Thread[ReaderThread,5,main]
7. Thread[Signal Dispatcher,5,main]
8. Thread[MemoryPoolMXBean notification dispatcher,6,main]
{noformat}
The 8th thread is the one that started only after accessing the memory management layer. The thing is that in the IBM JDK that thread is created in the ThreadGroup main, while in the Oracle JDK it is created under system. To me the latter makes more sense.
To be more sure, I added a fake memory notification listener and checked the thread in which the notification happens:
{code}
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

MemoryMXBean mmxb = ManagementFactory.getMemoryMXBean();
NotificationListener listener = new NotificationListener() {
  @Override
  public void handleNotification(Notification notification, Object handback) {
    // Print the thread (name, priority, group) that delivers the notification.
    System.out.println(Thread.currentThread());
  }
};
((NotificationEmitter) mmxb).addNotificationListener(listener, null, null);
{code}
Evidently in the IBM JDK the notification is delivered in a main group thread (also in line with the thread group in the original warning message which triggered this thread's discussion):
{noformat}
Thread[MemoryPoolMXBean notification dispatcher,6,main]
{noformat}
While in the Oracle JDK the notification is delivered in a system group thread:
{noformat}
Thread[Low Memory Detector,9,system]
{noformat}
This also explains why the warning is reported only for the IBM JDK: the threads check in LTC only accounts for the threads in the same thread group as the one running the specific test case. So when dispatching happens in a system group thread it is not sensed by that check at all. Ok, now with the mystery solved I can commit the simpler code...
suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory() -- Key: LUCENE-3746 URL: https://issues.apache.org/jira/browse/LUCENE-3746 Project: Lucene - Java Issue Type: Bug Components: modules/spellchecker Reporter: Doron Cohen Fix For: 3.6, 4.0 Attachments: LUCENE-3746.patch, LUCENE-3746.patch, LUCENE-3746.patch Follow up of dev thread: [FSTCompletionTest failure At least 0.5MB RAM buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]
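The thread-group effect described in this LUCENE-3746 comment can be reproduced with standard `java.lang.ThreadGroup` calls. The sketch below is not the LuceneTestCase check itself; `ThreadGroupCheckSketch`, `threadNamesIn`, and `startParked` are hypothetical names, and the "other" group is an ordinary child group standing in for the system group. It only shows that a check which enumerates a single group does not see threads living in a different group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; not the LuceneTestCase leak check.
class ThreadGroupCheckSketch {
    /** Names of live threads that belong directly to the given group (subgroups are excluded). */
    static List<String> threadNamesIn(ThreadGroup group) {
        // activeCount() is only an estimate, so leave generous headroom in the buffer.
        Thread[] buffer = new Thread[Math.max(16, group.activeCount() * 2)];
        int count = group.enumerate(buffer, false);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(buffer[i].getName());
        }
        return names;
    }

    /** Starts a daemon thread in the given group that waits until the latch is released. */
    static Thread startParked(ThreadGroup group, String name, CountDownLatch release) {
        Thread thread = new Thread(group, () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadGroup own = Thread.currentThread().getThreadGroup();
        ThreadGroup other = new ThreadGroup("other-group");
        startParked(own, "leak-same-group", release);
        startParked(other, "leak-other-group", release);

        List<String> seen = threadNamesIn(own);
        System.out.println(seen.contains("leak-same-group"));  // true
        System.out.println(seen.contains("leak-other-group")); // false
        release.countDown();
    }
}
```

A check of this shape reports the dispatcher thread when the JDK creates it in the test's own group and stays silent when the JDK creates it elsewhere, which matches the IBM versus Oracle difference observed above.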
Re: Welcome David Smiley
Wow! It is truly an honor to be selected by the Lucene PMC to join the committer ranks. You are a top notch team of coders working on one of the most important open-source projects. About me: My technical background is all tiers of web development with a focus on the middle tier and Java. Of course I have expertise in Lucene and Solr but I also focus on geospatial matters as well as threading / concurrency. I like solving hard interesting problems. I am employed full time by The MITRE Corporation, a US federally funded non-profit organization in which I mostly work in the defense sector. I've been with MITRE for ~14 years. I've been fortunate lately to work on projects that fund my open-source geospatial work. I conduct Solr training at MITRE (1-day and 2-day classes), and I'm sort of a search consultant within MITRE, advising MITRE and its government clients. For 6 months, I have also been working part-time for OpenSource Connections as a search consultant. At home, I'm married with two kids: Adeline who is 10 months old (she's in my arms sleeping as I write this) and Camille who is 2 years 10 months old. I don't know how I found the time to write a book, but now that it's done, I'm on full parental duty when at home. For fun, I like to follow Starcraft 2 professional e-sports. It's conveniently something I can do while I hold a baby; playing the game isn't, unfortunately. I look forward to meeting you all at Lucene Revolution in May! I live close by in Lowell. Cheers, David Smiley - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Welcome-David-Smiley-tp3717248p3718969.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.