date:20120205

Re: [Lucene.Net] 3.0.3

2012-02-05 Thread Christopher Currens

I can only guarantee that these 31 bugs here (in the 3.0.3 version):
https://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/CHANGES.txt are
part of the code. It's possible that others are, but we'd really
need to check the others listed there to be sure that they are also
included. However, that's only a difference of 9 bugs, so I think we're
very close to a 3.0.3 release, depending on how many issues we want to get
done that relate to changing the API.

Thanks,
Christopher

On Sat, Feb 4, 2012 at 10:03 AM, Prescott Nasser geobmx...@hotmail.com wrote:

So, Chris, if you did this as a direct port of the Java version (
https://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/), does
that mean that all of the LUCENE JIRA issues (
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+LUCENE+AND+fixVersion+%3D+%223.0.3%22+AND+status+%3D+Closed+ORDER+BY+priority+DESC&mode=hide)
are part of this code already? That would make 3.0.3 well on its way to
release... ~P
From: bode...@apache.org
To: lucene-net-...@incubator.apache.org
Date: Wed, 25 Jan 2012 12:35:25 +0100
Subject: Re: [Lucene.Net] 3.0.3

On 2012-01-25, Michael Herndon wrote:

Do we have a standard copy or tag of Java's version source that we're
doing a compare against? I only see the 3_1 and above in the tags.

Likely because the svn location has changed in between. I think it must
be https://svn.apache.org/repos/asf/lucene/java/tags/lucene_3_0_3/

Stefan

Re: [Lucene.Net] 3.0.3

2012-02-05 Thread Christopher Currens

2653, 2055, 2776, 2732, 2688, 2616, 2524, 2398, 2284, 2278, 2277, and 2249
are all in JIRA but aren't on that list in the CHANGES.txt file. It looks
like that file in SVN also has some issues that aren't listed in JIRA. Anyway,
it's possible that the issues listed here have already been ported as
part of that changeset. I'm basing that on the fact that the last time
these bugs were updated was Dec 1st 2010, which was before the code was
released. However, we should still check to make sure.

Thanks,
Christopher
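
The cross-check described above (JIRA issues that CHANGES.txt does not
mention) could be scripted roughly as follows. This is only a sketch: the
CHANGES.txt excerpt and the entries 2000-2002 are made up for illustration,
and the function names are hypothetical. Only 2653 and 2055 come from the
list in this message.

```python
import re


def issues_in_changes(changes_text):
    """Return the set of LUCENE issue numbers mentioned in CHANGES.txt text."""
    return {int(n) for n in re.findall(r"LUCENE-(\d+)", changes_text)}


def missing_from_changes(jira_issue_numbers, changes_text):
    """Issue numbers known to JIRA but not mentioned in CHANGES.txt, sorted."""
    return sorted(set(jira_issue_numbers) - issues_in_changes(changes_text))


# Hypothetical excerpt, NOT the real CHANGES.txt contents.
sample_changes = """
* LUCENE-2000: example entry
* LUCENE-2001, LUCENE-2002: another example entry
"""

# Hypothetical JIRA result set mixing made-up and real issue numbers.
sample_jira = [2000, 2002, 2653, 2055]

print(missing_from_changes(sample_jira, sample_changes))
```

Anything the script reports would still need a manual look at the ported
code, since an issue can be ported without being named in CHANGES.txt.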


[jira] [Updated] (SOLR-3056) Introduce Japanese field type in schema.xml

2012-02-05 Thread Christian Moen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Moen updated SOLR-3056:
-

Attachment: SOLR-3056_schema40.patch

 Introduce Japanese field type in schema.xml
 ---

 Key: SOLR-3056
 URL: https://issues.apache.org/jira/browse/SOLR-3056
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
 Attachments: SOLR-3056_move.patch, SOLR-3056_schema40.patch, 
 SOLR-3056_schema40.patch, SOLR-3056_schema40.patch


 Kuromoji (LUCENE-3305) is now on both trunk and branch_3x (thanks again 
 Robert, Uwe and Simon). It would be very good to get a default field type 
 defined for Japanese in {{schema.xml}} so we can get good Japanese 
 out-of-the-box support in Solr.
 I've been playing with the below configuration today, which I think is a 
 reasonable starting point for Japanese.  There's a lot to be said about various 
 considerations necessary when searching Japanese, but perhaps a wiki page is 
 more suitable to cover the wider topic?
 In order to make the below {{text_ja}} field type work, Kuromoji itself and 
 its analyzers need to be seen by the Solr classloader.  However, these are 
 currently in contrib and I'm wondering if we should consider moving them to 
 core to make them directly available.  If there are concerns with additional 
 memory usage, etc. for non-Japanese users, we can make sure resources are 
 loaded lazily and only when needed in factory-land.
 Any thoughts?
 {code:xml}
 <!-- Text field type is suitable for Japanese text using morphological analysis
      NOTE: Please copy files
        contrib/analysis-extras/lucene-libs/lucene-kuromoji-x.y.z.jar
        dist/apache-solr-analysis-extras-x.y.z.jar
      to your Solr lib directory (i.e. example/solr/lib) before starting Solr.
      (x.y.z refers to a version number)
      If you would like to optimize for precision, default operator AND with
        <solrQueryParser defaultOperator="AND"/>
      below (this file).  Use OR if you would like to optimize for recall (default).
 -->
 <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
   <analyzer>
     <!-- Kuromoji Japanese morphological analyzer/tokenizer
          Use search-mode to get a noun-decompounding effect useful for search.
          Example:
            関西国際空港 (Kansai International Airport) becomes 関西 (Kansai) 国際 (International) 空港 (airport)
            so we get a match for 空港 (airport) as we would expect from a good search engine
          Valid values for mode are:
            normal: default segmentation
            search: segmentation useful for search (extra compound splitting)
            extended: search mode with unigramming of unknown words (experimental)
          NOTE: Search mode improves segmentation for search at the expense of part-of-speech accuracy
     -->
     <tokenizer class="solr.KuromojiTokenizerFactory" mode="search"/>
     <!-- Reduces inflected verbs and adjectives to their base/dictionary forms (辞書形) -->
     <filter class="solr.KuromojiBaseFormFilterFactory"/>
     <!-- Optionally remove tokens with certain part-of-speeches
     <filter class="solr.KuromojiPartOfSpeechStopFilterFactory" tags="stopTags.txt" enablePositionIncrements="true"/> -->
     <!-- Normalizes full-width romaji to half-width and half-width kana to full-width (Unicode NFKC subset) -->
     <filter class="solr.CJKWidthFilterFactory"/>
     <!-- Lower-case romaji characters -->
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-3056) Introduce Japanese field type in schema.xml

2012-02-05 Thread Christian Moen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200688#comment-13200688
 ] 

Christian Moen commented on SOLR-3056:
--

Stopwords and stoptags for Solr are now tracked in SOLR-3097 and a patch is 
available.


[jira] [Commented] (SOLR-3056) Introduce Japanese field type in schema.xml

2012-02-05 Thread Christian Moen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200689#comment-13200689
 ] 

Christian Moen commented on SOLR-3056:
--

Updated patch for {{schema.xml}} on {{trunk}}.

The field type {{text_ja}} now uses a {{KuromojiPartOfSpeechStopFilter}} and 
{{StopFilter}} for stopping and their configuration uses the stop sets in the 
SOLR-3097 patch.  Hence, SOLR-3097 should be applied before or at the same time 
as this patch.



[jira] [Updated] (LUCENE-3751) Align default Japanese configurations for Lucene and Solr

2012-02-05 Thread Christian Moen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Moen updated LUCENE-3751:
---

Attachment: LUCENE-3751.patch

 Align default Japanese configurations for Lucene and Solr
 -

 Key: LUCENE-3751
 URL: https://issues.apache.org/jira/browse/LUCENE-3751
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
 Attachments: LUCENE-3751.patch


 The {{KuromojiAnalyzer}} in Lucene should have the same default configuration 
 as the {{text_ja}} field type introduced in {{schema.xml}} by SOLR-3056.


[jira] [Commented] (LUCENE-3751) Align default Japanese configurations for Lucene and Solr

2012-02-05 Thread Christian Moen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200690#comment-13200690
 ] 

Christian Moen commented on LUCENE-3751:


Patch for {{trunk}} is attached.

The behavior of {{KuromojiAnalyzer}} is now the same as field type {{text_ja}} 
in Solr's example {{schema.xml}} (see SOLR-3056), including the order of the 
filters.

I think it makes sense to have the {{LowerCaseFilter}} late in the chain as it 
might make sense to use a case-based {{StopFilter}}.  It doesn't perhaps matter 
much in {{KuromojiAnalyzer}}'s case since the defaults don't do this anyway, 
but I thought it was good practice to align configuration anyway.

I've also clarified an error message and a javadoc.
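
To illustrate why the position of the lower-casing step matters when a stop
filter is case-sensitive, here is a small stand-alone sketch. It does not use
the Lucene API; the two functions and the sample stop set are hypothetical
stand-ins for {{StopFilter}} and {{LowerCaseFilter}}.

```python
def stop_filter(tokens, stopwords):
    """Drop tokens that exactly match a stop set entry (case-sensitive)."""
    return [t for t in tokens if t not in stopwords]


def lower_case_filter(tokens):
    """Lower-case every token."""
    return [t.lower() for t in tokens]


# Hypothetical case-based stop set: drop the pronoun "it", keep the acronym "IT".
stopwords = {"it"}
tokens = ["IT", "it", "Lucene"]

# Stop filter first, lower-casing late: the acronym survives.
stop_then_lower = lower_case_filter(stop_filter(tokens, stopwords))

# Lower-casing first: the acronym collapses into the stopword and is lost.
lower_then_stop = stop_filter(lower_case_filter(tokens), stopwords)

print(stop_then_lower)
print(lower_then_stop)
```

With the default stop sets this distinction may never come up, which matches
the remark above that it probably does not matter much for the defaults.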


[jira] [Commented] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration

2012-02-05 Thread Robert Muir (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200732#comment-13200732
 ] 

Robert Muir commented on LUCENE-3745:
-

Thanks for doing this, it will be much nicer to have a properly built 
configuration here!

I agree with the overall approach of leaning towards the conservative side: if 
someone wants, they can always be more aggressive (and use the data on this 
issue as a guide).





 Need stopwords and stoptags lists for default Japanese configuration
 

 Key: LUCENE-3745
 URL: https://issues.apache.org/jira/browse/LUCENE-3745
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Christian Moen
 Attachments: LUCENE-3745.patch, filter_stoptags.py, top-10.txt, 
 top-100-pos.txt, top-pos.txt


 Stopwords and stoptags lists for Japanese need to be developed, tested and 
 integrated into Lucene.


[jira] [Resolved] (LUCENE-3726) Default KuromojiAnalyzer to use search mode

2012-02-05 Thread Robert Muir (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3726.
-

   Resolution: Fixed
Fix Version/s: 4.0, 3.6
     Assignee: Robert Muir

Thanks Christian: I committed this.

 Default KuromojiAnalyzer to use search mode
 ---

 Key: LUCENE-3726
 URL: https://issues.apache.org/jira/browse/LUCENE-3726
 Project: Lucene - Java
  Issue Type: Improvement
Affects Versions: 3.6, 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3726.patch, LUCENE-3726.patch, LUCENE-3726.patch, 
 kuromojieval.tar.gz


 Kuromoji supports an option to segment text in a way more suitable for search,
 by preventing long compound nouns as indexing terms.
 In general 'how you segment' can be important depending on the application 
 (see http://nlp.stanford.edu/pubs/acl-wmt08-cws.pdf for some studies on this 
 in Chinese).
 The current algorithm punishes the cost based on some parameters 
 (SEARCH_MODE_PENALTY, SEARCH_MODE_LENGTH, etc)
 for long runs of kanji.
 Some questions (these can be separate future issues if any useful ideas come 
 out):
 * should these parameters continue to be static-final, or configurable?
 * should POS also play a role in the algorithm (can/should we refine exactly 
 what we decompound)?
 * is the Tokenizer the best place to do this, or should we do it in a 
 tokenfilter? or both?
   with a tokenfilter, one idea would be to also preserve the original 
 indexing term, overlapping it: e.g. ABCD -> AB, CD, ABCD(posInc=0)
   from my understanding this tends to help with noun compounds in other 
 languages, because IDF of the original term boosts 'exact' compound matches.
   but does a tokenfilter provide the segmenter enough 'context' to do this 
 properly?
 Either way, I think as a start we should turn on what we have by default: it's 
 likely a very easy win.
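
The overlapping-token idea in the description (ABCD -> AB, CD,
ABCD(posInc=0)) can be pictured with a short stand-alone sketch. It models
tokens as (text, position increment) pairs and is not Lucene code; the
function names are hypothetical and the token order simply follows the
example as written in the issue.

```python
def decompound_with_original(term, parts):
    """Emit the parts of a compound, then the original compound with a
    position increment of 0 so it overlaps the last part."""
    tokens = [(part, 1) for part in parts]
    tokens.append((term, 0))
    return tokens


def positions(tokens):
    """Turn (text, position increment) pairs into (text, absolute position)."""
    pos = -1
    result = []
    for text, increment in tokens:
        pos += increment
        result.append((text, pos))
    return result


tokens = decompound_with_original("ABCD", ["AB", "CD"])
print(tokens)
print(positions(tokens))
```

Because the original term shares a position with one of its parts, a query
for the parts and a query for the whole compound can both match the same
document, which is the effect the issue text attributes to this approach.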


[jira] [Commented] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration

2012-02-05 Thread Christian Moen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200747#comment-13200747
 ] 

Christian Moen commented on LUCENE-3745:


Thanks a lot for looking at this, Robert.  This was the thinking.  (I've 
referred to the issue in the stopwords and stoptags files.)


[jira] [Commented] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration

2012-02-05 Thread Robert Muir (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200749#comment-13200749
 ] 

Robert Muir commented on SOLR-3097:
---

{quote}
(Longer term, I think we should reconsider our overall approach to this across 
all languages, but that's perhaps a separate discussion.)
{quote}

It is a larger issue... in general we should make it easier to keep the two 
synchronized, but off the top of my head an idea for a plan was:
* add 'snowball format' support to Solr's stopfilter so it can read all the 
Lucene stopwords directly
* add an ant task to synchronize the Solr example from Lucene's resources. 
* (of course) add fieldtypes that actually use all these files.

On the other hand, realistically these resources are pretty static (they don't 
change once added). So for now I don't think it's a huge
risk that we don't have an auto-sync process... but we need to tackle these 
problems to easily integrate European languages anyway.

So I don't think this should block this issue; let's get Japanese up and going 
for now.
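
For readers unfamiliar with the 'snowball format' mentioned in the plan
above: in that format a vertical bar starts a comment, and a line may hold
several whitespace-separated words. A minimal stand-alone parser might look
like the sketch below. It is not the Solr or Lucene implementation, and the
sample list is made up for illustration.

```python
def parse_snowball_stopwords(text):
    """Parse a snowball-format stopword list into a set of words.

    Everything after a '|' on a line is a comment; the remaining text may
    contain zero or more whitespace-separated words.
    """
    words = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # strip trailing comment, if any
        words.update(content.split())
    return words


# Hypothetical sample, not taken from any shipped stopword file.
sample = """\
 | A made-up stopword list in snowball format
i              | first person
me  my         | several words on one line

 | blank lines and comment-only lines are ignored
the
"""

print(sorted(parse_snowball_stopwords(sample)))
```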

 Introduce default Japanese stoptags and stopwords to Solr's example 
 configuration
 -

 Key: SOLR-3097
 URL: https://issues.apache.org/jira/browse/SOLR-3097
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
 Attachments: SOLR-3097.patch


 SOLR-3056 discusses introducing a default field type {{text_ja}} for Japanese 
 in {{schema.xml}}.  This configuration will be improved by also introducing 
 default stopwords and stoptags configuration for the field type.  
 I believe this configuration should be easily available and tunable to Solr 
 users and I'm proposing that we introduce the same stopwords and stoptags 
 provided in LUCENE-3745 to Solr example configuration.  I'm proposing that 
 files can live in {{solr/example/solr/conf}} as {{stopwords_ja.txt}} and 
 {{stoptags_ja.txt}} alongside {{stopwords_en.txt}} for English.  (Longer 
 term, I think we should reconsider our overall approach to this across all 
 languages, but that's perhaps a separate discussion.)


[jira] [Commented] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration

2012-02-05 Thread Christian Moen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200752#comment-13200752
 ] 

Christian Moen commented on SOLR-3097:
--

Thanks a lot, Robert.


[jira] [Commented] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration

2012-02-05 Thread Robert Muir (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200753#comment-13200753
 ] 

Robert Muir commented on LUCENE-3745:
-

Let's get my previous ad-hoc lists out of there :)

I'll commit this for now and if there are any concerns we can reopen or refine 
in further issues.


[jira] [Resolved] (LUCENE-3745) Need stopwords and stoptags lists for default Japanese configuration

2012-02-05 Thread Robert Muir (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3745.
-

   Resolution: Fixed
Fix Version/s: 4.0, 3.6

Thanks Christian!


[jira] [Updated] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration

2012-02-05 Thread Robert Muir (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-3097:
--

Attachment: SOLR-3097.patch

ok this ant task was easy enough to write...

here's my first stab at it.

 Introduce default Japanese stoptags and stopwords to Solr's example 
 configuration
 -

 Key: SOLR-3097
 URL: https://issues.apache.org/jira/browse/SOLR-3097
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
 Attachments: SOLR-3097.patch, SOLR-3097.patch


 SOLR-3056 discusses introducing a default field type {{text_ja}} for Japanese 
 in {{schema.xml}}.  This configuration will be improved by also introducing 
 default stopwords and stoptags configuration for the field type.  
 I believe this configuration should be easily available and tunable to Solr 
 users and I'm proposing that we introduce the same stopwords and stoptags 
 provided in LUCENE-3745 to Solr example configuration.  I'm proposing that 
 files can live in {{solr/example/solr/conf}} as {{stopwords_ja.txt}} and 
 {{stoptags_ja.txt}} alongside {{stopwords_en.txt}} for English.  (Longer 
 term, I think should reconsider our overall approach to this across all 
 languages, but that's perhaps a separate discussion.)


Welcome David Smiley

2012-02-05 Thread Grant Ingersoll

I'm pleased to announce that the Lucene PMC has elected to add David Smiley as 
a committer to the Lucene/Solr project in recognition of  his ongoing 
contributions.

David, custom is to say a little bit about yourself, so feel free to give a 
little background on yourself.

Welcome aboard,
Grant

Re: Welcome David Smiley

2012-02-05 Thread Michael McCandless

Welcome David!

Happy hacking,

Mike McCandless

http://blog.mikemccandless.com

On Sun, Feb 5, 2012 at 8:46 AM, Grant Ingersoll gsing...@apache.org wrote:
 I'm pleased to announce that the Lucene PMC has elected to add David Smiley 
 as a committer to the Lucene/Solr project in recognition of  his ongoing 
 contributions.

 David, custom is to say a little bit about yourself, so feel free to give a 
 little background on yourself.

 Welcome aboard,
 Grant

[jira] [Commented] (LUCENE-3752) move preflexrw to lucene3x package

2012-02-05 Thread Michael McCandless (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200769#comment-13200769
 ] 

Michael McCandless commented on LUCENE-3752:


+1

 move preflexrw to lucene3x package
 --

 Key: LUCENE-3752
 URL: https://issues.apache.org/jira/browse/LUCENE-3752
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 4.0


 Currently there are a lot of things made public in lucene3x codec, but all 
 marked internal/experimental/deprecated.
 A lot of this is just so our test codec (preflexrw) can subclass it. I think 
 we should just move it to the same
 package, then it can all be package-private.


Re: Welcome David Smiley

2012-02-05 Thread Robert Muir

Welcome!

On Sun, Feb 5, 2012 at 8:46 AM, Grant Ingersoll gsing...@apache.org wrote:
 I'm pleased to announce that the Lucene PMC has elected to add David Smiley 
 as a committer to the Lucene/Solr project in recognition of  his ongoing 
 contributions.

 David, custom is to say a little bit about yourself, so feel free to give a 
 little background on yourself.

 Welcome aboard,
 Grant




-- 
lucidimagination.com


[jira] [Commented] (LUCENE-3749) Similarity.java javadocs and simplifications for 4.0

2012-02-05 Thread Michael McCandless (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200771#comment-13200771
 ] 

Michael McCandless commented on LUCENE-3749:


+1

 Similarity.java javadocs and simplifications for 4.0
 

 Key: LUCENE-3749
 URL: https://issues.apache.org/jira/browse/LUCENE-3749
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3749.patch, LUCENE-3749_part2.patch


 As part of adding additional scoring systems to lucene, we made a lower-level 
 Similarity
 and the existing stuff became e.g. TFIDFSimilarity which extends it.
 However, I always feel bad about the complexity introduced here (though I do 
 feel there
 are some excuses, that it's a difficult challenge).
 In order to try to mitigate this, we also exposed an easier API 
 (SimilarityBase) on top of 
 it that makes some assumptions (and trades off some performance) to try to 
 provide something 
 consumable for e.g. experiments.
 Still, we can clean up a few things with the low-level API: fix outdated 
 documentation and
 shoot for better/clearer naming etc.


Re: Welcome David Smiley

2012-02-05 Thread Simon Willnauer

welcome! :)

simon

On Sun, Feb 5, 2012 at 3:03 PM, Robert Muir rcm...@gmail.com wrote:
 Welcome!

 On Sun, Feb 5, 2012 at 8:46 AM, Grant Ingersoll gsing...@apache.org wrote:
 I'm pleased to announce that the Lucene PMC has elected to add David Smiley 
 as a committer to the Lucene/Solr project in recognition of  his ongoing 
 contributions.

 David, custom is to say a little bit about yourself, so feel free to give a 
 little background on yourself.

 Welcome aboard,
 Grant




 --
 lucidimagination.com


[jira] [Commented] (LUCENE-3752) move preflexrw to lucene3x package

2012-02-05 Thread Simon Willnauer (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200784#comment-13200784
 ] 

Simon Willnauer commented on LUCENE-3752:
-

+1

 move preflexrw to lucene3x package
 --

 Key: LUCENE-3752
 URL: https://issues.apache.org/jira/browse/LUCENE-3752
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 4.0


 Currently there are a lot of things made public in lucene3x codec, but all 
 marked internal/experimental/deprecated.
 A lot of this is just so our test codec (preflexrw) can subclass it. I think 
 we should just move it to the same
 package, then it can all be package-private.


RE: Welcome David Smiley

2012-02-05 Thread Uwe Schindler

Welcome!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Sunday, February 05, 2012 2:46 PM
 To: dev@lucene.apache.org
 Subject: Welcome David Smiley
 
 I'm pleased to announce that the Lucene PMC has elected to add David
Smiley
 as a committer to the Lucene/Solr project in recognition of  his ongoing
 contributions.
 
 David, custom is to say a little bit about yourself, so feel free to give
a little
 background on yourself.
 
 Welcome aboard,
 Grant

Re: Welcome David Smiley

2012-02-05 Thread Dawid Weiss

Welcome David!

Dawid

On Sun, Feb 5, 2012 at 3:50 PM, Uwe Schindler u...@thetaphi.de wrote:
 Welcome!

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Sunday, February 05, 2012 2:46 PM
 To: dev@lucene.apache.org
 Subject: Welcome David Smiley

 I'm pleased to announce that the Lucene PMC has elected to add David
 Smiley
 as a committer to the Lucene/Solr project in recognition of  his ongoing
 contributions.

 David, custom is to say a little bit about yourself, so feel free to give
 a little
 background on yourself.

 Welcome aboard,
 Grant

Re: Welcome David Smiley

2012-02-05 Thread Martijn v Groningen

Welcome!

On 5 February 2012 16:01, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:

 Welcome David!

 Dawid

 On Sun, Feb 5, 2012 at 3:50 PM, Uwe Schindler u...@thetaphi.de wrote:
  Welcome!
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Grant Ingersoll [mailto:gsing...@apache.org]
  Sent: Sunday, February 05, 2012 2:46 PM
  To: dev@lucene.apache.org
  Subject: Welcome David Smiley
 
  I'm pleased to announce that the Lucene PMC has elected to add David
  Smiley
  as a committer to the Lucene/Solr project in recognition of  his ongoing
  contributions.
 
  David, custom is to say a little bit about yourself, so feel free to
 give
  a little
  background on yourself.
 
  Welcome aboard,
  Grant




-- 
Met vriendelijke groet,

Martijn van Groningen

[jira] [Updated] (SOLR-1860) improve stopwords list handling

2012-02-05 Thread Robert Muir (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated SOLR-1860:
--

Attachment: SOLR-1860.patch

Now that Simon cleaned up wordlistloader, this is easy.

Attached is a patch to support the snowball format (format=snowball) in
StopFilterFactory and the common-grams factories.

Along with something like the ant task in SOLR-3097, we should be able to move
forwards with having some default configurations for other languages out-of-box.

improve stopwords list handling
---

Key: SOLR-1860
URL: https://issues.apache.org/jira/browse/SOLR-1860
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
Attachments: SOLR-1860.patch, SOLR-1860.patch

Currently Solr makes it easy to use English stopwords for StopFilter or
CommonGramsFilter.
Recently in Lucene, we added stopwords lists (mostly, but not all, from
snowball) to all the language analyzers.
So it would be nice if a user could easily specify that they want to use a
French stopword list, and use it for StopFilter or CommonGrams.
The ones from snowball, however, are formatted in a different manner than the
others (although in Lucene we have parsers to deal with this).
Additionally, we abstract this from Lucene users by adding a static
getDefaultStopSet to all analyzers.
There are two approaches. I think I prefer the first one, but I'm
not sure it matters as long as we have good examples (maybe a foreign
language example schema?)
1. The user would specify something like:
<filter class="solr.StopFilterFactory"
fromAnalyzer="org.apache.lucene.analysis.FrenchAnalyzer" .../>
This would just grab the CharArraySet from the FrenchAnalyzer's
getDefaultStopSet method; who cares where it comes from or how it's loaded.
2. We add support for snowball-formatted stopwords lists, and the user could
specify something like:
<filter class="solr.StopFilterFactory"
words="org/apache/lucene/analysis/snowball/french_stop.txt" format="snowball"
... />
The disadvantage to this is they have to know where the list is, what format
it's in, etc. For example: snowball doesn't provide Romanian or Turkish
stopword lists to go along with their stemmers, so we had to add our own.
Let me know what you guys think, and I will create a patch.
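
For readers unfamiliar with the snowball list format mentioned in approach 2, here is a minimal sketch of a parser for it. It assumes the usual snowball conventions ('|' starts a comment that runs to the end of the line, and a line may hold several whitespace-separated words). The class name SnowballStopwords and the sample text are made up for illustration; this is not the code from the attached patch or from Lucene's own loader.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative parser for snowball-style stopword lists (hypothetical, not the Solr patch). */
final class SnowballStopwords {

    /**
     * Parses text in which '|' starts a comment running to the end of the line
     * and each line may contain several whitespace-separated words.
     */
    static Set<String> parse(String text) {
        Set<String> words = new LinkedHashSet<String>();
        for (String line : text.split("\\r?\\n")) {
            int comment = line.indexOf('|');
            if (comment >= 0) {
                line = line.substring(0, comment); // drop the trailing comment
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue; // blank or comment-only line
            }
            words.addAll(Arrays.asList(line.split("\\s+")));
        }
        return words;
    }

    public static void main(String[] args) {
        // Made-up excerpt in the snowball layout.
        String sample = " | A French stop word list (excerpt)\n"
                      + "au             |  a + le\n"
                      + "aux            |  a + les\n"
                      + "\n"
                      + "le la\n";
        System.out.println(parse(sample));
    }
}
```

A plain one-word-per-line list parses the same way, which is why only the comment character and multi-word lines need special handling.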


Re: Welcome David Smiley

2012-02-05 Thread Erick Erickson

Good to have you aboard

Erick

On Sun, Feb 5, 2012 at 10:20 AM, Martijn v Groningen
martijn.v.gronin...@gmail.com wrote:
 Welcome!


 On 5 February 2012 16:01, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:

 Welcome David!

 Dawid

 On Sun, Feb 5, 2012 at 3:50 PM, Uwe Schindler u...@thetaphi.de wrote:
  Welcome!
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Grant Ingersoll [mailto:gsing...@apache.org]
  Sent: Sunday, February 05, 2012 2:46 PM
  To: dev@lucene.apache.org
  Subject: Welcome David Smiley
 
  I'm pleased to announce that the Lucene PMC has elected to add David
  Smiley
  as a committer to the Lucene/Solr project in recognition of  his
  ongoing
  contributions.
 
  David, custom is to say a little bit about yourself, so feel free to
  give
  a little
  background on yourself.
 
  Welcome aboard,
  Grant




 --
 Met vriendelijke groet,

 Martijn van Groningen


[jira] [Commented] (LUCENE-3752) move preflexrw to lucene3x package

2012-02-05 Thread Robert Muir (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200792#comment-13200792
 ] 

Robert Muir commented on LUCENE-3752:
-

Thanks for the comments guys, I'll do the svn moves and make it all 
package-private (except the codec).

I think it was especially confusing to see SegmentTerm[Enum/Docs/Positions] 
that resemble 3.x apis 
as public classes in 4.0 (even if they are 
deprecated/experimental/internal/full of warnings)...
they are really internal implementation details :)

 move preflexrw to lucene3x package
 --

 Key: LUCENE-3752
 URL: https://issues.apache.org/jira/browse/LUCENE-3752
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 4.0


 Currently there are a lot of things made public in lucene3x codec, but all 
 marked internal/experimental/deprecated.
 A lot of this is just so our test codec (preflexrw) can subclass it. I think 
 we should just move it to the same
 package, then it can all be package-private.


[jira] [Resolved] (LUCENE-3752) move preflexrw to lucene3x package

2012-02-05 Thread Robert Muir (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3752.
-

Resolution: Fixed

Committed revision 1240750.

 move preflexrw to lucene3x package
 --

 Key: LUCENE-3752
 URL: https://issues.apache.org/jira/browse/LUCENE-3752
 Project: Lucene - Java
  Issue Type: Task
Reporter: Robert Muir
 Fix For: 4.0


 Currently there are a lot of things made public in lucene3x codec, but all 
 marked internal/experimental/deprecated.
 A lot of this is just so our test codec (preflexrw) can subclass it. I think 
 we should just move it to the same
 package, then it can all be package-private.


Website Update

2012-02-05 Thread Grant Ingersoll

I'm hitting a few snags w/ the build system related to bringing over the old 
content, but am otherwise ready to do the move.  Trying to get some help from 
infra, but it is Sunday morning, so...

-Grant


[jira] [Updated] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()

2012-02-05 Thread Doron Cohen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3746:


Attachment: LUCENE-3746.patch

Updated patch using ManagementFactory.getMemoryMXBean().getHeapMemoryUsage(). 

The Javadocs are not explicit about this call being atomic, but from the 
wording it seems almost certain that each call returns a new MemoryUsage 
instance. In this patch this is asserted (Java assert) and the assert passes 
(-ea) in two different JVMs - IBM and Oracle - so this might be correct. I 
searched for more explicit info on this, with no success. 
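
As a point of reference, reading one heap snapshot through the management bean might look like the sketch below. The class and method names (HeapSnapshot, availableBytes) are hypothetical and this is not the attached patch; it only uses the standard java.lang.management calls named above. Note that getMax() may be -1 when the maximum is undefined.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Hypothetical helper; illustrates the MemoryMXBean approach only. */
final class HeapSnapshot {

    /**
     * Bytes that could still be allocated according to a single MemoryUsage
     * snapshot, or -1 if the maximum heap size is undefined.
     */
    static long availableBytes(MemoryUsage usage) {
        long max = usage.getMax();
        if (max < 0) {
            return -1; // the JVM reports no defined maximum
        }
        return Math.max(0L, max - usage.getUsed());
    }

    /** Takes one snapshot, so used and max come from the same MemoryUsage object. */
    static long availableHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return availableBytes(heap);
    }
}
```

Working from one MemoryUsage object at least keeps the two numbers from being read at different times by the caller; whether the JVM fills that object atomically is the open question in this comment.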

Annoyingly though, in IBM JDK, running the tests like this produces the nice 
warning:

{noformat}
WARNING: test class left thread running: Thread[MemoryPoolMXBean notification 
dispatcher,6,main]
RESOURCE LEAK: test class left 1 thread(s) running
{noformat}

This makes me reluctant to use the memory bean - I did not find a way to 
prevent that thread leak.

So perhaps a better approach would be to be conservative about the sequence of 
calls when using Runtime? something like this:

{code}
long free = rt.freeMemory();
if (free is sufficient)
  return decideBy(free);
long max = rt.maxMemory();
long total = rt.totalMemory();
return decideBy(max - total)
{code}

This is conservative in that 'total' is computed last, and in that total-free 
is not added to the computed available bytes.

In both approaches, even if atomicity is guaranteed, another thread may 
allocate more heap between the time the size is computed and the time the 
bytes are actually allocated, so I'm not sure how safe this check can be 
made.
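
A runnable version of the Runtime-based pseudocode above could look like this sketch. The class name BufferSizeHeuristic, the method name usableBytes and the 'wanted' parameter are assumptions made for illustration; the actual patch may decide differently.

```java
/** Hypothetical sketch of the conservative Runtime-based estimate described above. */
final class BufferSizeHeuristic {

    /** Pure decision step, separated out so it can be checked with fixed numbers. */
    static long usableBytes(long free, long max, long total, long wanted) {
        if (free >= wanted) {
            return free; // the already-committed free heap is enough
        }
        // Count only the heap the JVM may still grow into; total - free is
        // deliberately not added back, which keeps the estimate conservative.
        return Math.max(0L, max - total);
    }

    /** Reads the Runtime counters in the order suggested in the comment. */
    static long usableBytes(long wanted) {
        Runtime rt = Runtime.getRuntime();
        long free = rt.freeMemory();
        if (free >= wanted) {
            return free;
        }
        long max = rt.maxMemory();
        long total = rt.totalMemory(); // read last: heap growth in between only shrinks the result
        return Math.max(0L, max - total);
    }
}
```

As noted above, neither variant protects against another thread allocating between the check and the actual buffer allocation.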

 suggest.fst.Sort.BufferSize should not automatically fail just because of 
 freeMemory()
 --

 Key: LUCENE-3746
 URL: https://issues.apache.org/jira/browse/LUCENE-3746
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Reporter: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3746.patch, LUCENE-3746.patch


 Follow up on dev thread: [FSTCompletionTest failure At least 0.5MB RAM 
 buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]


[jira] [Updated] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()

2012-02-05 Thread Doron Cohen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-3746:


Attachment: LUCENE-3746.patch

Updated patch - without MemoryMXBean - computing 'max, total, free' (in that 
order) and deciding by 'free' or falling back to 'max-free'. This is more 
conservative than MemoryMXBean, but since the latter is not foolproof either, 
I prefer the simpler approach. 

 suggest.fst.Sort.BufferSize should not automatically fail just because of 
 freeMemory()
 --

 Key: LUCENE-3746
 URL: https://issues.apache.org/jira/browse/LUCENE-3746
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Reporter: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3746.patch, LUCENE-3746.patch, LUCENE-3746.patch


 Follow up on dev thread: [FSTCompletionTest failure At least 0.5MB RAM 
 buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]


[jira] [Commented] (SOLR-2667) Finish Solr Admin UI

2012-02-05 Thread Erick Erickson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200809#comment-13200809
 ] 

Erick Erickson commented on SOLR-2667:
--

re: SOLR-3094. If someone with javascript skills has the time/energy to help 
out with SOLR-3094, it would be awesome. I'm flying blind here. I can handle 
the LukeRequestHandler stuff, but it'll take a long time for me to figure out 
the javascript side.

Essentially, this problem makes the new UI unusable for any large index.


 Finish Solr Admin UI
 

 Key: SOLR-2667
 URL: https://issues.apache.org/jira/browse/SOLR-2667
 Project: Solr
  Issue Type: Improvement
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 4.0

 Attachments: SOLR-2667-110722.patch


 In SOLR-2399, we added a new admin UI. The issue has gotten too long to 
 follow, so this is a new issue to track remaining tasks.


Re: Welcome David Smiley

2012-02-05 Thread Mark Miller

Welcome aboard David!

On Feb 5, 2012, at 8:46 AM, Grant Ingersoll wrote:

 I'm pleased to announce that the Lucene PMC has elected to add David Smiley 
 as a committer to the Lucene/Solr project in recognition of  his ongoing 
 contributions.
 
 David, custom is to say a little bit about yourself, so feel free to give a 
 little background on yourself.
 
 Welcome aboard,
 Grant
 

- Mark Miller
lucidimagination.com

Re: Welcome David Smiley

2012-02-05 Thread Tommaso Teofili

Welcome David!
Cheers,
Tommaso

2012/2/5 Grant Ingersoll gsing...@apache.org

 I'm pleased to announce that the Lucene PMC has elected to add David
 Smiley as a committer to the Lucene/Solr project in recognition of  his
 ongoing contributions.

 David, custom is to say a little bit about yourself, so feel free to give
 a little background on yourself.

 Welcome aboard,
 Grant

[jira] [Commented] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()

2012-02-05 Thread Dawid Weiss (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200834#comment-13200834
 ] 

Dawid Weiss commented on LUCENE-3746:
-

As for spawning MemoryPoolMXBean -- I wouldn't be worried about it, it's 
probably a system daemon thread for sending memory threshold notifications  
(didn't check though). I will peek at openjdk sources and see how the mx is 
implemented to verify if it's atomic or not (not a guarantee, just curiosity).

 suggest.fst.Sort.BufferSize should not automatically fail just because of 
 freeMemory()
 --

 Key: LUCENE-3746
 URL: https://issues.apache.org/jira/browse/LUCENE-3746
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Reporter: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3746.patch, LUCENE-3746.patch, LUCENE-3746.patch


 Follow up on dev thread: [FSTCompletionTest failure At least 0.5MB RAM 
 buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]


[jira] [Resolved] (SOLR-1860) improve stopwords list handling

2012-02-05 Thread Robert Muir (Resolved) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir resolved SOLR-1860.
---

Resolution: Fixed
Fix Version/s: 4.0
3.6

I committed this.

I'll open up a new issue (related to SOLR-3097)
to provide setups for other languages.

improve stopwords list handling
---

Attachments: SOLR-1860.patch, SOLR-1860.patch


[jira] [Commented] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()

2012-02-05 Thread Dawid Weiss (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13200843#comment-13200843
 ] 

Dawid Weiss commented on LUCENE-3746:
-

Just checked and it seems to be that within a single memory pool the results 
will be atomic. Unfortunately that call aggregates all memory pools and 
(depending on the GC used) this may result in inconsistencies if the 
calculation happens to be interwoven with garbage collector activity. As stated 
in the sources of G1, for example:

{noformat}
// 4) Now, there is a very subtle issue with all the above. The
// framework will call get_memory_usage() on the three pools
// asynchronously. As a result, each call might get a different value
// for, say, survivor_num which will yield inconsistent values for
// eden_used, survivor_used, and old_gen_used (as survivor_num is used
// in the calculation of all three). This would normally be
// ok. However, it's possible that this might cause the sum of
// eden_used, survivor_used, and old_gen_used to go over the max heap
// size and this seems to sometimes cause JConsole (and maybe other
// clients) to get confused. There's not a really an easy / clean
// solution to this problem, due to the asynchrounous nature of the
// framework. 
{noformat}

Makes sense to me. I wouldn't bother with the management interface then and 
would just use the Runtime.* heuristic you proposed.

 suggest.fst.Sort.BufferSize should not automatically fail just because of 
 freeMemory()
 --

 Key: LUCENE-3746
 URL: https://issues.apache.org/jira/browse/LUCENE-3746
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Reporter: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3746.patch, LUCENE-3746.patch, LUCENE-3746.patch


 Follow up on dev thread: [FSTCompletionTest failure At least 0.5MB RAM 
 buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]


ToParentBlockJoinQuery vs filtered search

2012-02-05 Thread Mikhail Khludnev

Hello,

I'd like to contribute BlockJoinQParserPlugin for Solr. It's not a very big
deal, but I'm stuck writing filtered search test cases. At first
glance it looks like deja vu from another join:
https://issues.apache.org/jira/browse/SOLR-3062
http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java?r1=1238085r2=1239355.
But then I realized that it's a question about requirements:

 What is the expected behavior of ToParentBlockJoinQuery in a filtered
search, IndexSearcher.search(Query, *Filter*, Collector)? Is the given
filter applied to the child documents or to the parent documents?

Considering Solr's fq=, I suppose it makes more sense to apply that
filter to the parent documents. WDYT?
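
To make the "filter applies to parents" reading concrete, here is a toy model that uses no Lucene classes at all. It only illustrates the semantics proposed above, not how ToParentBlockJoinQuery behaves or is implemented; every name in it (BlockJoinSketch, Doc, search, sampleIndex) is hypothetical, and the data mirrors the resumes and jobs used in TestBlockJoin.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a block-joined index; none of these types are Lucene classes. */
final class BlockJoinSketch {

    /** One indexed document. Children (jobs) precede their parent (resume) in a block. */
    static final class Doc {
        final boolean isParent;
        final String skill;   // child field
        final int year;       // child field
        final String name;    // parent field
        final String country; // parent field

        private Doc(boolean isParent, String skill, int year, String name, String country) {
            this.isParent = isParent;
            this.skill = skill;
            this.year = year;
            this.name = name;
            this.country = country;
        }

        static Doc job(String skill, int year) {
            return new Doc(false, skill, year, null, null);
        }

        static Doc resume(String name, String country) {
            return new Doc(true, null, 0, name, country);
        }
    }

    /**
     * Joins child matches up to their parent, then applies the filter to the
     * parent only. A null countryFilter means "no filter".
     */
    static List<String> search(List<Doc> index, String skill, int minYear, int maxYear,
                               String countryFilter) {
        List<String> hits = new ArrayList<String>();
        boolean childMatched = false;
        for (Doc d : index) {
            if (!d.isParent) {
                if (skill.equals(d.skill) && d.year >= minYear && d.year <= maxYear) {
                    childMatched = true;
                }
            } else {
                boolean passes = countryFilter == null || countryFilter.equals(d.country);
                if (childMatched && passes) {
                    hits.add(d.name);
                }
                childMatched = false; // the next block starts after each parent
            }
        }
        return hits;
    }

    /** Two blocks: Lisa (UK) and Frank (US), each with two jobs. */
    static List<Doc> sampleIndex() {
        return Arrays.asList(
            Doc.job("java", 2007), Doc.job("python", 2010), Doc.resume("Lisa", "United Kingdom"),
            Doc.job("ruby", 2005), Doc.job("java", 2006), Doc.resume("Frank", "United States"));
    }
}
```

Under this reading, searching for java jobs from 2006 to 2011 returns both resumes without a filter, only Lisa with a "United Kingdom" filter, and nothing for a country nobody lives in. If the filter were applied to children instead, a parent-level field such as country could never match.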

I'm attaching small amendments to TestBlockJoin to illustrate my
understanding.

Thanks in advance.

-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com
Index: modules/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java
===
--- modules/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java	(revision 1237200)
+++ modules/join/src/test/org/apache/lucene/search/join/TestBlockJoin.java	(working copy)
@@ -155,6 +155,63 @@ public class TestBlockJoin extends LuceneTestCase {
 dir.close();
   }
 
+  public void testSimpleFilter() throws Exception {
+
+  final Directory dir = newDirectory();
+  final RandomIndexWriter w = new RandomIndexWriter(random, dir);
+
+  final List<Document> docs = new ArrayList<Document>();
+
+  docs.add(makeJob("java", 2007));
+  docs.add(makeJob("python", 2010));
+  docs.add(makeResume("Lisa", "United Kingdom"));
+  w.addDocuments(docs);
+
+  docs.clear();
+  docs.add(makeJob("ruby", 2005));
+  docs.add(makeJob("java", 2006));
+  docs.add(makeResume("Frank", "United States"));
+  w.addDocuments(docs);
+
+  IndexReader r = w.getReader();
+  w.close();
+  IndexSearcher s = newSearcher(r);
+
+  // Create a filter that defines parent documents in the index - in this case resumes
+  Filter parentsFilter = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("docType", "resume"))));
+
+  // Define child document criteria (finds an example of relevant work experience)
+  BooleanQuery childQuery = new BooleanQuery();
+  childQuery.add(new BooleanClause(new TermQuery(new Term("skill", "java")), Occur.MUST));
+  childQuery.add(new BooleanClause(NumericRangeQuery.newIntRange("year", 2006, 2011, true, true), Occur.MUST));
+
+  // Define parent document criteria (find a resident in the UK)
+  Query parentQuery = new TermQuery(new Term("country", "United Kingdom"));
+  
+  // Wrap the child document query to 'join' any matches
+  // up to corresponding parent:
+  ToParentBlockJoinQuery childJoinQuery = new ToParentBlockJoinQuery(childQuery, parentsFilter, ToParentBlockJoinQuery.ScoreMode.Avg);
+  
+  assertEquals("no filter - both passed", s.search(childJoinQuery, 10).totalHits, 2);
+  assertEquals("dummy filter passes everyone", s.search(childJoinQuery, parentsFilter, 10).totalHits, 2);
+  
+  // not found test
+  TopDocs ozHabitants = s.search(childJoinQuery, new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("country", "Oz")))), 10);
+  assertEquals("noone live there", 0, ozHabitants.totalHits);
+  
+  // apply the UK filter by the searcher
+  TopDocs ukOnly = s.search(childJoinQuery, new CachingWrapperFilter(new QueryWrapperFilter(parentQuery)), 10);
+  //TopDocs ukOnly = s.search(childJoinQuery, new QueryWrapperFilter(parentQuery), 10);
+  assertEquals("has filter - single passed", 1, ukOnly.totalHits);
+  assertEquals("Lisa", r.document(ukOnly.scoreDocs[0].doc).get("name"));
+  // looking for US candidates
+  TopDocs usThen = s.search(childJoinQuery, new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term("country", "United States")))), 10);
+  assertEquals("has filter - single passed", 1, usThen.totalHits);
+  assertEquals("Frank", r.document(usThen.scoreDocs[0].doc).get("name"));
+  r.close();
+  dir.close();
+  }
+  
   private Document getParentDoc(IndexReader reader, Filter parents, int childDocID) throws IOException {
 final AtomicReaderContext[] leaves = ReaderUtil.leaves(reader.getTopReaderContext());
 final int subIndex = ReaderUtil.subIndex(childDocID, leaves);

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: Welcome David Smiley

2012-02-05 Thread Steven A Rowe

Welcome David!

 -Original Message-
 From: Grant Ingersoll [mailto:gsing...@apache.org]
 Sent: Sunday, February 05, 2012 8:46 AM
 To: dev@lucene.apache.org
 Subject: Welcome David Smiley

 I'm pleased to announce that the Lucene PMC has elected to add David
 Smiley as a committer to the Lucene/Solr project in recognition of  his
 ongoing contributions.

 David, custom is to say a little bit about yourself, so feel free to give
 a little background on yourself.

 Welcome aboard,
 Grant

[jira] [Created] (LUCENE-3753) Restructure the Lucene build system

2012-02-05 Thread Steven Rowe (Created) (JIRA)

Restructure the Lucene build system
---

 Key: LUCENE-3753
 URL: https://issues.apache.org/jira/browse/LUCENE-3753
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Affects Versions: 3.6, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe


Split out separate core/, test-framework/, and tools/ modules, each with its 
own build.xml, under the lucene/ directory, similar to the Solr restructuring 
done in SOLR-2452.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (LUCENE-3753) Restructure the Lucene build system

2012-02-05 Thread Steven Rowe (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-3753:


Attachment: LUCENE-3753.patch

Patch implementing the idea, along with a script to fix existing patches 
against the old structure to be against the new structure.

Run this svn move script before applying the patch:

{noformat}
svn mv --parents lucene/src/java lucene/core/src/java
svn mv --parents lucene/src/test lucene/core/src/test
svn mv --parents lucene/src/resources lucene/core/src/resources
svn mv lucene/src/site lucene/site
svn mv --parents lucene/src/test-framework/java lucene/test-framework/src/java
svn mv --parents lucene/src/test-framework/resources lucene/test-framework/src/resources
svn mv --parents lucene/src/tools/java lucene/tools/src/java
svn mv --parents lucene/src/tools/javadoc lucene/tools/javadoc
svn mv --parents lucene/src/tools/prettify lucene/tools/prettify
svn rm lucene/src
svn mv --parents dev-tools/maven/lucene/src/pom.xml.template dev-tools/maven/lucene/core/pom.xml.template
svn mv --parents dev-tools/maven/lucene/src/test-framework/pom.xml.template dev-tools/maven/lucene/test-framework/pom.xml.template
svn rm dev-tools/maven/lucene/src
{noformat}

I think this is ready to go.

 Restructure the Lucene build system
 ---

 Key: LUCENE-3753
 URL: https://issues.apache.org/jira/browse/LUCENE-3753
 Project: Lucene - Java
  Issue Type: Improvement
  Components: general/build
Affects Versions: 3.6, 4.0
Reporter: Steven Rowe
Assignee: Steven Rowe
 Attachments: LUCENE-3753.patch


 Split out separate core/, test-framework/, and tools/ modules, each with its 
 own build.xml, under the lucene/ directory, similar to the Solr restructuring 
 done in SOLR-2452.


[jira] [Commented] (LUCENE-3602) Add join query to Lucene

2012-02-05 Thread Martijn van Groningen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200913#comment-13200913
 ] 

Martijn van Groningen commented on LUCENE-3602:
---

Jason: Better late than never... BRH is used to collect the matching "from" 
terms. The DTI just contains all terms / ords for a field. Comparing DTI ords 
isn't going to work when a term is in more than one segment or appears in a 
different field (fromField / toField). So I think the BRH can't be replaced by 
the DTI. The BRH could be cached per query.
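
The point about ordinals can be sketched in plain Java. This is only an illustration, assuming that BRH means BytesRefHash and DTI means the field cache's DocTermsIndex; the class and method names below are hypothetical and nothing here is Lucene API. An ordinal is just a position in one sorted term dictionary, so the same term can get different ordinals in different segments, while a set of term values can be looked up anywhere.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Toy model of per-segment term ordinals; not Lucene code. */
final class OrdSketch {

    /** Ordinal of a term: its position in one sorted term dictionary, or -1 if absent. */
    static int ord(String[] sortedTerms, String term) {
        int idx = Arrays.binarySearch(sortedTerms, term);
        return idx >= 0 ? idx : -1;
    }

    /** Collects terms by value, so they stay comparable across segments and fields. */
    static Set<String> collectTerms(String[]... segments) {
        Set<String> terms = new HashSet<>();
        for (String[] segment : segments) {
            terms.addAll(Arrays.asList(segment));
        }
        return terms;
    }

    public static void main(String[] args) {
        String[] segmentA = {"apple", "cherry"};
        String[] segmentB = {"banana", "berry", "cherry"};
        // The same term has a different ordinal in each segment.
        System.out.println(ord(segmentA, "cherry") + " vs " + ord(segmentB, "cherry"));
        System.out.println(collectTerms(segmentA, segmentB).contains("cherry"));
    }
}
```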

 Add join query to Lucene
 

 Key: LUCENE-3602
 URL: https://issues.apache.org/jira/browse/LUCENE-3602
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/join
Reporter: Martijn van Groningen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3602-3x.patch, LUCENE-3602.patch, 
 LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
 LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
 LUCENE-3602.patch


 Solr has had a (pseudo) join query for a while now. I think this should also be 
 available in Lucene.  


[jira] [Updated] (LUCENE-3602) Add join query to Lucene

2012-02-05 Thread Martijn van Groningen (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated LUCENE-3602:
--

Attachment: LUCENE-3602-3x.patch

Attached updated version of query time joining for 3x branch. Instead of doing 
a binary search for each term comparison it seeks / iterates forward. It can't 
do seeking like we do in trunk, so it isn't as fast as in trunk. However I do 
think this can be committed to at least have query time join support in 3x. 
Back porting per segment filtering and the MTQ that is in trunk is quite some 
work...

 Add join query to Lucene
 

 Key: LUCENE-3602
 URL: https://issues.apache.org/jira/browse/LUCENE-3602
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/join
Reporter: Martijn van Groningen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3602-3x.patch, LUCENE-3602-3x.patch, 
 LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
 LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, LUCENE-3602.patch, 
 LUCENE-3602.patch, LUCENE-3602.patch


 Solr has had a (pseudo) join query for a while now. I think this should also be 
 available in Lucene.  


[jira] [Commented] (SOLR-1758) schema definition for configuration files (validation, XSD)

2012-02-05 Thread Mike Sokolov (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200933#comment-13200933
 ] 

Mike Sokolov commented on SOLR-1758:


Yes - the schema will have to evolve as the config files evolve, so there will 
be a need for a new version, probably with each release. 

I think matching using LuceneMatchVersion makes a lot of sense.  For versions 
that are old enough (e.g. a different major release), the validator could still 
run, but produce warnings only.  Or else it could simply produce a message 
saying "warning: stale config version, not validating" or something to that 
effect.  I'm not clear on how reasonable it is to maintain an old config 
version: isn't this the kind of thing that users will *want* to be prompted to 
revisit?
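
One way to read the "warnings only for older major releases" idea is sketched below. Everything in it is hypothetical (the class, the enum, the use of plain major version numbers); it is not taken from the attached patch or from Solr.

```java
/** Toy model of version-dependent config validation; not Solr code. */
final class ConfigValidationSketch {

    enum Mode { STRICT, WARN_ONLY }

    /** Configs written for an older major release are still checked, but only warned about. */
    static Mode modeFor(int configMajor, int runtimeMajor) {
        return configMajor < runtimeMajor ? Mode.WARN_ONLY : Mode.STRICT;
    }

    /** Returns the warning to log, or null when there is nothing to report. */
    static String report(Mode mode, int problems) {
        if (problems == 0) {
            return null;
        }
        if (mode == Mode.WARN_ONLY) {
            return "warning: stale config version, " + problems + " problem(s) not enforced";
        }
        throw new IllegalStateException(problems + " config validation problem(s)");
    }

    public static void main(String[] args) {
        // A 3.x config loaded by a 4.x runtime only produces a warning.
        System.out.println(report(modeFor(3, 4), 2));
    }
}
```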


 schema definition for configuration files (validation, XSD)
 ---

 Key: SOLR-1758
 URL: https://issues.apache.org/jira/browse/SOLR-1758
 Project: Solr
  Issue Type: New Feature
Reporter: Jorg Heymans
  Labels: configuration, schema.xml, solrconfig.xml, validation, 
 xsd
 Fix For: 4.0

 Attachments: config-validation-20110523.patch


 It is too easy to make configuration errors in Solr without getting warnings. 
 We should explore ways of validation configurations. See mailing list 
 discussion at http://search-lucene.com/m/h6xKf1EShE6


[jira] [Created] (SOLR-3098) analysis gui hangs if no tokens are output

2012-02-05 Thread Robert Muir (Created) (JIRA)

analysis gui hangs if no tokens are output
--

 Key: SOLR-3098
 URL: https://issues.apache.org/jira/browse/SOLR-3098
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir


Try entering "the" for text_en.


Re: ToParentBlockJoinQuery vs filtered search

2012-02-05 Thread Michael McCandless

Hi Mikhail,

BlockJoinQParserPlugin sounds cool!

I think you're right: the incoming filter will apply to the "to"
document space. So, for ToParentBJQ it's parent docs, and for ToChildBJQ
it's child docs. The filter only needs to define the bits for docs in
that "to" space... the other bits will not be used.
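
A minimal sketch of that behavior in plain Java, for illustration only. It uses java.util.BitSet rather than any Lucene class, the names are hypothetical, and it assumes the block layout where children are indexed immediately before their parent. Child hits are first joined up to the parent, and only then is the incoming filter intersected, so only the filter's parent bits matter.

```java
import java.util.BitSet;

/** Toy model of a to-parent block join with a search filter; not Lucene code. */
final class BlockJoinSketch {

    /**
     * Maps each matching child doc to its parent: the first set bit in
     * parents at or after the child's doc ID (children precede their parent).
     */
    static BitSet joinToParents(BitSet childHits, BitSet parents) {
        BitSet joined = new BitSet();
        for (int child = childHits.nextSetBit(0); child >= 0; child = childHits.nextSetBit(child + 1)) {
            int parent = parents.nextSetBit(child);
            if (parent >= 0) {
                joined.set(parent);
            }
        }
        return joined;
    }

    /** Applies the search filter after the join, i.e. in parent doc space. */
    static BitSet filteredSearch(BitSet childHits, BitSet parents, BitSet filter) {
        BitSet result = joinToParents(childHits, parents);
        result.and(filter);
        return result;
    }

    public static void main(String[] args) {
        // Doc IDs 0-2: two jobs then Lisa's resume; 3-5: two jobs then Frank's resume.
        BitSet parents = new BitSet();
        parents.set(2);
        parents.set(5);
        BitSet childHits = new BitSet();
        childHits.set(0); // java, 2007
        childHits.set(4); // java, 2006
        BitSet ukFilter = new BitSet();
        ukFilter.set(2);  // only Lisa's resume is in the UK
        System.out.println(filteredSearch(childHits, parents, ukFilter));
    }
}
```

The doc IDs in main mirror the two blocks from the testSimpleFilter case in the original post.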

It looks like that's what your test case is testing for...? Does it pass?

Mike McCandless

http://blog.mikemccandless.com

On Sun, Feb 5, 2012 at 3:25 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
Hello,

I'd like to contribute BlockJoinQParserPlugin for Solr. It's not a very big
deal, but I'm stuck during writing filtered search test cases. At the first
glance it looks like deja vu for another
join https://issues.apache.org/jira/browse/SOLR-3062 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java?r1=1238085r2=1239355.
But then I realized that it's a question about requirements:

What is the expected functionality for ToParentBlockJoinQuery for filtered
search IndexSearcher.search(Query, *Filter*, Collector)? whether the given
filter is applied to children documents or to the parent documents?

Considering Solr's fq= I suppose that there is more sense to apply that
filter to parent documents. WDYT?

I'm attaching the small amendments to TestBlockJoin to get you my
understanding.

Thanks in advance.

--
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


[jira] [Created] (LUCENE-3754) Store generated archive manifests in per-module output directories

2012-02-05 Thread Steven Rowe (Created) (JIRA)

Store generated archive manifests in per-module output directories
--

 Key: LUCENE-3754
 URL: https://issues.apache.org/jira/browse/LUCENE-3754
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Steven Rowe
Assignee: Steven Rowe
Priority: Minor


Currently, generated archive manifests are all stored in the same location, so 
each module's build overwrites the previously built module's manifest.  
Locating these files in the per-module build dirs will allow them to be rebuilt 
only when necessary, rather than every time a module's {{jar}} target is called.



Re: Welcome David Smiley

2012-02-05 Thread Ryan McKinley

welcome david!


On Sun, Feb 5, 2012 at 5:46 AM, Grant Ingersoll gsing...@apache.org wrote:
 I'm pleased to announce that the Lucene PMC has elected to add David Smiley 
 as a committer to the Lucene/Solr project in recognition of  his ongoing 
 contributions.

 David, custom is to say a little bit about yourself, so feel free to give a 
 little background on yourself.

 Welcome aboard,
 Grant

[jira] [Updated] (LUCENE-3736) ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either slow atomic or fast composite way

2012-02-05 Thread Uwe Schindler (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3736:
--

Attachment: LUCENE-3736.patch

Attached is a patch implementing the above proposal using the builder pattern. 
The builder pattern (sorry Robert) is the only nice setup that allows setting 
properties like ignoring stored fields on the parallel readers while making the 
built reader unmodifiable!
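
For readers unfamiliar with the pattern, here is a rough sketch of the idea in plain Java. It is not the API from the attached patch: the class name, the add(reader, ignoreStoredFields) signature, and the use of strings as stand-ins for sub-readers are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy immutable "reader" assembled by a builder; strings stand in for sub-readers. */
final class ParallelReaderSketch {
    private final List<String> readers;
    private final List<String> storedFieldsReaders;

    private ParallelReaderSketch(List<String> readers, List<String> storedFieldsReaders) {
        // Defensive copies: later builder calls cannot change a built instance.
        this.readers = Collections.unmodifiableList(new ArrayList<>(readers));
        this.storedFieldsReaders = Collections.unmodifiableList(new ArrayList<>(storedFieldsReaders));
    }

    List<String> readers() { return readers; }

    List<String> storedFieldsReaders() { return storedFieldsReaders; }

    /** Mutable configuration lives here; the built object exposes no setters. */
    static final class Builder {
        private final List<String> readers = new ArrayList<>();
        private final List<String> storedFieldsReaders = new ArrayList<>();

        Builder add(String reader) {
            return add(reader, false);
        }

        Builder add(String reader, boolean ignoreStoredFields) {
            readers.add(reader);
            if (!ignoreStoredFields) {
                storedFieldsReaders.add(reader);
            }
            return this;
        }

        ParallelReaderSketch build() {
            return new ParallelReaderSketch(readers, storedFieldsReaders);
        }
    }

    public static void main(String[] args) {
        ParallelReaderSketch reader = new Builder().add("main").add("sidecar", true).build();
        System.out.println(reader.readers() + ", stored fields from " + reader.storedFieldsReaders());
    }
}
```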

 ParallelReader is now atomic, add convenience methods to wrap 
 CompositeReaders in either slow atomic or fast composite way
 --

 Key: LUCENE-3736
 URL: https://issues.apache.org/jira/browse/LUCENE-3736
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/index
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0

 Attachments: LUCENE-3736.patch, LUCENE-3736.patch


 ParallelReader is now atomic. We should add a sugar wrapper method to allow 
 synchronized composite readers (with same segment sizes) to be aligned with 
 MultiReaders or wrapped by Slow:
 - one ParallelReader with Slow wrapped parallel readers, they only need same 
 maxDoc() (and deletions)
 - a MultiReader containing all sub-ParallelReaders. This needs 
 CompositeReaders with same docStarts[]


[jira] [Issue Comment Edited] (LUCENE-3736) ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either slow atomic or fast composite way

2012-02-05 Thread Uwe Schindler (Issue Comment Edited) (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13198337#comment-13198337
]

Uwe Schindler edited comment on LUCENE-3736 at 2/5/12 11:56 PM:

Here just my cleanup work in ParallelReader, nothing new. It's as before, only
bugs (missing open checks) fixed and code violations (synthetic accessors,
final fields).

The next step will be to remove the add() methods, as IndexReaders should not
be changed after create.

Will work more tomorrow.

The plan is:
- Move all subreaders to ctor (builder-like API. First build reader-set, then
call build)
- Rename ParallelReader to ParallelAtomicReader
- Add a ParallelCompositeReader with same builder API, but taking any
CompositeReader-set and checks them that they are aligned (docStarts
identical). The subreaders are ParallelAtomicReaders.
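
The alignment check mentioned in the plan can be pictured with a small sketch. This is a guess at the idea in plain Java, not code from the patch; the class and method names are hypothetical. Two composite readers are treated as aligned only if their docStarts arrays are identical, which is stricter than having the same maxDoc.

```java
import java.util.Arrays;

/** Toy check that composite readers have identical segment boundaries; not Lucene code. */
final class DocStartsSketch {

    /** docStarts[i] is the first global doc ID of segment i; the last entry is maxDoc. */
    static int[] docStarts(int... segmentSizes) {
        int[] starts = new int[segmentSizes.length + 1];
        for (int i = 0; i < segmentSizes.length; i++) {
            starts[i + 1] = starts[i] + segmentSizes[i];
        }
        return starts;
    }

    /** True if every docStarts array equals the first one. */
    static boolean aligned(int[]... allDocStarts) {
        for (int i = 1; i < allDocStarts.length; i++) {
            if (!Arrays.equals(allDocStarts[0], allDocStarts[i])) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Same maxDoc (5) but different segment boundaries: not aligned.
        System.out.println(aligned(docStarts(3, 2), docStarts(2, 3)));
    }
}
```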

was (Author: thetaphi):
Here just my cleanup work in ParallelReader, nothing new. It's as before,
only bugs (missing open checks) fixed and code violations (synthetic
accessors, final fields).

The next step will be to remove the add() methods, as IndexReaders should not
be changed after create.

Will work more tomorrow.

The plan is:
- Move all subreaders to ctor (builder-like API. First build reader-set, then
call build)
- Rename ParallelReader to AtomicParallelReader
- Add a CompositeParallelReader with same builder API, but taking any
CompositeReader-set and checks them that they are aligned (docStarts
identical). The subreaders are AtomicParallelReaders.

ParallelReader is now atomic, add convenience methods to wrap
CompositeReaders in either slow atomic or fast composite way
--

Key: LUCENE-3736
URL: https://issues.apache.org/jira/browse/LUCENE-3736
Project: Lucene - Java
Issue Type: Sub-task
Components: core/index
Reporter: Uwe Schindler
Assignee: Uwe Schindler
Fix For: 4.0

Attachments: LUCENE-3736.patch, LUCENE-3736.patch

ParallelReader is now atomic. We should add a sugar wrapper method to allow
synchronized composite readers (with same segment sizes) to be aligned with
MultiReaders or wrapped by Slow:
- one ParallelReader with Slow wrapped parallel readers, they only need same
maxDoc() (and deletions)
- a MultiReader containing all sub-ParallelReaders. This needs
CompositeReaders with same docStarts[]


[jira] [Commented] (LUCENE-3736) ParallelReader is now atomic, add convenience methods to wrap CompositeReaders in either slow atomic or fast composite way

2012-02-05 Thread Uwe Schindler (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200975#comment-13200975
 ] 

Uwe Schindler commented on LUCENE-3736:
---

There are some test TODOs: the tests for parallel readers are very simplistic 
and have only 2 documents (which is especially stupid for testing composite 
readers). We should raise the number of documents.

 ParallelReader is now atomic, add convenience methods to wrap 
 CompositeReaders in either slow atomic or fast composite way
 --

 Key: LUCENE-3736
 URL: https://issues.apache.org/jira/browse/LUCENE-3736
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/index
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0

 Attachments: LUCENE-3736.patch, LUCENE-3736.patch


 ParallelReader is now atomic. We should add a sugar wrapper method to allow 
 synchronized composite readers (with same segment sizes) to be aligned with 
 MultiReaders or wrapped by Slow:
 - one ParallelReader with Slow wrapped parallel readers, they only need same 
 maxDoc() (and deletions)
 - a MultiReader containing all sub-ParallelReaders. This needs 
 CompositeReaders with same docStarts[]


Re: Welcome David Smiley

2012-02-05 Thread Koji Sekiguchi


Welcome David!

(12/02/05 22:46), Grant Ingersoll wrote:

I'm pleased to announce that the Lucene PMC has elected to add David Smiley as 
a committer to the Lucene/Solr project in recognition of  his ongoing 
contributions.

David, custom is to say a little bit about yourself, so feel free to give a 
little background on yourself.

Welcome aboard,
Grant





--
http://www.rondhuit.com/en/


[jira] [Commented] (SOLR-1758) schema definition for configuration files (validation, XSD)

2012-02-05 Thread Jan Høydahl (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200979#comment-13200979
 ] 

Jan Høydahl commented on SOLR-1758:
---

A "not validating" warning for pre-Solr 4.x versions sounds good to me.

Would it make more sense to keep the xsd file(s) inside the WAR instead of in 
example/solr/conf? This way you don't need to copy all the schemas (for v4.0, 
4.1, 4.2 etc) around with your solr config. Then add a JavaOpt which can 
disable validation: {{-Dsolr.validate=false}}
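
A switch like that could be read roughly as follows. This is only a sketch: the solr.validate property is the name proposed in this comment, not an existing Solr option, and the class is hypothetical.

```java
/** Toy reader for the proposed -Dsolr.validate switch; not Solr code. */
final class ValidationSwitchSketch {

    /** Validation stays on unless the JVM was started with -Dsolr.validate=false. */
    static boolean validationEnabled() {
        return Boolean.parseBoolean(System.getProperty("solr.validate", "true"));
    }

    public static void main(String[] args) {
        System.out.println("validate config: " + validationEnabled());
    }
}
```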

 schema definition for configuration files (validation, XSD)
 ---

 Key: SOLR-1758
 URL: https://issues.apache.org/jira/browse/SOLR-1758
 Project: Solr
  Issue Type: New Feature
Reporter: Jorg Heymans
  Labels: configuration, schema.xml, solrconfig.xml, validation, 
 xsd
 Fix For: 4.0

 Attachments: config-validation-20110523.patch


 It is too easy to make configuration errors in Solr without getting warnings. 
 We should explore ways of validation configurations. See mailing list 
 discussion at http://search-lucene.com/m/h6xKf1EShE6


Re: Welcome David Smiley

2012-02-05 Thread Jan Høydahl

Hearty welcome, we need the committing bandwidth you add to the project!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 5. feb. 2012, at 14:46, Grant Ingersoll wrote:

 I'm pleased to announce that the Lucene PMC has elected to add David Smiley 
 as a committer to the Lucene/Solr project in recognition of  his ongoing 
 contributions.
 
 David, custom is to say a little bit about yourself, so feel free to give a 
 little background on yourself.
 
 Welcome aboard,
 Grant

[jira] [Created] (SOLR-3099) Add query operator, index structure, and analyzer for exact match searching

2012-02-05 Thread Mike (Created) (JIRA)

Add query operator, index structure, and analyzer for exact match searching
-

 Key: SOLR-3099
 URL: https://issues.apache.org/jira/browse/SOLR-3099
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Mike


A project I'm working on requires *exact match* searching with stemming turned 
off. The users are accustomed to Sphinx search, and thus expect a query like [ 
=runs ] to return only documents that contain the exact term, "runs", and not 
the stemmed word "run".

In SOLR-2866, there is similar work, but I believe it is different because it 
uses a huge-synonym file rather than storing the original terms directly in the 
index. 

What I'd like instead is two things:
1. An analyzer that says, "store the original form of all words in the index 
along with the stemmed variations." If necessary, it's fine if this is simply 
an unstemmed field, but that seems cumbersome schema-wise and performance-wise.
2. An operator in edismax that allows users to query the exact form of the 
word. Sphinx uses the equals sign (=), and that makes sense logically to me.
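
The two requests can be pictured together with a toy index in plain Java. This is only a sketch under stated assumptions: the class is hypothetical, the "stemmer" just strips a trailing "s", and marking original forms with a "=" prefix is one possible encoding, not how Solr or Sphinx actually stores them.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy index that stores each word both stemmed and in its original form; not Solr code. */
final class ExactMatchSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Stand-in stemmer (assumption): strips one trailing "s". */
    static String stem(String word) {
        return word.length() > 1 && word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
    }

    void add(int docId, String text) {
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            post("=" + word, docId);  // original form, marked with "="
            post(stem(word), docId);  // stemmed form
        }
    }

    private void post(String key, int docId) {
        postings.computeIfAbsent(key, k -> new TreeSet<>()).add(docId);
    }

    /** A leading "=" asks for the exact form; anything else is stemmed first. */
    Set<Integer> search(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        String key = t.startsWith("=") ? t : stem(t);
        return postings.getOrDefault(key, Collections.<Integer>emptySet());
    }

    public static void main(String[] args) {
        ExactMatchSketch index = new ExactMatchSketch();
        index.add(1, "he runs");
        index.add(2, "a long run");
        System.out.println(index.search("=runs") + " " + index.search("runs"));
    }
}
```

Storing both forms in one structure roughly doubles the postings, which is the schema and performance cost the issue text alludes to.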

This issue is part of a meta issue, SOLR-3028, that is requesting two other 
operators in edismax (quorum search and word order).


[jira] [Commented] (SOLR-2866) Marked synonym filter for selective token expansion

2012-02-05 Thread Mike (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201002#comment-13201002
 ] 

Mike commented on SOLR-2866:


Hi. FYI, I've created a new issue, SOLR-3099, that is requesting that this 
feature be supported in the index and the edismax parser. I don't *think* the 
overlap is huge, but that seemed like a better approach to me, so I've created 
a branch of the conversation over there. 

 Marked synonym filter for selective token expansion
 ---

 Key: SOLR-2866
 URL: https://issues.apache.org/jira/browse/SOLR-2866
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
 Environment: Solr 3.4
Reporter: Victor van der Wolf
Priority: Minor
  Labels: stemming, synonyms
 Fix For: 3.6

 Attachments: MarkedSynonymFilterFactory.java, 
 SlowMarkedSynonymFilter.java, SlowMarkedSynonymFilterFactory.java


 Hi everybody,
 My name is Victor van der Wolf and since recently I work for the Royal 
 Library in the Netherlands. One of my first assignments here was to see if I 
 could implement some stemming algorithm for our websites. Our search engine 
 is solr/lucene 3.4.
 Basically I had 2 requirements to work with:
 1)   It should be possible to switch the stemming functionality on and 
 off in the front end
 2)   No extra storage should be required (no extra indexing).
 I shortly came to the conclusion that it would be practical to use the 
 SynonymFilter to do that. I got hold of a dutch library and used a stemming 
 algorithm to generate a synonym file on that.
 Then I thought that I could maybe use 2 different query analyzers under the 
 field type and then call one or the other depending if I want stemming or 
 not, like this q=field:analyzer:search term. Unfortunately this did not 
 seem possible.
 Then, after some discussions with Erick Erickson, it became clear that a good 
 approach could be to write my own SynonymFilter and apply some kind of token 
 marking to decide if that token should be synonymized or not. Well, I did 
 just that and it works like a charm.
 I would like to contribute this MarkedSynonymFilter class to the project.
 I used the SynonymFilter class as a starting point and added some extra 
 functionality to that. First of all, I added 3 new parameters called lookup, 
 preMark and postmark. The preMark and postmark parameters contain some kind 
 of pre- and suffix to recognize if a token should be synonymized or not. A 
 simple regex is used to determine this. Then the lookup parameter determines 
 the behaviour of the MarkedSynonymFilter:
 lookup=marked - marked tokens will be synonymized
 lookup=unmarked - unmarked tokens will be synonymized
 lookup=all - all tokens should be synonymized
 lookup=none - none of the tokens should be synonymized
 I started out writing this based on version 3.3, later I discovered that we 
 were using 3.4 and I had to upgrade it. Unfortunately the whole SynonymFilter 
 code has been revised and for the moment there is the Slow and the Fast 
 synonym filter where the Slow one is deprecated. My addition is based on the 
 slow version I am afraid.
 Anyway, I am curious about your comments. Please let me know if I should go 
 forward with this and create a JIRA issue + my code as a patch.
 Cheers,
 Victor van der Wolf


[jira] [Commented] (SOLR-3100) Add an operator to edismax for term quorum

2012-02-05 Thread Mike (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201003#comment-13201003
 ] 

Mike commented on SOLR-3100:


Oops. Please ignore the bit about stemming above. Poor copy/paste on my behalf.

 Add an operator to edismax for term quorum
 --

 Key: SOLR-3100
 URL: https://issues.apache.org/jira/browse/SOLR-3100
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Mike
   Original Estimate: 2h
  Remaining Estimate: 2h

 A project I'm working on requires *term quorum* searching with stemming 
 turned off. The users are accustomed to Sphinx search, and thus expect a 
 query like [ A AND (B C D)/2 ] to return only documents that contain A and at 
 least two of B, C or D. 
 So this document would match:
 a b c
 But this one wouldn't:
 a b
 This can be a useful form of fuzzy searching, and I think we support it via 
 the MM parameter, but we lack a user-facing operator for this. It would be 
 great to add it.
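
The matching rule described above (one required term plus a minimum number of optional terms) can be sketched in plain Java. This is only an illustration of the semantics, not of how edismax or the mm parameter is implemented; the class TermQuorumSketch and its methods are hypothetical names, and whitespace splitting stands in for real analysis.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class TermQuorumSketch {
    /**
     * Returns true if the document contains the required term and at least
     * minShouldMatch of the optional terms.
     */
    static boolean matches(Set<String> docTerms, String required,
                           List<String> optional, int minShouldMatch) {
        if (!docTerms.contains(required)) {
            return false;
        }
        int hits = 0;
        for (String term : optional) {
            if (docTerms.contains(term)) {
                hits++;
            }
        }
        return hits >= minShouldMatch;
    }

    /** Splits text on whitespace into a lower-cased set of terms. */
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(
            text.toLowerCase(Locale.ROOT).trim().split("\\s+")));
    }

    public static void main(String[] args) {
        List<String> optional = Arrays.asList("b", "c", "d");
        System.out.println(matches(terms("a b c"), "a", optional, 2)); // true
        System.out.println(matches(terms("a b"), "a", optional, 2));   // false
    }
}
```

The two calls in main mirror the two example documents from the issue description.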


[jira] [Commented] (SOLR-3028) Support for additional query operators (feature parity request)

2012-02-05 Thread Mike (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201006#comment-13201006
]

Mike commented on SOLR-3028:

Agreed - now that we're talking through three threads simultaneously, it seems
obvious we need three tickets. This one can serve as a meta ticket, I suppose.

Therefore:
1. I split off *exact match* into SOLR-3099, and made a comment in SOLR-2866. I
think they're different enough to warrant separate issues.
2. I split off *quorum search* into SOLR-3100.
3. I split off *word order* to issue SOLR-3101.

And I'll set "depends on" flags shortly here, assuming I have the needed
permissions. Thanks again for the guidance and help, Hoss.

Support for additional query operators (feature parity request)
---

Key: SOLR-3028
URL: https://issues.apache.org/jira/browse/SOLR-3028
Project: Solr
Issue Type: Improvement
Components: search
Affects Versions: 4.0
Reporter: Mike
Labels: operator, queryparser
Original Estimate: 6h
Remaining Estimate: 6h

I'm migrating my system from Sphinx Search, and there are a couple of
operators available in Sphinx that are not available in Solr.
I would love to see the following added to the Dismax parser:
1. Exact match. This might be tricky to get right, since it requires work on
the index side as well[1], but in Sphinx, you can do a query such as [
=running walking ], and running will have stemming off, while walking will
have it on.
2. Term quorum. In Sphinx and some commercial search engines (like Recommind,
Westlaw and Lexis), you can do a search such as [ (cat dog goat)/15 ], and
find the three words within 15 terms of each other. I think this is possible
in the backend via the span query, but there's no front end option for it, so
it's quite hard to reveal to users.
3. Word order. Being able to say, this term before that one, and this other
term before the next is something else in Sphinx that span queries support,
but is missing in the query parser. Would be great to get this in too.
These seem like the three biggest missing operators in Solr to me. I would
love to help move these forward if there is any way I can help.
[1] At least, *I* think it does. There's some discussion of one way of doing
exact-match-like support in SOLR-2866.


[jira] [Created] (SOLR-3101) Add an operator to edismax for word order

2012-02-05 Thread Mike (Created) (JIRA)

Add an operator to edismax for word order
-

 Key: SOLR-3101
 URL: https://issues.apache.org/jira/browse/SOLR-3101
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Mike


A project I'm working on requires *word order* searching. The users are 
accustomed to Sphinx search, and expect a query like [ A << B ] to return only
documents that contain the term A before the term B.

I believe this can currently be done with the surround parser (SOLR-2703), but 
we lack an operator for it. It would be great to add it, so that word order 
searches can be combined by users into sophisticated queries. 

Note that this should also support a query like [ A << A ], which would require
that the term be in the document twice (the first instance before the second).

This issue is part of a meta issue, SOLR-3028, that is requesting two other 
operators in edismax (quorum search and exact match).
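
The ordering requirement described above, including the case where the same term must occur twice, amounts to checking that the query terms appear in the document as an ordered subsequence. The sketch below illustrates only that rule on a plain token list; WordOrderSketch and inOrder are hypothetical names, and this is neither the surround parser nor an edismax implementation.

```java
import java.util.Arrays;
import java.util.List;

final class WordOrderSketch {
    /**
     * Returns true if the query terms occur in the document tokens in the
     * given order. Each token can satisfy only one query term, so asking for
     * the same term twice requires two occurrences.
     */
    static boolean inOrder(List<String> tokens, List<String> orderedTerms) {
        int next = 0; // index of the next query term still to be found
        for (String token : tokens) {
            if (next < orderedTerms.size() && token.equals(orderedTerms.get(next))) {
                next++;
            }
        }
        return next == orderedTerms.size();
    }

    public static void main(String[] args) {
        List<String> ab = Arrays.asList("a", "b");
        List<String> aa = Arrays.asList("a", "a");
        System.out.println(inOrder(Arrays.asList("a", "x", "b"), ab)); // true
        System.out.println(inOrder(Arrays.asList("b", "a"), ab));      // false
        System.out.println(inOrder(Arrays.asList("a", "b"), aa));      // false
        System.out.println(inOrder(Arrays.asList("a", "b", "a"), aa)); // true
    }
}
```

A single left-to-right scan is enough here because taking the earliest possible occurrence of each term never rules out a later match.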



[jira] [Updated] (SOLR-3099) Add query operator, index structure, and analyzer for exact match searching

2012-02-05 Thread Mike (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/SOLR-3099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mike updated SOLR-3099:
---

Issue Type: Sub-task (was: New Feature)
Parent: SOLR-3028

Add query operator, index structure, and analyzer for exact match searching
-

Key: SOLR-3099
URL: https://issues.apache.org/jira/browse/SOLR-3099
Project: Solr
Issue Type: Sub-task
Components: Schema and Analysis
Reporter: Mike
Original Estimate: 4h
Remaining Estimate: 4h

A project I'm working on requires *exact match* searching with stemming
turned off. The users are accustomed to Sphinx search, and thus expect a
query like [ =runs ] to return only documents that contain the exact term,
runs, and not the stemmed word run.
In SOLR-2866, there is similar work, but I believe it is different because it
uses a huge synonym file rather than storing the original terms directly in
the index.
What I'd like instead is two things:
1. An analyzer that says, store the original form of all words in the index
along with the stemmed variations. If necessary, it's fine if this is simply
an unstemmed field, but that seems cumbersome schema-wise and
performance-wise.
2. An operator in edismax that allows users to query the exact form of the
word. Sphinx uses the equals sign (=), and that makes sense logically to me.
This issue is part of a meta issue, SOLR-3028, that is requesting two other
operators in edismax (quorum search and word order).
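
One way to picture the two requested pieces is a toy inverted index that stores every token twice, once in its original form and once stemmed, and routes queries that start with the equals sign to the original form. Everything in this sketch is an assumption made for illustration: the class ExactMatchSketch, the "=" key prefix used to separate the two kinds of entries, and the toy stemmer that merely strips a trailing "s". It is not how a Solr analyzer or edismax works internally.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ExactMatchSketch {
    // Maps an index key to the ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Toy stemmer (assumption): strips one trailing "s" from longer words. */
    static String stem(String token) {
        return token.length() > 3 && token.endsWith("s")
            ? token.substring(0, token.length() - 1)
            : token;
    }

    /** Indexes each token under its original form ("=" prefix) and its stem. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            postings.computeIfAbsent("=" + token, k -> new TreeSet<>()).add(docId);
            postings.computeIfAbsent(stem(token), k -> new TreeSet<>()).add(docId);
        }
    }

    /** "=term" looks up the original form; a plain term is stemmed first. */
    Set<Integer> search(String queryTerm) {
        String q = queryTerm.toLowerCase(Locale.ROOT);
        String key = q.startsWith("=") ? q : stem(q);
        return postings.getOrDefault(key, Collections.emptySet());
    }

    public static void main(String[] args) {
        ExactMatchSketch index = new ExactMatchSketch();
        index.add(1, "he runs daily");
        index.add(2, "a long run");
        System.out.println(index.search("=runs")); // [1]
        System.out.println(index.search("runs"));  // [1, 2]
    }
}
```

The sketch doubles the number of index entries per token, which is the schema and performance cost the description is worried about.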


[jira] [Updated] (SOLR-3100) Add an operator to edismax for term quorum

2012-02-05 Thread Mike (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike updated SOLR-3100:
---

Issue Type: Sub-task  (was: New Feature)
Parent: SOLR-3028

 Add an operator to edismax for term quorum
 --

 Key: SOLR-3100
 URL: https://issues.apache.org/jira/browse/SOLR-3100
 Project: Solr
  Issue Type: Sub-task
  Components: search
Reporter: Mike
   Original Estimate: 2h
  Remaining Estimate: 2h

 A project I'm working on requires *term quorum* searching with stemming
 turned off. The users are accustomed to Sphinx search, and thus expect a
 query like [ A AND (B C D)/2 ] to return only documents that contain A and at
 least two of B, C or D.
 So this document would match:
 a b c
 But this one wouldn't:
 a b
 This can be a useful form of fuzzy searching, and I think we support it via 
 the MM parameter, but we lack a user-facing operator for this. It would be 
 great to add it.


[jira] [Updated] (SOLR-3101) Add an operator to edismax for word order

2012-02-05 Thread Mike (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-3101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike updated SOLR-3101:
---

Issue Type: Sub-task  (was: New Feature)
Parent: SOLR-3028

 Add an operator to edismax for word order
 -

 Key: SOLR-3101
 URL: https://issues.apache.org/jira/browse/SOLR-3101
 Project: Solr
  Issue Type: Sub-task
  Components: search
Reporter: Mike
   Original Estimate: 4h
  Remaining Estimate: 4h

 A project I'm working on requires *word order* searching. The users are 
 accustomed to Sphinx search, and expect a query like [ A << B ] to return
 only documents that contain the term A before the term B.
 I believe this can currently be done with the surround parser (SOLR-2703), 
 but we lack an operator for it. It would be great to add it, so that word 
 order searches can be combined by users into sophisticated queries. 
 Note that this should also support a query like [ A << A ], which would
 require that the term be in the document twice (the first instance before the 
 second).
 This issue is part of a meta issue, SOLR-3028, that is requesting two other 
 operators in edismax (quorum search and exact match).


Re: ToParentBlockJoinQuery vs filtered search

2012-02-05 Thread Mikhail Khludnev

On Mon, Feb 6, 2012 at 2:25 AM, Michael McCandless
luc...@mikemccandless.com wrote:

Hi Mikhail,

BlockJoinQParserPlugin sounds cool!

Thanks for resolving my hesitations. It allows me to move forward.

It looks like that's what your test case is testing for...? Does it pass?

Of course it doesn't.
The first reason is that BlockJoinWeight.scorer()
http://svn.apache.org/viewvc/lucene/dev/trunk/modules/join/src/java/org/apache/lucene/search/join/ToParentBlockJoinQuery.java?view=markup
has the opposite intention (btw, are you 100% sure?):
* The children query is filtered by the given filter:
childWeight.scorer(readerContext, true, false, *acceptDocs*);
* The parent filter is not constrained:
parentsFilter.getDocIdSet(readerContext,
*readerContext.reader().getLiveDocs()*);
That's why I asked for the rationale of filtered BJQ search.

Another complication I ran into is that
AssertingIndexSearcher.wrapFilter() randomly switches from filtered
search to FilteredQuery:
http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/src/test-framework/java/org/apache/lucene/search/AssertingIndexSearcher.java
This leads to IllegalStateException: "parentFilter must return
FixedBitSet; got BitsFilteredDocIdSet". I suppose I can deal with it.
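
The difference between filtering children and filtering parents can be shown with a toy model that uses java.util.BitSet instead of Lucene classes. The sketch assumes the usual block layout, in which child documents are immediately followed by their parent; BlockJoinFilterSketch, join, and bits are hypothetical names, and this does not reproduce the real scorer.

```java
import java.util.BitSet;

final class BlockJoinFilterSketch {
    /** Builds a BitSet with the given document ids set. */
    static BitSet bits(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }

    /**
     * Joins matching children to their parents. The parent of child c is the
     * first parent bit at or after c. The filter is checked either against
     * the child (filterParents == false) or against the parent (true).
     */
    static BitSet join(BitSet parents, BitSet childMatches, BitSet filter,
                       boolean filterParents) {
        BitSet result = new BitSet();
        for (int c = childMatches.nextSetBit(0); c >= 0; c = childMatches.nextSetBit(c + 1)) {
            if (parents.get(c)) {
                continue; // a parent document, not a child
            }
            int p = parents.nextSetBit(c);
            if (p < 0) {
                continue; // child without a following parent
            }
            boolean accepted = filterParents ? filter.get(p) : filter.get(c);
            if (accepted) {
                result.set(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet parents = bits(2, 5);      // blocks: [0, 1 -> 2] and [3, 4 -> 5]
        BitSet childMatches = bits(0, 3); // children hit by the child query
        BitSet filter = bits(2);          // a filter that accepts parent 2 only
        System.out.println(join(parents, childMatches, filter, false)); // {}
        System.out.println(join(parents, childMatches, filter, true));  // {2}
    }
}
```

With a filter that only matches parent documents, applying it to the children yields nothing, while applying it to the parents yields the expected parent.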

Mike McCandless

http://blog.mikemccandless.com

On Sun, Feb 5, 2012 at 3:25 PM, Mikhail Khludnev
mkhlud...@griddynamics.com wrote:
Hello,

Considering Solr's fq=, I suppose it makes more sense to apply that
filter to parent documents. WDYT?

I'm attaching small amendments to TestBlockJoin to show you my
understanding.

Thanks in advance.


--
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics


[jira] [Commented] (SOLR-3097) Introduce default Japanese stoptags and stopwords to Solr's example configuration

2012-02-05 Thread Christian Moen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201039#comment-13201039
 ] 

Christian Moen commented on SOLR-3097:
--

Thanks, Robert.

Is your thinking to use the {{sync-analyzers}} target to automatically copy
resources to the right place as part of {{package}}, {{example}}, etc. -- or is
this a convenience to make it easier to keep the files in sync when we check
them in separately?

The {{sync-analyzers}} target works fine for the latter purpose, but needs
hookups elsewhere in {{build.xml}} if we want to do this automatically.  Happy to
follow up on the latter if this is what you'd like to see in the patch.


 Introduce default Japanese stoptags and stopwords to Solr's example 
 configuration
 -

 Key: SOLR-3097
 URL: https://issues.apache.org/jira/browse/SOLR-3097
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
 Attachments: SOLR-3097.patch, SOLR-3097.patch


 SOLR-3056 discusses introducing a default field type {{text_ja}} for Japanese 
 in {{schema.xml}}.  This configuration will be improved by also introducing 
 default stopwords and stoptags configuration for the field type.  
 I believe this configuration should be easily available and tunable to Solr 
 users and I'm proposing that we introduce the same stopwords and stoptags 
 provided in LUCENE-3745 to Solr example configuration.  I'm proposing that
 the files live in {{solr/example/solr/conf}} as {{stopwords_ja.txt}} and
 {{stoptags_ja.txt}} alongside {{stopwords_en.txt}} for English.  (Longer
 term, I think we should reconsider our overall approach to this across all
 languages, but that's perhaps a separate discussion.)


[jira] [Commented] (LUCENE-3746) suggest.fst.Sort.BufferSize should not automatically fail just because of freeMemory()

2012-02-05 Thread Doron Cohen (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201073#comment-13201073
 ] 

Doron Cohen commented on LUCENE-3746:
-

Thanks Dawid! 

{quote}
it's probably a system daemon thread for sending memory threshold notifications
{quote}

Yes, this makes sense.
Still, the difference between the two JDKs bothered me.
After some more digging, I think it is clear now.

Here are the threads reported (at the end of the test) with Oracle:
{noformat}
1.  Thread[ReaderThread,5,main]
2.  Thread[main,5,main]
3.  Thread[Reference Handler,10,system]
4.  Thread[Signal Dispatcher,9,system]
5.  Thread[Finalizer,8,system]
6.  Thread[Attach Listener,5,system]
{noformat}

And with IBM JDK:
{noformat}
1.  Thread[Attach API wait loop,10,main]
2.  Thread[Finalizer thread,5,system]
3.  Thread[JIT Compilation Thread,10,system]
4.  Thread[main,5,main]
5.  Thread[Gc Slave Thread,5,system]
6.  Thread[ReaderThread,5,main]
7.  Thread[Signal Dispatcher,5,main]
8.  Thread[MemoryPoolMXBean notification dispatcher,6,main]
{noformat}

The 8th thread is the one that started only after accessing the memory
management layer. The thing is that in the IBM JDK that thread is created in
the ThreadGroup main, while in the Oracle JDK it is created under system.
To me the latter makes more sense.

To be more sure, I added a fake memory notification listener and checked the
thread in which the notification happens:
{code}
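import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;

MemoryMXBean mmxb = ManagementFactory.getMemoryMXBean();
NotificationListener listener = new NotificationListener() {
  @Override
  public void handleNotification(Notification notification, Object handback) {
    // Print the thread (name, priority, group) that delivers the notification.
    System.out.println(Thread.currentThread());
  }
};
((NotificationEmitter) mmxb).addNotificationListener(listener, null, null);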
{code}

Evidently, in the IBM JDK the notification is delivered in a main-group thread
(also in line with the thread group in the original warning message that
triggered this thread discussion):
{noformat}
Thread[MemoryPoolMXBean notification dispatcher,6,main]
{noformat}

While in the Oracle JDK the notification is delivered in a system-group thread:
{noformat}
Thread[Low Memory Detector,9,system]
{noformat}

This also explains why the warning is reported only for the IBM JDK: the
thread check in LTC only accounts for the threads in the same thread group as
the one running the specific test case. So when dispatching happens in a
system-group thread, it is not sensed by that check at all.
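
The effect of a check that is scoped to one thread group can be reproduced with two sibling groups. This is only a sketch of the general mechanism using ThreadGroup.enumerate, under the assumption that the check enumerates a single group; it is not the actual LTC code, and ThreadGroupCheckSketch, liveThreadNames, and startParked are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

final class ThreadGroupCheckSketch {
    /** Returns the names of the live threads in the group and its subgroups. */
    static List<String> liveThreadNames(ThreadGroup group) {
        // activeCount() is only an estimate, so leave some slack in the buffer.
        Thread[] buffer = new Thread[group.activeCount() + 16];
        int count = group.enumerate(buffer, true);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(buffer[i].getName());
        }
        return names;
    }

    /** Starts a daemon thread in the group that waits until the latch opens. */
    static Thread startParked(ThreadGroup group, String name, CountDownLatch release) {
        Thread thread = new Thread(group, () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        thread.setDaemon(true);
        thread.start();
        return thread;
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadGroup testGroup = new ThreadGroup("test-case-group");
        ThreadGroup otherGroup = new ThreadGroup("other-group");
        startParked(testGroup, "worker-in-test-group", release);
        startParked(otherGroup, "dispatcher-in-other-group", release);
        // Only the thread created in testGroup is visible from testGroup.
        System.out.println(liveThreadNames(testGroup));
        release.countDown();
    }
}
```

A thread that lives in a different group is simply never seen by the enumeration, which matches the observation that the dispatcher is reported only when it is created in the test's own group.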

OK, now with the mystery solved I can commit the simpler code...

 suggest.fst.Sort.BufferSize should not automatically fail just because of 
 freeMemory()
 --

 Key: LUCENE-3746
 URL: https://issues.apache.org/jira/browse/LUCENE-3746
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/spellchecker
Reporter: Doron Cohen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3746.patch, LUCENE-3746.patch, LUCENE-3746.patch


 Follow-up to dev thread: [FSTCompletionTest failure At least 0.5MB RAM
 buffer is needed | http://markmail.org/message/d7ugfo5xof4h5jeh]


Re: Welcome David Smiley

2012-02-05 Thread David Smiley (@MITRE.org)

Wow! It is truly an honor to be selected by the Lucene PMC to join the
committer ranks.  You are a top-notch team of coders working on one of the
most important open-source projects.

About me:

My technical background is all tiers of web development with a focus on the
middle tier and Java.  Of course I have expertise in Lucene and Solr but I
also focus on geospatial matters as well as threading / concurrency.  I like
solving hard interesting problems.

I am employed full time by The MITRE Corporation, a US federally funded
non-profit organization in which I mostly work in the defense sector. I've
been with MITRE for ~14 years. I've been fortunate lately to work on
projects that fund my open-source geospatial work.  I conduct Solr training
at MITRE (1-day and 2-day classes), and I'm sort of a search consultant
within MITRE, advising MITRE and its government clients.  For 6 months, I
have also been working part-time for OpenSource Connections as a search
consultant.

At home, I'm married with two kids: Adeline who is 10 months old (she's in
my arms sleeping as I write this) and Camille who is 2 years 10 months old. 
I don't know how I found the time to write a book, but now that it's done,
I'm on full parental duty when at home.  For fun, I like to follow Starcraft
2 professional e-sports.  It's conveniently something I can do while I hold
a baby; playing the game isn't, unfortunately.

I look forward to meeting you all at Lucene Revolution in May!  I live close
by in Lowell.

Cheers,
  David Smiley

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Welcome-David-Smiley-tp3717248p3718969.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

