Re: Eclipse project files...

2010-04-12 Thread Koji Sekiguchi

Paolo Castagna wrote:

Hi,
could you be more precise about 'import project from source tree'.


I think he suggested

File > New > Java Project > Create project from existing source

?

Koji

--
http://www.rondhuit.com/en/



[jira] Closed: (SOLR-1879) Error loading class 'Solr.ASCIIFoldingFilterFactory'

2010-04-12 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi closed SOLR-1879.


Resolution: Not A Problem

Adlene, please use the solr-user mailing list to get help.

http://lucene.apache.org/solr/mailing_lists.html


 Error loading class 'Solr.ASCIIFoldingFilterFactory'
 

 Key: SOLR-1879
 URL: https://issues.apache.org/jira/browse/SOLR-1879
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
 Environment: Windows XP, Apache Tomcat 6
Reporter: adlene sifi

 I am trying to use the Solr.ASCIIFoldingFilterFactory filter as follows:
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.StopFilterFactory"
             ignoreCase="true"
             words="french_stopwords.txt"
             enablePositionIncrements="true"
             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.SnowballPorterFilterFactory" language="French"/>
     <filter class="Solr.ASCIIFoldingFilterFactory"/>
     <charFilter class="solr.MappingCharFilterFactory"
             mapping="mapping-ISOLatin1Accent.txt"/>
   </analyzer>
 ...
 </fieldType>
 However, I receive the following error message when restarting the Apache Tomcat
 server:
 GRAVE: org.apache.solr.common.SolrException: Error loading class 
 'Solr.ASCIIFoldingFilterFactory'
   at 
 org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
   at 
 org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
 .
 Caused by: java.lang.ClassNotFoundException: Solr.ASCIIFoldingFilterFactory
   at java.net.URLClassLoader$1.run(Unknown Source)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(Unknown Source)
   at java.lang.ClassLoader.loadClass(Unknown Source)
   ... 40 more
 Could you please help me with this?
 Thanks a lot
 Adlene
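The underlying cause is that JVM class lookup is case-sensitive, and the shorthand prefix Solr's resource loader recognizes is lowercase `solr.`, so `Solr.ASCIIFoldingFilterFactory` (capital S) falls through to a literal, nonexistent class name; writing `solr.ASCIIFoldingFilterFactory` is the likely fix. A minimal, Solr-independent sketch of the case-sensitivity point (using standard JDK class names only):

```java
public class ClassNameCase {
    // Returns true if a class with this exact fully-qualified name can be loaded.
    static boolean canLoad(String fqcn) {
        try {
            Class.forName(fqcn);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The exact name loads; the same name with different case does not,
        // which is why 'Solr.' vs 'solr.' matters.
        System.out.println(canLoad("java.lang.String"));  // true
        System.out.println(canLoad("java.lang.string"));  // false
    }
}
```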

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (SOLR-1878) RelaxQueryComponent - A new SearchComponent that relaxes the main in a semiautomatic way

2010-04-11 Thread Koji Sekiguchi (JIRA)
RelaxQueryComponent - A new SearchComponent that relaxes the main in a 
semiautomatic way


 Key: SOLR-1878
 URL: https://issues.apache.org/jira/browse/SOLR-1878
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor


I have the following use case:

Imagine that you visit a web page to search for an apartment to rent. You 
choose parameters, usually by marking check boxes, and this produces AND queries:

{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}

If the conditions are too tight, Solr may return few or zero leasehold 
properties. Because this is not good for the site visitors or the owners, the 
owner may want to recommend that visitors relax the conditions, for example:

{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}

or:

{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}

And if the relaxed query gets a higher numFound than the original, the web page 
can provide a link with a comment such as "if you can pay an additional $100, 
${numFound} properties will be found!".

Today, I would need to implement a client for this scenario, but this approach 
requires two round trips to show one page and introduces a consistency problem 
(and is laborious, of course!).

I'm thinking of a new SearchComponent that can be used with QueryComponent. It 
does a second search when the numFound of the main query is less than a 
threshold. Clients can specify via request parameters how the query can be relaxed.
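The relaxation loop such a component would run can be sketched outside Solr; everything below (the names, the in-memory "index", the step size) is invented for illustration and is not part of the proposal:

```java
import java.util.Arrays;
import java.util.List;

public class RelaxSketch {
    // Stand-in for numFound: count docs whose rent is within the cap.
    static long numFound(List<Integer> rents, int maxRent) {
        return rents.stream().filter(r -> r <= maxRent).count();
    }

    // Widen the rent cap by `step` until at least `threshold` docs match,
    // giving up after `maxSteps` relaxations.
    static int relax(List<Integer> rents, int maxRent, int step,
                     long threshold, int maxSteps) {
        int cap = maxRent;
        for (int i = 0; i < maxSteps && numFound(rents, cap) < threshold; i++) {
            cap += step;
        }
        return cap;
    }

    public static void main(String[] args) {
        List<Integer> rents = Arrays.asList(1400, 1600, 1650, 1800);
        // rent:[* TO 1500] matches only 1 doc; two relaxations reach 3 matches.
        System.out.println(relax(rents, 1500, 100, 3, 5));
    }
}
```

A real component would instead re-run the main query with rewritten range bounds and report the relaxed query plus its numFound in the response.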





[jira] Updated: (SOLR-1878) RelaxQueryComponent - A new SearchComponent that relaxes the main query in a semiautomatic way

2010-04-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1878:
-

Summary: RelaxQueryComponent - A new SearchComponent that relaxes the 
main query in a semiautomatic way  (was: RelaxQueryComponent - A new 
SearchComponent that relaxes the main in a semiautomatic way)
Description: 
I have the following use case:

Imagine that you visit a web page to search for an apartment to rent. You 
choose parameters, usually by marking check boxes, and this produces AND queries:

{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}

If the conditions are too tight, Solr may return few or zero leasehold 
properties. Because this is not good for the site visitors or the owners, the 
owner may want to recommend that visitors relax the conditions, for example:

{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}

or:

{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}

And if the relaxed query gets a higher numFound than the original, the web page 
can provide a link with a comment such as "if you can pay an additional $100, 
${numFound} properties will be found!".

Today, I would need to implement a Solr client for this scenario, but this 
approach requires two round trips to show one page and introduces a consistency 
problem (and is laborious, of course!).

I'm thinking of a new SearchComponent that can be used with QueryComponent. It 
does a second search when the numFound of the main query is less than a 
threshold. Clients can specify via request parameters how the query can be relaxed.

  was:
I have the following use case:

Imagine that you visit a web page for searching an apartment for rent. You 
choose parameters, usually mark check boxes and this makes AND queries:

{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}

If the conditions are too tight, Solr may return few or zero leasehold 
properties. Because the things is not good for the site visitors and also 
owners, the owner may want to recommend the visitors to relax the conditions 
something like:

{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}

or:

{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}

And if the relaxed query get more numFound than original, the web page can 
provide a link with a comment if you can pay additional $100, ${numFound} 
properties will be found!.

Today, I need to implement client for this scenario, but this way makes two 
round trips for showing one page and consistency problem (and laborious of 
course!).

I'm thinking a new SearchComponent that can be used with QueryComponent. It 
does search when numFound of the main query is less than a threshold. Clients 
can specify via request parameters how the query can be relaxed.


 RelaxQueryComponent - A new SearchComponent that relaxes the main query in a 
 semiautomatic way
 --

 Key: SOLR-1878
 URL: https://issues.apache.org/jira/browse/SOLR-1878
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor

 I have the following use case:
 Imagine that you visit a web page to search for an apartment to rent. You 
 choose parameters, usually by marking check boxes, and this produces AND queries:
 {code}
 rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
 {code}
 If the conditions are too tight, Solr may return few or zero leasehold 
 properties. Because this is not good for the site visitors or the owners, the 
 owner may want to recommend that visitors relax the conditions, for example:
 {code}
 rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
 {code}
 or:
 {code}
 rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
 {code}
 And if the relaxed query gets a higher numFound than the original, the web page 
 can provide a link with a comment such as "if you can pay an additional $100, 
 ${numFound} properties will be found!".
 Today, I would need to implement a Solr client for this scenario, but this 
 approach requires two round trips to show one page and introduces a consistency 
 problem (and is laborious, of course!).
 I'm thinking of a new SearchComponent that can be used with QueryComponent. It 
 does a second search when the numFound of the main query is less than a 
 threshold. Clients can specify via request parameters how the query can be relaxed.





Re: [jira] Created: (SOLR-1868) Cutover to flex APIs

2010-04-08 Thread Koji Sekiguchi

Michael McCandless (JIRA) wrote:

Cutover to flex APIs


 Key: SOLR-1868
 URL: https://issues.apache.org/jira/browse/SOLR-1868
 Project: Solr
  Issue Type: Bug
Reporter: Michael McCandless
 Fix For: 3.1


We need to fix Solr to use flex APIs!

  

Hello,

I'm a latecomer to the flex work, but I'd like to learn about it
and, if possible, contribute something. But I chicken out
when I see LUCENE-1458 and its related issues:

https://issues.apache.org/jira/browse/LUCENE/fixforversion/12314439

I think I should read FlexibleIndexing on the wiki, but I'd appreciate
it if someone could recommend pointers for flex latecomers, if any.

Thank you!

Koji

--
http://www.rondhuit.com/en/



Re: (SOLR-1868) Cutover to flex APIs

2010-04-08 Thread Koji Sekiguchi



for Solr  is to cutover to FieldsEnum, TermsEnum, DocsEnum,
DocsAndPositionsEnum instead of TermEnum, TermDocs, TermPositions.

  

Hi Mike,

These lines really help me, thanks!

Koji

--
http://www.rondhuit.com/en/



[jira] Updated: (SOLR-860) moreLikeThis Degug

2010-04-06 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-860:


Attachment: SOLR-860.patch

With the attached patch, the BooleanQueries constructed by MLT and the MLT 
helper function can be seen in the debug area.

sample request and response:

{code}
http://localhost:8983/solr/select/?q=solr+ipod&indent=on&mlt=on&mlt.fl=features&mlt.mintf=1&mlt.count=2&debugQuery=on&wt=json
{code}

{code}
"debug":{
  "moreLikeThis":{
    "IW-02":{
      "rawMLTQuery":"",
      "boostedMLTQuery":"",
      "realMLTQuery":"+() -id:IW-02"},
    "SOLR1000":{
      "rawMLTQuery":"",
      "boostedMLTQuery":"",
      "realMLTQuery":"+() -id:SOLR1000"},
    "F8V7067-APL-KIT":{
      "rawMLTQuery":"",
      "boostedMLTQuery":"",
      "realMLTQuery":"+() -id:F8V7067-APL-KIT"},
    "MA147LL/A":{
      "rawMLTQuery":"features:2 features:0 features:lcd features:x features:3",
      "boostedMLTQuery":"features:2 features:0 features:lcd features:x features:3",
      "realMLTQuery":"+(features:2 features:0 features:lcd features:x features:3) -id:MA147LL/A"}},

}
{code}


 moreLikeThis Degug
 --

 Key: SOLR-860
 URL: https://issues.apache.org/jira/browse/SOLR-860
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
 Environment: Gentoo Linux, Solr 1.4, tomcat webserver
Reporter: Jeff
Assignee: Koji Sekiguchi
 Fix For: 1.5

 Attachments: SOLR-860.patch


 moreLikeThis searchcomponent currently has no way to debug or see information 
 on the process.  This means that if moreLikeThis suggests another document 
 there is no way to actually view why it picked that to hone the searching.  
 Adding an explain would be extremely useful in determining the reasons why 
 solr is recommending the items.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-860) moreLikeThis Degug

2010-04-06 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-860:


  Component/s: (was: search)
   SearchComponents - other
 Priority: Minor  (was: Major)
Fix Version/s: (was: 1.5)
   3.1

 moreLikeThis Degug
 --

 Key: SOLR-860
 URL: https://issues.apache.org/jira/browse/SOLR-860
 Project: Solr
  Issue Type: New Feature
  Components: SearchComponents - other
Affects Versions: 1.3
 Environment: Gentoo Linux, Solr 1.4, tomcat webserver
Reporter: Jeff
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 3.1

 Attachments: SOLR-860.patch


 moreLikeThis searchcomponent currently has no way to debug or see information 
 on the process.  This means that if moreLikeThis suggests another document 
 there is no way to actually view why it picked that to hone the searching.  
 Adding an explain would be extremely useful in determining the reasons why 
 solr is recommending the items.




[jira] Assigned: (SOLR-860) moreLikeThis Degug

2010-03-29 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-860:
---

Assignee: Koji Sekiguchi

 moreLikeThis Degug
 --

 Key: SOLR-860
 URL: https://issues.apache.org/jira/browse/SOLR-860
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
 Environment: Gentoo Linux, Solr 1.4, tomcat webserver
Reporter: Jeff
Assignee: Koji Sekiguchi
 Fix For: 1.5


 moreLikeThis searchcomponent currently has no way to debug or see information 
 on the process.  This means that if moreLikeThis suggests another document 
 there is no way to actually view why it picked that to hone the searching.  
 Adding an explain would be extremely useful in determining the reasons why 
 solr is recommending the items.




[jira] Commented: (SOLR-860) moreLikeThis Degug

2010-03-29 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851072#action_12851072
 ] 

Koji Sekiguchi commented on SOLR-860:
-

At minimum, I'd like to see what the BooleanQuery constructed by MLT looks like. 
Can ResponseBuilder.addDebugInfo() be used for it?

 moreLikeThis Degug
 --

 Key: SOLR-860
 URL: https://issues.apache.org/jira/browse/SOLR-860
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
 Environment: Gentoo Linux, Solr 1.4, tomcat webserver
Reporter: Jeff
Assignee: Koji Sekiguchi
 Fix For: 1.5


 moreLikeThis searchcomponent currently has no way to debug or see information 
 on the process.  This means that if moreLikeThis suggests another document 
 there is no way to actually view why it picked that to hone the searching.  
 Adding an explain would be extremely useful in determining the reasons why 
 solr is recommending the items.




[jira] Updated: (SOLR-1703) Sorting by function problems on multicore (more than one core)

2010-03-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1703:
-

Description: 
When using sort by function (for example the dist function) on a multicore 
setup with more than one core (with only one core, i.e. the example deployment, 
the problem doesn't exist), the wrong schema may be used. I think there is a 
problem with this portion of code:

QueryParsing.java:

{code}
public static FunctionQuery parseFunction(String func, IndexSchema schema) 
throws ParseException {
SolrCore core = SolrCore.getSolrCore();
return (FunctionQuery) (QParser.getParser(func, func, new 
LocalSolrQueryRequest(core, new HashMap())).parse());
// return new FunctionQuery(parseValSource(new StrParser(func), schema));
}
{code}

The code above uses a deprecated method to get the core, sometimes getting the 
wrong core, which makes it impossible to find the right fields in the index.

  was:
When using sort by function (for example dist function) with multicore with 
more than one core (on multicore with one core, ie. the example deployment the 
problem doesn`t exist) there is a problem with not using the right schema. I 
think there is a problem with this portion of code:

QueryParsing.java:

public static FunctionQuery parseFunction(String func, IndexSchema schema) 
throws ParseException {
SolrCore core = SolrCore.getSolrCore();
return (FunctionQuery) (QParser.getParser(func, func, new 
LocalSolrQueryRequest(core, new HashMap())).parse());
// return new FunctionQuery(parseValSource(new StrParser(func), schema));
}

Code above uses deprecated method to get the core sometimes getting the wrong 
core effecting in impossibility to find the right fields in index. 


 Sorting by function problems on multicore (more than one core)
 --

 Key: SOLR-1703
 URL: https://issues.apache.org/jira/browse/SOLR-1703
 Project: Solr
  Issue Type: Bug
  Components: multicore, search
Affects Versions: 1.5
 Environment: Linux (debian, ubuntu), 64bits
Reporter: Rafał Kuć

 When using sort by function (for example the dist function) on a multicore 
 setup with more than one core (with only one core, i.e. the example deployment, 
 the problem doesn't exist), the wrong schema may be used. I think there is a 
 problem with this portion of code:
 QueryParsing.java:
 {code}
 public static FunctionQuery parseFunction(String func, IndexSchema schema) 
 throws ParseException {
 SolrCore core = SolrCore.getSolrCore();
 return (FunctionQuery) (QParser.getParser(func, func, new 
 LocalSolrQueryRequest(core, new HashMap())).parse());
 // return new FunctionQuery(parseValSource(new StrParser(func), schema));
 }
 {code}
 The code above uses a deprecated method to get the core, sometimes getting the 
 wrong core, which makes it impossible to find the right fields in the index.
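Why a static "current core" getter misbehaves with several cores can be shown without Solr. The class below is purely illustrative (none of these names exist in Solr); it mimics the global state behind the deprecated SolrCore.getSolrCore(), which returns whichever core was registered last regardless of which core the request targets:

```java
import java.util.List;

public class CoreSingleton {
    // Global mutable state shared by every core in the JVM,
    // mimicking a deprecated static getSolrCore()-style getter.
    static CoreSingleton lastRegistered;

    final String name;
    final List<String> fields;  // stand-in for the core's schema

    CoreSingleton(String name, List<String> fields) {
        this.name = name;
        this.fields = fields;
        lastRegistered = this;  // creating a second core overwrites the first
    }

    // Mimics code that resolves fields through the singleton instead of
    // using the schema the caller passed in.
    static boolean resolvableViaSingleton(String field) {
        return lastRegistered.fields.contains(field);
    }
}
```

After creating core1 (with a dist field) and then core2, resolvableViaSingleton("dist") returns false even for a request aimed at core1, matching the reported symptom; the commented-out line in the patch suggests the fix is to use the IndexSchema argument already passed to parseFunction().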




Re: intersection of the results of multiple queries

2010-03-05 Thread Koji Sekiguchi

Seffie Schwartz wrote:

hi -

Is there any way to get the intersection of the results of multiple queries 
without iterating through each result set?

seff
  

How about using multiple fq parameters?
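Each fq (filter query) parameter restricts the result set independently, so a request with several fq parameters returns the intersection of the matching sets without any client-side iteration. A hypothetical request (field names invented) could look like:

```
http://localhost:8983/solr/select?q=*:*&fq=category:books&fq=inStock:true
```

Each filter's document set is also cached separately in Solr's filterCache, so repeated intersections are cheap.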

Koji

--
http://www.rondhuit.com/en/



[jira] Updated: (SOLR-1297) Enable sorting by Function Query

2010-03-01 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1297:
-

Attachment: SOLR-1297-2.patch

When I set a *bit* complex function as the sort parameter, I got this error:

{panel}
Must declare sort field or function
org.apache.solr.common.SolrException: Must declare sort field or function
at 
org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376)
at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281)
at 
org.apache.solr.search.QueryParsingTest.testSort(QueryParsingTest.java:105)
{panel}

I've attached the fix and a test case.

 Enable sorting by Function Query
 

 Key: SOLR-1297
 URL: https://issues.apache.org/jira/browse/SOLR-1297
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1297-2.patch, SOLR-1297.patch


 It would be nice if one could sort by FunctionQuery.  See also SOLR-773, 
 where this was first mentioned by Yonik as part of the generic solution to 
 geo-search




[jira] Commented: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2010-03-01 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839879#action_12839879
 ] 

Koji Sekiguchi commented on SOLR-1268:
--

bq. When using Dismax, the fast vector highlighter fails to return any 
highlighting when there is more than one column in qf (eg. qf=Name Company)...

Right. See https://issues.apache.org/jira/browse/LUCENE-2243 .


 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, 
 SOLR-1268.patch, SOLR-1268.patch, SOLR-1268.patch







[jira] Commented: (SOLR-1773) Field Collapsing (lightweight version)

2010-02-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833527#action_12833527
 ] 

Koji Sekiguchi commented on SOLR-1773:
--

Oops, I had glanced at the SOLR-236 related issues, but I wasn't aware of its 
existence. I'll look into SOLR-1682. Thanks! :)

 Field Collapsing (lightweight version)
 --

 Key: SOLR-1773
 URL: https://issues.apache.org/jira/browse/SOLR-1773
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: LOADTEST.patch, SOLR-1773.patch


 I'd like to start another approach for field collapsing suggested by Yonik on 
 19/Dec/09 at SOLR-236. Re-posting the idea:
 {code}
 === two pass collapsing algorithm for collapse.aggregate=max ===
 First pass: pretend that collapseCount=1
   - Use a TreeSet as a priority queue since one can remove and insert entries.
   - A HashMap<Key,TreeSetEntry> will be used to map from collapse group to 
 top entry in the TreeSet
   - compare new doc with smallest element in treeset.  If smaller, discard and 
 go to the next doc.
   - If new doc is bigger, look up its group.  Use the Map to find if the 
 group has been added to the TreeSet and add it if not.
   - If the new bigger doc is already in the TreeSet, compare with the 
 document in that group.  If bigger, update the node,
 remove and re-add to the TreeSet to re-sort.
 efficiency: the treeset and hashmap are both only the size of the top number 
 of docs we are looking at (10 for instance)
 We will now have the top 10 documents collapsed by the right field with a 
 collapseCount of 1.  Put another way, we have the top 10 groups.
 Second pass (if collapseCount>1):
  - create a priority queue for each group (10) of size collapseCount
  - re-execute the query (or if the sort within the collapse groups does not 
 involve score, we could just use the docids gathered during phase 1)
  - for each document, find its appropriate priority queue and insert
  - optimization: we can use the previous info from phase1 to even avoid 
 creating a priority queue if no other items matched.
 So instead of creating collapse groups for every group in the set (as is done 
 now?), we create it for only 10 groups.
 Instead of collecting the score for every document in the set (40MB per 
 request for a 10M doc index is *big*) we re-execute the query if needed.
 We could optionally store the score as is done now... but I bet aggregate 
 throughput on large indexes would be better by just re-executing.
 Other thought: we could also cache the first phase in the query cache which 
 would allow one to quickly move to the 2nd phase for any collapseCount.
 {code}
 The restriction is:
 {quote}
 one would not be able to tell the total number of collapsed docs, or the 
 total number of hits (or the DocSet) after collapsing. So only 
 collapse.facet=before would be supported.
 {quote}
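The first pass above can be sketched as self-contained Java (the names and the Doc record are mine; a real patch would work against Lucene doc ids and scores): a TreeSet ordered by score serves as the removable priority queue, and a HashMap tracks each group's current entry.

```java
import java.util.*;

public class CollapsePass1 {
    // A candidate document: its collapse-group key and a score.
    record Doc(String group, double score) {}

    // First pass of the sketched algorithm: keep the single best doc per
    // group, retaining only the top `n` groups overall.
    static List<Doc> topGroups(List<Doc> docs, int n) {
        // TreeSet ordered by score (ties broken by group) acts as the queue.
        TreeSet<Doc> queue = new TreeSet<>(
            Comparator.comparingDouble(Doc::score).thenComparing(Doc::group));
        Map<String, Doc> byGroup = new HashMap<>();  // group -> entry in queue

        for (Doc doc : docs) {
            Doc current = byGroup.get(doc.group());
            if (current != null) {
                // Group already present: keep whichever doc scores higher.
                if (doc.score() > current.score()) {
                    queue.remove(current);
                    queue.add(doc);
                    byGroup.put(doc.group(), doc);
                }
            } else if (queue.size() < n) {
                queue.add(doc);
                byGroup.put(doc.group(), doc);
            } else if (doc.score() > queue.first().score()) {
                // Evict the lowest-scoring group head to admit the new group.
                Doc evicted = queue.pollFirst();
                byGroup.remove(evicted.group());
                queue.add(doc);
                byGroup.put(doc.group(), doc);
            }
        }
        return new ArrayList<>(queue.descendingSet());  // best group first
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(new Doc("a", 1.0), new Doc("b", 3.0),
                new Doc("a", 2.5), new Doc("c", 0.5), new Doc("d", 2.0));
        System.out.println(topGroups(docs, 2));
    }
}
```

topGroups returns the best doc per group, best first, which is the "top 10 groups" state the second pass starts from.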




[jira] Issue Comment Edited: (SOLR-1773) Field Collapsing (lightweight version)

2010-02-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833527#action_12833527
 ] 

Koji Sekiguchi edited comment on SOLR-1773 at 2/14/10 8:19 AM:
---

Oops, I've glanced at the SOLR-236 related issues, but from its Description I 
thought it was about finalizing the response format. I'll look into SOLR-1682. Thanks! :)

  was (Author: koji):
Oops, I've glanced at SOLR-236 related issues, but I wasn't awake to the 
existence. I'll look into SOLR-1682. Thanks! :)
  
 Field Collapsing (lightweight version)
 --

 Key: SOLR-1773
 URL: https://issues.apache.org/jira/browse/SOLR-1773
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: LOADTEST.patch, SOLR-1773.patch


 I'd like to start another approach for field collapsing suggested by Yonik on 
 19/Dec/09 at SOLR-236. Re-posting the idea:
 {code}
 === two pass collapsing algorithm for collapse.aggregate=max ===
 First pass: pretend that collapseCount=1
   - Use a TreeSet as a priority queue since one can remove and insert entries.
   - A HashMap<Key,TreeSetEntry> will be used to map from collapse group to 
 top entry in the TreeSet
   - compare new doc with smallest element in treeset.  If smaller, discard and 
 go to the next doc.
   - If new doc is bigger, look up its group.  Use the Map to find if the 
 group has been added to the TreeSet and add it if not.
   - If the new bigger doc is already in the TreeSet, compare with the 
 document in that group.  If bigger, update the node,
 remove and re-add to the TreeSet to re-sort.
 efficiency: the treeset and hashmap are both only the size of the top number 
 of docs we are looking at (10 for instance)
 We will now have the top 10 documents collapsed by the right field with a 
 collapseCount of 1.  Put another way, we have the top 10 groups.
 Second pass (if collapseCount>1):
  - create a priority queue for each group (10) of size collapseCount
  - re-execute the query (or if the sort within the collapse groups does not 
 involve score, we could just use the docids gathered during phase 1)
  - for each document, find its appropriate priority queue and insert
  - optimization: we can use the previous info from phase1 to even avoid 
 creating a priority queue if no other items matched.
 So instead of creating collapse groups for every group in the set (as is done 
 now?), we create it for only 10 groups.
 Instead of collecting the score for every document in the set (40MB per 
 request for a 10M doc index is *big*) we re-execute the query if needed.
 We could optionally store the score as is done now... but I bet aggregate 
 throughput on large indexes would be better by just re-executing.
 Other thought: we could also cache the first phase in the query cache which 
 would allow one to quickly move to the 2nd phase for any collapseCount.
 {code}
 The restriction is:
 {quote}
 one would not be able to tell the total number of collapsed docs, or the 
 total number of hits (or the DocSet) after collapsing. So only 
 collapse.facet=before would be supported.
 {quote}




Clover 2.6.3

2010-02-14 Thread Koji Sekiguchi
Why don't we move to Clover 2.6.3?

Index: build.xml
===================================================================
--- build.xml (revision 909743)
+++ build.xml (working copy)
@@ -429,7 +429,7 @@
     description="Instrument the Unit tests using Clover. Requires a Clover
     license and clover.jar in the ANT classpath. To use, specify
     -Drun.clover=true on the command line."/>

   <target name="clover.setup" if="clover.enabled">
-    <taskdef resource="clovertasks"/>
+    <taskdef resource="cloverlib.xml"/>
     <mkdir dir="${clover.db.dir}"/>
     <clover-setup initString="${clover.db.dir}/solr_coverage.db">
       <fileset dir="src/common"/>

Koji

-- 
http://www.rondhuit.com/en/



[jira] Updated: (SOLR-1773) Field Collapsing (lightweight version)

2010-02-13 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1773:
-

Attachment: SOLR-1773.patch

The first draft, untested patch; use for PoC only. In this patch, I hard-coded 
the sort field using a java.util.Comparator.

 Field Collapsing (lightweight version)
 --

 Key: SOLR-1773
 URL: https://issues.apache.org/jira/browse/SOLR-1773
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: SOLR-1773.patch


 I'd like to start another approach for field collapsing suggested by Yonik on 
 19/Dec/09 at SOLR-236. Re-posting the idea:
 {code}
 === two pass collapsing algorithm for collapse.aggregate=max ===
 First pass: pretend that collapseCount=1
   - Use a TreeSet as a priority queue since one can remove and insert entries.
   - A HashMap<Key,TreeSetEntry> will be used to map from collapse group to 
 top entry in the TreeSet
   - compare new doc with smallest element in treeset.  If smaller, discard and 
 go to the next doc.
   - If new doc is bigger, look up its group.  Use the Map to find if the 
 group has been added to the TreeSet and add it if not.
   - If the new bigger doc is already in the TreeSet, compare with the 
 document in that group.  If bigger, update the node,
 remove and re-add to the TreeSet to re-sort.
 efficiency: the treeset and hashmap are both only the size of the top number 
 of docs we are looking at (10 for instance)
 We will now have the top 10 documents collapsed by the right field with a 
 collapseCount of 1.  Put another way, we have the top 10 groups.
 Second pass (if collapseCount>1):
  - create a priority queue for each group (10) of size collapseCount
  - re-execute the query (or if the sort within the collapse groups does not 
 involve score, we could just use the docids gathered during phase 1)
  - for each document, find its appropriate priority queue and insert
  - optimization: we can use the previous info from phase1 to even avoid 
 creating a priority queue if no other items matched.
 So instead of creating collapse groups for every group in the set (as is done 
 now?), we create it for only 10 groups.
 Instead of collecting the score for every document in the set (40MB per 
 request for a 10M doc index is *big*) we re-execute the query if needed.
 We could optionally store the score as is done now... but I bet aggregate 
 throughput on large indexes would be better by just re-executing.
 Other thought: we could also cache the first phase in the query cache which 
 would allow one to quickly move to the 2nd phase for any collapseCount.
 {code}
 The restriction is:
 {quote}
 one would not be able to tell the total number of collapsed docs, or the 
 total number of hits (or the DocSet) after collapsing. So only 
 collapse.facet=before would be supported.
 {quote}




[jira] Commented: (SOLR-1773) Field Collapsing (lightweight version)

2010-02-13 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833495#action_12833495
 ] 

Koji Sekiguchi commented on SOLR-1773:
--

Random comment on the patch:

- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported

supported parameters:

|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse 
group|
|collapse.fl|comma- or space-delimited list of fields to return|


 Field Collapsing (lightweight version)
 --

 Key: SOLR-1773
 URL: https://issues.apache.org/jira/browse/SOLR-1773
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: SOLR-1773.patch


 I'd like to start another approach for field collapsing suggested by Yonik on 
 19/Dec/09 at SOLR-236. Re-posting the idea:
 {code}
 === two pass collapsing algorithm for collapse.aggregate=max 
 
 First pass: pretend that collapseCount=1
   - Use a TreeSet as  a priority queue since one can remove and insert 
 entries.
   - A HashMap<Key,TreeSetEntry> will be used to map from collapse group to 
 top entry in the TreeSet
   - compare new doc with smallest element in treeset.  If smaller discard and 
 go to the next doc.
   - If new doc is bigger, look up its group.  Use the Map to find if the 
 group has been added to the TreeSet and add it if not.
   - If the new bigger doc is already in the TreeSet, compare with the 
 document in that group.  If bigger, update the node,
 remove and re-add to the TreeSet to re-sort.
 efficiency: the treeset and hashmap are both only the size of the top number 
 of docs we are looking at (10 for instance)
 We will now have the top 10 documents collapsed by the right field with a 
 collapseCount of 1.  Put another way, we have the top 10 groups.
 Second pass (if collapseCount>1):
  - create a priority queue for each group (10) of size collapseCount
  - re-execute the query (or if the sort within the collapse groups does not 
 involve score, we could just use the docids gathered during phase 1)
  - for each document, find its appropriate priority queue and insert
  - optimization: we can use the previous info from phase1 to even avoid 
 creating a priority queue if no other items matched.
 So instead of creating collapse groups for every group in the set (as is done 
 now?), we create it for only 10 groups.
 Instead of collecting the score for every document in the set (40MB per 
 request for a 10M doc index is *big*) we re-execute the query if needed.
 We could optionally store the score as is done now... but I bet aggregate 
 throughput on large indexes would be better by just re-executing.
 Other thought: we could also cache the first phase in the query cache which 
 would allow one to quickly move to the 2nd phase for any collapseCount.
 {code}
 The restriction is:
 {quote}
 one would not be able to tell the total number of collapsed docs, or the 
 total number of hits (or the DocSet) after collapsing. So only 
 collapse.facet=before would be supported.
 {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1773) Field Collapsing (lightweight version)

2010-02-13 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1773:
-

Attachment: LOADTEST.patch

A very rough/simple load test patch is attached.

QTime averages over 1,000 random queries were:

||num docs in index||SOLR-236||SOLR-1773||
|1M|321 ms|185 ms|
|10M|2,914 ms (*)|1,642 ms|

(*) I needed to set -Xmx1024m in this case, though 512m for other cases, to 
avoid OOM.

SOLR-1773 is 43% faster.
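As a sanity check, the 43% figure follows from the 10M-doc row in the table above (this helper is just illustrative arithmetic, not part of any patch):

```java
// Relative improvement from the table: (2914 - 1642) / 2914 ≈ 0.436,
// i.e. SOLR-1773 answers in roughly 43% less time than SOLR-236 at 10M docs.
public class SpeedupCheck {
    public static double reduction(double beforeMs, double afterMs) {
        return (beforeMs - afterMs) / beforeMs;
    }
}
```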

 Field Collapsing (lightweight version)
 --

 Key: SOLR-1773
 URL: https://issues.apache.org/jira/browse/SOLR-1773
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: LOADTEST.patch, SOLR-1773.patch


 I'd like to start another approach for field collapsing suggested by Yonik on 
 19/Dec/09 at SOLR-236. Re-posting the idea:
 {code}
 === two pass collapsing algorithm for collapse.aggregate=max 
 
 First pass: pretend that collapseCount=1
   - Use a TreeSet as  a priority queue since one can remove and insert 
 entries.
   - A HashMap<Key,TreeSetEntry> will be used to map from collapse group to 
 top entry in the TreeSet
   - compare new doc with smallest element in treeset.  If smaller discard and 
 go to the next doc.
   - If new doc is bigger, look up its group.  Use the Map to find if the 
 group has been added to the TreeSet and add it if not.
   - If the new bigger doc is already in the TreeSet, compare with the 
 document in that group.  If bigger, update the node,
 remove and re-add to the TreeSet to re-sort.
 efficiency: the treeset and hashmap are both only the size of the top number 
 of docs we are looking at (10 for instance)
 We will now have the top 10 documents collapsed by the right field with a 
 collapseCount of 1.  Put another way, we have the top 10 groups.
 Second pass (if collapseCount>1):
  - create a priority queue for each group (10) of size collapseCount
  - re-execute the query (or if the sort within the collapse groups does not 
 involve score, we could just use the docids gathered during phase 1)
  - for each document, find its appropriate priority queue and insert
  - optimization: we can use the previous info from phase1 to even avoid 
 creating a priority queue if no other items matched.
 So instead of creating collapse groups for every group in the set (as is done 
 now?), we create it for only 10 groups.
 Instead of collecting the score for every document in the set (40MB per 
 request for a 10M doc index is *big*) we re-execute the query if needed.
 We could optionally store the score as is done now... but I bet aggregate 
 throughput on large indexes would be better by just re-executing.
 Other thought: we could also cache the first phase in the query cache which 
 would allow one to quickly move to the 2nd phase for any collapseCount.
 {code}
 The restriction is:
 {quote}
 one would not be able to tell the total number of collapsed docs, or the 
 total number of hits (or the DocSet) after collapsing. So only 
 collapse.facet=before would be supported.
 {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1773) Field Collapsing (lightweight version)

2010-02-13 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833495#action_12833495
 ] 

Koji Sekiguchi edited comment on SOLR-1773 at 2/14/10 4:51 AM:
---

Random comment on the patch:

- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet implemented, but collapse.sort can be supported to specify sort 
criteria within the collapse group

supported parameters:

|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse 
group|
|collapse.fl|comma- or space-delimited list of fields to return|


  was (Author: koji):
Random comment on the patch:

- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported

supported parameters:

|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse 
group|
|collapse.fl|comma- or space- delimited list of fields to return|

  
 Field Collapsing (lightweight version)
 --

 Key: SOLR-1773
 URL: https://issues.apache.org/jira/browse/SOLR-1773
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: LOADTEST.patch, SOLR-1773.patch


 I'd like to start another approach for field collapsing suggested by Yonik on 
 19/Dec/09 at SOLR-236. Re-posting the idea:
 {code}
 === two pass collapsing algorithm for collapse.aggregate=max 
 
 First pass: pretend that collapseCount=1
   - Use a TreeSet as  a priority queue since one can remove and insert 
 entries.
   - A HashMap<Key,TreeSetEntry> will be used to map from collapse group to 
 top entry in the TreeSet
   - compare new doc with smallest element in treeset.  If smaller discard and 
 go to the next doc.
   - If new doc is bigger, look up its group.  Use the Map to find if the 
 group has been added to the TreeSet and add it if not.
   - If the new bigger doc is already in the TreeSet, compare with the 
 document in that group.  If bigger, update the node,
 remove and re-add to the TreeSet to re-sort.
 efficiency: the treeset and hashmap are both only the size of the top number 
 of docs we are looking at (10 for instance)
 We will now have the top 10 documents collapsed by the right field with a 
 collapseCount of 1.  Put another way, we have the top 10 groups.
 Second pass (if collapseCount>1):
  - create a priority queue for each group (10) of size collapseCount
  - re-execute the query (or if the sort within the collapse groups does not 
 involve score, we could just use the docids gathered during phase 1)
  - for each document, find its appropriate priority queue and insert
  - optimization: we can use the previous info from phase1 to even avoid 
 creating a priority queue if no other items matched.
 So instead of creating collapse groups for every group in the set (as is done 
 now?), we create it for only 10 groups.
 Instead of collecting the score for every document in the set (40MB per 
 request for a 10M doc index is *big*) we re-execute the query if needed.
 We could optionally store the score as is done now... but I bet aggregate 
 throughput on large indexes would be better by just re-executing.
 Other thought: we could also cache the first phase in the query cache which 
 would allow one to quickly move to the 2nd phase for any collapseCount.
 {code}
 The restriction is:
 {quote}
 one would not be able to tell the total number of collapsed docs, or the 
 total number of hits (or the DocSet) after collapsing. So only 
 collapse.facet=before would be supported.
 {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1773) Field Collapsing (lightweight version)

2010-02-13 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833495#action_12833495
 ] 

Koji Sekiguchi edited comment on SOLR-1773 at 2/14/10 4:54 AM:
---

Random comment on the patch:

- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet implemented, but collapse.sort can be supported to specify sort 
criteria within the collapse group

supported parameters:

|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse 
group. default is 0.|
|collapse.fl|comma- or space-delimited list of fields to return. multiValued 
fields and TrieFields are not supported yet|


  was (Author: koji):
Random comment on the patch:

- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported to specify sort criteria in 
collapse group

supported parameters:

|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse 
group|
|collapse.fl|comma- or space- delimited list of fields to return|

  
 Field Collapsing (lightweight version)
 --

 Key: SOLR-1773
 URL: https://issues.apache.org/jira/browse/SOLR-1773
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
 Attachments: LOADTEST.patch, SOLR-1773.patch


 I'd like to start another approach for field collapsing suggested by Yonik on 
 19/Dec/09 at SOLR-236. Re-posting the idea:
 {code}
 === two pass collapsing algorithm for collapse.aggregate=max 
 
 First pass: pretend that collapseCount=1
   - Use a TreeSet as  a priority queue since one can remove and insert 
 entries.
   - A HashMap<Key,TreeSetEntry> will be used to map from collapse group to 
 top entry in the TreeSet
   - compare new doc with smallest element in treeset.  If smaller discard and 
 go to the next doc.
   - If new doc is bigger, look up its group.  Use the Map to find if the 
 group has been added to the TreeSet and add it if not.
   - If the new bigger doc is already in the TreeSet, compare with the 
 document in that group.  If bigger, update the node,
 remove and re-add to the TreeSet to re-sort.
 efficiency: the treeset and hashmap are both only the size of the top number 
 of docs we are looking at (10 for instance)
 We will now have the top 10 documents collapsed by the right field with a 
 collapseCount of 1.  Put another way, we have the top 10 groups.
 Second pass (if collapseCount>1):
  - create a priority queue for each group (10) of size collapseCount
  - re-execute the query (or if the sort within the collapse groups does not 
 involve score, we could just use the docids gathered during phase 1)
  - for each document, find its appropriate priority queue and insert
  - optimization: we can use the previous info from phase1 to even avoid 
 creating a priority queue if no other items matched.
 So instead of creating collapse groups for every group in the set (as is done 
 now?), we create it for only 10 groups.
 Instead of collecting the score for every document in the set (40MB per 
 request for a 10M doc index is *big*) we re-execute the query if needed.
 We could optionally store the score as is done now... but I bet aggregate 
 throughput on large indexes would be better by just re-executing.
 Other thought: we could also cache the first phase in the query cache which 
 would allow one to quickly move to the 2nd phase for any collapseCount.
 {code}
 The restriction is:
 {quote}
 one would not be able to tell the total number of collapsed docs, or the 
 total number of hits (or the DocSet) after collapsing. So only 
 collapse.facet=before would be supported.
 {quote}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2010-02-05 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1268:
-

Attachment: SOLR-1268.patch

The patch includes:

# eliminate hl.useHighlighter parameter
# introduce hl.useFastVectorHighlighter parameter. The default is false

Therefore, Highlighter will be used unless hl.useFastVectorHighlighter is set 
to true. I'll commit in a few days.

 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, 
 SOLR-1268.patch, SOLR-1268.patch, SOLR-1268.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2010-02-04 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829522#action_12829522
 ] 

Koji Sekiguchi commented on SOLR-236:
-

The following snippet in CollapseComponent.doProcess():

{code}
DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
  collapseResult == null ? rb.getFilters() : null,
  collapseResult.getCollapsedDocset(),
  rb.getSortSpec().getSort(),
  rb.getSortSpec().getOffset(),
  rb.getSortSpec().getCount(),
  rb.getFieldFlags());
{code}

The 2nd line implies that collapseResult may be null, yet the 3rd line 
dereferences it unconditionally. If it is null, won't we get an NPE at the 3rd line?
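If collapseResult really can be null there, the docset argument would need the same null guard the snippet already applies to the filters argument. A standalone sketch of that pattern (CollapseResultStub is an invented stand-in for the Solr type; this is not the actual fix):

```java
import java.util.Set;

// Invented stand-in for the collapse result; it exists only so the
// null-guard pattern below is runnable on its own.
class CollapseResultStub {
    Set<Integer> getCollapsedDocset() { return Set.of(1, 2, 3); }
}

public class NullGuardSketch {
    // Dereference collapseResult only behind a null check, mirroring the
    // ternary already used for the filters argument in the snippet above.
    public static Set<Integer> docsetFor(CollapseResultStub collapseResult) {
        return collapseResult == null ? null : collapseResult.getCollapsedDocset();
    }
}
```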

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, 
 SOLR-236_collapsing.patch, SOLR-236_collapsing.patch


 This patch includes a new feature called "Field collapsing".
 It is used to collapse a group of results with a similar value for a given 
 field to a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also Duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1753) StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search

2010-02-04 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1753:
-

Affects Version/s: (was: 1.5)
Fix Version/s: 1.5

 StatsComponent throws java.lang.NullPointerException when getting statistics 
 for facets in distributed search
 -

 Key: SOLR-1753
 URL: https://issues.apache.org/jira/browse/SOLR-1753
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Windows
Reporter: Janne Majaranta
Assignee: Koji Sekiguchi
 Fix For: 1.5

 Attachments: SOLR-1753.patch


 When using the StatsComponent with a sharded request and getting statistics 
 over facets, a NullPointerException is thrown.
 Stacktrace:
 java.lang.NullPointerException at 
 org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:54) 
 at 
 org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:82) 
 at 
 org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:116)
  at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
  at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) 
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) 
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) 
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) 
 at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at 
 java.lang.Thread.run(Unknown Source) 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1753) StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search

2010-02-04 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829914#action_12829914
 ] 

Koji Sekiguchi commented on SOLR-1753:
--

Patch looks good! Will commit shortly.

 StatsComponent throws java.lang.NullPointerException when getting statistics 
 for facets in distributed search
 -

 Key: SOLR-1753
 URL: https://issues.apache.org/jira/browse/SOLR-1753
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Windows
Reporter: Janne Majaranta
Assignee: Koji Sekiguchi
 Fix For: 1.5

 Attachments: SOLR-1753.patch


 When using the StatsComponent with a sharded request and getting statistics 
 over facets, a NullPointerException is thrown.
 Stacktrace:
 java.lang.NullPointerException at 
 org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:54) 
 at 
 org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:82) 
 at 
 org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:116)
  at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
  at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) 
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) 
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) 
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) 
 at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at 
 java.lang.Thread.run(Unknown Source) 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1753) StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search

2010-02-04 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1753.
--

Resolution: Fixed

Committed revision 906781. Thanks Janne!

 StatsComponent throws java.lang.NullPointerException when getting statistics 
 for facets in distributed search
 -

 Key: SOLR-1753
 URL: https://issues.apache.org/jira/browse/SOLR-1753
 Project: Solr
  Issue Type: Bug
Affects Versions: 1.4
 Environment: Windows
Reporter: Janne Majaranta
Assignee: Koji Sekiguchi
 Fix For: 1.5

 Attachments: SOLR-1753.patch


 When using the StatsComponent with a sharded request and getting statistics 
 over facets, a NullPointerException is thrown.
 Stacktrace:
 java.lang.NullPointerException at 
 org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:54) 
 at 
 org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:82) 
 at 
 org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:116)
  at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
  at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) 
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) 
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) 
 at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) 
 at 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
  at 
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at 
 java.lang.Thread.run(Unknown Source) 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2010-02-04 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1268:
-

Attachment: SOLR-1268-0_fragsize.patch

Hmm, FVH doesn't work appropriately when fragsize=Integer.MAX_SIZE (see 
test0FragSize() in the attached patch; it indicates that FVH cannot produce the 
whole snippet when fragsize=Integer.MAX_SIZE).

Now I think the (traditional) Highlighter should be the default even if 
the highlighting field's termVectors/termPositions/termOffsets are all true; 
FVH will be used only when hl.useFastVectorHighlighter is set to true. The 
hl.useFastVectorHighlighter parameter accepts per-field overrides. Plus, FVH 
doesn't support a fragsize of 0.

 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, 
 SOLR-1268.patch, SOLR-1268.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2010-02-01 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828039#action_12828039
 ] 

Koji Sekiguchi commented on SOLR-236:
-

A random comment: don't we need to check that collapse.field is indexed in 
checkCollapseField()?

{code}
protected void checkCollapseField(IndexSchema schema) {
  SchemaField schemaField = schema.getFieldOrNull(collapseField);
  if (schemaField == null) {
    throw new RuntimeException("Could not collapse, because collapse field does not exist in the schema.");
  }

  if (schemaField.multiValued()) {
    throw new RuntimeException("Could not collapse, because collapse field is multivalued");
  }

  if (schemaField.getType().isTokenized()) {
    throw new RuntimeException("Could not collapse, because collapse field is tokenized");
  }
}
{code}

When I accidentally specified an unindexed field for collapse.field, I got an 
unexpected result without any errors.
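The missing validation could look like the following sketch. SchemaFieldStub is an invented stand-in for Solr's SchemaField (assuming an indexed() accessor analogous to the accessors already used in the snippet) so the logic is runnable on its own; it is not the actual patch:

```java
public class CollapseFieldCheckSketch {
    // Invented stand-in exposing just the properties the check reads.
    public static class SchemaFieldStub {
        public final boolean indexed, multiValued, tokenized;
        public SchemaFieldStub(boolean indexed, boolean multiValued, boolean tokenized) {
            this.indexed = indexed;
            this.multiValued = multiValued;
            this.tokenized = tokenized;
        }
    }

    // Mirrors checkCollapseField() with the proposed "indexed" check added.
    public static void checkCollapseField(SchemaFieldStub field) {
        if (field == null) {
            throw new RuntimeException("Could not collapse, because collapse field does not exist in the schema.");
        }
        if (!field.indexed) {  // the additional check suggested above
            throw new RuntimeException("Could not collapse, because collapse field is not indexed.");
        }
        if (field.multiValued) {
            throw new RuntimeException("Could not collapse, because collapse field is multivalued");
        }
        if (field.tokenized) {
            throw new RuntimeException("Could not collapse, because collapse field is tokenized");
        }
    }
}
```

With the extra guard, an unindexed collapse.field fails fast instead of silently producing an unexpected result.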

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing,
 used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "duplicate detection".
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type: normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: configure FastVectorHihglighter in trunk

2010-01-29 Thread Koji Sekiguchi

Marc Sturlese wrote:

I think it fails when using defType dismax with more than one field.
It doesn't work in the default Solr example either. I have added the default
.xml files with docs; using the standard requestHandler it works, but it doesn't
when using the dismax requestHandler.
  

Ah, I see. FVH doesn't support DisjunctionMaxQuery. I should have been
aware of it when you indicated that you used dismax in the
previous mail. Sorry about that.
I'll open an issue in Lucene and try to write a patch.

Thank you,

Koji

--
http://www.rondhuit.com/en/



Re: configure FastVectorHihglighter in trunk

2010-01-29 Thread Koji Sekiguchi

Koji Sekiguchi wrote:

Marc Sturlese wrote:

I think it fails when using defType dismax with more than one field.
It doesn't work in the default Solr example either. I have added the 
default
.xml files with docs; using the standard requestHandler it works, but it 
doesn't
when using the dismax requestHandler.

  

Ah, I see. FVH doesn't support DisjunctionMaxQuery. I should have been
aware of it when you indicated that you used dismax in the
previous mail. Sorry about that.
I'll open an issue in Lucene and try to write a patch.

Thank you,

Koji


Opened:
https://issues.apache.org/jira/browse/LUCENE-2243

Koji

--
http://www.rondhuit.com/en/



[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2010-01-29 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1268:
-

Attachment: SOLR-1268-0_fragsize.patch

{quote}
I have noticed an exception is thrown when using fragSize = 0 (which should 
return the whole field highlighted):
fragCharSize(0) is too small. It must be 18 or higher. 
java.lang.IllegalArgumentException: fragCharSize(0) is too small. It must be 18 
or higher
{quote}

Thanks, Marc.
Solr 1.4 uses NullFragmenter, which highlights the whole content when you set 
fragsize to 0, but FVH doesn't have such a feature because it uses a different 
algorithm.
In the attached patch, Solr sets fragsize to Integer.MAX_VALUE if the user tries 
to set 0 when FVH is used. This prevents the runtime error.
I think this is necessary at the Solr level because Solr automatically switches 
to FVH when the highlighting field has termVectors/termPositions/termOffsets 
all set to true, unless hl.useHighlighter is set to true.
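The fragsize normalization described in this comment can be sketched in a few lines. This is an illustrative stand-alone method, not the actual patch code, and the method name is made up:

```java
public class FragSizeClamp {
    // If FVH is in use and the user requests fragsize=0 (which means
    // "highlight the whole field" with the classic highlighter's
    // NullFragmenter), substitute Integer.MAX_VALUE so that FVH's
    // "fragCharSize is too small" check never fires.
    static int effectiveFragSize(int requestedFragSize, boolean usingFvh) {
        if (usingFvh && requestedFragSize == 0) {
            return Integer.MAX_VALUE;
        }
        return requestedFragSize;
    }

    public static void main(String[] args) {
        System.out.println(effectiveFragSize(0, true));   // 2147483647
        System.out.println(effectiveFragSize(100, true)); // 100
        System.out.println(effectiveFragSize(0, false));  // 0
    }
}
```

With this clamp, a fragsize=0 request degrades to "one very large fragment" under FVH instead of an IllegalArgumentException.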

 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268.patch, 
 SOLR-1268.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: configure FastVectorHihglighter in trunk

2010-01-28 Thread Koji Sekiguchi

Marc Sturlese wrote:

Can you give me the following info to reproduce the problem?

* field data


all fields are plain english text analyzed with the same analyzer

  

I meant I'd like to know your concrete data...

Koji

--
http://www.rondhuit.com/en/





Re: configure FastVectorHihglighter in trunk

2010-01-27 Thread Koji Sekiguchi

Can you give me the following info to reproduce the problem?

* field data
* query string
* field definition in schema.xml

 **I also have noticed that setting the snippet fragment size to 0 (which in 
normal
 highlight returns the whole field highlighted) gives an error.

Hmm, I should check it. Can you open a JIRA issue?

Thank you,

Koji

--
http://www.rondhuit.com/en/


Marc Sturlese wrote:

I am having some trouble making it work. I am debugging the code and I see that
when the FastVectorHighlighter constructor is called, the parameters that
it receives are OK:

// get FastVectorHighlighter instance out of the processing loop
FastVectorHighlighter fvh = new FastVectorHighlighter(
// FVH cannot process hl.usePhraseHighlighter parameter per-field
basis
params.getBool( HighlightParams.USE_PHRASE_HIGHLIGHTER, true ),
// FVH cannot process hl.requireFieldMatch parameter per-field basis
params.getBool( HighlightParams.FIELD_MATCH, false ),
getFragListBuilder( params ),
getFragmentsBuilder( params ) );

The query here is ok aswell:
FieldQuery fieldQuery = fvh.getFieldQuery( query );

But I can't see what's in fieldQuery (just a memory address, and I don't know
how to do something similar to toString()).
The problem I see is in:

String[] snippets = highlighter.getBestFragments( fieldQuery,
req.getSearcher().getReader(), docId, fieldName,
params.getFieldInt( fieldName, HighlightParams.FRAGSIZE, 100
),
params.getFieldInt( fieldName, HighlightParams.SNIPPETS, 1 )
);

snippets ends up with an empty array so it jumps to:
alternateField( docSummaries, params, doc, fieldName );

In solrconfig.xml I added:
   <fragListBuilder name="simple"
 class="org.apache.solr.highlight.SimpleFragListBuilder" default="false"/>
   <fragmentsBuilder name="colored"
 class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder"
 default="false"/>

Maybe I am missing something... any idea?
Using doHighlightingByHighlighter, highlighting works perfectly.

**I also have noticed that setting the snippet fragment size to 0 (which in normal
highlight returns the whole field highlighted) gives an error.



Koji Sekiguchi-2 wrote:
  

Marc Sturlese wrote:


How do I activate FastVectorHighlighter in trunk? Which of those params sets
it up?
   <!-- Configure the standard fragListBuilder -->
   <fragListBuilder name="simple"
 class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>

   <!-- Configure the standard fragmentsBuilder -->
   <fragmentsBuilder name="colored"
 class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder"
 default="true"/>

   <fragmentsBuilder name="scoreOrder"
 class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder"
 default="true"/>

Thanks in advance.
  
  

You do not need to activate it. DefaultSolrHighlighter, which is the
default SolrHighlighter impl, automatically uses FVH when you
specify, through the hl.fl parameter, field names whose termVectors,
termPositions and termOffsets are all true. If you want to use the multi
colored tag feature, you need to specify MultiColored*FragmentsBuilder in 
solrconfig.xml.


Koji

--
http://www.rondhuit.com/en/






  





Re: configure FastVectorHihglighter in trunk

2010-01-26 Thread Koji Sekiguchi

Marc Sturlese wrote:

How do I activate FastVectorHighlighter in trunk? Which of those params sets
it up?
   <!-- Configure the standard fragListBuilder -->
   <fragListBuilder name="simple"
 class="org.apache.solr.highlight.SimpleFragListBuilder" default="true"/>

   <!-- Configure the standard fragmentsBuilder -->
   <fragmentsBuilder name="colored"
 class="org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder"
 default="true"/>

   <fragmentsBuilder name="scoreOrder"
 class="org.apache.solr.highlight.ScoreOrderFragmentsBuilder"
 default="true"/>

Thanks in advance.
  

You do not need to activate it. DefaultSolrHighlighter, which is the
default SolrHighlighter impl, automatically uses FVH when you
specify, through the hl.fl parameter, field names whose termVectors,
termPositions and termOffsets are all true. If you want to use the multi
colored tag feature, you need to specify MultiColored*FragmentsBuilder in 
solrconfig.xml.
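For reference, a field definition that would make DefaultSolrHighlighter pick FVH automatically might look like the following (the field name and type are illustrative, not taken from the thread):

```xml
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Then request it with hl=on&hl.fl=content and FVH is used for that field.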


Koji

--
http://www.rondhuit.com/en/



Re: how to sort facets?

2010-01-26 Thread Koji Sekiguchi

David Rühr wrote:

hi,

we built a filter using the faceting feature. In our facet list, the order 
is by count of the matches:

facet.sort=count

but we need to sort by facet.sort=manufacturer.
URL manipulation doesn't change anything; why?

select?fl=*%2Cscore&fq=type%3Apage&spellcheck=true&facet=true&facet.mincount=1&facet.sort=manufacturer&bf=log(supplier_faktor)&facet.field=supplier&facet.field=manufacturer&version=1.2&q=kind&start=0&rows=10 


so long,
David


Try facet.sort=index. facet.sort accepts only count or index.

http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort

Koji

--
http://www.rondhuit.com/en/



[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting

2010-01-23 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804087#action_12804087
 ] 

Koji Sekiguchi commented on SOLR-1731:
--

So why don't you use uni-grams on both the index and query sides for the sku field?

{code}
<fieldType name="text_1g" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory"
        mapping="mapping.txt"/>
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1"
        maxGramSize="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.NGramTokenizerFactory" minGramSize="1"
        maxGramSize="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}

{quote}
As far as my application cares, those are all equivalent and should just be 
indexed as:

a1280c
{quote}

To eliminate space/period/hyphen, mapping.txt would look like:

{code}
" " => ""
"." => ""
"-" => ""
{code}
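As a rough illustration in plain Java (not Solr's actual analyzer chain), the combined effect of the mapping step plus 1-gram tokenization on the problem SKUs would be:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SkuGrams {
    // Approximate the analyzer above: drop space/period/hyphen (the
    // MappingCharFilter step), lowercase, then emit 1-grams.
    static List<String> uniGrams(String raw) {
        String mapped = raw.replaceAll("[ .\\-]", "").toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (char c : mapped.toCharArray()) {
            grams.add(String.valueOf(c));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "A 1280 C", "A-1280.C" and "a1280c" all analyze identically
        System.out.println(uniGrams("A 1280 C"));
        System.out.println(uniGrams("A-1280.C"));
        System.out.println(uniGrams("a1280c"));
    }
}
```

Since the same chain runs at index and query time, all three spellings match the same terms.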



 ArrayIndexOutOfBoundsException when highlighting
 

 Key: SOLR-1731
 URL: https://issues.apache.org/jira/browse/SOLR-1731
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.4
Reporter: Tim Underwood
Priority: Minor

 I'm seeing an java.lang.ArrayIndexOutOfBoundsException when trying to 
 highlight for certain queries.  The error seems to be an issue with the 
 combination of the ShingleFilterFactory, PositionFilterFactory and the 
 LengthFilterFactory. 
 Here's my fieldType definition:
 <fieldType name="textSku" class="solr.TextField" positionIncrementGap="100" 
  omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" 
  generateNumberParts="0" catenateWords="0" catenateNumbers="0" 
  catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="8" 
  outputUnigrams="true"/>
     <filter class="solr.PositionFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" 
  generateNumberParts="0" catenateWords="0" catenateNumbers="0" 
  catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/> <!-- works 
  if this is commented out -->
   </analyzer>
 </fieldType>
 Here's the field definition:
 <field name="sku_new" type="textSku" indexed="true" stored="true" 
  omitNorms="true"/>
 Here's a sample doc:
 <add>
 <doc>
   <field name="id">1</field>
   <field name="sku_new">A 1280 C</field>
 </doc>
 </add>
 Doing a query for sku_new:"A 1280 C" and requesting highlighting throws the 
 exception (full stack trace below):  
 http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=*
 If I comment out the LengthFilterFactory from my query analyzer section 
 everything seems to work.  Commenting out just the PositionFilterFactory also 
 makes the exception go away and seems to work for this specific query.
 Full stack trace:
 java.lang.ArrayIndexOutOfBoundsException: -1
 at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202)
 at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
 at 
 org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
 at 
 org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
 at 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
 at 
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
 at 
 org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216

[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting

2010-01-22 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803976#action_12803976
 ] 

Koji Sekiguchi commented on SOLR-1731:
--

Can't you use WhitespaceTokenizer for index? 

 ArrayIndexOutOfBoundsException when highlighting
 

 Key: SOLR-1731
 URL: https://issues.apache.org/jira/browse/SOLR-1731
 Project: Solr
  Issue Type: Bug
  Components: highlighter
Affects Versions: 1.4
Reporter: Tim Underwood
Priority: Minor

 I'm seeing an java.lang.ArrayIndexOutOfBoundsException when trying to 
 highlight for certain queries.  The error seems to be an issue with the 
 combination of the ShingleFilterFactory, PositionFilterFactory and the 
 LengthFilterFactory. 
 Here's my fieldType definition:
 <fieldType name="textSku" class="solr.TextField" positionIncrementGap="100" 
  omitNorms="true">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" 
  generateNumberParts="0" catenateWords="0" catenateNumbers="0" 
  catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ShingleFilterFactory" maxShingleSize="8" 
  outputUnigrams="true"/>
     <filter class="solr.PositionFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" 
  generateNumberParts="0" catenateWords="0" catenateNumbers="0" 
  catenateAll="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
     <filter class="solr.LengthFilterFactory" min="2" max="100"/> <!-- works 
  if this is commented out -->
   </analyzer>
 </fieldType>
 Here's the field definition:
 <field name="sku_new" type="textSku" indexed="true" stored="true" 
  omitNorms="true"/>
 Here's a sample doc:
 <add>
 <doc>
   <field name="id">1</field>
   <field name="sku_new">A 1280 C</field>
 </doc>
 </add>
 Doing a query for sku_new:"A 1280 C" and requesting highlighting throws the 
 exception (full stack trace below):  
 http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=*
 If I comment out the LengthFilterFactory from my query analyzer section 
 everything seems to work.  Commenting out just the PositionFilterFactory also 
 makes the exception go away and seems to work for this specific query.
 Full stack trace:
 java.lang.ArrayIndexOutOfBoundsException: -1
 at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202)
 at 
 org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
 at 
 org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
 at 
 org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
 at 
 org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
 at 
 org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
 at 
 org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
 at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
 at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
 at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
 at org.mortbay.jetty.Server.handle(Server.java:285)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
 at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821

Upgrading Lucene jars

2010-01-17 Thread Koji Sekiguchi
I'd like to upgrade all Lucene jars to the latest 2.9 branch (r900222).
If there are no objections, I'll commit tomorrow.
Now I'm testing the Lucene 2.9 branch and Solr trunk with the latest 2.9.

Thank you,

Koji

-- 
http://www.rondhuit.com/en/



Re: Build failed in Hudson: Solr-trunk #1027

2010-01-10 Thread Koji Sekiguchi

http://hudson.zones.apache.org/hudson/job/Solr-trunk/1027/testReport/org.apache.solr.request/TestWriterPerf/testPerf/

The cause of this failure is that the undefined field t1 is set in hl.fl in the 
test code.

Before FastVectorHighlighter was committed, undefined fields seem to have been
ignored. I think I should ignore them in FVH, too. I'm looking into it...

Koji

--
http://www.rondhuit.com/en/



Apache Hudson Server wrote:

See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1027/

--
[...truncated 2343 lines...]
[junit] Running 
org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.49 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.574 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.977 sec
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.506 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.618 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 17.669 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 33.972 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 39.944 sec
[junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.917 sec
[junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.375 sec
[junit] Running 
org.apache.solr.client.solrj.response.AnlysisResponseBaseTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.43 sec
[junit] Running 
org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.488 sec
[junit] Running 
org.apache.solr.client.solrj.response.FieldAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.507 sec
[junit] Running org.apache.solr.client.solrj.response.QueryResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.768 sec
[junit] Running org.apache.solr.client.solrj.response.TermsResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.705 sec
[junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 13.645 sec
[junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.408 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.549 sec
[junit] Running org.apache.solr.common.params.ModifiableSolrParamsTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.48 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.431 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.682 sec
[junit] Running org.apache.solr.common.util.DOMUtilTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.565 sec
[junit] Running org.apache.solr.common.util.FileUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.433 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.446 sec
[junit] Running org.apache.solr.common.util.NamedListTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.396 sec
[junit] Running org.apache.solr.common.util.TestFastInputStream
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.547 sec
[junit] Running org.apache.solr.common.util.TestHash
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.698 sec
[junit] Running org.apache.solr.common.util.TestNamedListCodec
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.891 sec
[junit] Running org.apache.solr.common.util.TestXMLEscaping
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.381 sec
[junit] Running org.apache.solr.core.AlternateDirectoryTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.552 sec
[junit] 

Re: Build failed in Hudson: Solr-trunk #1027

2010-01-10 Thread Koji Sekiguchi

Koji Sekiguchi wrote:
http://hudson.zones.apache.org/hudson/job/Solr-trunk/1027/testReport/org.apache.solr.request/TestWriterPerf/testPerf/ 



The cause of this failure is that the undefined field t1 is set in hl.fl in the 
test code.

Before FastVectorHighlighter was committed, undefined fields seem to have 
been ignored. I think I should ignore them in FVH, too. I'm looking into 
it...


Koji


Committed revision 897611.

Koji

--
http://www.rondhuit.com/en/



[jira] Updated: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent

2010-01-09 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1696:
-

Attachment: SOLR-1696.patch

A new patch attached. Just synced with trunk, plus a warning log when deprecated 
syntax is found (the idea Chris mentioned above).

 Deprecate old highlighting syntax and move configuration to 
 HighlightComponent
 

 Key: SOLR-1696
 URL: https://issues.apache.org/jira/browse/SOLR-1696
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1696.patch, SOLR-1696.patch


 There is no reason why we should have a custom syntax for highlighter 
 configuration.
 It can be treated like any other SearchComponent and all the configuration 
 can go in there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter

2010-01-08 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12798271#action_12798271
 ] 

Koji Sekiguchi commented on SOLR-1653:
--

Thanks, Paul! I've just committed revision 897357.

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch, SOLR-1653.patch


 Add a new CharFilter that uses a regular expression for the target of replace 
 string in char stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" 
  positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
         groupedPattern="([nN][oO]\.)\s*(\d+)"
         replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory" 
         mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}
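A plain java.util.regex sketch of roughly what groupedPattern/replaceGroups does here: each match of the pattern is rewritten as group 1 followed by group 2, dropping the whitespace between them. This ignores the blockDelimiters behavior and the method name is made up; it is only an approximation of the char filter:

```java
public class PatternReplaceSketch {
    // Rewrites each match of ([nN][oO]\.)\s*(\d+) as group1 + group2,
    // e.g. "no. 123" -> "no.123", mimicking replaceGroups="1,2".
    static String collapseNo(String input) {
        return input.replaceAll("([nN][oO]\\.)\\s*(\\d+)", "$1$2");
    }

    public static void main(String[] args) {
        System.out.println(collapseNo("see No. 123 and no.456"));
        // prints: see No.123 and no.456
    }
}
```

The char filter applies this kind of rewrite to the character stream before tokenization, so the tokenizer sees "no.123" as one unit.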

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2010-01-08 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1268.
--

Resolution: Fixed

Committed revision 897383.

 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1268.patch, SOLR-1268.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent

2010-01-08 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12798312#action_12798312
 ] 

Koji Sekiguchi commented on SOLR-1696:
--

I've just committed SOLR-1268. Now I'm trying to contribute a patch for this to 
sync with trunk...

 Deprecate old highlighting syntax and move configuration to 
 HighlightComponent
 

 Key: SOLR-1696
 URL: https://issues.apache.org/jira/browse/SOLR-1696
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1696.patch


 There is no reason why we should have a custom syntax for highlighter 
 configuration.
 It can be treated like any other SearchComponent and all the configuration 
 can go in there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent

2010-01-07 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797841#action_12797841
 ] 

Koji Sekiguchi commented on SOLR-1696:
--

Noble, thank you for opening this and attaching the patch! Are you planning to 
commit this shortly? I ask because I'm ready to commit SOLR-1268, which uses the 
old-style config. If you commit this first, I'll rewrite SOLR-1268. Or I can 
assign SOLR-1696 to myself.

 Deprecate old highlighting syntax and move configuration to 
 HighlightComponent
 

 Key: SOLR-1696
 URL: https://issues.apache.org/jira/browse/SOLR-1696
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1696.patch


 There is no reason why we should have a custom syntax for highlighter 
 configuration.
 It can be treated like any other SearchComponent and all the configuration 
 can go in there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2010-01-04 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796147#action_12796147
 ] 

Koji Sekiguchi commented on SOLR-1268:
--

I'll commit in a few days if nobody objects.

 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1268.patch, SOLR-1268.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2010-01-03 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796075#action_12796075
 ] 

Koji Sekiguchi commented on SOLR-1268:
--

In this patch I'm introducing <fragListBuilder/> and <fragmentsBuilder/> as new 
sub tags of <highlighting/> in solrconfig.xml, rather than 
<searchComponent/>. I think we can open a separate ticket for moving the 
<highlighting/> settings to <searchComponent/>, if needed.

FYI:
http://old.nabble.com/highlighting-setting-in-solrconfig.xml-td26984003.html

 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1268.patch, SOLR-1268.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2010-01-02 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1268:
-

Attachment: SOLR-1268.patch

First draft, untested patch attached.

 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1268.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1670) synonymfilter/map repeat bug

2009-12-19 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792920#action_12792920
 ] 

Koji Sekiguchi commented on SOLR-1670:
--

bq. the test for 'repeats' has a flaw, it uses this assertTokEqual construct 
which does not really validate that two lists of tokens are equal, it just 
stops at the shorter one.

I agree with you regarding this part. But I'm not sure the following 
size() should be 1 in your patch:

{code}
+assertEquals(1, getTokList(map,"a b",false).size());
{code}

If 'repeats' implies repeating the same term intentionally, I think it can 
be used to boost tf.
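Setting Lucene aside, the tf effect can be illustrated with a toy count (the helper name below is mine, not Lucene or Solr code): if the map emits "ab" three times for the input "a b", a naive term-frequency count sees tf(ab) = 3.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RepeatTfSketch {
    // Naive term-frequency count over a token list.
    static Map<String, Integer> termFreq(List<String> tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        // If the synonym map emits "ab" once per registered (duplicate) rule,
        // the repeated token inflates tf for "ab".
        System.out.println(termFreq(Arrays.asList("ab", "ab", "ab"))); // {ab=3}
    }
}
```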

 synonymfilter/map repeat bug
 

 Key: SOLR-1670
 URL: https://issues.apache.org/jira/browse/SOLR-1670
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir
 Attachments: SOLR-1670_test.patch


 As part of converting tests for SOLR-1657, I ran into a problem with 
 SynonymFilter.
 The test for 'repeats' has a flaw: it uses the assertTokEqual construct, 
 which does not really validate that two lists of tokens are equal; it just 
 stops at the shorter one.
 {code}
 // repeats
 map.add(strings("a b"), tokens("ab"), orig, merge);
 map.add(strings("a b"), tokens("ab"), orig, merge);
 assertTokEqual(getTokList(map,"a b",false), tokens("ab"));
 /* in reality the result from getTokList is "ab ab ab"! */
 {code}
 When converted to assertTokenStreamContents, this problem surfaced. Attached 
 is an additional assertion to the existing test case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1670) synonymfilter/map repeat bug

2009-12-19 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12792928#action_12792928
 ] 

Koji Sekiguchi commented on SOLR-1670:
--

Robert, sorry, I meant that I agree with you that the test for 'repeats' has a 
flaw. The TF boost was just an observation; I don't know whether it is an 
intentional feature or a side effect.

Why don't you fix the flaws in the SynonymFilter test in this ticket first, 
then fix SOLR-1674? (I've not looked into SOLR-1674 yet.)

 synonymfilter/map repeat bug
 

 Key: SOLR-1670
 URL: https://issues.apache.org/jira/browse/SOLR-1670
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir
 Attachments: SOLR-1670_test.patch


 As part of converting tests for SOLR-1657, I ran into a problem with 
 SynonymFilter.
 The test for 'repeats' has a flaw: it uses the assertTokEqual construct, 
 which does not really validate that two lists of tokens are equal; it just 
 stops at the shorter one.
 {code}
 // repeats
 map.add(strings("a b"), tokens("ab"), orig, merge);
 map.add(strings("a b"), tokens("ab"), orig, merge);
 assertTokEqual(getTokList(map,"a b",false), tokens("ab"));
 /* in reality the result from getTokList is "ab ab ab"! */
 {code}
 When converted to assertTokenStreamContents, this problem surfaced. Attached 
 is an additional assertion to the existing test case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1653) add PatternReplaceCharFilter

2009-12-15 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1653.
--

Resolution: Fixed

Committed revision 890798. Thanks Shalin and Noble for taking the time to 
review the patch.

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch, SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter

2009-12-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056
 ] 

Koji Sekiguchi commented on SOLR-1653:
--

OK, I'll show you some samples ;-)

||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be 
omitted|
|No.1 NO. no.  543|[nN][oO]\.\s*(\d+)|{#},1|#1  NO. #543|sample for 
literal. do not forget to set blockDelimiters other than period when you use 
period in groupedPattern|
|abc-1234-5678|(\w+)-(\d+)-(\d+)|3,{-},1,{-},2|5678-abc-1234|change the order 
of the groups|
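For comparison, the group reordering in the table's last row (replaceGroups 3,{-},1,{-},2) corresponds to what standard java.util.regex replacement syntax expresses with $n references. A minimal sketch (the method name is mine, not from the patch):

```java
import java.util.regex.Pattern;

public class GroupReorderDemo {
    // Reorder captured groups with standard $n replacement syntax,
    // mirroring the replaceGroups="3,{-},1,{-},2" row in the table above.
    static String reorder(String input) {
        return Pattern.compile("(\\w+)-(\\d+)-(\\d+)")
                      .matcher(input)
                      .replaceAll("$3-$1-$2");
    }

    public static void main(String[] args) {
        System.out.println(reorder("abc-1234-5678")); // 5678-abc-1234
    }
}
```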


 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter

2009-12-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056
 ] 

Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:27 AM:


OK, I'll show you some samples ;-)

||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be 
omitted|
|No.1 NO. no.  543|[nN][oO]\.\s*(\d+)|{#},1|#1  NO. #543|sample for 
literal. do not forget to set blockDelimiters other than period when you use 
period in groupedPattern|
|abc-1234-5678|(\w+)--(\d+)--(\d+)|3,{--},1,{--},2|5678-abc-1234|change the 
order of the groups|


  was (Author: koji):
OK, I'll show you some samples ;-)

||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be 
omitted|
|No.1 NO. no.  543|[nN][oO]\.\s*(\d+)|{#},1|#1  NO. #543|sample for 
literal. do not forget to set blockDelimiters other than period when you use 
period in groupedPattern|
|abc-1234-5678|(\w+)-(\d+)-(\d+)|3,{-},1,{-},2|5678-abc-1234|change the order 
of the groups|

  
 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter

2009-12-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056
 ] 

Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:29 AM:


OK, I'll show you some samples ;-)

||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be 
omitted|
|No.1 NO. no.  543|[nN][oO]\.\s*(\d+)|{#},1|#1  NO. #543|sample for 
literal. do not forget to set blockDelimiters other than period when you use 
period in groupedPattern|
|abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order 
of the groups|


  was (Author: koji):
OK, I'll show you some samples ;-)

||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be 
omitted|
|No.1 NO. no.  543|[nN][oO]\.\s*(\d+)|{#},1|#1  NO. #543|sample for 
literal. do not forget to set blockDelimiters other than period when you use 
period in groupedPattern|
|abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678-abc-1234|change the order 
of the groups|

  
 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter

2009-12-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056
 ] 

Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:28 AM:


OK, I'll show you some samples ;-)

||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be 
omitted|
|No.1 NO. no.  543|[nN][oO]\.\s*(\d+)|{#},1|#1  NO. #543|sample for 
literal. do not forget to set blockDelimiters other than period when you use 
period in groupedPattern|
|abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678-abc-1234|change the order 
of the groups|


  was (Author: koji):
OK, I'll show you some samples ;-)

||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be 
omitted|
|No.1 NO. no.  543|[nN][oO]\.\s*(\d+)|{#},1|#1  NO. #543|sample for 
literal. do not forget to set blockDelimiters other than period when you use 
period in groupedPattern|
|abc-1234-5678|(\w+)--(\d+)--(\d+)|3,{--},1,{--},2|5678-abc-1234|change the 
order of the groups|

  
 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter

2009-12-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790056#action_12790056
 ] 

Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:30 AM:


OK, I'll show you some samples ;-)

||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be 
omitted|
|No.1 NO. no.  543|[nN][oO]\.\s*(\d+)|{#},1|#1  NO. #543|sample for 
literal. do not forget to set blockDelimiters other than period when you use 
period in groupedPattern|
|abc=1234=5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order 
of the groups|


  was (Author: koji):
OK, I'll show you some samples ;-)

||INPUT||groupedPattern||replaceGroups||OUTPUT||comment||
|see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of word|
|see-ing looking|(\w+)ing|1|see-ing look|same as above. 2nd parentheses can be 
omitted|
|No.1 NO. no.  543|[nN][oO]\.\s*(\d+)|{#},1|#1  NO. #543|sample for 
literal. do not forget to set blockDelimiters other than period when you use 
period in groupedPattern|
|abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order 
of the groups|

  
 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter

2009-12-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790127#action_12790127
 ] 

Koji Sekiguchi commented on SOLR-1653:
--

bq. I guess this can be achieved with the matcher#replaceAll() directly 

You're right, if we didn't need to correct the offsets of the output char 
stream. I need to process one match at a time.
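A sketch of the difference (the helper name is mine, not from the patch): matcher.replaceAll() hides the match boundaries, while a find()/appendReplacement() loop exposes each match's start()/end(), which is what per-match offset correction needs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerMatchReplace {
    // Replace one match at a time; start()/end() give each match's span
    // in the original input, so an offset-correcting CharFilter could
    // record how much every replacement shifts subsequent offsets.
    static String replacePerMatch(String input) {
        Matcher m = Pattern.compile("([nN][oO]\\.)\\s*(\\d+)").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int matchedLen = m.end() - m.start();
            m.appendReplacement(sb, "$1$2");
            // (matchedLen vs. the replacement length is the offset delta
            //  a real CharFilter would have to accumulate here)
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replacePerMatch("No. 1 and no.  543")); // No.1 and no.543
    }
}
```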

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1653) add PatternReplaceCharFilter

2009-12-14 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1653:
-

Attachment: SOLR-1653.patch

My apologies: I introduced my own syntax because the first patch tried to 
correct offsets per group within a match. But yes, now I've implemented the 
offset correction per match, so I can use the standard syntax. Here is the 
new patch.

Usage:
{code:title=schema.xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="([nN][oO]\.)\s*(\d+)"
                replaceWith="$1$2"/>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}

If there are no objections, I'll commit later today.
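For reference, the pattern/replaceWith pair above follows standard java.util.regex replacement semantics. A sketch of what the filter is configured to do, ignoring the offset correction (the method name is mine, not from the patch):

```java
import java.util.regex.Pattern;

public class PatternReplaceConfigSketch {
    // Same pattern and replacement as the charFilter config above,
    // applied with plain java.util.regex (no offset correction).
    static String apply(String s) {
        return Pattern.compile("([nN][oO]\\.)\\s*(\\d+)")
                      .matcher(s)
                      .replaceAll("$1$2");
    }

    public static void main(String[] args) {
        System.out.println(apply("no.  543")); // no.543
    }
}
```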

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch, SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter

2009-12-14 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790572#action_12790572
 ] 

Koji Sekiguchi commented on SOLR-1653:
--

I see that the existing PatternReplaceFilter (not CharFilter) already uses 
"pattern". But it uses "replacement", not "replaceWith". I think I'll use 
"pattern" and "replacement".

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch, SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1653) add PatternReplaceCharFilter

2009-12-13 Thread Koji Sekiguchi (JIRA)
add PatternReplaceCharFilter


 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5


Add a new CharFilter that uses a regular expression to find the target of 
the replacement in the character stream.

Usage:
{code:title=schema.xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                groupedPattern="([nN][oO]\.)\s*(\d+)"
                replaceGroups="1,2" blockDelimiters=":;"/>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
{code}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1653) add PatternReplaceCharFilter

2009-12-13 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1653:
-

Attachment: SOLR-1653.patch

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-1653) add PatternReplaceCharFilter

2009-12-13 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-1653:


Assignee: Koji Sekiguchi

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter

2009-12-13 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12789957#action_12789957
 ] 

Koji Sekiguchi commented on SOLR-1653:
--

I'll commit in a few days.

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to find the target of 
 the replacement in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Upgrading Lucene jars

2009-12-08 Thread Koji Sekiguchi

Shalin Shekhar Mangar wrote:

I need to upgrade contrib-spellcheck jar for SOLR-785. Should I go ahead and
upgrade all Lucene jars to the latest 2.9 branch code?

  

+1.

Koji

--
http://www.rondhuit.com/en/



[jira] Commented: (SOLR-1606) Integrate Near Realtime

2009-12-05 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786448#action_12786448
 ] 

Koji Sekiguchi commented on SOLR-1606:
--

Jason, I got a failure when running TestRefreshReader.

 Integrate Near Realtime 
 

 Key: SOLR-1606
 URL: https://issues.apache.org/jira/browse/SOLR-1606
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1606.patch


 We'll integrate IndexWriter.getReader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1607) use a proper key other than IndexReader for ExternalFileField and QueryElevationCompenent to work properly when reopenReaders is set to true

2009-11-28 Thread Koji Sekiguchi (JIRA)
use a proper key other than IndexReader for ExternalFileField and 
QueryElevationCompenent to work properly when reopenReaders is set to true


 Key: SOLR-1607
 URL: https://issues.apache.org/jira/browse/SOLR-1607
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5


With the reopenReaders feature introduced in 1.4, the external_[fieldname] and 
elevate.xml files in dataDir are no longer reloaded when a commit is issued.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)

2009-11-25 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1489:
-

Attachment: SOLR-1489.patch

The attached patch fixes the above failure, but I got another failure (no 
Expires header):

{code}
Testcase: testCacheVetoHandler took 3.29 sec
Testcase: testCacheVetoException took 1.395 sec
FAILED
We got no Expires header
junit.framework.AssertionFailedError: We got no Expires header
at 
org.apache.solr.servlet.CacheHeaderTest.checkVetoHeaders(CacheHeaderTest.java:73)
at 
org.apache.solr.servlet.CacheHeaderTest.testCacheVetoException(CacheHeaderTest.java:59)

Testcase: testLastModified took 1.485 sec
Testcase: testEtag took 1.577 sec
Testcase: testCacheControl took 1.035 sec
{code}


 A UTF-8 character is output twice (Bug in Jetty)
 

 Key: SOLR-1489
 URL: https://issues.apache.org/jira/browse/SOLR-1489
 Project: Solr
  Issue Type: Bug
 Environment: Jetty-6.1.3
 Jetty-6.1.21
 Jetty-7.0.0RC6
Reporter: Jun Ohtani
Assignee: Koji Sekiguchi
Priority: Critical
 Attachments: error_utf8-example.xml, jetty-6.1.22.jar, 
 jetty-util-6.1.22.jar, jettybugsample.war, jsp-2.1.zip, 
 servlet-api-2.5-20081211.jar, SOLR-1489.patch


 A UTF-8 character is output twice under particular conditions.
 The sample data is attached (error_utf8-example.xml).
 After registering only the sample data, click the following URL:
 http://localhost:8983/solr/select?q=*%3A*&version=2.2&start=0&rows=10&omitHeader=true&fl=attr_json&wt=json
 The sample data contains only "B", but the response is "BB".
 With wt=phps, an error occurs in PHP's unserialize() function.
 This looks like a bug in Jetty.
 jettybugsample.war is the simplest way to reproduce the problem.
 Copy it to example/webapps, start the Jetty server, and click the following URL:
 http://localhost:8983/jettybugsample/filter/hoge
 As before, "B" is output twice, while System.out prints "B" only once.
 I have tested this on Jetty 6.1.3, 6.1.21, and 7.0.0rc6.
 (When testing with 6.1.21 or 7.0.0rc6, change bufsize from 128 to 512 in 
 web.xml.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1601) Schema browser does not indicate presence of charFilter

2009-11-25 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1601:
-

  Component/s: Schema and Analysis
Affects Version/s: 1.4
Fix Version/s: 1.5
 Assignee: Koji Sekiguchi

 Schema browser does not indicate presence of charFilter
 ---

 Key: SOLR-1601
 URL: https://issues.apache.org/jira/browse/SOLR-1601
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Jake Brownell
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 1.5


 My schema has a field defined as:
 {noformat}
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory" />
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true" />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1"
             catenateWords="1" catenateNumbers="1" catenateAll="0"
             splitOnCaseChange="1" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
   <analyzer type="query">
     <charFilter class="solr.MappingCharFilterFactory"
                 mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory" />
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
             ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords.txt" enablePositionIncrements="true" />
     <filter class="solr.WordDelimiterFilterFactory"
             generateWordParts="1" generateNumberParts="1"
             catenateWords="0" catenateNumbers="0" catenateAll="0"
             splitOnCaseChange="1" />
     <filter class="solr.LowerCaseFilterFactory" />
     <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt" />
     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
   </analyzer>
 </fieldType>
 {noformat}
 and when I view the field in the schema browser, I see:
 {noformat}
 Tokenized:  true
 Class Name:  org.apache.solr.schema.TextField
 Index Analyzer: org.apache.solr.analysis.TokenizerChain 
 Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
 Filters:  
 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt 
 ignoreCase: true enablePositionIncrements: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 
 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll: 0 
 catenateNumbers: 1 }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: 
 protwords.txt }
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
 Query Analyzer: org.apache.solr.analysis.TokenizerChain 
 Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
 Filters:  
 org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt 
 expand: true ignoreCase: true }
 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt 
 ignoreCase: true enablePositionIncrements: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 
 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll: 0 
 catenateNumbers: 0 }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: 
 protwords.txt }
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
 {noformat}
 It's not a big deal, but I expected to see some indication of the charFilter 
 that is in place.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1601) Schema browser does not indicate presence of charFilter

2009-11-25 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1601:
-

Attachment: SOLR-1601.patch

Will commit shortly.

 Schema browser does not indicate presence of charFilter
 ---

 Key: SOLR-1601
 URL: https://issues.apache.org/jira/browse/SOLR-1601
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Jake Brownell
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1601.patch


 My schema has a field defined as:
 {noformat}
 <fieldType name="text" class="solr.TextField" 
 positionIncrementGap="100">
 <analyzer type="index">
 <charFilter class="solr.MappingCharFilterFactory" 
 mapping="mapping-ISOLatin1Accent.txt"/>
 <tokenizer class="solr.WhitespaceTokenizerFactory" />
 <filter class="solr.StopFilterFactory" ignoreCase="true" 
 words="stopwords.txt" enablePositionIncrements="true" />
 <filter class="solr.WordDelimiterFilterFactory" 
 generateWordParts="1" generateNumberParts="1"
 catenateWords="1" catenateNumbers="1" catenateAll="0" 
 splitOnCaseChange="1" />
 <filter class="solr.LowerCaseFilterFactory" />
 <filter class="solr.EnglishPorterFilterFactory" 
 protected="protwords.txt" />
 <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
 </analyzer>
 <analyzer type="query">
 <charFilter class="solr.MappingCharFilterFactory" 
 mapping="mapping-ISOLatin1Accent.txt"/>
 <tokenizer class="solr.WhitespaceTokenizerFactory" />
 <filter class="solr.SynonymFilterFactory" 
 synonyms="synonyms.txt"
 ignoreCase="true" expand="true"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true" 
 words="stopwords.txt" enablePositionIncrements="true" />
 <filter class="solr.WordDelimiterFilterFactory" 
 generateWordParts="1" generateNumberParts="1"
 catenateWords="0" catenateNumbers="0" catenateAll="0" 
 splitOnCaseChange="1" />
 <filter class="solr.LowerCaseFilterFactory" />
 <filter class="solr.EnglishPorterFilterFactory" 
 protected="protwords.txt" />
 <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
 </analyzer>
 </fieldType>
 {noformat}
 and when I view the field in the schema browser, I see:
 {noformat}
 Tokenized:  true
 Class Name:  org.apache.solr.schema.TextField
 Index Analyzer: org.apache.solr.analysis.TokenizerChain 
 Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
 Filters:  
 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt 
 ignoreCase: true enablePositionIncrements: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 
 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll: 0 
 catenateNumbers: 1 }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: 
 protwords.txt }
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
 Query Analyzer: org.apache.solr.analysis.TokenizerChain 
 Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
 Filters:  
 org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt 
 expand: true ignoreCase: true }
 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt 
 ignoreCase: true enablePositionIncrements: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 
 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll: 0 
 catenateNumbers: 0 }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: 
 protwords.txt }
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
 {noformat}
 It's not a big deal, but I expected to see some indication of the charFilter 
 that is in place.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1601) Schema browser does not indicate presence of charFilter

2009-11-25 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1601.
--

Resolution: Fixed

Committed revision 884180. Thanks, Jake.

 Schema browser does not indicate presence of charFilter
 ---

 Key: SOLR-1601
 URL: https://issues.apache.org/jira/browse/SOLR-1601
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Jake Brownell
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1601.patch


 My schema has a field defined as:
 {noformat}
 <fieldType name="text" class="solr.TextField" 
 positionIncrementGap="100">
 <analyzer type="index">
 <charFilter class="solr.MappingCharFilterFactory" 
 mapping="mapping-ISOLatin1Accent.txt"/>
 <tokenizer class="solr.WhitespaceTokenizerFactory" />
 <filter class="solr.StopFilterFactory" ignoreCase="true" 
 words="stopwords.txt" enablePositionIncrements="true" />
 <filter class="solr.WordDelimiterFilterFactory" 
 generateWordParts="1" generateNumberParts="1"
 catenateWords="1" catenateNumbers="1" catenateAll="0" 
 splitOnCaseChange="1" />
 <filter class="solr.LowerCaseFilterFactory" />
 <filter class="solr.EnglishPorterFilterFactory" 
 protected="protwords.txt" />
 <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
 </analyzer>
 <analyzer type="query">
 <charFilter class="solr.MappingCharFilterFactory" 
 mapping="mapping-ISOLatin1Accent.txt"/>
 <tokenizer class="solr.WhitespaceTokenizerFactory" />
 <filter class="solr.SynonymFilterFactory" 
 synonyms="synonyms.txt"
 ignoreCase="true" expand="true"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true" 
 words="stopwords.txt" enablePositionIncrements="true" />
 <filter class="solr.WordDelimiterFilterFactory" 
 generateWordParts="1" generateNumberParts="1"
 catenateWords="0" catenateNumbers="0" catenateAll="0" 
 splitOnCaseChange="1" />
 <filter class="solr.LowerCaseFilterFactory" />
 <filter class="solr.EnglishPorterFilterFactory" 
 protected="protwords.txt" />
 <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
 </analyzer>
 </fieldType>
 {noformat}
 and when I view the field in the schema browser, I see:
 {noformat}
 Tokenized:  true
 Class Name:  org.apache.solr.schema.TextField
 Index Analyzer: org.apache.solr.analysis.TokenizerChain 
 Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
 Filters:  
 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt 
 ignoreCase: true enablePositionIncrements: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 
 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll: 0 
 catenateNumbers: 1 }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: 
 protwords.txt }
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
 Query Analyzer: org.apache.solr.analysis.TokenizerChain 
 Tokenizer Class:  org.apache.solr.analysis.WhitespaceTokenizerFactory
 Filters:  
 org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt 
 expand: true ignoreCase: true }
 org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt 
 ignoreCase: true enablePositionIncrements: true }
 org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 
 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll: 0 
 catenateNumbers: 0 }
 org.apache.solr.analysis.LowerCaseFilterFactory args:{}
 org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: 
 protwords.txt }
 org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{}
 {noformat}
 It's not a big deal, but I expected to see some indication of the charFilter 
 that is in place.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)

2009-11-24 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782335#action_12782335
 ] 

Koji Sekiguchi commented on SOLR-1489:
--

Thanks, Ohtani-san.

Using these new jetty jars (6.1.22), I ran ant test, but got a failure:

{code:title=TEST-org.apache.solr.servlet.CacheHeaderTest.txt}
Testcase: testCacheVetoHandler took 2.469 sec
Testcase: testCacheVetoException took 1.25 sec
FAILED
null expected:<[no-cache, ]no-store> but 
was:<[must-revalidate,no-cache,]no-store>
junit.framework.ComparisonFailure: null expected:<[no-cache, ]no-store> but 
was:<[must-revalidate,no-cache,]no-store>
at 
org.apache.solr.servlet.CacheHeaderTest.checkVetoHeaders(CacheHeaderTest.java:65)
at 
org.apache.solr.servlet.CacheHeaderTest.testCacheVetoException(CacheHeaderTest.java:59)

Testcase: testLastModified took 1.188 sec
Testcase: testEtag took 1.11 sec
Testcase: testCacheControl took 1.391 sec
{code}

According to SOLR-632, the cache-header-related test failed when we used 
jetty-6.1.11, and Lars filed https://jira.codehaus.org/browse/JETTY-646. Now that 
the issue has been fixed, I thought jetty-6.1.22 should work. I've not looked into 
the details of the cache header test, though.

 A UTF-8 character is output twice (Bug in Jetty)
 

 Key: SOLR-1489
 URL: https://issues.apache.org/jira/browse/SOLR-1489
 Project: Solr
  Issue Type: Bug
 Environment: Jetty-6.1.3
 Jetty-6.1.21
 Jetty-7.0.0RC6
Reporter: Jun Ohtani
Assignee: Koji Sekiguchi
Priority: Critical
 Attachments: error_utf8-example.xml, jetty-6.1.22.jar, 
 jetty-util-6.1.22.jar, jettybugsample.war, jsp-2.1.zip, 
 servlet-api-2.5-20081211.jar


 A UTF-8 character is output twice under particular conditions.
 The sample data is attached (error_utf8-example.xml).
 After registering only the sample data, open the following URL.
 http://localhost:8983/solr/select?q=*%3A*&version=2.2&start=0&rows=10&omitHeader=true&fl=attr_json&wt=json
 The sample data contains only B, but the response contains BB.
 When wt=phps, an error occurs in the PHP unserialize() function.
 This bug is likely a bug in Jetty.
 jettybugsample.war is the simplest way to reproduce the problem.
 Copy it to example/webapps, start the Jetty server, and open the following URL.
 http://localhost:8983/jettybugsample/filter/hoge
 As before, B is output twice in the response, while System.out prints B only once.
 I have tested this on Jetty 6.1.3, 6.1.21, and 7.0.0rc6.
 (When testing with 6.1.21 or 7.0.0rc6, change bufsize from 128 to 512 in 
 web.xml.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
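The bufsize detail in the report above (the bug surfaces at one buffer size and not another) is the classic signature of a multi-byte UTF-8 sequence straddling a buffer boundary. The following Python sketch is purely illustrative — it is not Jetty's code — and shows why decoding each buffer independently corrupts such a character, while a stateful incremental decoder handles it correctly:

```python
# Illustrative only: demonstrates the class of bug where a multi-byte UTF-8
# sequence is split across two output buffers. Not Jetty's actual code.
import codecs

# 127 ASCII bytes + one 3-byte character: the character straddles a
# 128-byte buffer boundary.
data = ("A" * 127 + "あ").encode("utf-8")
chunks = [data[:128], data[128:]]

# Naive per-chunk decoding cannot interpret the split character.
naive = "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)

# A stateful incremental decoder carries the partial sequence across chunks.
dec = codecs.getincrementaldecoder("utf-8")()
correct = "".join(dec.decode(chunk) for chunk in chunks) + dec.decode(b"", final=True)

print(naive.endswith("あ"))    # False: the split character was mangled
print(correct.endswith("あ"))  # True
```

The same hazard applies to any servlet container or filter that flushes fixed-size byte buffers without tracking partial character state.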



[jira] Commented: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)

2009-11-18 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12779814#action_12779814
 ] 

Koji Sekiguchi commented on SOLR-1489:
--

Ok, http://jira.codehaus.org/browse/JETTY-1122 has been marked as fixed and 
jetty 6.1.22 has been released. Ohtani-san, can you test the new jetty with your 
test case to see if the bug is gone? Thanks.

 A UTF-8 character is output twice (Bug in Jetty)
 

 Key: SOLR-1489
 URL: https://issues.apache.org/jira/browse/SOLR-1489
 Project: Solr
  Issue Type: Bug
 Environment: Jetty-6.1.3
 Jetty-6.1.21
 Jetty-7.0.0RC6
Reporter: Jun Ohtani
Assignee: Koji Sekiguchi
Priority: Critical
 Attachments: error_utf8-example.xml, jettybugsample.war


 A UTF-8 character is output twice under particular conditions.
 The sample data is attached (error_utf8-example.xml).
 After registering only the sample data, open the following URL.
 http://localhost:8983/solr/select?q=*%3A*&version=2.2&start=0&rows=10&omitHeader=true&fl=attr_json&wt=json
 The sample data contains only B, but the response contains BB.
 When wt=phps, an error occurs in the PHP unserialize() function.
 This bug is likely a bug in Jetty.
 jettybugsample.war is the simplest way to reproduce the problem.
 Copy it to example/webapps, start the Jetty server, and open the following URL.
 http://localhost:8983/jettybugsample/filter/hoge
 As before, B is output twice in the response, while System.out prints B only once.
 I have tested this on Jetty 6.1.3, 6.1.21, and 7.0.0rc6.
 (When testing with 6.1.21 or 7.0.0rc6, change bufsize from 128 to 512 in 
 web.xml.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader

2009-11-03 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773213#action_12773213
 ] 

Koji Sekiguchi commented on SOLR-1506:
--

bq. Commit doesn't work because reopen isn't supported by MultiReader.

Regarding MultiReader and reopen, I've set reopenReaders to false:

{code:title=solrconfig.xml}
<reopenReaders>false</reopenReaders>
  :
<indexReaderFactory name="IndexReaderFactory" 
class="mypackage.MultiReaderFactory"/>
{code}


 Search multiple cores using MultiReader
 ---

 Key: SOLR-1506
 URL: https://issues.apache.org/jira/browse/SOLR-1506
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1506.patch, SOLR-1506.patch


 I need to search over multiple cores, and SOLR-1477 is more
 complicated than expected, so here we'll create a MultiReader
 over the cores to allow searching on them.
 Maybe in the future we can add parallel searching; however,
 SOLR-1477, if it gets completed, provides that out of the box.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer

2009-10-24 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769741#action_12769741
 ] 

Koji Sekiguchi commented on SOLR-822:
-

bq. Please update the Wiki for this feature. 

Done. :)

 CharFilter - normalize characters before tokenizer
 --

 Key: SOLR-822
 URL: https://issues.apache.org/jira/browse/SOLR-822
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.3
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.4

 Attachments: character-normalization.JPG, 
 japanese-h-to-k-mapping.txt, sample_mapping_ja.txt, sample_mapping_ja.txt, 
 SOLR-822-for-1.3.patch, SOLR-822-renameMethod.patch, SOLR-822.patch, 
 SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch


 A new plugin which can be placed in front of tokenizer/.
 {code:xml}
 <fieldType name="textCharNorm" class="solr.TextField" 
 positionIncrementGap="100" >
   <analyzer>
 <charFilter class="solr.MappingCharFilterFactory" 
 mapping="mapping_ja.txt" />
 <tokenizer class="solr.MappingCJKTokenizerFactory"/>
 <filter class="solr.StopFilterFactory" ignoreCase="true" 
 words="stopwords.txt"/>
 <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 {code}
 <charFilter/> elements can be multiple (chained). I'll post a JPEG file to show 
 a character normalization sample soon.
 MOTIVATION:
 In Japan, there are two types of tokenizers -- N-gram (CJKTokenizer) and 
 Morphological Analyzer.
 When we use a morphological analyzer, because the analyzer uses a Japanese 
 dictionary to detect terms,
 we need to normalize characters before tokenization.
 I'll post a patch soon, too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
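For readers unfamiliar with the mapping file referenced above: MappingCharFilterFactory reads a plain-text file of `"source" => "target"` rules applied before tokenization. The entries below are a small, hypothetical illustration of that format — they are not the actual contents of sample_mapping_ja.txt:

```
# Illustrative mapping-file fragment for MappingCharFilterFactory.
# Format: "source" => "target"; lines beginning with # are comments.
# These example entries are hypothetical, not the shipped sample file.
"カ" => "カ"
"ガ" => "ガ"
"À" => "A"
```

Each rule replaces the source character sequence in the raw input stream with the target before the tokenizer ever sees it, which is what lets a dictionary-based morphological analyzer work on normalized text.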



[jira] Updated: (SOLR-561) Solr replication by Solr (for windows also)

2009-10-23 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-561:


Component/s: (was: replication (scripts))
 replication (java)

change component from scripts to java

 Solr replication by Solr (for windows also)
 ---

 Key: SOLR-561
 URL: https://issues.apache.org/jira/browse/SOLR-561
 Project: Solr
  Issue Type: New Feature
  Components: replication (java)
Affects Versions: 1.4
 Environment: All
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4

 Attachments: deletion_policy.patch, SOLR-561-core.patch, 
 SOLR-561-fixes.patch, SOLR-561-fixes.patch, SOLR-561-fixes.patch, 
 SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, 
 SOLR-561-full.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
 SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
 SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, 
 SOLR-561.patch, SOLR-561.patch, SOLR-561.patch


 The current replication strategy in Solr involves shell scripts. The 
 following are the drawbacks of that approach:
 *  It does not work on Windows
 * Replication works as a separate piece, not integrated with Solr
 * Replication cannot be controlled from the Solr admin/JMX
 * Each operation requires a manual telnet to the host
 Doing the replication in Java has the following advantages:
 * Platform independence
 * Manual steps can be completely eliminated. Everything can be driven from 
 solrconfig.xml.
 ** Adding the URL of the master in the slaves should be good enough to enable 
 replication. Other things like the frequency of
 snapshoot/snappull can also be configured. All other information can be 
 obtained automatically.
 * Start/stop can be triggered from solr/admin or JMX
 * Can get the status/progress while replication is going on. It can also 
 abort an ongoing replication
 * No need to have a login on the machine 
 * From a development perspective, we can unit test it
 This issue can track the implementation of Solr replication in Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-551) Solr replication should include the schema also

2009-10-23 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-551:


Component/s: (was: replication (scripts))
 replication (java)

change component from scripts to java

 Solr replication should include the schema also
 ---

 Key: SOLR-551
 URL: https://issues.apache.org/jira/browse/SOLR-551
 Project: Solr
  Issue Type: Improvement
  Components: replication (java)
Affects Versions: 1.4
Reporter: Noble Paul
Assignee: Shalin Shekhar Mangar
 Fix For: 1.4


 The current Solr replication just copies the data directory. So if the
 schema changes and I re-index, it will blissfully copy the index
 and the slaves will fail because of an incompatible schema.
 So the steps we follow are:
  * Stop rsync on the slaves
  * Update the master with the new schema
  * Re-index the data
  * For each slave:
  ** Kill the slave
  ** Clean the data directory
  ** Install the new schema
  ** Restart
  ** Do a manual snappull
 The amount of work the admin needs to do is quite significant
 (depending on the number of slaves). These are manual steps and very
 error-prone.
 The solution:
 Make the replication mechanism handle schema replication as well, so
 that all I need to do is change the master and the slaves sync
 automatically.
 What is a good way to implement this?
 We have an idea along the following lines.
 This should involve changes to the snapshooter and snappuller scripts
 and the snapinstaller components.
 Every time the snapshooter takes a snapshot, it must keep the timestamps
 of schema.xml and elevate.xml (all the files which might affect the
 runtime behavior of the slaves).
 For subsequent snapshots, if the timestamp of any of them has changed,
 it must copy all of them for replication as well.
 The snappuller copies the new directory as usual.
 The snapinstaller checks if these config files are present;
 if yes:
  * It can create a temporary core
  * Install the changed index and configuration
  * Load it completely and swap it with the original core

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
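The timestamp-tracking idea described in the issue above can be sketched in a few lines. This is a self-contained illustration, not the actual snapshooter script; the function name and tracked file list are hypothetical:

```python
# Sketch of the proposed check: include tracked config files in a snapshot
# only when any of their modification times changed since the last snapshot.
# If any one changed, copy *all* of them so the slave gets a consistent set.
import os
import shutil

def snapshot_configs(src_dir, dest_dir, tracked, last_mtimes):
    """Return True (and copy all tracked files) if any tracked file changed."""
    os.makedirs(dest_dir, exist_ok=True)
    changed = False
    for name in tracked:
        mtime = os.path.getmtime(os.path.join(src_dir, name))
        if last_mtimes.get(name) != mtime:
            changed = True
        last_mtimes[name] = mtime  # remember for the next snapshot
    if changed:
        for name in tracked:
            shutil.copy2(os.path.join(src_dir, name), dest_dir)
    return changed
```

The "copy all when any changed" rule mirrors the issue text: a slave must never receive a half-updated configuration set.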



[jira] Resolved: (SOLR-1099) FieldAnalysisRequestHandler

2009-10-20 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1099.
--

Resolution: Fixed

Committed revision 827032. Thanks.

 FieldAnalysisRequestHandler
 ---

 Key: SOLR-1099
 URL: https://issues.apache.org/jira/browse/SOLR-1099
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.3
Reporter: Uri Boness
Assignee: Koji Sekiguchi
 Fix For: 1.4

 Attachments: AnalisysRequestHandler_refactored.patch, 
 analysis_request_handlers_incl_solrj.patch, 
 AnalysisRequestHandler_refactored1.patch, 
 FieldAnalysisRequestHandler_incl_test.patch, 
 SOLR-1099-ordered-TokenizerChain.patch, SOLR-1099.patch, SOLR-1099.patch, 
 SOLR-1099.patch


 The FieldAnalysisRequestHandler provides the analysis functionality of the 
 web admin page as a service. This handler accepts a fieldtype/fieldname 
 parameter and a value, and as a response returns a breakdown of the analysis 
 process. It is also possible to send a query value, which will use the 
 configured query analyzer, as well as a showmatch parameter, which will then 
 mark every matched token as a match.
 If this handler is added to the code base, I also recommend renaming the 
 current AnalysisRequestHandler to DocumentAnalysisRequestHandler and having 
 them both inherit from one AnalysisRequestHandlerBase class, which provides 
 the common functionality of the analysis breakdown and its translation to 
 named lists. This will also enhance the current AnalysisRequestHandler, which 
 right now is fairly simplistic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (SOLR-1099) FieldAnalysisRequestHandler

2009-10-19 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reopened SOLR-1099:
--

  Assignee: Koji Sekiguchi  (was: Shalin Shekhar Mangar)

Hmm, I think the order of the Tokenizer/TokenFilters in the response was not 
taken into account. For example, I cannot retrieve the Tokenizer/TokenFilters 
from the ruby response in order...

 FieldAnalysisRequestHandler
 ---

 Key: SOLR-1099
 URL: https://issues.apache.org/jira/browse/SOLR-1099
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.3
Reporter: Uri Boness
Assignee: Koji Sekiguchi
 Fix For: 1.4

 Attachments: AnalisysRequestHandler_refactored.patch, 
 analysis_request_handlers_incl_solrj.patch, 
 AnalysisRequestHandler_refactored1.patch, 
 FieldAnalysisRequestHandler_incl_test.patch, SOLR-1099.patch, 
 SOLR-1099.patch, SOLR-1099.patch


 The FieldAnalysisRequestHandler provides the analysis functionality of the 
 web admin page as a service. This handler accepts a fieldtype/fieldname 
 parameter and a value, and as a response returns a breakdown of the analysis 
 process. It is also possible to send a query value, which will use the 
 configured query analyzer, as well as a showmatch parameter, which will then 
 mark every matched token as a match.
 If this handler is added to the code base, I also recommend renaming the 
 current AnalysisRequestHandler to DocumentAnalysisRequestHandler and having 
 them both inherit from one AnalysisRequestHandlerBase class, which provides 
 the common functionality of the analysis breakdown and its translation to 
 named lists. This will also enhance the current AnalysisRequestHandler, which 
 right now is fairly simplistic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1099) FieldAnalysisRequestHandler

2009-10-19 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1099:
-

Attachment: SOLR-1099-ordered-TokenizerChain.patch

I'd like to use NamedList rather than SimpleOrderedMap. If there are no 
objections, I'll commit soon. All tests pass.

 FieldAnalysisRequestHandler
 ---

 Key: SOLR-1099
 URL: https://issues.apache.org/jira/browse/SOLR-1099
 Project: Solr
  Issue Type: New Feature
  Components: Analysis
Affects Versions: 1.3
Reporter: Uri Boness
Assignee: Koji Sekiguchi
 Fix For: 1.4

 Attachments: AnalisysRequestHandler_refactored.patch, 
 analysis_request_handlers_incl_solrj.patch, 
 AnalysisRequestHandler_refactored1.patch, 
 FieldAnalysisRequestHandler_incl_test.patch, 
 SOLR-1099-ordered-TokenizerChain.patch, SOLR-1099.patch, SOLR-1099.patch, 
 SOLR-1099.patch


 The FieldAnalysisRequestHandler provides the analysis functionality of the 
 web admin page as a service. This handler accepts a fieldtype/fieldname 
 parameter and a value, and as a response returns a breakdown of the analysis 
 process. It is also possible to send a query value, which will use the 
 configured query analyzer, as well as a showmatch parameter, which will then 
 mark every matched token as a match.
 If this handler is added to the code base, I also recommend renaming the 
 current AnalysisRequestHandler to DocumentAnalysisRequestHandler and having 
 them both inherit from one AnalysisRequestHandlerBase class, which provides 
 the common functionality of the analysis breakdown and its translation to 
 named lists. This will also enhance the current AnalysisRequestHandler, which 
 right now is fairly simplistic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1515) Javadoc typo in SolrQueryResponse

2009-10-17 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1515:
-

Fix Version/s: (was: 1.5)
   1.4

 Javadoc typo in SolrQueryResponse
 -

 Key: SOLR-1515
 URL: https://issues.apache.org/jira/browse/SOLR-1515
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
 Environment: my local MacBook pro
Reporter: Chris A. Mattmann
Priority: Trivial
 Fix For: 1.4

 Attachments: SOLR-1515.101709.Mattmann.patch.txt


 There is a minute typo in the javadoc for 
 o.a.s.request.SolrQueryResponse.java. This patch fixes that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1515) Javadoc typo in SolrQueryResponse

2009-10-17 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1515.
--

Resolution: Fixed

Committed revision 826321. Thanks.

 Javadoc typo in SolrQueryResponse
 -

 Key: SOLR-1515
 URL: https://issues.apache.org/jira/browse/SOLR-1515
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
 Environment: my local MacBook pro
Reporter: Chris A. Mattmann
Priority: Trivial
 Fix For: 1.4

 Attachments: SOLR-1515.101709.Mattmann.patch.txt


 There is a minute typo in the javadoc for 
 o.a.s.request.SolrQueryResponse.java. This patch fixes that.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Code Freeze, Release Process, etc.

2009-10-13 Thread Koji Sekiguchi

Grant Ingersoll wrote:

OK, so we are in code freeze right now.

I'm going to follow the Release process at 
http://wiki.apache.org/solr/HowToRelease


I will put up an RC now, then people can try it out, etc.  I would 
then like to have a goal of putting up an official set of artifacts to 
be voted on next Monday.


In the interim, we should review docs, etc. and update the wiki where 
possible.


How does that sound?

-Grant


Sounds great!

Koji

--
http://www.rondhuit.com/en/



Re: 1.4.0 RC

2009-10-13 Thread Koji Sekiguchi

Yonik Seeley wrote:

On Tue, Oct 13, 2009 at 8:12 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
  

On Tue, Oct 13, 2009 at 8:03 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


: http://people.apache.org/~gsingers/solr/1.4.0-RC/

I suspect we're going to want to wait for Lucene 2.9.1 - particularly
because of LUCENE-1974.
  

I know I was lobbying for not using non-released versions of Lucene
due to the increase in flux, but I really meant non-bugfix branches.
Seems safe to use an unreleased 2.9.1 branch?



If there are no objections, I'll update to the fixed 2.9.1 branch.
We can figure out whether to wait for 2.9.1 or not later when we know
the schedule.

  

+1 to update to the fixed 2.9 branch and proceed to release RC.

Koji

--
http://www.rondhuit.com/en/



Re: rollback and cumulative_add

2009-10-12 Thread Koji Sekiguchi
Koji Sekiguchi wrote:
 Hello,

 I found that rollback resets the adds and docsPending counts,
 but doesn't reset cumulative_adds.

 $ cd example/exampledocs
 # comment out the line of commit/ so avoid committing in post.sh
 $ ./post.sh *.xml
 = docsPending=19, adds=19, cumulative_adds=19

 # do rollback
 $ curl http://localhost:8983/solr/update?rollback=true
 => rollbacks=1, docsPending=0, adds=0, cumulative_adds=19

 Is this correct behavior?

 Koji

   
(forwarded dev list)

I think this is a bug that was introduced by me when I contributed
the first patch for the rollback and the bug was inherited by
the successive patches. I'll reopen SOLR-670 and attach the fix soon:

https://issues.apache.org/jira/browse/SOLR-670

Koji
-- 

http://www.rondhuit.com/
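[Editor's note] The bug described above, where rollback resets the pending counts but not the cumulative ones, can be illustrated with a minimal counter sketch. UpdateStats is a hypothetical class for illustration, not Solr's actual DirectUpdateHandler2 bookkeeping:

```java
// Hypothetical UpdateStats class: a sketch of the counter bookkeeping,
// not Solr's actual DirectUpdateHandler2. The point: rollback must also
// revert the cumulative counter, or cumulative_adds keeps counting
// documents that were rolled back.
public class UpdateStats {
    private long adds;            // docs pending since the last commit
    private long cumulativeAdds;  // lifetime total, incremented on add
    private long rollbacks;

    public void addDoc() {
        adds++;
        cumulativeAdds++;  // incremented optimistically at add time
    }

    public void rollback() {
        rollbacks++;
        cumulativeAdds -= adds;  // the fix: undo the optimistic increments
        adds = 0;                // pending count resets either way
    }

    public long getAdds() { return adds; }
    public long getCumulativeAdds() { return cumulativeAdds; }
    public long getRollbacks() { return rollbacks; }
}
```

With this revert in place, posting 19 documents and rolling back leaves both adds and cumulative_adds at 0, matching the expected behavior in the report above.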




[jira] Updated: (SOLR-670) UpdateHandler must provide a rollback feature

2009-10-12 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-670:


Attachment: SOLR-670-revert-cumulative-counts.patch

The fix and test case. I'll commit soon.

 UpdateHandler must provide a rollback feature
 -

 Key: SOLR-670
 URL: https://issues.apache.org/jira/browse/SOLR-670
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Koji Sekiguchi
 Fix For: 1.4

 Attachments: SOLR-670-revert-cumulative-counts.patch, SOLR-670.patch, 
 SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch


 Lucene IndexWriter already has a rollback method. There should be a 
 counterpart for the same in _UpdateHandler_  so that users can do a rollback 
 over http 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-670) UpdateHandler must provide a rollback feature

2009-10-12 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-670.
-

Resolution: Fixed

Committed revision 824380.

 UpdateHandler must provide a rollback feature
 -

 Key: SOLR-670
 URL: https://issues.apache.org/jira/browse/SOLR-670
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Koji Sekiguchi
 Fix For: 1.4

 Attachments: SOLR-670-revert-cumulative-counts.patch, SOLR-670.patch, 
 SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch


 Lucene IndexWriter already has a rollback method. There should be a 
 counterpart for the same in _UpdateHandler_  so that users can do a rollback 
 over http 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1504) empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.

2009-10-11 Thread Koji Sekiguchi (JIRA)
empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and 
co.
---

 Key: SOLR-1504
 URL: https://issues.apache.org/jira/browse/SOLR-1504
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.4


If you have the following mapping rule in mapping.txt:

{code}
# destination can be empty
NULL = 
{code}

you can get AIOOBE by specifying NULL for either index or query data in the 
input form of analysis.jsp (and co. i.e. DocumentAnalysisRequestHandler and 
FieldAnalysisRequestHandler).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
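[Editor's note] The failure mode in SOLR-1504 can be sketched outside Solr. CharMapper below is a hypothetical stand-in, not Solr's real MappingCharFilter; it shows why a map-to-empty rule needs a guard before any code indexes into the mapped output:

```java
// Sketch (hypothetical CharMapper, not Solr's MappingCharFilter):
// a rule like "NULL" => "" can shrink the output to an empty string,
// so any consumer that touches output.charAt(0) without a length
// check throws the ArrayIndexOutOfBoundsException described above.
import java.util.Map;

public class CharMapper {
    private final Map<String, String> rules;

    public CharMapper(Map<String, String> rules) { this.rules = rules; }

    public String map(String input) {
        String out = input;
        for (Map.Entry<String, String> e : rules.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    /** First output char, or -1 when mapping produced an empty string. */
    public int firstChar(String input) {
        String out = map(input);
        return out.isEmpty() ? -1 : out.charAt(0); // guard avoids AIOOBE
    }
}
```

Feeding the literal text NULL through such a mapper yields an empty result, which is exactly the case analysis.jsp and the analysis request handlers had to learn to tolerate.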



[jira] Updated: (SOLR-1504) empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.

2009-10-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1504:
-

Attachment: SOLR-1504.patch

A patch for the fix. Will commit soon.

 empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp 
 and co.
 ---

 Key: SOLR-1504
 URL: https://issues.apache.org/jira/browse/SOLR-1504
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1504.patch


 If you have the following mapping rule in mapping.txt:
 {code}
 # destination can be empty
 NULL = 
 {code}
 you can get AIOOBE by specifying NULL for either index or query data in the 
 input form of analysis.jsp (and co. i.e. DocumentAnalysisRequestHandler and 
 FieldAnalysisRequestHandler).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1504) empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.

2009-10-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1504.
--

Resolution: Fixed

Committed revision 824045.

 empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp 
 and co.
 ---

 Key: SOLR-1504
 URL: https://issues.apache.org/jira/browse/SOLR-1504
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1504.patch


 If you have the following mapping rule in mapping.txt:
 {code}
 # destination can be empty
 NULL = 
 {code}
 you can get AIOOBE by specifying NULL for either index or query data in the 
 input form of analysis.jsp (and co. i.e. DocumentAnalysisRequestHandler and 
 FieldAnalysisRequestHandler).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Down to 5

2009-10-09 Thread Koji Sekiguchi

Hi Shalin,

 What about FastVectorHighlighter?
 https://issues.apache.org/jira/browse/SOLR-1268

If we're targeting an RC this week, I'd like to push it to 1.5
because there are no patches. But perhaps you think
13 votes is significant?

Koji




[jira] Assigned: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2009-10-09 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-1268:


Assignee: Koji Sekiguchi

 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter

2009-10-09 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated SOLR-1268:
-

Fix Version/s: 1.5

Marked it for 1.5 because there are no patches.

 Incorporate Lucene's FastVectorHighlighter
 --

 Key: SOLR-1268
 URL: https://issues.apache.org/jira/browse/SOLR-1268
 Project: Solr
  Issue Type: New Feature
  Components: highlighter
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Down to 5

2009-10-04 Thread Koji Sekiguchi

+1.

Grant Ingersoll wrote:
Coming along:  
https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true 



If we can finish these up this week, I can generate RCs next week.

Thoughts?

-Grant





[jira] Commented: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)

2009-10-03 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761900#action_12761900
 ] 

Koji Sekiguchi commented on SOLR-1489:
--

Good catch, Otani-san! I can reproduce the problem with the data and the filter 
you attached when running it on Jetty. And thank you for opening the JIRA 
ticket in Jetty.
Now that we are close to releasing 1.4, I don't want this to be a blocker because, 
as you said, this is not a Solr bug. You can run Solr on servlet 
containers other than Jetty if you'd like.
I'd like to keep this issue open and keep watching 
http://jira.codehaus.org/browse/JETTY-1122 . Thanks.

 A UTF-8 character is output twice (Bug in Jetty)
 

 Key: SOLR-1489
 URL: https://issues.apache.org/jira/browse/SOLR-1489
 Project: Solr
  Issue Type: Bug
 Environment: Jetty-6.1.3
 Jetty-6.1.21
 Jetty-7.0.0RC6
Reporter: Jun Ohtani
Priority: Critical
 Attachments: error_utf8-example.xml, jettybugsample.war


 A UTF-8 character is output twice under particular conditions.
 Attach the sample data. (error_utf8-example.xml)
 Register only the sample data, then click the following URL.
 http://localhost:8983/solr/select?q=*%3A*&version=2.2&start=0&rows=10&omitHeader=true&fl=attr_json&wt=json
 The sample data contains only B, but the response is BB.
 When wt=phps, an error occurs in the PHP unserialize() function.
 This looks like a bug in Jetty.
 jettybugsample.war is the simplest one to reproduce the problem.
 Copy it to example/webapps, start the Jetty server, and click the following URL.
 http://localhost:8983/jettybugsample/filter/hoge
 As before, B is output twice, though Sysout prints B only once.
 I have tested this on Jetty 6.1.3, 6.1.21, and 7.0.0rc6.
 (When testing with 6.1.21 or 7.0.0rc6, change bufsize from 128 to 512 in 
 web.xml.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)

2009-10-03 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi reassigned SOLR-1489:


Assignee: Koji Sekiguchi

 A UTF-8 character is output twice (Bug in Jetty)
 

 Key: SOLR-1489
 URL: https://issues.apache.org/jira/browse/SOLR-1489
 Project: Solr
  Issue Type: Bug
 Environment: Jetty-6.1.3
 Jetty-6.1.21
 Jetty-7.0.0RC6
Reporter: Jun Ohtani
Assignee: Koji Sekiguchi
Priority: Critical
 Attachments: error_utf8-example.xml, jettybugsample.war


 A UTF-8 character is output twice under particular conditions.
 Attach the sample data. (error_utf8-example.xml)
 Register only the sample data, then click the following URL.
 http://localhost:8983/solr/select?q=*%3A*&version=2.2&start=0&rows=10&omitHeader=true&fl=attr_json&wt=json
 The sample data contains only B, but the response is BB.
 When wt=phps, an error occurs in the PHP unserialize() function.
 This looks like a bug in Jetty.
 jettybugsample.war is the simplest one to reproduce the problem.
 Copy it to example/webapps, start the Jetty server, and click the following URL.
 http://localhost:8983/jettybugsample/filter/hoge
 As before, B is output twice, though Sysout prints B only once.
 I have tested this on Jetty 6.1.3, 6.1.21, and 7.0.0rc6.
 (When testing with 6.1.21 or 7.0.0rc6, change bufsize from 128 to 512 in 
 web.xml.)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1481) phps writer ignores omitHeader parameter

2009-10-01 Thread Koji Sekiguchi (JIRA)
phps writer ignores omitHeader parameter


 Key: SOLR-1481
 URL: https://issues.apache.org/jira/browse/SOLR-1481
 Project: Solr
  Issue Type: Bug
  Components: search
Reporter: Koji Sekiguchi
Priority: Trivial
 Fix For: 1.4


My co-worker found this one. I'm expecting he will attach a patch soon. :)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: svn commit: r819314 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java src/test/org/apache/solr/highlight/HighlighterTest.java

2009-09-27 Thread Koji Sekiguchi

 Also make both options default to true.

If so, doesn't this line (from HighlightComponent) also need
to default to true?

    boolean rewrite = 
!(Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER)) &&
Boolean.valueOf(req.getParams().get(HighlightParams.HIGHLIGHT_MULTI_TERM)));


I think MultiTermQueries are converted to ConstantScoreQuery
by rewrite?

Koji


markrmil...@apache.org wrote:

Author: markrmiller
Date: Sun Sep 27 13:58:30 2009
New Revision: 819314

URL: http://svn.apache.org/viewvc?rev=819314&view=rev
Log:
SOLR-1221: Change Solr Highlighting to use the SpanScorer with MultiTerm 
expansion by default

Modified:
lucene/solr/trunk/CHANGES.txt

lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java
lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java

Modified: lucene/solr/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?rev=819314&r1=819313&r2=819314&view=diff
==
--- lucene/solr/trunk/CHANGES.txt (original)
+++ lucene/solr/trunk/CHANGES.txt Sun Sep 27 13:58:30 2009
@@ -503,8 +503,8 @@
 45. SOLR-1078: Fixes to WordDelimiterFilter to avoid splitting or dropping
 international non-letter characters such as non spacing marks. (yonik)
 
-46. SOLR-825: Enables highlighting for range/wildcard/fuzzy/prefix queries if using hl.usePhraseHighlighter=true

-and hl.highlightMultiTerm=true.  (Mark Miller)
+46. SOLR-825, SOLR-1221: Enables highlighting for range/wildcard/fuzzy/prefix 
queries if using hl.usePhraseHighlighter=true
+and hl.highlightMultiTerm=true. Also make both options default to true. 
(Mark Miller)
 
 47. SOLR-1174: Fix Logging admin form submit url for multicore. (Jacob Singh via shalin)
 


Modified: 
lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java
URL: 
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java?rev=819314&r1=819313&r2=819314&view=diff
==
--- 
lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java
 (original)
+++ 
lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java
 Sun Sep 27 13:58:30 2009
@@ -144,7 +144,7 @@
*/
   private QueryScorer getSpanQueryScorer(Query query, String fieldName, 
TokenStream tokenStream, SolrQueryRequest request) throws IOException {
 boolean reqFieldMatch = request.getParams().getFieldBool(fieldName, 
HighlightParams.FIELD_MATCH, false);
-Boolean highlightMultiTerm = 
request.getParams().getBool(HighlightParams.HIGHLIGHT_MULTI_TERM);
+Boolean highlightMultiTerm = 
request.getParams().getBool(HighlightParams.HIGHLIGHT_MULTI_TERM, true);
 if(highlightMultiTerm == null) {
   highlightMultiTerm = false;
 }
@@ -306,8 +306,9 @@
 }
  
 Highlighter highlighter;

-if 
(Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER))) {
-  // wrap CachingTokenFilter around TokenStream for reuse
+if 
(Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER, 
true))) {
+  // TODO: this is not always necessary - eventually we would like 
to avoid this wrap
+  //   when it is not needed.
   tstream = new CachingTokenFilter(tstream);
   
   // get highlighter


Modified: 
lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java
URL: 
http://svn.apache.org/viewvc/lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java?rev=819314&r1=819313&r2=819314&view=diff
==
--- lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java 
(original)
+++ lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java 
Sun Sep 27 13:58:30 2009
@@ -585,6 +585,7 @@
 args.put("hl.fl", "t_text");
 args.put("hl.fragsize", "40");
 args.put("hl.snippets", "10");
+args.put("hl.usePhraseHighlighter", "false");
 
 TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory(

    "standard", 0, 200, args);
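[Editor's note] The defaulting question raised at the top of this thread comes down to how a missing request parameter resolves. The sketch below uses a plain map standing in for Solr's SolrParams (ParamDemo is a hypothetical helper, not Solr code): once a default of true is supplied at lookup time, the result can never be null, so a separate null check afterwards never fires.

```java
// Stand-in sketch (plain Map instead of Solr's SolrParams): resolving a
// request parameter with an explicit default. With a default supplied,
// the returned value is never null, so downstream "if (value == null)"
// handling becomes unnecessary.
import java.util.Map;

public class ParamDemo {
    public static boolean getBool(Map<String, String> params,
                                  String name, boolean def) {
        String v = params.get(name);
        return (v == null) ? def : Boolean.parseBoolean(v);
    }
}
```

For example, with hl.usePhraseHighlighter absent the lookup yields the default true, and an explicit "false" in the request still wins.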



  




[jira] Resolved: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others

2009-09-18 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1423.
--

Resolution: Fixed

Committed revision 816502. Thanks, Uwe!

 Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream  
 others
 

 Key: SOLR-1423
 URL: https://issues.apache.org/jira/browse/SOLR-1423
 Project: Solr
  Issue Type: Task
  Components: Analysis
Affects Versions: 1.4
Reporter: Uwe Schindler
Assignee: Koji Sekiguchi
 Fix For: 1.4

 Attachments: SOLR-1423-FieldType.patch, 
 SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, 
 SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, 
 SOLR-1423.patch


 Because of some backwards compatibility problems (LUCENE-1906) we changed the 
 CharStream/CharFilter API a little bit. Tokenizer now only has an input field 
 of type java.io.Reader (as before the CharStream code). To correct offsets, 
 it is now necessary to call the Tokenizer.correctOffset(int) method, which 
 delegates to the CharStream (if the input is a subclass of CharStream) and 
 otherwise returns an uncorrected offset. Normally it is enough to change all 
 occurrences of input.correctOffset() to this.correctOffset() in Tokenizers. It 
 should also be checked whether custom Tokenizers in Solr correct their offsets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
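[Editor's note] The migration described in the issue above can be sketched with stand-in classes. OffsetDemo, CharStreamLike, TokenizerLike, and MappingStream below are hypothetical illustrations, not Lucene's real Tokenizer/CharStream API:

```java
// Stand-in sketch of the SOLR-1423 / LUCENE-1906 change: correctOffset()
// now lives on the tokenizer itself and delegates to the CharStream only
// when the input actually is one; otherwise the offset passes through
// unchanged. (Hypothetical classes, not Lucene's real API.)
public class OffsetDemo {

    public interface CharStreamLike {
        int correctOffset(int offset);
    }

    /** Pretend an upstream CharFilter removed 4 characters before every offset. */
    public static class MappingStream implements CharStreamLike {
        public int correctOffset(int offset) { return offset + 4; }
    }

    public static class TokenizerLike {
        private final Object input; // a plain Reader or a CharStream

        public TokenizerLike(Object input) { this.input = input; }

        // mirrors the new Tokenizer.correctOffset(int) semantics:
        // tokenizers call this.correctOffset(o), not input.correctOffset(o)
        public int correctOffset(int offset) {
            return (input instanceof CharStreamLike)
                ? ((CharStreamLike) input).correctOffset(offset)
                : offset; // no CharStream upstream: offset already correct
        }
    }
}
```

This is why the mechanical fix described above (input.correctOffset() to this.correctOffset()) is usually sufficient: the tokenizer-level method handles the plain-Reader case itself.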



[jira] Commented: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others

2009-09-17 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756923#action_12756923
 ] 

Koji Sekiguchi commented on SOLR-1423:
--

The patch looks good! Will commit shortly.

 Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream  
 others
 

 Key: SOLR-1423
 URL: https://issues.apache.org/jira/browse/SOLR-1423
 Project: Solr
  Issue Type: Task
  Components: Analysis
Affects Versions: 1.4
Reporter: Uwe Schindler
Assignee: Koji Sekiguchi
 Fix For: 1.4

 Attachments: SOLR-1423-FieldType.patch, 
 SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, 
 SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, 
 SOLR-1423.patch


 Because of some backwards compatibility problems (LUCENE-1906) we changed the 
 CharStream/CharFilter API a little bit. Tokenizer now only has an input field 
 of type java.io.Reader (as before the CharStream code). To correct offsets, 
 it is now necessary to call the Tokenizer.correctOffset(int) method, which 
 delegates to the CharStream (if the input is a subclass of CharStream) and 
 otherwise returns an uncorrected offset. Normally it is enough to change all 
 occurrences of input.correctOffset() to this.correctOffset() in Tokenizers. It 
 should also be checked whether custom Tokenizers in Solr correct their offsets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


