Re: Eclipse project files...
Paolo Castagna wrote: Hi, could you be more precise about 'import project from source tree'? I think he suggested File > New > Java Project > Create project from existing source? Koji -- http://www.rondhuit.com/en/
[jira] Closed: (SOLR-1879) Error loading class 'Solr.ASCIIFoldingFilterFactory'
[ https://issues.apache.org/jira/browse/SOLR-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi closed SOLR-1879. Resolution: Not A Problem Adlene, please use the solr-user mailing list for getting help: http://lucene.apache.org/solr/mailing_lists.html Error loading class 'Solr.ASCIIFoldingFilterFactory' Key: SOLR-1879 URL: https://issues.apache.org/jira/browse/SOLR-1879 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Environment: Windows XP, Apache Tomcat 6 Reporter: adlene sifi I am trying to use the Solr.ASCIIFoldingFilterFactory filter as follows:
{code}
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="french_stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French"/>
    <filter class="Solr.ASCIIFoldingFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  </analyzer>
  ...
</fieldType>
{code}
However, I receive the following error message when restarting the Apache Tomcat server:
{code}
GRAVE: org.apache.solr.common.SolrException: Error loading class 'Solr.ASCIIFoldingFilterFactory'
	at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
	at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:388)
	...
Caused by: java.lang.ClassNotFoundException: Solr.ASCIIFoldingFilterFactory
	at java.net.URLClassLoader$1.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	... 40 more
{code}
Could you please help me with that? Thanks a lot, Adlene -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
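[Editorial note: the likely cause here, implied by the "Not A Problem" resolution, is the capitalized "Solr." prefix. Solr's resource loader only recognizes the lowercase "solr." shorthand, so the capitalized name is looked up as a literal class name and fails with ClassNotFoundException. A corrected filter line would presumably be:
{code}
<filter class="solr.ASCIIFoldingFilterFactory"/>
{code}
]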
[jira] Created: (SOLR-1878) RelaxQueryComponent - A new SearchComponent that relaxes the main in a semiautomatic way
RelaxQueryComponent - A new SearchComponent that relaxes the main in a semiautomatic way Key: SOLR-1878 URL: https://issues.apache.org/jira/browse/SOLR-1878 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor I have the following use case: Imagine that you visit a web page for searching for an apartment for rent. You choose parameters, usually by marking check boxes, and this makes AND queries:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
If the conditions are too tight, Solr may return few or zero leasehold properties. Because this is not good for the site visitors and also for the owners, the owner may want to recommend that visitors relax the conditions to something like:
{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
or:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}
And if the relaxed query gets a larger numFound than the original, the web page can provide a link with a comment: "if you can pay an additional $100, ${numFound} properties will be found!". Today, I need to implement a client for this scenario, but this way requires two round trips to show one page and introduces a consistency problem (and is laborious, of course!). I'm thinking of a new SearchComponent that can be used with QueryComponent. It does an additional search when numFound of the main query is less than a threshold. Clients can specify via request parameters how the query can be relaxed. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
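[Editorial note: for readers curious what such a component might look like, below is a minimal, untested sketch against Solr's SearchComponent API. The relax.q and relax.threshold parameter names are hypothetical, invented for illustration; no patch exists yet on this issue, so this is a sketch of the idea, not the implementation:
{code}
import java.io.IOException;

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.DocList;
import org.apache.solr.search.DocListAndSet;
import org.apache.solr.search.QParser;

public class RelaxQueryComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing to prepare; QueryComponent runs the main query first
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    SolrParams params = rb.req.getParams();
    // hypothetical parameters: the relaxed query and the numFound threshold
    String relaxedQ = params.get("relax.q");
    int threshold = params.getInt("relax.threshold", -1);
    if (relaxedQ == null || threshold < 0) return;

    DocListAndSet results = rb.getResults();
    if (results == null || results.docList.matches() >= threshold) return;

    try {
      // re-run the search with the relaxed query, same filters and sort
      Query q = QParser.getParser(relaxedQ, null, rb.req).getQuery();
      DocList relaxed = rb.req.getSearcher().getDocList(
          q, rb.getFilters(), rb.getSortSpec().getSort(), 0, 0, rb.getFieldFlags());
      // report how many docs the relaxed query would match
      rb.rsp.add("relaxedNumFound", relaxed.matches());
    } catch (ParseException e) {
      throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, e.getMessage(), e);
    }
  }

  @Override
  public String getDescription() { return "relaxes the main query when too few results"; }
  @Override
  public String getSource() { return "$URL$"; }
  @Override
  public String getSourceId() { return "$Id$"; }
  @Override
  public String getVersion() { return "$Revision$"; }
}
{code}
]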
[jira] Updated: (SOLR-1878) RelaxQueryComponent - A new SearchComponent that relaxes the main query in a semiautomatic way
[ https://issues.apache.org/jira/browse/SOLR-1878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1878: - Summary: RelaxQueryComponent - A new SearchComponent that relaxes the main query in a semiautomatic way (was: RelaxQueryComponent - A new SearchComponent that relaxes the main in a semiautomatic way) Description: I have the following use case: Imagine that you visit a web page for searching for an apartment for rent. You choose parameters, usually by marking check boxes, and this makes AND queries:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
If the conditions are too tight, Solr may return few or zero leasehold properties. Because this is not good for the site visitors and also for the owners, the owner may want to recommend that visitors relax the conditions to something like:
{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
or:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}
And if the relaxed query gets a larger numFound than the original, the web page can provide a link with a comment: "if you can pay an additional $100, ${numFound} properties will be found!". Today, I need to implement a Solr client for this scenario, but this way requires two round trips to show one page and introduces a consistency problem (and is laborious, of course!). I'm thinking of a new SearchComponent that can be used with QueryComponent. It does an additional search when numFound of the main query is less than a threshold. Clients can specify via request parameters how the query can be relaxed.
was: I have the following use case: Imagine that you visit a web page for searching for an apartment for rent. You choose parameters, usually by marking check boxes, and this makes AND queries:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
If the conditions are too tight, Solr may return few or zero leasehold properties. Because this is not good for the site visitors and also for the owners, the owner may want to recommend that visitors relax the conditions to something like:
{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
or:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}
And if the relaxed query gets a larger numFound than the original, the web page can provide a link with a comment: "if you can pay an additional $100, ${numFound} properties will be found!". Today, I need to implement a client for this scenario, but this way requires two round trips to show one page and introduces a consistency problem (and is laborious, of course!). I'm thinking of a new SearchComponent that can be used with QueryComponent. It does an additional search when numFound of the main query is less than a threshold. Clients can specify via request parameters how the query can be relaxed.
RelaxQueryComponent - A new SearchComponent that relaxes the main query in a semiautomatic way -- Key: SOLR-1878 URL: https://issues.apache.org/jira/browse/SOLR-1878 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor I have the following use case: Imagine that you visit a web page for searching for an apartment for rent. You choose parameters, usually by marking check boxes, and this makes AND queries:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
If the conditions are too tight, Solr may return few or zero leasehold properties.
Because this is not good for the site visitors and also for the owners, the owner may want to recommend that visitors relax the conditions to something like:
{code}
rent:[* TO 1700] AND bedroom:[2 TO *] AND floor:[100 TO *]
{code}
or:
{code}
rent:[* TO 1500] AND bedroom:[2 TO *] AND floor:[90 TO *]
{code}
And if the relaxed query gets a larger numFound than the original, the web page can provide a link with a comment: "if you can pay an additional $100, ${numFound} properties will be found!". Today, I need to implement a Solr client for this scenario, but this way requires two round trips to show one page and introduces a consistency problem (and is laborious, of course!). I'm thinking of a new SearchComponent that can be used with QueryComponent. It does an additional search when numFound of the main query is less than a threshold. Clients can specify via request parameters how the query can be relaxed. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] Created: (SOLR-1868) Cutover to flex APIs
Michael McCandless (JIRA) wrote: Cutover to flex APIs Key: SOLR-1868 URL: https://issues.apache.org/jira/browse/SOLR-1868 Project: Solr Issue Type: Bug Reporter: Michael McCandless Fix For: 3.1 We need to fix Solr to use flex APIs! Hello, I'm a latecomer on the flex issue, but I'd like to learn about it and, if possible, contribute something. But I chickened out when I saw LUCENE-1458 and its friend issues: https://issues.apache.org/jira/browse/LUCENE/fixforversion/12314439 I think I should read FlexibleIndexing on the wiki, but I'd appreciate it if someone could recommend pointers for flex latecomers, if any. Thank you! Koji -- http://www.rondhuit.com/en/
Re: (SOLR-1868) Cutover to flex APIs
for Solr is to cutover to FieldsEnum, TermsEnum, DocsEnum, DocsAndPositionsEnum instead of TermEnum, TermDocs, TermPositions. Hi Mike, These lines really help me, thanks! Koji -- http://www.rondhuit.com/en/
[jira] Updated: (SOLR-860) moreLikeThis Debug
[ https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-860: Attachment: SOLR-860.patch With the attached patch, the BooleanQueries constructed by MLT and the MLT helper function can be seen in the debug area. Sample request and response:
{code}
http://localhost:8983/solr/select/?q=solr+ipod&indent=on&mlt=on&mlt.fl=features&mlt.mintf=1&mlt.count=2&debugQuery=on&wt=json
{code}
{code}
"debug": {
  "moreLikeThis": {
    "IW-02": {
      "rawMLTQuery": "",
      "boostedMLTQuery": "",
      "realMLTQuery": "+() -id:IW-02"},
    "SOLR1000": {
      "rawMLTQuery": "",
      "boostedMLTQuery": "",
      "realMLTQuery": "+() -id:SOLR1000"},
    "F8V7067-APL-KIT": {
      "rawMLTQuery": "",
      "boostedMLTQuery": "",
      "realMLTQuery": "+() -id:F8V7067-APL-KIT"},
    "MA147LL/A": {
      "rawMLTQuery": "features:2 features:0 features:lcd features:x features:3",
      "boostedMLTQuery": "features:2 features:0 features:lcd features:x features:3",
      "realMLTQuery": "+(features:2 features:0 features:lcd features:x features:3) -id:MA147LL/A"}},
}
{code}
moreLikeThis Debug -- Key: SOLR-860 URL: https://issues.apache.org/jira/browse/SOLR-860 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Environment: Gentoo Linux, Solr 1.4, tomcat webserver Reporter: Jeff Assignee: Koji Sekiguchi Fix For: 1.5 Attachments: SOLR-860.patch The moreLikeThis SearchComponent currently has no way to debug or see information on the process. This means that if moreLikeThis suggests another document, there is no way to actually see why it picked it in order to hone the searching. Adding an explain would be extremely useful in determining the reasons why Solr is recommending the items. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-860) moreLikeThis Debug
[ https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-860: Component/s: (was: search) SearchComponents - other Priority: Minor (was: Major) Fix Version/s: (was: 1.5) 3.1 moreLikeThis Debug -- Key: SOLR-860 URL: https://issues.apache.org/jira/browse/SOLR-860 Project: Solr Issue Type: New Feature Components: SearchComponents - other Affects Versions: 1.3 Environment: Gentoo Linux, Solr 1.4, tomcat webserver Reporter: Jeff Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.1 Attachments: SOLR-860.patch The moreLikeThis SearchComponent currently has no way to debug or see information on the process. This means that if moreLikeThis suggests another document, there is no way to actually see why it picked it in order to hone the searching. Adding an explain would be extremely useful in determining the reasons why Solr is recommending the items. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-860) moreLikeThis Debug
[ https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-860: --- Assignee: Koji Sekiguchi moreLikeThis Debug -- Key: SOLR-860 URL: https://issues.apache.org/jira/browse/SOLR-860 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Environment: Gentoo Linux, Solr 1.4, tomcat webserver Reporter: Jeff Assignee: Koji Sekiguchi Fix For: 1.5 The moreLikeThis SearchComponent currently has no way to debug or see information on the process. This means that if moreLikeThis suggests another document, there is no way to actually see why it picked it in order to hone the searching. Adding an explain would be extremely useful in determining the reasons why Solr is recommending the items. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-860) moreLikeThis Debug
[ https://issues.apache.org/jira/browse/SOLR-860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851072#action_12851072 ] Koji Sekiguchi commented on SOLR-860: - At minimum, I'd like to see what the BooleanQuery constructed by MLT looks like. Can ResponseBuilder.addDebugInfo() be used for it? moreLikeThis Debug -- Key: SOLR-860 URL: https://issues.apache.org/jira/browse/SOLR-860 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Environment: Gentoo Linux, Solr 1.4, tomcat webserver Reporter: Jeff Assignee: Koji Sekiguchi Fix For: 1.5 The moreLikeThis SearchComponent currently has no way to debug or see information on the process. This means that if moreLikeThis suggests another document, there is no way to actually see why it picked it in order to hone the searching. Adding an explain would be extremely useful in determining the reasons why Solr is recommending the items. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1703) Sorting by function problems on multicore (more than one core)
[ https://issues.apache.org/jira/browse/SOLR-1703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1703: - Description: When using sort by function (for example the dist function) on multicore with more than one core (on multicore with one core, i.e. the example deployment, the problem doesn't exist), there is a problem with not using the right schema. I think the problem is in this portion of code in QueryParsing.java:
{code}
public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException {
  SolrCore core = SolrCore.getSolrCore();
  return (FunctionQuery) (QParser.getParser(func, "func", new LocalSolrQueryRequest(core, new HashMap())).parse());
  // return new FunctionQuery(parseValSource(new StrParser(func), schema));
}
{code}
The code above uses a deprecated method to get the core, sometimes getting the wrong core and making it impossible to find the right fields in the index.
was: When using sort by function (for example the dist function) on multicore with more than one core (on multicore with one core, i.e. the example deployment, the problem doesn't exist), there is a problem with not using the right schema. I think the problem is in this portion of code in QueryParsing.java:
public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException {
  SolrCore core = SolrCore.getSolrCore();
  return (FunctionQuery) (QParser.getParser(func, "func", new LocalSolrQueryRequest(core, new HashMap())).parse());
  // return new FunctionQuery(parseValSource(new StrParser(func), schema));
}
The code above uses a deprecated method to get the core, sometimes getting the wrong core and making it impossible to find the right fields in the index.
Sorting by function problems on multicore (more than one core) -- Key: SOLR-1703 URL: https://issues.apache.org/jira/browse/SOLR-1703 Project: Solr Issue Type: Bug Components: multicore, search Affects Versions: 1.5 Environment: Linux (debian, ubuntu), 64bits Reporter: Rafał Kuć When using sort by function (for example the dist function) on multicore with more than one core (on multicore with one core, i.e. the example deployment, the problem doesn't exist), there is a problem with not using the right schema. I think the problem is in this portion of code in QueryParsing.java:
{code}
public static FunctionQuery parseFunction(String func, IndexSchema schema) throws ParseException {
  SolrCore core = SolrCore.getSolrCore();
  return (FunctionQuery) (QParser.getParser(func, "func", new LocalSolrQueryRequest(core, new HashMap())).parse());
  // return new FunctionQuery(parseValSource(new StrParser(func), schema));
}
{code}
The code above uses a deprecated method to get the core, sometimes getting the wrong core and making it impossible to find the right fields in the index. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
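[Editorial note: a hedged sketch of one possible direction (not the committed fix) would be to thread the request's own core through instead of using the deprecated static lookup, e.g. by giving parseFunction access to a SolrQueryRequest:
{code}
// hypothetical variant: SolrQueryRequest.getCore() returns the core that
// received the request, so the right schema is used even with many cores
public static FunctionQuery parseFunction(String func, SolrQueryRequest req)
    throws ParseException {
  return (FunctionQuery) QParser.getParser(func, "func", req).parse();
}
{code}
]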
Re: intersection of the results of multiple queries
Seffie Schwartz wrote: hi - Is there any way to get the intersection of the results of multiple queries without iterating through each result set? seff How about using multiple fq parameters? Koji -- http://www.rondhuit.com/en/
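[Editorial note: to illustrate Koji's suggestion, each fq parameter is applied as a filter and the results are intersected, so a request like the following (field names made up for illustration) returns only documents matching all three filter queries; each fq is also cached independently in the filterCache, so repeated intersections are cheap:

    q=*:*&fq=category:book&fq=price:[* TO 20]&fq=inStock:true
]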
[jira] Updated: (SOLR-1297) Enable sorting by Function Query
[ https://issues.apache.org/jira/browse/SOLR-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1297: - Attachment: SOLR-1297-2.patch When I set a *bit* complex function as the sort parameter, I got this error:
{panel}
Must declare sort field or function
org.apache.solr.common.SolrException: Must declare sort field or function
	at org.apache.solr.search.QueryParsing.processSort(QueryParsing.java:376)
	at org.apache.solr.search.QueryParsing.parseSort(QueryParsing.java:281)
	at org.apache.solr.search.QueryParsingTest.testSort(QueryParsingTest.java:105)
{panel}
Attached are the fix and a test case. Enable sorting by Function Query Key: SOLR-1297 URL: https://issues.apache.org/jira/browse/SOLR-1297 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Assignee: Grant Ingersoll Priority: Minor Fix For: 1.5 Attachments: SOLR-1297-2.patch, SOLR-1297.patch It would be nice if one could sort by FunctionQuery. See also SOLR-773, where this was first mentioned by Yonik as part of the generic solution to geo-search -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
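[Editorial note: with this feature, a function query can appear directly in the sort parameter, for example (popularity and price are hypothetical numeric fields):
{code}
sort=div(popularity,price) desc, score desc
{code}
]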
[jira] Commented: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12839879#action_12839879 ] Koji Sekiguchi commented on SOLR-1268: -- bq. When using Dismax, the fast vector highlighter fails to return any highlighting when there is more than one field in qf (e.g. qf=Name Company)... Right. See https://issues.apache.org/jira/browse/LUCENE-2243 . Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, SOLR-1268.patch, SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833527#action_12833527 ] Koji Sekiguchi commented on SOLR-1773: -- Oops, I've glanced at the SOLR-236-related issues, but I wasn't aware of its existence. I'll look into SOLR-1682. Thanks! :) Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
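[Editorial note: the first pass of the two-pass algorithm quoted in the description can be sketched in isolation with plain Java collections. This is an illustrative standalone sketch, not Solr code; Entry and the score-ordered TreeSet stand in for Solr's internal structures:
{code}
import java.util.*;

public class FirstPassCollapse {

  static class Entry {
    final String group;
    int doc;
    float score;
    Entry(String group, int doc, float score) {
      this.group = group; this.doc = doc; this.score = score;
    }
  }

  // returns the top n collapse groups, best first (collapseCount=1 semantics)
  static Collection<Entry> collapse(Iterable<Entry> docs, int n) {
    // score-ascending, so first() is the weakest retained group head;
    // doc id breaks ties so equal scores can coexist in the TreeSet
    TreeSet<Entry> queue = new TreeSet<>(
        Comparator.comparingDouble((Entry e) -> e.score).thenComparingInt(e -> e.doc));
    Map<String, Entry> byGroup = new HashMap<>();
    for (Entry d : docs) {
      if (queue.size() >= n && d.score <= queue.first().score) continue; // can't make the cut
      Entry cur = byGroup.get(d.group);
      if (cur == null) {                  // group not yet in the top set
        byGroup.put(d.group, d);
        queue.add(d);
        if (queue.size() > n) {           // evict the weakest group head
          byGroup.remove(queue.pollFirst().group);
        }
      } else if (d.score > cur.score) {   // better doc for an existing group:
        queue.remove(cur);                // remove, update, re-add to re-sort
        cur.doc = d.doc;
        cur.score = d.score;
        queue.add(cur);
      }
    }
    return queue.descendingSet();
  }

  public static void main(String[] args) {
    List<Entry> docs = Arrays.asList(
        new Entry("siteA", 1, 3.0f), new Entry("siteB", 2, 2.5f),
        new Entry("siteA", 3, 4.0f), new Entry("siteC", 4, 1.0f));
    for (Entry e : collapse(docs, 2)) {
      System.out.println(e.group + " doc=" + e.doc + " score=" + e.score);
    }
    // prints: siteA doc=3 score=4.0, then siteB doc=2 score=2.5
  }
}
{code}
]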
[jira] Issue Comment Edited: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833527#action_12833527 ] Koji Sekiguchi edited comment on SOLR-1773 at 2/14/10 8:19 AM: --- Oops, I've glanced at the SOLR-236-related issues, but from its description I thought it was for finalizing the response format. I'll look into SOLR-1682. Thanks! :)
was (Author: koji): Oops, I've glanced at the SOLR-236-related issues, but I wasn't aware of its existence. I'll look into SOLR-1682. Thanks! :)
Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Clover 2.6.3
Why don't we move to Clover 2.6.3?

Index: build.xml
===================================================================
--- build.xml (revision 909743)
+++ build.xml (working copy)
@@ -429,7 +429,7 @@
          description="Instrument the Unit tests using Clover. Requires a Clover license and clover.jar in the ANT classpath. To use, specify -Drun.clover=true on the command line."/>
   <target name="clover.setup" if="clover.enabled">
-    <taskdef resource="clovertasks"/>
+    <taskdef resource="cloverlib.xml"/>
     <mkdir dir="${clover.db.dir}"/>
     <clover-setup initString="${clover.db.dir}/solr_coverage.db">
       <fileset dir="src/common"/>

Koji -- http://www.rondhuit.com/en/
[jira] Updated: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1773: - Attachment: SOLR-1773.patch A first draft, untested patch; use for PoC only. In this patch, I hard-coded the sort field using a java.util.Comparator. Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833495#action_12833495 ] Koji Sekiguchi commented on SOLR-1773: -- Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group|
|collapse.fl|comma- or space-delimited list of fields to return|
Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
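[Editorial note: with the parameters listed in the comment above, a request against the Solr example data might look like this (manu and name are fields from the example schema):
{code}
http://localhost:8983/solr/select/?q=ipod&collapse=on&collapse.field=manu&collapse.limit=2&collapse.fl=name
{code}
]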
[jira] Updated: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1773: - Attachment: LOADTEST.patch A very rough/simple load-test patch attached. Average QTime over 1,000 random queries:
||num docs in index||SOLR-236||SOLR-1773||
|1M|321 ms|185 ms|
|10M|2,914 ms (*)|1,642 ms|
(*) I needed to set -Xmx1024m in this case (512m for the other cases) to avoid OOM. SOLR-1773 is 43% faster. Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833495#action_12833495 ] Koji Sekiguchi edited comment on SOLR-1773 at 2/14/10 4:51 AM: --- Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported to specify sort criteria in collapse group

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group|
|collapse.fl|comma- or space-delimited list of fields to return|
was (Author: koji): Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group|
|collapse.fl|comma- or space-delimited list of fields to return|
Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1773) Field Collapsing (lightweight version)
[ https://issues.apache.org/jira/browse/SOLR-1773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833495#action_12833495 ] Koji Sekiguchi edited comment on SOLR-1773 at 2/14/10 4:54 AM: --- Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported to specify sort criteria in collapse group

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group. default is 0.|
|collapse.fl|comma- or space-delimited list of fields to return. multiValued field and TrieField are not supported yet|
was (Author: koji): Random comments on the patch:
- TimeAllowed not supported
- cache not supported
- distributed search is not supported
- sort field is hard-coded in the patch
- collapse.type=adjacent is not supported
- collapse.aggregate is not supported (but supportable)
- not yet, but collapse.sort can be supported to specify sort criteria in collapse group

supported parameters:
|collapse|set to on to use field collapsing|
|collapse.field|field name to collapse (required)|
|collapse.limit|maximum number of collapsed docs to return in each collapse group|
|collapse.fl|comma- or space-delimited list of fields to return|
Field Collapsing (lightweight version) -- Key: SOLR-1773 URL: https://issues.apache.org/jira/browse/SOLR-1773 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Attachments: LOADTEST.patch, SOLR-1773.patch I'd like to start another approach for field collapsing suggested by Yonik on 19/Dec/09 at SOLR-236. Re-posting the idea:
{code}
=== two pass collapsing algorithm for collapse.aggregate=max
First pass: pretend that collapseCount=1
- Use a TreeSet as a priority queue since one can remove and insert entries.
- A HashMap<Key,TreeSetEntry> will be used to map from collapse group to top entry in the TreeSet
- compare new doc with smallest element in treeset. If smaller, discard and go to the next doc.
- If new doc is bigger, look up its group. Use the Map to find if the group has been added to the TreeSet and add it if not.
- If the new bigger doc's group is already in the TreeSet, compare with the document in that group. If bigger, update the node, remove and re-add to the TreeSet to re-sort.

efficiency: the treeset and hashmap are both only the size of the top number of docs we are looking at (10 for instance)

We will now have the top 10 documents collapsed by the right field with a collapseCount of 1. Put another way, we have the top 10 groups.

Second pass (if collapseCount>1):
- create a priority queue for each group (10) of size collapseCount
- re-execute the query (or if the sort within the collapse groups does not involve score, we could just use the docids gathered during phase 1)
- for each document, find its appropriate priority queue and insert
- optimization: we can use the previous info from phase1 to even avoid creating a priority queue if no other items matched.

So instead of creating collapse groups for every group in the set (as is done now?), we create it for only 10 groups. Instead of collecting the score for every document in the set (40MB per request for a 10M doc index is *big*) we re-execute the query if needed. We could optionally store the score as is done now... but I bet aggregate throughput on large indexes would be better by just re-executing.

Other thought: we could also cache the first phase in the query cache which would allow one to quickly move to the 2nd phase for any collapseCount.
{code}
The restriction is:
{quote}
one would not be able to tell the total number of collapsed docs, or the total number of hits (or the DocSet) after collapsing. So only collapse.facet=before would be supported.
{quote}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Attachment: SOLR-1268.patch The patch includes:
# eliminate the hl.useHighlighter parameter
# introduce the hl.useFastVectorHighlighter parameter; the default is false
Therefore, the Highlighter will be used unless hl.useFastVectorHighlighter is set to true. I'll commit in a few days. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, SOLR-1268.patch, SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
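[Editorial note: per the related comments on this issue, a field highlighted with FVH must be indexed with term vectors, positions, and offsets all enabled, and the request must opt in; something like:
{code}
<field name="features" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
{code}
{code}
q=solr&hl=on&hl.fl=features&hl.useFastVectorHighlighter=true
{code}
]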
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829522#action_12829522 ] Koji Sekiguchi commented on SOLR-236: - The following snippet is in CollapseComponent.doProcess():
{code}
DocListAndSet results = searcher.getDocListAndSet(rb.getQuery(),
    collapseResult == null ? rb.getFilters() : null,
    collapseResult.getCollapsedDocset(),
    rb.getSortSpec().getSort(),
    rb.getSortSpec().getOffset(),
    rb.getSortSpec().getCount(),
    rb.getFieldFlags());
{code}
The 2nd line implies that collapseResult may be null. If it is null, don't we get an NPE on the 3rd line? Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called Field collapsing, used in order to collapse a group of results with similar values for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection: http://www.fastsearch.com/glossary.aspx?m=48&amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results, collapse.type: normal (default value) or adjacent, and collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for the current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1753) StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search
[ https://issues.apache.org/jira/browse/SOLR-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1753: - Affects Version/s: (was: 1.5) Fix Version/s: 1.5 StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search - Key: SOLR-1753 URL: https://issues.apache.org/jira/browse/SOLR-1753 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: Windows Reporter: Janne Majaranta Assignee: Koji Sekiguchi Fix For: 1.5 Attachments: SOLR-1753.patch When using the StatsComponent with a sharded request and getting statistics over facets, a NullPointerException is thrown. Stacktrace:
java.lang.NullPointerException
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:54)
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:82)
	at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:116)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Unknown Source)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1753) StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search
[ https://issues.apache.org/jira/browse/SOLR-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829914#action_12829914 ] Koji Sekiguchi commented on SOLR-1753: -- Patch looks good! Will commit shortly. StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search - Key: SOLR-1753 URL: https://issues.apache.org/jira/browse/SOLR-1753 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: Windows Reporter: Janne Majaranta Assignee: Koji Sekiguchi Fix For: 1.5 Attachments: SOLR-1753.patch When using the StatsComponent with a sharded request and getting statistics over facets, a NullPointerException is thrown. Stacktrace:
java.lang.NullPointerException
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:54)
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:82)
	at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:116)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Unknown Source)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1753) StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search
[ https://issues.apache.org/jira/browse/SOLR-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1753. -- Resolution: Fixed Committed revision 906781. Thanks Janne! StatsComponent throws java.lang.NullPointerException when getting statistics for facets in distributed search - Key: SOLR-1753 URL: https://issues.apache.org/jira/browse/SOLR-1753 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: Windows Reporter: Janne Majaranta Assignee: Koji Sekiguchi Fix For: 1.5 Attachments: SOLR-1753.patch When using the StatsComponent with a sharded request and getting statistics over facets, a NullPointerException is thrown. Stacktrace:
java.lang.NullPointerException
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:54)
	at org.apache.solr.handler.component.StatsValues.accumulate(StatsValues.java:82)
	at org.apache.solr.handler.component.StatsComponent.handleResponses(StatsComponent.java:116)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:290)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454)
	at java.lang.Thread.run(Unknown Source)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Attachment: SOLR-1268-0_fragsize.patch Hmm, FVH doesn't work appropriately when fragsize=Integer.MAX_VALUE (see test0FragSize() in the attached patch; it indicates FVH cannot produce the whole snippet when fragsize=Integer.MAX_VALUE). Now I think the (traditional) Highlighter should be the default even if the highlighting field's termVectors/termPositions/termOffsets are all true; FVH will be used only when hl.useFastVectorHighlighter is set to true. The hl.useFastVectorHighlighter parameter accepts per-field overrides. Plus, FVH doesn't support a fragsize of 0. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268-0_fragsize.patch, SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
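For reference, per-field overrides use Solr's usual f.<fieldname>. parameter prefix, so a request could enable FVH globally but fall back to the traditional Highlighter for one field. A hypothetical request fragment (field names invented):
{code}
hl=true&hl.fl=title,content&hl.useFastVectorHighlighter=true&f.content.hl.useFastVectorHighlighter=false
{code}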
[jira] Commented: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12828039#action_12828039 ] Koji Sekiguchi commented on SOLR-236: - A random comment: don't we need to check that collapse.field is indexed in checkCollapseField()? {code} protected void checkCollapseField(IndexSchema schema) { SchemaField schemaField = schema.getFieldOrNull(collapseField); if (schemaField == null) { throw new RuntimeException("Could not collapse, because collapse field does not exist in the schema."); } if (schemaField.multiValued()) { throw new RuntimeException("Could not collapse, because collapse field is multivalued"); } if (schemaField.getType().isTokenized()) { throw new RuntimeException("Could not collapse, because collapse field is tokenized"); } } {code} When I accidentally specified an unindexed field for collapse.field, I got an unexpected result without any errors. Field collapsing Key: SOLR-236 URL: https://issues.apache.org/jira/browse/SOLR-236 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Emmanuel Keller Assignee: Shalin Shekhar Mangar Fix For: 1.5 Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch This patch includes a new feature called Field collapsing, used in order to collapse a group of results with a similar value for a given field to a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated more documents from this site link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48amid=299 The implementation adds 3 new query parameters (SolrParams): collapse.field to choose the field used to group results; collapse.type, normal (default value) or adjacent; collapse.max to select how many continuous results are allowed before collapsing. TODO (in progress): - More documentation (on source code) - Test cases Two patches: - field_collapsing.patch for the current development version - field_collapsing_1.1.0.patch for Solr-1.1.0 P.S.: Feedback and misspelling corrections are welcome ;-) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
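A sketch of the extra guard suggested above, in the same style as the quoted checkCollapseField() (an illustration of the suggestion, not the committed fix):
{code}
if (!schemaField.indexed()) {
  throw new RuntimeException("Could not collapse, because collapse field is not indexed");
}
{code}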
Re: configure FastVectorHighlighter in trunk
Marc Sturlese wrote: I think it fails when using defType dismax with more than one field. It doesn't work in the default Solr example either. I have added the default .xml files with docs; using the standard requestHandler it works, but it doesn't when using the dismax requestHandler. Ah, I see. FVH doesn't support DisjunctionMaxQuery. I should have noticed it when you indicated that you used dismax in the previous mail. Sorry about that. I'll open an issue in Lucene and try to write a patch. Thank you, Koji -- http://www.rondhuit.com/en/
Re: configure FastVectorHighlighter in trunk
Koji Sekiguchi wrote: Marc Sturlese wrote: I think it fails when using defType dismax with more than one field. It doesn't work in the default Solr example either. I have added the default .xml files with docs; using the standard requestHandler it works, but it doesn't when using the dismax requestHandler. Ah, I see. FVH doesn't support DisjunctionMaxQuery. I should have noticed it when you indicated that you used dismax in the previous mail. Sorry about that. I'll open an issue in Lucene and try to write a patch. Thank you, Koji Opened: https://issues.apache.org/jira/browse/LUCENE-2243 Koji -- http://www.rondhuit.com/en/
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Attachment: SOLR-1268-0_fragsize.patch {quote} I have noticed an exception is thrown when using fragSize = 0 (which should return the whole field highlighted): fragCharSize(0) is too small. It must be 18 or higher. java.lang.IllegalArgumentException: fragCharSize(0) is too small. It must be 18 or higher {quote} Thanks, Marc. Solr 1.4 uses NullFragmenter, which highlights the whole content when you set fragsize to 0. But FVH doesn't have such a feature because it uses a different algorithm. In the attached patch, Solr sets fragsize to Integer.MAX_VALUE if the user tries to set 0 while FVH is used. This prevents the runtime error. I think it is necessary at the Solr level because Solr automatically switches to FVH when the highlighting field's termVectors/termPositions/termOffsets are all true, unless hl.useHighlighter is set to true. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268-0_fragsize.patch, SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
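The clamping described above might look roughly like this, using the getFieldInt lookup quoted elsewhere in this thread (an assumed shape, not the attached patch itself):
{code}
// FVH rejects fragsize=0 ("fragCharSize(0) is too small"), so when FVH is in
// use, map the user's 0 ("whole field") to Integer.MAX_VALUE instead.
int fragSize = params.getFieldInt(fieldName, HighlightParams.FRAGSIZE, 100);
if (fragSize == 0) {
  fragSize = Integer.MAX_VALUE;
}
{code}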
Re: configure FastVectorHighlighter in trunk
Marc Sturlese wrote: Can you give me the following info to reproduce the problem? * field data: all fields are plain English text analyzed with the same analyzer. I meant I'd like to know your concrete data... Koji -- http://www.rondhuit.com/en/
Re: configure FastVectorHighlighter in trunk
Can you give me the following info to reproduce the problem? * field data * query string * field definition in schema.xml **I also have noticed that setting the snippet fragment size to 0 (which in normal highlighting returns the whole field highlighted) gives an error. Hmm, I should check it. Can you open a JIRA issue? Thank you, Koji -- http://www.rondhuit.com/en/ Marc Sturlese wrote: I am having some trouble making it work. I am debugging the code and I see that when the FastVectorHighlighter constructor is called, the parameters it receives are OK // get FastVectorHighlighter instance out of the processing loop FastVectorHighlighter fvh = new FastVectorHighlighter( // FVH cannot process hl.usePhraseHighlighter parameter per-field basis params.getBool( HighlightParams.USE_PHRASE_HIGHLIGHTER, true ), // FVH cannot process hl.requireFieldMatch parameter per-field basis params.getBool( HighlightParams.FIELD_MATCH, false ), getFragListBuilder( params ), getFragmentsBuilder( params ) ); The query here is OK as well: FieldQuery fieldQuery = fvh.getFieldQuery( query ); But I can't see what's in fieldQuery (just a memory reference; I don't know how to do something similar to toString()). The problem I see is in: String[] snippets = highlighter.getBestFragments( fieldQuery, req.getSearcher().getReader(), docId, fieldName, params.getFieldInt( fieldName, HighlightParams.FRAGSIZE, 100 ), params.getFieldInt( fieldName, HighlightParams.SNIPPETS, 1 ) ); snippets ends up as an empty array, so it jumps to: alternateField( docSummaries, params, doc, fieldName ); In solrconfig.xml I added: fragListBuilder name=simple class=org.apache.solr.highlight.SimpleFragListBuilder default=false/ fragmentsBuilder name=colored class=org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder default=false/ Maybe I am missing something... any idea? Using doHighlightingByHighlighter, highlighting works perfectly. **I also have noticed that setting the snippet fragment size to 0 (which in normal highlighting returns the whole field highlighted) gives an error. Koji Sekiguchi-2 wrote: Marc Sturlese wrote: How do I activate FastVectorHighlighter in trunk? Which of those params sets it up? !-- Configure the standard fragListBuilder -- fragListBuilder name=simple class=org.apache.solr.highlight.SimpleFragListBuilder default=true/ !-- Configure the standard fragmentsBuilder -- fragmentsBuilder name=colored class=org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder default=true/ fragmentsBuilder name=scoreOrder class=org.apache.solr.highlight.ScoreOrderFragmentsBuilder default=true/ Thanks in advance. You do not need to activate it. DefaultSolrHighlighter, which is the default SolrHighlighter impl, automatically uses FVH when you specify, through the hl.fl parameter, fields whose termVectors, termPositions and termOffsets are all true. If you want to use the multi-colored tag feature, you need to specify MultiColored*FragmentsBuilder in solrconfig.xml. Koji -- http://www.rondhuit.com/en/
Re: configure FastVectorHighlighter in trunk
Marc Sturlese wrote: How do I activate FastVectorHighlighter in trunk? Which of those params sets it up? !-- Configure the standard fragListBuilder -- fragListBuilder name=simple class=org.apache.solr.highlight.SimpleFragListBuilder default=true/ !-- Configure the standard fragmentsBuilder -- fragmentsBuilder name=colored class=org.apache.solr.highlight.MultiColoredScoreOrderFragmentsBuilder default=true/ fragmentsBuilder name=scoreOrder class=org.apache.solr.highlight.ScoreOrderFragmentsBuilder default=true/ Thanks in advance. You do not need to activate it. DefaultSolrHighlighter, which is the default SolrHighlighter impl, automatically uses FVH when you specify, through the hl.fl parameter, fields whose termVectors, termPositions and termOffsets are all true. If you want to use the multi-colored tag feature, you need to specify MultiColored*FragmentsBuilder in solrconfig.xml. Koji -- http://www.rondhuit.com/en/
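For reference, a field that would trigger FVH automatically needs all three term vector options enabled in schema.xml; a hypothetical example (field and type names invented):
{code}
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
{code}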
Re: how to sort facets?
David Rühr wrote: Hi, we built a filter with the faceting feature. In our facet list the order is by match count (facet.sort=count), but we need to sort by manufacturer (facet.sort=manufacturer). URL manipulation doesn't change anything; why? select?fl=*%2Cscore&fq=type%3Apage&spellcheck=true&facet=true&facet.mincount=1&facet.sort=manufacturer&bf=log(supplier_faktor)&facet.field=supplier&facet.field=manufacturer&version=1.2&q=kind&start=0&rows=10 So long, David Try facet.sort=index. facet.sort accepts only count or index. http://wiki.apache.org/solr/SimpleFacetParameters#facet.sort Koji -- http://www.rondhuit.com/en/
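Concretely, the request above would carry facet.sort=index instead of facet.sort=manufacturer; a reduced example:
{code}
select?q=kind&facet=true&facet.field=supplier&facet.field=manufacturer&facet.mincount=1&facet.sort=index
{code}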
[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting
[ https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12804087#action_12804087 ] Koji Sekiguchi commented on SOLR-1731: -- So why don't you use uni-grams on both index and query for the sku field? {code} fieldType name=text_1g class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=mapping.txt/ tokenizer class=solr.NGramTokenizerFactory minGramSize=1 maxGramSize=1/ filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.NGramTokenizerFactory minGramSize=1 maxGramSize=1/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType {code} {quote} As far as my application cares, those are all equivalent and should just be indexed as: a1280c {quote} To eliminate space/period/hyphen, mapping.txt would look like: {code} " " => "" "." => "" "-" => "" {code} ArrayIndexOutOfBoundsException when highlighting Key: SOLR-1731 URL: https://issues.apache.org/jira/browse/SOLR-1731 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.4 Reporter: Tim Underwood Priority: Minor I'm seeing a java.lang.ArrayIndexOutOfBoundsException when trying to highlight for certain queries. The error seems to be an issue with the combination of the ShingleFilterFactory, PositionFilterFactory and the LengthFilterFactory. Here's my fieldType definition: fieldType name=textSku class=solr.TextField positionIncrementGap=100 omitNorms=true analyzer type=index tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.LengthFilterFactory min=2 max=100/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.ShingleFilterFactory maxShingleSize=8 outputUnigrams=true/ filter class=solr.PositionFilterFactory / filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.LengthFilterFactory min=2 max=100/ !-- works if this is commented out -- /analyzer /fieldType Here's the field definition: field name=sku_new type=textSku indexed=true stored=true omitNorms=true/ Here's a sample doc: add doc field name=id1/field field name=sku_newA 1280 C/field /doc /add Doing a query for sku_new:A 1280 C and requesting highlighting throws the exception (full stack trace below): http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=* If I comment out the LengthFilterFactory from my query analyzer section, everything seems to work. Commenting out just the PositionFilterFactory also makes the exception go away and seems to work for this specific query. 
Full stack trace: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216
[jira] Commented: (SOLR-1731) ArrayIndexOutOfBoundsException when highlighting
[ https://issues.apache.org/jira/browse/SOLR-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12803976#action_12803976 ] Koji Sekiguchi commented on SOLR-1731: -- Can't you use WhitespaceTokenizer for the index? ArrayIndexOutOfBoundsException when highlighting Key: SOLR-1731 URL: https://issues.apache.org/jira/browse/SOLR-1731 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.4 Reporter: Tim Underwood Priority: Minor I'm seeing a java.lang.ArrayIndexOutOfBoundsException when trying to highlight for certain queries. The error seems to be an issue with the combination of the ShingleFilterFactory, PositionFilterFactory and the LengthFilterFactory. Here's my fieldType definition: fieldType name=textSku class=solr.TextField positionIncrementGap=100 omitNorms=true analyzer type=index tokenizer class=solr.KeywordTokenizerFactory / filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.LengthFilterFactory min=2 max=100/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.ShingleFilterFactory maxShingleSize=8 outputUnigrams=true/ filter class=solr.PositionFilterFactory / filter class=solr.WordDelimiterFilterFactory generateWordParts=0 generateNumberParts=0 catenateWords=0 catenateNumbers=0 catenateAll=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ filter class=solr.LengthFilterFactory min=2 max=100/ !-- works if this is commented out -- /analyzer /fieldType Here's the field definition: field name=sku_new type=textSku indexed=true stored=true omitNorms=true/ Here's a sample doc: add doc field name=id1/field field name=sku_newA 1280 C/field /doc /add Doing a query for sku_new:A 1280 C and requesting highlighting throws the exception (full stack trace below): http://localhost:8983/solr/select/?q=sku_new%3A%22A+1280+C%22&version=2.2&start=0&rows=10&indent=on&hl=on&hl.fl=sku_new&fl=* If I comment out the LengthFilterFactory from my query analyzer section, everything seems to work. Commenting out just the PositionFilterFactory also makes the exception go away and seems to work for this specific query. 
Full stack trace: java.lang.ArrayIndexOutOfBoundsException: -1 at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:202) at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414) at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216) at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184) at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226) at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335) at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821
Upgrading Lucene jars
I'd like to upgrade all Lucene jars to the latest 2.9 branch (r900222). If there are no objections, I'll commit tomorrow. I'm now testing the Lucene 2.9 branch and Solr trunk with the latest 2.9. Thank you, Koji -- http://www.rondhuit.com/en/
Re: Build failed in Hudson: Solr-trunk #1027
http://hudson.zones.apache.org/hudson/job/Solr-trunk/1027/testReport/org.apache.solr.request/TestWriterPerf/testPerf/ The cause of this failure is that the undefined field t1 is set in hl.fl in the test code. Before FastVectorHighlighter was committed, it seems undefined fields were ignored. I think I should ignore them in FVH, too. I'm looking into it... Koji -- http://www.rondhuit.com/en/ Apache Hudson Server wrote: See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1027/ -- [...truncated 2343 lines...] [junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.49 sec [junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.574 sec [junit] Running org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.977 sec [junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.506 sec [junit] Running org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.618 sec [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 17.669 sec [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest [junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 33.972 sec [junit] Running org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest [junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 39.944 sec [junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.917 sec [junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.375 sec [junit] Running org.apache.solr.client.solrj.response.AnlysisResponseBaseTest [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.43 sec [junit] Running org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.488 sec [junit] Running org.apache.solr.client.solrj.response.FieldAnalysisResponseTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.507 sec [junit] Running org.apache.solr.client.solrj.response.QueryResponseTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.768 sec [junit] Running org.apache.solr.client.solrj.response.TermsResponseTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.705 sec [junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 13.645 sec [junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.408 sec [junit] Running org.apache.solr.common.SolrDocumentTest [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.549 sec [junit] Running org.apache.solr.common.params.ModifiableSolrParamsTest [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.48 sec [junit] Running org.apache.solr.common.params.SolrParamTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.431 sec [junit] Running org.apache.solr.common.util.ContentStreamTest [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.682 sec [junit] Running 
org.apache.solr.common.util.DOMUtilTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.565 sec [junit] Running org.apache.solr.common.util.FileUtilsTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.433 sec [junit] Running org.apache.solr.common.util.IteratorChainTest [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.446 sec [junit] Running org.apache.solr.common.util.NamedListTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.396 sec [junit] Running org.apache.solr.common.util.TestFastInputStream [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.547 sec [junit] Running org.apache.solr.common.util.TestHash [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.698 sec [junit] Running org.apache.solr.common.util.TestNamedListCodec [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 2.891 sec [junit] Running org.apache.solr.common.util.TestXMLEscaping [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.381 sec [junit] Running org.apache.solr.core.AlternateDirectoryTest [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 5.552 sec [junit]
Re: Build failed in Hudson: Solr-trunk #1027
Koji Sekiguchi wrote: http://hudson.zones.apache.org/hudson/job/Solr-trunk/1027/testReport/org.apache.solr.request/TestWriterPerf/testPerf/ The cause of this failure is that the undefined field t1 is set in hl.fl in the test code. Before FastVectorHighlighter was committed, it seems undefined fields were ignored. I think I should ignore them in FVH, too. I'm looking into it... Koji Committed revision 897611. Koji -- http://www.rondhuit.com/en/
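The guard implied here could be as simple as filtering hl.fl down to schema-defined fields before FVH processes them; a hypothetical sketch (not the committed revision 897611):
{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.schema.IndexSchema;

// Keep only the requested highlight fields that actually exist in the
// schema, so an undefined field like t1 is silently skipped as before.
static List<String> definedFieldsOnly(IndexSchema schema, String[] requested) {
  List<String> defined = new ArrayList<String>();
  for (String name : requested) {
    if (schema.getFieldOrNull(name) != null) {
      defined.add(name);
    }
  }
  return defined;
}
{code}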
[jira] Updated: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1696: - Attachment: SOLR-1696.patch A new patch attached. Just synced with trunk, plus a warning log when the deprecated syntax is found (the idea Chris mentioned above). Deprecate old highlighting syntax and move configuration to HighlightComponent Key: SOLR-1696 URL: https://issues.apache.org/jira/browse/SOLR-1696 Project: Solr Issue Type: Improvement Components: highlighter Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1696.patch, SOLR-1696.patch There is no reason why we should have a custom syntax for highlighter configuration. It can be treated like any other SearchComponent and all the configuration can go in there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
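The warning could take roughly this shape (a hypothetical sketch; the config lookup and message wording are assumptions, not the attached patch):
{code}
import org.w3c.dom.Node;

// If the deprecated top-level highlighting element is still present in
// solrconfig.xml, warn instead of failing so old configs keep working.
Node legacyHl = solrConfig.getNode("highlighting", false); // signature assumed
if (legacyHl != null) {
  log.warn("Deprecated highlighting configuration found in solrconfig.xml; "
      + "move it into the HighlightComponent searchComponent configuration.");
}
{code}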
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12798271#action_12798271 ] Koji Sekiguchi commented on SOLR-1653: -- Thanks, Paul! I've just committed revision 897357. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch, SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1268. -- Resolution: Fixed Committed revision 897383. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12798312#action_12798312 ] Koji Sekiguchi commented on SOLR-1696: -- I've just committed SOLR-1268. Now I'm working on a patch for this issue, synced with trunk... Deprecate old highlighting syntax and move configuration to HighlightComponent Key: SOLR-1696 URL: https://issues.apache.org/jira/browse/SOLR-1696 Project: Solr Issue Type: Improvement Components: highlighter Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1696.patch There is no reason why we should have a custom syntax for highlighter configuration. It can be treated like any other SearchComponent and all the configuration can go in there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent
[ https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797841#action_12797841 ] Koji Sekiguchi commented on SOLR-1696: -- Noble, thank you for opening this and attaching the patch! Are you planning to commit this shortly? I ask because I'm ready to commit SOLR-1268, which uses the old-style config. If you commit this first, I'll rewrite SOLR-1268. Or I can assign SOLR-1696 to myself. Deprecate old highlighting syntax and move configuration to HighlightComponent Key: SOLR-1696 URL: https://issues.apache.org/jira/browse/SOLR-1696 Project: Solr Issue Type: Improvement Components: highlighter Reporter: Noble Paul Fix For: 1.5 Attachments: SOLR-1696.patch There is no reason why we should have a custom syntax for highlighter configuration. It can be treated like any other SearchComponent and all the configuration can go in there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796147#action_12796147 ] Koji Sekiguchi commented on SOLR-1268: -- I'll commit in a few days if nobody objects. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796075#action_12796075 ] Koji Sekiguchi commented on SOLR-1268: -- In this patch I'm introducing new fragListBuilder/ and fragmentsBuilder/ sub-tags of highlighting/ in solrconfig.xml, rather than using searchComponent/. I think we can open a separate ticket for moving the highlighting/ settings to searchComponent/, if needed. FYI: http://old.nabble.com/highlighting-setting-in-solrconfig.xml-td26984003.html Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268.patch, SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Attachment: SOLR-1268.patch First draft, untested patch attached. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1268.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1670) synonymfilter/map repeat bug
[ https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792920#action_12792920 ] Koji Sekiguchi commented on SOLR-1670: -- bq. the test for 'repeats' has a flaw, it uses this assertTokEqual construct which does not really validate that two lists of tokens are equal, it just stops at the shortest one. I agree with you regarding this part. But I'm not sure that the following size() should be 1 in your patch: {code} +assertEquals(1, getTokList(map, "a b", false).size()); {code} If what repeats implies is intentionally repeating the same term, I think it can boost tf. synonymfilter/map repeat bug Key: SOLR-1670 URL: https://issues.apache.org/jira/browse/SOLR-1670 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir Attachments: SOLR-1670_test.patch As part of converting tests for SOLR-1657, I ran into a problem with SynonymFilter: the test for 'repeats' has a flaw, it uses this assertTokEqual construct which does not really validate that two lists of tokens are equal, it just stops at the shortest one. {code} // repeats map.add(strings("a b"), tokens("ab"), orig, merge); map.add(strings("a b"), tokens("ab"), orig, merge); assertTokEqual(getTokList(map, "a b", false), tokens("ab")); /* in reality the result from getTokList is "ab ab ab"! */ {code} When converted to assertTokenStreamContents this problem surfaced. Attached is an additional assertion to the existing testcase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
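A stricter assertion than assertTokEqual would compare lengths as well as contents, so extra repeated tokens surface; a minimal sketch, assuming the test's getTokList yields a List of Lucene 2.9 Tokens and JUnit's assertEquals is available:
{code}
import java.util.List;
import org.apache.lucene.analysis.Token;
import static junit.framework.Assert.assertEquals;

// Fail if the token list is longer or shorter than expected, not just
// when a compared prefix differs.
static void assertTokensEqual(List<Token> actual, String... expected) {
  assertEquals("wrong number of tokens", expected.length, actual.size());
  for (int i = 0; i < expected.length; i++) {
    assertEquals(expected[i], actual.get(i).term());
  }
}
{code}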
[jira] Commented: (SOLR-1670) synonymfilter/map repeat bug
[ https://issues.apache.org/jira/browse/SOLR-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792928#action_12792928 ] Koji Sekiguchi commented on SOLR-1670: -- Robert, sorry, I wanted to say that I agree with you that the test for 'repeats' has a flaw. The point about boosting tf was just an input, though I don't know whether it is an intentional feature or a side effect. Why don't you fix the flaws in the SynonymFilter test in this ticket first, then fix SOLR-1674? (I've not looked into SOLR-1674 yet.) synonymfilter/map repeat bug Key: SOLR-1670 URL: https://issues.apache.org/jira/browse/SOLR-1670 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Robert Muir Attachments: SOLR-1670_test.patch As part of converting tests for SOLR-1657, I ran into a problem with SynonymFilter: the test for 'repeats' has a flaw, it uses this assertTokEqual construct which does not really validate that two lists of tokens are equal, it just stops at the shortest one. {code} // repeats map.add(strings("a b"), tokens("ab"), orig, merge); map.add(strings("a b"), tokens("ab"), orig, merge); assertTokEqual(getTokList(map, "a b", false), tokens("ab")); /* in reality the result from getTokList is "ab ab ab"! */ {code} When converted to assertTokenStreamContents this problem surfaced. Attached is an additional assertion to the existing testcase. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1653. -- Resolution: Fixed Committed revision 890798. Thanks Shalin and Noble for taking time to review the patch. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch, SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790056#action_12790056 ] Koji Sekiguchi commented on SOLR-1653: -- Ok. I'll show you some samples ;-) ||INPUT||groupedPattern||replaceGroups||OUTPUT||comment|| |see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of a word| |see-ing looking|(\w+)ing|1|see-ing look|same as above; the 2nd parentheses can be omitted| |No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literals; do not forget to set blockDelimiters to something other than period when you use a period in groupedPattern| |abc-1234-5678|(\w+)-(\d+)-(\d+)|3,{-},1,{-},2|5678-abc-1234|change the order of the groups| add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790056#action_12790056 ] Koji Sekiguchi edited comment on SOLR-1653 at 12/14/09 9:30 AM: Ok. I'll show you some samples ;-) ||INPUT||groupedPattern||replaceGroups||OUTPUT||comment|| |see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of a word| |see-ing looking|(\w+)ing|1|see-ing look|same as above; the 2nd parentheses can be omitted| |No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literals; do not forget to set blockDelimiters to something other than period when you use a period in groupedPattern| |abc=1234=5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order of the groups| was (Author: koji): Ok. I'll show you some samples ;-) ||INPUT||groupedPattern||replaceGroups||OUTPUT||comment|| |see-ing looking|(\w+)(ing)|1|see-ing look|remove ing from the end of a word| |see-ing looking|(\w+)ing|1|see-ing look|same as above; the 2nd parentheses can be omitted| |No.1 NO. no. 543|[nN][oO]\.\s*(\d+)|{#},1|#1 NO. #543|sample for literals; do not forget to set blockDelimiters to something other than period when you use a period in groupedPattern| |abc-1234-5678|(\w+)=(\d+)=(\d+)|3,{=},1,{=},2|5678=abc=1234|change the order of the groups| add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790127#action_12790127 ] Koji Sekiguchi commented on SOLR-1653: -- bq. I guess this can be achieved with the matcher#replaceAll() directly You'd be right if we didn't need to correct the offsets of the output char stream. I need to process one match at a time. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
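Processing one match at a time, as described, typically means driving the Matcher manually instead of calling replaceAll(); a minimal sketch of the shape (assumed, not the actual SOLR-1653 implementation):
{code}
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Visit each match individually so an offset-correction entry can be
// recorded per match; matcher.replaceAll() would hide the match positions.
static String replaceRecordingOffsets(Pattern pattern, String replacement, CharSequence input) {
  Matcher m = pattern.matcher(input);
  StringBuffer sb = new StringBuffer();
  while (m.find()) {
    // here the real CharFilter would note m.start(), m.end() and the
    // replacement length to build its offset-correction table
    m.appendReplacement(sb, replacement);
  }
  m.appendTail(sb);
  return sb.toString();
}
{code}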
[jira] Updated: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1653: - Attachment: SOLR-1653.patch Excuse me: because I tried to correct offsets per group within a match when I started the first patch, I introduced my own syntax. But yes, now that I've implemented the offset correction per match, I can use the standard syntax. Here is the new patch. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory pattern=([nN][oO]\.)\s*(\d+) replaceWith=$1$2/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} If there are no objections, I'll commit later today. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch, SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790572#action_12790572 ] Koji Sekiguchi commented on SOLR-1653: -- I see that the existing PatternReplaceFilter (not CharFilter) uses pattern, but with replacement, not replaceWith. I think I'll use pattern and replacement. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch, SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1653) add PatternReplaceCharFilter
add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Fix For: 1.5 Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1653: - Attachment: SOLR-1653.patch add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-1653: Assignee: Koji Sekiguchi add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter
[ https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12789957#action_12789957 ] Koji Sekiguchi commented on SOLR-1653: -- I'll commit in a few days. add PatternReplaceCharFilter Key: SOLR-1653 URL: https://issues.apache.org/jira/browse/SOLR-1653 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 Attachments: SOLR-1653.patch Add a new CharFilter that uses a regular expression for the target of replace string in char stream. Usage: {code:title=schema.xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.PatternReplaceCharFilterFactory groupedPattern=([nN][oO]\.)\s*(\d+) replaceGroups=1,2 blockDelimiters=:;/ charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory/ /analyzer /fieldType {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Upgrading Lucene jars
Shalin Shekhar Mangar wrote: I need to upgrade contrib-spellcheck jar for SOLR-785. Should I go ahead and upgrade all Lucene jars to the latest 2.9 branch code? +1. Koji -- http://www.rondhuit.com/en/
[jira] Commented: (SOLR-1606) Integrate Near Realtime
[ https://issues.apache.org/jira/browse/SOLR-1606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12786448#action_12786448 ] Koji Sekiguchi commented on SOLR-1606: -- Jason, I got a failure when running TestRefreshReader. Integrate Near Realtime Key: SOLR-1606 URL: https://issues.apache.org/jira/browse/SOLR-1606 Project: Solr Issue Type: Improvement Components: update Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: SOLR-1606.patch We'll integrate IndexWriter.getReader. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1607) use a proper key other than IndexReader for ExternalFileField and QueryElevationComponent to work properly when reopenReaders is set to true
use a proper key other than IndexReader for ExternalFileField and QueryElevationComponent to work properly when reopenReaders is set to true Key: SOLR-1607 URL: https://issues.apache.org/jira/browse/SOLR-1607 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 With the reopenReaders feature introduced in 1.4, this prevents the external_[fieldname] and elevate.xml files in dataDir from being reloaded when a commit is submitted. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
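For context, reopenReaders is a solrconfig.xml setting; a minimal sketch of the configuration that triggers the behavior described above (placing it inside the mainIndex section follows the 1.4 example config):
{code:title=solrconfig.xml}
<mainIndex>
  <!-- when true, Solr reopens the existing reader on commit; the issue above reports
       that external_[fieldname] and elevate.xml are then not reloaded -->
  <reopenReaders>true</reopenReaders>
</mainIndex>
{code}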
[jira] Updated: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1489: - Attachment: SOLR-1489.patch Attached patch fixes the above failure, but I got another failure (no expires header): {code} Testcase: testCacheVetoHandler took 3.29 sec Testcase: testCacheVetoException took 1.395 sec FAILED We got no Expires header junit.framework.AssertionFailedError: We got no Expires header at org.apache.solr.servlet.CacheHeaderTest.checkVetoHeaders(CacheHeaderTest.java:73) at org.apache.solr.servlet.CacheHeaderTest.testCacheVetoException(CacheHeaderTest.java:59) Testcase: testLastModified took 1.485 sec Testcase: testEtag took 1.577 sec Testcase: testCacheControl took 1.035 sec {code} A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Assignee: Koji Sekiguchi Priority: Critical Attachments: error_utf8-example.xml, jetty-6.1.22.jar, jetty-util-6.1.22.jar, jettybugsample.war, jsp-2.1.zip, servlet-api-2.5-20081211.jar, SOLR-1489.patch A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1601) Schema browser does not indicate presence of charFilter
[ https://issues.apache.org/jira/browse/SOLR-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1601: - Component/s: Schema and Analysis Affects Version/s: 1.4 Fix Version/s: 1.5 Assignee: Koji Sekiguchi Schema browser does not indicate presence of charFilter --- Key: SOLR-1601 URL: https://issues.apache.org/jira/browse/SOLR-1601 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Jake Brownell Assignee: Koji Sekiguchi Priority: Trivial Fix For: 1.5 My schema has a field defined as: {noformat} fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType {noformat} and when I view the field in the schema browser, I see: {noformat} Tokenized: true Class Name: org.apache.solr.schema.TextField Index Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} Query Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt expand: true ignoreCase: true } org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} {noformat} It's not a big deal, but I expected to see some indication of the 
charFilter that is in place. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1601) Schema browser does not indicate presence of charFilter
[ https://issues.apache.org/jira/browse/SOLR-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1601: - Attachment: SOLR-1601.patch Will commit shortly. Schema browser does not indicate presence of charFilter --- Key: SOLR-1601 URL: https://issues.apache.org/jira/browse/SOLR-1601 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Jake Brownell Assignee: Koji Sekiguchi Priority: Trivial Fix For: 1.5 Attachments: SOLR-1601.patch My schema has a field defined as: {noformat} fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType {noformat} and when I view the field in the schema browser, I see: {noformat} Tokenized: true Class Name: org.apache.solr.schema.TextField Index Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} Query Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt expand: true ignoreCase: true } org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} {noformat} It's not a big deal, but I expected to see some indication of the charFilter that is in place. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1601) Schema browser does not indicate presence of charFilter
[ https://issues.apache.org/jira/browse/SOLR-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1601. -- Resolution: Fixed Committed revision 884180. Thanks, Jake. Schema browser does not indicate presence of charFilter --- Key: SOLR-1601 URL: https://issues.apache.org/jira/browse/SOLR-1601 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 1.4 Reporter: Jake Brownell Assignee: Koji Sekiguchi Priority: Trivial Fix For: 1.5 Attachments: SOLR-1601.patch My schema has a field defined as: {noformat} fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer analyzer type=query charFilter class=solr.MappingCharFilterFactory mapping=mapping-ISOLatin1Accent.txt/ tokenizer class=solr.WhitespaceTokenizerFactory / filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1 / filter class=solr.LowerCaseFilterFactory / filter class=solr.EnglishPorterFilterFactory protected=protwords.txt / filter class=solr.RemoveDuplicatesTokenFilterFactory / /analyzer /fieldType {noformat} and when I view the field in the schema browser, I see: {noformat} Tokenized: true Class Name: org.apache.solr.schema.TextField Index Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 1 generateWordParts: 1 catenateAll: 0 catenateNumbers: 1 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} Query Analyzer: org.apache.solr.analysis.TokenizerChain Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory Filters: org.apache.solr.analysis.SynonymFilterFactory args:{synonyms: synonyms.txt expand: true ignoreCase: true } org.apache.solr.analysis.StopFilterFactory args:{words: stopwords.txt ignoreCase: true enablePositionIncrements: true } org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange: 1 generateNumberParts: 1 catenateWords: 0 generateWordParts: 1 catenateAll: 0 catenateNumbers: 0 } org.apache.solr.analysis.LowerCaseFilterFactory args:{} org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected: protwords.txt } org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{} {noformat} It's not a big deal, but I expected to see some indication of the charFilter that 
is in place. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782335#action_12782335 ] Koji Sekiguchi commented on SOLR-1489: -- Thanks, Ohtani-san. Using these new jetty jars (6.1.22), I run ant test, but I got a failure: {code:title=TEST-org.apache.solr.servlet.CacheHeaderTest.txt} Testcase: testCacheVetoHandler took 2.469 sec Testcase: testCacheVetoException took 1.25 sec FAILED null expected:[no-cache, ]no-store but was:[must-revalidate,no-cache,]no-store junit.framework.ComparisonFailure: null expected:[no-cache, ]no-store but was:[must-revalidate,no-cache,]no-store at org.apache.solr.servlet.CacheHeaderTest.checkVetoHeaders(CacheHeaderTest.java:65) at org.apache.solr.servlet.CacheHeaderTest.testCacheVetoException(CacheHeaderTest.java:59) Testcase: testLastModified took 1.188 sec Testcase: testEtag took 1.11 sec Testcase: testCacheControl took 1.391 sec {code} According to SOLR-632, the cache header related test was failed when we used jetty-6.1.11, Lars filed https://jira.codehaus.org/browse/JETTY-646. Now the issue has been fixed, I thought jetty-6.1.22 should work. I've not looked into the details of cache header test, though. A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Assignee: Koji Sekiguchi Priority: Critical Attachments: error_utf8-example.xml, jetty-6.1.22.jar, jetty-util-6.1.22.jar, jettybugsample.war, jsp-2.1.zip, servlet-api-2.5-20081211.jar A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12779814#action_12779814 ] Koji Sekiguchi commented on SOLR-1489: -- Ok, http://jira.codehaus.org/browse/JETTY-1122 has been marked as fixed and jetty 6.1.22 released. Ohtani-san, can you test the new jetty with your test case to see the bug is gone? Thanks. A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Assignee: Koji Sekiguchi Priority: Critical Attachments: error_utf8-example.xml, jettybugsample.war A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1506) Search multiple cores using MultiReader
[ https://issues.apache.org/jira/browse/SOLR-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12773213#action_12773213 ] Koji Sekiguchi commented on SOLR-1506: -- bq. Commit doesn't work because reopen isn't supported by MultiReader. Regarding MultiReader and reopen, I've set reopenReaders to false: {code:title=solrconfig.xml} reopenReadersfalse/reopenReaders : indexReaderFactory name=IndexReaderFactory class=mypackage.MultiReaderFactory/ {code} Search multiple cores using MultiReader --- Key: SOLR-1506 URL: https://issues.apache.org/jira/browse/SOLR-1506 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Trivial Fix For: 1.5 Attachments: SOLR-1506.patch, SOLR-1506.patch I need to search over multiple cores, and SOLR-1477 is more complicated than expected, so here we'll create a MultiReader over the cores to allow searching on them. Maybe in the future we can add parallel searching however SOLR-1477, if it gets completed, provides that out of the box. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer
[ https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12769741#action_12769741 ] Koji Sekiguchi commented on SOLR-822: - bq. Please update the Wiki for this feature. Done. :) CharFilter - normalize characters before tokenizer -- Key: SOLR-822 URL: https://issues.apache.org/jira/browse/SOLR-822 Project: Solr Issue Type: New Feature Components: Analysis Affects Versions: 1.3 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.4 Attachments: character-normalization.JPG, japanese-h-to-k-mapping.txt, sample_mapping_ja.txt, sample_mapping_ja.txt, SOLR-822-for-1.3.patch, SOLR-822-renameMethod.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch A new plugin which can be placed in front of tokenizer/. {code:xml} fieldType name=textCharNorm class=solr.TextField positionIncrementGap=100 analyzer charFilter class=solr.MappingCharFilterFactory mapping=mapping_ja.txt / tokenizer class=solr.MappingCJKTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType {code} charFilter/ can be multiple (chained). I'll post a JPEG file to show character normalization sample soon. MOTIVATION: In Japan, there are two types of tokenizers -- N-gram (CJKTokenizer) and Morphological Analyzer. When we use morphological analyzer, because the analyzer uses Japanese dictionary to detect terms, we need to normalize characters before tokenization. I'll post a patch soon, too. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
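For readers unfamiliar with the mapping file referenced above, here is an illustrative sketch of the rule format such a file (e.g. sample_mapping_ja.txt) uses; the individual entries below are made-up examples, not the attached sample file.
{code:title=mapping_ja.txt}
# "source" => "target", one rule per line; \uXXXX escapes may be used
# e.g. normalize half-width katakana to full-width before a morphological tokenizer
"ｱ" => "ア"
"ｶﾞ" => "ガ"
# accented Latin characters can be folded the same way
"\u00C0" => "A"
{code}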
[jira] Updated: (SOLR-561) Solr replication by Solr (for windows also)
[ https://issues.apache.org/jira/browse/SOLR-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-561: Component/s: (was: replication (scripts)) replication (java) change component from scripts to java Solr replication by Solr (for windows also) --- Key: SOLR-561 URL: https://issues.apache.org/jira/browse/SOLR-561 Project: Solr Issue Type: New Feature Components: replication (java) Affects Versions: 1.4 Environment: All Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 Attachments: deletion_policy.patch, SOLR-561-core.patch, SOLR-561-fixes.patch, SOLR-561-fixes.patch, SOLR-561-fixes.patch, SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561-full.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch, SOLR-561.patch The current replication strategy in solr involves shell scripts . The following are the drawbacks with the approach * It does not work with windows * Replication works as a separate piece not integrated with solr. * Cannot control replication from solr admin/JMX * Each operation requires manual telnet to the host Doing the replication in java has the following advantages * Platform independence * Manual steps can be completely eliminated. Everything can be driven from solrconfig.xml . ** Adding the url of the master in the slaves should be good enough to enable replication. Other things like frequency of snapshoot/snappull can also be configured . All other information can be automatically obtained. * Start/stop can be triggered from solr/admin or JMX * Can get the status/progress while replication is going on. It can also abort an ongoing replication * No need to have a login into the machine * From a development perspective, we can unit test it This issue can track the implementation of solr replication in java -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
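To illustrate the point that adding the url of the master in the slaves should be good enough, here is a hedged sketch of what a slave-side configuration along these lines could look like. The handler and parameter names below follow the Java replication handler this issue tracks, but treat them as assumptions, and the host name is a placeholder.
{code:title=solrconfig.xml (slave)}
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- the master core to pull index files from -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- how often the slave polls the master for a newer index version -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
{code}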
[jira] Updated: (SOLR-551) Solr replication should include the schema also
[ https://issues.apache.org/jira/browse/SOLR-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-551: Component/s: (was: replication (scripts)) replication (java) change component from scripts to java Solr replication should include the schema also --- Key: SOLR-551 URL: https://issues.apache.org/jira/browse/SOLR-551 Project: Solr Issue Type: Improvement Components: replication (java) Affects Versions: 1.4 Reporter: Noble Paul Assignee: Shalin Shekhar Mangar Fix For: 1.4 The current Solr replication just copy the data directory . So if the schema changes and I do a re-index it will blissfully copy the index and the slaves will fail because of incompatible schema. So the steps we follow are * Stop rsync on slaves * Update the master with new schema * re-index data * forEach slave ** Kill the slave ** clean the data directory ** install the new schema ** restart ** do a manual snappull The amount of work the admin needs to do is quite significant (depending on the no:of slaves). These are manual steps and very error prone The solution : Make the replication mechanism handle the schema replication also. So all I need to do is to just change the master and the slaves synch automatically What is a good way to implement this? We have an idea along the following lines This should involve changes to the snapshooter and snappuller scripts and the snapinstaller components Everytime the snapshooter takes a snapshot it must keep the timestamps of schema.xml and elevate.xml (all the files which might affect the runtime behavior in slaves) For subsequent snapshots if the timestamps of any of them is changed it must copy the all of them also for replication. The snappuller copies the new directory as usual The snapinstaller checks if these config files are present , if yes, * It can create a temporary core * install the changed index and configuration * load it completely and swap it out with the original core -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
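In the same spirit, a hedged sketch of how the master side could declare which configuration files travel with the index, so that a schema change reaches the slaves automatically; the confFiles parameter name matches the Java replication handler, and the file list is illustrative.
{code:title=solrconfig.xml (master)}
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- replicate after every commit (optimize is another common trigger) -->
    <str name="replicateAfter">commit</str>
    <!-- configuration files shipped to slaves alongside the index -->
    <str name="confFiles">schema.xml,elevate.xml,stopwords.txt</str>
  </lst>
</requestHandler>
{code}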
[jira] Resolved: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1099. -- Resolution: Fixed Committed revision 827032. Thanks. FieldAnalysisRequestHandler --- Key: SOLR-1099 URL: https://issues.apache.org/jira/browse/SOLR-1099 Project: Solr Issue Type: New Feature Components: Analysis Affects Versions: 1.3 Reporter: Uri Boness Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: AnalisysRequestHandler_refactored.patch, analysis_request_handlers_incl_solrj.patch, AnalysisRequestHandler_refactored1.patch, FieldAnalysisRequestHandler_incl_test.patch, SOLR-1099-ordered-TokenizerChain.patch, SOLR-1099.patch, SOLR-1099.patch, SOLR-1099.patch The FieldAnalysisRequestHandler provides the analysis functionality of the web admin page as a service. This handler accepts a filetype/fieldname parameter and a value and as a response returns a breakdown of the analysis process. It is also possible to send a query value which will use the configured query analyzer as well as a showmatch parameter which will then mark every matched token as a match. If this handler is added to the code base, I also recommend to rename the current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have them both inherit from one AnalysisRequestHandlerBase class which provides the common functionality of the analysis breakdown and its translation to named lists. This will also enhance the current AnalysisRequestHandler which right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Reopened: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reopened SOLR-1099: -- Assignee: Koji Sekiguchi (was: Shalin Shekhar Mangar) Hmm, I think the order of Tokenizer/TokenFilters in response is unconsidered. For example, I cannot take out Tokenizer/TokenFilters from ruby response in order... FieldAnalysisRequestHandler --- Key: SOLR-1099 URL: https://issues.apache.org/jira/browse/SOLR-1099 Project: Solr Issue Type: New Feature Components: Analysis Affects Versions: 1.3 Reporter: Uri Boness Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: AnalisysRequestHandler_refactored.patch, analysis_request_handlers_incl_solrj.patch, AnalysisRequestHandler_refactored1.patch, FieldAnalysisRequestHandler_incl_test.patch, SOLR-1099.patch, SOLR-1099.patch, SOLR-1099.patch The FieldAnalysisRequestHandler provides the analysis functionality of the web admin page as a service. This handler accepts a filetype/fieldname parameter and a value and as a response returns a breakdown of the analysis process. It is also possible to send a query value which will use the configured query analyzer as well as a showmatch parameter which will then mark every matched token as a match. If this handler is added to the code base, I also recommend to rename the current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have them both inherit from one AnalysisRequestHandlerBase class which provides the common functionality of the analysis breakdown and its translation to named lists. This will also enhance the current AnalysisRequestHandler which right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1099: - Attachment: SOLR-1099-ordered-TokenizerChain.patch I'd like to use NamedList rather than SimpleOrderedMap. If there is no objections, I'll commit soon. All tests pass. FieldAnalysisRequestHandler --- Key: SOLR-1099 URL: https://issues.apache.org/jira/browse/SOLR-1099 Project: Solr Issue Type: New Feature Components: Analysis Affects Versions: 1.3 Reporter: Uri Boness Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: AnalisysRequestHandler_refactored.patch, analysis_request_handlers_incl_solrj.patch, AnalysisRequestHandler_refactored1.patch, FieldAnalysisRequestHandler_incl_test.patch, SOLR-1099-ordered-TokenizerChain.patch, SOLR-1099.patch, SOLR-1099.patch, SOLR-1099.patch The FieldAnalysisRequestHandler provides the analysis functionality of the web admin page as a service. This handler accepts a filetype/fieldname parameter and a value and as a response returns a breakdown of the analysis process. It is also possible to send a query value which will use the configured query analyzer as well as a showmatch parameter which will then mark every matched token as a match. If this handler is added to the code base, I also recommend to rename the current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have them both inherit from one AnalysisRequestHandlerBase class which provides the common functionality of the analysis breakdown and its translation to named lists. This will also enhance the current AnalysisRequestHandler which right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1515) Javadoc typo in SolrQueryResponse
[ https://issues.apache.org/jira/browse/SOLR-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1515: - Fix Version/s: (was: 1.5) 1.4 Javadoc typo in SolrQueryResponse - Key: SOLR-1515 URL: https://issues.apache.org/jira/browse/SOLR-1515 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Environment: my local MacBook pro Reporter: Chris A. Mattmann Priority: Trivial Fix For: 1.4 Attachments: SOLR-1515.101709.Mattmann.patch.txt There is a minute typo in the javadoc for o.a.s.request.SolrQueryResponse.java. This patch fixes that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1515) Javadoc typo in SolrQueryResponse
[ https://issues.apache.org/jira/browse/SOLR-1515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1515. -- Resolution: Fixed Committed revision 826321. Thanks. Javadoc typo in SolrQueryResponse - Key: SOLR-1515 URL: https://issues.apache.org/jira/browse/SOLR-1515 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Environment: my local MacBook pro Reporter: Chris A. Mattmann Priority: Trivial Fix For: 1.4 Attachments: SOLR-1515.101709.Mattmann.patch.txt There is a minute typo in the javadoc for o.a.s.request.SolrQueryResponse.java. This patch fixes that. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Code Freeze, Release Process, etc.
Grant Ingersoll wrote: OK, so we are in code freeze right now. I'm going to follow the Release process at http://wiki.apache.org/solr/HowToRelease I will put up an RC now, then people can try it out, etc. I would then like to have a goal of putting up an official set of artifacts to be voted on next Monday. In the interim, we should review docs, etc. and update the wiki where possible. How does that sound? -Grant Sounds great! Koji -- http://www.rondhuit.com/en/
Re: 1.4.0 RC
Yonik Seeley wrote: On Tue, Oct 13, 2009 at 8:12 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Tue, Oct 13, 2009 at 8:03 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : http://people.apache.org/~gsingers/solr/1.4.0-RC/ I suspect we're going to want to wait for Lucene 2.9.1 - particularly because of LUCENE-1974. I know I was lobbying for not using non-released versions of Lucene due to the increase in flux, but I really meant non-bugfix branches. Seems safe to use an unreleased 2.9.1 branch? If there are no objections, I'll update to the fixed 2.9.1 branch. We can figure out whether to wait for 2.9.1 or not later when we know the schedule. +1 to update to the fixed 2.9 branch and proceed to release RC. Koji -- http://www.rondhuit.com/en/
Re: rollback and cumulative_add
Koji Sekiguchi wrote: Hello, I found that rollback resets adds and docsPending count, but doesn't reset cumulative_adds. $ cd example/exampledocs # comment out the line of commit/ so avoid committing in post.sh $ ./post.sh *.xml = docsPending=19, adds=19, cumulative_adds=19 # do rollback $ curl http://localhost:8983/solr/update?rollback=true = rollbacks=1, docsPending=0, adds=0, cumulative_adds=19 Is this correct behavior? Koji (forwarded dev list) I think this is a bug that was introduced by me when I contributed the first patch for the rollback and the bug was inherited by the successive patches. I'll reopen SOLR-670 and attach the fix soon: https://issues.apache.org/jira/browse/SOLR-670 Koji -- http://www.rondhuit.com/
[jira] Updated: (SOLR-670) UpdateHandler must provide a rollback feature
[ https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-670: Attachment: SOLR-670-revert-cumulative-counts.patch The fix and test case. I'll commit soon. UpdateHandler must provide a rollback feature - Key: SOLR-670 URL: https://issues.apache.org/jira/browse/SOLR-670 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Noble Paul Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: SOLR-670-revert-cumulative-counts.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch Lucene IndexWriter already has a rollback method. There should be a counterpart for the same in _UpdateHandler_ so that users can do a rollback over http -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-670) UpdateHandler must provide a rollback feature
[ https://issues.apache.org/jira/browse/SOLR-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-670. - Resolution: Fixed Committed revision 824380. UpdateHandler must provide a rollback feature - Key: SOLR-670 URL: https://issues.apache.org/jira/browse/SOLR-670 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Noble Paul Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: SOLR-670-revert-cumulative-counts.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch, SOLR-670.patch Lucene IndexWriter already has a rollback method. There should be a counterpart for the same in _UpdateHandler_ so that users can do a rollback over http -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1504) empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.
empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co. --- Key: SOLR-1504 URL: https://issues.apache.org/jira/browse/SOLR-1504 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.4 If you have the following mapping rule in mapping.txt: {code} # destination can be empty NULL = {code} you can get AIOOBE by specifying NULL for either index or query data in the input form of analysis.jsp (and co. i.e. DocumentAnalysisRequestHandler and FieldAnalysisRequestHandler). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1504) empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.
[ https://issues.apache.org/jira/browse/SOLR-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1504: - Attachment: SOLR-1504.patch A patch for the fix. Will commit soon. empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co. --- Key: SOLR-1504 URL: https://issues.apache.org/jira/browse/SOLR-1504 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.4 Attachments: SOLR-1504.patch If you have the following mapping rule in mapping.txt: {code} # destination can be empty NULL = {code} you can get AIOOBE by specifying NULL for either index or query data in the input form of analysis.jsp (and co. i.e. DocumentAnalysisRequestHandler and FieldAnalysisRequestHandler). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-1504) empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co.
[ https://issues.apache.org/jira/browse/SOLR-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1504. -- Resolution: Fixed Committed revision 824045. empty char mapping can cause ArrayIndexOutOfBoundsException in analysis.jsp and co. --- Key: SOLR-1504 URL: https://issues.apache.org/jira/browse/SOLR-1504 Project: Solr Issue Type: Bug Components: Analysis Affects Versions: 1.4 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.4 Attachments: SOLR-1504.patch If you have the following mapping rule in mapping.txt: {code} # destination can be empty NULL = {code} you can get AIOOBE by specifying NULL for either index or query data in the input form of analysis.jsp (and co. i.e. DocumentAnalysisRequestHandler and FieldAnalysisRequestHandler). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Down to 5
Hi Shalin, What about FastVectorHighlighter? https://issues.apache.org/jira/browse/SOLR-1268 If we're targeting an RC this week, I'd like to push it to 1.5 because there are no patches. But perhaps you think 13 votes makes it worth considering? Koji
[jira] Assigned: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-1268: Assignee: Koji Sekiguchi Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1268) Incorporate Lucene's FastVectorHighlighter
[ https://issues.apache.org/jira/browse/SOLR-1268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated SOLR-1268: - Fix Version/s: 1.5 Marking it for 1.5 because there are no patches. Incorporate Lucene's FastVectorHighlighter -- Key: SOLR-1268 URL: https://issues.apache.org/jira/browse/SOLR-1268 Project: Solr Issue Type: New Feature Components: highlighter Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 1.5 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Down to 5
+1. Grant Ingersoll wrote: Coming along: https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230versionId=12313351showOpenIssuesOnly=true If we can finish these up this week, I can generate RCs next week. Thoughts? -Grant
[jira] Commented: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12761900#action_12761900 ] Koji Sekiguchi commented on SOLR-1489: -- Good catch, Otani-san! I can reproduce the problem with the data and the filter you attached when running it on Jetty. And thank you for opening the JIRA ticket in Jetty. Now that we are close to releasing 1.4, I don't want this to be a blocker because this is not a Solr bug, as you said. You can run Solr on arbitrary servlet containers other than Jetty if you'd like. I'd like to keep this open and keep watching http://jira.codehaus.org/browse/JETTY-1122 . Thanks. A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Priority: Critical Attachments: error_utf8-example.xml, jettybugsample.war A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-1489) A UTF-8 character is output twice (Bug in Jetty)
[ https://issues.apache.org/jira/browse/SOLR-1489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi reassigned SOLR-1489: Assignee: Koji Sekiguchi A UTF-8 character is output twice (Bug in Jetty) Key: SOLR-1489 URL: https://issues.apache.org/jira/browse/SOLR-1489 Project: Solr Issue Type: Bug Environment: Jetty-6.1.3 Jetty-6.1.21 Jetty-7.0.0RC6 Reporter: Jun Ohtani Assignee: Koji Sekiguchi Priority: Critical Attachments: error_utf8-example.xml, jettybugsample.war A UTF-8 character is output twice under particular conditions. Attach the sample data.(error_utf8-example.xml) Registered only sample data, click the following URL. http://localhost:8983/solr/select?q=*%3A*version=2.2start=0rows=10omitHeader=truefl=attr_jsonwt=json Sample data is only B, but response is BB. When wt=phps, error occurs in PHP unsrialize() function. This bug is like a bug in Jetty. jettybugsample.war is the simplest one to reproduce the problem. Copy example/webapps, and start Jetty server, and click the following URL. http://localhost:8983/jettybugsample/filter/hoge Like earlier, B is output twice. Sysout only B once. I have tested this on Jetty 6.1.3 and 6.1.21, 7.0.0rc6. (When testing with 6.1.21or 7.0.0rc6, change bufsize from 128 to 512 in web.xml. ) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1481) phps writer ignores omitHeader parameter
phps writer ignores omitHeader parameter Key: SOLR-1481 URL: https://issues.apache.org/jira/browse/SOLR-1481 Project: Solr Issue Type: Bug Components: search Reporter: Koji Sekiguchi Priority: Trivial Fix For: 1.4 My co-worker found this one. I'm expecting a patch will be attached soon by him. :) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: svn commit: r819314 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java src/test/org/apache/solr/highlight/HighlighterTest.java
Also make both options default to true. If so, isn't this line (from HighlightComponent) needed to be also true by default? boolean rewrite = !(Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER)) Boolean.valueOf(req.getParams().get(HighlightParams.HIGHLIGHT_MULTI_TERM))); I think MultiTermQueries are converted to ConstantScoreQuery by rewrite? Koji markrmil...@apache.org wrote: Author: markrmiller Date: Sun Sep 27 13:58:30 2009 New Revision: 819314 URL: http://svn.apache.org/viewvc?rev=819314view=rev Log: SOLR-1221: Change Solr Highlighting to use the SpanScorer with MultiTerm expansion by default Modified: lucene/solr/trunk/CHANGES.txt lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java Modified: lucene/solr/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?rev=819314r1=819313r2=819314view=diff == --- lucene/solr/trunk/CHANGES.txt (original) +++ lucene/solr/trunk/CHANGES.txt Sun Sep 27 13:58:30 2009 @@ -503,8 +503,8 @@ 45. SOLR-1078: Fixes to WordDelimiterFilter to avoid splitting or dropping international non-letter characters such as non spacing marks. (yonik) -46. SOLR-825: Enables highlighting for range/wildcard/fuzzy/prefix queries if using hl.usePhraseHighlighter=true -and hl.highlightMultiTerm=true. (Mark Miller) +46. SOLR-825, SOLR-1221: Enables highlighting for range/wildcard/fuzzy/prefix queries if using hl.usePhraseHighlighter=true +and hl.highlightMultiTerm=true. Also make both options default to true. (Mark Miller) 47. SOLR-1174: Fix Logging admin form submit url for multicore. (Jacob Singh via shalin) Modified: lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java URL: http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java?rev=819314r1=819313r2=819314view=diff == --- lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java (original) +++ lucene/solr/trunk/src/java/org/apache/solr/highlight/DefaultSolrHighlighter.java Sun Sep 27 13:58:30 2009 @@ -144,7 +144,7 @@ */ private QueryScorer getSpanQueryScorer(Query query, String fieldName, TokenStream tokenStream, SolrQueryRequest request) throws IOException { boolean reqFieldMatch = request.getParams().getFieldBool(fieldName, HighlightParams.FIELD_MATCH, false); -Boolean highlightMultiTerm = request.getParams().getBool(HighlightParams.HIGHLIGHT_MULTI_TERM); +Boolean highlightMultiTerm = request.getParams().getBool(HighlightParams.HIGHLIGHT_MULTI_TERM, true); if(highlightMultiTerm == null) { highlightMultiTerm = false; } @@ -306,8 +306,9 @@ } Highlighter highlighter; -if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER))) { - // wrap CachingTokenFilter around TokenStream for reuse +if (Boolean.valueOf(req.getParams().get(HighlightParams.USE_PHRASE_HIGHLIGHTER, true))) { + // TODO: this is not always necessary - eventually we would like to avoid this wrap + // when it is not needed. 
tstream = new CachingTokenFilter(tstream); // get highlighter Modified: lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java URL: http://svn.apache.org/viewvc/lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java?rev=819314r1=819313r2=819314view=diff == --- lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java (original) +++ lucene/solr/trunk/src/test/org/apache/solr/highlight/HighlighterTest.java Sun Sep 27 13:58:30 2009 @@ -585,6 +585,7 @@ args.put(hl.fl, t_text); args.put(hl.fragsize, 40); args.put(hl.snippets, 10); +args.put(hl.usePhraseHighlighter, false); TestHarness.LocalRequestFactory sumLRF = h.getRequestFactory( standard, 0, 200, args);
[jira] Resolved: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others
[ https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1423. -- Resolution: Fixed Committed revision 816502. Thanks, Uwe! Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others Key: SOLR-1423 URL: https://issues.apache.org/jira/browse/SOLR-1423 Project: Solr Issue Type: Task Components: Analysis Affects Versions: 1.4 Reporter: Uwe Schindler Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: SOLR-1423-FieldType.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch Because of some backwards compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter API a little bit. Tokenizer now only has a input field of type java.io.Reader (as before the CharStream code). To correct offsets, it is now needed to call the Tokenizer.correctOffset(int) method, which delegates to the CharStream (if input is subclass of CharStream), else returns an uncorrected offset. Normally it is enough to change all occurences of input.correctOffset() to this.correctOffset() in Tokenizers. It should also be checked, if custom Tokenizers in Solr do correct their offsets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others
[ https://issues.apache.org/jira/browse/SOLR-1423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756923#action_12756923 ] Koji Sekiguchi commented on SOLR-1423: -- The patch looks good! Will commit shortly. Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream others Key: SOLR-1423 URL: https://issues.apache.org/jira/browse/SOLR-1423 Project: Solr Issue Type: Task Components: Analysis Affects Versions: 1.4 Reporter: Uwe Schindler Assignee: Koji Sekiguchi Fix For: 1.4 Attachments: SOLR-1423-FieldType.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch Because of some backwards compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter API a little bit. Tokenizer now only has a input field of type java.io.Reader (as before the CharStream code). To correct offsets, it is now needed to call the Tokenizer.correctOffset(int) method, which delegates to the CharStream (if input is subclass of CharStream), else returns an uncorrected offset. Normally it is enough to change all occurences of input.correctOffset() to this.correctOffset() in Tokenizers. It should also be checked, if custom Tokenizers in Solr do correct their offsets. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.