[jira] [Commented] (SOLR-3350) TextField's parseFieldQuery method not using analyzer's enablePosIncr parameter

2012-04-12 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252285#comment-13252285
 ] 

Tommaso Teofili commented on SOLR-3350:
---

Hi Robert,
For TextField, setting enablePositionIncrements to true and then branching on 
a condition that is always true seems simply wrong (code-wise), so we should 
discuss whether the issue lies in the true constant or in the code switching 
on it.
It should be clear what overall enablePositionIncrements value (true, false, 
not set) a mixed configuration like the one above should produce, if that's 
needed in the field type implementation (maybe by traversing objects from the 
QParser to the SchemaField, or in some more convenient way, if one exists).
Depending on how we choose to fix the code, if a Solr type using TextField has 
a tokenizer or some filters with enablePositionIncrements set to false, there 
are different options:
- option 1: raise a configuration error
- option 2: log a warning message
- option 3: don't care (as it is now)
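
The tri-state outcome and the three options could be pictured roughly as in the sketch below. This is not Solr code: PosIncrResolver, Policy, and the list of per-component Boolean settings are hypothetical names made up for illustration, with null standing for "not set". Treating a mixed chain as "not set" is also just one possible choice.

```java
import java.util.List;

// Hypothetical sketch, not Solr API: folds the enablePositionIncrements
// settings of a tokenizer/filter chain into one tri-state value.
final class PosIncrResolver {

  // The three options listed in the comment.
  enum Policy { ERROR, WARN, IGNORE }

  // Returns TRUE/FALSE when every explicitly configured component agrees,
  // or null ("not set") when nothing is configured or the chain is mixed.
  static Boolean resolve(List<Boolean> componentSettings) {
    Boolean result = null;
    for (Boolean setting : componentSettings) {
      if (setting == null) {
        continue; // this component does not configure the flag
      }
      if (result != null && !result.equals(setting)) {
        return null; // mixed configuration: no single overall value
      }
      result = setting;
    }
    return result;
  }

  // Applies one of the three options when some component sets the flag to false.
  static String check(List<Boolean> componentSettings, Policy policy) {
    boolean anyFalse = componentSettings.contains(Boolean.FALSE);
    if (!anyFalse || policy == Policy.IGNORE) {
      return "ok";
    }
    if (policy == Policy.ERROR) {
      throw new IllegalStateException(
          "enablePositionIncrements=false is not honored by the field type");
    }
    return "warning: enablePositionIncrements=false is not honored by the field type";
  }
}
```

For example, PosIncrResolver.resolve(Arrays.asList(true, null, true)) yields TRUE, while a chain containing both true and false yields null.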

 TextField's parseFieldQuery method not using analyzer's enablePosIncr 
 parameter
 ---

 Key: SOLR-3350
 URL: https://issues.apache.org/jira/browse/SOLR-3350
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 3.5, 4.0
Reporter: Tommaso Teofili
Priority: Minor

 The parseFieldQuery method of the TextField class just sets 
 {code}
   ...
   boolean enablePositionIncrements = true;
   ...
 {code}
 whereas the value should be taken from the Analyzer's configuration.
 The flag is then evaluated at two points:
 {code}
   ...
   if (enablePositionIncrements) {
     mpq.add((Term[]) multiTerms.toArray(new Term[0]), position);
   } else {
     mpq.add((Term[]) multiTerms.toArray(new Term[0]));
   }
   return mpq;
   ...
   ...
   if (enablePositionIncrements) {
     position += positionIncrement;
     pq.add(new Term(field, term), position);
   } else {
     pq.add(new Term(field, term));
   }
   ...
 {code}
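
 To make the effect of the flag concrete, here is a simplified model with no Lucene dependency. PhrasePositions and its method are hypothetical names; the true branch mirrors the position += positionIncrement line above, and the false branch assumes that a term added without an explicit position lands directly after the previous one.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, no Lucene dependency: each entry is the position
// increment an analyzer would report for a token (for instance 2 for the
// token following a removed stop word).
final class PhrasePositions {

  static List<Integer> positions(int[] positionIncrements, boolean enablePositionIncrements) {
    List<Integer> positions = new ArrayList<>();
    int position = -1;
    for (int increment : positionIncrements) {
      if (enablePositionIncrements) {
        position += increment; // honors gaps left by removed tokens
      } else {
        position += 1; // terms are packed one after another
      }
      positions.add(position);
    }
    return positions;
  }
}
```

 With increments {1, 2, 1} the model gives positions [0, 2, 3] when the flag is true and [0, 1, 2] when it is false, so a hard-coded true means the gap is always kept regardless of what the analysis chain was configured to do.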

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy

2012-03-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241049#comment-13241049
 ] 

Tommaso Teofili commented on SOLR-2983:
---

and merged to branch_rx on r1306733

 Unable to load custom MergePolicy
 -

 Key: SOLR-2983
 URL: https://issues.apache.org/jira/browse/SOLR-2983
 Project: Solr
  Issue Type: Bug
Reporter: Mathias Herberts
Assignee: Tommaso Teofili
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2983.patch, SOLR-2983_2.patch


 As part of a recent upgrade to Solr 3.5.0 we encountered an error related to 
 our use of LinkedIn's ZoieMergePolicy.
 It seems the code that loads a custom MergePolicy was at some point moved 
 into SolrIndexConfig.java from SolrIndexWriter.java, but as this code was 
 copied verbatim it now contains a bug:
 try {
   policy = (MergePolicy) schema.getResourceLoader().newInstance(
       mpClassName, null, new Class[]{IndexWriter.class}, new Object[]{this});
 } catch (Exception e) {
   policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName);
 }
 'this' is no longer an IndexWriter but a SolrIndexConfig, therefore the call 
 to newInstance will always throw an exception and the catch clause will be 
 executed. If the custom MergePolicy does not have a default constructor 
 (which is the case of ZoieMergePolicy), the second attempt to create the 
 MergePolicy will also fail and Solr won't start.
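
 The failure mode can be reproduced in isolation with plain reflection. The sketch below does not use Solr's resource loader; Writer, Config, and WriterOnlyPolicy are hypothetical stand-ins for IndexWriter, SolrIndexConfig, and a merge policy without a default constructor.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-ins; the real classes are Lucene's IndexWriter and
// MergePolicy and Solr's SolrIndexConfig.
final class MergePolicyLoading {

  static class Writer {}  // plays the role of IndexWriter
  static class Config {}  // plays the role of SolrIndexConfig

  // A policy with no default constructor, like the case described above.
  static class WriterOnlyPolicy {
    final Writer writer;
    WriterOnlyPolicy(Writer writer) { this.writer = writer; }
  }

  // Mimics the two-step load: try a (Writer) constructor, then fall back to
  // the no-arg constructor. Returns null when both attempts fail.
  static Object load(Class<?> policyClass, Object argument) {
    try {
      Constructor<?> ctor = policyClass.getDeclaredConstructor(Writer.class);
      // Throws IllegalArgumentException when argument is not a Writer.
      return ctor.newInstance(argument);
    } catch (Exception first) {
      try {
        return policyClass.getDeclaredConstructor().newInstance();
      } catch (Exception second) {
        return null; // in the scenario above, this is where startup fails
      }
    }
  }
}
```

 Passing a Writer succeeds, while passing a Config (the analogue of 'this' after the code was moved) fails in the first attempt and then again in the fallback, because WriterOnlyPolicy has no default constructor.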




[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy

2012-03-22 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13235413#comment-13235413
 ] 

Tommaso Teofili commented on SOLR-2983:
---

I just noticed that the toIndexWriter method should also be explicitly tested; 
I'm going to work on it and attach a new patch.

 Unable to load custom MergePolicy
 -

 Key: SOLR-2983
 URL: https://issues.apache.org/jira/browse/SOLR-2983
 Project: Solr
  Issue Type: Bug
Reporter: Mathias Herberts
Assignee: Tommaso Teofili
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2983.patch






[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy

2012-03-22 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236111#comment-13236111
 ] 

Tommaso Teofili commented on SOLR-2983:
---

committed on trunk at r1304098

 Unable to load custom MergePolicy
 -

 Key: SOLR-2983
 URL: https://issues.apache.org/jira/browse/SOLR-2983
 Project: Solr
  Issue Type: Bug
Reporter: Mathias Herberts
Assignee: Tommaso Teofili
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2983.patch, SOLR-2983_2.patch






[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy

2012-03-20 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13233731#comment-13233731
 ] 

Tommaso Teofili commented on SOLR-2983:
---

Agreed, I'm going to do the needed updates to changes.txt and 
upgrade/backcompat.

 Unable to load custom MergePolicy
 -

 Key: SOLR-2983
 URL: https://issues.apache.org/jira/browse/SOLR-2983
 Project: Solr
  Issue Type: Bug
Reporter: Mathias Herberts
Assignee: Tommaso Teofili
Priority: Minor
 Fix For: 3.6, 4.0

 Attachments: SOLR-2983.patch






[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest

2012-03-16 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13230987#comment-13230987
 ] 

Tommaso Teofili commented on LUCENE-3869:
-

Ok thanks Robert, let me know if you find a specific scenario where this 
happens more frequently so that I can try it out as well. 

 possible hang in UIMATypeAwareAnalyzerTest
 --

 Key: LUCENE-3869
 URL: https://issues.apache.org/jira/browse/LUCENE-3869
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Robert Muir

 Just testing an unrelated patch, I was hung (with 100% cpu) in 
 UIMATypeAwareAnalyzerTest.
 I'll attach stacktrace at the moment of the hang.
 The fact we get a seed in the actual stacktraces for cases like this is 
 awesome! Thanks Dawid!
 I don't think it reproduces 100%, but I'll try beasting this seed to see if I 
 can reproduce the hang. From what I can see, the command should be:
 'ant test -Dtestcase=UIMATypeAwareAnalyzerTest 
 -Dtests.seed=-262aada3325aa87a:-44863926cf5c87e9:5c8c471d901b98bd'




[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest

2012-03-15 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229939#comment-13229939
 ] 

Tommaso Teofili commented on LUCENE-3869:
-

I tried to reproduce that many times (same command/seed) but with no luck so 
far. Which environment are you running, Robert?

 possible hang in UIMATypeAwareAnalyzerTest
 --

 Key: LUCENE-3869
 URL: https://issues.apache.org/jira/browse/LUCENE-3869
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Robert Muir





[jira] [Commented] (LUCENE-3869) possible hang in UIMATypeAwareAnalyzerTest

2012-03-14 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229241#comment-13229241
 ] 

Tommaso Teofili commented on LUCENE-3869:
-

Thanks Robert, I'm taking a look

 possible hang in UIMATypeAwareAnalyzerTest
 --

 Key: LUCENE-3869
 URL: https://issues.apache.org/jira/browse/LUCENE-3869
 Project: Lucene - Java
  Issue Type: Bug
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Robert Muir





[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-13 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13228941#comment-13228941
 ] 

Tommaso Teofili commented on SOLR-3013:
---

yes, this is committed but it's not resolved yet as it needs to be adapted to 
3.x as well.

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Commented] (SOLR-3204) solr-commons-csv must not use the org.apache.commons.csv package

2012-03-07 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224678#comment-13224678
 ] 

Tommaso Teofili commented on SOLR-3204:
---

bq. Well the carrot is similar to the uima case, in both cases we have their 
committers also as committers working on integrations within our project, 
and they have voiced no problem with how things work so far, so why break it?

Also, starting from 3.5.0 the UIMA dependencies' jars are released artifacts 
(see SOLR-2746 and thus 
http://mvnrepository.com/artifact/org.apache.solr/solr-uima/3.5.0 ).

 solr-commons-csv must not use the org.apache.commons.csv package
 

 Key: SOLR-3204
 URL: https://issues.apache.org/jira/browse/SOLR-3204
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 3.5
Reporter: Emmanuel Bourg
Priority: Blocker
 Fix For: 3.6

 Attachments: SOLR-3204.patch, SOLR-3204.patch, SOLR-3204.patch, 
 apache-solr-commons-csv-1.0-SNAPSHOT-r966014.jar, rule.txt, rule.txt, 
 solr-csv.patch


 The solr-commons-csv artifact reused the code from the Apache Commons CSV 
 project but the package wasn't changed to something else than 
 org.apache.commons.csv in the process. This creates a compatibility issue as 
 the Apache Commons team works toward an official release of Commons CSV. It 
 prevents Commons CSV from using its own org.apache.commons.csv package, or 
 forces the renaming of all the classes to avoid a classpath conflict.




[jira] [Commented] (SOLR-3204) solr-commons-csv must not use the org.apache.commons.csv package

2012-03-07 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224688#comment-13224688
 ] 

Tommaso Teofili commented on SOLR-3204:
---

bq. The issue we had caused by our separate release of unreleased package under 
the solr group-id was that maven is seeing our repackaged dependency under 
another artifact id - so it cannot prevent that a project adds solr-commons-xx, 
version-foo and commons-xx, version-bar, because it is two different things.

Yes, that's also my understanding of this issue with unreleased dependencies' 
jars.

 solr-commons-csv must not use the org.apache.commons.csv package
 

 Key: SOLR-3204
 URL: https://issues.apache.org/jira/browse/SOLR-3204
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 3.5
Reporter: Emmanuel Bourg
Priority: Blocker
 Fix For: 3.6

 Attachments: SOLR-3204.patch, SOLR-3204.patch, SOLR-3204.patch, 
 apache-solr-commons-csv-1.0-SNAPSHOT-r966014.jar, rule.txt, rule.txt, 
 solr-csv.patch






[jira] [Commented] (SOLR-2693) Solr 4.0 - start and rows parameter addup together

2012-03-07 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224708#comment-13224708
 ] 

Tommaso Teofili commented on SOLR-2693:
---

bq. I can't reproduce this using the current trunk example.

Same here. Marcin, could you please say more about the configuration in which 
this is happening?

 Solr 4.0 - start and rows parameter addup together
 --

 Key: SOLR-2693
 URL: https://issues.apache.org/jira/browse/SOLR-2693
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.0
 Environment: centos 5.4, tomcat 7, Java 1.6.0_14
Reporter: Marcin
Priority: Blocker

 Hi guys,
 I have a weird problem with the rows and start parameters: start simply is 
 not working, and its value is instead added to rows. For example, with 
 start=10 rows=20, 30 rows are returned, beginning from the 1st result.
 Any ideas?
 Cheers
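
 For reference, the paging behavior one would expect from start and rows can be sketched as below. Paging.page is a hypothetical helper written for illustration, not Solr code: it skips start hits and returns at most rows hits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the expected start/rows semantics.
final class Paging {

  // Skips `start` hits, then returns at most `rows` hits.
  static <T> List<T> page(List<T> hits, int start, int rows) {
    int from = Math.min(Math.max(start, 0), hits.size());
    // Use long arithmetic so a very large rows value cannot overflow.
    int to = (int) Math.min((long) from + Math.max(rows, 0), hits.size());
    return new ArrayList<>(hits.subList(from, to));
  }
}
```

 With 50 hits, start=10 and rows=20 should give 20 results, namely hits 10 through 29 (counting from 0), rather than the first 30 as reported.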




[jira] [Commented] (SOLR-2253) Solr should be able to keep on truckin' if a shard fails during a distributed search

2012-03-07 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224760#comment-13224760
 ] 

Tommaso Teofili commented on SOLR-2253:
---

I think we can mark this one as duplicate of SOLR-1143

 Solr should be able to keep on truckin' if a shard fails during a distributed 
 search
 

 Key: SOLR-2253
 URL: https://issues.apache.org/jira/browse/SOLR-2253
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4.1
 Environment: All
Reporter: Rich Cariens
Priority: Critical
 Attachments: SOLR-2253.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 Solr 1.4.x currently abandons searches if a shard fails during a distributed 
 search.  A trivial patch to the SearchHandler class would allow the user to 
 tell Solr to keep on trucking in these cases.  Solr can indicate that the 
 search response is partial via existing response header conventions, as 
 well as include details about which shard failed.
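
 The idea of continuing past a failed shard can be sketched as follows. This is not the attached SearchHandler patch: TolerantSearch, Response, and the tolerant flag are hypothetical names, and shards are modeled as plain Callables that return document ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;

// Hypothetical sketch: merges per-shard results and records failures
// instead of abandoning the whole search.
final class TolerantSearch {

  static final class Response {
    final List<String> docs = new ArrayList<>();
    final List<String> failedShards = new ArrayList<>();
    boolean partialResults() { return !failedShards.isEmpty(); }
  }

  static Response search(Map<String, Callable<List<String>>> shards, boolean tolerant) {
    Response response = new Response();
    for (Map.Entry<String, Callable<List<String>>> shard : shards.entrySet()) {
      try {
        response.docs.addAll(shard.getValue().call());
      } catch (Exception e) {
        if (!tolerant) {
          // Behavior described above: one failed shard fails the search.
          throw new IllegalStateException("shard " + shard.getKey() + " failed", e);
        }
        response.failedShards.add(shard.getKey()); // keep going, remember the failure
      }
    }
    return response;
  }
}
```

 A caller can then check partialResults() and failedShards to learn that the response is incomplete and which shard was responsible.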




[jira] [Commented] (SOLR-3197) Allow firstSearcher and newSearcher listeners to run in multiple threads

2012-03-02 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221539#comment-13221539
 ] 

Tommaso Teofili commented on SOLR-3197:
---

An alternative would be to use the CachedThreadPool as the default, as it makes 
it possible to reuse cached threads (see 
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/Executors.html#newCachedThreadPool() ).
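
The difference between the two executors can be observed with a small experiment. The sketch below is not SolrCore code: ListenerExecutors and ranConcurrently are hypothetical names, and the "listeners" are dummy tasks that only check whether all of them were started at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical experiment: do the submitted listeners overlap in time?
final class ListenerExecutors {

  // Returns true only if all listeners were running at the same time.
  // Shuts the executor down when done.
  static boolean ranConcurrently(ExecutorService executor, int listeners) {
    CountDownLatch allStarted = new CountDownLatch(listeners);
    List<Callable<Boolean>> tasks = new ArrayList<>();
    for (int i = 0; i < listeners; i++) {
      tasks.add(() -> {
        allStarted.countDown();
        // Wait briefly for the other listeners to start as well.
        return allStarted.await(1000, TimeUnit.MILLISECONDS);
      });
    }
    try {
      boolean concurrent = true;
      for (Future<Boolean> result : executor.invokeAll(tasks)) {
        concurrent &= result.get();
      }
      return concurrent;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      executor.shutdown();
    }
  }
}
```

Called with Executors.newCachedThreadPool() the listeners overlap, whereas with Executors.newSingleThreadExecutor() they run one after another. Note that a cached pool is unbounded, so whether it is a good default depends on how many listeners are expected.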
 

 Allow firstSearcher and newSearcher listeners to run in multiple threads
 

 Key: SOLR-3197
 URL: https://issues.apache.org/jira/browse/SOLR-3197
 Project: Solr
  Issue Type: Improvement
Reporter: Lance Norskog

 SolrCore submits all listeners (firstSearcher and newSearcher) to a Java 
 ExecutorService, but uses a single-threaded one. 
 SolrCore.java around line 965 in the trunk: 
 {code}
 final ExecutorService searcherExecutor = Executors.newSingleThreadExecutor(); 
 {code}
 SolrCore.java around line 1280 in the trunk runs first the, and then the first 
 and new searchers, all with the searcherExecutor object created at line 965. 
 Would it work if we changed this ExecutorService to a thread pool version? 
 This seems like it should work:
 {code}
 java.util.concurrent.Executors.newFixedThreadPool(int nThreads, ThreadFactory threadFactory);
 {code}




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-01 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219906#comment-13219906
 ] 

Tommaso Teofili commented on SOLR-3013:
---

thanks Steven, now fixing

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch






[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-01 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219962#comment-13219962
 ] 

Tommaso Teofili commented on SOLR-3013:
---

it should be ok now.

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch






[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13218975#comment-13218975
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

I think we can mark this one as resolved; I'd just keep this on trunk only and 
backport the whole thing to 3.x once SOLR-3013 is resolved and committed to 
trunk too.

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_rsrel.patch, 
 LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (SOLR-3140) Make omitNorms default for all numeric field types

2012-02-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219612#comment-13219612
 ] 

Tommaso Teofili commented on SOLR-3140:
---

bq. Is there a better place to set this default than in init() in the new base 
class?

I agree that's the method responsible for doing this kind of stuff

bq. I don't think so? if you search on a multivalued string field like 
keywords or tags it's reasonable to want length normalization to be a 
factor to prevent keyword stuffing.

good point

 Make omitNorms default for all numeric field types
 --

 Key: SOLR-3140
 URL: https://issues.apache.org/jira/browse/SOLR-3140
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: omitNorms
 Fix For: 4.0

 Attachments: SOLR-3140.patch


 Today norms are enabled for all Solr field types by default, while in Lucene 
 norms are omitted for the numeric types.
 Propose to make the Solr defaults the same as in Lucene, so that if someone 
 occasionally wants index-side boost for a numeric field type they must say 
 omitNorms=false. This lets us simplify the example schema too.
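
 The proposed default could be expressed roughly as below. NumericFieldDefaults is a hypothetical stand-in for the init() logic of a numeric base class, not the attached patch; args models the attributes declared on the field type in the schema.

```java
import java.util.Map;

// Hypothetical stand-in for a field type's init(args) default handling.
final class NumericFieldDefaults {

  // An explicit omitNorms attribute always wins; otherwise norms are
  // omitted for numeric types and kept for everything else.
  static boolean omitNorms(Map<String, String> args, boolean numericType) {
    String explicit = args.get("omitNorms");
    if (explicit != null) {
      return Boolean.parseBoolean(explicit);
    }
    return numericType;
  }
}
```

 Under this sketch a numeric field type with no attribute omits norms, and omitNorms=false on the same type turns them back on for index-side boosts.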




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-02-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219615#comment-13219615
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Now that LUCENE-3731 has been resolved I'll proceed with adding the needed 
factories for the Tokenizers in Solr.

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-02-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219624#comment-13219624
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Solr factories committed in r1295330

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Commented] (SOLR-3140) Make omitNorms default for all numeric field types

2012-02-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219625#comment-13219625
 ] 

Tommaso Teofili commented on SOLR-3140:
---

maybe something like PrimitiveFieldType (that should recall Java primitive 
types)
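
A minimal sketch of the idea, assuming a hypothetical PrimitiveFieldType base 
class whose init() applies omitNorms=true unless the schema sets it 
explicitly. The class and the plain Map of arguments are stand-ins for 
illustration and do not mirror Solr's actual FieldType API.

```java
import java.util.Map;

// Hypothetical stand-in for a numeric field type base class; not Solr's real API.
class PrimitiveFieldType {
  private boolean omitNorms;

  // Applies the default only when the schema did not set omitNorms explicitly.
  void init(Map<String, String> args) {
    String explicit = args.get("omitNorms");
    omitNorms = (explicit == null) ? true : Boolean.parseBoolean(explicit);
  }

  boolean omitNorms() {
    return omitNorms;
  }
}
```

With this shape, a numeric type that really wants index-side boosts would 
still opt in by passing omitNorms=false.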

 Make omitNorms default for all numeric field types
 --

 Key: SOLR-3140
 URL: https://issues.apache.org/jira/browse/SOLR-3140
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: omitNorms
 Fix For: 4.0

 Attachments: SOLR-3140.patch


 Today norms are enabled for all Solr field types by default, while in Lucene 
 norms are omitted for the numeric types.
 Propose to make the Solr defaults the same as in Lucene, so that if someone 
 occasionally wants index-side boost for a numeric field type they must say 
 omitNorms=false. This lets us simplify the example schema too.




[jira] [Commented] (SOLR-3140) Make omitNorms default for all numeric field types

2012-02-28 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218600#comment-13218600
 ] 

Tommaso Teofili commented on SOLR-3140:
---

yes big +1 

 Make omitNorms default for all numeric field types
 --

 Key: SOLR-3140
 URL: https://issues.apache.org/jira/browse/SOLR-3140
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Jan Høydahl
  Labels: omitNorms
 Fix For: 4.0


 Today norms are enabled for all Solr field types by default, while in Lucene 
 norms are omitted for the numeric types.
 Propose to make the Solr defaults the same as in Lucene, so that if someone 
 occasionally wants index-side boost for a numeric field type they must say 
 omitNorms=false. This lets us simplify the example schema too.




[jira] [Commented] (SOLR-3174) Visualize Cluster State

2012-02-28 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13218602#comment-13218602
 ] 

Tommaso Teofili commented on SOLR-3174:
---

yes this'd be a very nice improvement

 Visualize Cluster State
 ---

 Key: SOLR-3174
 URL: https://issues.apache.org/jira/browse/SOLR-3174
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley

 It would be great to visualize the cluster state in the new UI. 
 See Mark's wish:
 https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-25 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13216459#comment-13216459
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

The two methods analyzeText() and analyzeInput() are easy to confuse, so the 
first one should be renamed to initializeIterator(): its main purpose is to 
prepare the FSIterator holding the annotations that the incrementToken() 
method then consumes.

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_rsrel.patch, 
 LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-22 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13214093#comment-13214093
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

After some more testing I think the CasPool only pays off in scenarios where 
the pool serves different CASes to different clients (the tokenizers), so it 
is not really helpful in the current implementation. It may still be useful 
if we abstract the operation of obtaining and releasing a CAS outside the 
BaseTokenizer.
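
To illustrate what abstracting "obtain and release" outside the tokenizer 
could look like, here is a minimal sketch. SimplePool is a hypothetical 
class written for this example; the type parameter T stands in for a UIMA 
CAS, and no real UIMA API is used.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical pool; T stands in for a UIMA CAS in this sketch.
class SimplePool<T> {
  private final BlockingQueue<T> idle;

  SimplePool(List<T> instances) {
    idle = new LinkedBlockingQueue<>(instances);
  }

  // Blocks until an instance is free, so two clients never share one concurrently.
  T acquire() {
    try {
      return idle.take();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for an instance", e);
    }
  }

  // Hands an instance back to the pool once the client is done with it.
  void release(T instance) {
    idle.add(instance);
  }

  int available() {
    return idle.size();
  }
}
```

A pool like this only helps when several clients compete for instances; 
with one instance per tokenizer it adds nothing, which matches the 
observation above.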

In the meantime I noticed the AEProviderFactory getAEProvider() methods have a 
keyPrefix parameter that came from the Solr implementation and was intended to 
hold the core name, so, at the moment I think it'd be better to (also) have 
methods which don't need that parameter for the Lucene uses.
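
A rough sketch of such an overload pair follows. ProviderFactorySketch is a 
hypothetical class, not the actual AEProviderFactory, and a plain Object 
stands in for the provider it would cache.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical factory; Object stands in for the provider it would cache.
class ProviderFactorySketch {
  private final Map<String, Object> cache = new HashMap<>();

  // Solr-style lookup: keyPrefix (e.g. the core name) keeps cores apart.
  synchronized Object getProvider(String keyPrefix, String descriptorPath) {
    String key = (keyPrefix == null ? "" : keyPrefix) + descriptorPath;
    Object provider = cache.get(key);
    if (provider == null) {
      provider = new Object();
      cache.put(key, provider);
    }
    return provider;
  }

  // Lucene-style lookup: no core name is involved, so no prefix is required.
  Object getProvider(String descriptorPath) {
    return getProvider(null, descriptorPath);
  }
}
```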

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_rsrel.patch, 
 LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-16 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209247#comment-13209247
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

Right, everything seems ok now.
I also tried commenting out the 
{noformat}
<property name="tests.threadspercpu" value="0" />
{noformat}
line in build.xml in order to execute tests in parallel.
Multiple parallel test executions, also with -Dtests.multiplier=100, passed 
flawlessly on Java6; will see if that is the case for Java7 too.

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_speed.patch, 
 LUCENE-3731_speed.patch, LUCENE-3731_speed.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-16 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209301#comment-13209301
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

Some performance improvement came from releasing the CAS and AE in the close() 
call:

{noformat}
  @Override
  public void close() throws IOException {
    super.close();
    // release UIMA resources
    cas.release();
    ae.destroy();
  }
{noformat}

Now investigating the use of CasPool for improving throughput in high-usage 
scenarios.

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_speed.patch, 
 LUCENE-3731_speed.patch, LUCENE-3731_speed.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-16 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209490#comment-13209490
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

bq. But the question is: is it safe to use CAS/AE after you call 
release()/destroy() on them?

no it isn't, so you're right: those methods should not be inside the close() 
method.
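
The problem can be shown with a small stand-alone sketch, assuming the 
consumer is handed new input again after close(). Both classes are 
hypothetical stand-ins: ReleasableResource plays the role of a released 
CAS/AE, and ReusableConsumer the role of a tokenizer.

```java
// Hypothetical resource that, like a released CAS, must not be used again.
class ReleasableResource {
  private boolean released;

  String process(String text) {
    if (released) {
      throw new IllegalStateException("resource used after release()");
    }
    return text.trim();
  }

  void release() {
    released = true;
  }
}

// Hypothetical tokenizer-like consumer that may be reused after close().
class ReusableConsumer {
  private final ReleasableResource resource = new ReleasableResource();
  private final boolean releaseOnClose;

  ReusableConsumer(boolean releaseOnClose) {
    this.releaseOnClose = releaseOnClose;
  }

  String consume(String input) {
    return resource.process(input);
  }

  // close() ends one input; releasing here breaks any later consume() call.
  void close() {
    if (releaseOnClose) {
      resource.release();
    }
  }
}
```

With releaseOnClose set to true, the second consume() after close() fails, 
which is why the release calls need a different home than close().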




 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_rsrel.patch, 
 LUCENE-3731_speed.patch, LUCENE-3731_speed.patch, LUCENE-3731_speed.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-15 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208393#comment-13208393
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

Ok, I noticed this was due to an issue on the UIMA side.
I think the best option (as those are used just for testing) is to use dummy 
implementations of both the UIMA based whitespace tokenizer and the PoS tagger, 
which also avoids the log lines when executing tests using Maven.

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-15 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208460#comment-13208460
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

fix for the issues reported by Steven committed in r1244474

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-15 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208753#comment-13208753
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

Thanks Robert for taking care of this, nice improvement :)
I agree on the OverridingParams extending the base one, it was also my intent 
to do that.


 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_speed.patch, 
 LUCENE-3731_speed.patch, LUCENE-3731_speed.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-15 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208766#comment-13208766
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

bq. OK, if there is no objection I will commit this one.

+1, I'll post my progress later on the other possible performance improvements 
I'm testing.

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch, LUCENE-3731_speed.patch, 
 LUCENE-3731_speed.patch, LUCENE-3731_speed.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-14 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208076#comment-13208076
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

committed on trunk in r1244236

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-14 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208145#comment-13208145
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

Thank you very much Steven for reporting.

The 
{noformat}
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer initialize
INFO: Whitespace tokenizer successfully initialized
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer typeSystemInit
INFO: Whitespace tokenizer typesystem initialized
{noformat}

messages are due to the UIMA WhitespaceTokenizer Annotator, which logs the 
initialization/processing/etc. calls.
They are printed many times because the testRandomStrings test method runs 
lots of tricky tests on the UIMATokenizer, which require the above calls to 
be executed repeatedly.

I'll take a look at the other failures, which didn't show up in the tests I had 
done till now.

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch, LUCENE-3731_4.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-13 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206997#comment-13206997
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

bq. Hi Tommaso, I think it would be cleaner to set the final offset in end() 
instead?

ok, +1.
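
For context, a minimal sketch of what "setting the final offset in end()" 
means. OffsetSketchStream is a hypothetical class that only mimics the 
incrementToken()/end() contract; it is not a Lucene TokenStream and does not 
use Lucene's attribute API.

```java
// Hypothetical mini token stream; it only mimics the incrementToken()/end() contract.
class OffsetSketchStream {
  private final String input;
  private int pos = 0;
  int startOffset;
  int endOffset;

  OffsetSketchStream(String input) {
    this.input = input;
  }

  // Advances to the next whitespace-delimited token; returns false when exhausted.
  boolean incrementToken() {
    while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
      pos++;
    }
    if (pos >= input.length()) {
      return false;
    }
    startOffset = pos;
    while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
      pos++;
    }
    endOffset = pos;
    return true;
  }

  // Called once after the last token: reports the final offset of the whole
  // input, which also covers any trailing whitespace.
  void end() {
    startOffset = input.length();
    endOffset = input.length();
  }
}
```

Keeping the final offset out of incrementToken() means the per-token path 
only ever reports real token boundaries.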

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch, 
 LUCENE-3731_3.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (SOLR-3049) UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported

2012-02-04 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13200362#comment-13200362
 ] 

Tommaso Teofili commented on SOLR-3049:
---

Hi Harsh, I think there should be a more general way of mapping typed 
parameters, just need to dig a little deeper to find it.
However in the meantime I'll try and test your patch, thanks!
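
One possible shape for a more general mapping, given only as a sketch: 
convert each configured string according to the declared parameter type, 
and treat arrays as element-wise conversion. RuntimeParamSketch and its 
methods are hypothetical; the type names follow the list in the issue 
description, and none of this is the actual OverridingParamsAEProvider code.

```java
import java.util.List;

// Hypothetical converter; the type names follow the list in the issue description.
class RuntimeParamSketch {
  // Converts one configured string into the declared parameter type.
  static Object convert(String type, String raw) {
    if ("String".equals(type)) {
      return raw;
    }
    if ("Integer".equals(type)) {
      return Integer.valueOf(raw);
    }
    if ("Boolean".equals(type)) {
      return Boolean.valueOf(raw);
    }
    if ("Float".equals(type)) {
      return Float.valueOf(raw);
    }
    throw new IllegalArgumentException("unsupported parameter type: " + type);
  }

  // Array-valued parameters reuse the scalar conversion element by element.
  static Object[] convertAll(String type, List<String> raws) {
    Object[] values = new Object[raws.size()];
    for (int i = 0; i < raws.size(); i++) {
      values[i] = convert(type, raws.get(i));
    }
    return values;
  }
}
```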

 UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types 
 supported
 -

 Key: SOLR-3049
 URL: https://issues.apache.org/jira/browse/SOLR-3049
 Project: Solr
  Issue Type: Bug
  Components: update
Reporter: Harsh P
Priority: Minor
  Labels: uima, update_request_handler
 Attachments: SOLR-3049.patch


 solrconfig.xml file has an option to override certain UIMA runtime
 parameters in the UpdateRequestProcessorChain section.
 There are certain UIMA annotators like RegexAnnotator which define
 runtimeParameters value as an Array which is not currently supported
 in the Solr-UIMA interface.
 In java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java,
 private Object getRuntimeValue(AnalysisEngineDescription desc, String
 attributeName) function defines override for UIMA analysis engine
 runtimeParameters as they are passed to UIMA Analysis Engine.
 runtimeParameters which are currently supported in the Solr-UIMA interface 
 are:
  String
  Integer
  Boolean
  Float
 I have made a hack to fix this issue to add Array support. I would
 like to submit that as a patch if no one else is working on fixing
 this issue.




[jira] [Commented] (LUCENE-3744) Add support for type whitelist in TypeTokenFilter

2012-02-03 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199603#comment-13199603
 ] 

Tommaso Teofili commented on LUCENE-3744:
-

applied on trunk r1240034
applied on branch-3.x r1240035

 Add support for type whitelist in TypeTokenFilter
 -

 Key: LUCENE-3744
 URL: https://issues.apache.org/jira/browse/LUCENE-3744
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Santiago M. Mola
Assignee: Tommaso Teofili
Priority: Trivial
 Attachments: LUCENE-3744_2.patch, TypeTokenFilter-whitelist.patch, 
 TypeTokenFilter_whitelst_lucene_and_solr.patch


 A usual use case for TypeTokenFilter is allowing only a set of token types. 
 That is, listing allowed types, instead of filtered ones. I'm attaching a 
 patch to add a useWhitelist option for that.
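
 The whitelist behaviour described above can be sketched independently of 
 Lucene. TypeFilterSketch is a hypothetical helper, and each token is modelled 
 as a {term, type} pair rather than with Lucene attributes; only the 
 useWhitelist name is taken from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper; each token is a {term, type} pair, not a Lucene attribute.
class TypeFilterSketch {
  static List<String[]> filter(List<String[]> tokens, Set<String> types, boolean useWhitelist) {
    List<String[]> kept = new ArrayList<>();
    for (String[] token : tokens) {
      boolean listed = types.contains(token[1]);
      // Whitelist mode keeps listed types; the default mode drops them.
      if (listed == useWhitelist) {
        kept.add(token);
      }
    }
    return kept;
  }
}
```

 The same set of types therefore yields complementary results depending on 
 the flag.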




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-03 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199771#comment-13199771
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

right Uwe, thanks so much for the quick review :)

 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (SOLR-3093) Remove unused features boolTofilterOptimizer and HashDocSet

2012-02-03 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199784#comment-13199784
 ] 

Tommaso Teofili commented on SOLR-3093:
---

bq. There is some code which tries to use it but I believe that since 1.4 there 
are more efficient ways to do the same. Should we also fail-fast if found in 
config or only print a warning?

IMHO we should print a warning for 3.x and fail fast from 4 on.
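
A rough sketch of that policy, assuming the config has already been parsed into
a set of option names. DeprecatedOptionCheck and check are hypothetical names,
and this is not how SolrConfig.java is actually structured; only the two option
names are taken from the issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "warn on 3.x, fail fast from 4 on"; not Solr's SolrConfig code.
final class DeprecatedOptionCheck {

    // Returns the warnings to log; throws when the major version is 4 or later.
    static List<String> check(Set<String> configuredOptions, int majorVersion) {
        List<String> warnings = new ArrayList<>();
        for (String option : Arrays.asList("boolTofilterOptimizer", "HashDocSet")) {
            if (!configuredOptions.contains(option)) {
                continue;
            }
            String message = option + " is no longer used; remove it from solrconfig.xml";
            if (majorVersion >= 4) {
                throw new IllegalStateException(message);
            }
            warnings.add(message);
        }
        return warnings;
    }

    public static void main(String[] args) {
        Set<String> options = new HashSet<>(Arrays.asList("HashDocSet"));
        System.out.println(check(options, 3)); // one warning, startup continues
        try {
            check(options, 4);
        } catch (IllegalStateException e) {
            System.out.println("startup aborted: " + e.getMessage());
        }
    }
}
```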

 Remove unused features boolTofilterOptimizer and HashDocSet
 ---

 Key: SOLR-3093
 URL: https://issues.apache.org/jira/browse/SOLR-3093
 Project: Solr
  Issue Type: Improvement
Reporter: Jan Høydahl
 Fix For: 3.6, 4.0


 SolrConfig.java still tries to parse boolTofilterOptimizer
 But the only user of this param was SolrIndexSearcher.java line 366-381 which 
 is commented out.
 Probably the whole logic should be ripped out, and we fail hard if we find 
 this config option in solrconfig.xml
 Also, the HashDocSet config option is old and no longer used or needed? 
 There is some code which tries to use it but I believe that since 1.4 there 
 are more efficient ways to do the same. Should we also fail-fast if found in 
 config or only print a warning?




[jira] [Commented] (LUCENE-3731) Create a analysis/uima module for UIMA based tokenizers/analyzers

2012-02-03 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13199893#comment-13199893
 ] 

Tommaso Teofili commented on LUCENE-3731:
-

Hey Robert, that's super, thanks! I'm going to collect your suggestions in a 
new patch shortly.


 Create a analysis/uima module for UIMA based tokenizers/analyzers
 -

 Key: LUCENE-3731
 URL: https://issues.apache.org/jira/browse/LUCENE-3731
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3731.patch


 As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored 
 out in a separate module (modules/analysis/uima) as they can be used in plain 
 Lucene. Then the solr/contrib/uima will contain only the related factories.




[jira] [Commented] (LUCENE-3744) Add support for type whitelist in TypeTokenFilter

2012-02-01 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13197834#comment-13197834
 ] 

Tommaso Teofili commented on LUCENE-3744:
-

Hello Santiago,
would you mind also providing unit tests for the whitelist usage?


 Add support for type whitelist in TypeTokenFilter
 -

 Key: LUCENE-3744
 URL: https://issues.apache.org/jira/browse/LUCENE-3744
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Santiago M. Mola
Priority: Trivial
 Attachments: TypeTokenFilter-whitelist.patch


 A usual use case for TypeTokenFilter is allowing only a set of token types. 
 That is, listing allowed types, instead of filtered ones. I'm attaching a 
 patch to add a useWhitelist option for that.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-31 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13196999#comment-13196999
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Considering the needed refactoring to put the tokenizers/analyzers in a 
dedicated Lucene analysis module I think the 'ae' package for creating 
AnalysisEngines should be moved to that module as well, so that there is a 
common mechanism for instantiating AnalysisEngines both in Lucene and Solr.

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-30 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195986#comment-13195986
 ] 

Tommaso Teofili commented on SOLR-3013:
---

Chris, Robert, thanks for your comments, I'll integrate your suggestions in a 
new patch.
I agree with the module proposal, as this was part of a follow-up 
issue/discussion I was going to raise.
Maybe I can create a new issue in Lucene for creating a new module under 
modules/analysis/uima containing just the Lucene UIMA tokenizers and then 
create a new patch for this one which contains only the factories.


 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Commented] (SOLR-3049) UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types supported

2012-01-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195719#comment-13195719
 ] 

Tommaso Teofili commented on SOLR-3049:
---

Good catch. If you could provide that patch, I'll take care of reviewing and 
committing it, if that is OK.


 UpdateRequestProcessorChain for UIMA : runtimeParameters: not all types 
 supported
 -

 Key: SOLR-3049
 URL: https://issues.apache.org/jira/browse/SOLR-3049
 Project: Solr
  Issue Type: Bug
  Components: update
Reporter: Harsh P
Priority: Minor
  Labels: uima, update_request_handler

 solrconfig.xml file has an option to override certain UIMA runtime
 parameters in the UpdateRequestProcessorChain section.
 There are certain UIMA annotators like RegexAnnotator which define
 runtimeParameters value as an Array which is not currently supported
 in the Solr-UIMA interface.
 In java/org/apache/solr/uima/processor/ae/OverridingParamsAEProvider.java,
 private Object getRuntimeValue(AnalysisEngineDescription desc, String
 attributeName) function defines override for UIMA analysis engine
 runtimeParameters as they are passed to UIMA Analysis Engine.
 runtimeParameters which are currently supported in the Solr-UIMA interface 
 are:
  String
  Integer
  Boolean
  Float
 I have made a hack to fix this issue to add Array support. I would
 like to submit that as a patch if no one else is working on fixing
 this issue.
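
For illustration, the kind of conversion being discussed might look like the
sketch below. It does not reproduce OverridingParamsAEProvider.getRuntimeValue
or the reporter's patch: RuntimeParamSketch, convert and convertArray are
hypothetical names, and the comma-separated encoding of array values is an
assumption made only for this example.

```java
import java.util.Arrays;

// Hypothetical sketch; it does not reproduce OverridingParamsAEProvider.getRuntimeValue.
final class RuntimeParamSketch {

    // Converts one raw string to one of the four types listed in the issue.
    static Object convert(String type, String raw) {
        switch (type) {
            case "String":
                return raw;
            case "Integer":
                return Integer.valueOf(raw.trim());
            case "Boolean":
                return Boolean.valueOf(raw.trim());
            case "Float":
                return Float.valueOf(raw.trim());
            default:
                throw new IllegalArgumentException("unsupported parameter type: " + type);
        }
    }

    // Assumed convention: a multi-valued parameter arrives as a comma-separated string.
    static Object[] convertArray(String type, String raw) {
        String[] parts = raw.split(",");
        Object[] values = new Object[parts.length];
        for (int i = 0; i < parts.length; i++) {
            values[i] = convert(type, parts[i].trim());
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(convert("Float", "0.5"));
        System.out.println(Arrays.toString(convertArray("Integer", "1, 2, 3")));
    }
}
```

A real fix would probably need typed arrays (for example String[]) rather than
Object[], depending on what the UIMA analysis engine expects for the parameter.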




[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-01-29 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13195861#comment-13195861
 ] 

Tommaso Teofili commented on SOLR-3013:
---

If no one objects I'll commit this shortly.

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.




[jira] [Commented] (SOLR-3054) Add a TypeTokenFilterFactory

2012-01-22 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190643#comment-13190643
 ] 

Tommaso Teofili commented on SOLR-3054:
---

bq. I looked up the other TokenFilters that filter tokens, unfortunately all of 
them default to enablePosIncr=false. I am not sure what the right solution is 
here? Consistency or correctness?

in the first patch I went for consistency but then your comment made me realize 
the enablePosIncr should be true by default. I mean, as a user I'd expect it to 
be true by default.
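
To make the difference concrete, here is a small sketch of what the flag
changes for a downstream consumer. It is not Lucene's FilteringTokenFilter:
PosIncrSketch and increments are hypothetical names, tokens are removed by text
rather than by type to keep it short, and every input token is assumed to have
a position increment of 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of what enablePosIncr changes; not Lucene's FilteringTokenFilter.
final class PosIncrSketch {

    // Returns the position increment of every surviving token.
    static List<Integer> increments(List<String> tokens, Set<String> removed, boolean enablePosIncr) {
        List<Integer> result = new ArrayList<>();
        int skipped = 0;
        for (String token : tokens) {
            if (removed.contains(token)) {
                skipped++; // remember the hole left by the removed token
                continue;
            }
            // With enablePosIncr the hole is added to the next token's increment;
            // without it the surviving tokens look adjacent.
            result.add(enablePosIncr ? 1 + skipped : 1);
            skipped = 0;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("quick", "and", "brown", "fox");
        Set<String> removed = new HashSet<>(Arrays.asList("and"));
        System.out.println(increments(tokens, removed, true));  // [1, 2, 1]
        System.out.println(increments(tokens, removed, false)); // [1, 1, 1]
    }
}
```

With the flag off, a phrase query for "quick brown" would match text that
actually had a word in between, which is why true looks like the less
surprising default.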

bq. I would only remove the try-catch blocks in the test methods and let the 
test method declare the exception. It then gets reported by JUnit with a 
failure automatically.

ok

bq. The question is, the wordset is initialized to be empty if missing. Does it 
make sense? I would maybe make the types file mandatory, as without it the 
filter makes no sense.

right, need to fix that

 Add a TypeTokenFilterFactory
 

 Key: SOLR-3054
 URL: https://issues.apache.org/jira/browse/SOLR-3054
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Tommaso Teofili
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3054.patch, SOLR-3054_2.patch


 Create a TypeTokenFilterFactory to make the TypeTokenFilter (filtering tokens 
 depending on token types, see LUCENE-3671) available in Solr too.




[jira] [Commented] (SOLR-3054) Add a TypeTokenFilterFactory

2012-01-22 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190749#comment-13190749
 ] 

Tommaso Teofili commented on SOLR-3054:
---

Thanks Uwe

 Add a TypeTokenFilterFactory
 

 Key: SOLR-3054
 URL: https://issues.apache.org/jira/browse/SOLR-3054
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Reporter: Tommaso Teofili
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3054.patch, SOLR-3054_2.patch, SOLR-3054_3.patch


 Create a TypeTokenFilterFactory to make the TypeTokenFilter (filtering tokens 
 depending on token types, see LUCENE-3671) available in Solr too.




[jira] [Commented] (LUCENE-3671) Add a TypeTokenFilter

2012-01-21 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190425#comment-13190425
 ] 

Tommaso Teofili commented on LUCENE-3671:
-

Thanks Uwe for taking care of it :)

 Add a TypeTokenFilter
 -

 Key: LUCENE-3671
 URL: https://issues.apache.org/jira/browse/LUCENE-3671
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/queryparser
Reporter: Santiago M. Mola
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3671.patch, LUCENE-3671_2.patch, 
 LUCENE-3671_3.patch


 It would be convenient to have a TypeTokenFilter that filters tokens by its 
 type, either with an exclude or include list. This might be a stupid thing to 
 provide for people who use Lucene directly, but it would be very useful to 
 later expose it to Solr and other Lucene-backed search solutions.




[jira] [Commented] (LUCENE-3671) Add a TypeTokenFilter

2012-01-21 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13190544#comment-13190544
 ] 

Tommaso Teofili commented on LUCENE-3671:
-

Sure Uwe, I'll open a new one for the related Solr factory

 Add a TypeTokenFilter
 -

 Key: LUCENE-3671
 URL: https://issues.apache.org/jira/browse/LUCENE-3671
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/queryparser
Reporter: Santiago M. Mola
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3671.patch, LUCENE-3671_2.patch, 
 LUCENE-3671_3.patch


 It would be convenient to have a TypeTokenFilter that filters tokens by its 
 type, either with an exclude or include list. This might be a stupid thing to 
 provide for people who use Lucene directly, but it would be very useful to 
 later expose it to Solr and other Lucene-backed search solutions.




[jira] [Commented] (LUCENE-3671) Add a TypeTokenFilter

2012-01-19 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13189206#comment-13189206
 ] 

Tommaso Teofili commented on LUCENE-3671:
-

A very basic TypeTokenFilter can be implemented by extending 
FilteringTokenFilter, with the accept() method checking whether 
typeAttribute.type() is contained in a set of stop types.
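
As a rough, Lucene-free sketch of that shape: the base class owns the iteration
and the subclass only answers accept(). TypeTokenFilterSketch, Filtering and
StopTypes are hypothetical stand-ins for illustration, not the classes from the
attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the FilteringTokenFilter pattern; not Lucene's API.
final class TypeTokenFilterSketch {

    // Plays the role of the filtering base class.
    abstract static class Filtering {

        // Subclasses decide per token type; the base class owns the iteration.
        protected abstract boolean accept(String type);

        // Each token is a {text, type} pair; returns the texts of accepted tokens.
        final List<String> apply(List<String[]> tokens) {
            List<String> kept = new ArrayList<>();
            for (String[] token : tokens) {
                if (accept(token[1])) {
                    kept.add(token[0]);
                }
            }
            return kept;
        }
    }

    // Rejects every token whose type is in the stop-type set.
    static final class StopTypes extends Filtering {
        private final Set<String> stopTypes;

        StopTypes(Set<String> stopTypes) {
            this.stopTypes = stopTypes;
        }

        @Override
        protected boolean accept(String type) {
            return !stopTypes.contains(type);
        }
    }

    public static void main(String[] args) {
        List<String[]> tokens = Arrays.asList(
                new String[] {"lucene", "<ALPHANUM>"},
                new String[] {"3.6", "<NUM>"});
        Set<String> stopTypes = new HashSet<>(Arrays.asList("<NUM>"));
        System.out.println(new StopTypes(stopTypes).apply(tokens)); // [lucene]
    }
}
```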

 Add a TypeTokenFilter
 -

 Key: LUCENE-3671
 URL: https://issues.apache.org/jira/browse/LUCENE-3671
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/queryparser
Reporter: Santiago M. Mola
 Attachments: LUCENE-3671.patch


 It would be convenient to have a TypeTokenFilter that filters tokens by its 
 type, either with an exclude or include list. This might be a stupid thing to 
 provide for people who use Lucene directly, but it would be very useful to 
 later expose it to Solr and other Lucene-backed search solutions.




[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy

2012-01-17 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187733#comment-13187733
 ] 

Tommaso Teofili commented on SOLR-2983:
---

It looks like a cyclic dependency problem: SolrIndexWriter uses 
SolrIndexConfig.toIndexWriterConfig to create the IndexWriterConfig that is 
passed to the basic Lucene IndexWriter constructor, while at the same time 
SolrIndexConfig.toIndexWriterConfig may itself need an IndexWriter to 
instantiate the MergePolicy (the try clause).

 Unable to load custom MergePolicy
 -

 Key: SOLR-2983
 URL: https://issues.apache.org/jira/browse/SOLR-2983
 Project: Solr
  Issue Type: Bug
Reporter: Mathias Herberts
Priority: Critical
 Fix For: 3.6, 4.0


 As part of a recent upgrade to Solr 3.5.0 we encountered an error related to 
 our use of LinkedIn's ZoieMergePolicy.
 It seems the code that loads a custom MergePolicy was at some point moved 
 into SolrIndexConfig.java from SolrIndexWriter.java, but as this code was 
 copied verbatim it now contains a bug:
 try {
   policy = (MergePolicy) schema.getResourceLoader().newInstance(
       mpClassName, null, new Class[]{IndexWriter.class}, new Object[]{this});
 } catch (Exception e) {
   policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName);
 }
 'this' is no longer an IndexWriter but a SolrIndexConfig, therefore the call 
 to newInstance will always throw an exception and the catch clause will be 
 executed. If the custom MergePolicy does not have a default constructor 
 (which is the case of ZoieMergePolicy), the second attempt to create the 
 MergePolicy will also fail and Solr won't start.
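
The failure mode described above can be reproduced with plain reflection. The
sketch below uses hypothetical stand-in classes (Writer for IndexWriter, Config
for SolrIndexConfig, WriterOnlyPolicy for a policy without a default
constructor) and java.lang.reflect directly instead of Solr's resource loader,
so it only illustrates the mechanism.

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-ins; none of these classes are Lucene or Solr types.
final class MergePolicyLoadingSketch {

    static final class Writer { }   // plays the role of IndexWriter
    static final class Config { }   // plays the role of SolrIndexConfig

    // Like the policy in the report: only a (Writer) constructor, no default one.
    static final class WriterOnlyPolicy {
        WriterOnlyPolicy(Writer writer) { }
    }

    // Mirrors the try/catch from the report and describes which step succeeded or failed.
    static String load(Class<?> policyClass, Object self) {
        try {
            Constructor<?> ctor = policyClass.getDeclaredConstructor(Writer.class);
            ctor.newInstance(self); // IllegalArgumentException when self is not a Writer
            return "created with writer constructor";
        } catch (Exception first) {
            try {
                policyClass.getDeclaredConstructor().newInstance();
                return "created with default constructor";
            } catch (Exception second) {
                return "failed: " + second.getClass().getSimpleName();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(load(WriterOnlyPolicy.class, new Writer()));
        System.out.println(load(WriterOnlyPolicy.class, new Config()));
    }
}
```

Passing a Writer prints "created with writer constructor"; passing a Config
makes the first attempt throw, the fallback then looks for a default
constructor that does not exist, and the sketch prints
"failed: NoSuchMethodException".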




[jira] [Commented] (SOLR-2983) Unable to load custom MergePolicy

2012-01-17 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187762#comment-13187762
 ] 

Tommaso Teofili commented on SOLR-2983:
---

That probably depends on migrations from old APIs. Apart from that, I agree 
that the setter (and SetOnce) facilities are the best way to inject an IW into 
the mergePolicy.
Therefore the above try/catch clause has little meaning, and IMHO it may be 
better to just keep the policy instantiation like this:

MergePolicy policy = (MergePolicy) schema.getResourceLoader().newInstance(mpClassName);
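
To show what setter injection with a set-once holder means here, a minimal
sketch follows. It is an assumption-laden illustration, not Lucene's SetOnce or
MergePolicy: SetOnceInjectionSketch, SetOnceRef, Writer, Policy and setWriter
are hypothetical names.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of set-once setter injection; not Lucene's SetOnce or MergePolicy.
final class SetOnceInjectionSketch {

    // Holds a value that may be assigned exactly once.
    static final class SetOnceRef<T> {
        private final AtomicReference<T> ref = new AtomicReference<>();

        void set(T value) {
            Objects.requireNonNull(value, "value");
            if (!ref.compareAndSet(null, value)) {
                throw new IllegalStateException("value was already set");
            }
        }

        T get() {
            return ref.get();
        }
    }

    static final class Writer { }   // plays the role of IndexWriter

    // Built through a no-argument constructor; the writer arrives later via the setter.
    static final class Policy {
        private final SetOnceRef<Writer> writer = new SetOnceRef<>();

        void setWriter(Writer w) {
            writer.set(w);
        }

        boolean hasWriter() {
            return writer.get() != null;
        }
    }

    public static void main(String[] args) {
        Policy policy = new Policy();      // no writer needed at construction time
        policy.setWriter(new Writer());    // the writer is injected afterwards
        System.out.println(policy.hasWriter());
    }
}
```

Because the policy no longer needs a writer in its constructor, the config can
create it first and the writer can be handed over once it exists, which avoids
the cycle.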



 Unable to load custom MergePolicy
 -

 Key: SOLR-2983
 URL: https://issues.apache.org/jira/browse/SOLR-2983
 Project: Solr
  Issue Type: Bug
Reporter: Mathias Herberts
Priority: Critical
 Fix For: 3.6, 4.0


 As part of a recent upgrade to Solr 3.5.0 we encountered an error related to 
 our use of LinkedIn's ZoieMergePolicy.
 It seems the code that loads a custom MergePolicy was at some point moved 
 into SolrIndexConfig.java from SolrIndexWriter.java, but as this code was 
 copied verbatim it now contains a bug:
 try {
   policy = (MergePolicy) 
 schema.getResourceLoader().newInstance(mpClassName, null, new 
 Class[]{IndexWriter.class}, new Object[]{this});
 } catch (Exception e) {
   policy = (MergePolicy) 
 schema.getResourceLoader().newInstance(mpClassName);
 }
 'this' is no longer an IndexWriter but a SolrIndexConfig, therefore the call 
 to newInstance will always throw an exception and the catch clause will be 
 executed. If the custom MergePolicy does not have a default constructor 
 (which is the case of ZoieMergePolicy), the second attempt to create the 
 MergePolicy will also fail and Solr won't start.
