[jira] Commented: (LUCENE-1992) intermittent failure in TestIndexWriter.testExceptionDuringSync

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767223#action_12767223
 ] 

Uwe Schindler commented on LUCENE-1992:
---

From the patch:
{code}
-// We expect sync exceptions in the merge threads
-cms.setSuppressExceptions();
{code}

Should this also be applied to the backwards-compatibility (bw) branches, i.e. the 
2.4 bw branch for 2.9 and the 2.9 bw branch for 3.0? I do not know what this call 
really does or what effect it has.

 intermittent failure in TestIndexWriter.testExceptionDuringSync
 -

 Key: LUCENE-1992
 URL: https://issues.apache.org/jira/browse/LUCENE-1992
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1992.patch


 {code}
 common.test:
 [mkdir] Created dir: C:\Projects\lucene\trunk-full1\build\test
 [junit] Testsuite: org.apache.lucene.index.TestIndexWriter
 [junit] Tests run: 102, Failures: 0, Errors: 1, Time elapsed: 100,297sec
 [junit]
 [junit] Testcase: testExceptionDuringSync(org.apache.lucene.index.TestIndexWriter): Caused an ERROR
 [junit] _a.fnm
 [junit] java.io.FileNotFoundException: _a.fnm
 [junit] at org.apache.lucene.store.MockRAMDirectory.openInput(MockRAMDirectory.java:226)
 [junit] at org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:68)
 [junit] at org.apache.lucene.index.SegmentReader$CoreReaders.<init>(SegmentReader.java:116)
 [junit] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:620)
 [junit] at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:590)
 [junit] at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:104)
 [junit] at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:27)
 [junit] at org.apache.lucene.index.DirectoryReader$1.doBody(DirectoryReader.java:74)
 [junit] at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:704)
 [junit] at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
 [junit] at org.apache.lucene.index.IndexReader.open(IndexReader.java:307)
 [junit] at org.apache.lucene.index.IndexReader.open(IndexReader.java:193)
 [junit] at org.apache.lucene.index.TestIndexWriter.testExceptionDuringSync(TestIndexWriter.java:2723)
 [junit] at org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:206)
 [junit]
 [junit]
 [junit] Test org.apache.lucene.index.TestIndexWriter FAILED
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767225#action_12767225
 ] 

Uwe Schindler commented on LUCENE-1257:
---

bq. I did not touch StopFilter or StopAnalyzer due to some mixed CharArraySet / 
Set<String> usage... any ideas on this one Uwe? 

I am stuck on that, too. See also LUCENE-1987 and LUCENE-1989. As this set 
needs no type safety (when it is implemented by CharArraySet), it does not 
matter whether the contains method takes char[], String or even Object: it 
always compares the string representation of the tested value. As CharArraySet 
is defined as Set<Object>, we should define all of these as Set<Object> in 
StopFilter. Or declare them as CharArraySet and convert the anonymous Set<?> to a 
CharArraySet in the constructor (I would prefer this).
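A minimal sketch of the "convert in the constructor" option, in plain Java 5 so it runs without Lucene on the classpath. `StringKeyedSet` and `StopWordChecker` are hypothetical stand-ins, not Lucene's `CharArraySet` or `StopFilter`; the only behaviour borrowed from the comment is that lookups compare the string representation of the tested value, which is why accepting a `Set<?>` is safe.

```java
import java.util.AbstractSet;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical stand-in for CharArraySet: every lookup goes through toString(),
// so the element type of the source set does not matter.
class StringKeyedSet extends AbstractSet<Object> {
    private final Set<String> keys = new HashSet<String>();

    StringKeyedSet(Collection<?> source) {
        for (Object o : source) {
            keys.add(o.toString());
        }
    }

    @Override
    public boolean contains(Object o) {
        return o != null && keys.contains(o.toString());
    }

    // Lookup directly from a token's char buffer.
    public boolean contains(char[] text, int offset, int length) {
        return keys.contains(new String(text, offset, length));
    }

    @Override
    public Iterator<Object> iterator() {
        // Read-only view: this set is immutable after construction.
        return Collections.<Object>unmodifiableSet(keys).iterator();
    }

    @Override
    public int size() {
        return keys.size();
    }
}

// Hypothetical filter-like class: accepts any Set<?> and converts it
// to a StringKeyedSet once, in the constructor.
class StopWordChecker {
    private final StringKeyedSet stopWords;

    StopWordChecker(Set<?> stopWords) {
        // Reuse the set if it already has the right type; otherwise copy it once.
        this.stopWords = (stopWords instanceof StringKeyedSet)
                ? (StringKeyedSet) stopWords
                : new StringKeyedSet(stopWords);
    }

    boolean isStopWord(char[] buffer, int offset, int length) {
        return stopWords.contains(buffer, offset, length);
    }
}
```

Passing a set that is already a `StringKeyedSet` skips the copy; any other set is copied once, so later lookups never depend on the caller's element type.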

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, java5.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-WordListLoader.patch, 
 LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, 
 LUCENE-1257_messages.patch, LUCENE-1257_o.a.l.queryParser.patch, 
 LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
 LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, o.a.l.analysis.patch, 
 shinglematrixfilter_generified.patch


 For my needs I've updated Lucene so that it uses Java 5 constructs. I know 
 the Java 5 migration was once planned for 2.1, but I don't know when it is 
 planned now. This patch against the trunk includes:
 - the most obvious generics usage (there are tons of usages of sets, ... Those 
 which are commonly used have been generified)
 - PriorityQueue generification
 - replacement of indexed for loops with for-each constructs
 - removal of unnecessary unboxing
 In my opinion the code is much more readable with those features (you 
 actually *know* what is stored in collections when reading the code, without 
 needing to look up field definitions every time) and it simplifies many 
 algorithms.
 Note that this patch also includes an interface for the Query class. This was 
 done for my company's need to build custom Query classes which add some 
 behaviour to the base Lucene queries. It prevents multiple unnecessary 
 casts. I know this introduction is not wanted by the team, but it really 
 makes our development easier to maintain. If you don't want to use this, 
 replace all /Queriable/ calls with standard /Query/.
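To make the listed changes concrete, here is a small comparison written for this summary (it is not code from the patch): a raw-type method with an explicit cast and an indexed loop, its generified for-each equivalent, and a counter that relies on autoboxing instead of explicit `Integer` wrapping and `intValue()` calls, which is one reading of the "unnecessary unboxing" item. `Java5Style` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical examples (not taken from the patch) contrasting
// pre-Java 5 and Java 5 collection code.
class Java5Style {
    // Pre-Java 5: raw List, explicit cast, indexed loop.
    @SuppressWarnings("rawtypes")
    static int totalLengthRaw(List terms) {
        int total = 0;
        for (int i = 0; i < terms.size(); i++) {
            String term = (String) terms.get(i);
            total += term.length();
        }
        return total;
    }

    // Java 5: element type in the signature, for-each loop, no cast.
    static int totalLength(List<String> terms) {
        int total = 0;
        for (String term : terms) {
            total += term.length();
        }
        return total;
    }

    // Autoboxing: no explicit new Integer(...) or intValue() calls needed.
    static Map<String, Integer> countTerms(List<String> terms) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String term : terms) {
            Integer old = counts.get(term);
            counts.put(term, old == null ? 1 : old + 1);
        }
        return counts;
    }
}
```

The generic version moves the element type into the method signature, so the compiler rejects a list of the wrong type instead of the cast failing at runtime.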




[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1257:
--

Attachment: (was: java5.patch)

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-WordListLoader.patch, 
 LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, 
 LUCENE-1257_messages.patch, LUCENE-1257_o.a.l.queryParser.patch, 
 LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
 LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, o.a.l.analysis.patch, 
 shinglematrixfilter_generified.patch





[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1257:
--

Attachment: (was: o.a.l.analysis.patch)

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-WordListLoader.patch, 
 LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, 
 LUCENE-1257_messages.patch, LUCENE-1257_o.a.l.queryParser.patch, 
 LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
 LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, shinglematrixfilter_generified.patch





[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767240#action_12767240
 ] 

Uwe Schindler commented on LUCENE-1257:
---

I removed some unneeded patches.

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-WordListLoader.patch, 
 LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, 
 LUCENE-1257_messages.patch, LUCENE-1257_o.a.l.queryParser.patch, 
 LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
 LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, shinglematrixfilter_generified.patch





[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767241#action_12767241
 ] 

Uwe Schindler commented on LUCENE-1257:
---

Committed:
   LUCENE-1257-CloseableThreadLocal.patch 2009-10-18 06:31 PM Kay Kay 4 kB 
   LUCENE-1257_analysis.patch 2009-10-18 05:41 PM Robert Muir 8 kB 

At revision: 826601

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-WordListLoader.patch, 
 LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, 
 LUCENE-1257_messages.patch, LUCENE-1257_o.a.l.queryParser.patch, 
 LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
 LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, shinglematrixfilter_generified.patch





[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767242#action_12767242
 ] 

Uwe Schindler commented on LUCENE-1257:
---

One note: I do not want to apply any test-related generics patches yet, as they 
currently make it harder to port patches to the backwards branch.
As soon as all deprecations are removed, we can start fixing the tests. 
Until then, changes often also need to be applied to the backwards branch, 
which is Java 1.4 code used for backwards testing against 2.9.

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-WordListLoader.patch, 
 LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, 
 LUCENE-1257_messages.patch, LUCENE-1257_o.a.l.queryParser.patch, 
 LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
 LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, shinglematrixfilter_generified.patch





[jira] Commented: (LUCENE-1992) intermittent failure in TestIndexWriter.testExceptionDuringSync

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767258#action_12767258
 ] 

Michael McCandless commented on LUCENE-1992:


bq. Should this also be applied to the bw branch (2.4 for) 2.9 and (2.9 for) 3.0? 

No, it can only be applied on trunk.

That call tells ConcurrentMergeScheduler to expect exceptions in its merge 
threads during this test. Those exceptions happen when autoCommit is true 
(which this test uses everywhere except trunk): when a merge completes, the 
writer commits and calls Directory.sync, which throws the intentional exception.
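The mechanism described above can be sketched as follows. `BackgroundMerger` is a hypothetical, heavily simplified stand-in, not Lucene's `ConcurrentMergeScheduler`: a merge runs on its own thread, an exception escaping it (standing in for the intentional `Directory.sync` failure) is normally recorded as unexpected, and `setSuppressExceptions()` makes the scheduler swallow it instead. That recorded exceptions would later fail the test is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified background-merge runner (NOT Lucene's
// ConcurrentMergeScheduler) illustrating a "suppress exceptions" switch.
class BackgroundMerger {
    private volatile boolean suppressExceptions;
    private final List<Throwable> unexpected =
            Collections.synchronizedList(new ArrayList<Throwable>());

    // Test hook: exceptions in merge threads are expected, so do not record them.
    void setSuppressExceptions() {
        suppressExceptions = true;
    }

    // Runs one merge on a separate thread and waits for it to finish.
    void runMerge(final Runnable merge) {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    merge.run();
                } catch (Throwable th) {
                    if (!suppressExceptions) {
                        unexpected.add(th); // a test harness would fail on these
                    }
                }
            }
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean anyUnexpectedExceptions() {
        return !unexpected.isEmpty();
    }
}
```

On trunk the test no longer runs with autoCommit=true, so a finished merge does not commit or call sync; that is presumably why the patch can drop the call there but not on the older branches.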

 intermittent failure in TestIndexWriter.testExceptionDuringSync
 -

 Key: LUCENE-1992
 URL: https://issues.apache.org/jira/browse/LUCENE-1992
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1992.patch





[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1987:
--

Attachment: LUCENE-1987-StopFilter.patch

Hello Mike,

Attached is a patch with all deprecated methods removed (only 
setOverridesTokenStream is still there; making Analyzers final is a separate 
task).

StopFilter and its stop-word sets were also generified (to Set<?>, which is OK 
for every type of set: CharArraySet uses toString() to convert everything to a 
string when testing, so any set is fine).

I only had problems with the following settings; here is the solution (StandardAnalyzer):
{code}
enableStopPositionIncrements = matchVersion.onOrAfter(Version.LUCENE_29);
replaceInvalidAcronym = matchVersion.onOrAfter(Version.LUCENE_23);
{code}

The defaultPosIncr setting was removed (it was set through a static method, so 
there is no configurable default anymore). The pre-2.9 default was false, and it 
can no longer be changed, so I set posIncr to false for all versions before 2.9 
(this was the default before, but it is now fixed because there is no static 
setter or system property anymore).

For the invalid acronyms I added a LUCENE_23 version constant, so the fix is 
enabled for all versions >= 2.3. If you want the old behaviour, use LUCENE_22 or below.
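The version gating above can be sketched without Lucene as follows. `MatchVersion` is a hypothetical enum standing in for Lucene's `Version` constants (only declaration order matters here), and `AnalyzerSettings` is a hypothetical holder; its two assignments mirror the StandardAnalyzer snippet from the comment.

```java
// Hypothetical stand-in for Lucene's Version constants; only their
// declaration order (oldest first) matters for onOrAfter().
enum MatchVersion {
    LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30;

    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical analyzer settings derived only from the match version,
// with no static setter or system property to override them.
class AnalyzerSettings {
    final boolean enableStopPositionIncrements;
    final boolean replaceInvalidAcronym;

    AnalyzerSettings(MatchVersion matchVersion) {
        enableStopPositionIncrements = matchVersion.onOrAfter(MatchVersion.LUCENE_29);
        replaceInvalidAcronym = matchVersion.onOrAfter(MatchVersion.LUCENE_23);
    }
}
```

Because both flags are final and computed from the constructor argument alone, there is no global default left to change, which matches the removal of the static setter and system property.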

Mike: Can you review this?

If you're OK with it, I will have to change 175 occurrences of new StandardAnalyzer() 
in the tests :(

 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.0

 Attachments: LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch, LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field-- (DONE)
 - -eventually un-deprecate ctors of Token taking Strings (they are still 
 useful) - if yes, remove the deprecation in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards settings from analyzers (acronym, posIncr, ...). They 
 are deprecated, but we still have the VERSION constants. I do not know how to 
 proceed: keep the settings alive for index compatibility, or remove them 
 together with the version constants (which were undeprecated)?
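For the CharacterCache item: since Java 5, `Character.valueOf(char)` is documented to always cache values in the range 0 to 127 (and it may cache others), which is the kind of reuse a hand-written character cache provides. A minimal sketch; `CharBoxing` is a hypothetical helper name.

```java
// Hypothetical helper showing the Java 5 replacement for a hand-written
// Character cache: Character.valueOf always returns a cached instance
// for chars with code points 0 to 127, per its Javadoc.
final class CharBoxing {
    private CharBoxing() {
    }

    // Boxes a single char; no allocation for ASCII input.
    static Character box(char c) {
        return Character.valueOf(c);
    }

    // Boxes every char of a string, e.g. to use them as Map or Set keys.
    static Character[] boxAll(String text) {
        Character[] out = new Character[text.length()];
        for (int i = 0; i < out.length; i++) {
            out[i] = Character.valueOf(text.charAt(i));
        }
        return out;
    }
}
```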




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767262#action_12767262
 ] 

Uwe Schindler commented on LUCENE-1987:
---

If we are fine with that, I would backport the version constants and the 
default setting to 2.9.x.




[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1987:
--

Attachment: LUCENE-1987-StopFilter.patch

Correct patch.




[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1987:
--

Attachment: (was: LUCENE-1987-StopFilter.patch)




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767264#action_12767264
 ] 

Michael McCandless commented on LUCENE-1987:


I'll have a look, but one thing: invalid acronym replacement should be 
enabled if version >= 2.4, not >= 2.3. I.e., if version is 2.3, the bug is still 
present.




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767265#action_12767265
 ] 

Uwe Schindler commented on LUCENE-1987:
---

LUCENE-1068 says: Fix version 2.3

 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.0

 Attachments: LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field- (DONE)
 - -possibly un-deprecate ctors of Token taking Strings (they are still 
 useful); if yes, remove the deprecation in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
 are deprecated, but we still have the VERSION constants. I do not know how to 
 proceed: keep the settings alive for index compatibility, or remove them 
 together with the version constants (which were undeprecated)?




[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1987:
--

Attachment: LUCENE-1987-StopFilter.patch

Javadoc fixes.




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767267#action_12767267
 ] 

Michael McCandless commented on LUCENE-1987:


Why add 2.0, 2.1, 2.2 versions? We don't emulate bugs based on those anywhere, 
right? Otherwise, patch looks great! Thanks Uwe. Nice to see 
StandardAnalyzer clean again.




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767269#action_12767269
 ] 

Uwe Schindler commented on LUCENE-1987:
---

I just added 20 and 21 as well; I can remove them again.
22 is needed because the invalidAcronym bug is present in 2.2 and fixed in 2.3 
(according to LUCENE-1068).




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767270#action_12767270
 ] 

Michael McCandless commented on LUCENE-1987:


bq. LUCENE-1068 says: Fix version 2.3

Right, that bug was fixed in 2.3; however, with that fix the buggy behavior was 
kept by default. In 2.4 we then changed the default to true, i.e., the bug is 
fixed by default. So if I specify VERSION_23, I should get the buggy behavior, 
but if I specify VERSION_24, I should get the correct behavior.

Going forward, when we fix a bug but need to conditionally preserve the bug for 
back compat, we should use the version switching so that by default for new 
users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is 
fixed.
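A minimal sketch of the version switch described above. `MatchVersion` and `AcronymSettings` are hypothetical, simplified names (not Lucene's own `Version` class or StandardAnalyzer), and the sketch assumes enum declaration order matches release order. The invalid-acronym fix is on only when the caller asks for 2.4 or later, so asking for 2.3 still reproduces the buggy behavior.

```java
// Hypothetical, simplified stand-in for release-version constants.
// Declaration (ordinal) order is assumed to match release order.
enum MatchVersion {
    LUCENE_20, LUCENE_21, LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30;

    /** True if this version is the same as or newer than {@code other}. */
    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical settings holder showing a version-gated bug fix.
final class AcronymSettings {
    final boolean replaceInvalidAcronym;

    AcronymSettings(MatchVersion matchVersion) {
        // 2.3 shipped the fix but left the buggy behavior as the default;
        // 2.4 made the fix the default, so only callers asking for >= 2.4 get it.
        this.replaceInvalidAcronym = matchVersion.onOrAfter(MatchVersion.LUCENE_24);
    }
}
```

Passing `MatchVersion.LUCENE_23` leaves `replaceInvalidAcronym` false, while `LUCENE_24` and later turn it on.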




Re: Call the authorities

2009-10-19 Thread Michael McCandless
Indeed!  It's doing nothing now.  Just creating Sort objects but not
in fact doing any searching with them.  Hmm.

Unfortunately, the test relied heavily on the deprecated
setUseLegacySearch API to compare old vs. new sorting. I suppose
its time has passed, given that it has now had plenty of time
to confirm that old and new produce identical results.

Should we just remove it?

Mike

On Sun, Oct 18, 2009 at 11:20 PM, Mark Miller markrmil...@gmail.com wrote:
 Mark Miller wrote:
 TestStressSort has been butchered.


 I suppose we could just pull it since it wouldn't check for much any
 more - looks awful funny as is.

 --
 - Mark

 http://www.lucidimagination.com







[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1987:
--

Attachment: LUCENE-1987-StopFilter.patch

Updated patch with LUCENE_24. I did not remove the other version constants, 
because this way we have them and can use them elsewhere. And a user coming 
from e.g. 2.2 to 3.0 can just use LUCENE_22 to match his old behaviour. The 
user should be free to pass the version he used before, for backwards 
compatibility.

Mike: Should I backport the setting for 2.4 to 2.9 to enable 
plugin-replacements from 2.9.1 to 3.0?

 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.0

 Attachments: LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987.patch, LUCENE-1987.patch, LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field- (DONE)
 - -possibly un-deprecate ctors of Token taking Strings (they are still 
 useful); if yes, remove the deprecation in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
 are deprecated, but we still have the VERSION constants. I do not know how to 
 proceed: keep the settings alive for index compatibility, or remove them 
 together with the version constants (which were undeprecated)?




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767273#action_12767273
 ] 

Uwe Schindler commented on LUCENE-1987:
---

bq. Going forward, when we fix a bug but need to conditionally preserve the bug 
for back compat, we should use the version switching so that by default for new 
users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is 
fixed.

Do you mean I should add the default ctor of StandardAnalyzer() and rewire it 
to LUCENE_CURRENT? We would have to document that, from 3.0 on, 
StandardAnalyzer's default ctor no longer behaves like 2.4 but always 
uses the newest features.




[jira] Issue Comment Edited: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767273#action_12767273
 ] 

Uwe Schindler edited comment on LUCENE-1987 at 10/19/09 3:14 AM:
-

bq. Going forward, when we fix a bug but need to conditionally preserve the bug 
for back compat, we should use the version switching so that by default for new 
users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is 
fixed.

Do you mean I should add the default ctor of StandardAnalyzer() and rewire it 
to LUCENE_CURRENT? We would have to document that, from 3.0 on, 
StandardAnalyzer's default ctor no longer behaves like 2.4 but always 
uses the newest features.

That would help me a lot with the tests.




Highlighting - catering for all query types

2009-10-19 Thread mark harwood
I've been putting together some code to support highlighting of opaque query 
clauses (cached filters, trie range, spatial etc etc) which shows some promise.

This is not intended as a replacement for the existing highlighter(s) which 
deal with free-text but is instead concentrating on the hard-to-highlight 
clauses and has the benefit of working in-line with the query process.
Summarisation is not a requirement here - I simply need to know if a given 
query clause matched on a result.

The approach I have come up with is to wrap query clauses with lightweight 
(processing and RAM-wise) instrumenting objects in order to record which 
clauses matched.
The recorded matches are encoded as a byte in the document score which 
unfortunately requires some loss of precision in the scores - more on this 
later.

The general approach for use looks like this:

// Wrap *any* type of query object for highlight flagging and allocate a
// flag number between 1 and 8 for the clauses of interest
FlagRecordingQuery frqA = new FlagRecordingQuery(
    new TermQuery(new Term(statusField, "published")), 1);
FlagRecordingQuery frqB = new FlagRecordingQuery(
    new XyzLtd3rdPartyQuery(imageDataField, "unknown magic to find 'sunset'"), 2);

BooleanQuery bq = new BooleanQuery();
bq.add(new BooleanClause(frqA, Occur.SHOULD));
bq.add(new BooleanClause(frqB, Occur.SHOULD));

// Parent query must be a FlagCombiningQuery to encode child match info
// in the doc scores
FlagCombiningQuery fcq = new FlagCombiningQuery(bq);

// Run search
TopDocs td = s.search(fcq, 10);
ScoreDoc[] sd = td.scoreDocs;
for (ScoreDoc scoreDoc : sd)
{
    float score = scoreDoc.score;

    // Check to see which flags are encoded in the score.
    if (FlagCombiningQuery.hasFlag(1, score))
    {
        System.out.println("woot! " + scoreDoc.doc + " matched clause 1");
    }
    if (FlagCombiningQuery.hasFlag(2, score))
    {
        System.out.println("woot! " + scoreDoc.doc + " matched clause 2");
    }
}


The FlagRecordingQuery child clauses introduce themselves to the 
FlagCombiningQuery through a thread local at rewrite time.
The FlagCombiningQuery at the root adjusts the scores as follows:

static final float DEFAULT_MULTIPLIER = 1000f;
float multiplier = DEFAULT_MULTIPLIER;

public float score() throws IOException
{
    float score = delegateScorer.score();
    byte flags = 0;
    int d = doc();
    // Encode all matched child clauses into a flags byte.
    for (FlagRecordingQuery frq : thisThreadsFlags)
    {
        if (frq.matched(d))
        {
            byte mask = flagMasks[frq.flag - 1];
            flags = setFlag(flags, mask);
        }
    }

    // Multiply score to turn float into int, keeping enough fractional digits.
    int shiftedI = (int) (score * multiplier);
    // Shift int to make space for the byte holding the flags
    int iPlusSpaceForByte = shiftedI << 8;
    // Add match flags; mask with 0xFF so that flag 8 (0x80) does not
    // sign-extend and overwrite the score bits
    int iCombinedScoreAndFlags = iPlusSpaceForByte | (flags & 0xFF);
    System.out.println("combined score=" + iCombinedScoreAndFlags + " for doc#" + doc());
    return iCombinedScoreAndFlags;
}
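The message does not show how `FlagCombiningQuery.hasFlag` decodes a score, so the following is a hypothetical standalone sketch of the round trip (`ScoreFlagCodec` is an illustrative name, not part of the code above). It assumes flag number n (1 to 8) maps to bit n-1 of the low byte, and it masks the flags with `0xFF` so that flag 8 cannot sign-extend into the score bits.

```java
// Hypothetical helper: an encoder mirroring the encode step in score() and a matching decoder.
final class ScoreFlagCodec {
    private ScoreFlagCodec() {}

    /** Packs a scaled score into the upper bits and up to 8 match flags into the low byte. */
    static float encode(float score, int flags, float multiplier) {
        int scaled = (int) (score * multiplier);
        // int result is widened to float on return
        return (scaled << 8) | (flags & 0xFF);
    }

    /** True if flag number {@code flag} (1..8) is set in an encoded score. */
    static boolean hasFlag(int flag, float encodedScore) {
        int combined = (int) encodedScore;
        return (combined & (1 << (flag - 1))) != 0;
    }

    /** Recovers the score, truncated to the precision the multiplier kept. */
    static float decodeScore(float encodedScore, float multiplier) {
        return ((int) encodedScore >> 8) / multiplier;
    }
}
```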

The mechanism works but relies on original score values that:
a) are not too big, i.e. do not lose significant digits when multiplied by the
multiplier and then shifted left 8 bits;
b) are not too similar, i.e. do not differ only in very small fractions (e.g. all
scores falling in the range 0.1234 to 0.1235).

To give an indication of the restrictions this imposes, here are the usable score 
ranges for various settings of multiplier:

multiplier   max score   fraction precision
==========   =========   ==================
10           838860      0.x
100          83886       0.xx
1000         8388        0.xxx
10000        838         0.xxxx

I would imagine the majority of Lucene query results would still rank sensibly 
with a 1,000 or 10,000 multiplier.
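The "max score" column can be reproduced as `(Integer.MAX_VALUE >> 8) / multiplier`, the largest scaled score that still fits in a signed int after the 8-bit shift. A further limit is worth checking: `score()` returns the combined int as a `float`, and a Java `float` has a 24-bit significand, so the flag byte only survives exactly while the combined value stays below 2^24. That would cap the usable score at roughly `65535 / multiplier` (about 65 for a multiplier of 1000). The helper names below are hypothetical.

```java
// Hypothetical helpers reproducing the table's arithmetic and the float caveat.
final class MultiplierLimits {
    private MultiplierLimits() {}

    /** Largest whole score whose scaled value still fits in a signed int after << 8. */
    static int maxScoreForIntRange(int multiplier) {
        return (Integer.MAX_VALUE >> 8) / multiplier;
    }

    /** Largest whole score whose combined value stays below 2^24, so a float holds it exactly. */
    static int maxScoreForExactFloat(int multiplier) {
        return (((1 << 24) - 1) >> 8) / multiplier;
    }

    /** True if a combined score-and-flags int survives a round trip through float unchanged. */
    static boolean survivesFloat(int combined) {
        return (int) (float) combined == combined;
    }
}
```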

However, all this potentially dangerous bit twiddling could of course be 
avoided if the Lucene search API were expanded to include docid, score AND a 
completely separate field for recording match flags. 


Thoughts?







[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767299#action_12767299
 ] 

Michael McCandless commented on LUCENE-1987:


bq. I did not remove the other version constants, because then we have them and 
can use them anywhere else. And a user coming from e.g. 2.2 to 3.0 can just use 
LUCENE_22 to match his old behaviour. The user should be free to give his 
version he used before for this backwards compatibility.

OK I think that's reasonable.

bq. Mike: Should I backport the setting for 2.4 to 2.9 to enable 
plugin-replacements from 2.9.1 to 3.0?

+1

{quote}
bq. Going forward, when we fix a bug but need to conditionally preserve the bug 
for back compat, we should use the version switching so that by default for new 
users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is 
fixed.

Do you mean I should add the default ctor of StandardAnalyzer() and rewire it 
to LUCENE_CURRENT?
{quote}

Sorry, I wasn't clear...

No -- I don't think we should ever have a ctor that defaults to LUCENE_CURRENT. 
That's a back compat trap (and it just gets us back to where we started when 
we had no explicit version). Users must be explicit about which version they 
want.

What I meant was: when fixing some sneaky bug in the future, we should never 
set the default so that the bug is still present (as we did on the first go of 
invalid acronyms), expecting new users to realize they have to go out of 
their way to tell Lucene not to emulate the bug. Instead, the default going 
forward (if version >= next-release-version) should be that the bug is fixed.
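A minimal sketch of the "users must be explicit" rule, using hypothetical names (`ReleaseVersion`, `ExampleAnalyzerConfig`) rather than any real Lucene class. The only constructor takes a version, so nothing silently defaults to the newest behavior. A null version is rejected, and a fix is on for every version at or after the release that introduced it.

```java
import java.util.Objects;

// Hypothetical release constants; declaration order is assumed to match release order.
enum ReleaseVersion {
    LUCENE_22, LUCENE_23, LUCENE_24, LUCENE_29, LUCENE_30;

    boolean onOrAfter(ReleaseVersion other) {
        return compareTo(other) >= 0;
    }
}

final class ExampleAnalyzerConfig {
    private final ReleaseVersion matchVersion;

    // The only constructor: callers must state which release's behavior they want.
    ExampleAnalyzerConfig(ReleaseVersion matchVersion) {
        this.matchVersion = Objects.requireNonNull(matchVersion, "matchVersion must be given explicitly");
    }

    // A fix is enabled for every version at or after the release that introduced it,
    // and disabled only for callers who explicitly ask for an older release.
    boolean isFixEnabled(ReleaseVersion fixedIn) {
        return matchVersion.onOrAfter(fixedIn);
    }
}
```

Because there is no no-arg constructor, upgrading the library does not change behavior behind the caller's back. The caller opts into newer behavior by changing the constant they pass.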




[jira] Updated: (LUCENE-1993) MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)

2009-10-19 Thread Christian Steinert (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Steinert updated LUCENE-1993:
---

Attachment: MoreLikeThis.java.patch

suggested patch against current SVN head

 MoreLikeThis - allow to exclude terms that appear in too many documents 
 (patch included)
 

 Key: LUCENE-1993
 URL: https://issues.apache.org/jira/browse/LUCENE-1993
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.9
Reporter: Christian Steinert
 Attachments: MoreLikeThis.java.patch

   Original Estimate: 0.17h
  Remaining Estimate: 0.17h

 The MoreLikeThis class can generate a likeness query based on a given 
 document. So far, it is impossible to suppress words that appear in almost all 
 documents from the likeness query, which makes it necessary to use extensive 
 lists of stop words.
 Therefore I suggest allowing the exclusion of words whose document count 
 exceeds a certain absolute number or a certain percentage of documents. Depending on 
 the corpus of text, words that appear in more than 50 or even 70% of 
 documents can usually be considered insignificant for classifying a document. 
  




[jira] Created: (LUCENE-1993) MoreLikeThis - allow to exclude terms that appear in too many documents (patch included)

2009-10-19 Thread Christian Steinert (JIRA)
MoreLikeThis - allow to exclude terms that appear in too many documents (patch 
included)


 Key: LUCENE-1993
 URL: https://issues.apache.org/jira/browse/LUCENE-1993
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/*
Affects Versions: 2.9
Reporter: Christian Steinert
 Attachments: MoreLikeThis.java.patch

The MoreLikeThis class allows generating a likeness query based on a given 
document. So far, it is impossible to suppress words that appear in almost all 
documents from the likeness query, making it necessary to use extensive lists 
of stop words.

Therefore I suggest allowing the exclusion of words whose document count 
exceeds a certain absolute number or a certain percentage of documents. 
Depending on the corpus of text, words that appear in more than 50% or even 70% 
of documents
can usually be considered insignificant for classifying a document.  
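
The proposed cutoff can be sketched as a plain filter over candidate terms. This is a minimal, hypothetical Java sketch, not the attached patch and not the MoreLikeThis API: the class name DocFreqCutoff, its two parameters, and the map from term to document frequency are assumptions for illustration. Inside MoreLikeThis the frequencies would presumably come from IndexReader.docFreq(Term) and the total from IndexReader.numDocs().

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Lucene): drops candidate query terms whose
 * document frequency exceeds an absolute count or a fraction of all documents.
 */
class DocFreqCutoff {
    private final int maxDocFreq;        // absolute limit; Integer.MAX_VALUE disables it
    private final double maxDocFreqPct;  // fraction of numDocs in [0, 1]; 1.0 disables it

    DocFreqCutoff(int maxDocFreq, double maxDocFreqPct) {
        this.maxDocFreq = maxDocFreq;
        this.maxDocFreqPct = maxDocFreqPct;
    }

    /** True if a term found in docFreq of numDocs documents is too common to keep. */
    boolean isTooCommon(int docFreq, int numDocs) {
        if (docFreq > maxDocFreq) {
            return true;
        }
        return numDocs > 0 && (double) docFreq / numDocs > maxDocFreqPct;
    }

    /** Returns the terms that pass both limits, keeping their original order. */
    Map<String, Integer> filter(Map<String, Integer> docFreqs, int numDocs) {
        Map<String, Integer> kept = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (!isTooCommon(e.getValue(), numDocs)) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }
}
```

For example, with 100 documents and a 50% limit (new DocFreqCutoff(Integer.MAX_VALUE, 0.5)), a term found in 60 documents is dropped while one found in 12 is kept. Offering both an absolute and a relative limit mirrors the two options named in the proposal.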




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767304#action_12767304
 ] 

Uwe Schindler commented on LUCENE-1987:
---

OK, I'll fix the tests using find/grep/sed :-)

 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.0

 Attachments: LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987.patch, LUCENE-1987.patch, LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field-- (DONE)
 - -eventually un-deprecate ctors of Token taking Strings (they are still 
 useful) - if yes remove deprec in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
 are deprecated, but we still have the VERSION constants. I do not know how to 
 proceed. Keep the settings alive for index compatibility? Or remove them 
 together with the version constants (which were undeprecated).




[jira] Created: (LUCENE-1994) EnwikiConentSource does not work with parallel tasks

2009-10-19 Thread Mark Miller (JIRA)
EnwikiConentSource does not work with parallel tasks


 Key: LUCENE-1994
 URL: https://issues.apache.org/jira/browse/LUCENE-1994
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor







[jira] Commented: (LUCENE-1994) EnwikiConentSource does not work with parallel tasks

2009-10-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767335#action_12767335
 ] 

Shai Erera commented on LUCENE-1994:


I believe this was the original behavior of EnwikiDocMaker. But anyway, I think 
that if getNextDocData is made synchronized, that should do it?

 EnwikiConentSource does not work with parallel tasks
 

 Key: LUCENE-1994
 URL: https://issues.apache.org/jira/browse/LUCENE-1994
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor






[jira] Commented: (LUCENE-1994) EnwikiConentSource does not work with parallel tasks

2009-10-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767341#action_12767341
 ] 

Mark Miller commented on LUCENE-1994:
-

bq. I believe this was the original behavior of EnwikiDocMaker

Probably - but we should make it work right?

bq. But anyway, I think that if getNextDocData is made synchronized, that 
should do it?

That's actually what I did locally as a quick fix; it seems to work out alright.

 EnwikiConentSource does not work with parallel tasks
 

 Key: LUCENE-1994
 URL: https://issues.apache.org/jira/browse/LUCENE-1994
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor






[jira] Commented: (LUCENE-1994) EnwikiConentSource does not work with parallel tasks

2009-10-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767343#action_12767343
 ] 

Shai Erera commented on LUCENE-1994:


Yes, I agree (with both comments). Basically, for a ContentSource to support 
parallel tasks, its getNextDocData should be made synchronized, or it must find 
another way to sync on the important stuff (for example TrecContentSource).
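
A minimal sketch of the first option, under simplifying assumptions: the class SyncedContentSource and its String-returning getNextDocData() are hypothetical (the benchmark's real ContentSource.getNextDocData has a different, DocData-based signature), and an Iterator stands in for the parsed Wikipedia dump. Because hasNext()/next() is a check-then-act pair, marking the method synchronized is what keeps two indexing threads from racing on it.

```java
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical, simplified content source shared by several indexing threads.
 * An Iterator stands in for the parsed document stream.
 */
class SyncedContentSource {
    private final Iterator<String> docs;

    SyncedContentSource(List<String> docs) {
        this.docs = docs.iterator();
    }

    /**
     * Returns the next document, or null once the source is exhausted.
     * Synchronized so the hasNext()/next() pair runs atomically and each
     * document is handed to exactly one thread.
     */
    synchronized String getNextDocData() {
        return docs.hasNext() ? docs.next() : null;
    }
}
```

With N threads each looping on getNextDocData() until it returns null, every document should be consumed exactly once. The trade-off is that threads serialize on this one method, which is likely acceptable as long as fetching a document is cheap compared to indexing it.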

 EnwikiConentSource does not work with parallel tasks
 

 Key: LUCENE-1994
 URL: https://issues.apache.org/jira/browse/LUCENE-1994
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor






Re: 2.9.1

2009-10-19 Thread Yonik Seeley
On Wed, Oct 14, 2009 at 5:39 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 I can cut the 2.9.1 release, but... should we wait a bit to see
 whether other issues come up?  Or do it, now?

Other issues came up and were quickly fixed - nice job, guys!
I don't see anything else serious lurking about... seems like the
2.9.1 release process could be started soon?

-Yonik
http://www.lucidimagination.com




Re: 2.9.1

2009-10-19 Thread Michael McCandless
On Mon, Oct 19, 2009 at 11:54 AM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Wed, Oct 14, 2009 at 5:39 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 I can cut the 2.9.1 release, but... should we wait a bit to see
 whether other issues come up?  Or do it, now?

 Other issues came up and were quickly fixed - nice job, guys!
 I don't see anything else serious lurking about... seems like the
 2.9.1 release process could be started soon?

+1, I'll try to get an RC out tomorrow.

Mike




Re: [jira] Commented: (LUCENE-1994) EnwikiConentSource does not work with parallel tasks

2009-10-19 Thread Mark Miller
I don't think some of the stat tracking works right with parallel tasks either:
to get the total time, it's adding up the times at which each thread finished.
E.g., if thread 1 finishes at second 30 and thread 2 at second 32, it's saying it
took 62 seconds total.
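
To make the difference concrete, here is a small hypothetical Java sketch (ParallelTiming and its methods are illustrative, not benchmark code), assuming all threads start together and each finish time is measured from that shared start. Summing finish times reproduces the 30 s + 32 s = 62 s figure, while the wall-clock time is the latest finish, 32 s. The report that follows looks consistent with summing: 8 × 2500 = 20,000 docs divided by the 713.99 elapsedSec gives the reported 28.01 rec/s, even though the progress lines show every thread past 2000 docs within about 97 seconds.

```java
/**
 * Hypothetical illustration: compares summing per-thread finish times with the
 * wall-clock time of a parallel block whose threads all start at time 0.
 */
class ParallelTiming {
    /** Adds up every thread's finish time (the behavior described as a bug). */
    static double summedSeconds(double[] finishTimes) {
        double total = 0;
        for (double t : finishTimes) {
            total += t;
        }
        return total;
    }

    /** Wall-clock time: the parallel block ends when its slowest thread ends. */
    static double wallClockSeconds(double[] finishTimes) {
        double latest = 0;
        for (double t : finishTimes) {
            latest = Math.max(latest, t);
        }
        return latest;
    }

    /** Throughput for a number of records processed over a number of seconds. */
    static double recsPerSec(long records, double seconds) {
        return records / seconds;
    }
}
```

For the two-thread example, summedSeconds(new double[] {30, 32}) is 62.0 and wallClockSeconds of the same array is 32.0, so a throughput computed from the sum understates the real parallel rate by roughly the number of threads.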

 [java]  algorithm:
 [java] Seq {
 [java]     Rounds_2 {
 [java]         ResetSystemErase
 [java]         Populate {
 [java]             CreateIndex
 [java]             Par_8 [
 [java]                 MAddDocs_2500 {
 [java]                     AddDoc
 [java]                 } * 2500
 [java]             ] * 8
 [java]             Optimize
 [java]             CommitIndex
 [java]             CloseIndex
 [java]         }
 [java]         RepSumByPref MAddDocs
 [java]         NewRound
 [java]     } * 2
 [java]     RepSumByNameRound
 [java]     RepSumByName
 [java]     RepSumByPrefRound MAddDocs
 [java] }
 [java]  starting task: Seq
 [java]  starting task: Rounds_2
 [java]  starting task: ResetSystemErase
 [java]  starting task: Populate
 [java] 55.84 sec -- Thread-2 added 2000 docs
 [java] 60.94 sec -- Thread-6 added 2000 docs
 [java] 74.82 sec -- Thread-0 added 2000 docs
 [java] 77.48 sec -- Thread-3 added 2000 docs
 [java] 81.21 sec -- Thread-1 added 2000 docs
 [java] 90.72 sec -- Thread-5 added 2000 docs
 [java] 96.46 sec -- Thread-7 added 2000 docs
 [java] 97.17 sec -- Thread-4 added 2000 docs
 [java]  Report Sum By Prefix (MAddDocs) (1 about 8 out of 20016)
 [java] Operation     round mrg flush cmpnd runCnt recsPerRun rec/s elapsedSec  avgUsedMem avgTotalMem
 [java] MAddDocs_2500     0  20 48.00 false      8       2500 28.01     713.99 135,359,120 273,850,368

Shai Erera (JIRA) wrote:
 [ 
 https://issues.apache.org/jira/browse/LUCENE-1994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767343#action_12767343
  ] 

 Shai Erera commented on LUCENE-1994:
 

 Yes, I agree (with both comments). Basically, for a ContentSource to support 
 parallel tasks, its getNextDocData should be made synchronized, or it must 
 find another way to sync on the important stuff (for example 
 TrecContentSource).

   
 EnwikiConentSource does not work with parallel tasks
 

 Key: LUCENE-1994
 URL: https://issues.apache.org/jira/browse/LUCENE-1994
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Affects Versions: 2.9
Reporter: Mark Miller
Priority: Minor

 


   


-- 
- Mark

http://www.lucidimagination.com







[jira] Commented: (LUCENE-1986) NPE in NearSpansUnordered from PayloadNearQuery

2009-10-19 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767373#action_12767373
 ] 

Peter Keegan commented on LUCENE-1986:
--

+  if (!more) {
+return false;
+  }
I was about to submit this same patch today, but I see you beat me to it :) 
Thanks Mark.

 NPE in NearSpansUnordered from PayloadNearQuery
 ---

 Key: LUCENE-1986
 URL: https://issues.apache.org/jira/browse/LUCENE-1986
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 2.9
Reporter: Peter Keegan
Assignee: Michael McCandless
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1986.patch, LUCENE-1986.patch, 
 TestPayloadNearQuery1.java


 The following query causes an NPE in NearSpansUnordered and is reproducible 
 with the attached unit test. The failure occurs on the last document 
 scored.




[jira] Commented: (LUCENE-1955) Fix Hits deprecation notice

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767391#action_12767391
 ] 

Michael McCandless commented on LUCENE-1955:


Mark, do you want to commit this?  Or I can.  I want to cut an RC tomorrow...

 Fix Hits deprecation notice
 ---

 Key: LUCENE-1955
 URL: https://issues.apache.org/jira/browse/LUCENE-1955
 Project: Lucene - Java
  Issue Type: Bug
  Components: Javadocs
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9.1


 Just needs to be committed to the 2.9 branch, since Hits has now been removed.




[jira] Commented: (LUCENE-1929) Highlighter doesn't support NumericRangeQuery or deprecated RangeQuery

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767392#action_12767392
 ] 

Michael McCandless commented on LUCENE-1929:


Mark, is this one ready to go into 2.9.1?

 Highlighter doesn't support NumericRangeQuery or deprecated RangeQuery
 --

 Key: LUCENE-1929
 URL: https://issues.apache.org/jira/browse/LUCENE-1929
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/highlighter
Affects Versions: 2.9
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 2.9.1

 Attachments: LUCENE-1929.patch


 Sucks. It will throw a NullPointerException. 
 Only NumericRangeQuery will throw the exception.
 RangeQuery just won't highlight.




[jira] Commented: (LUCENE-1929) Highlighter doesn't support NumericRangeQuery or deprecated RangeQuery

2009-10-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767394#action_12767394
 ] 

Mark Miller commented on LUCENE-1929:
-

Yeah - sorry - has been for some time. I can commit it shortly.

 Highlighter doesn't support NumericRangeQuery or deprecated RangeQuery
 --

 Key: LUCENE-1929
 URL: https://issues.apache.org/jira/browse/LUCENE-1929
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/highlighter
Affects Versions: 2.9
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 2.9.1

 Attachments: LUCENE-1929.patch


 Sucks. It will throw a NullPointerException. 
 Only NumericRangeQuery will throw the exception.
 RangeQuery just won't highlight.




[jira] Commented: (LUCENE-1955) Fix Hits deprecation notice

2009-10-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767397#action_12767397
 ] 

Mark Miller commented on LUCENE-1955:
-

Sorry again ;) I'm slowing everything down. Feel free; if you don't, I'll do it 
when I commit the Highlighter fix in a bit. I just have to throw my noisy laptop 
out the window and into a brick wall first ...

 Fix Hits deprecation notice
 ---

 Key: LUCENE-1955
 URL: https://issues.apache.org/jira/browse/LUCENE-1955
 Project: Lucene - Java
  Issue Type: Bug
  Components: Javadocs
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9.1


 Just needs to be committed to the 2.9 branch, since Hits has now been removed.




[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1987:
--

Attachment: LUCENE-1987-StopFilter-backport29.patch
LUCENE-1987-StopFilter-BW.patch
LUCENE-1987-StopFilter.patch

Here are 2 mega patches and one backport to 2.9 (I want to get this in before 2.9.1):

All core tests pass, all bw tests pass. Most contrib tests also pass, but we 
have the following problems and inconsistencies:

- benchmark does not work any longer, because StandardAnalyzer has no default 
ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer
- Highlighter only works if StandardAnalyzer is in 2.4 mode; in 2.9 mode 
(current) it fails because the position increments of stop words are not 
correctly respected. This failure is compounded by the following:
- Very bad inconsistency: The default of QueryParser is to ignore position 
increments, but the current version of StandardAnalyzer uses posIncr for stop 
words - bäng. We should change the default for QueryParser (+ contrib QP), too. 
There is much rework and much documentation needed. The tests in core now 
pass, as most parts use StandardAnalyzer in 2.9 mode but have no stop words, 
and the special tests explicitly set the posIncr flag. This is totally 
broken and needs fixing! (It also affects 2.9.0, if somebody uses the new 
StandardAnalyzer with LUCENE_CURRENT.)
- XMLQueryParser also fails with the latest StandardAnalyzer version, because it 
cannot set the flag in QueryParser. In my opinion, the query parser should take 
the flag from the analyzer, but this is not easy to fix.
- All contrib analyzers have stopWordPosIncr turned off (backwards 
compatibility). Maybe we need a Version Parameter in all analyzers there too!
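
The position-increment mismatch in the list above can be illustrated with a small, hypothetical sketch (StopWordPositions is not Lucene's StopFilter; its positions method is an assumption made for illustration). It assigns positions to the tokens that survive stop-word removal: with position increments enabled, each removed stop word leaves a gap; without them, survivors are numbered consecutively. If the index is built one way and a phrase query the other way, positions stop lining up. For "fox of forest" with "of" as a stop word, the index holds fox at 0 and forest at 2, while a parser that ignores increments asks for fox at 0 and forest at 1, so the exact phrase does not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of stop-word removal with and without position increments. */
class StopWordPositions {
    /**
     * Returns the positions of the tokens that survive stop-word removal.
     * With enablePositionIncrements, every removed stop word still consumes a
     * position, leaving a gap; without it, survivors get consecutive positions.
     */
    static List<Integer> positions(List<String> tokens, Set<String> stopWords,
                                   boolean enablePositionIncrements) {
        List<Integer> result = new ArrayList<Integer>();
        int position = -1;  // position of the last surviving token; starts before 0
        int skipped = 0;    // stop words removed since the last surviving token
        for (String token : tokens) {
            if (stopWords.contains(token)) {
                skipped++;
                continue;
            }
            int posIncr = enablePositionIncrements ? 1 + skipped : 1;
            position += posIncr;
            result.add(position);
            skipped = 0;
        }
        return result;
    }
}
```

Calling positions(Arrays.asList("fox", "of", "forest"), Collections.singleton("of"), true) yields [0, 2]; with false it yields [0, 1]. This is why the analyzer and the query parser need to agree on the flag.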

What to do? After this StopFilter/StandardAnalyzer-hell-day, Aspirin and 
Paracetamol and beer are not enough to think clearly again...

And please: next time we deprecate APIs, remove all deprecated calls from 
tests and contrib and mark all deprecated tests as such!

 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.0

 Attachments: LUCENE-1987-StopFilter-backport29.patch, 
 LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field-- (DONE)
 - -eventually un-deprecate ctors of Token taking Strings (they are still 
 useful) - if yes remove deprec in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
 are deprecated, but we still have the VERSION constants. I do not know how to 
 proceed. Keep the settings alive for index compatibility? Or remove them 
 together with the version constants (which were undeprecated).




RE: 2.9.1

2009-10-19 Thread Uwe Schindler
Please wait and look at https://issues.apache.org/jira/browse/LUCENE-1987

We have some inconsistencies between QueryParser and the new
StandardAnalyzer with stop word posIncr.

There is also a patch for 2.9 there!

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Monday, October 19, 2009 6:03 PM
 To: java-dev@lucene.apache.org; yo...@lucidimagination.com
 Subject: Re: 2.9.1
 
 On Mon, Oct 19, 2009 at 11:54 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
  On Wed, Oct 14, 2009 at 5:39 PM, Michael McCandless
  luc...@mikemccandless.com wrote:
  I can cut the 2.9.1 release, but... should we wait a bit to see
  whether other issues come up?  Or do it, now?
 
  Other issues came up and were quickly fixed - nice job, guys!
  I don't see anything else serious lurking about... seems like the
  2.9.1 release process could be started soon?
 
 +1, I'll try to get an RC out tomorrow.
 
 Mike
 



[jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1987:
--

Fix Version/s: 2.9.1

 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1987-StopFilter-backport29.patch, 
 LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field-- (DONE)
 - -eventually un-deprecate ctors of Token taking Strings (they are still 
 useful) - if yes remove deprec in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
 are deprecated, but we still have the VERSION constants. I do not know how to 
 proceed. Keep the settings alive for index compatibility? Or remove them 
 together with the version constants (which were undeprecated).




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767435#action_12767435
 ] 

Robert Muir commented on LUCENE-1987:
-

bq. All contrib analyzers have stopWordPosIncr turned off (backwards 
compatibility). Maybe we need a Version Parameter in all analyzers there too! 

Personally I would not be against this, though I'm not sure yet... The downside 
would be more complexity and maintenance. The upside would be that we could 
improve these analyzers in various ways without annoying users.

bq. benchmark does not work any longer, because StandardAnalyzer has no default 
ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer 
I also personally like having a default ctor... it's convenient, and it's nice to 
be able to look at what these analyzers do in Luke, etc.
But I think this goes against the version flag concept? (because if users just 
set it to LUCENE_CURRENT, then it's doing nothing?)
But I wonder if users do this anyway... maybe the default should really be 
LUCENE_CURRENT, and if you want the back compat-buggy behavior, the onus is on 
you as the user to set the flag right if you don't want to reindex?



 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1987-StopFilter-backport29.patch, 
 LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field-- (DONE)
 - -eventually un-deprecate ctors of Token taking Strings (they are still 
 useful) - if yes remove deprec in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
 are deprecated, but we still have the VERSION constants. I do not know how to 
 proceed. Keep the settings alive for index compatibility? Or remove them 
 together with the version constants (which were undeprecated).




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767449#action_12767449
 ] 

Michael McCandless commented on LUCENE-1987:


bq. All contrib analyzers have stopWordPosIncr turned off (backwards 
compatibility). Maybe we need a Version Parameter in all analyzers there too!

Ugh, this is because they embed StopFilter, right?  One option might be to 
simply keep StopFilter's deprecated static methods for setting the default?  
Though I think adding Version to them over time is the right thing to do 
(though more work, today).

bq. benchmark does not work any longer, because StandardAnalyzer has no default 
ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer

When the no-arg ctor is unavailable, can we fall back to looking for a ctor that 
takes Version?  For now we should just pass LUCENE_CURRENT; a future 
enhancement to benchmark can allow specifying version compat.
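
That fallback could look roughly like the following minimal sketch. It relies on hypothetical stand-ins: the Version enum below only mimics org.apache.lucene.util.Version, and LegacyAnalyzer/VersionedAnalyzer represent an analyzer with a no-arg ctor and one that only accepts a Version. The factory tries the public no-arg constructor first and, if there is none, a public (Version) constructor called with LUCENE_CURRENT. Resolving the class name from the benchmark config (for example via Class.forName) is left to the caller.

```java
import java.lang.reflect.Constructor;

/** Hypothetical stand-in for org.apache.lucene.util.Version. */
enum Version { LUCENE_24, LUCENE_29, LUCENE_CURRENT }

/** Hypothetical analyzer that still has a no-arg constructor. */
class LegacyAnalyzer {
    public LegacyAnalyzer() {}
}

/** Hypothetical analyzer that can only be built with a Version. */
class VersionedAnalyzer {
    final Version matchVersion;

    public VersionedAnalyzer(Version matchVersion) {
        this.matchVersion = matchVersion;
    }
}

class AnalyzerFactory {
    /**
     * Prefers a public no-arg constructor; if none exists, falls back to a
     * public (Version) constructor and passes LUCENE_CURRENT.
     */
    static Object newAnalyzer(Class<?> clazz) throws ReflectiveOperationException {
        try {
            return clazz.getConstructor().newInstance();
        } catch (NoSuchMethodException noDefaultCtor) {
            Constructor<?> ctor = clazz.getConstructor(Version.class);
            return ctor.newInstance(Version.LUCENE_CURRENT);
        }
    }
}
```

Trying the no-arg constructor first keeps existing analyzers working unchanged; passing LUCENE_CURRENT matches the "for now" suggestion, and a later benchmark option could supply a different Version at the same point.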

bq. The default of QueryParser is to ignore position increments, but the 
current version of StandardAnalyzer uses posIncr for stop words

Hmm.  How about adding Version to QP ctor?
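
A minimal sketch of that idea, under stated assumptions: QueryParserSettings and the Version enum below are hypothetical stand-ins (not Lucene's QueryParser or org.apache.lucene.util.Version), and the enum's declaration order is assumed to encode release order. The parser derives its default for position increments from the Version it is given, so code asking for 2.9 behavior gets stop-word gaps respected, in line with the 2.9-mode StandardAnalyzer, while code asking for 2.4 keeps the old default. An explicit setter can still override it.

```java
/** Hypothetical stand-in for org.apache.lucene.util.Version; declaration order = release order. */
enum Version {
    LUCENE_24, LUCENE_29, LUCENE_CURRENT;

    boolean onOrAfter(Version other) {
        return compareTo(other) >= 0;
    }
}

/** Hypothetical parser settings whose defaults depend on the requested Version. */
class QueryParserSettings {
    private boolean enablePositionIncrements;

    QueryParserSettings(Version matchVersion) {
        // 2.4 behavior ignores stop-word gaps; 2.9 and later respect them,
        // so phrase queries line up with what a 2.9-mode analyzer indexed.
        this.enablePositionIncrements = matchVersion.onOrAfter(Version.LUCENE_29);
    }

    boolean getEnablePositionIncrements() {
        return enablePositionIncrements;
    }

    void setEnablePositionIncrements(boolean enable) {
        this.enablePositionIncrements = enable;
    }
}
```

Tying the default to the same Version the analyzer receives keeps drop-in back-compat for callers that pass an old Version, while new code gets consistent behavior without setting the flag by hand.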

bq. And please: next time we deprecate APIs, remove all deprecated calls 
from tests and contrib and mark all deprecated tests as such!

OK, I agree.  I'll try to do this in the future!


 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1987-StopFilter-backport29.patch, 
 LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field-- (DONE)
 - -eventually un-deprecate ctors of Token taking Strings (they are still 
 useful) - if yes remove deprec in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
 are deprecated, but we still have the VERSION constants. I do not know how to 
 proceed. Keep the settings alive for index compatibility? Or remove them 
 together with the version constants (which were undeprecated).




Re: 2.9.1

2009-10-19 Thread Michael McCandless
OK, so now we're up to three 2.9.1 issues to be resolved.

Mike

On Mon, Oct 19, 2009 at 1:56 PM, Uwe Schindler u...@thetaphi.de wrote:
 Please wait and look at https://issues.apache.org/jira/browse/LUCENE-1987

 We have some inconsistencies between QueryParser and the new
 StandardAnalyzer with stop word posIncr.

 There is also a patch for 2.9 there!

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Monday, October 19, 2009 6:03 PM
 To: java-dev@lucene.apache.org; yo...@lucidimagination.com
 Subject: Re: 2.9.1

 On Mon, Oct 19, 2009 at 11:54 AM, Yonik Seeley
 yo...@lucidimagination.com wrote:
  On Wed, Oct 14, 2009 at 5:39 PM, Michael McCandless
  luc...@mikemccandless.com wrote:
  I can cut the 2.9.1 release, but... should we wait a bit to see
  whether other issues come up?  Or do it, now?
 
  Other issues came up and were quickly fixed - nice job, guys!
  I don't see anything else serious lurking about... seems like the
  2.9.1 release process could be started soon?

 +1, I'll try to get an RC out tomorrow.

 Mike




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767450#action_12767450
 ] 

Michael McCandless commented on LUCENE-1987:


bq. maybe the default should really be LUCENE_CURRENT, and if you want the back 
compat-buggy behavior, the onus is on you as the user to set the flag right if 
you don't want to reindex?

The problem is that this is not very different from saying the onus is on the 
user to call the setXYZ method to get back to the old buggy behavior, which, 
at least the last time we discussed back-compat, was controversial (i.e., it's a 
change to our drop-in back-compat policy).

 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1987-StopFilter-backport29.patch, 
 LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field-- (DONE)
 - -eventually un-deprecate ctors of Token taking Strings (they are still 
 useful) - if yes remove deprec in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards settings from analyzers (acronym, posIncr,...). They 
 are deprecated, but we still have the VERSION constants. I do not know how to 
 proceed. Keep the settings alive for index compatibility? Or remove them 
 together with the version constants (which were undeprecated).




[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767453#action_12767453
 ] 

Robert Muir commented on LUCENE-1987:
-

bq. Ugh, this is because they embed StopFilter, right? One option might be to 
simply keep StopFilter's deprecated static methods for setting the default? 
Though I think adding Version to them over time is the right thing to do 
(though more work, today).

Not just this. Many of them use StandardTokenizer, so they have the same invalid-acronym 
(etc.) issues that StandardAnalyzer has. But this versioning is all managed at the 
StandardAnalyzer level (system properties, version numbers, etc.), even though it 
also affects these other analyzers.
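One possible direction, sketched in Java under the assumption that each contrib analyzer accepts the caller's version: the analyzer forwards that single value to every embedded component, instead of each component consulting a JVM-wide system property. All class names are hypothetical, and the `V_2_4` and `V_2_9` cutoffs are illustrative only.

```java
// Hypothetical names throughout; not Lucene's actual classes.
enum MatchVersion {
    V_2_3, V_2_4, V_2_9, CURRENT;

    boolean onOrAfter(MatchVersion other) {
        return compareTo(other) >= 0;
    }
}

// Settings for an embedded StandardTokenizer-like component.
final class TokenizerConfig {
    final boolean replaceInvalidAcronym;

    TokenizerConfig(MatchVersion v) {
        this.replaceInvalidAcronym = v.onOrAfter(MatchVersion.V_2_4);
    }
}

// Settings for an embedded StopFilter-like component.
final class StopFilterConfig {
    final boolean enablePositionIncrements;

    StopFilterConfig(MatchVersion v) {
        this.enablePositionIncrements = v.onOrAfter(MatchVersion.V_2_9);
    }
}

// A contrib analyzer that embeds both components and forwards the same version,
// so every piece matches the version the index was built with.
final class ContribAnalyzerConfig {
    final TokenizerConfig tokenizer;
    final StopFilterConfig stopFilter;

    ContribAnalyzerConfig(MatchVersion matchVersion) {
        this.tokenizer = new TokenizerConfig(matchVersion);
        this.stopFilter = new StopFilterConfig(matchVersion);
    }
}
```

The design choice is that the version is passed explicitly and only once, so an analyzer built on the standard tokenizer cannot silently diverge from what the caller asked for.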

 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1987-StopFilter-backport29.patch, 
 LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch






Re: [jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Mark Miller
Uwe Schindler (JIRA) wrote:

 And please: next time when we deprecate APIs: remove all deprecated calls 
 from tests and contrib and mark all deprecated-test as such!

   
It's the nature of open source. We each take the work that other
contributors are willing/able/have time to provide - and fill in the rest
ourselves, or decide it's too much work and don't. I agree that it's a nice
idea, but I don't think the issue is going away so easily myself ;) In
which case it falls to the poor soul who decides to help later and
remove the deprecated methods. Or perhaps it keeps someone from stepping
up and doing that - nature of the beast.

But as long as we are making such requests, please no one commit any
more funky source formatting either :) It hurts my eyes.

-- 
- Mark

http://www.lucidimagination.com







[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767463#action_12767463
 ] 

Robert Muir commented on LUCENE-1987:
-

bq. The problem is that this is not very different from saying the onus is on 
the user to call the setXYZ method to get back to the old buggy behavior, 
which at least last time we discussed back-compat was controversial (ie, it's a 
change to our drop-in back-compat policy).

Michael, yes, I agree with you. What I am wondering is: is it really working in 
practice/in spirit? Forcing the user to supply the version does make 
them look at the warning in the Version class, which is good. But nothing 
stops them from just using CURRENT.

{noformat}
Use this to get the latest & greatest settings, bug fixes, etc, for Lucene.
{noformat}

followed by the big bold warning about backwards compatibility. Just curious 
what most users are doing: sacrificing drop-in for latest and greatest?

I do think we should at some point improve the contrib analyzers that are still 
stuck with this buggy behavior, i.e. LUCENE-1373.
But maybe we don't need the Version with contrib analyzers, since you should be 
able to use an older lucene-analyzers jar file with a newer Lucene if you want the 
back compat.

(sorry to stray somewhat off-topic)


 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1987-StopFilter-backport29.patch, 
 LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch






[jira] Created: (LUCENE-1995) ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Yonik Seeley (JIRA)
ArrayIndexOutOfBoundsException during indexing
--

 Key: LUCENE-1995
 URL: https://issues.apache.org/jira/browse/LUCENE-1995
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9
Reporter: Yonik Seeley
 Fix For: 2.9.1


http://search.lucidimagination.com/search/document/f29fc52348ab9b63/arrayindexoutofboundsexception_during_indexing




[jira] Resolved: (LUCENE-1929) Highlighter doesn't support NumericRangeQuery or deprecated RangeQuery

2009-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved LUCENE-1929.
-

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [New])

 Highlighter doesn't support NumericRangeQuery or deprecated RangeQuery
 --

 Key: LUCENE-1929
 URL: https://issues.apache.org/jira/browse/LUCENE-1929
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/highlighter
Affects Versions: 2.9
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 2.9.1

 Attachments: LUCENE-1929.patch


 Sucks. It will throw a NullPointerException. 
 Only NumericRangeQuery will throw the exception.
 RangeQuery just won't highlight.




[jira] Created: (LUCENE-1996) EnwikiContentSource isn't thread safe

2009-10-19 Thread Michael McCandless (JIRA)
EnwikiContentSource isn't thread safe
-

 Key: LUCENE-1996
 URL: https://issues.apache.org/jira/browse/LUCENE-1996
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.1


When I run this alg:
{code}
analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer

content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
docs.file=/x/lucene/enwiki-20090724-pages-articles.xml.bz2
doc.tokenized = false
ram.flush.mb=32.0


doc.stored = false
doc.term.vector = false
log.step.AddDoc=1

directory=FSDirectory
autocommit=false
compound=false

work.dir=/lucene/work.wiki.nd0.02M

{ BuildIndex
  - CreateIndex
  [ { AddDocs AddDoc  : 1 } : 2
  - CloseIndex
}

RepSumByPrefRound BuildIndex
{code}

I hit exceptions in each thread like this:

{code}
Exception in thread "Thread-2" java.lang.RuntimeException: org.xml.sax.SAXParseException: Open quote is expected for attribute "msxi" associated with an  element type  "mdiiki".
	at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:189)
	at java.lang.Thread.run(Thread.java:613)
Caused by: org.xml.sax.SAXParseException: Open quote is expected for attribute "msxi" associated with an  element type  "mdiiki".
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
	at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
	at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
	at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
	at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1441)
	at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:802)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:578)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:222)
	at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(XMLNSDocumentScannerImpl.java:779)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1794)
	at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
	at org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:166)
	... 1 more
{code}
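The alg above runs two AddDoc sequences in parallel (`[ ... ] : 2`), and the trace shows the failure inside `EnwikiContentSource$Parser.run`; the garbled attribute and element names suggest the parser state is being shared between threads. A common pattern for making such a source safe is a single producer thread that owns the non-thread-safe parser, plus a `BlockingQueue` that consumer threads take finished documents from. The Java sketch below is a hypothetical illustration of that pattern only, not the fix for this issue: `QueueBackedSource`, `next()` and `consumeAll()` are invented names, and the parser is simulated by iterating a list of strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch. One producer thread owns the (simulated) parser and hands
// each finished document to consumers through a bounded BlockingQueue, so
// consumer threads never touch parser state.
final class QueueBackedSource {
    // Sentinel compared by identity, so it can never collide with a real document.
    private static final String END = new String("<end-of-input>");

    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);

    QueueBackedSource(List<String> docs) {
        Thread producer = new Thread(() -> {
            try {
                for (String doc : docs) {
                    queue.put(doc); // stands in for SAX callbacks; blocks while consumers lag
                }
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Next document, or null at end of input. Safe to call from many threads. */
    String next() throws InterruptedException {
        String doc = queue.take();
        if (doc == END) {
            queue.put(END); // put it back so every other consumer also sees end of input
            return null;
        }
        return doc;
    }

    /** Drains the source with nThreads concurrent consumers, like parallel AddDoc tasks. */
    static List<String> consumeAll(QueueBackedSource source, int nThreads) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < nThreads; i++) {
            Thread t = new Thread(() -> {
                try {
                    String doc;
                    while ((doc = source.next()) != null) {
                        seen.add(doc);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen;
    }
}
```

Because only the producer thread reads the input, consumers can only interleave on whole documents, never on half-parsed XML; the bounded queue also keeps the producer from racing far ahead of indexing.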




[jira] Commented: (LUCENE-1995) ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767467#action_12767467
 ] 

Yonik Seeley commented on LUCENE-1995:
--

The code at the point of the exception uses a signed shift instead of an unsigned 
one, but that shouldn't matter unless the buffer pool is huge?
Aaron, what are your index settings (e.g. ramBufferSizeMB)?
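The signed-versus-unsigned point can be checked in isolation. In Java, `>>` copies the sign bit while `>>>` shifts in zeros, so the two only disagree once an address no longer fits in a non-negative `int` (past `Integer.MAX_VALUE`), which is why it should only matter for a huge buffer pool. The sketch below uses a hypothetical `BLOCK_SHIFT` of 15 that is not taken from Lucene's source, and it does not establish whether this is the actual cause of LUCENE-1995.

```java
// Illustrative only: BLOCK_SHIFT and the sample addresses are assumptions, not
// the constants used in Lucene's indexing code.
final class ShiftDemo {
    static final int BLOCK_SHIFT = 15; // assumed block size of 1 << 15 = 32768 bytes

    // Signed shift: copies the sign bit, so a wrapped (negative) address stays negative.
    static int signedBlockIndex(int address) {
        return address >> BLOCK_SHIFT;
    }

    // Unsigned shift: shifts in zeros, treating the 32 bits as an unsigned value.
    static int unsignedBlockIndex(int address) {
        return address >>> BLOCK_SHIFT;
    }

    public static void main(String[] args) {
        int small = 5 << BLOCK_SHIFT;    // 163840: well below Integer.MAX_VALUE
        int wrapped = Integer.MIN_VALUE; // what an address of 2^31 wraps to as an int
        System.out.println(signedBlockIndex(small) + " " + unsignedBlockIndex(small));     // 5 5
        System.out.println(signedBlockIndex(wrapped) + " " + unsignedBlockIndex(wrapped)); // -65536 65536
        int[] blocks = new int[4];
        try {
            int unused = blocks[signedBlockIndex(wrapped)]; // negative index
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("negative block index -> " + e.getClass().getSimpleName());
        }
    }
}
```

For small addresses both shifts agree; only after the address wraps does `>>` produce a negative block index, which would surface as an ArrayIndexOutOfBoundsException.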


 ArrayIndexOutOfBoundsException during indexing
 --

 Key: LUCENE-1995
 URL: https://issues.apache.org/jira/browse/LUCENE-1995
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9
Reporter: Yonik Seeley
 Fix For: 2.9.1


 http://search.lucidimagination.com/search/document/f29fc52348ab9b63/arrayindexoutofboundsexception_during_indexing




[jira] Resolved: (LUCENE-1955) Fix Hits deprecation notice

2009-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved LUCENE-1955.
-

Resolution: Fixed

 Fix Hits deprecation notice
 ---

 Key: LUCENE-1955
 URL: https://issues.apache.org/jira/browse/LUCENE-1955
 Project: Lucene - Java
  Issue Type: Bug
  Components: Javadocs
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
 Fix For: 2.9.1


 This only needs to be committed to the 2.9 branch, since Hits has now been removed.




[jira] Commented: (LUCENE-1996) EnwikiContentSource isn't thread safe

2009-10-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767474#action_12767474
 ] 

Mark Miller commented on LUCENE-1996:
-

dupe? LUCENE-1994

 EnwikiContentSource isn't thread safe
 -

 Key: LUCENE-1996
 URL: https://issues.apache.org/jira/browse/LUCENE-1996
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.1






RE: [jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler
 Uwe Schindler (JIRA) wrote:
 
  And please: next time when we deprecate APIs: remove all deprecated
 calls from tests and contrib and mark all deprecated-test as such!
 
 
 It's the nature of open source. We each take the work that other
 contributors are willing/able/have time to provide - and fill in the rest
 ourselves, or decide it's too much work and don't. I agree that it's a nice
 idea, but I don't think the issue is going away so easily myself ;) In
 which case it falls to the poor soul who decides to help later and
 remove the deprecated methods. Or perhaps it keeps someone from stepping
 up and doing that - nature of the beast.

Sorry, I was disappointed and somewhat angry because nothing worked as
expected when I removed the deprecated parts. I fixed one thing and 5 other
problems appeared.

 But as long as we are making such requests, please no one commit any
 more funky source formatting either :) It hurts my eyes.

What was funky?

I think I should stop working today and do something else...

Uwe





[jira] Assigned: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2009-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned LUCENE-1486:
---

Assignee: (was: Mark Miller)

 Wildcards, ORs etc inside Phrase queries
 

 Key: LUCENE-1486
 URL: https://issues.apache.org/jira/browse/LUCENE-1486
 Project: Lucene - Java
  Issue Type: Improvement
  Components: QueryParser
Affects Versions: 2.4
Reporter: Mark Harwood
Priority: Minor
 Fix For: 3.0, 3.1

 Attachments: ComplexPhraseQueryParser.java, 
 junit_complex_phrase_qp_07_21_2009.patch, 
 junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default 
 field.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, 
 LUCENE-1486.patch, LUCENE-1486.patch, TestComplexPhraseQuery.java


 An extension to the default QueryParser that overrides the parsing of 
 PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries.
 The implementation feels a little hacky - this is arguably better handled in 
 QueryParser itself. This works as a proof of concept for much of the query 
 parser syntax. Examples from the JUnit test include:
   checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
   checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
   checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
   
   checkBadQuery("\"jo*  id:1 smith\""); // mixing fields in a phrase is bad
   checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
   checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported
 Code plus JUnit test to follow...




Re: [jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Mark Miller
Uwe Schindler wrote:
 Uwe Schindler (JIRA) wrote:
 
 And please: next time when we deprecate APIs: remove all deprecated
   
 calls from tests and contrib and mark all deprecated-test as such!
 
   
 It's the nature of open source. We each take the work that other
 contributors are willing/able/have time to provide - and fill in the rest
 ourselves, or decide it's too much work and don't. I agree that it's a nice
 idea, but I don't think the issue is going away so easily myself ;) In
 which case it falls to the poor soul who decides to help later and
 remove the deprecated methods. Or perhaps it keeps someone from stepping
 up and doing that - nature of the beast.
 

 Sorry, I was disappointed and somewhat angry because nothing worked as
 expected when I removed the deprecated parts. I fixed one thing and 5 other
 problems appeared.
   
Ha - no reason to be sorry - I agree it would be nice - just saying good
luck getting everyone to fall in line in the future :)
   
 But as long as we are making such requests, please no one commit any
 more funky source formatting either :) It hurts my eyes.
 

 What was funky?

 I think I should stop working today and do something else...
   
Ha again :) I actually reworded that because the first time I wrote it I
thought it sounded like I was saying you did it - guess I failed :) I
was commenting in general, not about you - I don't think anything too bad
has gotten in for some time - but there is some old source code here and
there that really bugs me - totally unrelated to your comment - just
adding a wish of my own - no more ugly source code :) !
 Uwe



   


-- 
- Mark

http://www.lucidimagination.com







[jira] Resolved: (LUCENE-1996) EnwikiContentSource isn't thread safe

2009-10-19 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1996.


Resolution: Duplicate

Duh, yes, dup.  Must read email before opening issues ;)

 EnwikiContentSource isn't thread safe
 -

 Key: LUCENE-1996
 URL: https://issues.apache.org/jira/browse/LUCENE-1996
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.1






[jira] Commented: (LUCENE-1996) EnwikiContentSource isn't thread safe

2009-10-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767487#action_12767487
 ] 

Mark Miller commented on LUCENE-1996:
-

The scary part is that it's been around for some time and we both independently 
hit it today ... quantum mechanics in action I guess ... 

 EnwikiContentSource isn't thread safe
 -

 Key: LUCENE-1996
 URL: https://issues.apache.org/jira/browse/LUCENE-1996
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.1






Re: [jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Michael McCandless
On Mon, Oct 19, 2009 at 3:11 PM, Mark Miller markrmil...@gmail.com wrote:
 Uwe Schindler (JIRA) wrote:

 And please: next time when we deprecate APIs: remove all deprecated calls 
 from tests and contrib and mark all deprecated-test as such!


 It's the nature of open source. We each take the work that other
 contributors are willing/able/have time to provide - and fill in the rest
 ourselves, or decide it's too much work and don't. I agree that it's a nice
 idea, but I don't think the issue is going away so easily myself ;) In
 which case it falls to the poor soul who decides to help later and
 remove the deprecated methods. Or perhaps it keeps someone from stepping
 up and doing that - nature of the beast.

I do agree this is the nature of the beast.

Also, thinking more about it... I think a good approach, for an issue
with a large number of deprecations, might be to open a separate issue
to fix the deprecations in contrib/test, and fix it after some delay.
This way we confirm that deprecated usage of the APIs is working, for
at least some time, before removing them all from the tests.

E.g., in LUCENE-1458 I waited until quite late to cut over usage to the flex API.

 But as long as we are making such requests, please no one commit any
 more funky source formatting either :) It hurts my eyes.

+1!

Mike




[jira] Commented: (LUCENE-1996) EnwikiContentSource isn't thread safe

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767492#action_12767492
 ] 

Michael McCandless commented on LUCENE-1996:


That IS really crazy.

 EnwikiContentSource isn't thread safe
 -

 Key: LUCENE-1996
 URL: https://issues.apache.org/jira/browse/LUCENE-1996
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Michael McCandless
Priority: Minor
 Fix For: 3.1


 When I run this alg:
 {code}
 analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
 content.source=org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource
 docs.file=/x/lucene/enwiki-20090724-pages-articles.xml.bz2
 doc.tokenized = false
 ram.flush.mb=32.0
 doc.stored = false
 doc.term.vector = false
 log.step.AddDoc=1
 directory=FSDirectory
 autocommit=false
 compound=false
 work.dir=/lucene/work.wiki.nd0.02M
 { BuildIndex
   - CreateIndex
   [ { AddDocs AddDoc  : 1 } : 2
   - CloseIndex
 }
 RepSumByPrefRound BuildIndex
 {code}
 I hit exceptions in each thread like this:
 {code}
 Exception in thread Thread-2 java.lang.RuntimeException: 
 org.xml.sax.SAXParseException: Open quote is expected for attribute msxi 
 associated with an  element type  mdiiki.
   at 
 org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:189)
   at java.lang.Thread.run(Thread.java:613)
 Caused by: org.xml.sax.SAXParseException: Open quote is expected for 
 attribute msxi associated with an  element type  mdiiki.
   at 
 com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
   at 
 com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
   at 
 com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
   at 
 com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
   at 
 com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1441)
   at 
 com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(XMLScanner.java:802)
   at 
 com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(XMLNSDocumentScannerImpl.java:578)
   at 
 com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:222)
   at 
 com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl$NSContentDispatcher.scanRootElementHook(XMLNSDocumentScannerImpl.java:779)
   at 
 com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1794)
   at 
 com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
   at 
 com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
   at 
 com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
   at 
 com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
   at 
 com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
   at 
 org.apache.lucene.benchmark.byTask.feeds.EnwikiContentSource$Parser.run(EnwikiContentSource.java:166)
   ... 1 more
 {code}
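The thread itself does not diagnose the failure. Purely as an assumption: the garbled names in the SAX errors ("msxi", "mdiiki") would be consistent with more than one parser thread reading the same input stream at once. If that is the cause, one common remedy is a single producer thread, started lazily under a lock, feeding a shared BlockingQueue that every consumer thread takes from. The sketch below uses hypothetical names (SharedDocSource, and an Iterator<String> standing in for the XML input) and is not the benchmark's actual code or fix.

```java
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Hypothetical sketch, not the benchmark's EnwikiContentSource: many consumer
 * threads share ONE producer thread that reads the input, so no two readers
 * can ever interleave reads from the same stream.
 */
public class SharedDocSource {
  // Sentinel compared by identity; new String(...) guarantees a distinct object.
  private static final String EOF = new String("EOF");

  private final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(10);
  private final Iterator<String> input; // stands in for the parser's input stream
  private Thread producer;              // guarded by "this"

  public SharedDocSource(Iterator<String> input) {
    this.input = input;
  }

  /** Starts the single producer thread the first time any consumer asks for a doc. */
  private synchronized void startProducerOnce() {
    if (producer != null) {
      return; // another consumer already started it
    }
    producer = new Thread(new Runnable() {
      public void run() {
        try {
          while (input.hasNext()) {
            queue.put(input.next()); // blocks while consumers are behind
          }
          queue.put(EOF);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }, "doc-producer");
    producer.setDaemon(true);
    producer.start();
  }

  /** Returns the next document, or null once the input is exhausted. Safe from many threads. */
  public String next() throws InterruptedException {
    startProducerOnce();
    String doc = queue.take();
    if (doc == EOF) {
      queue.put(EOF); // put the sentinel back so every other consumer also sees the end
      return null;
    }
    return doc;
  }
}
```

Only the producer thread ever touches the input, so consumers can call next() concurrently; putting the sentinel back after each take lets every waiting consumer eventually see the end of input.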




Re: [jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Yonik Seeley
On Mon, Oct 19, 2009 at 3:45 PM, Mark Miller markrmil...@gmail.com wrote:
 but there is some old source code here and
 there that really bugs me

Is it Doug's

  if (foo)
    bar();
  else {
    baz();
  }

or is it my single line

  if (a==null) return 0;

;-)

One of my personal pet peeves is more indentation than necessary for
large blocks of code, rather than just immediately handling the
exception cases and escaping. Example:

void doSomething(MyObj obj) {
  if (obj != null) {  // at this point, I'm wondering... hmmm, is there code that
                      // executes *after* this huge if in the event that obj is null?
    [...]
    // same with this one... ya gotta go and try to match up braces
    // to see if there is code that executes in the opposite case...
    // and if it also falls through to execute the obj==null case or simply returns.
    if (some other condition) {
      [ tons of code ]
      [ tons of code ]
    }
  }
}

A much more readable version (regardless of whether one likes the
single-line syntax or not):

void doSomething(MyObj obj) {
  if (obj==null) return;  // immediately obvious handling of the exception case
  [...]
  if (!some other condition) return;  // again, immediately obvious how the exception case was handled

  [ tons of code ]
  [ tons of code ]
}
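A compilable version of the two shapes, using an invented countLongWords method in place of doSomething, might look like this. Both methods return the same counts; the second handles the exceptional inputs first and escapes, so the main loop sits at a single indentation level. Treating a non-positive minLength as "count nothing" is an assumption of this sketch.

```java
/** Hypothetical example (method names invented) contrasting nested ifs with early returns. */
public class GuardClauses {
  // Nested style: to learn what happens when words == null you must find the matching brace.
  static int countLongWordsNested(String[] words, int minLength) {
    int count = 0;
    if (words != null) {
      if (minLength > 0) {
        for (String w : words) {
          if (w != null && w.length() >= minLength) {
            count++;
          }
        }
      }
    }
    return count;
  }

  // Guard-clause style: the exceptional cases are handled and escaped up front.
  static int countLongWordsGuarded(String[] words, int minLength) {
    if (words == null) return 0;   // nothing to count
    if (minLength <= 0) return 0;  // treated as an invalid request in this sketch
    int count = 0;
    for (String w : words) {
      if (w != null && w.length() >= minLength) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String[] words = {"a", "lucene", "index", null};
    System.out.println(countLongWordsNested(words, 5));  // 2
    System.out.println(countLongWordsGuarded(words, 5)); // 2
  }
}
```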


-Yonik
http://www.lucidimagination.com




Re: [jira] Updated: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Michael McCandless
On Mon, Oct 19, 2009 at 4:00 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Mon, Oct 19, 2009 at 3:45 PM, Mark Miller markrmil...@gmail.com wrote:
 but there is some old source code here and
 there that really bugs me

 Is it Doug's

  if (foo)
     bar();
  else {
    baz();
  }

 or is it my single line

  if (a==null) return 0;

 ;-)

Or my always doing this up until a while ago:

  if (foo)
    something;

but then suddenly [trying to] switch to the correct:

  if (foo) {
    something;
  }

?

 One of my personal pet peeves is more indentation than necessary for
 large blocks of code, rather than just immediately handling the
 exception cases and escaping. Example:

Hmm I think I tend to do this :)

But I agree, your way IS more readable so I'll try to switch!

Mike




[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated LUCENE-1257:


Attachment: LUCENE-1257-FieldCacheImpl.patch

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-WordListLoader.patch, 
 LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, 
 LUCENE-1257_messages.patch, LUCENE-1257_o.a.l.queryParser.patch, 
 LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
 LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, shinglematrixfilter_generified.patch


 For my needs I've updated Lucene so that it uses Java 5 constructs. I know a
 Java 5 migration was planned for 2.1 at some point in the past, but I don't
 know when it is planned now. This patch against the trunk includes:
 - most obvious generics usage (there are tons of usages of sets, ... Those
 which are commonly used have been generified)
 - PriorityQueue generification
 - replacement of indexed for loops with for-each constructs
 - removal of unnecessary unboxing
 In my opinion the code is much more readable with those features (you
 actually *know* what is stored in collections when reading the code, without
 the need to look up field definitions every time) and it simplifies many
 algorithms.
 Note that this patch also includes an interface for the Query class. This has
 been done for my company's needs for building custom Query classes which add
 some behaviour to the base Lucene queries. It prevents multiple unnecessary
 casts. I know this introduction is not wanted by the team, but it really
 makes our developments easier to maintain. If you don't want to use this,
 replace all /Queriable/ calls with standard /Query/.
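To make the "PriorityQueue generification" item concrete, here is a hypothetical, simplified heap with an abstract lessThan(T, T). It is not Lucene's org.apache.lucene.util.PriorityQueue and not code from the patch. The subclass compares typed arguments directly and pop() returns a T, so callers need no casts from Object; the demo also uses for-each loops and autoboxing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a "generified" priority queue; NOT Lucene's PriorityQueue. */
public class GenericsDemo {
  public static abstract class MiniPriorityQueue<T> {
    private final List<T> heap = new ArrayList<T>();

    /** Ordering supplied by subclasses; typed arguments, so no casts from Object. */
    protected abstract boolean lessThan(T a, T b);

    public int size() { return heap.size(); }

    public void add(T element) {
      heap.add(element);
      int i = heap.size() - 1;
      while (i > 0) { // sift the new element up
        int parent = (i - 1) / 2;
        if (!lessThan(heap.get(i), heap.get(parent))) break;
        swap(i, parent);
        i = parent;
      }
    }

    /** Removes and returns the least element, or null if empty. */
    public T pop() {
      if (heap.isEmpty()) return null;
      T result = heap.get(0);
      T last = heap.remove(heap.size() - 1);
      if (!heap.isEmpty()) {
        heap.set(0, last);
        int i = 0;
        while (true) { // sift the moved element down
          int left = 2 * i + 1, right = left + 1, least = i;
          if (left < heap.size() && lessThan(heap.get(left), heap.get(least))) least = left;
          if (right < heap.size() && lessThan(heap.get(right), heap.get(least))) least = right;
          if (least == i) break;
          swap(i, least);
          i = least;
        }
      }
      return result;
    }

    private void swap(int i, int j) {
      T tmp = heap.get(i);
      heap.set(i, heap.get(j));
      heap.set(j, tmp);
    }
  }

  public static void main(String[] args) {
    MiniPriorityQueue<Integer> pq = new MiniPriorityQueue<Integer>() {
      @Override
      protected boolean lessThan(Integer a, Integer b) {
        return a < b; // auto-unboxing, no (Integer) casts
      }
    };
    for (int v : new int[] {5, 1, 4, 2}) { // for-each + autoboxing
      pq.add(v);
    }
    while (pq.size() > 0) {
      int top = pq.pop(); // typed result, no cast
      System.out.println(top); // prints 1, 2, 4, 5 on separate lines
    }
  }
}
```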




[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated LUCENE-1257:


Attachment: LUCENE-1257-FieldValueHitQueue.patch

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-FieldValueHitQueue.patch, 
 LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-WordListLoader.patch, 
 LUCENE-1257_analysis.patch, LUCENE-1257_BooleanFilter_Generics.patch, 
 LUCENE-1257_messages.patch, LUCENE-1257_o.a.l.queryParser.patch, 
 LUCENE-1257_o.a.l.store.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_search.patch, 
 LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, shinglematrixfilter_generified.patch






[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated LUCENE-1257:


Attachment: LUCENE-1257-TopDocsCollector.patch

* FieldValueHitQueue
* TopDocsCollector
* TopScoreDocsCollector
* TopFieldHitsCollector


 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, 
 LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
 LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_messages.patch, 
 LUCENE-1257_o.a.l.queryParser.patch, LUCENE-1257_o.a.l.store.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, shinglematrixfilter_generified.patch






[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated LUCENE-1257:


Attachment: (was: LUCENE-1257-FieldValueHitQueue.patch)

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-NormalizeCharMap.patch, LUCENE-1257-o.a.l.util.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, 
 LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
 LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_messages.patch, 
 LUCENE-1257_o.a.l.queryParser.patch, LUCENE-1257_o.a.l.store.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, shinglematrixfilter_generified.patch






[jira] Commented: (LUCENE-1995) ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767532#action_12767532
 ] 

Michael McCandless commented on LUCENE-1995:


Spooky!  It does look like we overflowed int, because (1 + Integer.MAX_VALUE)
>> 15 is -65536.
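The arithmetic can be checked directly. The sketch assumes, for illustration only, that a buffer address is turned into a block index by shifting right by 15 (32 KB blocks), which is what the 15 in the comment suggests; the class and method names are made up. Incrementing Integer.MAX_VALUE wraps silently to Integer.MIN_VALUE, a signed shift (>>) of that value gives -65536, and using it as an array index fails immediately, whereas an unsigned shift (>>>) would give 65536, a non-negative index that could pass unnoticed in a large enough block array.

```java
/** Hypothetical demo of the overflow arithmetic; assumes a block shift of 15. */
public class ShiftOverflowDemo {
  static final int BLOCK_SHIFT = 15;

  /** Signed shift: an overflowed (negative) address yields a negative block index. */
  static int signedBlock(int address) { return address >> BLOCK_SHIFT; }

  /** Unsigned shift: the same overflowed address yields a non-negative index. */
  static int unsignedBlock(int address) { return address >>> BLOCK_SHIFT; }

  public static void main(String[] args) {
    int address = Integer.MAX_VALUE;
    address++; // int overflow wraps silently to Integer.MIN_VALUE
    System.out.println(address);                // -2147483648
    System.out.println(signedBlock(address));   // -65536
    System.out.println(unsignedBlock(address)); // 65536

    byte[][] blocks = new byte[4][];
    try {
      byte[] block = blocks[signedBlock(address)]; // negative index fails loudly
      System.out.println(block == null);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("caught ArrayIndexOutOfBoundsException");
    }
  }
}
```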

 ArrayIndexOutOfBoundsException during indexing
 --

 Key: LUCENE-1995
 URL: https://issues.apache.org/jira/browse/LUCENE-1995
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9
Reporter: Yonik Seeley
 Fix For: 2.9.1


 http://search.lucidimagination.com/search/document/f29fc52348ab9b63/arrayindexoutofboundsexception_during_indexing




[jira] Commented: (LUCENE-1995) ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Aaron McKee (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767541#action_12767541
 ] 

Aaron McKee commented on LUCENE-1995:
-

I make no claims about the reasonableness of these settings; I only recently
began tuning our prototype. =)

useCompoundFile: false
mergeFactor: 10
maxBufferedDocs: 500
ramBufferSizeMB: 8192 
maxFieldLength: 1
reopenReaders: true

My system has 24 GB and my index is typically ~16 GB, so I set some of these
values a bit high. If the RAM buffer is being indexed with an int, that could
certainly be my issue; I feel a bit silly for not having thought of that
already. I'll try setting it down to 2048 and see if the problem disappears.
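For scale, assuming "MB" here means 2^20 bytes and that buffer addresses are Java ints (the int-overflow theory from the earlier comment, not confirmed in this message): the configured 8192 MB is four times the positive int range, and even 2048 MB exceeds Integer.MAX_VALUE by one byte.

```latex
\begin{aligned}
8192\ \text{MB} &= 2^{13} \cdot 2^{20}\ \text{bytes} = 2^{33} = 8{,}589{,}934{,}592\ \text{bytes} \\
2048\ \text{MB} &= 2^{11} \cdot 2^{20}\ \text{bytes} = 2^{31} = 2{,}147{,}483{,}648\ \text{bytes} \\
\texttt{Integer.MAX\_VALUE} &= 2^{31} - 1 = 2{,}147{,}483{,}647
\end{aligned}
```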





[jira] Commented: (LUCENE-1987) Remove rest of analysis deprecations (Token, CharacterCache)

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767548#action_12767548
 ] 

Uwe Schindler commented on LUCENE-1987:
---

To get back to my other problem:
How should we handle the LUCENE_29 setting and the position increments it
produces for removed stopwords, together with QueryParser, whose default is to
ignore position increments?

This means a phrase query does not hit anything if you index with
StandardAnalyzer=LUCENE_29 and parse queries with QueryParser using the same
analyzer but with setEnablePositionIncrements(false) [the current default for
QueryParser].
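A toy model of the mismatch, in plain Java rather than Lucene's analyzers or QueryParser (the stopword list and method names are assumptions): indexing keeps the gaps left by removed stopwords, as described for LUCENE_29, while the query side either keeps or ignores them. A phrase matches only if every term sits at the same offset relative to the first term, so ignoring the gaps at query time makes the phrase miss.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Toy model (not Lucene code) of stopword position gaps at index vs. query time. */
public class StopwordPositions {
  static final Set<String> STOPWORDS = new HashSet<String>(Arrays.asList("a", "of", "the"));

  /** term -> position; assumes each term occurs at most once in the text. */
  static Map<String, Integer> analyze(String text, boolean enablePositionIncrements) {
    Map<String, Integer> positions = new LinkedHashMap<String, Integer>();
    int pos = -1;
    int removed = 0; // stopwords dropped since the last kept token
    for (String token : text.toLowerCase().split("\\s+")) {
      if (STOPWORDS.contains(token)) {
        removed++;
        continue;
      }
      pos += enablePositionIncrements ? 1 + removed : 1; // keep or collapse the gap
      removed = 0;
      positions.put(token, pos);
    }
    return positions;
  }

  /** Exact phrase match: each query term must appear at the same offset relative to the first. */
  static boolean phraseMatches(Map<String, Integer> doc, Map<String, Integer> query) {
    Integer docBase = null;
    Integer queryBase = null;
    for (Map.Entry<String, Integer> term : query.entrySet()) {
      Integer docPos = doc.get(term.getKey());
      if (docPos == null) return false;
      if (docBase == null) {
        docBase = docPos;
        queryBase = term.getValue();
      }
      if (docPos - docBase != term.getValue() - queryBase) return false;
    }
    return docBase != null; // an empty query matches nothing
  }

  public static void main(String[] args) {
    Map<String, Integer> doc = analyze("Lord of the Rings", true); // lord=0, rings=3
    System.out.println(phraseMatches(doc, analyze("lord of the rings", false))); // false
    System.out.println(phraseMatches(doc, analyze("lord of the rings", true)));  // true
  }
}
```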

 Remove rest of analysis deprecations (Token, CharacterCache)
 

 Key: LUCENE-1987
 URL: https://issues.apache.org/jira/browse/LUCENE-1987
 Project: Lucene - Java
  Issue Type: Task
  Components: Analysis
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 2.9.1, 3.0

 Attachments: LUCENE-1987-StopFilter-backport29.patch, 
 LUCENE-1987-StopFilter-BW.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987-StopFilter.patch, 
 LUCENE-1987-StopFilter.patch, LUCENE-1987.patch, LUCENE-1987.patch, 
 LUCENE-1987.patch


 This removes the rest of the deprecations in the analysis package:
 - -Token's termText field-- (DONE)
 - -eventually un-deprecate ctors of Token taking Strings (they are still 
 useful) - if yes remove deprec in 2.9.1- (DONE)
 - -remove CharacterCache and use Character.valueOf() from Java5- (DONE)
 - Stopwords lists
 - Remove the backwards-compatibility settings from analyzers (acronym, posIncr, ...). They
 are deprecated, but we still have the VERSION constants. I do not know how to
 proceed. Keep the settings alive for index compatibility? Or remove them
 together with the version constants (which were undeprecated)?




Parameter class and Java 5 Enums

2009-10-19 Thread DM Smith
Should the Parameter class be replaced with Java 5 enums? My only
concern is backward compatibility. I noticed that Parameter is
serializable. Is this used by Lucene? I wasn't able to see any place
that depended on it. The only public method, Parameter.toString(),
returns the same value as a Java 5 enum's toString() would.


It seems that an advanced form of enums would be helpful, too. I'm
seeing a lot of if/else chains (effectively switch statements) on their values:

E.g., in AbstractField:
if (store == Field.Store.YES) {
  this.isStored = true;
} else if (store == Field.Store.NO) {
  this.isStored = false;
} else {
  throw new IllegalArgumentException("unknown store parameter " + store);
}


if (index == Field.Index.NO) {
  this.isIndexed = false;
  this.isTokenized = false;
} else if (index == Field.Index.ANALYZED) {
  this.isIndexed = true;
  this.isTokenized = true;
} else if (index == Field.Index.NOT_ANALYZED) {
  this.isIndexed = true;
  this.isTokenized = false;
} else if (index == Field.Index.NOT_ANALYZED_NO_NORMS) {
  this.isIndexed = true;
  this.isTokenized = false;
  this.omitNorms = true;
} else if (index == Field.Index.ANALYZED_NO_NORMS) {
  this.isIndexed = true;
  this.isTokenized = true;
  this.omitNorms = true;
} else {
  throw new IllegalArgumentException("unknown index parameter " + index);
}

This could be reduced to:
this.isStored = store.isStored();
this.isIndexed = index.isIndexed();
this.isTokenized = index.isTokenized();
this.omitNorms = index.omitNorms();

With the following:
public enum Store {
  YES { public boolean isStored() { return true; } },
  NO  { public boolean isStored() { return false; } };

  // Determine whether this is stored or not
  abstract boolean isStored();
}

public enum Index {
  ANALYZED {
    public boolean isIndexed() { return true; }
    public boolean isTokenized() { return true; }
    public boolean omitNorms() { return false; }
    ...
  },
  ...

  abstract boolean isIndexed();
  abstract boolean isTokenized();
  abstract boolean omitNorms();
  ...
}

What I like about this pattern is that it clearly documents what each
member does. As it is, that information is spread around in the files.


One can add a picker method to these to serve as a factory. E.g., given
indexed = true, tokenized = false, ..., return the appropriate value
from the Index enum.
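Pulling the pieces together, a complete and compilable version might look like the sketch below. The names (FieldEnums, fromFlags) are hypothetical and this is not Lucene's Field API. It swaps in one variation: each constant passes its flags to the enum constructor instead of overriding abstract methods, which is shorter for five constants while still keeping each constant's meaning in one place. fromFlags is one possible shape for the picker method.

```java
/** Sketch of the enum proposal with hypothetical names; not Lucene's actual Field API. */
public class FieldEnums {
  public enum Store {
    YES(true),
    NO(false);

    private final boolean stored;

    Store(boolean stored) { this.stored = stored; }

    public boolean isStored() { return stored; }
  }

  public enum Index {
    //                     indexed, tokenized, omitNorms
    NO                    (false,   false,     false),
    ANALYZED              (true,    true,      false),
    NOT_ANALYZED          (true,    false,     false),
    NOT_ANALYZED_NO_NORMS (true,    false,     true),
    ANALYZED_NO_NORMS     (true,    true,      true);

    private final boolean indexed;
    private final boolean tokenized;
    private final boolean omitNorms;

    Index(boolean indexed, boolean tokenized, boolean omitNorms) {
      this.indexed = indexed;
      this.tokenized = tokenized;
      this.omitNorms = omitNorms;
    }

    public boolean isIndexed() { return indexed; }
    public boolean isTokenized() { return tokenized; }
    public boolean omitNorms() { return omitNorms; }

    /** The "picker" factory: returns the constant with exactly these flags. */
    public static Index fromFlags(boolean indexed, boolean tokenized, boolean omitNorms) {
      for (Index candidate : values()) {
        if (candidate.indexed == indexed && candidate.tokenized == tokenized
            && candidate.omitNorms == omitNorms) {
          return candidate;
        }
      }
      throw new IllegalArgumentException("no Index constant for indexed=" + indexed
          + ", tokenized=" + tokenized + ", omitNorms=" + omitNorms);
    }
  }

  // What the if/else chains in the field constructor reduce to.
  public final boolean isStored;
  public final boolean isIndexed;
  public final boolean isTokenized;
  public final boolean omitNorms;

  public FieldEnums(Store store, Index index) {
    this.isStored = store.isStored();
    this.isIndexed = index.isIndexed();
    this.isTokenized = index.isTokenized();
    this.omitNorms = index.omitNorms();
  }
}
```

With enums the "unknown parameter" else branches disappear, since no other values can exist; a null argument would now fail with a NullPointerException rather than an IllegalArgumentException. An enum's toString() returns the constant name, and Java serializes enum constants by name.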




-- DM





[jira] Commented: (LUCENE-1995) ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767556#action_12767556
 ] 

Yonik Seeley commented on LUCENE-1995:
--

lol - well, there we go.  Looks like perhaps a JavaDoc fix (and a comment in 
solrconfig.xml)?  The buffered size was never meant to be quite so large :-)

Mike - I think keeping the signed shift is the right thing to do... a zero-cost 
check against silent corruption.
But I'm not sure if 2048MiB is safe either... I'm not sure if one could
overflow the number of buffers somehow as well (is every buffer except the last
fully utilized?)






[jira] Assigned: (LUCENE-1995) ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-1995:
--

Assignee: Michael McCandless

 ArrayIndexOutOfBoundsException during indexing
 --

 Key: LUCENE-1995
 URL: https://issues.apache.org/jira/browse/LUCENE-1995
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9
Reporter: Yonik Seeley
Assignee: Michael McCandless
 Fix For: 2.9.1


 http://search.lucidimagination.com/search/document/f29fc52348ab9b63/arrayindexoutofboundsexception_during_indexing




[jira] Commented: (LUCENE-1995) ArrayIndexOutOfBoundsException during indexing

2009-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767564#action_12767564
 ] 

Michael McCandless commented on LUCENE-1995:


That's a nice large RAM buffer :)

bq. Mike - I think keeping the signed shift is the right thing to do... a 
zero-cost check against silent corruption.

Ahh good point, OK we'll keep it as is.

bq. But I'm not sure if 2048MiB is safe either

2048 probably won't be safe, because a large doc arriving just as the buffer is
filling up could still overflow. (Though RAM is also used, e.g., for norms, so
you might squeak by.)

I'll update the javadocs to note the limitation!
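One possible shape for such a javadoc note, paired with an explicit check that fails fast instead of overflowing later, is sketched below. The class, default value, and constant are hypothetical, and this thread does not say what check, if any, ends up in Lucene. The sketch assumes "MB" means 2^20 bytes and that buffer offsets are ints, and rejects sizes whose byte count does not fit in an int; as noted above, sizes somewhat below 2048 MB may still overflow, so a real limit would need headroom that the thread does not quantify.

```java
/** Hypothetical config holder; not Lucene's IndexWriter. */
public class RamBufferConfig {
  /** 2048 MB = 2^31 bytes, one more than Integer.MAX_VALUE. */
  public static final double MAX_ADDRESSABLE_MB = (Integer.MAX_VALUE + 1L) / (1024.0 * 1024.0);

  private double ramBufferSizeMB = 16.0; // arbitrary default for this sketch

  /**
   * Sets the RAM buffer size in MB.
   *
   * <p>Note (assumption in this sketch): offsets into the buffer are ints, so the
   * buffer must stay below 2048 MB; values close to that limit can still overflow
   * when a large document arrives just as the buffer fills.
   */
  public void setRAMBufferSizeMB(double mb) {
    if (!(mb > 0.0)) { // also rejects NaN
      throw new IllegalArgumentException("ramBufferSizeMB must be > 0, got " + mb);
    }
    if (mb >= MAX_ADDRESSABLE_MB) {
      throw new IllegalArgumentException("ramBufferSizeMB " + mb
          + " is too large; it must be below " + MAX_ADDRESSABLE_MB + " MB");
    }
    this.ramBufferSizeMB = mb;
  }

  public double getRAMBufferSizeMB() {
    return ramBufferSizeMB;
  }
}
```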





[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1257:
--

Attachment: LUCENE-1257-MTQWF.patch

Better generification of MultiTermQueryWrapperFilter (no more casts in
sub-classes).
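The idea behind "no more casts in sub-classes" can be sketched with simplified stand-in classes. These are hypothetical and do not reproduce the actual Lucene classes or signatures. The wrapper filter takes a type parameter bounded by the query base class, so each subclass sees its wrapped query with its concrete type.

```java
public class GenericWrapperSketch {
    /** Hypothetical stand-in for a multi-term query base class. */
    static abstract class MultiTermQueryStub { }

    /** Hypothetical stand-in for a range query with typed accessors. */
    static final class TermRangeQueryStub extends MultiTermQueryStub {
        private final String lowerTerm;
        private final String upperTerm;

        TermRangeQueryStub(String lowerTerm, String upperTerm) {
            this.lowerTerm = lowerTerm;
            this.upperTerm = upperTerm;
        }

        String getLowerTerm() { return lowerTerm; }
        String getUpperTerm() { return upperTerm; }
    }

    /** Generic wrapper: Q records which query subclass is wrapped. */
    static class WrapperFilterStub<Q extends MultiTermQueryStub> {
        protected final Q query;

        WrapperFilterStub(Q query) { this.query = query; }
    }

    /** Reads query.getLowerTerm() directly; a raw wrapper would need a (TermRangeQueryStub) cast here. */
    static final class TermRangeFilterStub extends WrapperFilterStub<TermRangeQueryStub> {
        TermRangeFilterStub(String lowerTerm, String upperTerm) {
            super(new TermRangeQueryStub(lowerTerm, upperTerm));
        }

        String getLowerTerm() { return query.getLowerTerm(); }
        String getUpperTerm() { return query.getUpperTerm(); }
    }

    public static void main(String[] args) {
        TermRangeFilterStub filter = new TermRangeFilterStub("apple", "mango");
        System.out.println(filter.getLowerTerm() + " .. " + filter.getUpperTerm()); // apple .. mango
    }
}
```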

 Port to Java5
 -

 Key: LUCENE-1257
 URL: https://issues.apache.org/jira/browse/LUCENE-1257
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Analysis, Examples, Index, Other, Query/Scoring, 
 QueryParser, Search, Store, Term Vectors
Affects Versions: 3.0
Reporter: Cédric Champeau
Assignee: Uwe Schindler
Priority: Minor
 Fix For: 3.0

 Attachments: instantiated_fieldable.patch, 
 LUCENE-1257-BooleanQuery.patch, LUCENE-1257-BooleanScorer_2.patch, 
 LUCENE-1257-BufferedDeletes_DocumentsWriter.patch, 
 LUCENE-1257-CheckIndex.patch, LUCENE-1257-CloseableThreadLocal.patch, 
 LUCENE-1257-CompoundFileReaderWriter.patch, 
 LUCENE-1257-ConcurrentMergeScheduler.patch, 
 LUCENE-1257-DirectoryReader.patch, 
 LUCENE-1257-DisjunctionMaxQuery-more_type_safety.patch, 
 LUCENE-1257-DocFieldProcessorPerThread.patch, LUCENE-1257-Document.patch, 
 LUCENE-1257-FieldCacheImpl.patch, LUCENE-1257-IndexDeleter.patch, 
 LUCENE-1257-IndexDeletionPolicy_IndexFileDeleter.patch, LUCENE-1257-iw.patch, 
 LUCENE-1257-MTQWF.patch, LUCENE-1257-NormalizeCharMap.patch, 
 LUCENE-1257-o.a.l.util.patch, LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, 
 LUCENE-1257-org_apache_lucene_document.patch, LUCENE-1257-SegmentInfos.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-StringBuffer.patch, 
 LUCENE-1257-StringBuffer.patch, LUCENE-1257-TopDocsCollector.patch, 
 LUCENE-1257-WordListLoader.patch, LUCENE-1257_analysis.patch, 
 LUCENE-1257_BooleanFilter_Generics.patch, LUCENE-1257_messages.patch, 
 LUCENE-1257_o.a.l.queryParser.patch, LUCENE-1257_o.a.l.store.patch, 
 LUCENE-1257_o_a_l_index_test.patch, LUCENE-1257_o_a_l_index_test.patch, 
 LUCENE-1257_o_a_l_search.patch, LUCENE-1257_o_a_l_search_spans.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, 
 LUCENE-1257_org_apache_lucene_index.patch, lucene1257surround1.patch, 
 lucene1257surround1.patch, shinglematrixfilter_generified.patch


 For my needs I've updated Lucene so that it uses Java 5 constructs. I know 
 Java 5 migration was planned for 2.1 at some point in the past, but I don't 
 know when it is planned now. This patch against the trunk includes:
 - the most obvious generics usage (there are tons of usages of sets, ...; those 
 which are commonly used have been generified)
 - PriorityQueue generification
 - replacement of indexed for loops with for-each constructs
 - removal of unnecessary unboxing
 In my opinion the code is much more readable with those features (you 
 actually *know* what is stored in collections when reading the code, without 
 needing to look up field definitions every time), and it simplifies many 
 algorithms.
 Note that this patch also includes an interface for the Query class. This was 
 done for my company's needs: building custom Query classes that add 
 some behaviour to the base Lucene queries. It prevents multiple unnecessary 
 casts. I know this addition is not wanted by the team, but it really 
 makes our development easier to maintain. If you don't want to use it, 
 replace all /Queriable/ calls with standard /Query/.
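As a small illustration of the constructs listed above (generics, for-each loops, and no manual unboxing), here is the same hypothetical helper written first in pre-Java 5 style and then in Java 5 style. It is a generic sketch, not code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

public class Java5ConstructsSketch {
    /** Pre-Java 5 style: raw List, indexed loop, explicit cast and manual unboxing. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int sumLegacy(List values) {
        int sum = 0;
        for (int i = 0; i < values.size(); i++) {
            Integer value = (Integer) values.get(i); // cast: element type unknown to the compiler
            sum += value.intValue();                 // manual unboxing
        }
        return sum;
    }

    /** Java 5 style: the element type is in the signature; for-each plus auto-unboxing. */
    static int sum(List<Integer> values) {
        int sum = 0;
        for (int value : values) { // each Integer is unboxed automatically
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<Integer>();
        values.add(3);
        values.add(4);
        System.out.println(sumLegacy(values) + " " + sum(values)); // 7 7
    }
}
```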




[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767603#action_12767603
 ] 

Uwe Schindler commented on LUCENE-1257:
---

Committed:
   LUCENE-1257-MTQWF.patch 2009-10-19 10:55 PM Uwe Schindler 5 kB 
   LUCENE-1257-TopDocsCollector.patch 2009-10-19 08:47 PM Kay Kay 8 kB 
   LUCENE-1257-FieldCacheImpl.patch 2009-10-19 08:23 PM Kay Kay 8 kB 

(with some modifications in FieldCacheImpl, where Class had not been generified to
Class<?>).
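For readers unfamiliar with the raw `Class` vs `Class<?>` distinction: a raw `Class` opts out of generic type checking and triggers raw-type warnings. `Class<?>` means "a `Class` of some unknown type" and keeps the compiler checking. Below is a minimal sketch of a cache registry keyed by `Class<?>`. The names are hypothetical, and this is not FieldCacheImpl's actual code.

```java
import java.util.HashMap;
import java.util.Map;

public class ClassKeyedRegistry {
    // Raw-type equivalent, as in pre-Java 5 code:  Map caches = new HashMap();
    // The wildcard form accepts any Class object as a key while keeping type checking.
    private final Map<Class<?>, String> cacheNames = new HashMap<Class<?>, String>();

    void register(Class<?> type, String name) {
        cacheNames.put(type, name);
    }

    String lookup(Class<?> type) {
        return cacheNames.get(type);
    }

    public static void main(String[] args) {
        ClassKeyedRegistry registry = new ClassKeyedRegistry();
        registry.register(Integer.class, "int cache");
        registry.register(String.class, "string cache");
        System.out.println(registry.lookup(Integer.class)); // int cache
    }
}
```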

At revision: 826857




[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-19 Thread Kay Kay (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767611#action_12767611
 ] 

Kay Kay commented on LUCENE-1257:
-

| I updated the parser generator task to use Java 1.5. If you want to generify 
the other parts of QueryParser, update the .jj file and regenerate the java 
files. I will do this tomorrow. Will go to bed now.

What version of javacc is currently being used/suggested (the latest
release seems to be 5.0)?




[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-1257:
--

Attachment: LUCENE-1257-FieldCacheRangeFilter.patch

FieldCacheRangeFilter generified + type safe accessor methods.
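Here is a hypothetical sketch of what "generified + type safe accessor methods" buys. It is not the committed FieldCacheRangeFilter code. With a type parameter `T`, a factory for int ranges returns a `TypedRange<Integer>`, and the bound accessors return `Integer` with no cast at the call site. The sketch also assumes that a null bound means that side of the range is open.

```java
public final class TypedRange<T extends Comparable<? super T>> {
    private final String field;
    private final T lower;
    private final T upper;
    private final boolean includeLower;
    private final boolean includeUpper;

    private TypedRange(String field, T lower, T upper, boolean includeLower, boolean includeUpper) {
        this.field = field;
        this.lower = lower;
        this.upper = upper;
        this.includeLower = includeLower;
        this.includeUpper = includeUpper;
    }

    /** Hypothetical factory: the return type fixes T = Integer for callers. */
    public static TypedRange<Integer> newIntRange(String field, Integer lower, Integer upper,
                                                  boolean includeLower, boolean includeUpper) {
        return new TypedRange<Integer>(field, lower, upper, includeLower, includeUpper);
    }

    public String getField() { return field; }
    public T getLowerVal() { return lower; } // typed accessor: callers need no cast
    public T getUpperVal() { return upper; }

    /** True if value lies within the range; a null bound leaves that side unbounded. */
    public boolean contains(T value) {
        if (lower != null) {
            int c = value.compareTo(lower);
            if (c < 0 || (c == 0 && !includeLower)) return false;
        }
        if (upper != null) {
            int c = value.compareTo(upper);
            if (c > 0 || (c == 0 && !includeUpper)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        TypedRange<Integer> range = newIntRange("price", 10, 20, true, false);
        Integer low = range.getLowerVal(); // compiles without a cast
        System.out.println(low + " " + range.contains(10) + " " + range.contains(20)); // 10 true false
    }
}
```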

Committed revision: 826883




[jira] Commented: (LUCENE-1257) Port to Java5

2009-10-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12767617#action_12767617
 ] 

Uwe Schindler commented on LUCENE-1257:
---

bq. What version of javacc is currently being used/suggested (the latest
release seems to be 5.0)?

*From BUILD.txt* (I suggest using version 4.1; 4.2, for example, has a bug that
somehow corrupts the parser):

Step 3) Install JavaCC

Building the Lucene distribution from the source does not require the JavaCC
parser generator, but if you wish to regenerate any of the pre-generated
parser pieces, you will need to install JavaCC. Version 4.1 is tested to
work correctly.

  http://javacc.dev.java.net

Follow the download links and download the zip file to a temporary
location on your file system.

After JavaCC is installed, create a build.properties file
(as in step 2), and add the line

  javacc.home=/javacc

where this points to the root directory of your javacc installation
(the directory that contains bin/lib/javacc.jar).
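The BUILD.txt instructions above come down to a single property. The Java snippet below is an illustrative sketch only: the helper is hypothetical and not part of the Lucene build. It reads a `build.properties`-style `Properties` object and checks that `javacc.home` points at a directory containing `bin/lib/javacc.jar`, the layout BUILD.txt describes.

```java
import java.io.File;
import java.util.Properties;

public class JavaccHomeCheck {
    /** Returns the expected javacc.jar location, or null if javacc.home is not set. */
    static File javaccJar(Properties buildProps) {
        String home = buildProps.getProperty("javacc.home");
        if (home == null || home.trim().length() == 0) {
            return null;
        }
        // Layout from BUILD.txt: <javacc.home>/bin/lib/javacc.jar
        return new File(new File(home.trim()), "bin" + File.separator + "lib" + File.separator + "javacc.jar");
    }

    /** True if javacc.home is set and the jar exists where BUILD.txt says it should be. */
    static boolean looksInstalled(Properties buildProps) {
        File jar = javaccJar(buildProps);
        return jar != null && jar.isFile();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("javacc.home", "/javacc"); // the example value from BUILD.txt
        System.out.println(javaccJar(props));
        System.out.println(looksInstalled(props) ? "javacc found" : "javacc.jar not found");
    }
}
```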





[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated LUCENE-1257:


Attachment: LUCENE-1257_queryParser_jj.patch

QueryParser.jj patch separately for generics 




[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated LUCENE-1257:


Attachment: LUCENE-1257_MultiFieldQueryParser.patch

MultiFieldQueryParser 




[jira] Updated: (LUCENE-1257) Port to Java5

2009-10-19 Thread Kay Kay (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kay Kay updated LUCENE-1257:


Attachment: LUCENE-1257_javacc_upgrade.patch

common-build.xml: build comments match those in BUILD.txt.




Re: lucene 2.9 sorting algorithm

2009-10-19 Thread John Wang
Hi Michael:
 Was wondering if you got a chance to take a look at this.

 Since deprecated APIs are being removed in 3.0, I was wondering if/when
we would decide whether to keep the ScoreDocComparator API, so that it
would be kept for Lucene 3.0.

Thanks

-John

On Fri, Oct 16, 2009 at 9:53 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 Oh, no problem...

 Mike

 On Fri, Oct 16, 2009 at 12:33 PM, John Wang john.w...@gmail.com wrote:
  Mike, just a clarification on my first perf report email.
  In the first section, numHits is incorrectly labeled; it should be 20
  instead of 50. Sorry about the possible confusion.
  Thanks
  -John
 
  On Fri, Oct 16, 2009 at 3:21 AM, Michael McCandless
  luc...@mikemccandless.com wrote:
 
  Thanks John; I'll have a look.
 
  Mike
 
  On Fri, Oct 16, 2009 at 12:57 AM, John Wang john.w...@gmail.com
 wrote:
   Hi Michael:
   I added classes: ScoreDocComparatorQueue and OneSortNoScoreCollector
   as a more general case. I think keeping the old api for
   ScoreDocComparator and SortComparatorSource would work.
   Please take a look.
   Thanks
   -John
  
   On Thu, Oct 15, 2009 at 6:52 PM, John Wang john.w...@gmail.com
 wrote:
  
   Hi Michael:
   It is open:
   http://code.google.com/p/lucene-book/source/checkout
   I think I sent the https URL instead, sorry.
   The multi PQ sorting is fairly self-contained. I have 2 versions, 1
   for string and 1 for int; each is a Collector impl.
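   The thread does not include John's Collector code. As a rough illustration only, assuming "multi PQ" means keeping one bounded priority queue per segment and merging the per-segment results at the end (rather than a single queue that carries over from segment to segment), the idea can be sketched in plain Java. It uses java.util.PriorityQueue and int sort values instead of Lucene's Collector API; the class, method and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class MultiQueueTopN {

    // A hit: global doc id plus the int value it is sorted by (ascending).
    static final class Hit {
        final int doc;
        final int value;
        Hit(int doc, int value) { this.doc = doc; this.value = value; }
    }

    // Orders the worst hit first: a larger value is worse; ties go to the larger doc id.
    static int worstFirst(Hit a, Hit b) {
        if (a.value != b.value) return Integer.compare(b.value, a.value);
        return Integer.compare(b.doc, a.doc);
    }

    // Collects the best numHits hits of one segment into that segment's own queue.
    static PriorityQueue<Hit> collectSegment(int[] values, int docBase, int numHits) {
        PriorityQueue<Hit> pq =
            new PriorityQueue<Hit>(Math.max(1, numHits), MultiQueueTopN::worstFirst);
        for (int i = 0; i < values.length; i++) {
            Hit hit = new Hit(docBase + i, values[i]);
            if (pq.size() < numHits) {
                pq.add(hit);
            } else if (numHits > 0 && worstFirst(hit, pq.peek()) > 0) {
                // The new hit beats the current worst entry: replace it.
                pq.poll();
                pq.add(hit);
            }
        }
        return pq;
    }

    // One queue per segment, then a single merge of the per-segment survivors.
    static List<Hit> topN(int[][] segments, int numHits) {
        List<Hit> merged = new ArrayList<Hit>();
        int docBase = 0;
        for (int[] segment : segments) {
            merged.addAll(collectSegment(segment, docBase, numHits));
            docBase += segment.length;
        }
        // Best first: ascending value, then ascending doc id.
        merged.sort((a, b) -> worstFirst(b, a));
        return new ArrayList<Hit>(merged.subList(0, Math.min(numHits, merged.size())));
    }

    public static void main(String[] args) {
        int[][] segments = { {5, 1, 9}, {3, 7}, {2, 8, 4} };
        for (Hit h : topN(segments, 3)) {
            System.out.println("doc=" + h.doc + " value=" + h.value);
        }
    }
}
```

   Each segment keeps at most numHits entries, so the final merge only has to sort at most numHits times the number of segments hits.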
   I shouldn't say the Multi Q is faster on int sort; it is within the
   error bounds. The difference is very, very small; I would say they are
   more or less equal.
   If you think it is a good thing to go this way (if not for the perf,
   then just for the simpler API), I'd be happy to work on a patch.
   Thanks
   -John
   On Thu, Oct 15, 2009 at 5:18 PM, Michael McCandless
   luc...@mikemccandless.com wrote:
  
   John, looks like this requires login -- any plans to open that up,
   or post the code on an issue?
  
   How self-contained is your Multi PQ sorting? E.g., is it a standalone
   Collector impl that I can test?
  
   Mike
  
   On Thu, Oct 15, 2009 at 6:33 PM, John Wang john.w...@gmail.com
   wrote:
BTW, we have a little sandbox for these experiments, and all my
test code is there. It is not very polished.
   
https://lucene-book.googlecode.com/svn/trunk
   
-John
   
On Thu, Oct 15, 2009 at 3:29 PM, John Wang john.w...@gmail.com
wrote:
   
Numbers Mike requested for Int types:
   
Only the time/CPU time figures are posted; the others are all the same
since the algorithm is the same.
   
Lucene 2.9:
numhits: 10
time: 14619495
cpu: 146126
   
numhits: 20
time: 14550568
cpu: 163242
   
numhits: 100
time: 16467647
cpu: 178379
   
   
my test:
numHits: 10
time: 14101094
cpu: 144715
   
numHits: 20
time: 14804821
cpu: 151305
   
numHits: 100
time: 15372157
cpu: 158842
   
Conclusions:
They are very similar; the differences are all within error bounds,
especially with smaller PQ sizes, where the second sort algorithm is again
slightly faster.
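To put these numbers side by side, the relative difference for each numHits setting can be computed from the figures posted above. The helper below is a hypothetical sketch that only does the arithmetic; the thread does not report run-to-run variance, so it cannot by itself say what counts as "within error bounds". A positive percentage means the second algorithm measured lower than Lucene 2.9.

```java
public class PerfDelta {

    // Percentage by which candidate is lower than baseline (negative if it is higher).
    static double percentLower(long baseline, long candidate) {
        return 100.0 * (baseline - candidate) / baseline;
    }

    public static void main(String[] args) {
        int[] numHits = {10, 20, 100};
        // Figures copied from the message above: Lucene 2.9 vs. the second algorithm.
        long[] luceneTime = {14619495L, 14550568L, 16467647L};
        long[] testTime   = {14101094L, 14804821L, 15372157L};
        long[] luceneCpu  = {146126L, 163242L, 178379L};
        long[] testCpu    = {144715L, 151305L, 158842L};
        for (int i = 0; i < numHits.length; i++) {
            System.out.printf("numHits=%d time: %+.1f%% cpu: %+.1f%%%n",
                numHits[i],
                percentLower(luceneTime[i], testTime[i]),
                percentLower(luceneCpu[i], testCpu[i]));
        }
    }
}
```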
   
Hope this helps.
   
-John
   
   
On Thu, Oct 15, 2009 at 3:04 PM, Yonik Seeley
yo...@lucidimagination.com
wrote:
   
On Thu, Oct 15, 2009 at 5:33 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Though it'd be odd if the switch to searching by segment
 really was most of the gains here.
   
I had assumed that much of the improvement was due to ditching
MultiTermEnum/MultiTermDocs.
Note that LUCENE-1483 was before LUCENE-1596... but that only helps
with queries that use a TermEnum (range, prefix, etc.).
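The cost Yonik refers to can be made concrete with a toy model. Enumerating terms over a multi-segment index means merging each segment's sorted term list into one sorted stream, typically through a priority queue keyed on each segment's current term; searching segment by segment skips that merge and walks each segment's terms directly. The sketch below is a simplified, hypothetical model in plain Java (sorted String arrays standing in for segments), not Lucene's MultiTermEnum implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class MergedTermsSketch {

    // Cursor over one segment's terms; assumed sorted and free of duplicates.
    static final class SegmentCursor {
        final String[] terms;
        int pos;
        SegmentCursor(String[] terms) { this.terms = terms; }
        String current() { return terms[pos]; }
    }

    // Merges per-segment sorted term lists into one sorted stream. The map value
    // counts how many segments contain the term (a stand-in for summing docFreq).
    static Map<String, Integer> mergeTerms(String[][] segments) {
        PriorityQueue<SegmentCursor> queue = new PriorityQueue<SegmentCursor>(
            (a, b) -> a.current().compareTo(b.current()));
        for (String[] terms : segments) {
            if (terms.length > 0) {
                queue.add(new SegmentCursor(terms));
            }
        }
        Map<String, Integer> merged = new LinkedHashMap<String, Integer>();
        while (!queue.isEmpty()) {
            String term = queue.peek().current();
            int count = 0;
            // Pop every segment positioned on this term, advance it, re-queue if not exhausted.
            while (!queue.isEmpty() && queue.peek().current().equals(term)) {
                SegmentCursor cursor = queue.poll();
                count++;
                cursor.pos++;
                if (cursor.pos < cursor.terms.length) {
                    queue.add(cursor);
                }
            }
            merged.put(term, count);
        }
        return merged;
    }

    public static void main(String[] args) {
        String[][] segments = {
            {"apple", "lucene", "zebra"},
            {"lucene", "search"},
            {"apple", "index", "search"}
        };
        System.out.println(mergeTerms(segments)); // {apple=2, index=1, lucene=2, search=2, zebra=1}
    }
}
```

Every term costs a poll and a re-add for each segment that contains it, each O(log k) for k segments; that per-term overhead is what segment-by-segment searching avoids.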
   
-Yonik
http://www.lucidimagination.com
   
   
   